
Python SDK for Tesseract Models

Tesseract Python SDK

This is an SDK for developing Tesseract models in Python.

Test a Tesseract Image

To check that your model container will run correctly in Tesseract, use the validation CLI. For a basic setup, run:

tesseract-sdk validate <image-name>:<tag>

This reads the model info in your model code and generates a random array for input. It then spins up the container and attempts to send data into the model. If the model returns data, the validator checks that the shape and dtypes are correct. That's all you need for simple models.
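To make the shape and dtype check concrete, here is a minimal sketch of the idea using NumPy. It is not the SDK's implementation: the function names `random_input` and `check_output` are hypothetical, and the expected shape and dtype are passed in by hand rather than read from the model info.

```python
import numpy as np


def random_input(shape, dtype, seed=0):
    """Create a random array with the given shape and dtype.

    Hypothetical helper: values are drawn from [0, 1) and then cast,
    so integer dtypes will contain only zeros.
    """
    rng = np.random.default_rng(seed)
    return rng.random(shape).astype(dtype)


def check_output(array, expected_shape, expected_dtype):
    """Return a list of human-readable problems; empty means the array matches."""
    problems = []
    if tuple(array.shape) != tuple(expected_shape):
        problems.append(
            f"shape mismatch: got {tuple(array.shape)}, expected {tuple(expected_shape)}"
        )
    if array.dtype != np.dtype(expected_dtype):
        problems.append(
            f"dtype mismatch: got {array.dtype}, expected {np.dtype(expected_dtype)}"
        )
    return problems
```

Returning a list of problems rather than raising on the first one lets a single run report both a wrong shape and a wrong dtype.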

For more complicated models, or models that you would like to test with real data, you need to create a configuration file for testing. The configuration file tells the validator where the local data to load is and which bands to include. The resulting arrays and features are written out as PNG and GeoJSON, respectively. An example config is shown below:

{
    "image": "my-tesseract-model:v0.0.1",
    "test_data": {
        "job_id": "my-job-id",
        "project": "my-project"
    },
    "asset_bands": [
        {
            "asset_name": "modis",
            "bands": [0, 1, 2, 5]
        },
        {
            "asset_name": "sentinel",
            "bands": [2,4]
        }
    ],
    "args": {
        "model-arg-1": "value1",
        "model-arg-2": "value2"
    },
    "output_asset_bands": [
        {
            "asset_name": "model_output_1",
            "bands": [0, 1, 2]
        },
        {
            "asset_name": "model_output_2",
            "bands": [0]
        }
    ],
    "save_output": false
}

image: The docker image to validate.

test_data: Either a dictionary with a job ID and project, to get data directly from a Tesseract job, or a path to a zarr file. When reading directly from a Tesseract job, the dict should have only the keys job_id and project. To read from a zarr file, pass the path or URL as a string. This can be a local file or a remote zarr file (for example, in Google Cloud Storage), as long as the credentials are available. Optional: if not provided, random data will be created.

asset_bands: A list of asset bands, like the inputs to a Tesseract job. Each entry in the list should contain the keys "asset_name" and "bands". The "asset_name" must exist in the input zarr file or Tesseract job, and "bands" should be a list of integers corresponding to bands in the asset. Optional: if not provided, all bands from all input datasets are used.

args: Any arguments that need to be passed to the model inference function. Optional: if not provided, no args are supplied to the model.

output_asset_bands: For each model output, the bands that should be used to produce an image. Each entry should list either 1 or 3 bands. A PNG image is created for each item in the list so that you can quickly inspect the outputs and confirm that the model appears to be working correctly. Unlike asset_bands, you can have multiple entries here with the same name. This is useful if you want several images for one asset, for example 3 images with one band each instead of one 3-band image. Optional: if not provided, no output images are generated.

save_output: If true, writes the model output as bytes that can be read in with NumPy. Each file is named after its output, with a '.dat' extension. Optional: defaults to false.
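As an illustration of reading a saved output back, the sketch below uses `numpy.fromfile`. It assumes the '.dat' file holds raw array bytes with no header, so the dtype and shape are not stored in the file and must be supplied by you (for example, from your model info). The helper name `load_output` is hypothetical and not part of the SDK.

```python
import math

import numpy as np


def load_output(path, dtype, shape):
    """Load a raw '.dat' file into an array of the given dtype and shape.

    Assumption: the file contains only the array's bytes in C order,
    with no header describing dtype or shape.
    """
    flat = np.fromfile(path, dtype=dtype)
    expected = math.prod(shape)
    if flat.size != expected:
        raise ValueError(
            f"{path} holds {flat.size} values of {np.dtype(dtype)}, "
            f"but shape {tuple(shape)} needs {expected}"
        )
    return flat.reshape(shape)
```

The size check catches the most common mistake, a wrong dtype or shape, before the data is silently reshaped into something meaningless.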

To run the validator with a configuration file, pass the file to the utility with the -f flag:

tesseract-sdk validate -f valid_config.json
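If you generate configs from a script, a small pre-check can catch mistakes before the container is started. The sketch below writes a config with the standard `json` module (which never emits trailing commas) and builds the command shown above. The helpers `check_config`, `write_config`, and `validate_command` are hypothetical, and the checks only mirror the rules described in this README; treating `image` as required is an assumption, since it is the only key not marked optional.

```python
import json


def check_config(config):
    """Return a list of problems found against the rules documented above."""
    problems = []
    # Assumption: 'image' is required because it is not marked optional.
    if "image" not in config:
        problems.append("missing 'image'")
    test_data = config.get("test_data")
    if isinstance(test_data, dict):
        if set(test_data) != {"job_id", "project"}:
            problems.append("test_data dict must have only 'job_id' and 'project'")
    elif test_data is not None and not isinstance(test_data, str):
        problems.append("test_data must be a dict or a path/URL string")
    for key in ("asset_bands", "output_asset_bands"):
        for entry in config.get(key, []):
            if set(entry) != {"asset_name", "bands"}:
                problems.append(f"{key} entries need 'asset_name' and 'bands'")
                continue
            bands = entry["bands"]
            if not all(isinstance(b, int) and not isinstance(b, bool) for b in bands):
                problems.append(f"{key}: bands for {entry['asset_name']} must be integers")
            if key == "output_asset_bands" and len(bands) not in (1, 3):
                problems.append(f"{key}: {entry['asset_name']} must list 1 or 3 bands")
    return problems


def write_config(config, path):
    """Write the config as JSON, refusing to write one that fails the checks."""
    problems = check_config(config)
    if problems:
        raise ValueError("; ".join(problems))
    with open(path, "w") as f:
        json.dump(config, f, indent=4)


def validate_command(config_path):
    """Build the argument list for the validator, e.g. for subprocess.run."""
    return ["tesseract-sdk", "validate", "-f", config_path]
```

Building the command as a list rather than a single string avoids shell quoting problems if the config path contains spaces.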

Contributing

To contribute to the project, first install the package with the dev option:

pip install ".[dev]"

IMPORTANT: Before creating a PR, make sure to update the protobuf files. The PR checks will fail if you do not. To update the protobuf files, run the following commands:

make protoc-python
make copy-protos
make check-protos
