Supplements to the Python SPark ETL libRary (SPETLR) for Databricks.

spetlr-tools

Description

SPETLR-tools is a library that provides a set of tools for working with Databricks Lakehouses. These tools include test fixtures and development utilities that are not part of the runtime tools in SPETLR.

Visit the official SPETLR webpage: https://spetlr.com/

Purpose of SPETLR-tools

SPETLR-tools is designed to support SPETLR in various scenarios, including:

  • Test tools for pytest:
    • Examples: DataFrame validation checks, data format checks, ...
  • Helpers for investigating data:
    • Examples: Extract the schema from binary-encoded columns, get the difference between two DataFrames, ...
  • SPETLR-tools CLI:
    • Examples: Submit pytest runs to a Databricks cluster, automated Azure token extraction, ...
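
To illustrate what "the difference between two DataFrames" means in a test, here is a minimal sketch in plain Python. It is not the SPETLR-tools API: `diff_rows` is a hypothetical name, and rows are modelled as tuples so the example runs without Spark. On real Spark DataFrames, `DataFrame.exceptAll` gives a comparable duplicate-aware difference.

```python
from collections import Counter


def diff_rows(left, right):
    """Return (rows only in left, rows only in right), respecting duplicates.

    Hypothetical helper for illustration; rows are hashable tuples.
    """
    left_counts = Counter(left)
    right_counts = Counter(right)
    # Counter subtraction keeps only positive counts, so a row that appears
    # twice on one side and once on the other is reported once.
    only_left = list((left_counts - right_counts).elements())
    only_right = list((right_counts - left_counts).elements())
    return only_left, only_right


expected = [(1, "a"), (2, "b"), (2, "b")]
actual = [(1, "a"), (2, "b"), (3, "c")]

missing, unexpected = diff_rows(expected, actual)
print("missing from actual:", missing)
print("unexpected in actual:", unexpected)
```

Reporting both directions makes a failing test easier to read than a bare equality check, because it shows which rows were lost and which were added.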

SPETLR-tools vs. SPETLR

  • SPETLR-tools: Tested in a Python interpreter and, as of January 2024, also integration tested with on-cluster job tests.
  • SPETLR-tools: The GitHub workflow uses a very simple Azure deployment.
  • SPETLR: Fully unit and integration tested; a library ready for production use.
  • SPETLR-tools: Supports deployment and testing.
    • List it only in test_requirements.txt, not as a runtime dependency.
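
One way to enforce the "test requirements only" rule is a small guard in your own project's test suite. The sketch below is an assumption-laden illustration, not part of SPETLR-tools: `lists_package` is a hypothetical helper, and the name `requirements.txt` for the runtime file is assumed. The example works on in-memory strings so it runs as-is.

```python
import re


def lists_package(requirements_text, package):
    """Return True if the requirements text names the given package.

    Hypothetical helper: handles comments, version specifiers and extras,
    but not every requirements-file feature (e.g. -r includes).
    """
    wanted = package.lower().replace("_", "-")
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        # The package name ends at the first specifier, marker or extras bracket.
        name = re.split(r"[<>=!~;\[\s]", line, maxsplit=1)[0]
        if name.lower().replace("_", "-") == wanted:
            return True
    return False


# Assumed file contents, inlined so the sketch is self-contained.
runtime_requirements = "spetlr\npyyaml>=6\n"
test_requirements = "pytest\nspetlr-tools  # test-time only\n"

in_runtime = lists_package(runtime_requirements, "spetlr-tools")
in_test = lists_package(test_requirements, "spetlr-tools")
print(in_runtime, in_test)
```

In a real project you would read the two files from disk and assert that only the test requirements list the package.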

Installation

Install SPETLR-tools from PyPI:

pip install spetlr-tools

Development Notes

To prepare for development, please install the following additional requirements:

  • Java 8
  • pip install -r test_requirements.txt

Then install the package locally:

python setup.py develop

Testing

Local tests

After installing the dev-requirements, execute tests by running:

pytest tests

These tests are located in the ./tests/unit folder and only require a Python interpreter. Pull requests will not be accepted if these tests do not pass. If you add new features, please include corresponding tests.
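
As a rough picture of what "corresponding tests" can look like, here is a minimal pytest-style unit test that needs only a Python interpreter. Both the helper `normalize_column_names` and its behaviour are hypothetical stand-ins for whatever feature you add; they are not part of SPETLR-tools.

```python
def normalize_column_names(names):
    """Hypothetical feature: trim, lower-case, and underscore column names."""
    return [name.strip().lower().replace(" ", "_") for name in names]


def test_normalize_column_names():
    # pytest collects functions named test_*; a plain assert is enough.
    result = normalize_column_names([" Order ID", "Amount"])
    assert result == ["order_id", "amount"]


def test_normalize_column_names_empty():
    # Edge cases get their own test so a failure points at the exact case.
    assert normalize_column_names([]) == []
```

In a real contribution the helper would live in the package and the tests in a file such as `tests/unit/test_<feature>.py`, where that file name is only an assumed convention.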

CLI and Cluster tests

During the pre-integration workflow (.github/workflows/pre-integration.yml), the CLI tools provided by spetlr-tools are (or should be) tested.

Contributing

Feel free to contribute to SPETLR-tools. All contributions are appreciated, whether they add new features or improve what is already there.

If you have a suggestion that can enhance SPETLR-tools, please fork the repository and create a pull request. Alternatively, you can open an issue with the "enhancement" tag.

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/NewSPETLRToolsFeature)
  3. Commit your Changes (git commit -m 'Add some SPETLRToolsFeature')
  4. Push to the Branch (git push origin feature/NewSPETLRToolsFeature)
  5. Open a Pull Request

Releases

Releases to PyPI are published by a GitHub Action that must be triggered manually.

Contact

For any inquiries, please use the SPETLR Discord Server.
