Library to help with ETL using PySpark

Sparta

Sparta is a simple library to help you work on ETL builds using PySpark.

Important Sources

Installation

Install the latest version from PyPI with pip:

pip install pysparta

Documentation

Sparta

Modules

Extract

This is a module with functions for extracting and reading data.

Example

from sparta.extract import read_with_schema

# Spark DDL schema string: each column name is followed by its type
schema = 'epidemiological_week LONG, date DATE, order_for_place INT, state STRING, city STRING, city_ibge_code LONG, place_type STRING, last_available_confirmed INT'
path = '/content/sample_data/covid19-e0534be4ad17411e81305aba2d9194d9.csv'
# Arguments as used here: file path, schema, reader options, and source format
df = read_with_schema(path, schema, {'header': 'true'}, 'csv')
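
For readers who want to see what a helper like this might wrap, here is a minimal sketch built only from standard PySpark reader calls. The name read_with_schema_sketch, its parameter names, and the default format are assumptions made for illustration and are not Sparta's actual implementation; spark is expected to be an existing SparkSession.

```python
def read_with_schema_sketch(spark, path, schema, options=None, fmt="parquet"):
    """Hypothetical stand-in for a schema-enforcing read helper.

    `spark` is an existing SparkSession. `schema` may be a DDL string such as
    'state STRING, date DATE' or a StructType.
    """
    # DataFrameReader calls are chainable: format, then schema, then options.
    reader = spark.read.format(fmt).schema(schema)
    if options:
        # Reader options (for example header='true' for CSV) go in as keywords.
        reader = reader.options(**options)
    return reader.load(path)
```

With a live session, read_with_schema_sketch(spark, path, schema, {'header': 'true'}, 'csv') would mirror the call in the example above.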

Transformation

This is a module with data transformation functions.

Example

from sparta.transformation import drop_duplicates

cols = ['longitude','latitude']
df = drop_duplicates(df, 'population', cols)
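
The call above does not say how duplicates are resolved. One plausible reading, stated here as an assumption, is that rows sharing the same values in cols count as duplicates and the row with the largest value in the ordering column ('population') is kept. The pure-Python sketch below illustrates that rule on a list of dictionaries. It is not Sparta's code, and keep_highest_per_key is a hypothetical name.

```python
def keep_highest_per_key(rows, order_col, key_cols):
    """Keep one row per key, preferring the largest `order_col` value.

    Assumption: this is one plausible dedup rule, not Sparta's documented behaviour.
    """
    best = {}
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        # Replace the stored row only when the new one ranks strictly higher.
        if key not in best or row[order_col] > best[key][order_col]:
            best[key] = row
    # Dicts preserve insertion order, so keys come out in first-seen order.
    return list(best.values())


rows = [
    {"longitude": -46.6, "latitude": -23.5, "population": 100},
    {"longitude": -46.6, "latitude": -23.5, "population": 250},
    {"longitude": -43.2, "latitude": -22.9, "population": 80},
]
deduped = keep_highest_per_key(rows, "population", ["longitude", "latitude"])
```

In PySpark the same rule is commonly expressed with a window partitioned by the key columns and row_number() ordered by the ordering column, keeping only the first row of each partition.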

Load

This is a module with load and write functions.

Example

from sparta.load import create_hive_table

create_hive_table(df, "table_name", 5, "col1", "col2", "col3")
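
The arguments suggest a bucketed table: a table name, a bucket count (5), and the columns to bucket by. Under that assumption, a rough equivalent using PySpark's DataFrameWriter could look like the sketch below. write_bucketed_table and its parameter names are hypothetical, and the overwrite mode is a choice made for the sketch, not something the source specifies.

```python
def write_bucketed_table(df, table, num_buckets, *bucket_cols, mode="overwrite"):
    """Hypothetical sketch: save `df` as a bucketed table.

    Assumes the third argument is the bucket count and the remaining
    arguments are the bucketing columns.
    """
    if not bucket_cols:
        raise ValueError("at least one bucketing column is required")
    first, *rest = bucket_cols
    # bucketBy(numBuckets, col, *cols) is only applicable together with saveAsTable.
    (
        df.write
        .mode(mode)
        .bucketBy(num_buckets, first, *rest)
        .saveAsTable(table)
    )
```

Called as write_bucketed_table(df, "table_name", 5, "col1", "col2", "col3"), it takes the same arguments as the example above.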

Others

This is a module with several functions that can help in ETL work.

Example

from sparta.secret import get_secret_aws

get_secret_aws('Secret_Name', 'sa-east-1')
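
A helper like this presumably wraps AWS Secrets Manager. The sketch below shows one way to fetch and decode a secret given a Secrets Manager client. fetch_secret is a hypothetical name, and parsing the value as JSON is an assumption, since the source does not say what get_secret_aws returns. The client would normally be created with boto3.client('secretsmanager', region_name='sa-east-1').

```python
import json


def fetch_secret(client, secret_id):
    """Hypothetical sketch: read a secret through a Secrets Manager client.

    `client` is expected to expose get_secret_value(SecretId=...), as the
    boto3 'secretsmanager' client does.
    """
    response = client.get_secret_value(SecretId=secret_id)
    secret = response["SecretString"]
    try:
        # Many secrets are stored as JSON key/value pairs.
        return json.loads(secret)
    except json.JSONDecodeError:
        # Fall back to the raw string when the value is not valid JSON.
        return secret
```

Passing the client in, rather than creating it inside the function, keeps the sketch easy to exercise with a stub and leaves region and credential handling to the caller.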

Supported PySpark / Python versions

Sparta currently supports PySpark 3.0+ and Python 3.7+.

Download files

Source Distribution

pysparta-0.4.2.tar.gz (14.2 kB, source)

Built Distribution

pysparta-0.4.2-py3-none-any.whl (16.2 kB, Python 3)
