Spark ETL Python

A Python package that provides helpers for cleaning, deduplication, enrichment, and similar ETL tasks in Spark.


  • TODO


To develop on this package:

  1. Create a virtual environment
  2. Install pip-tools: pip install pip-tools
  3. Run pip-sync requirements_dev.txt requirements.txt
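
The three steps above can be sketched as a small shell function. This is a minimal sketch, not the project's official setup script: the function name setup_dev_env and the environment directory .venv are assumptions made for illustration, and it assumes a POSIX shell with python3 on the PATH.

```shell
# Hypothetical helper; the name setup_dev_env and the .venv directory are assumptions.
setup_dev_env() {
    # 1. Create a virtual environment and activate it in the current shell
    python3 -m venv .venv || return 1
    . .venv/bin/activate || return 1

    # 2. Install pip-tools, which provides pip-compile and pip-sync
    pip install pip-tools || return 1

    # 3. Make the environment match the pinned requirement files exactly
    pip-sync requirements_dev.txt requirements.txt
}
```

After defining the function, call setup_dev_env from the repository root. Note that pip-sync also uninstalls any package that is not listed in the given files, which is why it should only be run inside the virtual environment.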

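To update dependencies, add them to the appropriate requirements input file: the runtime one if they are needed to run the package, or the development one otherwise. Then run pip-compile on the input file you changed to regenerate the corresponding pinned requirements file.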
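
As an illustration of the update workflow, the sketch below assumes the input files follow the usual pip-tools naming, requirements.in and requirements_dev.in. Those two file names and the function name update_deps are assumptions; they are not confirmed by this README, so substitute the input files the repository actually uses.

```shell
# Hypothetical helper; requirements.in and requirements_dev.in are assumed file names.
update_deps() {
    # Re-pin runtime dependencies (writes requirements.txt by default)
    pip-compile requirements.in || return 1

    # Re-pin development dependencies (writes requirements_dev.txt by default)
    pip-compile requirements_dev.in || return 1

    # Apply the new pins to the active virtual environment
    pip-sync requirements_dev.txt requirements.txt
}
```

Run update_deps inside the activated virtual environment after editing an input file, then commit both the input file and the regenerated pinned file.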


This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.