This project implements an abstraction layer over a variety of data stores, exposing read/write operations through a simple and expressive interface. The abstraction works with NoSQL, SQL, and cloud data stores and leverages pandas.
It primarily serves data scientists who don't want to deal with the underlying database and need a simple, consistent way to read, write, and move data. We also provide a lightweight Extract-Transform-Load (ETL) API and a command-line (CLI) tool. Finally, read/write operations can be extended with pre/post-processing pipeline functions.
Within the virtual environment, run the following:
pip install git+https://github.com/lnyemba/data-transport.git
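Once installed, reading and writing a pandas DataFrame looks roughly like the sketch below. It assumes the v2.x `transport.get.reader` / `transport.get.writer` factory interface; the provider constants and connection parameters shown are illustrative, and the exact keyword arguments vary by data store, so consult the notebooks for your backend.

```python
import transport
from transport import providers

# Open a reader against a PostgreSQL table (connection values are illustrative)
reader = transport.get.reader(provider=providers.POSTGRESQL,
                              database='mydb', table='users')
df = reader.read()   # returns a pandas DataFrame

# Write the same frame to a DuckDB file (provider constant assumed)
writer = transport.get.writer(provider=providers.DUCKDB,
                              database='/tmp/mydb.duckdb', table='users')
writer.write(df)
```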
With data-transport you can:

- read and write against over a dozen databases
- run ETL jobs seamlessly (a minimal sketch follows this list)
- scale and integrate into shared environments such as Apache Zeppelin, JupyterHub, SageMaker, ...
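As a hedged sketch of what an ETL job amounts to, the snippet below pairs a reader with a writer to move a table between two stores. The provider constants and connection parameters are illustrative, and the bundled CLI accepts an equivalent JSON description of source and target; see the notebooks for the exact schema.

```python
import transport
from transport import providers

# Extract from PostgreSQL into a pandas DataFrame (parameters illustrative)
source = transport.get.reader(provider=providers.POSTGRESQL,
                              database='analytics', table='events')
df = source.read()

# ... apply any transformations to df here ...

# Load into MongoDB (parameter names assumed for this backend)
target = transport.get.writer(provider=providers.MONGODB,
                              db='analytics', collection='events')
target.write(df)
```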
Unlike versions 2.0 and earlier, the current release focuses on collaborative environments such as Jupyter-based servers and Apache Zeppelin:
1. Simpler syntax for creating a reader or writer
2. An auth-file registry whose entries can be referenced by a label (see the sketch after this list)
3. DuckDB support
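To illustrate points 1 and 2 together: once a connection's auth file has been added to the registry under a label, a reader can be created from the label alone instead of repeating the connection parameters. The `label` keyword and the registration workflow are assumptions here, not a verified API; the notebooks document the exact steps.

```python
import transport

# Assumes a connection was previously registered under the label 'pg-dev'
# (the 'label' keyword is an assumption; check the notebooks for the
# exact registry workflow in your version)
reader = transport.get.reader(label='pg-dev', table='users')
df = reader.read()
```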
Notebooks with sample code are available for reading and writing against MongoDB, CouchDB, Netezza, PostgreSQL, Google BigQuery, Databricks, Microsoft SQL Server, MySQL, ... Visit the data-transport homepage for more.