Pandas extension that measures privacy risk
This project estimates the privacy risk of a given database in three steps:
1. Pull the metadata of the database and create a dataset via joins
2. Generate a dataset from a random selection of features
3. Compute the risk in SQL using GROUP BY
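Steps 2 and 3 can be illustrated locally with pandas. This is only a minimal sketch, not the code in `risk.py`: the helper names (`sample_features`, `estimate_risk`), the sample columns, and both ratios are assumptions chosen for illustration, and the project may define risk differently.

```python
import numpy as np
import pandas as pd


def sample_features(df, key, n_features, rng):
    """Pick n_features columns at random, excluding the patient key (hypothetical helper)."""
    candidates = [c for c in df.columns if c != key]
    chosen = rng.choice(candidates, size=n_features, replace=False)
    return [str(c) for c in chosen]


def estimate_risk(df, features):
    """Group rows by the chosen features and summarise the group sizes (hypothetical helper)."""
    sizes = df.groupby(features, dropna=False).size()
    return {
        "features": features,
        "group_count": int(len(sizes)),
        "row_count": int(len(df)),
        # Assumed metric: share of rows that are unique on these features.
        "unique_row_ratio": float((sizes == 1).sum() / len(df)),
        # Assumed metric: groups per row (1.0 means every row is unique).
        "group_ratio": float(len(sizes) / len(df)),
    }


# Made-up sample data; the column names are illustrative only.
df = pd.DataFrame({
    "person_id": [1, 2, 3, 4, 5, 6],
    "gender": ["F", "F", "M", "M", "F", "M"],
    "birth_year": [1980, 1980, 1975, 1990, 1980, 1975],
    "zip": ["37203", "37203", "37212", "37212", "37203", "37212"],
})

rng = np.random.default_rng(0)
features = sample_features(df, key="person_id", n_features=2, rng=rng)
print(estimate_risk(df, features))
print(estimate_risk(df, ["gender", "birth_year", "zip"]))
```

The more rows that sit alone in their group, the easier it is to single out a record from those features, which is why the sketch reports the share of unique rows.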
The following are the dependencies needed to run the code:
pandas
numpy
pandas-gbq
google-cloud-bigquery
* Generate the merged dataset
python risk.py create --i_dataset <in dataset|schema> --o_dataset <out dataset|schema> --table <name> --path <bigquery-key-file> --key <patient-id-field-name> [--file ]
* Compute the risk
python risk.py compute --i_dataset --table --path --key
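For orientation, the kind of grouped query that a `compute` run might send to BigQuery can be sketched as below. The function names, the output column names, and the ratio are assumptions, not taken from `risk.py`. The query runner relies on `pandas_gbq.read_gbq` and a service-account key file and is defined but not called here.

```python
import re

_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_risk_sql(dataset, table, features):
    """Build a GROUP BY query over the given feature columns (hypothetical helper)."""
    for name in [dataset, table, *features]:
        if not _IDENT.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    cols = ", ".join(f"`{c}`" for c in features)
    return (
        "SELECT COUNT(*) AS group_count, SUM(n) AS row_count, "
        # Assumed metric: share of rows that are unique on the chosen features.
        "SAFE_DIVIDE(COUNTIF(n = 1), SUM(n)) AS unique_row_ratio "
        f"FROM (SELECT COUNT(*) AS n FROM `{dataset}.{table}` GROUP BY {cols})"
    )


def run_risk_query(sql, key_file):
    """Run the query with pandas-gbq using a service-account key file (not called here)."""
    import pandas_gbq
    from google.oauth2 import service_account

    credentials = service_account.Credentials.from_service_account_file(key_file)
    return pandas_gbq.read_gbq(
        sql, project_id=credentials.project_id, credentials=credentials
    )


# Dataset, table, and column names below are placeholders.
sql = build_risk_sql("my_dataset", "merged", ["gender", "zip"])
print(sql)
```

Checking identifiers before interpolating them keeps dataset, table, and column names from injecting SQL, since they cannot be passed as query parameters.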