6 years ago · 6918f80eb4
--- a/README.md
+++ b/README.md
@@ -2,6 +2,31 @@
 
				 
			
 
				 This project is intended to compute an estimated value of risk for a given database.
			
 
				 
			
 
				-    1. Pull meta data of the database and create a dataset via joins
			
 
				+    1. Pull metadata from the database and create a dataset via joins
			
 
				     2. Generate the dataset with random selection of features
			
 
				-    3. Compute risk via SQL using group by
			
 
				+    3. Compute risk via SQL using group by
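
The group-by computation in step 3 can be illustrated with a small, self-contained sketch. It is not the project's code: it uses an in-memory SQLite table instead of BigQuery, and it assumes one possible risk measure, the number of distinct value combinations divided by the number of records (sometimes called marketer risk). The function name `group_by_risk`, the table name `sample`, and the example rows are all hypothetical.

```python
import sqlite3


def group_by_risk(rows, columns):
    """Return (#distinct combinations of `columns`) / (#rows) using SQL GROUP BY.

    Assumption: this ratio is only one of several possible risk measures;
    the README does not state which formula risk.py actually uses.
    """
    all_cols = list(rows[0].keys())
    conn = sqlite3.connect(":memory:")
    try:
        # Load the example records into a throwaway table.
        col_defs = ", ".join('"{}"'.format(c) for c in all_cols)
        conn.execute("CREATE TABLE sample ({})".format(col_defs))
        placeholders = ", ".join("?" for _ in all_cols)
        conn.executemany(
            "INSERT INTO sample VALUES ({})".format(placeholders),
            [tuple(r[c] for c in all_cols) for r in rows],
        )
        # Inner query: one row per group with its size.
        # Outer query: number of groups and total number of records.
        fields = ", ".join('"{}"'.format(c) for c in columns)
        sql = (
            "SELECT COUNT(*), SUM(n) FROM "
            "(SELECT COUNT(*) AS n FROM sample GROUP BY {}) AS g".format(fields)
        )
        group_count, row_count = conn.execute(sql).fetchone()
    finally:
        conn.close()
    return group_count / row_count


# Hypothetical records, made up for illustration only.
ROWS = [
    {"zip": "37203", "age": 34, "gender": "F"},
    {"zip": "37203", "age": 34, "gender": "F"},
    {"zip": "37212", "age": 51, "gender": "M"},
    {"zip": "37215", "age": 29, "gender": "F"},
]

print(group_by_risk(ROWS, ["zip", "age"]))  # 3 groups over 4 records
print(group_by_risk(ROWS, ["gender"]))      # 2 groups over 4 records
```

A value close to 1 means most records are unique on the selected fields. Step 2 (random selection of features) would amount to calling the function repeatedly with different random subsets of columns.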
			
 
				+## Python environment
			
 
				+
			
 
				+    The following are the dependencies needed to run the code:
			
 
				+
			
 
				+        pandas
			
 
				+        numpy
			
 
				+        pandas-gbq
			
 
				+        google-cloud-bigquery
			
 
				+
			
 
				+        
			
 
				+## Usage
			
 
				+
			
 
				+    * Generate the merged dataset
			
 
				+    
			
 
				+    python risk.py create --i_dataset <in dataset|schema> --o_dataset <out dataset|schema> --table <name> --path <bigquery-key-file>  --key <patient-id-field-name> [--file ]
			
 
				+
			
 
				+    * Compute risk
			
 
				+
			
 
				+    python risk.py compute --i_dataset <dataset> --table <name> --path <bigquery-key-file>  --key <patient-id-field-name> 
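
For readers who want to see how the two commands above fit together, here is a minimal sketch of an equivalent command-line parser. The flag names come from the usage lines; everything else is an assumption. In particular, the function name `build_parser` is hypothetical, the real `risk.py` may parse its arguments differently, and `--file` is treated here as a plain on/off switch because the usage line does not show a value for it.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented `create` and `compute` commands."""
    parser = argparse.ArgumentParser(prog="risk.py")
    actions = parser.add_subparsers(dest="action", required=True)

    # `create`: build the merged dataset from an input dataset/schema.
    create = actions.add_parser("create", help="generate the merged dataset")
    create.add_argument("--i_dataset", required=True, help="input dataset or schema")
    create.add_argument("--o_dataset", required=True, help="output dataset or schema")
    create.add_argument("--table", required=True, help="table name")
    create.add_argument("--path", required=True, help="BigQuery key file")
    create.add_argument("--key", required=True, help="patient id field name")
    # Assumption: the README shows "[--file ]" without a value, so it is
    # modelled as a boolean flag here.
    create.add_argument("--file", action="store_true")

    # `compute`: compute risk on an existing dataset.
    compute = actions.add_parser("compute", help="compute risk")
    compute.add_argument("--i_dataset", required=True, help="dataset to analyse")
    compute.add_argument("--table", required=True, help="table name")
    compute.add_argument("--path", required=True, help="BigQuery key file")
    compute.add_argument("--key", required=True, help="patient id field name")
    return parser


# Example with placeholder values.
args = build_parser().parse_args(
    ["compute", "--i_dataset", "raw", "--table", "person",
     "--path", "key.json", "--key", "person_id"]
)
print(args.action, args.i_dataset, args.table, args.key)
```

Using sub-commands keeps the options of each action separate, so `--o_dataset` is only accepted by `create`.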
			
 
				+## Limitations
			
 
				+    - It only works against BigQuery for now
			
 
				+    @TODO:    
			
 
				+        - Need to write a transport layer (database interface)
			
 
				+        - Support referential integrity, so that selecting one table is enough to derive the joined dataset
			
 
				+        - Add support for journalist risk