Run run_deposit.py to generate the metadata.csv and data.csv files.
Run prune_data.py to generate a data-pruned.csv file.
Run prune_metadata.py to generate a metadata-pruned.csv file.
Run tornadocash/run_exact_match_heuristic.py to generate exact_match_clusters_by_pool.json and exact_match_tx2addr_by_pool.json files.
Run tornadocash/run_gas_price_heuristic.py to generate gas_price_clusters_by_pool.json, gas_price_tx2addr_by_pool.json, gas_price_address_set_by_pool.json, and gas_price_metadata_by_pool.csv files.
Run tornadocash/run_same_num_txs_heuristic.py to generate same_num_txs_clusters_exact.json, same_num_txs_tx2addr_exact.json, same_num_txs_address_set_exact.json, and same_num_txs_metadata_exact.csv files.
Run heuristic_metadata.py to generate metadata-joined.csv.
Run run_nx.py to generate metadata-final.csv. This is the file that will be used to populate the PostgreSQL database.
Run combine_metadata.py to add clusters to metadata-pruned.csv.
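The steps above could be chained with a small driver script. This is a minimal sketch, not part of the repository: it assumes each script runs without arguments from the repository root, which the steps above do not confirm. The names PIPELINE, build_commands, and run_pipeline are hypothetical. By default it only prints the commands in order.

```python
import subprocess
import sys

# Scripts in the order listed above. Running them with no arguments
# is an assumption; check each script's own options first.
PIPELINE = [
    "run_deposit.py",
    "prune_data.py",
    "prune_metadata.py",
    "tornadocash/run_exact_match_heuristic.py",
    "tornadocash/run_gas_price_heuristic.py",
    "tornadocash/run_same_num_txs_heuristic.py",
    "heuristic_metadata.py",
    "run_nx.py",
    "combine_metadata.py",
]


def build_commands(scripts=PIPELINE, python=sys.executable):
    """Return one argv list per script, preserving the listed order."""
    return [[python, script] for script in scripts]


def run_pipeline(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_commands():
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_pipeline(dry_run=True)
```

Stopping at the first failure matters here because each step consumes files written by an earlier one.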
Order of Operations for downloading data
Run a command in bq_commands.sh.
Create a table in Google buckets.
Run table2bucket.py to move table to bucket.
Run dl_bucket.py to download data from the bucket. The download is split across many files.
Run sort_big_csv.py to merge the files and/or merge sort the transactions by block number. First run it with the --sort-only flag to sort each file locally. Then run it with --merge-only to perform the external merge sort across the sorted files.
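The two-phase approach in the last step (sort each file, then merge) is a standard external merge sort. The sketch below illustrates the idea only and is not the code in sort_big_csv.py. It assumes every CSV file has a header row and an integer column named block_number; that column name and the function names are hypothetical.

```python
import csv
import heapq

BLOCK_COLUMN = "block_number"  # assumed column name, not confirmed by the repo


def sort_file(path, key=BLOCK_COLUMN):
    """Phase 1: sort one CSV file in place (it must fit in memory)."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        rows = sorted(reader, key=lambda row: int(row[key]))
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


def _rows(path):
    """Lazily yield rows from one file so only one row per file is held."""
    with open(path, newline="") as f:
        yield from csv.DictReader(f)


def merge_sorted_files(paths, out_path, key=BLOCK_COLUMN):
    """Phase 2: k-way merge of files that are already sorted by `key`."""
    with open(paths[0], newline="") as f:
        fieldnames = csv.DictReader(f).fieldnames
    streams = [_rows(p) for p in paths]
    # heapq.merge streams the inputs and never loads a whole file.
    merged = heapq.merge(*streams, key=lambda row: int(row[key]))
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        for row in merged:
            writer.writerow(row)
```

Splitting the work this way keeps memory use bounded: the first phase needs only one file in memory at a time, and the second needs only one row per input file.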