cic-internal-integration/apps/data-seeding/README.md
2021-06-07 18:02:04 +03:00

DATA GENERATION TOOLS

This folder contains tools to generate and import test data.

OVERVIEW

Three sets of tools are available, organized into their respective subdirectories.

  • eth: Import using sovereign wallets.
  • cic_eth: Import using the cic_eth custodial engine.
  • cic_ussd: Import using the cic_ussd interface (backed by cic_eth)

Each of the modules includes two main scripts:

  • import_users.py: Registers all created accounts in the network
  • import_balance.py: Transfer an opening balance using an external keystore wallet

The balance script will sync with the blockchain, processing transactions and triggering actions when it finds relevant ones. In its current version it does not keep track of any other state, so it will run indefinitely and needs You the Human to decide when it has done what it needs to do.

In addition the following common tools are available:

  • create_import_users.py: User creation script
  • verify.py: Import verification script
  • cic_meta: Metadata imports

REQUIREMENTS

A virtual environment for the python scripts is recommended. We know it works with python 3.8.x. Let us know if you run it successfully with other minor versions.

python3 -m venv .venv
source .venv/bin/activate

Install all requirements from the requirements.txt file:

pip install --extra-index-url https://pip.grassrootseconomics.net:8433 -r requirements.txt

If you are importing metadata, also do ye olde:

npm install

HOW TO USE

Step 1 - Data creation

Before running any of the imports, the user data to import has to be generated and saved to disk.

The script does not need any services to run.

Vanilla version:

python create_import_users.py [--dir <datadir>] <number_of_users>

If you want to use an import_balance.py script to add to the user's balance from an external address, use:

python create_import_users.py --gift-threshold <max_units_to_send> [--dir <datadir>] <number_of_users>

Step 2 - Services

Unless you know what you are doing, start with a clean slate, and execute (in the repository root):

docker-compose down -v

Then go through, in sequence:

Base requirements

If you are importing using eth and not importing metadata, then the only service you need running in the cluster is:

  • eth

In all other cases you will also need:

  • postgres
  • redis

EVM provisions

This step is needed in all cases.

RUN_MASK=1 docker-compose up contract-migration

After this step is run, you can find top-level ethereum addresses (like the cic registry address, which you will need below) in <repository_root>/service-configs/.env
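If you would rather not copy addresses out of that file by hand, a small parser can read it. This is a minimal sketch, assuming the file uses plain KEY=VALUE lines. The function names are made up for illustration, and the exact variable name holding the cic registry address is not stated in this README, so the sketch simply returns every entry.

```python
from pathlib import Path


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def read_env_file(path):
    """Read an env file from disk and return its entries as a dict."""
    return parse_env(Path(path).read_text())


# Example (path taken from the paragraph above, relative to the repository root):
#   for key, value in read_env_file("service-configs/.env").items():
#       print(key, value)
```

Printing all entries and picking the registry address by eye avoids guessing at a key name that may differ between versions.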

Custodial provisions

This step is only needed if you are importing using cic_eth or cic_ussd

RUN_MASK=2 docker-compose up contract-migration

Custodial services

If importing using cic_eth or cic_ussd also run:

  • cic-eth-tasker
  • cic-eth-dispatcher
  • cic-eth-tracker
  • cic-eth-retrier

If importing using cic_ussd also run:

  • cic-user-tasker
  • cic-user-ussd-server
  • cic-notify-tasker

If metadata is to be imported, also run:

  • cic-meta-server

Step 3 - User imports

If you did not change the docker-compose setup, the eth_provider you need for the commands below will be http://localhost:63545.

Only run one of the alternatives.

The path to the keystore file used for transferring external opening balances is relative to the directory you found this README in. Of course you can use a different wallet, but then you will have to provide it with tokens yourself (hint: ../reset.sh)

All external balance transactions are saved in raw wire format in <datadir>/txs, with transaction hash as file name.
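To see how many balance transactions have been written so far, you can list that folder. A minimal sketch, assuming only what is stated above (one file per transaction in <datadir>/txs, named by transaction hash); the function name is made up for illustration.

```python
from pathlib import Path


def saved_tx_hashes(datadir):
    """Return the sorted file names (transaction hashes) found in <datadir>/txs."""
    tx_dir = Path(datadir) / "txs"
    if not tx_dir.is_dir():
        # Nothing has been written yet.
        return []
    return sorted(entry.name for entry in tx_dir.iterdir() if entry.is_file())


# Example:
#   hashes = saved_tx_hashes("out")
#   print(len(hashes), "transactions saved")
```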

If the contract migrations have been executed with the default "giftable" token contract, then the token_symbol in the import_balance scripts should be set to GFT.

Alternative 1 - Sovereign wallet import - eth

First, make a note of the block height before running anything.
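One way to get the block height is to ask the node directly with the standard eth_blockNumber JSON-RPC call. The following is a sketch using only the Python standard library; the function names are made up for illustration, and it assumes the provider accepts plain HTTP JSON-RPC requests at the eth_provider URL.

```python
import json
import urllib.request


def block_number_request():
    """Build the JSON-RPC payload for eth_blockNumber."""
    return {"jsonrpc": "2.0", "method": "eth_blockNumber", "params": [], "id": 1}


def parse_block_number(response):
    """Convert the hex quantity in a JSON-RPC response to an int."""
    return int(response["result"], 16)


def current_block_height(provider_url):
    """Ask the node at provider_url for its latest block number."""
    request = urllib.request.Request(
        provider_url,
        data=json.dumps(block_number_request()).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as reply:
        return parse_block_number(json.load(reply))


# Example, with the default docker-compose setup:
#   print(current_block_height("http://localhost:63545"))
```

The number printed is what you would pass as --offset to the balance script further down.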

To import, run to completion:

python eth/import_users.py -v -c config -p <eth_provider> -r <cic_registry_address> -y ../contract-migration/keystore/UTC--2021-01-08T17-18-44.521011372Z--eb3907ecad74a0013c259d5874ae7f22dcbcc95c <datadir>

After the script completes, keystore files for all generated accounts will be found in <datadir>/keystore, all with foo as password (would set it empty, but believe it or not some interfaces out there won't work unless you have one).

Then run:

python eth/import_balance.py -v -c config -r <cic_registry_address> -p <eth_provider> --token-symbol <token_symbol> --offset <block_height_at_start> -y ../keystore/UTC--2021-01-08T17-18-44.521011372Z--eb3907ecad74a0013c259d5874ae7f22dcbcc95c <datadir>

Alternative 2 - Custodial engine import - cic_eth

Run in sequence, in first terminal:

python cic_eth/import_balance.py -v -c config -p <eth_provider> -r <cic_registry_address> --token-symbol <token_symbol> -y ../keystore/UTC--2021-01-08T17-18-44.521011372Z--eb3907ecad74a0013c259d5874ae7f22dcbcc95c --head out

In another terminal:

python cic_eth/import_users.py -v -c config --redis-host-callback <redis_hostname_in_docker> out

The redis_hostname_in_docker value is the hostname required to reach the redis server from within the docker cluster, and should be redis if you left the docker-compose unchanged. The import_users script will receive the address of each newly created custodial account on a redis subscription fed by a callback task in the cic_eth account creation task chain.

Alternative 3 - USSD import - cic_ussd

If you have previously run the cic_ussd import incompletely, it could be a good idea to purge the queue. If you have left docker-compose unchanged, redis_url should be redis://localhost:63379.

celery -A cic_ussd.import_task purge -Q cic-import-ussd --broker redis://localhost:63379

Then, in sequence, run in first terminal:

python cic_eth/import_balance.py -v -c config -p <eth_provider> -r <cic_registry_address> --token-symbol <token_symbol> -y ../keystore/UTC--2021-01-08T17-18-44.521011372Z--eb3907ecad74a0013c259d5874ae7f22dcbcc95c out

In second terminal:

python cic_ussd/import_users.py -v -c config out

Step 4 - Metadata import (optional)

The metadata import scripts can be run at any time after step 1 has been completed.

Importing user metadata

To import the main user metadata structs, run:

node cic_meta/import_meta.js <datadir> <number_of_users>

The script monitors a folder for output from the import_users.py script, adding the metadata found to the cic-meta service.

If number of users is omitted the script will run until manually interrupted.

Importing phone pointer

node cic_meta/import_meta_phone.js <datadir> <number_of_users>

If you imported using cic_ussd, the phone pointer is already added and this script will do nothing.

Importing pins and ussd data (optional)

Once the user imports are complete the next step should be importing the user's pins and auxiliary ussd data. This can be done in 3 steps:

In one terminal run:

python create_import_pins.py -c config -v --userdir <path to the users export dir tree> pinsdir <path to pin export dir tree>

This script recursively walks through all the directories defining user data in the users export directory and generates a csv file containing phone numbers and password hashes. The hashes are generated using fernet in a manner reflecting the nature of said hashes in the old system. The csv file is stored in the pins export dir defined as the positional argument.

Once the creation of the pins file is complete, proceed to import the pins and ussd data as follows:

  • To import the pins:

python cic_ussd/import_pins.py -c config -v pinsdir <path to pin export dir tree>

  • To import ussd data:

python cic_ussd/import_ussd_data.py -c config -v userdir <path to the users export dir tree>

The balance script is a celery task worker, and will not exit by itself in its current version. However, after it's done doing its job, you will find "reached nonce ... exiting" among the last lines of the log.
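If you redirect the balance script's log to a file, you can check for that marker instead of watching the terminal. A minimal sketch: the text between "reached nonce" and "exiting" is not spelled out here, so the pattern accepts anything in between on the same line. The function name and the log file name in the example are made up for illustration.

```python
import re

# Loose pattern: whatever sits between "reached nonce" and "exiting"
# on the same line is accepted, since the exact wording is not known.
DONE_PATTERN = re.compile(r"reached nonce\b.*\bexiting")


def balance_import_done(log_lines):
    """Return True if any log line carries the completion marker."""
    return any(DONE_PATTERN.search(line) for line in log_lines)


# Example, with a hypothetical log file name:
#   with open("import_balance.log") as log:
#       if balance_import_done(log):
#           print("balance import has finished, safe to stop the worker")
```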

The connection parameters for the cic-ussd-server are currently hardcoded in the import_users.py script file.

Step 5 - Verify

python verify.py -v -c config -r <cic_registry_address> -p <eth_provider> <datadir>

Included checks:

  • Private key is in cic-eth keystore
  • Address is in accounts index
  • Address has gas balance
  • Address has triggered the token faucet
  • Address has token balance matching the gift threshold
  • Personal metadata can be retrieved and has exact match
  • Phone pointer metadata can be retrieved and matches address
  • USSD menu response is initial state after registration

Checks can be selectively included and excluded. See --help for details.

The script will output one line for each check, with name of check and number of errors found per check.

It should exit with code 0 if all input data is found in the respective services.
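If you want to run the verification from another script, the exit code is the thing to look at. Below is a minimal sketch with the standard subprocess module; the helper name is made up for illustration, and the arguments in the example are the placeholders from the command above.

```python
import subprocess
import sys


def run_check(command):
    """Run a command, echo its output, and report whether it exited with code 0."""
    completed = subprocess.run(command, capture_output=True, text=True)
    sys.stdout.write(completed.stdout)
    sys.stderr.write(completed.stderr)
    return completed.returncode == 0


# Example (fill in the registry address, provider and data directory yourself):
#   ok = run_check([sys.executable, "verify.py", "-v", "-c", "config",
#                   "-r", cic_registry_address, "-p", eth_provider, datadir])
#   print("all checks passed" if ok else "some checks failed")
```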

KNOWN ISSUES

  • If the faucet disbursement is set to a non-zero amount, the balances will be off. The verify script needs to be improved to check the faucet amount.

  • When the account callback in cic_eth fails, the cic_eth/import_users.py script will exit with a cryptic complaint concerning a None value.

  • Sovereign import scripts use the same keystore, and running them simultaneously will mess up the transaction nonce sequence. Better would be to use two different keystore wallets so balance and users scripts can be run simultaneously.

  • pycrypto and pycryptodome have to be installed in that order. If you get errors concerning Crypto.KDF then uninstall both and re-install in that order. Make sure you use the versions listed in requirements.txt. pycryptodome is a legacy dependency and will be removed as soon as possible.

  • Sovereign import script is very slow because it's scrypt'ing keystore files for the accounts that it creates. An improvement would be optional and/or asynchronous keyfile generation.

  • Running the balance script should be optional in all cases, but is currently required in the case of cic_ussd because it is needed to generate the metadata. An improvement would be moving the task to import_users.py, for a different queue than the balance tx handler.

  • The cic_ussd import is poorly implemented and consumes a lot of resources, so it takes a long time to complete. Reducing the number of polls for the phone pointer would go a long way to improve it.

  • A strict constraint is maintained insisting on the use of postgresql-12.