Easy-to-use dataset generator for applying machine learning on financial markets
- It runs fast and is easy to use.
- The project has no complex parts and no database, and it has only a few dependencies.
- It is easy to modify and customize.
- This project generates practical datasets for data scientists.
- You can read the code for educational purposes.

To generate a dataset:
- Clone the repository.
- Run `pip3 install -r requirements.txt`.
- Put your Nasdaq Data Link API key in the `API_KEY` file.
- Run `python3 main.py`.

This generates a training set and a test set for you.
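
Once the files exist, they can be read back with the standard `csv` module. The snippet below is a minimal sketch, not part of the project: the function names `parse_rows` and `load_rows` are hypothetical, and the path and delimiter you pass in should match whatever the project configuration uses. It makes no assumption about whether the files contain a header row and returns every row as a list of strings.

```python
import csv


def parse_rows(lines, delimiter=","):
    """Split an iterable of CSV lines into lists of string fields."""
    return list(csv.reader(lines, delimiter=delimiter))


def load_rows(csv_path, delimiter=","):
    """Read every row of a generated CSV file (hypothetical helper)."""
    # newline="" is what the csv module documentation recommends for files.
    with open(csv_path, newline="") as csv_file:
        return parse_rows(csv_file, delimiter=delimiter)
```

For example, `load_rows("train.csv")` would return the rows of a file named `train.csv`, where that file name is only a placeholder for the configured output path.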
For the configuration, you can:
- Change the constants in `config.py`.
- Define new indicators in `indicators.py`.
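
A new indicator can be as small as one function. The sketch below assumes an indicator takes a list of prices and returns a list of the same length. The signature that `indicators.py` actually expects may differ, so check the existing functions there first. The name `simple_moving_average` is hypothetical.

```python
def simple_moving_average(prices, length):
    """Return the simple moving average of `prices` over `length` values.

    Positions that do not yet have a full window are filled with None.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    result = []
    window_sum = 0.0
    for i, price in enumerate(prices):
        window_sum += price
        if i >= length:
            # Drop the value that just left the window.
            window_sum -= prices[i - length]
        result.append(window_sum / length if i >= length - 1 else None)
    return result
```

Calling `simple_moving_average([1, 2, 3, 4, 5], 3)` returns `[None, None, 2.0, 3.0, 4.0]`.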

The `config.py` constants are:
- `PAIR_NAMES_LIST_WITH_SOURCE`: What's your machine learning model input?
- `TARGET_PAIR_NAME_WITH_SOURCE`: What's your machine learning model output?
- `SMA_LENGTHS_LIST`: Do you want to generate a dataset with some moving averages?
- `APPLY_FLIP_AUGMENTATION` and `APPLY_NOISE_AUGMENTATION`: Using data augmentations
- `AUGMENTATION_NOISE_INTERVAL`: Set the amount of augmentation noise
- `TRAIN_DATASET_NEW_SIZE_COEFFICIENT`: How much augmented data do you want?
- `START_TIME` and `END_TIME`: The time interval for the dataset
- `FORECAST_DAYS`: How many days is your target?
- `USE_WMA_FOR_FORECAST_DAYS`: Do you want to use linear weighted moving average for your target?
- `NUMBER_OF_CANDLES`: Number of candles your machine learning model needs as its input
- `TRAIN_CSV_FILE_PATH`, `TEST_CSV_FILE_PATH`, and `PREDICT_CSV_FILE_PATH`: Output CSV file paths
- `TEST_SET_SIZE_RATIO`: Test set size to whole dataset size ratio
- `CSV_DELIMITER`: The delimiter in every generated CSV file
- `API_KEY_FILE_PATH`: Path to the Nasdaq Data Link API key file
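
To make the roles of `NUMBER_OF_CANDLES`, `FORECAST_DAYS`, `USE_WMA_FOR_FORECAST_DAYS`, and `TEST_SET_SIZE_RATIO` more concrete, here is a rough sketch of one plausible way such settings could shape a dataset. It is an illustration under assumptions, not the project's code: the function names are hypothetical, the target is assumed to be either the price `FORECAST_DAYS` candles after the input window or a linear weighted average of those future candles, and the split is assumed to be chronological. The project may define the target or the split differently.

```python
def linear_weighted_average(values):
    """Linear weighted average: weights 1, 2, ..., n, largest on the last value."""
    weights = range(1, len(values) + 1)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def build_samples(prices, number_of_candles, forecast_days, use_wma):
    """Slide a window over `prices` and pair each window with a target.

    Assumption: the target covers the `forecast_days` values that follow
    the input window.
    """
    samples = []
    last_start = len(prices) - number_of_candles - forecast_days
    for start in range(last_start + 1):
        window_end = start + number_of_candles
        inputs = prices[start:window_end]
        future = prices[window_end:window_end + forecast_days]
        target = linear_weighted_average(future) if use_wma else future[-1]
        samples.append((inputs, target))
    return samples


def split_train_test(samples, test_set_size_ratio):
    """Keep the most recent samples as the test set (assumed chronological split)."""
    test_size = int(len(samples) * test_set_size_ratio)
    split_at = len(samples) - test_size
    return samples[:split_at], samples[split_at:]
```

With eight prices, three candles per input, and a two-day forecast, `build_samples` yields four samples, and a ratio of `0.25` sends the last one to the test set.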