Hello, guys!
01_Stocks1 (Working one) 4.3.knwf (859.4 KB)
The ideal workflow should predict the High, Low, and Close prices of stocks, futures, or crypto. Based on this, it should determine the best entry point for a trade and output results including leverage, entry price, stop-loss, and take-profit.
I ran a test, and the predictions deviated from the real price by up to 10%.
I performed the check as follows: I downloaded a year of data, cut off the last month, added the next day (the one I planned to predict), and launched a loop. With each iteration, the loop shifted forward by one day, so I obtained daily predictions for a month. However, they did not match the actual values at all.
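To make the check concrete, here is a minimal Python sketch of the same day-by-day loop. The workflow itself is built from KNIME nodes, not this code; the function names, the placeholder "model" (tomorrow equals today), and the prices are all made up for illustration.

```python
def walk_forward(prices, n_test, predict):
    """Yield (day_index, prediction, actual) for the last n_test days.

    For each test day t, predict() only receives prices[:t], so the
    value being predicted is never visible to the model.
    """
    start = len(prices) - n_test
    for t in range(start, len(prices)):
        history = prices[:t]          # everything strictly before day t
        yield t, predict(history), prices[t]


def pct_deviation(pred, actual):
    """Absolute deviation from the actual price, in percent."""
    return abs(pred - actual) / actual * 100.0


def naive_last_value(history):
    # Placeholder model (an assumption for this sketch): repeat the last known price.
    return history[-1]


# Made-up prices; the last 3 days play the role of the held-out month.
prices = [100.0, 102.0, 101.0, 105.0, 107.0, 104.0]
results = list(walk_forward(prices, n_test=3, predict=naive_last_value))
deviations = [pct_deviation(p, a) for _, p, a in results]
```

Running the real models through the same kind of loop, next to a trivial placeholder like this one, gives a reference point for how large a 10% deviation actually is.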
If anyone has good ideas, that would be great. If we can put together a working process, we might even make some money from it!
How the process works
Initially, I took an existing workflow. The first part remained almost unchanged. All modifications were made in the Prediction Cycle.
I went through all the features to avoid data leakage. Now, all formulas use values with a lag.
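As an illustration of what "values with a lag" means here, a small Python sketch follows. It is not taken from the workflow; the helper names and the sample numbers are hypothetical. The point is that the feature for row i is built only from rows before i.

```python
def add_lagged(rows, column, lag=1):
    """Return copies of rows with an extra `<column>_lag<lag>` field.

    The first `lag` rows have no earlier value, so they get None.
    """
    out = []
    for i, row in enumerate(rows):
        new_row = dict(row)
        new_row[f"{column}_lag{lag}"] = rows[i - lag][column] if i >= lag else None
        out.append(new_row)
    return out


def rolling_mean_past(values, window):
    """Mean of the `window` values strictly before each position.

    values[i] itself is excluded; including it would leak the current
    row into its own feature.
    """
    result = []
    for i in range(len(values)):
        if i < window:
            result.append(None)
        else:
            past = values[i - window:i]
            result.append(sum(past) / window)
    return result


rows = [{"close": 10.0}, {"close": 11.0}, {"close": 12.5}]
lagged = add_lagged(rows, "close")
means = rolling_mean_past([1.0, 2.0, 3.0, 4.0], window=2)
```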
Next comes feature selection for prediction. I use a genetic algorithm because it’s faster and provides one of the best results.
Then the prediction cycles begin:
- For High and Close, I use three primary models: Linear Regression, Random Forest, and GBM.
- The predictions are then combined and passed through two meta-models: Linear Regression and GBM.
- After that, they are combined again to find the best mix of predictions that minimizes deviation from the actual price.
- For Low, the process is the same, except that XGBoost replaces Random Forest.
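The final "best mix" step can be sketched in Python roughly like this. This is only an illustration under simplifying assumptions: two prediction series instead of the full set of models, a plain grid search over one weight, and mean absolute error as the deviation measure. The function names and numbers are invented. The weight is chosen on one slice and scored on a later slice, so the reported error is out-of-sample.

```python
def mae(preds, actuals):
    """Mean absolute error between two equally long sequences."""
    return sum(abs(p - a) for p, a in zip(preds, actuals)) / len(actuals)


def blend(pred_a, pred_b, w):
    """Weighted mix: w * pred_a + (1 - w) * pred_b, element by element."""
    return [w * a + (1.0 - w) * b for a, b in zip(pred_a, pred_b)]


def best_weight(pred_a, pred_b, actuals, steps=100):
    """Grid-search w in [0, 1] that minimizes MAE on the given rows."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates, key=lambda w: mae(blend(pred_a, pred_b, w), actuals))


# Made-up numbers: model_a is always 1 too low, model_b always 1 too high.
actual = [10.0, 12.0, 11.0, 13.0]
model_a = [9.0, 11.0, 10.0, 12.0]
model_b = [11.0, 13.0, 12.0, 14.0]

# Choose the weight on the first two rows only ...
w = best_weight(model_a[:2], model_b[:2], actual[:2])
# ... and measure the error on the later rows the search never saw.
holdout_error = mae(blend(model_a[2:], model_b[2:], w), actual[2:])
```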
After that, all predictions are combined, and a loop selects the best coefficients for Stop-Loss and Take-Profit. This Metanode was created just to check functionality. It will need to be refined into a more suitable approach.
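For reference, the idea behind that coefficient loop looks roughly like the Python sketch below. It is a guess at the general shape, not the Metanode's actual logic: it assumes long trades only, one bar per trade, stop and target scaled from the predicted Low and High, and a pessimistic rule that the stop is hit first when both levels are touched. All names and numbers are hypothetical.

```python
from itertools import product


def long_trade_pnl(entry, stop, target, bar_low, bar_high, bar_close):
    """PnL of one long trade within a single bar.

    Assumption: if both levels are touched in the same bar, the stop
    is treated as hit first (the pessimistic case).
    """
    if bar_low <= stop:
        return stop - entry
    if bar_high >= target:
        return target - entry
    return bar_close - entry          # neither level hit: exit at the close


def levels(entry, pred_low, pred_high, k_sl, k_tp):
    """Scale the distance to the predicted Low/High by the coefficients."""
    stop = entry - k_sl * (entry - pred_low)
    target = entry + k_tp * (pred_high - entry)
    return stop, target


def best_coefficients(trades, grid):
    """Try every (k_sl, k_tp) pair from grid and keep the highest total PnL."""
    def total(k):
        k_sl, k_tp = k
        pnl = 0.0
        for t in trades:
            stop, target = levels(t["entry"], t["pred_low"], t["pred_high"], k_sl, k_tp)
            pnl += long_trade_pnl(t["entry"], stop, target, t["low"], t["high"], t["close"])
        return pnl
    return max(product(grid, grid), key=total)


# One made-up historical trade.
trades = [{"entry": 100.0, "pred_low": 96.0, "pred_high": 106.0,
           "low": 97.0, "high": 105.0, "close": 102.0}]
best = best_coefficients(trades, grid=[0.5, 1.0])
```

As with the blend weights, coefficients picked this way only mean something if they are then checked on rows that were not part of the search.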
The Problem
If I simply run the models on historical data, they perform perfectly: the deviation is minimal and the predictions are accurate.
But as soon as I test predictions for the next day, everything falls apart, and the forecasts become completely inaccurate.
I don’t understand where there might be data leakage or where this issue is coming from. But I want to fix it to get better predictions.
I am also considering switching from crypto to something more technical because there are too many unpredictable movements in crypto markets.
Important Notes
In the new model, I implemented time-based splitting with a loop, because I suspect that X-Partitioner leaks future values into the training data, which makes the task artificially easy for the model.
- The first iteration selects the first 100 rows (50 for training, 50 for validation).
- The next iteration selects the first 150 rows (100 for training, 50 for validation).
- This continues until all rows in the table have been processed.
This ensures that there is no data leakage, as the predictions only use past values and not future ones. However, the results still have not improved.
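In Python terms, the splitting scheme is roughly equivalent to the sketch below. The workflow does this with a KNIME loop, so the function name and parameters here are only illustrative. One simplification: leftover rows at the end that do not fill a whole validation block are skipped.

```python
def expanding_splits(n_rows, initial_train=50, val_size=50):
    """Yield (train_indices, validation_indices) with a growing training window.

    Training always starts at row 0 and ends right before the validation
    block, so every validation row lies after every training row.
    A final partial block (fewer than val_size rows) is not yielded.
    """
    train_end = initial_train
    while train_end + val_size <= n_rows:
        yield range(0, train_end), range(train_end, train_end + val_size)
        train_end += val_size


splits = list(expanding_splits(200))
# First split: rows 0-49 train, 50-99 validate (the first 100 rows).
# Second split: rows 0-99 train, 100-149 validate (the first 150 rows).
```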
If anyone has suggestions or ideas, let’s collaborate and try to build a working tool to help us generate profits!