Stock Prediction Workflow

Hello, guys!
01_Stocks1 (Working one) 4.3.knwf (859.4 KB)

The ideal workflow should predict the High, Low, and Close prices of stocks, futures, or crypto. Based on this, it should determine the best entry point for a trade and output results including leverage, entry price, stop-loss, and take-profit.

I ran a backtest, and it turned out that the deviation from the real price was up to 10%.
I performed the check as follows: I downloaded a year of data, cut off the last month, added the next day (the one I planned to predict), and launched a loop. With each iteration, the loop shifted forward by one day, so I obtained daily predictions for a whole month. However, they did not match the actual values at all!
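For anyone who wants to reproduce this evaluation scheme outside KNIME, here is a minimal Python sketch of the same walk-forward idea. Everything here is illustrative: it assumes a DataFrame `df` of daily data sorted by date, with already-lagged feature columns and a `close` target; the model choice is just a placeholder.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

def walk_forward_mape(df: pd.DataFrame, target: str = "close", horizon: int = 21) -> float:
    """Refit on all history before day t, predict day t, slide forward one day.

    Returns the mean absolute percentage deviation over the last `horizon` days
    (21 trading days is roughly one month).
    """
    features = [c for c in df.columns if c != target]
    errors = []
    for t in range(len(df) - horizon, len(df)):
        train = df.iloc[:t]                       # strictly past data only
        model = RandomForestRegressor(n_estimators=200, random_state=0)
        model.fit(train[features], train[target])
        pred = model.predict(df.iloc[[t]][features])[0]
        actual = df.iloc[t][target]
        errors.append(abs(pred - actual) / actual)
    return sum(errors) / len(errors)
```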

If anyone has good ideas, that would be great. If we can put together a working process, we might even make some money from it!

How the Process Works

Initially, I took an existing workflow. The first part remained almost unchanged. All modifications were made in the Prediction Cycle.
I went through all the features to rule out data leakage; all formulas now use lagged values.
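To make "values with a lag" concrete, here is a minimal pandas sketch (the column names are hypothetical, not the workflow's actual features; `df` is assumed to hold daily OHLC data sorted by date):

```python
import pandas as pd

# df: daily OHLC DataFrame sorted by date.
# A feature for day t may only use information known at the end of day t-1.
df["return_lag1"]   = df["close"].pct_change().shift(1)        # yesterday's return
df["sma20_lag1"]    = df["close"].rolling(20).mean().shift(1)  # 20-day SMA as of yesterday
df["hl_range_lag1"] = (df["high"] - df["low"]).shift(1)        # yesterday's high-low range

# Without .shift(1), an indicator built from today's close would leak
# the very value the model is supposed to predict.
```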

Next comes feature selection for prediction. I use a genetic algorithm because it is faster than an exhaustive search and gives some of the best results.
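KNIME's feature selection loop handles this natively, but for anyone curious about the mechanics, here is a compact GA sketch in Python. It assumes `X` is a NumPy feature matrix and `y` the target; population size, generations, mutation rate, and the linear fitness model are all illustrative, not the workflow's settings.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import TimeSeriesSplit, cross_val_score

def ga_select(X, y, pop_size=20, generations=15, mut_rate=0.05, seed=0):
    """Evolve boolean column masks; fitness = time-aware CV score of a simple model."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    pop = rng.random((pop_size, n)) < 0.5                   # random initial masks

    def fitness(mask):
        if not mask.any():                                  # empty mask is useless
            return -np.inf
        cv = TimeSeriesSplit(n_splits=3)                    # folds respect time order
        return cross_val_score(LinearRegression(), X[:, mask], y, cv=cv).mean()

    for _ in range(generations):
        scores = np.array([fitness(m) for m in pop])
        keep = pop[np.argsort(scores)[-(pop_size // 2):]]   # keep the fittest half
        children = []
        while len(keep) + len(children) < pop_size:
            a, b = keep[rng.integers(len(keep), size=2)]    # two random parents
            cut = int(rng.integers(1, n))
            child = np.concatenate([a[:cut], b[cut:]])      # one-point crossover
            child = child ^ (rng.random(n) < mut_rate)      # bit-flip mutation
            children.append(child)
        pop = np.vstack([keep] + children)

    scores = np.array([fitness(m) for m in pop])
    return pop[scores.argmax()]                             # best boolean mask found
```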

Then the prediction cycles begin (a simplified sketch of the stacking idea follows the list):

  • For High and Close, I use three primary models: Linear Regression, Random Forest, and GBM.
  • The predictions are then combined and passed through two meta-models: Linear Regression and GBM.
  • After that, they are combined again to find the best mix of predictions that minimizes deviation from the actual price.
  • For Low, the process is the same, except that XGBoost replaces Random Forest.
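
Here is the simplified shape of that stacking step as a Python sketch. Note it compresses the workflow's two meta-models down to one, and the holdout split for the base models is my own simplification; `X`, `y`, and `X_next` (a 2-D array holding tomorrow's feature row) are assumed NumPy inputs.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression

def stacked_next_day(X, y, X_next, n_base: int):
    """Level 1: fit base models on the oldest n_base rows, predict the rest
    out-of-sample.  Level 2: fit a meta-model on those out-of-sample predictions."""
    base = [LinearRegression(),
            RandomForestRegressor(n_estimators=300, random_state=0),
            GradientBoostingRegressor(random_state=0)]   # LR + RF + GBM, as for High/Close

    # Out-of-sample level-1 predictions (never predict rows a model was trained on)
    level1 = np.column_stack(
        [m.fit(X[:n_base], y[:n_base]).predict(X[n_base:]) for m in base])
    meta = LinearRegression().fit(level1, y[n_base:])

    # Refit base models on all history, then combine their forecasts for tomorrow
    next_level1 = np.column_stack([m.fit(X, y).predict(X_next) for m in base])
    return meta.predict(next_level1)
```

The crucial detail is that the meta-model is trained only on predictions the base models made out-of-sample; feeding it in-sample base predictions is itself a form of leakage.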

After that, all predictions are combined, and a loop searches for the best Stop-Loss and Take-Profit coefficients. This metanode was created just as a functionality check; it will need to be refined into a more suitable approach.
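Conceptually, that coefficient loop is a grid search over historical outcomes. A rough sketch of the idea follows; all names and ranges are hypothetical, and note that it ignores which of stop-loss or take-profit is hit first within a day, which a real backtest must handle.

```python
import numpy as np

def best_sl_tp(pred_low, pred_high, entry, day_low, day_high, day_close):
    """Grid-search stop-loss/take-profit multipliers over historical days (long trades).

    All arguments are same-length arrays, one element per historical day.
    """
    best_params, best_pnl = None, -np.inf
    for sl_k in np.arange(0.90, 1.00, 0.005):       # stop-loss vs. predicted Low
        for tp_k in np.arange(1.00, 1.10, 0.005):   # take-profit vs. predicted High
            sl, tp = pred_low * sl_k, pred_high * tp_k
            # Per day: take-profit hit, stop-loss hit, or exit at the close
            pnl = np.where(day_high >= tp, tp - entry,
                  np.where(day_low <= sl, sl - entry, day_close - entry)).sum()
            if pnl > best_pnl:
                best_params, best_pnl = (sl_k, tp_k), pnl
    return best_params, best_pnl
```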

The Problem

If I simply run historical data, the models perform perfectly. The deviation is minimal, the predictions are accurate—everything works well.
But as soon as I test predictions for the next day, everything falls apart, and the forecasts become completely inaccurate.

I don’t understand where there might be data leakage or where this issue is coming from. But I want to fix it to get better predictions.

I am also considering switching from crypto to something more technical because there are too many unpredictable movements in crypto markets.

Important Notes

For the new model, I built time-based splitting with a loop, because I suspect that the X-Partitioner node leaks future values to the model, making training artificially easy.

The loop works like this:

  • The first iteration selects the first 100 rows (50 for training, 50 for validation).
  • The next iteration selects the first 150 rows (100 for training, 50 for validation).
  • This continues until all rows in the table have been processed.

This ensures that there is no data leakage, as the predictions only use past values and not future ones. However, the results still have not improved.
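In code terms, the loop implements an expanding-window split. A minimal Python sketch of the same scheme (the window sizes match the description above; the generator itself is my own illustration):

```python
def expanding_window_splits(n_rows: int, first_train: int = 50, val_size: int = 50):
    """Yield (train_rows, validation_rows): 50/50, then 100/50, then 150/50, ..."""
    end = first_train + val_size
    while end <= n_rows:
        yield range(end - val_size), range(end - val_size, end)
        end += val_size

# Usage: every validation block lies strictly after its training block.
for train_idx, val_idx in expanding_window_splits(400):
    print(f"train rows 0-{train_idx.stop - 1}, "
          f"validate rows {val_idx.start}-{val_idx.stop - 1}")
```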


If anyone has suggestions or ideas, let’s collaborate and try to build a working tool to help us generate profits! :rocket:

@Artem_Hriharyan

I’m not sure whether I understand your challenge correctly, as I did not find your training data.

When you want to predict future values, the training data must include all parameters that may influence the outcome. Since I did not find the input data used to train your model, it is not clear to me how many such parameters are included.
In the first situation you describe (training on all the data and then predicting points from within that same data), the model already holds the correct answer, so of course it returns it.
In the second situation (training on only part of the data and predicting points outside of it), the quality of the answer depends on how well the model has “understood” the contribution of each parameter to the output. Here the model really has to predict, and it seems it does not have enough informative parameters to capture the correct behaviour.
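This effect is easy to demonstrate: a flexible model can score near-perfectly on data it has memorized even when there is nothing to learn. A tiny illustrative sketch on pure noise:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = rng.normal(size=300)                 # pure noise: no real signal at all

model = RandomForestRegressor(random_state=0).fit(X[:250], y[:250])
print(model.score(X[:250], y[:250]))     # in-sample R^2 looks impressive
print(model.score(X[250:], y[250:]))     # out-of-sample R^2 near zero or negative
```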

Maybe you could also share the training data.

Thanks for the interest!

01_Stocks1 (Working one) 4.3.knwf (857.9 KB)

I’ve added a table with all the starting data, so now all that’s needed is to open the “Prediction cycle” metanode and run it. I have about 200 columns of data, and the feature selection loop chooses the best ones.

Maybe you’ll have some ideas.

Sorry to be a nuisance.

I tried to load the workflow, but I encountered errors because we are not on the same KNIME version. I’m still on 5.4.0, as I don’t have the time to update so often.

I checked the workflow for the table you mentioned with “all” the starting data.
The only table I found is the list of stock symbols output by the Table Creator node. It contains just one row with one cell (Symbols=“ETH/USD”).

Maybe you can point me to the right place? Or simply upload the table with this data.