Hello,
I'm an absolute rookie writing my thesis on the AI-based prediction of air filter failure.
I am creating a batch of time series data with Simulink. Currently, each simulation run has 5 columns of variables, with the 5th being failure probability.
I want to train a model with the runs 1-40, and validate it with runs 41-50.
Each run is stored in a separate sheet of an .xlsx file.
Can someone point me in the right direction on how to handle the multiple time series to do that?
Edit:
The data is cleaned (no missing values etc.). I think I can figure out how to train a model; what I'm missing is how to handle the batch processing/extraction of multiple sheets.
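For reference, the sheet-stacking step can also be sketched outside KNIME in Python with pandas. This is only an illustration of the idea, not the KNIME workflow itself; the sheet names and column names below are made up, since the post does not state them. In practice the dictionary of sheets would come from `pd.read_excel("runs.xlsx", sheet_name=None)`, which loads every sheet of the file at once.

```python
import pandas as pd

def stack_runs(sheets: dict) -> pd.DataFrame:
    """Append per-run tables into one table, tagging each row with its run id."""
    frames = [df.assign(run=name) for name, df in sheets.items()]
    return pd.concat(frames, ignore_index=True)

# Two tiny in-memory runs stand in for the sheets of the .xlsx file.
runs = {
    "Run1": pd.DataFrame({"t": [0, 1], "failure_prob": [0.1, 0.4]}),
    "Run2": pd.DataFrame({"t": [0, 1, 2], "failure_prob": [0.0, 0.2, 0.9]}),
}
table = stack_runs(runs)
```

Keeping the run id as a column makes it easy to split the stacked table back into training (runs 1-40) and validation (runs 41-50) sets later.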
Hi,
thank you very much for the prompt help!
So if I understand correctly, your strategy would basically be to append the runs one after another into one data table each for training/validation?
The idea behind the post I linked is that you can choose which sheets to import from the .xlsx. That is what you needed, if I understood correctly, right? The loop appends the iteration number, which is a plus if you want to know which series of data you are manipulating.
To train/validate a model I would use the Partitioning node, but I am guessing that depends on your data and what you need. You have lots of examples in the forum and on the KNIME Hub if you want to train a model.
Yes, that's right.
The separate time series have different durations, because each time series/simulation stops at filter failure. The predicted column is the equivalent of the failure probability in %.
I think I will cap the column at a maximum of 5% and then normalize it to train an FFN. The model does not have to be super robust; it's just a proof of concept.
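The cap-and-normalize step described above could look roughly like this; the 5% cap comes from the post, while the function name and example values are only illustrative. Capping at 0.05 and then dividing by the cap maps the column onto [0, 1]:

```python
def clip_and_scale(probs, cap=0.05):
    """Cap failure probabilities at `cap`, then rescale to the range [0, 1]."""
    return [min(p, cap) / cap for p in probs]

clip_and_scale([0.0, 0.025, 0.2])  # -> [0.0, 0.5, 1.0]
```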
I think I will not use the Partitioning node, but rather create two different data sets from the runs I have with the loop you provided.
Best regards and thank you very much, you made my day a lot easier!