Hello,
I'm an absolute rookie writing my thesis on the AI-based prediction of air filter failure.
I am creating a batch of time series data with Simulink. Currently, each simulation run has 5 columns of variables, with the 5th being failure probability.
I want to train a model with the runs 1-40, and validate it with runs 41-50.
Each run is stored in a separate sheet of an .xlsx file.
Can someone point me in the right direction on how to handle the multiple time series to do that?
Edit:
The data is cleaned (no missing values etc.). I think I can figure out how to train a model; what I'm missing is how to handle the batch processing/extraction of multiple sheets.
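For reference, the sheet-stacking step can also be sketched outside KNIME in Python with pandas. This is only an illustration of the idea, not the KNIME workflow itself; the sheet names and column names below are made up, since the post does not state them. In practice the dictionary of sheets would come from `pd.read_excel("runs.xlsx", sheet_name=None)`, which loads every sheet of the file at once.

```python
import pandas as pd

def stack_runs(sheets: dict) -> pd.DataFrame:
    """Append per-run tables into one table, tagging each row with its run id."""
    frames = [df.assign(run=name) for name, df in sheets.items()]
    return pd.concat(frames, ignore_index=True)

# Two tiny in-memory runs stand in for the sheets of the .xlsx file.
runs = {
    "Run1": pd.DataFrame({"t": [0, 1], "failure_prob": [0.1, 0.4]}),
    "Run2": pd.DataFrame({"t": [0, 1, 2], "failure_prob": [0.0, 0.2, 0.9]}),
}
table = stack_runs(runs)
```

Keeping the run id as a column makes it easy to split the stacked table back into training (runs 1-40) and validation (runs 41-50) sets later.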
Hi,
thank you very much for the prompt help!
So if I understand correctly, your strategy would basically be to append the runs one after another into one data table each for training/validation?
The idea behind the post I linked is that you can choose which sheets to import from the .xlsx. That is what you needed, if I understood correctly, right? The loop appends the iteration number, which is a plus if you want to know which series of data you are manipulating.
To train/validate a model I would use the Partitioning node, but I am guessing that depends on your data and what you need. You have lots of examples in the forum and on the KNIME Hub if you want to train a model.
Yes, that's right.
The separate time series have different durations, because each time series/simulation stops at filter failure. The predicted column is the equivalent of the failure probability in %.
I think I will cap the column at a maximum of 5% and then normalize it to train an FFN. The model does not have to be super robust; it's just a proof of concept.
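The cap-and-normalize step described above could look roughly like this; the 5% cap comes from the post, while the function name and example values are only illustrative. Capping at 0.05 and then dividing by the cap maps the column onto [0, 1]:

```python
def clip_and_scale(probs, cap=0.05):
    """Cap failure probabilities at `cap`, then rescale to the range [0, 1]."""
    return [min(p, cap) / cap for p in probs]

clip_and_scale([0.0, 0.025, 0.2])  # -> [0.0, 0.5, 1.0]
```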
I think I will not use the Partitioning node, but rather create two different data sets from the runs I have with the loop you provided.
Best regards and thank you very much, you made my day a lot easier!