Best 3 training methods to test

Hello, I am completely new to KNIME. What would be the best 3 training methods to test on data like this? Thank you very much.

  1. unit number

  2. time, in cycles (EXPECTED VARIABLE)

  3. operational setting 1

  4. operational setting 2

  5. operational setting 3

  6. sensor measurement 1

  7. sensor measurement 2

  …

  26. sensor measurement 26

Hi @Onuran welcome to the forum. Have you checked the AutoML components? AutoML (Regression) – KNIME Community Hub


@Onuran in addition to what @iperez already suggested, I have an overview of auto-machine-learning models.

But maybe start with some simple regression techniques, like the Linear Regression Learner – KNIME Community Hub

If you want to test more, I have an updated collection of regression (numeric target) models with a comparison of their quality. That might give you an idea of where to look further:

Maybe you could provide us with sample data without spilling any secrets. You might want to remove the unit number from the training data, unless it carries additional characteristics that could help the model; otherwise it may lead to something like fingerprinting, or the model simply learning which units ran for what amount of time.
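If you prefer to try a quick baseline outside of KNIME, a minimal scikit-learn sketch of the idea above might look like this. The column names and the synthetic data are assumptions standing in for the real file; the point is simply to drop the unit number before fitting and check performance on held-out rows:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# synthetic stand-in for the real sensor file (column names are assumptions)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "unit_number": rng.integers(1, 10, 500),
    "op_setting_1": rng.normal(size=500),
    "sensor_1": rng.normal(size=500),
})
# target depends on the features plus a little noise
df["time_in_cycles"] = (3 * df["op_setting_1"]
                        - 2 * df["sensor_1"]
                        + rng.normal(scale=0.1, size=500))

# drop the unit number so the model cannot "fingerprint" individual units
X = df.drop(columns=["unit_number", "time_in_cycles"])
y = df["time_in_cycles"]

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LinearRegression().fit(X_train, y_train)
print(f"R^2 on held-out data: {r2_score(y_test, model.predict(X_test)):.3f}")
```

On real data the score will of course be far less tidy, but the same split-fit-score loop works for any of the regression learners mentioned above.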


@Onuran what you can do is import the data and start exploring. There are two useful Python packages, “pandas-profiling” and “sweetviz”, that you could use to start exploring.

The “times_in_cycles” column has a maximum of 367, which sounds like it might be days in a year but could be something different. The distribution is concentrated at the lower end, so only a few cycles reach larger numbers. There are various correlations between the measurements that may have to be explored, as well as what the operational settings are and how they relate to the times.
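A quick sketch of those checks in pandas, assuming a column named “times_in_cycles”; the synthetic right-skewed data here only stands in for the real file:

```python
import numpy as np
import pandas as pd

# synthetic stand-in: a gamma distribution piles up at small values
# with a long right tail, similar to what is described above
rng = np.random.default_rng(1)
df = pd.DataFrame(
    {"times_in_cycles": rng.gamma(shape=2.0, scale=40.0, size=1000).astype(int)}
)

print("max: ", df["times_in_cycles"].max())
print("skew:", df["times_in_cycles"].skew())  # positive skew = long right tail
```

A positive skew value confirms that most observations sit at the low end, with only a few large cycle counts.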

Some measurements take only a few distinct values, so they are treated as categorical.
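You can spot such columns yourself by counting distinct values per column; the threshold of 3 here is an arbitrary assumption and the column names are made up:

```python
import pandas as pd

df = pd.DataFrame({
    "sensor_a": [1.01, 2.33, 3.85, 4.10, 5.77],  # continuous-looking
    "op_mode": [0, 1, 0, 1, 0],                   # only two distinct values
})

# flag columns with very few distinct values as categorical-like
categorical_like = [c for c in df.columns if df[c].nunique() <= 3]
print(categorical_like)  # → ['op_mode']
```

Profiling tools apply the same kind of heuristic when they decide to report a column as categorical rather than numeric.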

The Pandas Profiler offers further insights, such as the observation that the sensor data (and settings) appear to be highly correlated (which might be expected if this data is indeed sensor data).
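The same high-correlation flags can be reproduced directly with `df.corr()`. A sketch, where the column names and the 0.9 cutoff are assumptions, and `sensor_2` is deliberately constructed as a near-copy of `sensor_1`:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
base = rng.normal(size=200)
df = pd.DataFrame({
    "sensor_1": base,
    "sensor_2": base + rng.normal(scale=0.05, size=200),  # near-copy of sensor_1
    "sensor_3": rng.normal(size=200),                      # independent
})

# absolute pairwise correlations, then list pairs above the cutoff
corr = df.corr().abs()
pairs = [(a, b, corr.loc[a, b])
         for i, a in enumerate(corr.columns)
         for b in corr.columns[i + 1:]
         if corr.loc[a, b] > 0.9]
print(pairs)
```

Highly correlated sensor pairs are worth knowing about before modelling, since they carry largely redundant information.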

The code in Python would look like this (the report file names are written into the workflow’s “/data/” folder):

import pandas as pd
import pyarrow.parquet as pq

import pandas_profiling
import sweetviz as sv

# the data path
v_parquet_path = "sensor_data.parquet"

# read the Parquet file into a pandas DataFrame
df = pq.read_table(v_parquet_path).to_pandas()

# generate a profiling report and save it as HTML
report = pandas_profiling.ProfileReport(df)
report.to_file("data/sensor_data_profile.html")

# use Sweetviz, with the expected variable as the target
report_sv = sv.analyze([df, "Sensor_Data_report"], target_feat="times_in_cycles")
report_sv.show_html("data/sweetviz_report.html", open_browser=False)

You will find the imported data here. In the “/data/” folder of the workflow you will find the reports.

If this is some sort of assignment, these remarks might help you on your way. You might want to clarify what the task is (is it the exploration?) and what the expected outcome is. Knowing what you can do and which methods to use may very much depend on your data.

