Process string transformation and math formula

Kenyx · December 22, 2021, 7:08am

Hi All,
I am new to KNIME and a university student from China. After working through the self-paced online courses, I started a data mining internship, but I have run into some problems.
Currently, I am working on tolerance correlation for multiple parameters. The tolerance columns are named Tolerance, Tolerance #1, Tolerance #2 and so on. Each value is a string like 2.2…4.2 (float number + delimiter … + float number). I then need to use a math formula to compute (practical result - lower bound and upper bound) / (lower bound and upper bound); later I may also try the standard deviation.
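
For readers following along, here is a minimal Python sketch of the string handling described above. It is not taken from the thread. It assumes the delimiter is either the ellipsis character or two to three plain dots, and it reads the formula as (value - bound) / bound, computed once for each bound, which is only one possible interpretation. The function names `parse_tolerance` and `relative_deviation` are hypothetical.

```python
import re

# Assumption: the delimiter is "…" or two/three dots, e.g. "2.2…4.2" or "2.2...4.2".
_TOLERANCE = re.compile(
    r"^\s*(-?\d+(?:\.\d+)?)\s*(?:…|\.{2,3})\s*(-?\d+(?:\.\d+)?)\s*$"
)


def parse_tolerance(text):
    """Split a tolerance string such as '2.2…4.2' into (lower, upper) floats."""
    match = _TOLERANCE.match(text)
    if match is None:
        raise ValueError(f"unrecognized tolerance: {text!r}")
    return float(match.group(1)), float(match.group(2))


def relative_deviation(value, lower, upper):
    """Return (value - bound) / bound for the lower and the upper bound.

    Assumed reading of the formula in the post; a bound of 0 would raise
    ZeroDivisionError and needs separate handling.
    """
    return (value - lower) / lower, (value - upper) / upper


lower, upper = parse_tolerance("2.2…4.2")
print(lower, upper)
print(relative_deviation(3.0, 2.0, 4.0))
```

In KNIME the same logic could sit in a Python Script node, or be rebuilt with a Cell Splitter followed by a Math Formula node.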

Problem:

As there are more than 30 tolerance parameters, is there an easy method (like loops or flow variables) to process each Tolerance column one by one?

I tried String Manipulation, Cell Splitter, Column Splitter and loops (Column List Loop, Group Loop, Chunk Loop), but I still cannot get it to work.

Is there also an easy method (like loops or flow variables) to apply each math formula?

Some links may help:

gonhaddock · December 22, 2021, 6:44pm

Hi @Kenyx
It is hard to imagine what the structure of your data looks like. Many questions come to mind:

Do you want to process one column at every iteration, or a group of columns together?
Are the names of the columns somehow nested and systematized?

It would be great if you could provide us with some data (not necessarily the real data) arranged in the same way as your source. An example of your desired output would be great as well.

I have some ideas, but with this limited information they will not necessarily meet your needs.

BR

Kenyx · December 23, 2021, 1:50am

A screenshot of the sample data structure is attached.
Processing one column at a time or a group of columns together are both acceptable.
The columns are systematized: each tolerance column is next to its parameter column.

mlauber71 · December 23, 2021, 4:02pm

@Kenyx I think it would help if you could provide us with a sample file and an explanation of what you want as a result. You can identify blocks in an Excel file and handle them (e.g. by giving the blocks IDs) as databases, like in this example:

But you will have to invest some thought in the structure and in which parts of your data would be dynamic.

But maybe your solution is much simpler and you just have to assign rules to a data table.

Kenyx · December 24, 2021, 1:27am

@mlauber71
Yes. I am continuing the work of the previous intern, who analyzed the dataset's original parameter results with a metanode consisting of 20 Rule Engine nodes; each one checks a parameter against its tolerance and labels it 'within' or 'out of range'. Now I want to optimize the workflow and focus the analysis more on how an outlier value in a preceding parameter affects the subsequent parameter.
I will try the workflow you recommended today. Thanks.

gonhaddock · December 24, 2021, 2:04pm

Hi @Kenyx
I am getting a clearer picture of your challenge. @mlauber71 is giving you some tips for gathering and cleaning up your data.
I'm sharing a workflow that I prepared for a different post last October. I think it can be useful for your challenge, as it covers some of the points you mention:

Iterate over groups of data
Individual math or statistical analysis (bounds and binning) by groups.
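
The two points above can be illustrated outside KNIME with a short Python sketch. This is not the shared workflow, only an assumed stand-in for the idea of iterating over groups and computing statistics per group. The input layout, pairs of (group, value), and the function name `stats_by_group` are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def stats_by_group(rows):
    """Group (group, value) pairs and return {group: (mean, stdev)}.

    The sample standard deviation needs at least two values, so groups
    with a single value report None instead.
    """
    groups = defaultdict(list)
    for group, value in rows:
        groups[group].append(value)
    return {
        name: (mean(values), stdev(values) if len(values) > 1 else None)
        for name, values in groups.items()
    }


# Made-up sample values, for illustration only.
rows = [("A", 1.0), ("A", 3.0), ("B", 5.0)]
for name, (avg, sd) in stats_by_group(rows).items():
    print(name, avg, sd)
```

In KNIME terms this roughly corresponds to a Group Loop (or a GroupBy node) with the calculation placed inside the loop body.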

BR

Kenyx · December 24, 2021, 3:35pm

Thank you @gonhaddock
Iteration and group processing will definitely be the best solution for a large number of parameters, but I don't fully understand flow variables or how to use and apply the different kinds of loops yet. I will try it out over the weekend.
Merry Christmas~
Best regards,
Ken

Geo · December 25, 2021, 10:19am

You should unpivot the data, then process your variable(s) in long format, and finally pivot to get back to wide format.
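
A minimal Python sketch of that unpivot, process, pivot sequence, under stated assumptions: each parameter column is followed by its tolerance column, the delimiter is the ellipsis character or two to three dots, and the processing step is the 'within' / 'out of range' check mentioned earlier in the thread. The column names, the sample values and all function names are made up for illustration.

```python
import re


def split_bounds(text):
    # Assumed delimiter: "…" or 2-3 dots between two numbers, as in "2.2…4.2".
    lower, upper = re.split(r"…|(?<=\d)\.{2,3}(?=\d)", text.strip())
    return float(lower), float(upper)


def unpivot(wide_rows, pairs):
    """Wide -> long: one record per (row, parameter)."""
    long_rows = []
    for row_id, row in enumerate(wide_rows):
        for param_col, tol_col in pairs:
            lower, upper = split_bounds(row[tol_col])
            long_rows.append({"row": row_id, "parameter": param_col,
                              "value": row[param_col],
                              "lower": lower, "upper": upper})
    return long_rows


def classify(record):
    """A single rule applied to every parameter, not one rule per column."""
    within = record["lower"] <= record["value"] <= record["upper"]
    return "within" if within else "out of range"


def pivot(long_rows, n_rows):
    """Long -> wide: one status column per parameter."""
    wide = [{} for _ in range(n_rows)]
    for rec in long_rows:
        wide[rec["row"]][rec["parameter"] + " status"] = classify(rec)
    return wide


# Hypothetical sample table; real column names will differ.
table = [
    {"Param A": 3.0, "Tolerance": "2.2…4.2", "Param B": 9.5, "Tolerance #1": "1.0…9.0"},
    {"Param A": 5.0, "Tolerance": "2.2…4.2", "Param B": 4.0, "Tolerance #1": "1.0…9.0"},
]
pairs = [("Param A", "Tolerance"), ("Param B", "Tolerance #1")]
result = pivot(unpivot(table, pairs), len(table))
print(result)
```

The point of the long format is that the rule is written once and applied to every parameter, so 30 tolerance columns need no more configuration than one. In KNIME the equivalent nodes would be Unpivoting, a single Rule Engine or Math Formula, and Pivoting.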

Kenyx · December 25, 2021, 12:14pm

@Geo
Agreed, I had the same idea at first: change the wide format into long format. But the flow variables, loops and iteration trapped me; I have no idea how to configure and set them up.
Otherwise, there will be one rule per parameter, which would be the same huge workload.
