How to improve interpolation performance?

Hi,
I asked a similar question some time ago, but the hints did not help me.

So I am now providing a sample workflow with ~300,000 data records:

  • there are a few dozen measid values (measurement IDs), let's say force, torque, voltage etc.
  • the relevant data is the “SampleValue”, e.g. 1 N (newton) or 2 V (volts)
  • and we have this data for several positions (“DimensionValue”) like 10 meters, 20 meters, …

I want to do a kind of interpolation for rows where I have the measurement ID and the DimensionValue but no SampleValue. The DimensionValues are mostly equidistant, so the interpolation of the “Missing Value” node is sufficient. Keep in mind that you cannot do an ungrouped interpolation: missing values can occur at the beginning or end of each signal, so an ungrouped interpolation would mix values of different measurement IDs.

The “interpolation” takes about 10 s on my notebook and on our KNIME server, and I would like a big improvement, by a factor of 5, using software improvements and no hardware improvements.

interpolation_optimisation_01.knwf (3.2 MB)

Do you have any ideas?

Hi
Loops are slow by default. For hardware limitations I would suggest a garbage collector node, but I guess this “destroys” your caching idea.
The best way would be to avoid the loops. Hopefully someone has an idea and can help.
br

@spider

I hope that I have understood your problem. The easiest way to interpolate values within a group is to use the column expression node with a simple script to linearly interpolate values as follows:

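// Keep the existing value when it is present.
if (!isMissing(column("SampleValue"))) {
    column("SampleValue");
} else {
    // Use the previous row if it belongs to the same measid, otherwise fall back to the next row.
    lower = column("measid") == column("measid", -1) ? column("SampleValue", -1) : column("SampleValue", +1);
    // Use the next row if it belongs to the same measid, otherwise fall back to the previous row.
    upper = column("measid") == column("measid", +1) ? column("SampleValue", +1) : column("SampleValue", -1);
    // Midpoint of the two neighbours (assumes neither neighbour is itself missing).
    (lower + upper) / 2;
}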
  • The script tests if SampleValue is missing. If it is not then it returns the sample value.
  • If the value is missing then it tests if the previous row was in the same measurement set. If so, it stores that value as the lower value; if not, it uses the next value. The +1, -1 values in the column() expression are indexers to access values in the next/previous row.
  • The process is repeated to test if the following measurement is in the same measurement set.
  • The interpolated value is just the average of the two values.

Note: In the Advanced Tab of the column expression node you will need to set multi-row access to one row before/after the current row.
Note 2: If you expect more than one missing value in a sequence you will need to adapt the script to bridge larger gaps.
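
For the larger gaps mentioned in Note 2, one possible approach is a single pass over each measid block that remembers the last known value and fills everything between two known values linearly along DimensionValue. The following is only a sketch in plain JavaScript, not a Column Expressions script: the row shape `{measid, dim, value}`, the use of `null` for a missing value, and the function names `interpolateByGroup` and `fillGroup` are all assumptions made for illustration. It also assumes the rows are sorted by measid and then by DimensionValue, with strictly increasing DimensionValue inside each block. Gaps at the start or end of a block are filled by copying the nearest known value, so nothing leaks across measurement IDs.

```javascript
// Sketch: group-aware linear interpolation that bridges gaps of any length.
// Assumed row shape (hypothetical): { measid, dim, value }, value === null means missing.
function interpolateByGroup(rows) {
  const out = rows.map(r => ({ ...r })); // work on a copy, leave the input untouched
  let start = 0;
  while (start < out.length) {
    // Find the end (exclusive) of the current measid block.
    let end = start;
    while (end < out.length && out[end].measid === out[start].measid) end++;
    fillGroup(out, start, end);
    start = end;
  }
  return out;
}

function fillGroup(rows, start, end) {
  let prev = -1; // index of the last known value seen in this block
  for (let i = start; i < end; i++) {
    if (rows[i].value === null) continue;
    if (prev === -1) {
      // Leading gap: copy the first known value backwards.
      for (let j = start; j < i; j++) rows[j].value = rows[i].value;
    } else {
      // Interior gap: weight by position along DimensionValue.
      const span = rows[i].dim - rows[prev].dim;
      for (let j = prev + 1; j < i; j++) {
        const t = (rows[j].dim - rows[prev].dim) / span;
        rows[j].value = rows[prev].value + t * (rows[i].value - rows[prev].value);
      }
    }
    prev = i;
  }
  // Trailing gap: copy the last known value forwards (blocks with no known value stay missing).
  if (prev !== -1) {
    for (let j = prev + 1; j < end; j++) rows[j].value = rows[prev].value;
  }
}

// Small made-up example: three consecutive gaps, a trailing gap and a leading gap.
const rows = [
  { measid: "force", dim: 10, value: 1 },
  { measid: "force", dim: 20, value: null },
  { measid: "force", dim: 30, value: null },
  { measid: "force", dim: 40, value: null },
  { measid: "force", dim: 50, value: 5 },
  { measid: "force", dim: 60, value: null },
  { measid: "voltage", dim: 10, value: null },
  { measid: "voltage", dim: 20, value: 2 },
];
const filled = interpolateByGroup(rows).map(r => r.value);
console.log(filled); // [1, 2, 3, 4, 5, 5, 2, 2]
```

Because each row is visited a constant number of times, the cost grows linearly with the row count and no loop over groups is needed. The logic would still have to be adapted before it could run inside a KNIME scripting node.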

Hope this helps
DiaAzul

Hi DiaAzul,
thanks for your proposal. It is really faster, by a factor of 5 or 6. But you are right that it only works if there are no consecutive missing values, and my data can have consecutive missing values. If that happens, a 0 is inserted into Samples and a wrong value is calculated ((0.5+0)/2).
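
To make the (0.5+0)/2 artifact concrete, here is a small plain-JavaScript sketch. It is not a Column Expressions script: `null` stands in for a missing value, and the helper names `naiveMidpoint` and `guardedMidpoint` are made up for illustration. In plain JavaScript arithmetic `null` is coerced to 0, which reproduces the effect described above; the guarded variant averages only the neighbours that are actually present.

```javascript
// Naive midpoint: a missing (null) neighbour silently counts as 0 in the sum.
function naiveMidpoint(lower, upper) {
  return (lower + upper) / 2;
}

// Guarded midpoint: average only the neighbours that are present.
function guardedMidpoint(lower, upper) {
  const known = [lower, upper].filter(v => v !== null);
  if (known.length === 0) return null; // nothing to interpolate from
  return known.reduce((a, b) => a + b, 0) / known.length;
}

console.log(naiveMidpoint(0.5, null));   // 0.25, the artifact
console.log(guardedMidpoint(0.5, null)); // 0.5, falls back to the one known neighbour
```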

I made a small variation of yours, but it only avoids the “0” in the Samples column and just copies the values in the end, so it is also not very useful (see below).

if (!isMissing(column("SampleValue"))) {
    column("SampleValue");
} else {
    lower = column("measid") == column("measid", -1) ? column("SampleValue", -1) : column("SampleValue", +1);
    if (isMissing(lower)) { lower = column("measid") == column("measid", -2) ? column("SampleValue", -2) : column("SampleValue", +2); }
    if (isMissing(lower)) { lower = column("measid") == column("measid", -3) ? column("SampleValue", -3) : column("SampleValue", +3); }

    upper = column("measid") == column("measid", +1) ? column("SampleValue", +1) : column("SampleValue", -1);
    if (isMissing(upper)) { upper = column("measid") == column("measid", +2) ? column("SampleValue", +2) : column("SampleValue", -2); }
    if (isMissing(upper)) { upper = column("measid") == column("measid", +3) ? column("SampleValue", +3) : column("SampleValue", -3); }
    (lower + upper) / 2;
    //average(lower,upper)
}

In the three tables (and the attached workflow) you can see it:

  1. On the left, my old/slow idea with the loops, using the interpolation of the Missing Value node, which produces a useful interpolation.
  2. DiaAzul’s idea, which generates artifacts and 0s
  3. My simple modification of DiaAzul’s idea

Do you have any other ideas?

interpolation_optimisation_3.knwf (3.1 MB)