Distance Matrix Calculation fails when all value pairs are missing between two rows

When using the Distance Matrix Calculation node in v4.1.1 with a distance measure that handles missing values by ignoring them, I still get an error when two rows share no column in which both have a value. The error returned is:

Execute failed: org.knime.distance.DistanceMeasurementException: Only cell pairs with missing values in row[...

Would it be possible to assign a maximum distance between these rows in such cases? Or return a NaN or some other value in the distance matrix that can be handled (replaced) by subsequent nodes?
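To make the first suggestion concrete, here is a minimal sketch in plain Java, not written against the KNIME API. It assumes the distance matrix is available as a `double[][]` in which NaN marks row pairs with no comparable columns. The class and method names are hypothetical.

```java
final class DistanceMatrixCleanup {

    /**
     * Returns a copy of the matrix in which every NaN entry is replaced
     * by the largest non-NaN distance found in the matrix.
     */
    static double[][] replaceNaNWithMax(double[][] matrix) {
        // Find the largest distance that is actually defined.
        double max = 0.0;
        for (double[] row : matrix) {
            for (double d : row) {
                if (!Double.isNaN(d)) {
                    max = Math.max(max, d);
                }
            }
        }
        // Copy the matrix and substitute the maximum for undefined entries.
        double[][] out = new double[matrix.length][];
        for (int i = 0; i < matrix.length; i++) {
            out[i] = matrix[i].clone();
            for (int j = 0; j < out[i].length; j++) {
                if (Double.isNaN(out[i][j])) {
                    out[i][j] = max;
                }
            }
        }
        return out;
    }
}
```

Whether the maximum observed distance is a sensible stand-in depends on the downstream use, so a fixed constant may be a better choice for some clustering setups.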

I realized I can just create my own custom distance function which can perform this behavior. Awesome extensibility! Trying the Java Distance node first.

Still, seems like something that can easily be plugged into the Numeric Distances node as it already has basic missing value handling configuration. Thanks!

Following up again: as soon as I dug into the Java Distance node, I discovered there is no way to use aggregation functions that process columns dynamically. Instead, the function must explicitly name the columns being operated on, which doesn’t work when the number of columns varies between runs. Seems I am not the first to be frustrated by this (see the thread "Java Distance node: generalization and iteration over all input columns"). There still seem to be two further options. First, a custom distance function as per https://www.knime.com/wiki/distance-measure-developers-guide (which I am still reading). Second, a custom node, perhaps extending the Numeric Distances node.

I modified the org.knime.distance.measure.numerical.lnorm.LNormDistance class such that instead of throwing the exception described above it returns Double.NaN. This seems to be working well with all downstream nodes so far, although NaN will propagate in scoring. I have everything I need to manipulate either the distance output or the downstream metrics, so this works for me.
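For anyone wanting the same behavior, the idea can be sketched in standalone Java. This is not the actual `LNormDistance` source and does not use the KNIME API. It assumes rows are `Double[]` arrays with `null` standing in for a missing cell, and the class and method names are made up for illustration.

```java
final class NanAwareLNorm {

    /**
     * L-norm distance computed only over columns where both rows have a value.
     * Returns NaN instead of throwing when no such column exists.
     */
    static double distance(Double[] a, Double[] b, double p) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("rows must have the same length");
        }
        double sum = 0.0;
        int used = 0; // number of columns where both values are present
        for (int i = 0; i < a.length; i++) {
            if (a[i] == null || b[i] == null) {
                continue; // ignore pairs with a missing value
            }
            sum += Math.pow(Math.abs(a[i] - b[i]), p);
            used++;
        }
        // No comparable columns: signal this with NaN rather than an exception.
        return used == 0 ? Double.NaN : Math.pow(sum, 1.0 / p);
    }

    public static void main(String[] args) {
        Double[] r1 = {1.0, null, 4.0};
        Double[] r2 = {4.0, 2.0, null};
        Double[] r3 = {null, 5.0, null};
        System.out.println(distance(r1, r2, 2.0)); // only column 0 is comparable
        System.out.println(distance(r1, r3, 2.0)); // no comparable column
    }
}
```

Downstream code then has to test for the marker with `Double.isNaN`, since NaN never compares equal to anything, including itself.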

Hi there @bfrutchey,

glad you made it work in your case. Regarding dynamic processing in the Java Distance node: I wonder whether you could build your expression in a flow variable and then simply use that variable in the node.

Br,
Ivan

Ivan, good idea. I could probably build the function needed for the Java Distance node in a flow variable. I already have the custom distance extension working, though, and it was pretty trivial since I just copied your code, so I will think about this next time. Thanks!
