Distance between two columns

Hi all,

I'm totally new to KNIME, please excuse me if the question is too silly but I've been unable to come up with a solution for this...

I have a table with two columns, each one containing n-dimensional vectors expressed as a collection of doubles. Is there any node capable of inserting a third column into the table, containing the Euclidean distance between the two vectors in each row?

Best regards, and thank you for your help,

--

Jorge

I think your best bet is to use a Java Snippet.  The collections will be treated as arrays:

double[] vec0 = ; // Insert your first column reference here
double[] vec1 = ; // Insert your second column reference here

double dist = 0;
// Loop over every dimension (assumes both vectors have the same length)
for (int i = 0; i < vec0.length; i++) {
    double delta = vec0[i] - vec1[i];
    dist += delta * delta;
}

// Add a result column via the dialog, then put the expression after '=' on the line it generates
c_distance = Math.sqrt(dist);

You can insert the references to your collection columns by putting the cursor immediately before the ';' on the top two lines and double-clicking the column name in the list of columns at the top left of the dialog.

To add a result column, go to the bottom panel of the dialog; this should add a line like 'c_distance =' (depending on what you called the column!). Then complete that line as shown.
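
If you want to check the arithmetic outside KNIME first, here is a minimal standalone sketch of the same loop. The class name EuclideanDistance and the method name distance are made up for illustration and are not part of KNIME; the sketch also assumes both vectors have the same length and throws an exception otherwise.

```java
public class EuclideanDistance {

    /** Euclidean distance between two vectors of equal length. */
    public static double distance(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same length");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double delta = a[i] - b[i];
            sum += delta * delta; // accumulate squared differences
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        double[] vec0 = {1.0, 2.0, 3.0};
        double[] vec1 = {4.0, 6.0, 3.0};
        // Differences are 3, 4 and 0, so the distance is sqrt(9 + 16 + 0)
        System.out.println(distance(vec0, vec1)); // prints 5.0
    }
}
```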

Steve

Excellent, got it, many thanks for your time Steve!

Regards,

--

Jorge

I have a related question. I want to calculate the Levenshtein Distance between pairs of strings in two columns. That is, a separate calculation for the pair of values in each row. I can achieve this with the Distance Matrix followed by a messy workaround, but I really want to calculate just the distances for the specific pairs of strings and append the distance in a third column.

I gather that there is no node to do this, and that I probably have to use a Java snippet. Problem is, I don't know how to code in Java. So I'd be very grateful for some guidance about what to enter as the snippet (assuming, of course, that this is the right solution).

By the way, I'm aware of the String Matcher node, but it doesn't do what I want to do.
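
For anyone who does want to go the Java route, a minimal sketch of the standard dynamic-programming Levenshtein distance might look like the following. The class name LevenshteinDistance and the method name distance are hypothetical, chosen only for this example. How the code is wired into a Java Snippet node (for instance, assigning s and t from the two string columns and writing the result to an integer output column) is an assumption and will depend on your column names.

```java
public class LevenshteinDistance {

    /** Number of single-character insertions, deletions or substitutions needed to turn s into t. */
    public static int distance(String s, String t) {
        int[] prev = new int[t.length() + 1]; // distances for the previous row of the table
        int[] curr = new int[t.length() + 1]; // distances for the row being filled in

        // Turning an empty prefix of s into the first j characters of t takes j insertions
        for (int j = 0; j <= t.length(); j++) {
            prev[j] = j;
        }

        for (int i = 1; i <= s.length(); i++) {
            curr[0] = i; // deleting the first i characters of s
            for (int j = 1; j <= t.length(); j++) {
                int cost = (s.charAt(i - 1) == t.charAt(j - 1)) ? 0 : 1;
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            // Reuse the two arrays instead of allocating a new row each time
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[t.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("kitten", "sitting")); // prints 3
    }
}
```

This keeps only two rows of the table in memory, which should be enough when just the distance for each row's pair of strings is needed.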

I've built the "ColumnDistance" node, available in the Palladian extension, for exactly this case: calculating distances between two columns without having to create an entire matrix. As the handling is probably not totally intuitive, here's a simple example of how to use the node:

Legend! Thanks!! This does exactly what I need. Just wish I'd known about it sooner!

I’ve found this approach of using a Java snippet to be very slow and am looking for an alternative. I suspect creating a distance matrix would be faster. Will report back if it works out.

Never mind. Distance matrix doesn’t support collection columns (vectors), so this ends up being slow as well, because splitting the collection column takes forever.

The Vernalis Distance (n-D) node will do this - see

Steve