Custom distance function: how can I make a distance matrix column?

Hello,

I am currently trying to develop a node that calculates a distance between columns (not rows), and I'd like to output a new table with the column names and a distance matrix column (similar to the Distance Matrix Calculate node's output table).

Is there a way to do so? I wasn't able to find the proper column type in the sources (only StringCell, IntCell, ListCell, etc.)...
Thanks in advance!

EDIT: someone posted a similar question here in 2013 but did not receive an answer :/

Best regards,

Lorie

As you already mentioned, KNIME's distance matrices store distances between rows, not columns, so you cannot use the distance matrix column type. I also don't think it is necessary in your case, because the distance between two columns is a single value that can simply be stored in an additional numeric column. Why do you want a distance matrix?

Thank you for your answer!

I'd like to use a distance matrix to store the distance between a given column and all the other columns, in order to test various clustering nodes that take such a matrix as input, and to calculate silhouette coefficients (with another custom node of mine).

My columns represent time series data, and the distance I use is based on a Dynamic Time Warping algorithm. I could just transpose the data table, but I find that less robust and harder to use when the time series vary in length: the columns of the transposed table would make no sense because they would not be automatically aligned (or would they?)
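
For readers unfamiliar with the technique, here is a minimal sketch of a Dynamic Time Warping distance between two series of different lengths. It is not the poster's implementation: the class and method names are hypothetical, and the local cost |x - y| with an unconstrained warping window is an assumption.

```java
import java.util.Arrays;

/** Minimal DTW sketch; class and method names are hypothetical. */
final class DtwSketch {

    /** DTW distance using |x - y| as local cost (an assumption, not the poster's measure). */
    static double distance(double[] a, double[] b) {
        if (a.length == 0 || b.length == 0) {
            throw new IllegalArgumentException("series must not be empty");
        }
        // Two rolling rows of the cumulative cost table; index 0 is the empty prefix.
        double[] prev = new double[b.length + 1];
        double[] curr = new double[b.length + 1];
        Arrays.fill(prev, Double.POSITIVE_INFINITY);
        prev[0] = 0.0; // empty prefix aligned with empty prefix costs nothing
        for (int i = 1; i <= a.length; i++) {
            curr[0] = Double.POSITIVE_INFINITY;
            for (int j = 1; j <= b.length; j++) {
                double cost = Math.abs(a[i - 1] - b[j - 1]);
                // Cheapest predecessor: diagonal match, or a repeat in either series.
                double best = Math.min(prev[j - 1], Math.min(prev[j], curr[j - 1]));
                curr[j] = cost + best;
            }
            double[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length];
    }

    public static void main(String[] args) {
        double[] shortSeries = {1, 2, 3};
        double[] longSeries = {1, 2, 2, 3};
        System.out.println(distance(shortSeries, longSeries));
    }
}
```

Because the warping path may repeat elements of either series, the two inputs do not need to have the same length, which is the property that matters for columns of unequal length.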

For readability, I'd prefer to output something like this:
Column Name | Distance Matrix
Series1            | 0 [0.0]
Series2            | 1 [12.3, 0.0]
Series3            | 3 [16.5, 5.7, 0.0]
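
A minimal sketch of how rows in this layout could be computed, assuming each original column is available as a double[] and that row i lists the distances to rows 0..i-1 followed by 0.0 for the row itself. The class name, the method name, and the placeholder distance used in main are hypothetical; a real node would plug in its own measure and wrap each row in the appropriate KNIME cell.

```java
import java.util.Arrays;
import java.util.function.ToDoubleBiFunction;

/** Builds lower-triangular distance rows; names are hypothetical. */
final class LowerTriangleSketch {

    /** Row i holds dist(series[i], series[j]) for j < i, then 0.0 on the diagonal. */
    static double[][] lowerTriangle(double[][] series,
                                    ToDoubleBiFunction<double[], double[]> dist) {
        double[][] rows = new double[series.length][];
        for (int i = 0; i < series.length; i++) {
            rows[i] = new double[i + 1];
            for (int j = 0; j < i; j++) {
                rows[i][j] = dist.applyAsDouble(series[i], series[j]);
            }
            rows[i][i] = 0.0; // distance of a series to itself
        }
        return rows;
    }

    public static void main(String[] args) {
        double[][] columns = {{0}, {3}, {7}};
        // Placeholder distance: absolute difference of the first values only.
        double[][] rows = lowerTriangle(columns, (x, y) -> Math.abs(x[0] - y[0]));
        for (double[] row : rows) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```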

For now, I have only managed to produce this:
Column Name | Custom Matrix
Series1            | [0]
Series2            | [12, 0]
Series3            | [16, 5, 0]
with column specs saying "Collection (Collection of: Number (double precision))"

Do you think it's feasible?

Thanks again,

Lorie

 

As I said, the distance matrix type is for comparing rows and not columns. It will be of no use to any distance matrix nodes if you store column distances in it, because all nodes assume the values to be row distances. However, this will only work well if you don't have too many rows in the original table (<<10000).

What you did is store the column distances as collections of doubles. The distance matrix type is essentially the same; however, there is additional meaning attached to the type (see above). If you write additional nodes that process your collections and assume that they contain column distances, then you can proceed this way.

I understand that I cannot output both the original data and the distance matrix column, and this is why I'd like to output only the column names/headers (as a single column, one "column name" per row) and the distance matrix. (I can output the original data through another port.) This way, the values are presented as row distances.

(I'm very sorry if this was not clear enough in my last post, as I also talked about the length of the time series...)

Would I be able to do that?
If not, I still have a few things I can try (just working on transposed tables and hoping for the best, or writing the distance matrix to a file and then reading it with a Distance Reader node...).

Thanks a lot for your time!

Lorie

 

 

Now I seem to get it. Basically you create a completely new table where each row corresponds to one column from the original table and the distance matrix contains the distances between the rows/original columns. Yes, this sounds reasonable. So what was your problem again? The column must be of type DistanceVectorDataCellFactory.TYPE and you should use the static methods in that class for creating the cells. This should do it.

Hello again!

That's exactly it! Unfortunately, I can't seem to find this class in the sources or in the online documentation. I have also tried browsing the svn repository.

Am I missing a special distance package? I can't find "org.knime.distmatrix" either...
(I'm using the free SDK knime 3.2 on Linux.)

Many thanks,

Lorie

(attachment: a temporary workaround :))

You need to add the "KNIME Distance Matrix" extension to your target platform and then add a dependency to org.knime.distmatrix.

Hello again!

I think that's what I did: I installed the extensions "KNIME Distance Matrix" and "KNIME Distance Matrix sources" via the "Install New Software" menu. Then I added the corresponding .jar to my project's build path. Now the "import org.knime.distmatrix" works, but unfortunately the class "DistanceVectorDataCellFactory" is still not recognized.

Am I missing something important?

Sorry for all the trouble, and thanks again,

Lorie

You don't need to (and actually must not) add any Jars to the build path. You must define a dependency to org.knime.distmatrix via the MANIFEST.MF.
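
As a sketch, the dependency would appear in the plug-in's META-INF/MANIFEST.MF roughly as below. Only org.knime.distmatrix comes from this thread; the other two bundle names are assumptions about what a typical node plug-in already requires. In Eclipse, the same entry can be added through the Dependencies tab of the manifest editor.

```
Require-Bundle: org.knime.core,
 org.knime.base,
 org.knime.distmatrix
```

Continuation lines in a manifest start with a single space, so the indentation above is significant.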

Thanks a lot! It works fine now :)

Cheers

Hello,
I am writing a node that creates a new distance matrix column, but I have a problem when adding the rows to the output table. I get the error "The constructor DistanceVectorDataCell(int, boolean, double, int) is not visible", even though that is the signature the constructor has. Could someone help me with this?
Regards,
Laura

You need to use the cell factory class DistanceVectorDataCellFactory - it will either have some sort of static method for creating the DataCell, or a constructor and then an instance method for creating the cell - or possibly both!

Steve

Hi Steve,
I use the cell factory class DistanceVectorDataCellFactory when I create the new output column, like this: allColSpecs[num-1] = new DataColumnSpecCreator("Distance Matrix", DistanceVectorDataCellFactory.TYPE).createSpec();
But when I want to put the output data into that new column, DistanceVectorDataCellFactory is not recognized; only DistanceVectorDataCell with its parameters is, and I am told that I cannot use it because it is not visible. When I look at the .class, the constructor is private. So how can I use the DistanceVectorDataCell(int, boolean, double, int) constructor? I did not quite understand your explanation. Thank you.
Regards,
Laura

OK, DistanceVectorDataCellFactory must be available for you to get access to DistanceVectorDataCellFactory.TYPE to make your output. The code you need is:

//This is an example - you need your distances in a double[] (not Double[]!)
double[] dists=new double[]{3.2, 0.73, 8.7}; 
DataCell distCell = DoubleVectorCellFactory.createCell(dists);
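
If the distances are collected in a List<Double> or a Double[] first, they have to be unboxed before being handed to a factory method that expects double[]. A minimal sketch in plain Java (the class and method names are hypothetical and not part of KNIME):

```java
import java.util.Arrays;
import java.util.List;

/** Unboxing helper; class and method names are hypothetical. */
final class UnboxDistances {

    /** Converts boxed distances to the primitive array form. */
    static double[] toPrimitive(List<Double> boxed) {
        return boxed.stream().mapToDouble(Double::doubleValue).toArray();
    }

    public static void main(String[] args) {
        List<Double> collected = Arrays.asList(3.2, 0.73, 8.7);
        double[] dists = toPrimitive(collected);
        System.out.println(Arrays.toString(dists));
    }
}
```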

Hope that helps?

Steve

Hi Steve,
Thanks for your answer. It partly helps, because I no longer get any error in the code, but when I execute the node, this is what comes up: Execute failed: Runtime class of object "3.2","0.73","8.7" (index 7) in row "Row 0" is Double Vector (Collection of: Number (double)) and does not comply with its supposed superclass Distance vector (Collection of: Number (double)). This happens because I create the column as DistanceVectorDataCellFactory.TYPE. If I create it as DoubleVectorCellFactory.TYPE instead, it works correctly, but I need the other type. Could you help me with this?
Regards,
Laura

Sorry @Laura_Marzo - my mistake - that should have been either:

DataCell distCell = DistanceVectorDataCellFactory.createCell(dists, signature);

or:

DataCell distCell = DistanceVectorDataCellFactory.createCell(identifier, dists, signature);

It is not clear what identifier and signature represent (both are ints), as there doesn't appear to be an accessible Javadoc for this. I assume one of them is the index you see in the data cell view in the KNIME table. I'm afraid you will probably have to do some experimenting to see what happens!

Steve

Hello Steve,
Thanks for your answer. It partly works: the new column is the one I wanted and looks the same as the one created by the Distance Matrix Calculate node. But when I connect my node to another KNIME node (MDS DistMatrix, for example), it tells me that my table does not contain a distance matrix column. When I save and close the workflow and reopen it, the column is recognized. Why could this be?
Regards,
Laura

Hello,
I am creating a node that outputs the distance matrix for a geometry column, and I have been reading this thread. I saw this comment from 2016: "You don't need to (and actually must not) add any Jars to the build path. You must define a dependency to org.knime.distmatrix via the MANIFEST.MF."
I added the corresponding .jar (org.knime.distmatrix) to my project's build path but did not declare any dependency in the MANIFEST.MF. The node runs, but I cannot save the workflow because it shows me this error (see attached), and the other nodes do not recognize the matrix column even though I created it as DistanceVectorDataCellFactory.TYPE. Could this be related? And if so, how do I declare the dependency in the MANIFEST.MF?
Regards,
Laura

attached: error

Hi Steve,
I have a problem saving the workflow: it tells me that DistanceVectorDataCell cannot be cast to DistanceVectorDataCell, but I don't know where this cast is performed, because the node runs fine and the problem only occurs when saving. Could you help me?
Regards,
Laura