I have an iterative workflow where a distance matrix is calculated using the Distance Matrix Calculate node. At each new iteration there are only a few new data points, because most of them come from earlier iterations. Is there a way to avoid recalculating all the distances between the old points at each iteration?
I can save the distance matrix, but then I don’t know how to extend it with the new data points, calculating only the distances between the new data points and the old ones (and of course between the new data points themselves). I am thinking of a sort of “distance matrix extension/concatenation”. Is this possible?
Thanks in advance for any hint.
Would the “Similarity Search Node” be a solution for you? It has two inputs, so at every iteration you can choose which existing rows to compare against which new rows, and then concatenate the results to the already calculated matrix. The only inconvenience is that the output is not in a condensed format, as it is for the “Distance Matrix Calculate Node”. Please let me know if you would like me to post an example.
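Outside KNIME, the idea of comparing only the new rows against the existing ones can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `euclidean` and `cross_distances`, the row IDs, and the choice of Euclidean distance are hypothetical and do not describe how the Similarity Search node works internally.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equally long coordinate tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cross_distances(new_rows, old_rows, dist=euclidean):
    """Distances between every new row and every old row.

    new_rows and old_rows map a row ID to its coordinates.
    Returns a list of (new_id, old_id, distance) tuples, i.e. a long
    "pair" table rather than a condensed matrix. No old-vs-old
    distance is computed here.
    """
    return [
        (new_id, old_id, dist(new_point, old_point))
        for new_id, new_point in new_rows.items()
        for old_id, old_point in old_rows.items()
    ]


# Hypothetical data: two rows from earlier iterations, one new row.
old = {"Row0": (0, 0), "Row1": (3, 4)}
new = {"Row2": (0, 4)}
pairs = cross_distances(new, old)
```

The resulting pair table can be appended to the pairs stored from earlier iterations, which is the concatenation step described above.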
@aworker, first of all thanks for your help! Yes, if you could post an example workflow, that would be very helpful.
My objective would be, at each iteration, to calculate only the distances between the new data points and the old ones (and of course between the new data points themselves).
Finally, keeping the distance matrix in the “condensed format” (KNIME distance vector data type) would be optimal.
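As a rough sketch of what such an extension could look like, the Python below assumes a condensed layout in which row i stores only its distances to rows 0..i-1. That layout is an assumption about how the distance vector column is organised, not something confirmed in this thread, and `extend_condensed` is a hypothetical helper, not a KNIME function.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equally long coordinate tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def extend_condensed(condensed, points, new_points, dist=euclidean):
    """Append new points to a condensed (lower-triangular) distance matrix.

    condensed[i] is assumed to hold the distances from point i to
    points 0..i-1. Existing rows are copied unchanged; only new-vs-old
    and new-vs-new distances are computed.
    """
    points = list(points)
    condensed = [list(row) for row in condensed]
    for p in new_points:
        # Distances to every point seen so far, including new points
        # that were appended earlier in this same call.
        condensed.append([dist(p, q) for q in points])
        points.append(p)
    return condensed, points


# Hypothetical data: two old points, then a batch of two new points.
old_points = [(0, 0), (3, 4)]
old_condensed = [[], [5.0]]
condensed, points = extend_condensed(old_condensed, old_points, [(0, 4), (3, 0)])
```

The batch passed in can have a different size at every iteration, since each new point simply adds one more row.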
Thanks for your kind words! It took a bit longer than I thought to put together a self-contained example, but here it is. I’m using a recursive loop that adds a new row, calculates its distances against all the former rows, and concatenates them to the former distance matrix. Several rows could be added at the same time, and not necessarily the same number every time. Most probably you will need to adapt the solution to your needs. Let me know if you have any questions about it. Hope you like it.
20200428_Pikairos Is it possible to extend a distance matrix.knwf (143.8 KB)
Hi Ael (@aworker),
Thank you very much for your example. I appreciate it. I think your approach of using Similarity Search inside a recursive loop is good and would avoid recalculating the distances for the old points. Nevertheless, I need the data in the “Distance matrix” format (KNIME distance vector data type), and I found no way to convert the extended full matrix that you calculate with the Similarity Search node to this data type. The opposite conversion (from the Distance matrix format to a full matrix) can be done with the Distance Matrix Pair Extractor node, but the only way I found to obtain a Distance Matrix column is the Distance Matrix Calculate node, which forces me to calculate the distances from scratch every time. I think this may be a gap in KNIME.
Does anybody know a way to convert a full distance matrix to the KNIME distance matrix data type?
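Conceptually, the conversion being asked for amounts to keeping only the part of the full matrix below the diagonal. The Python sketch below illustrates that idea only. The helper names are hypothetical, and the condensed layout (row i holds its distances to rows 0..i-1) is an assumption rather than a documented description of the KNIME data type.

```python
def full_to_condensed(full):
    """Keep only the entries below the diagonal of a square matrix.

    The diagonal itself is dropped, so the condensed form carries no
    explicit zero self-distances.
    """
    n = len(full)
    if any(len(row) != n for row in full):
        raise ValueError("matrix must be square")
    return [[full[i][j] for j in range(i)] for i in range(n)]


def condensed_to_full(condensed):
    """Rebuild the symmetric square matrix, filling the diagonal with 0.0."""
    n = len(condensed)
    full = [[0.0] * n for _ in range(n)]
    for i, row in enumerate(condensed):
        for j, d in enumerate(row):
            full[i][j] = d
            full[j][i] = d
    return full


# Hypothetical 3x3 full matrix.
full = [[0, 5, 4], [5, 0, 3], [4, 3, 0]]
condensed = full_to_condensed(full)
round_trip = condensed_to_full(condensed)
```

Because only the lower triangle is read, this sketch silently assumes the full matrix is symmetric. A check for symmetry could be added if that is not guaranteed.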
I see the problem. I’ll complete the workflow with the solution in a few minutes. I’m working on it right now.
It took a bit longer.
I have added a piece of workflow that shows how to convert a “Distance Matrix” organized as a table into a “Distance Matrix” as required.
It will definitely need to be adapted to your specific needs, but the essentials should be here.
Hope this is of help.
20200428 Pikairos Is it possible to extend a distance matrix V2.knwf (228.5 KB)
Thank you very much for your updated example workflow and the detailed comments/explanations in it. Without your help I wouldn’t have realized that this was possible without a specific node to do so. That’s great.
Thanks for your kind answer, and glad to read that you find it helpful. If so, would you mind marking the answer as the solution? Good luck and plenty of success with your ongoing project!
All the best,
Hi Ael (@aworker),
Just 3 more questions regarding your workflow:
- Why does the Distance Matrix Reader give a warning saying “Matrix contains non-zero diagonal elements”?
- In the “type based aggregation” tab of the last GroupBy node of the workflow (i.e. GroupBy (6:70)), a Sum aggregation is set for the double data type. However, this aggregation never takes place because the manual aggregation tab has priority. Why is it set that way?
- The row IDs of the last GroupBy node of the workflow (i.e. GroupBy (6:70)) (I mean the actual table RowID and not the column called rowID) do not follow a sequential order when more than 10 points are used. Is this a problem? Please see the attached workflow: 20200428 Pikairos Is it possible to extend a distance matrix V3.knwf (1.4 MB)
Thanks in advance
- This is because the matrix does not have an explicit diagonal of zero values when it is saved. The matrix reader copes with this and interprets the matrix correctly, so I do not think it is an issue.
- I believe the problem was with the configuration of the last GroupBy node. The right one is shown in this snapshot:
The previous configuration was not generic enough.
- This is because of the previous misconfiguration. Now it should be back to normal.
Please find below the amended solution (V4) :
20200430 Pikairos Is it possible to extend a distance matrix V4.knwf (1.3 MB)
Hope this fixes the problem. Obviously, the solution has not been thoroughly tested in enough situations. Please let me know if you find other exceptions. I’ll try to help with those too.
Great Ael, thanks for your time!