I am very new to KNIME. I am currently exploring the OPTICS algorithm for clustering in KNIME. I have a CSV file with a column ‘Word’ containing a phrase, followed by 1024 columns named embedding_0…embedding_1023 that contain the BERT embedding of that phrase.
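To make the input layout concrete, here is a minimal Python sketch (outside KNIME, only for inspecting the file) that reads such a CSV into a list of phrases and a list of embedding vectors. It assumes the header is exactly `Word` followed by `embedding_0` … `embedding_1023`; the function name `load_embeddings` is hypothetical.

```python
import csv
import io
from typing import List, TextIO, Tuple


def load_embeddings(fh: TextIO, dim: int = 1024) -> Tuple[List[str], List[List[float]]]:
    """Read a CSV with a 'Word' column plus embedding_0..embedding_{dim-1}.

    Hypothetical helper; assumes the column names described in the post.
    """
    reader = csv.DictReader(fh)
    cols = [f"embedding_{i}" for i in range(dim)]
    words: List[str] = []
    vectors: List[List[float]] = []
    for row in reader:
        words.append(row["Word"])                       # the phrase
        vectors.append([float(row[c]) for c in cols])   # its BERT embedding
    return words, vectors


# Tiny illustrative input (2 embedding columns instead of 1024).
demo = io.StringIO("Word,embedding_0,embedding_1\nhello,0.1,0.2\nworld,0.3,0.4\n")
words, vectors = load_embeddings(demo, dim=2)
print(len(words), len(vectors[0]))  # prints "2 2"
```

For the real file, open it with `open(path, newline="", encoding="utf-8")` and keep the default `dim=1024`.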
I have about 4,900 such rows on which I would like to run OPTICS.
To run OPTICS I am using a combination of two nodes: OPTICS Cluster Compute and OPTICS Cluster Assigner.
However, whenever I configure OPTICS Cluster Compute to use the cosine distance, execution fails with “Execute failed: Encountered duplicate row ID “Row N””, where N is the number of the row at which it failed. With another distance, such as Levenshtein, it runs without errors.
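For reference, cosine distance compares only the direction of two vectors, unlike an edit distance such as Levenshtein on strings. Below is a minimal Python sketch of the common textbook definition (1 minus cosine similarity); KNIME's own implementation may differ, for example in how it treats an all-zero vector, so this is illustration only. One property worth keeping in mind: two different rows whose embeddings are positive scalar multiples of each other are at cosine distance 0 even though they are not duplicates.

```python
import math
from typing import Sequence


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Textbook cosine distance: 1 - (u . v) / (|u| * |v|).

    Raises ValueError for a zero vector, where the quantity is undefined.
    (How KNIME handles that case is not assumed here.)
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


# Same direction, different length: distance 0 although the vectors differ.
print(cosine_distance([1.0, 0.0], [3.0, 0.0]))  # prints 0.0
```

The result ranges from 0 (same direction) through 1 (orthogonal) to 2 (opposite directions).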
I checked the data itself and there are no duplicates in it. DBSCAN on the same data ran successfully and produced output.
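One way to double-check "no duplicates" outside KNIME is to look at more than whole rows. The following is a minimal Python sketch, assuming the phrases and embeddings have already been read from the CSV into two parallel lists (the function name `find_duplicates` is hypothetical). It lists rows that share a Word value, rows with identical embedding vectors, and all-zero embedding rows. It only inspects the input and does not reproduce what the OPTICS node does internally.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def find_duplicates(words: Sequence[str],
                    vectors: Sequence[Sequence[float]]) -> Dict[str, list]:
    """Report row indices (0-based) that could look like duplicates.

    Hypothetical diagnostic helper, not part of KNIME.
    """
    by_word: Dict[str, List[int]] = defaultdict(list)
    by_vector: Dict[Tuple[float, ...], List[int]] = defaultdict(list)
    for i, (w, vec) in enumerate(zip(words, vectors)):
        by_word[w].append(i)
        by_vector[tuple(vec)].append(i)
    return {
        # groups of rows with the same phrase
        "word": [idx for idx in by_word.values() if len(idx) > 1],
        # groups of rows with exactly identical embeddings
        "vector": [idx for idx in by_vector.values() if len(idx) > 1],
        # rows whose embedding is all zeros (cosine distance is undefined there)
        "zero_vector": [i for i, vec in enumerate(vectors)
                        if all(x == 0.0 for x in vec)],
    }


report = find_duplicates(
    ["a", "b", "a", "c", "d"],
    [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [1.0, 0.0], [0.0, 0.0]],
)
print(report)  # prints {'word': [[0, 2]], 'vector': [[0, 3]], 'zero_vector': [4]}
```

Hashing each vector as a tuple keeps the check linear in the number of rows, so it stays fast for about 4,900 rows of 1024 values.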
This is what my workflow looks like: