Text Processing: get initial strings after distance calculation

Hi,

I have a lot of strings that are nearly identical but differ in a few words. For example:
Ca Mg asparaginat
Ca asparaginat + Mg asparaginat
I created a flow with Text Processing nodes in which I clean the strings of punctuation and short words and then calculate the cosine distance. As a result I have a table with two columns of documents or row IDs, plus the distance. What I need is a table with each pair of the initial strings and the distance between them. However, I found that the row IDs are regenerated after the Document Vector node.
I attached a small workflow with an example. In my real case the flow is more complex (I clean the strings before creating the bag of words), but the structure is the same.
Could you please help me create the final table with the initial strings and the distance?
Thank you.
Sentence row ID.knwf (94.4 KB)

Hi Max,

I applied the RowID and Joiner nodes to preserve the original IDs (old row ID columns) across the workflow. Would you mind checking whether everything looks correct in the updated flow below?

Best,

Alpay

Sentence row ID.AZ.knwf (30.8 KB)

Hi, Alpay,

Thank you for the help.
In fact, I need the table from the last join to contain the correct strings from the beginning.
In my case the distance between:
Ca Mg asparaginat
Ca asparaginat Mg asparaginat
must be equal to 0, and I use this as a test case. Now I see that the joins still have issues.
Unfortunately, I can't use the preprocessed strings for the joins, because I will potentially have identical strings in different rows. So I can't use this solution.
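For reference, the test case can be checked outside KNIME with a small Python sketch. It is only an illustration: the thread does not say how the Document Vector node weights terms, so both a count-based and a binary (term present or absent) bag of words are shown as assumptions. The helper names `count_vector`, `binary_vector` and `cosine_distance` are made up for this example. With binary weights the two strings get a distance of 0; with raw counts the repeated "asparaginat" makes the distance slightly larger than 0.

```python
import math
from collections import Counter


def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity) between two term-weight dicts."""
    dot = sum(weight * b.get(term, 0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # convention used here for empty documents
    # Clamp at 0 to hide tiny negative values caused by floating-point rounding.
    return max(0.0, 1.0 - dot / (norm_a * norm_b))


def count_vector(text):
    """Bag of words weighted by how often each term occurs (assumed setting)."""
    return dict(Counter(text.lower().split()))


def binary_vector(text):
    """Bag of words that only records whether a term occurs (assumed setting)."""
    return {term: 1 for term in set(text.lower().split())}


s1 = "Ca Mg asparaginat"
s2 = "Ca asparaginat Mg asparaginat"

d_count = cosine_distance(count_vector(s1), count_vector(s2))
d_binary = cosine_distance(binary_vector(s1), binary_vector(s2))

print(f"count weights:  {d_count:.4f}")
print(f"binary weights: {d_binary:.4f}")
```

If the test case has to come out as exactly 0, the vectors therefore need to record term presence rather than term frequency.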

Hi @Max

I made 2 changes to your workflow, Sentence row ID - HS.knwf (116.5 KB):

Added a Document Data Extractor node, which converts the document back to a text string, and
used that text string (as the Name column) in the Distance Matrix Pair Extractor.

That gives this result:
[image]
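The same idea, carrying the original text alongside each vector and using it as the label when the pairs are extracted, can be sketched in plain Python. This is not the KNIME implementation, only an analogy under stated assumptions: `clean` is a hypothetical stand-in for the preprocessing nodes, the third example string is invented, and binary term weights are assumed. Pairs are matched by row position, not by the cleaned text, so rows that become identical after cleaning still map back to their own original strings.

```python
import math
from itertools import combinations

# Original strings; the list position plays the role of the old row ID.
originals = [
    "Ca Mg asparaginat",
    "Ca asparaginat + Mg asparaginat",
    "Mg citrate",  # invented extra row for illustration
]


def clean(text):
    """Hypothetical cleaning step: drop non-alphabetic tokens, lowercase the rest."""
    return [tok.lower() for tok in text.split() if tok.isalpha()]


def binary_cosine_distance(tokens_a, tokens_b):
    """Cosine distance between binary (presence/absence) term vectors."""
    a, b = set(tokens_a), set(tokens_b)
    if not a or not b:
        return 1.0  # convention used here for empty documents
    return max(0.0, 1.0 - len(a & b) / math.sqrt(len(a) * len(b)))


cleaned = [clean(s) for s in originals]

# Look up both the cleaned tokens and the original text by position,
# so the output table contains the initial strings and their distance.
pairs = [
    (originals[i], originals[j], binary_cosine_distance(cleaned[i], cleaned[j]))
    for i, j in combinations(range(len(originals)), 2)
]

for left, right, dist in pairs:
    print(f"{left!r} | {right!r} | {dist:.3f}")
```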
gr. Hans


Hi @Max, has Hans’ answer solved your problem?
