Unable to find TF-IDF option in the Document Vector node

Hi fellow KNIME’rs!

Issue:

  1. I have product titles (text) in a column in Excel.
  2. I want to "vectorize" these titles with the Document Vector node.
  3. I'm using ChatGPT for directions. I have installed the KNIME Textprocessing extension, dragged in the Document Vector node and connected it to the Strings To Document node.
  4. I can't find the option ChatGPT describes: right-click > Transformation > choose TF-IDF.

Please let me know if I'm missing a step :')

In what node do you expect to find the Transformation option?

Hi @jatin_dinesh and welcome to the forum.

I'm afraid you have been a victim of LLM hallucination in this case! :ghost:

To get TF-IDF, you first need a Bag of Words node upstream. You can then use the TF and IDF nodes separately afterward, and multiply their results together using a Math Formula node to obtain TF-IDF.
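For anyone who wants to see the arithmetic behind that node chain outside KNIME, here is a minimal Python sketch. It is not KNIME's implementation: it assumes relative term frequency (term count divided by the number of terms in the document) and a plain, unsmoothed IDF of log(N / df). KNIME's TF and IDF nodes offer several variants, so check their dialogs for the formula you actually get. The function names and the sample titles are made up for illustration; `to_vectors` just pivots the rows into one term-to-weight mapping per document, which is the kind of table a document vector represents.

```python
import math
from collections import Counter, defaultdict


def bag_of_words(docs):
    """Toy Bag of Words table: one (doc_id, term) row per distinct term per document."""
    rows = []
    for doc_id, text in enumerate(docs):
        for term in sorted(set(text.lower().split())):
            rows.append((doc_id, term))
    return rows


def tf_idf(docs):
    """Return (doc_id, term, tf, idf, tf_idf) rows: BoW -> TF -> IDF -> multiply."""
    tokenized = [text.lower().split() for text in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    rows = []
    for doc_id, term in bag_of_words(docs):
        tokens = tokenized[doc_id]
        tf = tokens.count(term) / len(tokens)  # relative TF (assumed variant)
        idf = math.log(n_docs / df[term])      # unsmoothed natural-log IDF (assumed variant)
        rows.append((doc_id, term, tf, idf, tf * idf))  # the "Math Formula" step: TF * IDF
    return rows


def to_vectors(rows):
    """Pivot the rows into one {term: tf_idf} dict per document."""
    vectors = defaultdict(dict)
    for doc_id, term, _tf, _idf, weight in rows:
        vectors[doc_id][term] = weight
    return dict(vectors)


# Hypothetical product titles, for illustration only.
titles = ["red cotton shirt", "blue cotton shirt", "red linen shirt"]
vectors = to_vectors(tf_idf(titles))
for doc_id, vec in vectors.items():
    print(doc_id, {term: round(w, 3) for term, w in vec.items()})
```

Note that "shirt" appears in every title, so its IDF, and therefore its TF-IDF weight, is 0: a term that occurs everywhere does not help tell the titles apart.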

Here’s a sample workflow that might be useful to at least see how TF can follow Bag of Words:

@ScottF gave you the correct approach. Here's a very simple example. Depending on your text, you may want to add a Stop Word Filter. Also, make sure to set the title in the Strings To Document node to an empty string. If you do have a separate title column, you can use that; just don't use the "text" column as both text and title. If you do, every term will be counted twice.
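To see why using the same column for title and text doubles the counts, here is a toy Python model. It assumes, as described above, that terms from both the title and the text count towards a document's terms; it is an illustration, not how KNIME stores documents internally. `term_counts` and its `stop_words` parameter are hypothetical names, and the stop word set stands in for what a Stop Word Filter node would remove.

```python
from collections import Counter


def term_counts(text, title="", stop_words=frozenset()):
    """Count terms in a toy document made of a title section plus a text section.

    Assumption: every term in the title and in the text contributes to the
    document's term counts, as described in the thread.
    """
    tokens = (title + " " + text).lower().split()
    # Drop stop words, roughly what a stop word filtering step would do.
    return Counter(t for t in tokens if t not in stop_words)


text = "the red cotton shirt"
print(term_counts(text))                          # empty title: each term counted once
print(term_counts(text, title=text))              # same column as title: each term counted twice
print(term_counts(text, stop_words={"the"}))      # "the" filtered out
```

With an empty title each term is counted once; passing the same string as the title doubles every count, which would also inflate the TF values computed downstream.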


Did my workflow solve your problem? If so, please mark “solved”. If not, please provide more details.

Hi @ScottF and @rfeigel! Thank you so much for your guidance. Unfortunately, I haven't been able to apply it: I realised I lack both the conceptual know-how and a strong enough grasp of KNIME as a tool to test whether it actually solves my use case. I will probably have to consult a freelance professional to help me understand and build the solution. Thanks again for all your help. It did make sense: I found the Bag of Words node and the separate TF and IDF nodes, and I had indeed been a victim of AI hallucination, as you said.

I thought I could provide more details, but since I'm working on a company problem, there are limits on how much detail I can share. It's also quite difficult to describe the problem statement, as it has multiple layers of granularity and I don't have the technical vocabulary for it :sob: