Want to “Vectorize” these titles with Document Vector
I’m using ChatGPT for directions and have installed the KNIME Text Processing extension. I also dragged in the Document Vector node and connected it to the Strings To Document node.
I’m unable to find the options it describes when right-clicking the node: Transformation: choose TF-IDF
I’m afraid you have been a victim of LLM hallucination in this case!
To get TF-IDF, you first need a Bag of Words node upstream. You can then use the TF and IDF nodes separately afterward, and multiply their results together using a Math Formula node to obtain TF-IDF.
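Outside KNIME, the same chain (bag of words, then TF, then IDF, then multiplying the two) can be sketched in a few lines of Python. This is only a rough illustration: the `tf_idf` function and the sample titles are made up for the example, and it assumes relative term frequency and a plain `log(N / df)` for IDF. The KNIME TF and IDF nodes offer their own variants, so the exact numbers may differ from what a workflow produces.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs: list of token lists. Returns one {term: tf * idf} dict per document."""
    n_docs = len(docs)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for doc in docs for term in set(doc))
    # Assumed IDF variant: log(N / df), with no smoothing
    idf = {term: math.log(n_docs / count) for term, count in df.items()}

    scores = []
    for doc in docs:
        counts = Counter(doc)  # the "bag of words" for this document
        total = len(doc)
        # Relative TF times IDF, the step done with Math Formula in KNIME
        scores.append({term: (c / total) * idf[term] for term, c in counts.items()})
    return scores


# Hypothetical titles, only for illustration
titles = ["annual sales report", "quarterly sales report", "employee handbook"]
docs = [t.lower().split() for t in titles]
result = tf_idf(docs)
```

Each entry of `result` maps the terms of one title to a weight. A term that appears in every title gets weight 0 under this IDF variant, which is why rare terms end up dominating the vector.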
Here’s a sample workflow that might be useful to at least see how TF can follow Bag of Words:
@ScottF gave you the correct approach. Here’s a very simple example. Depending on your text, you may want to add a Stop Word Filter. Also, as shown below, make sure to set the Title Column in the Strings To Document node to empty string. If you do have a separate title column, you can use that; just don’t use the “text” column as both text and title. If you do, it will count all your terms twice.
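To see why reusing the text column as the title doubles the counts, here is a toy Python sketch. It is not KNIME code: the `term_counts` helper is hypothetical and simply assumes that terms from both the title and the body of a document end up in the bag of words, as described above.

```python
from collections import Counter


def term_counts(title, text):
    # Assumption: title and body terms are both counted for the document
    return Counter((title + " " + text).lower().split())


text = "sales report"
same_title = term_counts(text, text)  # "text" column used as title too
empty_title = term_counts("", text)   # title set to empty string

print(same_title)
print(empty_title)
```

With the same column used twice, every term is counted two times; with an empty title, each term is counted once.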
Hi @ScottF and @rfeigel! Thank you so much for your guidance. Unfortunately, I haven’t been able to apply it. I realised I lack the technical know-how of the concepts, and a strong enough grasp of KNIME as a tool, to actually test whether it solves my use case. I will probably have to consult a freelance professional to help me understand and build the solution. Thanks again for all your help. It did make sense: I found the Bag of Words, TF and IDF nodes separately, and I had indeed been a victim of AI hallucination, as I was told.
I thought I could provide more details, but since I’m working on a company problem, there are limits on how much detail I can divulge. Also, it’s kind of difficult to describe the problem statement, as it has multiple layers of granularity and I don’t have the technical vocabulary for it.