Quick question about finding synonyms

I have a database of about 1000 words. I want to be able to enter a target word and return the five words in my database which are closest in meaning.

There seem to be a number of ways to do that, but I’m looking for the simplest. Any recommendations will be welcomed.

Thanks, Richard

Hello @RIchardC

To be honest, this is not a trivial problem, and it can be solved in several ways.

The first is to get a dictionary of synonyms and then join it to your data on the search word to pull the synonym(s) from the dictionary.
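
Outside KNIME, the dictionary lookup could be sketched in Python roughly like this. The `SYNONYMS` table, the sample `database` list, and the function name are made up for illustration; a real synonym dictionary would come from a thesaurus file of your choice.

```python
# Hypothetical synonym dictionary; in practice this would be loaded from a thesaurus file.
SYNONYMS = {
    "happy": ["glad", "joyful", "cheerful", "content"],
    "big": ["large", "huge", "vast"],
}


def synonyms_in_database(target, database_words, synonyms=SYNONYMS, limit=5):
    """Return up to `limit` synonyms of `target` that also occur in the database."""
    allowed = {w.lower() for w in database_words}
    candidates = synonyms.get(target.lower(), [])
    # Keep the dictionary's order, but drop anything that is not in the database.
    return [w for w in candidates if w.lower() in allowed][:limit]


# Made-up stand-in for the ~1000-word database.
database = ["glad", "large", "cheerful", "tiny", "vast"]
print(synonyms_in_database("happy", database))  # ['glad', 'cheerful']
```

Note that this only returns words the dictionary already lists as synonyms, so it may give fewer than five results, or none.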

Another approach is to apply word embeddings to your corpus. To do that you can use either the Redfield NLP nodes or the Redfield BERT nodes. The first option is a bit easier, since a model is available out of the box for several languages. Use the Vectorizer node to convert the words into vectors, then the Cell Splitter node to unwind the collections into separate columns. After that, use the Similarity Search node to look for the nearest neighbors (candidate synonyms), with either Euclidean or cosine distance as the measure of closeness.
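
To make the nearest-neighbor step concrete, here is a minimal Python sketch of a cosine-similarity search. It is not what the KNIME nodes do internally; the three-dimensional `toy_vectors` are invented for the example, and real embeddings from a language model would have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_words(target, vectors, k=5):
    """Return the k words whose vectors are most similar to the target word's vector."""
    target_vec = vectors[target]
    scored = [
        (cosine_similarity(target_vec, vec), word)
        for word, vec in vectors.items()
        if word != target  # the target itself would always rank first
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [word for _, word in scored[:k]]


# Invented toy embeddings, only to show the mechanics.
toy_vectors = {
    "happy": [0.9, 0.1, 0.0],
    "glad": [0.8, 0.2, 0.1],
    "joyful": [0.85, 0.15, 0.0],
    "table": [0.0, 0.1, 0.9],
    "chair": [0.1, 0.0, 0.95],
}
print(nearest_words("happy", toy_vectors, k=2))  # ['joyful', 'glad']
```

Because the search only ranks words that are in `vectors`, every result is guaranteed to come from your own list.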

Finally, you can use an LLM (ChatGPT in particular). KNIME has nodes both for working with a local LLM and for connecting to ChatGPT. In that case you also have two options:

  • get the embeddings from these models and follow the steps above;

  • ask a direct question like “Give me 5 synonyms for word X from the list provided below [insert your list]”.

Thanks. I’ve been trying with ChatGPT but the results are still not great even after tweaking the prompt repeatedly. I’m quite emphatic that the results should only include words from my list, but it breaks that rule regardless. I might simply do a value lookup on the results to eliminate wayward words, or try the synonym route.
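
For reference, the value-lookup idea could look something like the Python sketch below. It assumes the model's reply is a comma- or newline-separated list of words; `keep_listed_words` and the sample data are hypothetical.

```python
def keep_listed_words(llm_answer, word_list, limit=5):
    """Keep only the words from an LLM reply that appear in word_list, without duplicates."""
    allowed = {w.lower() for w in word_list}
    kept = []
    # Assumption: the reply separates words with commas or newlines.
    for raw in llm_answer.replace("\n", ",").split(","):
        word = raw.strip().strip(".").lower()
        if word in allowed and word not in kept:
            kept.append(word)
    return kept[:limit]


word_list = ["glad", "cheerful", "large", "tiny"]
answer = "Glad, joyful, cheerful, elated, glad"
print(keep_listed_words(answer, word_list))  # ['glad', 'cheerful']
```

The catch is that filtering can leave fewer than five words, so the model may need to be asked for more candidates than are actually required.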

Working with ChatGPT can be like asking toddlers if they’ve been eating cookies and watching them spit out cookie crumbs as they say no.

Perhaps you can force it to give you results only from the list. You can try adding something like this to your prompt: Only include words from the provided list.

Even this isn’t working:
From the WordList provided below, select five words … Ensure your selections are strictly made from the provided WordList. Include any direct matches with the individual ReferenceWords only if they appear in the WordList. Double-check and remove any word that does not appear in the WordList.

I prepared a simple workflow for you that shows how to use text embeddings (vectors) to do the similarity search in KNIME. It uses the medium English spaCy model, which is probably not the best one, but the results are still tolerable. You could try the large model, or use LLM (ChatGPT) embeddings to do the same.

I hope this helps.

https://hub.knime.com/-/spaces/-/~CsZO_mD35ae58Dus/current-state/

That is awesome! Thank you so much for going above and beyond. I’m sure I will learn a lot from this. Cheers.
