Text Analysis Using an LLM in KNIME

Hello KNIME Community,

I hope this message finds you well.

I apologize for reaching out again on a similar topic. Previously, I posted a message seeking assistance with text analysis using a spaCy model to detect elements within the text. However, I have encountered some limitations with that approach, particularly regarding the accuracy of identifying redundant elements.

The current solution I am using involves calculating similarity distances, but it requires manually verifying over 5,000 lines to confirm which are true duplicates. The challenge lies in redundant elements that are not stopwords: they can inflate or deflate the similarity score, making it hard to tell whether two lines are genuinely duplicates. As a result, I have to check each potentially duplicated line by hand, which is quite time-consuming.
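To illustrate the problem, here is a minimal Python sketch (assuming something like a KNIME Python Script node; the sample strings are invented for illustration) showing how shared boilerplate that is not a stopword can inflate a similarity score between two lines that are actually different:

```python
# Minimal sketch: why non-stopword boilerplate skews similarity scores.
# Uses only the standard library; the example strings are hypothetical.
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Character-level similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Two genuinely different records sharing a long boilerplate prefix:
a = "Ref. 2024-001 - Please replace the faulty pump in building A"
b = "Ref. 2024-001 - Please repaint the entrance door in building B"

# The same records with the boilerplate stripped:
a_clean = "replace the faulty pump in building A"
b_clean = "repaint the entrance door in building B"

print(similarity(a, b))            # inflated by the shared prefix
print(similarity(a_clean, b_clean))  # lower once boilerplate is removed
```

Whatever threshold you pick for "duplicate", the boilerplate pushes non-duplicates above it, which is why pure distance measures force so much manual review.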

A solution that could replicate the intelligent capabilities of a model like OpenAI's would greatly streamline accurate duplicate detection. The texts I am analyzing can be structured differently, with variations in expression, which makes it difficult to rely on similarity distances alone.

Additionally, I am unable to utilize the OpenAI API as I do not have an account, which limits my ability to leverage its capabilities for more intelligent text analysis.

If anyone has experience or expertise in implementing such a solution within the KNIME platform, I would greatly appreciate any guidance or insights you could provide.

Thank you in advance for your help.

Best regards,

Hey there,

not entirely sure about your use case, but it is possible to use other LLMs within KNIME.

@mlauber71 wrote some articles on that on Medium which may help:

Chat with local Llama 3 Model via Ollama in KNIME Analytics Platform

KNIME, AI Extension and local Large Language Models (LLM)

This obviously requires enough computing / GPU power to run an LLM locally that is capable of doing what you need. With Ollama you can fairly easily get access to the latest models; on Windows, I'd say you can run models with ~7B parameters locally with good performance on around 8 GB of VRAM.
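As a rough sketch of what the local-LLM route could look like (e.g. from a KNIME Python Script node, alongside or instead of the AI Extension nodes): Ollama exposes a REST API on port 11434, and you can ask the model a yes/no question about each candidate pair. The model name `llama3` and the prompt wording are assumptions for illustration, not something from the original post:

```python
# Hypothetical sketch: ask a local Ollama model whether two lines are
# duplicates, via Ollama's /api/generate endpoint (default port 11434).
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(text_a: str, text_b: str, model: str = "llama3") -> bytes:
    """Build the JSON payload for a duplicate-detection prompt."""
    prompt = (
        "Answer only YES or NO. Ignoring reference numbers and boilerplate, "
        f"do these two lines describe the same thing?\n1: {text_a}\n2: {text_b}"
    )
    return json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode("utf-8")

def is_duplicate(text_a: str, text_b: str) -> bool:
    """Send the prompt to a locally running Ollama server and parse the answer."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_request(text_a, text_b),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        answer = json.loads(resp.read())["response"]
    return answer.strip().upper().startswith("YES")
```

You would only run this on pairs the similarity step already flags as candidates, since calling an LLM on all pairwise combinations of 5,000+ lines would be slow.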
