LLM Prompter issues with local models

Hi, I am trying to work with the “A beginner’s guide to build your own LLM-based solutions” KNIME workflow. I use it as is, but change the prompts and models. Here is what I have.
I downloaded several models from GPT4All, with the following results:

  • GPT4All Falcon: gpt4all-falcon-newbpe-q4_0.gguf. The LLM Prompter crashes with: ERROR LLM Prompter 3:35 Execute failed: Error while sending a command.
  • Mistral Open Orca: mistral-7b-openorca.Q4_0.gguf. The LLM Prompter returns empty strings.
  • Orca 2 (Medium): orca-2-7b.Q4_0.gguf. The LLM Prompter returns something. Some answers are the labels, but others look like attempts to chat.
  • Mistral Instruct: mistral-7b-instruct-v0.1.Q4_0.gguf. It works. For two records I get no labels, but for the others I get answers like “Label: negative”. The quality is bad, but it works.
    As a result, I have some questions. Please take a look:
  • What happened with GPT4All Falcon? Can I do something to make it work?
  • Mistral Open Orca: is there a chance to make it work?
  • Orca 2 (Medium): could I prepare an initial prompt and then, using the context, send prompts with the task? In the GPT4All application I did this the following way:

Prompt: I want you work as text evaluator. Assign to request a sentiment label. Labels can be one of [“positive”, “negative”, “neutral”]. My request: In a media advisory , the NTSB said that after subsequent testing , `` the train detection system intermittently failed . ‘’

Response: To evaluate this text, I will analyze its sentiment based on the words and context.

Step 1: Identify important aspects of the text

  • The media advisory is a formal announcement from an authority (NTSB)

  • The train detection system intermittently failed implies that there was a problem with the safety feature of the trains, which could have serious consequences

  • Subsequent testing indicates that this issue was not resolved easily or quickly

Step 2: Analyze sentiment based on aspects

  • The formal announcement suggests that the situation is important and needs to be communicated clearly

  • The intermittently failed train detection system implies a lack of reliability, which could cause concern for passengers and authorities alike

  • Subsequent testing indicates that this issue was not resolved easily or quickly shows frustration and dissatisfaction with the solution

Step 3: Assign sentiment label based on analysis

Based on my analysis, I would assign a negative sentiment label to this text. The words and context convey a sense of worry, inconvenience, and disappointment.

Prompt: Assign sentiment label to: In a media advisory , the NTSB said that after subsequent testing , `` the train detection system intermittently failed . ‘’

Response: Negative
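
Outside the chat application, the two-step pattern above (one set-up turn, then short task turns) can be emulated in a single prompt by prepending the instruction to every request, since the LLM Prompter appears to send each row independently. A minimal Python sketch; all names here are illustrative:

```python
# Replay the "set-up turn" before each task turn, so a stateless prompter
# behaves like the two-step chat session above.
INSTRUCTION = (
    'I want you to work as a text evaluator. Assign a sentiment label to the '
    'request. Labels can be one of ["positive", "negative", "neutral"]. '
    "Answer with the label only."
)

def build_prompt(text: str) -> str:
    """Prepend the instruction to every request so the model keeps the context."""
    return f"{INSTRUCTION}\nMy request: {text}\nLabel:"

prompt = build_prompt("the train detection system intermittently failed")
```

Ending the prompt with `Label:` nudges the model to complete with just the label instead of chatting.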

Or maybe someone can help with the prompt?

  • Maybe someone can help me improve the results with Mistral Instruct? Right now I have only 30% accuracy. I use this prompt: “I want you to act as text evaluator. Assign to every next prompt a sentiment label. Labels can be one of [“positive”, “negative”, “neutral”]. My prompt:”
    All in all, I have only one working LLM out of four at this time. That’s a showstopper for me.
    Thank you!

@Max these things come to my mind. Some LLMs do have specific prompt templates they want to see. You might want to check them out and try to use them explicitly.

Most also seem to work with more general prompt templates, but it is worth checking. From testing some local LLMs, it seems you have to experiment to find out which prompts actually suit them; it might not be the same for every one. Some seem to respond well to very ‘restricted’ commands, while ChatGPT is generally fine with more general commands and ‘figures it out’.
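
For example, the Mistral Instruct models are fine-tuned on an `[INST] … [/INST]` wrapper (worth verifying against the model card for your exact file). A sketch of wrapping the sentiment task in that template; the function name and the one-word constraint are my own:

```python
def mistral_instruct_prompt(task: str, text: str) -> str:
    """Wrap the task in Mistral-Instruct's [INST] tags and constrain the answer."""
    return (
        f"<s>[INST] {task}\n"
        f'Text: "{text}"\n'
        "Answer with exactly one word: positive, negative or neutral. [/INST]"
    )

p = mistral_instruct_prompt(
    "Assign a sentiment label to the following text.",
    "the train detection system intermittently failed",
)
```

Other models use different wrappers (e.g. ChatML-style tags), which may explain why the same prompt behaves so differently across the four models.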

In general I found that LLMs are not that good at consistently bringing back a classification result (like you would expect from a prediction model). One idea is to provide them with a collection of examples through a vector store *1) first and then tell them to use that as a pattern.

Also, have you tried ChatGPT (4) for the task? Thanks to the developers’ effort, it is generally much better than ‘just’ an LLM that has scanned and encoded the whole internet. More about OpenAI’s ‘secret sauce’ can be read in an article *2).


*1) Creating a Local LLM Vector Store from PDFs with KNIME and GPT4All


*2) RLHF: Reinforcement Learning from Human Feedback (Chip Huyen)

The very same issue with gpt4all-falcon-newbpe-q4_0.gguf (i.e. “ERROR LLM Prompter 3:868:104 Execute failed: Error while sending a command”) when trying to execute the workflow JKISeason2-20 Ángel Molina – KNIME Community Hub.

I managed to successfully run the workflow Creating a Local LLM Vector Store from PDFs with KNIME and GPT4All using all-MiniLM-L6-v2-f16.gguf (gpt4all-falcon-newbpe-q4_0.gguf KO) and the 24-page coffee machine manual, although with the following result:

Any hint anyone?
Cheers,
-Stef

Hi @Stef

I can confirm the behavior with “!!!!!” as the response for the exact same example. Despite a lot of troubleshooting, I was unable to find the root cause of this. It seemed somewhat random: when I completely reloaded the entire workflow and the .gguf, it sometimes started working, but not very consistently.

Was there ever a solution? I’m having the same issue on the same workflow downloaded from the hub (reviews). I’ve downloaded Falcon and Orca, and the GPT4All LLM Connector gives me a green signal, but the LLM Prompter gives me the red circle / ‘x’ with the error: Execute failed: Error while sending a command.

@ebarr I see the problem on my Apple Silicon. I opened a ticket with KNIME. It is possible GPT4All changed something in the backend.


Thank you - I sent a ticket as well and received a note that several others are experiencing the same thing and the engineers are looking into it. Thank you.


@ebarr On my Windows machine it seems to work, though.

For me as well, but the results are not the best in my tests. Are there any additional KNIME-specific things to consider when using GPT4All with an open-source model, compared to loading the model as a local API endpoint and calling it? br

I’ve tested many other models downloaded from GPT4All. The ChatGPT models don’t seem to work - I’m asked to include my API/secret key, which I do, but the models don’t seem to download. What would I put into the path? The only thing that saves (that I can see) is the .txt file with my secret key. Also, I’m finding that when I put a prompt into the model directly, I get completely different (and better) results. For example, I have a series of reviews that were broken into topics. I added all of the reviews with their corresponding topic assignments (0-9) directly into ChatGPT 4 and asked it to return a unique 3-5 word description summarizing each topic. The model returned very clean three-word descriptors. However, when using the local models, the prompts seem to be too large (even though I can paste them directly into the model’s prompt), and the descriptors are terrible. It could be the model selected, so I want to use GPT-4, but I can’t seem to locate a downloadable version.
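
On the “prompts seem to be too large” point: local models usually have a much smaller usable context than ChatGPT, so one workaround is to split the reviews into batches that stay under a rough size budget and prompt once per batch. A sketch under that assumption; the word-count budget is an arbitrary proxy for tokens:

```python
def batch_reviews(reviews, max_words=300):
    """Greedily pack reviews into batches under a rough word budget,
    so each prompt stays within a small local model's context window."""
    batches, current, count = [], [], 0
    for r in reviews:
        n = len(r.split())
        if current and count + n > max_words:
            batches.append(current)
            current, count = [], 0
        current.append(r)
        count += n
    if current:
        batches.append(current)
    return batches

reviews = ["great coffee"] * 5 + ["the machine leaks water every morning"] * 5
batches = batch_reviews(reviews, max_words=20)
```

Each batch can then be sent as its own prompt and the per-batch answers merged afterwards.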

@ebarr ChatGPT (4) is not just an LLM where the whole internet has been compiled (sort of) but a complex system that handles questions (prompts) and does additional things to give better answers. You can read more about this here (as mentioned before):

There is a reason why these models cost money and need GPU power. Local models are often not as good, and you will have to put in extra effort to create proper prompts and prepare the data you put in. The main benefit of a local model is that your data will not be sent to a third party.

In the LLM examples given, also make sure to check the prompt (role etc.), and if you provide context from a vector store, check how large it is and make sure you actually use the engineered prompt.
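
And since even an engineered prompt may come back as a chatty sentence (or as garbage like “!!!!!”), a small post-processing step can rescue answers into a clean label before scoring accuracy. A sketch, with my own function name:

```python
import re

def normalize_label(raw: str):
    """Map chatty answers like 'Label: Negative.' or a full sentence
    to one of the allowed labels; None means 'could not parse'."""
    m = re.search(r"\b(positive|negative|neutral)\b", raw.lower())
    return m.group(1) if m else None

normalize_label("Label: Negative.")                            # -> "negative"
normalize_label("I would assign a negative sentiment label.")  # -> "negative"
normalize_label("!!!!!")                                       # -> None
```

This will not fix a model that refuses to classify, but it salvages rows where the label is buried in prose.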

Currently, centrally curated models that run on large platforms are vastly superior. That might change. GPT4All (native) on Apple Silicon is now really fast compared to a few months ago. I hope KNIME keeps pace with these developments.
