Adjusting context window for local LLM using GPT4ALL nodes

I’m trying to get Llama 3.1 8B Instruct 128k to work with KNIME on my MacBook Pro M4. When testing this model directly from the GPT4ALL GUI, I’m able to get it to summarise documents after increasing the model’s context length parameter in GPT4ALL. However, when I try to do the same in KNIME, I get an error that the prompt size exceeds the context window, and I can’t find anywhere to configure the context window size. There is an option to configure the response length, but that does not help. Is there a way to tell KNIME that the context length for this model is larger?
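For reference, this is roughly what works outside of KNIME: a minimal sketch using the gpt4all Python bindings, where the `n_ctx` constructor argument sets the context window (the model file name and the 32k value here are just illustrative):

```python
# Sketch: raising the context window via the gpt4all Python bindings.
# n_ctx is the knob the KNIME nodes currently do not expose;
# the model file name below is hypothetical.
from gpt4all import GPT4All

model = GPT4All(
    "Meta-Llama-3.1-8B-Instruct-128k.gguf",  # hypothetical local model file
    n_ctx=32768,  # raise the context window above the small default
)

with model.chat_session():
    print(model.generate("Summarise the following document: ...", max_tokens=512))
```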


Hello @havarde,

you are right: we are currently missing the ability to configure the context size of local models, and I have created a ticket to implement it.

Best regards,
Adrian


Thanks, @nemad.

It would be great to get this changed, but I’ve also noticed other oddities in how the KNIME nodes interact with locally run models.

The same locally installed model (both the 8k and 128k variants of Llama 3.1 8B Instruct) behaves very differently depending on whether you access it via GPT4ALL, Ollama, or KNIME with the same prompts.

In general, the results via KNIME are much more “chatty” and harder to control, which makes it almost impossible to build KNIME workflows that automate work with these local models.

One example I’ve tried is using the model to translate short sentences (far below the context limit). This works perfectly with GPT4ALL and Ollama, but the response via KNIME adds a lot of chatter and other noise to the answer. Forcing the model to answer in JSON and experimenting with the chat template did not solve it either, so I’ve given up and started coding directly in Python to get the necessary control.
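Roughly, the Python workaround looks like this, using the ollama library: pinning temperature to 0 and requesting JSON output keeps the answer terse (the model tag and JSON field name are my own illustrative choices):

```python
# Sketch: a terse, JSON-only translation via the ollama Python library.
import json
import ollama

response = ollama.chat(
    model="llama3.1:8b",  # illustrative model tag
    messages=[
        {
            "role": "system",
            "content": 'Reply with JSON only, e.g. {"translation": "..."}. No commentary.',
        },
        {"role": "user", "content": "Translate to German: The weather is nice today."},
    ],
    format="json",  # constrains decoding to valid JSON
    options={"temperature": 0, "num_ctx": 8192},  # deterministic, larger context
)

print(json.loads(response["message"]["content"])["translation"])
```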

Yes, GPT4All now provides better templating capabilities, and if I understand correctly, some models provide a template as part of their config.
I have also created a ticket to migrate to the GPT4All Jinja templates.
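For anyone curious, such a chat template is essentially a Jinja string rendered over the message list. A toy illustration in Python (the Llama 3 header tokens shown are the commonly documented ones, but the exact template bundled with a model may differ):

```python
# Toy illustration: rendering a Llama-3-style chat template with jinja2.
from jinja2 import Template

CHAT_TEMPLATE = (
    "{% for message in messages %}"
    "<|start_header_id|>{{ message['role'] }}<|end_header_id|>\n\n"
    "{{ message['content'] }}<|eot_id|>"
    "{% endfor %}"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
)

messages = [
    {"role": "system", "content": "You are a terse translator."},
    {"role": "user", "content": "Translate to Spanish: Hello."},
]

print(Template(CHAT_TEMPLATE).render(messages=messages))
```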

Best regards,
Adrian


@havarde maybe you can take a look at these examples using GPT4All and Ollama with KNIME nodes. For me, the key to getting reusable JSON results was using an instruct LLM and experimenting with prompts that give specific instructions.

There is an older post by @MartinDDDD about the use of structured output that might be worth looking at.
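As an illustration of what structured output can look like outside KNIME today: Ollama (0.5+) accepts a JSON schema as the format argument, so a sketch along these lines is possible (the schema and model tag are my own assumptions, not from the post above):

```python
# Sketch: schema-constrained output via Ollama's structured outputs.
import ollama
from pydantic import BaseModel


class Translation(BaseModel):
    source_language: str
    target_language: str
    translation: str


response = ollama.chat(
    model="llama3.1:8b",  # illustrative model tag
    messages=[{"role": "user", "content": "Translate to French: Good morning."}],
    format=Translation.model_json_schema(),  # constrain output to this schema
)

result = Translation.model_validate_json(response["message"]["content"])
print(result.translation)
```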


It would be awesome for KNIME to support such structured output directly. I managed to get it working by tweaking prompts and providing an explicit template, and that worked most of the time.