LLM / AI / OpenAI workflow with file input

Hello.

I want to create a workflow that queries a local LLM using existing nodes, without any Java/Python workarounds. Everything is easy as long as the input is text only, but I need two inputs: a binary file and a text prompt. As far as I can see, no nodes fit this purpose out of the box.
Has someone already done this? Is there a method to create something like a "multipart form", i.e. join the file and the string into one mixed data stream (e.g. a JSON) and send that to the LLM?
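To illustrate the joining idea: a minimal sketch (assuming an OpenAI-style chat-completions body; the model name is a placeholder and other servers may expect a different shape) that base64-encodes the file into a data URI so file and prompt travel in one plain JSON payload, with no multipart form needed:

```python
import base64
import json

def build_multimodal_payload(file_bytes: bytes, mime_type: str, prompt: str) -> dict:
    """Join a binary file and a text prompt into one OpenAI-style JSON body.

    The file is base64-encoded into a data URI, so the whole request is
    plain JSON. Shape follows the OpenAI chat-completions vision format;
    model name is a placeholder.
    """
    b64 = base64.b64encode(file_bytes).decode("ascii")
    return {
        "model": "gpt-4o",  # placeholder
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:{mime_type};base64,{b64}"},
                    },
                ],
            }
        ],
    }

payload = build_multimodal_payload(b"\x89PNG...", "image/png", "Describe this image.")
print(json.dumps(payload)[:80])
```

The same payload could then be POSTed to any endpoint that speaks the OpenAI chat-completions protocol.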

Thanks in advance for your thoughts and solutions.

Hey there,

as you already found, the current GenAI extension does not support multimodal inputs; it is text only.

With regard to documents, the use case of Retrieval-Augmented Generation (RAG) is covered, provided you are using an embedding model that is compatible with the OpenAI API (which currently rules out local models hosted e.g. via Ollama).
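For clarity on what "compatible with the OpenAI API" means here: the embedding model must be reachable via a `/v1/embeddings` endpoint that accepts the OpenAI request shape. A sketch of such a request using only the standard library (the base URL, key, and model name are placeholders for whichever OpenAI-compatible server you use):

```python
import json
import urllib.request

def embed_request(base_url: str, api_key: str, texts: list) -> urllib.request.Request:
    """Build a request against an OpenAI-compatible /v1/embeddings endpoint.

    base_url, api_key, and the model name are placeholders; swap in
    whatever your OpenAI-compatible server expects.
    """
    body = json.dumps({"model": "text-embedding-3-small", "input": texts}).encode()
    return urllib.request.Request(
        f"{base_url}/v1/embeddings",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = embed_request("https://api.openai.com", "sk-...", ["hello world"])
print(req.full_url)  # https://api.openai.com/v1/embeddings
# urllib.request.urlopen(req) would return JSON with a "data" list of embeddings.
```

A server that does not expose this endpoint shape cannot be used for the RAG nodes, which is exactly why many locally hosted setups are ruled out.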

I am not sure what the content of your binary file is or how you'd merge it. That said, @roberto_cadili has some example workflows (which use Python scripts, so they may not be an option for you) for interacting with embeddings and vision models:

I myself have developed an experimental extension that includes a node for prompting vision models. You can find the extension, with links to example workflows, below:


Thanks, that’s exactly what I’m looking for. I just have to get the Ollama API working :joy:
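On getting the Ollama API working with a vision model: Ollama's `/api/generate` endpoint takes raw base64 strings (no `data:` URI prefix) in an `images` array. A minimal sketch of the request body, assuming a vision-capable model such as `llava` is pulled locally:

```python
import base64

def ollama_vision_payload(image_bytes: bytes, prompt: str, model: str = "llava") -> dict:
    """Body for Ollama's /api/generate endpoint with a vision model.

    Ollama expects plain base64 strings in the "images" array; the model
    name "llava" is just an example of a vision-capable model.
    """
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # return one JSON object instead of a stream
    }

payload = ollama_vision_payload(b"fake-image-bytes", "What is in this picture?")
print(sorted(payload))  # ['images', 'model', 'prompt', 'stream']
# POST this as JSON to http://localhost:11434/api/generate (default Ollama port).
```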

