Feature request: Gen AI Extension - Support Structured Output & Vision Models

Hey there,

I’ve been a big fan of the Gen AI Extension and have been exploring different use cases - both privately and publicly via Medium / YouTube.

It is crazy how quickly things evolve in the Gen AI space. There are two topics the Gen AI Extension does not support right now that I think could make a significant difference and enable A LOT of new use cases:

  1. Structured Outputs (OpenAI) / JSON Mode / Definition of response formats
  2. Prompting multimodal models (Vision for sure, maybe audio)

Structured outputs enable data extraction from unstructured sources and workflow routing; combined with other extensions, they could probably push automation in KNIME to a new level.

Vision models have come a very long way, and I can see use cases ranging from image classification and text recognition to - probably not too far in the future - reliable interpretation of charts / graphs…
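To make the vision point concrete, here is a minimal sketch of what such a prompt looks like against the OpenAI chat completions API (model name and image URL are placeholders, not anything the extension currently exposes):

```python
# Sketch: a multimodal user message mixing text and an image URL.
# The actual network call is shown as a comment; only the payload is built here.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Classify what this image shows."},
            {"type": "image_url", "image_url": {"url": "https://example.com/photo.png"}},
        ],
    }
]

# With the OpenAI Python SDK, this would be sent as:
# client.chat.completions.create(model="gpt-4o-mini", messages=messages)
```

Supporting this in the extension would presumably just mean letting an image column feed into the message content alongside the text prompt.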

So long story short: Please add this :slight_smile:

Hey @MartinDDDD

thanks for your input. I agree that these are very important features - both are on our list. Just as a side note: you can already activate JSON Mode using the system prompt.
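For anyone reading along, here is a sketch of what "JSON Mode via the system prompt" amounts to in plain API terms - the instruction alone pushes the model to emit parseable JSON (prompt wording, model name, and the sample reply are illustrative, not the extension's actual settings):

```python
import json

# Sketch: force JSON output purely through the system prompt,
# without using the response_format parameter at all.
system_prompt = (
    "You are a data extraction assistant. "
    "Respond ONLY with a valid JSON object with keys 'sentiment' and 'score'. "
    "Do not output any text outside the JSON object."
)

# With the OpenAI Python SDK this would be sent as:
# client.chat.completions.create(
#     model="gpt-4o-mini",
#     messages=[{"role": "system", "content": system_prompt},
#               {"role": "user", "content": "Great product, fast shipping!"}],
# )

# A typical reply can then be parsed directly (sample reply for illustration):
reply = '{"sentiment": "positive", "score": 0.94}'
parsed = json.loads(reply)
```

The catch, as discussed below, is that this relies on the model following instructions rather than the API guaranteeing valid JSON.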

All the best
Linus


Thanks for your comments :-).

Models have definitely gotten better at adhering to requests in the system / user message to respond in JSON - I found, though, that for the OpenAI models it really helps if you can pass in a JSON Schema via response_format.
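For reference, this is roughly what passing a schema via response_format looks like with OpenAI's Structured Outputs (the schema name and fields here are just an example):

```python
# Sketch: an illustrative JSON Schema for invoice extraction.
invoice_schema = {
    "type": "object",
    "properties": {
        "vendor": {"type": "string"},
        "total": {"type": "number"},
    },
    "required": ["vendor", "total"],
    "additionalProperties": False,
}

# The schema is wrapped in the response_format parameter;
# strict=True makes the API enforce the schema on the output.
response_format = {
    "type": "json_schema",
    "json_schema": {"name": "invoice", "strict": True, "schema": invoice_schema},
}

# With the OpenAI Python SDK this would be sent as:
# client.chat.completions.create(
#     model="gpt-4o",
#     response_format=response_format,
#     messages=[...],
# )
```

Unlike the system-prompt approach, the API then guarantees the reply conforms to the schema, which is what makes it so useful for downstream automation.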

I can see how it’s a bit difficult to implement right now, since other providers like Groq are compatible with the OpenAI API but treat response_format differently (i.e. they don’t expect a full-blown schema) - still, it would be great for this to work.
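To illustrate the difference I mean (to the best of my knowledge at the time of writing - provider behavior changes quickly):

```python
# OpenAI Structured Outputs: response_format carries a full JSON Schema.
openai_format = {
    "type": "json_schema",
    "json_schema": {"name": "extract", "schema": {"type": "object"}},
}

# OpenAI-compatible providers such as Groq: only JSON mode,
# i.e. a bare type flag with no schema attached.
compatible_format = {"type": "json_object"}
```

So a generic implementation in the extension would probably need a per-provider switch between the two shapes.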

Looking forward to seeing the Gen AI Extension evolve :slight_smile: