I was wondering about the OpenAI capabilities of KNIME.
I found some basic workflows that use the OpenAI API to submit my own table data, but I was wondering if there are example workflows for submitting my own PDF/Word data to OpenAI and getting chatbot results from there :)?
Hi,
Have you seen this workflow? It uses a PDF as a knowledge base: first it creates OpenAI embeddings and a vector store for finding context for user queries, then it formulates an answer using an OpenAI chat model.
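Roughly, the same idea in plain Python would look something like the sketch below (a minimal illustration outside KNIME; the chunking, the model names, and the FAISS index are my assumptions here, the workflow's actual nodes may differ):

```python
# Minimal sketch: embed PDF chunks, store them in a FAISS index, retrieve
# context for a query, and let a chat model formulate the answer.
# Chunk size, model names and FAISS are assumptions, not the workflow's nodes.
import numpy as np
import faiss
from pypdf import PdfReader
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

# 1. Extract the PDF text and split it into rough fixed-size chunks
reader = PdfReader("knowledge_base.pdf")
text = "\n".join(page.extract_text() or "" for page in reader.pages)
chunks = [text[i:i + 1000] for i in range(0, len(text), 1000)]

# 2. Create OpenAI embeddings and build the vector store
emb = client.embeddings.create(model="text-embedding-3-small", input=chunks)
vectors = np.array([e.embedding for e in emb.data], dtype="float32")
index = faiss.IndexFlatL2(vectors.shape[1])
index.add(vectors)

# 3. Embed the user query and retrieve the closest chunks as context
query = "What does the document say about warranty terms?"
q = client.embeddings.create(model="text-embedding-3-small", input=[query])
q_vec = np.array([q.data[0].embedding], dtype="float32")
_, ids = index.search(q_vec, 3)
context = "\n---\n".join(chunks[i] for i in ids[0])

# 4. Answer the question using only the retrieved context
answer = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "Answer only from the provided context."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"},
    ],
)
print(answer.choices[0].message.content)
```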
Kind regards
Alexander
@AlexanderFillbrunn
Great example
How would you handle updating the vector store? E.g. the data has been updated, but the store is way too big to create from scratch again, and we only want to update the relevant vectors with the new info.
Also, I assume the parser only extracts the text, but not the content of images and tables inside PDF files?
(I know these are advanced questions with probably no short, simple answer, but I'm curious to hear your thoughts, if any.)
br
Hi,
Yes, the parser only extracts the actual text, not text in images etc. For that you'd use something like Azure AI Document Intelligence, which does OCR.
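As a rough sketch of what that could look like in Python via the azure-ai-formrecognizer package (endpoint, key, file name and the "prebuilt-layout" model are placeholders, and this runs outside the workflow above):

```python
# Hedged sketch: OCR a PDF with Azure's document analysis client so that text
# in scanned pages/images and table cells becomes available as plain data.
# Endpoint, key and file name are placeholders.
from azure.core.credentials import AzureKeyCredential
from azure.ai.formrecognizer import DocumentAnalysisClient

client = DocumentAnalysisClient(
    endpoint="https://<your-resource>.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("<your-key>"),
)

with open("scanned_report.pdf", "rb") as f:
    poller = client.begin_analyze_document("prebuilt-layout", document=f)
result = poller.result()

# Full recognized text (OCR is applied where needed)
print(result.content[:500])

# Tables come back as cells with row/column indices instead of flat text
for table in result.tables:
    for cell in table.cells:
        print(cell.row_index, cell.column_index, cell.content)
```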
For updating vector stores: these nodes are still pretty fresh and also pretty basic, so we do not have a “Vector Store Updater” yet. But I agree that this could be very useful. You could maybe use a Postgres DB with this plugin. But I have not seen it used in KNIME before.
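If that plugin is something pgvector-like, an incremental update could in principle be done from a Python Script node along these lines (table and column names are made up, and I have not tried this in KNIME):

```python
# Hypothetical sketch of an incremental update: re-embed only the chunks whose
# source documents changed and upsert them into a Postgres table with a vector
# column (pgvector-style). Table and column names are made up.
import numpy as np
import psycopg2
from pgvector.psycopg2 import register_vector
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def upsert_changed_chunks(conn, changed_chunks):
    """changed_chunks: list of (chunk_id, text) pairs from updated documents."""
    register_vector(conn)  # let psycopg2 send numpy arrays as vector values
    texts = [text for _, text in changed_chunks]
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    with conn.cursor() as cur:
        for (chunk_id, text), item in zip(changed_chunks, resp.data):
            cur.execute(
                """
                INSERT INTO doc_chunks (chunk_id, content, embedding)
                VALUES (%s, %s, %s)
                ON CONFLICT (chunk_id)
                DO UPDATE SET content = EXCLUDED.content,
                              embedding = EXCLUDED.embedding
                """,
                (chunk_id, text, np.array(item.embedding)),
            )
    conn.commit()
```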
Kind regards,
Alexander
@Daniel_Weikert I have discussed with some colleagues the use of extra packages to untangle images and tables from PDFs so they can be used in vector stores, but this is a complicated business and PDF can be quite a complex format. And then the challenge is to interpret the content in the right place. See the sketch below for the general idea.
One option could be to employ ChatGPT for this, though I am not sure if the KNIME ports can handle the data formats.
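For reference, a hedged sketch of what such extra packages could do, e.g. pdfplumber for tables and PyMuPDF for embedded images (the file name is a placeholder, and this only gets the raw pieces, not their interpretation in context):

```python
# Hedged sketch of the "extra packages" idea: pull tables and embedded images
# out of a PDF separately, so they can later be OCR'd, captioned or embedded
# alongside the plain text. File name is a placeholder.
import pdfplumber
import fitz  # PyMuPDF

pdf_path = "document.pdf"

# Tables: pdfplumber returns each table as a list of rows (lists of cell strings)
with pdfplumber.open(pdf_path) as pdf:
    for page_no, page in enumerate(pdf.pages, start=1):
        for table in page.extract_tables():
            print(f"Table on page {page_no}: {len(table)} rows")

# Images: extract the raw embedded images for separate processing
doc = fitz.open(pdf_path)
for page_no, page in enumerate(doc, start=1):
    for img in page.get_images(full=True):
        xref = img[0]
        info = doc.extract_image(xref)
        with open(f"page{page_no}_img{xref}.{info['ext']}", "wb") as out:
            out.write(info["image"])
```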