I was wondering about the OpenAI capabilities of KNIME.
I found some basic workflows that use the OpenAI API to submit my own table data, but I was wondering if there are example workflows for submitting my own PDF/Word data to OpenAI and getting chatbot results from there :)?
Hi,
Have you seen this workflow? It uses a PDF as a knowledge base: first it creates OpenAI embeddings and a vector store for finding context for user queries, then it formulates an answer using an OpenAI chat model.
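Roughly, the same idea in plain Python would look something like the sketch below (a minimal illustration outside KNIME; the chunking, the model names, and the FAISS index are my assumptions here, the workflow's actual nodes may differ):

```python
# Minimal sketch: embed PDF chunks, store them in a FAISS index, retrieve
# context for a query, and let a chat model formulate the answer.
# Chunk size, model names and FAISS are assumptions, not the workflow's nodes.
import numpy as np
import faiss
from pypdf import PdfReader
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

# 1. Extract the PDF text and split it into rough fixed-size chunks
reader = PdfReader("knowledge_base.pdf")
text = "\n".join(page.extract_text() or "" for page in reader.pages)
chunks = [text[i:i + 1000] for i in range(0, len(text), 1000)]

# 2. Create OpenAI embeddings and build the vector store
emb = client.embeddings.create(model="text-embedding-3-small", input=chunks)
vectors = np.array([e.embedding for e in emb.data], dtype="float32")
index = faiss.IndexFlatL2(vectors.shape[1])
index.add(vectors)

# 3. Embed the user query and retrieve the closest chunks as context
query = "What does the document say about warranty terms?"
q = client.embeddings.create(model="text-embedding-3-small", input=[query])
q_vec = np.array([q.data[0].embedding], dtype="float32")
_, ids = index.search(q_vec, 3)
context = "\n---\n".join(chunks[i] for i in ids[0])

# 4. Answer the question using only the retrieved context
answer = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "Answer only from the provided context."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"},
    ],
)
print(answer.choices[0].message.content)
```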
Kind regards
Alexander
@AlexanderFillbrunn
Great example
How would you handle updating the vector store? E.g. the data has been updated, but the store is way too big to create from scratch again, and we only want to update the relevant vectors with the new info.
Also, I assume the parser only extracts the text, but not the content of images and tables inside PDF files?
(I know these are advanced questions with probably no short, simple answer, but I'm curious to hear your thoughts, if any.)
br
Hi,
Yes, the parser only extracts the actual text, not text in images etc. For that you'd use something like Azure AI Document Intelligence, which does OCR.
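As a rough sketch of what that could look like in Python via the azure-ai-formrecognizer package (endpoint, key, file name and the "prebuilt-layout" model are placeholders, and this runs outside the workflow above):

```python
# Hedged sketch: OCR a PDF with Azure's document analysis client so that text
# in scanned pages/images and table cells becomes available as plain data.
# Endpoint, key and file name are placeholders.
from azure.core.credentials import AzureKeyCredential
from azure.ai.formrecognizer import DocumentAnalysisClient

client = DocumentAnalysisClient(
    endpoint="https://<your-resource>.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("<your-key>"),
)

with open("scanned_report.pdf", "rb") as f:
    poller = client.begin_analyze_document("prebuilt-layout", document=f)
result = poller.result()

# Full recognized text (OCR is applied where needed)
print(result.content[:500])

# Tables come back as cells with row/column indices instead of flat text
for table in result.tables:
    for cell in table.cells:
        print(cell.row_index, cell.column_index, cell.content)
```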
For updating vector stores: these nodes are still pretty fresh and also pretty basic, so we do not have a “Vector Store Updater” yet. But I agree that this could be very useful. You could maybe use a Postgres DB with this plugin. But I have not seen it used in KNIME before.
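If that plugin is something pgvector-like, an incremental update could in principle be done from a Python Script node along these lines (table and column names are made up, and I have not tried this in KNIME):

```python
# Hypothetical sketch of an incremental update: re-embed only the chunks whose
# source documents changed and upsert them into a Postgres table with a vector
# column (pgvector-style). Table and column names are made up.
import numpy as np
import psycopg2
from pgvector.psycopg2 import register_vector
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def upsert_changed_chunks(conn, changed_chunks):
    """changed_chunks: list of (chunk_id, text) pairs from updated documents."""
    register_vector(conn)  # let psycopg2 send numpy arrays as vector values
    texts = [text for _, text in changed_chunks]
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    with conn.cursor() as cur:
        for (chunk_id, text), item in zip(changed_chunks, resp.data):
            cur.execute(
                """
                INSERT INTO doc_chunks (chunk_id, content, embedding)
                VALUES (%s, %s, %s)
                ON CONFLICT (chunk_id)
                DO UPDATE SET content = EXCLUDED.content,
                              embedding = EXCLUDED.embedding
                """,
                (chunk_id, text, np.array(item.embedding)),
            )
    conn.commit()
```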
Kind regards,
Alexander
@Daniel_Weikert I have discussed with some colleagues the use of extra packages to untangle images and tables from PDFs so they can be used in vector stores, but this is a complicated business and PDF can be quite a complex format. And then the challenge is to interpret the content in the right place. See the sketch below for the general idea.
One option could be to employ ChatGPT for this, though I am not sure if the KNIME ports can handle the data formats.
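For reference, a hedged sketch of what such extra packages could do, e.g. pdfplumber for tables and PyMuPDF for embedded images (the file name is a placeholder, and this only gets the raw pieces, not their interpretation in context):

```python
# Hedged sketch of the "extra packages" idea: pull tables and embedded images
# out of a PDF separately, so they can later be OCR'd, captioned or embedded
# alongside the plain text. File name is a placeholder.
import pdfplumber
import fitz  # PyMuPDF

pdf_path = "document.pdf"

# Tables: pdfplumber returns each table as a list of rows (lists of cell strings)
with pdfplumber.open(pdf_path) as pdf:
    for page_no, page in enumerate(pdf.pages, start=1):
        for table in page.extract_tables():
            print(f"Table on page {page_no}: {len(table)} rows")

# Images: extract the raw embedded images for separate processing
doc = fitz.open(pdf_path)
for page_no, page in enumerate(doc, start=1):
    for img in page.get_images(full=True):
        xref = img[0]
        info = doc.extract_image(xref)
        with open(f"page{page_no}_img{xref}.{info['ext']}", "wb") as out:
            out.write(info["image"])
```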