Strategy for using the Vectorstore to Tool and Tool Concatenator nodes in more complex RAGs

I’m running into some snags in my efforts to create and combine vectorstores from different data sources. (I found some interesting insight for part of my problem from Roberto Cadili’s answer here on using multiple vectorstore to tool nodes to leverage different metadata: Seeking Guidance on Expanding KNIME AI Chatbot Demo to Include Multiple Data Columns in FAISS Vector Store - #3 by roberto_cadili)

However, my question has more to do with the rules in KNIME for combining diverse vectorstores, and whether that is even possible. I could be misunderstanding best practice in using these nodes for this situation.

A hypothetical workflow:

I want to create a RAG that includes two data sources: PubMed and Orphanet.

I embed PubMed abstract text using the FAISS Vector Store Creator node, with PMID as the metadata. I save this as a pubmed.model file.

I also embed disease description text from Orphanet using the FAISS Vector Store Creator node, with Orphacode as the metadata. I save this as an orphanet.model file.

Each model is read and fed into its own Vectorstore to Tool node, each with a unique description of the tool and its purpose.

Question: Can these be concatenated into a single tool, if they possess different metadata?

I suspect not, based on my experience.

Is the correct solution to 1) combine the two data sources into a single table, with the IDs harmonized into one column, a second column SOURCE holding metadata on where each record came from (levels = pubmed, orphanet), and a third column TEXT containing the text from each source, and then 2) embed that table?

Or is the solution to query the distinct vectorstores sequentially, then bring the two separate responses back together in a subsequent context and have the LLM work out how best to integrate the two?
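
To make the first option concrete, here is a minimal Python sketch of the harmonization step (for example, something one might do before the embedding node). The input field names (`pmid`, `abstract`, `orphacode`, `description`) and the sample rows are hypothetical placeholders, not real records; only the ID / SOURCE / TEXT layout comes from the question above.

```python
# Hypothetical placeholder rows; field names and values are assumptions for illustration.
pubmed_rows = [
    {"pmid": "pmid-001", "abstract": "Placeholder PubMed abstract text."},
]
orphanet_rows = [
    {"orphacode": "orpha-001", "description": "Placeholder Orphanet disease description."},
]


def harmonize(rows, id_key, text_key, source):
    """Map source-specific rows onto the shared ID / SOURCE / TEXT columns."""
    return [
        {"ID": str(row[id_key]), "SOURCE": source, "TEXT": row[text_key]}
        for row in rows
    ]


# One combined table: every record keeps its original ID plus a SOURCE label.
combined = (
    harmonize(pubmed_rows, "pmid", "abstract", "pubmed")
    + harmonize(orphanet_rows, "orphacode", "description", "orphanet")
)

for record in combined:
    print(record)
```

The TEXT column would then be the one that gets embedded, with ID and SOURCE kept as metadata.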

Hi @longoka,

Sorry for the long wait but I’m happy that you found previous threads useful.

In general, it should not be a problem to concatenate vector stores with different metadata using the Tool Concatenator node (Tool Concatenator – KNIME Community Hub). I did a quick check using this example workflow (OpenAI Agent Multiple Vectorstores – KNIME Community Hub), where in one vector store I included the “source” column, and in the second vector store I removed it. The concatenate operation worked as expected.
Question: are you using the same embedding model for both vector stores? If not, that may lead to issues.
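
As a rough illustration of why mixed embedding models can be a problem: similarity search compares a query vector with stored vectors, which is only meaningful when both come from the same model. The sketch below is a toy example in plain Python with made-up vectors; it does not show how KNIME or FAISS handle this internally, and it only covers the most visible failure (different vector dimensions).

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity; refuses vectors of different dimension."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


# Made-up vectors: "model A" produces 3 dimensions, "model B" produces 4.
query_vec_model_a = [0.1, 0.3, 0.5]
doc_vec_model_a = [0.1, 0.3, 0.5]
doc_vec_model_b = [0.2, 0.4, 0.1, 0.7]

# Same model on both sides: the comparison works.
print(cosine_similarity(query_vec_model_a, doc_vec_model_a))

# Different models: the vectors cannot even be compared.
try:
    cosine_similarity(query_vec_model_a, doc_vec_model_b)
except ValueError as err:
    print("mismatch:", err)
```

Even when two models happen to share a dimension, their vectors live in different spaces, so the scores would not be comparable.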

Concerning your second question, I believe it’s a matter of use case/best practice, as neither approach seems wrong to me. When the use case allows, my suggestion is to keep vector stores separate. This makes future maintenance, updates and enrichment easier, since you only need to touch the vector store that requires it.
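
For the separate-stores approach, the general idea of querying each store on its own and handing the LLM a source-labelled context can be sketched in plain Python. This is only a conceptual sketch: the word-overlap `score` function is a stand-in for real vector similarity, and the store contents, IDs and function names are hypothetical.

```python
def score(query, text):
    """Toy relevance score: shared lowercase words (stand-in for vector similarity)."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def retrieve(store, query, top_k=1):
    """Return the top_k documents of one store, ranked by the toy score."""
    ranked = sorted(store, key=lambda doc: score(query, doc["text"]), reverse=True)
    return ranked[:top_k]


def build_context(query, stores, top_k=1):
    """Query each store separately and label every snippet with its source."""
    lines = []
    for source_name, store in stores.items():
        for doc in retrieve(store, query, top_k):
            lines.append(f"[{source_name} {doc['id']}] {doc['text']}")
    return "\n".join(lines)


# Hypothetical placeholder stores, one per data source.
stores = {
    "pubmed": [
        {"id": "pmid-001", "text": "placeholder abstract about disease alpha treatment"},
        {"id": "pmid-002", "text": "placeholder abstract about an unrelated topic"},
    ],
    "orphanet": [
        {"id": "orpha-001", "text": "placeholder description of disease alpha"},
    ],
}

context = build_context("disease alpha", stores)
print(context)
```

Because each store is queried independently, one of them can be rebuilt or extended without touching the other, which is the maintenance benefit mentioned above.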

Hope it helps,
Roberto
