I’m running into some snags while trying to create and combine vectorstores from different data sources. (I found some useful insight on part of my problem in Roberto Cadili’s answer on using multiple Vectorstore to Tool nodes to leverage different metadata: Seeking Guidance on Expanding KNIME AI Chatbot Demo to Include Multiple Data Columns in FAISS Vector Store - #3 by roberto_cadili)
However, my question has more to do with the rules in KNIME for combining diverse vectorstores, and whether that is even possible. I may be misunderstanding the best practice for using these nodes in this situation.
A hypothetical workflow:
I want to create a RAG workflow that draws on two data sources: PubMed and Orphanet.
I embed PubMed abstract text using the FAISS Vector Store Creator node, with PMID as the metadata. I save this as a pubmed.model file.
I also embed disease description text from Orphanet using the FAISS Vector Store Creator node, with Orphacode as the metadata. I save this as an orphanet.model file.
Each model is then read and fed into its own Vectorstore to Tool node, each with a unique description of the tool and its purpose.
Question: Can these be concatenated into a single tool, if they possess different metadata?
I suspect not, based on my experience.
Is the correct solution to 1) combine the two data sources into a single table that harmonizes the IDs in one column, adds a second column SOURCE as metadata specifying where each record came from (levels = pubmed, orphanet), and holds the text from each source in a third column TEXT, and then 2) embed that combined table?
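To make that first option concrete, here is a rough sketch of the table shape I have in mind. It is written in plain Python rather than as KNIME nodes, and the input field names (abstract, description) and the sample values are placeholders I made up for illustration, not real PubMed or Orphanet records.

```python
# Placeholder records; the IDs and texts below are invented for illustration.
pubmed_rows = [
    {"PMID": "PMID_A", "abstract": "Abstract text A."},
    {"PMID": "PMID_B", "abstract": "Abstract text B."},
]
orphanet_rows = [
    {"Orphacode": "ORPHA_X", "description": "Disease description X."},
]


def harmonize(rows, id_key, text_key, source):
    """Map source-specific fields onto a shared ID / SOURCE / TEXT layout."""
    return [
        {"ID": str(row[id_key]), "SOURCE": source, "TEXT": row[text_key]}
        for row in rows
    ]


# One table: ID and SOURCE would become metadata, TEXT is what gets embedded.
combined = (
    harmonize(pubmed_rows, "PMID", "abstract", "pubmed")
    + harmonize(orphanet_rows, "Orphacode", "description", "orphanet")
)

for row in combined:
    print(row)
```

The idea would be that a single vectorstore is then built from TEXT, with ID and SOURCE carried along as metadata so a retrieved chunk can still be traced back to its origin.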
Or is the solution to query the distinct vectorstores sequentially, bring the two separate responses back together in a subsequent context, and have the LLM work out how best to integrate the two?
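For this second option, a minimal sketch of the flow I am picturing is below. The retrievers are toy keyword-overlap stand-ins that I wrote for illustration; they are not the FAISS stores or any KNIME API, and the documents are invented placeholders. The point is only the shape: query each store on its own, label each hit with its source and ID, and hand one merged context to the LLM.

```python
def make_toy_retriever(docs):
    """Return a retriever that ranks docs by word overlap with the query.

    This is a hypothetical stand-in for a real vectorstore similarity search.
    """
    def retrieve(query, k):
        query_words = set(query.lower().split())
        ranked = sorted(
            docs,
            key=lambda d: len(query_words & set(d["text"].lower().split())),
            reverse=True,
        )
        return ranked[:k]
    return retrieve


def build_context(query, retrievers, k=1):
    """Query each store in turn and merge the hits into one labeled context."""
    sections = []
    for source, retrieve in retrievers.items():
        hits = retrieve(query, k)
        lines = [f"[{source}:{hit['id']}] {hit['text']}" for hit in hits]
        sections.append(f"Source: {source}\n" + "\n".join(lines))
    return "\n\n".join(sections)


# Invented placeholder documents, one toy store per data source.
retrievers = {
    "pubmed": make_toy_retriever([
        {"id": "PMID_A", "text": "Study of muscle weakness in adults"},
        {"id": "PMID_B", "text": "Unrelated cardiology abstract"},
    ]),
    "orphanet": make_toy_retriever([
        {"id": "ORPHA_X", "text": "Rare disease with progressive muscle weakness"},
    ]),
}

context = build_context("muscle weakness", retrievers, k=1)
print(context)
```

The merged context would then go into a single prompt, so the LLM sees which passages came from which source and can weigh them accordingly.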