Information used for embeddings using the OpenAI nodes

Hi Everyone,

I’ve been watching @tosinlitics’ amazing videos and experimenting with the Agent Prompter. My process involved creating embeddings with the Vector Store tool and incorporating them into my Agent Prompter workflow. But I’ve hit a snag: when I checked my OpenAI account, I couldn’t find any of the vectors I thought I had stored. To add to the confusion, there are no activity logs or billing details related to these actions.

This raises a couple of questions for me: Firstly, where exactly is my data being stored when I use the Vector Store tool, considering it requires my OpenAI API key? And secondly, why isn’t my account reflecting any charges for the use of the GPT models?

Appreciate any help you can provide,

AG.

Hi @VAGR_ISK!

Glad you are enjoying my content!

Regarding where the vector store is stored, my best guess is that it is saved somewhere on your local machine, possibly in a cache. I have contacted KNIME about allowing users to specify where a vector store is stored, to facilitate easy retrieval in the future; the current set-up is really not workable in a production context or for a serious business use case.

Regarding charges, you only get charged for creating embeddings, not for creating a vector store, so the charges you see on OpenAI would be for using the Ada embedding model. Once you start using an LLM or chat model, you will be charged for those as well. The cost of embeddings is extremely low, so if your document is small, you may not yet have accrued any detectable costs. Also, I believe there is a lag of about a day between when you use an OpenAI model and when the costs appear on your billing account.
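
To make the billable step concrete, here is a minimal sketch (my own illustration, not what the KNIME node does internally) using the openai Python package. Only the embeddings.create call actually hits OpenAI’s API and meters tokens; writing the returned vectors into a local vector store costs nothing:

```python
# Minimal sketch: the embedding request is the only step in this flow that
# OpenAI bills for. Assumes the openai package (v1+) and OPENAI_API_KEY
# set in the environment.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.embeddings.create(
    model="text-embedding-ada-002",  # the Ada model the charges show under
    input="A short test document.",
)

vector = response.data[0].embedding
print(len(vector), response.usage.total_tokens)  # 1536 dims; tokens billed
```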


Hi,

Thank you for your response. Yes, I am somewhat uneasy about sending company information directly, especially given the limited details on where and how it is transmitted and stored. The recent scandal involving OpenAI, in which The Times Journal alleged unauthorized access to private data, makes it difficult to trust anyone, regardless of a company’s reputation or prestige.

For scalability, as you mentioned, it’s essential to have complete transparency about where and how the information is processed and stored. This level of transparency is crucial to adapt company policies for the adoption of such tools in business cases.

I am impressed with the OpenAI nodes and am tempted to use them. While the KNIME nodes are being updated, though, I will rely on the Python Script node-based components I developed to connect to OpenAI and Anthropic.
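
In case it is useful to anyone, the core of those components is quite small. Here is a hypothetical sketch of the idea (the openai and anthropic packages are the official SDKs; the function names, prompts, and model IDs are illustrative, not my exact components):

```python
# Hypothetical sketch of Python-Script-node-style helpers that call OpenAI
# and Anthropic directly. Assumes OPENAI_API_KEY and ANTHROPIC_API_KEY are
# set in the environment.
from openai import OpenAI
from anthropic import Anthropic

openai_client = OpenAI()
anthropic_client = Anthropic()

def ask_openai(prompt: str) -> str:
    """Send a single-turn prompt to an OpenAI chat model."""
    resp = openai_client.chat.completions.create(
        model="gpt-3.5-turbo",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def ask_anthropic(prompt: str) -> str:
    """Send a single-turn prompt to an Anthropic model."""
    resp = anthropic_client.messages.create(
        model="claude-3-haiku-20240307",  # illustrative model choice
        max_tokens=512,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.content[0].text
```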

Btw, OpenAI’s new embedding models are fantastic, absolutely amazing.

Looking forward to more of your videos…!

Cheers,

AG.

Hi @VAGR_ISK

I double-checked and it does appear the vector store is indeed created on the local machine. If you open the folder for the associated node, you’ll find it there.
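
Assuming the node writes a standard FAISS index file, something along these lines should read it back outside KNIME (the path is a placeholder for wherever the file sits inside that node folder):

```python
# Hedged sketch: if the node stores a standard FAISS index, the faiss
# package can read it back directly. The path below is a placeholder.
import faiss

index = faiss.read_index("path/to/node/folder/index.faiss")  # placeholder
print(index.ntotal, index.d)  # stored vector count and dimensionality
```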

On the topic of data security and privacy, I do not believe the KNIME node poses an additional security risk. Rather, I think the risk comes from OpenAI itself. With OpenAI, users do not have control over where the data is processed or stored, and OpenAI does use contractors to handle processing. While OpenAI has several data privacy certifications, I don’t know how they ensure the same level of compliance with their contractors.

The reasons outlined above are why many companies that are concerned about data security and privacy use the OpenAI models via Azure, since Azure provides full data control and top-notch data security. For work projects, I only use the Azure OpenAI APIs, and there are KNIME nodes for these as well.
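
For completeness, here is a sketch of what that route looks like from Python with the openai package’s dedicated Azure client; the endpoint, key, API version, and deployment name are placeholders for your own Azure resource:

```python
# Sketch of calling OpenAI models hosted in your own Azure resource.
# Endpoint, key, API version, and deployment name are placeholders.
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://my-resource.openai.azure.com",  # placeholder
    api_key="...",                                          # your Azure key
    api_version="2024-02-01",
)

resp = client.embeddings.create(
    model="my-ada-deployment",  # Azure uses your deployment name here
    input="Company-internal text stays within your Azure tenancy.",
)
```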

Thanks for the note about the new embedding models! I have a robust work-related use case, also built in Python, where I am currently using Ada for the embeddings. I see that the text-embedding-3 models outperform Ada, albeit marginally, and are cheaper.
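
Switching should be essentially a one-line change. As a sketch, the text-embedding-3 models also accept an optional dimensions argument for shortening the returned vectors, which Ada does not:

```python
# Sketch: swapping Ada for a text-embedding-3 model is a model-name change;
# the optional dimensions argument truncates the vectors to save storage.
from openai import OpenAI

client = OpenAI()
resp = client.embeddings.create(
    model="text-embedding-3-small",
    input="Same text as before.",
    dimensions=256,  # optional; omit to get the model's full dimensionality
)
print(len(resp.data[0].embedding))  # 256
```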


Thanks for the screenshot. It is reassuring to hear your opinion about KNIME’s security. Indeed, they are nice people, especially those I have met here in Germany.

Regarding OpenAI’s subcontractors, my understanding is that they are required to conduct supplier/subcontractor qualification to comply with regulatory standards. This implies that the security level for data management must be at least as stringent as OpenAI’s, and they must perform audits to verify the subcontractor’s claims (Link). This is particularly critical given the close scrutiny from regulatory bodies. However, in practice, you know how it goes in Silicon Valley—with all those wealthy tech oligarchs around, bending the rules can just be another day at the office.

I did attempt to gain access to the Azure OpenAI Service, but access is granted only after submitting a form. We completed this process but have yet to receive a response or approval of our application. Perhaps you know of an alternative method to expedite access.

Cheers,

AG


BTW, I created a workflow to extract the FAISS embeddings and pass them to KNIME tables, just in case you get someone asking for it in the future.
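
In case a text version is handy alongside the workflow, the core of the extraction looks roughly like this. It assumes a flat FAISS index (reconstruct_n needs an index type that supports reconstruction) and the KNIME Python Script API; the path is a placeholder:

```python
# Rough sketch: reconstruct all vectors from a FAISS index and hand them
# to KNIME as a table. Assumes a flat index; the path is a placeholder.
import faiss
import pandas as pd
import knime.scripting.io as knio

index = faiss.read_index("path/to/index.faiss")   # placeholder path
vectors = index.reconstruct_n(0, index.ntotal)    # (ntotal, d) numpy array

df = pd.DataFrame(vectors, columns=[f"dim_{i}" for i in range(index.d)])
knio.output_tables[0] = knio.Table.from_pandas(df)
```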

Link to my workflow

Cheers.
