How can I determine the number of tokens (e.g. for chatgpt) of a text?

Hello all,
how can I determine the number of tokens in a text, i.e. the count that e.g. ChatGPT uses for its cost calculation?

Is there a solution for this?

Many greetings

You could tokenize the text and then calculate the length

Hmm, not so easy. I have built a simple tokenizer here.

tokenizer_simpel.knwf (13.6 KB)

But the result unfortunately differs from the calculation of e.g. OpenAI's tokenizer (OpenAI Platform): 138 vs. 163.

Have I chosen the right approach? What do you mean by "calculate the length"?


ChatGPT pricing for their API is based on token count, so you could count the tokens (which is what I meant by length) for the input and also for the response to calculate the cost.
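To illustrate that idea without replicating OpenAI's actual tokenizer: OpenAI's documentation suggests that roughly 4 characters correspond to one token for typical English text, so a crude stdlib-only estimate could look like the sketch below. The heuristic and the `price_per_1k` default are assumptions for illustration, not OpenAI's real tokenization or current pricing.

```python
def estimate_tokens(text: str) -> int:
    # Rough rule of thumb: ~4 characters per token for English text.
    # This is only an approximation, not OpenAI's real tokenizer.
    return max(1, len(text) // 4)

def estimate_cost(prompt: str, response: str, price_per_1k: float = 0.002) -> float:
    # price_per_1k is a placeholder value (USD per 1,000 tokens);
    # check OpenAI's pricing page for the model you actually use.
    total = estimate_tokens(prompt) + estimate_tokens(response)
    return total / 1000 * price_per_1k
```

For an exact count you still need the real tokenizer, since the characters-per-token ratio varies with language and content.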

OpenAI uses its own tokenizer (a byte-pair encoding), so its token counts depend on that specific vocabulary.

That is exactly my goal: a token counter built in KNIME that comes close to ChatGPT's. Background: this would be the starting point for preparing texts for e.g. a FAISS vector store.

There is already a Python solution, but that is beyond my competence.


This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.