Hi everyone, I’m new to KNIME and would like to use it for a scientific study, so I wanted to ask for your help and input! I’ve created three personas with different characteristics for investment advice, and I want three AI benchmark models to then conduct the investment advice and save the results. I know I can control the responses to some extent with temperature settings, but how do I ensure that the responses aren’t unreliable, i.e. that they exhibit a certain degree of “repeatability”, and that the results aren’t altered afterward? In other words, that the output isn’t random and is consistent. I want to have the responses reviewed by experts afterward. How would you set this up? Are there any things I absolutely need to consider? I would be very grateful for any help, godspeed
In short: you can’t. The only way I can think of is to give (very) detailed instructions and use models that are tuned for instruction following rather than chat. You will have to experiment with which sort of prompt works best, but unlike with traditional Machine Learning models you will have no certainty about the outcome.
You can try to experiment with agents, where one final agent checks the result against some benchmark, but that will add another layer of complexity.
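As for the “results aren’t altered afterward” part of the question: that is separate from model determinism and easier to solve. One simple, model-agnostic safeguard is to store a cryptographic hash next to each saved response and recompute it before the expert review. A minimal sketch in Python - the record fields (`persona`, `model`, …) are just illustrative, not a fixed schema:

```python
import hashlib
import json

def fingerprint(record: dict) -> str:
    """Return a SHA-256 hash of a response record.

    Serialising with sorted keys makes the hash independent of
    the order in which the keys were inserted.
    """
    canonical = json.dumps(record, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Example: store the hash alongside each saved advice response.
record = {
    "persona": "conservative_investor",   # hypothetical persona id
    "model": "model_a",                   # hypothetical benchmark model id
    "prompt": "What should I invest in?",
    "response": "A diversified index fund ...",
}
record_hash = fingerprint(record)

# Later, before handing the data to the experts, recompute and compare:
# any edit to any field changes the hash, so tampering is detectable.
assert fingerprint(record) == record_hash
```

If you want this to be verifiable by a third party, publish (or deposit) the list of hashes at collection time, e.g. as part of your study’s preregistration material.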
I agree with @mlauber71 here - even with temperature = 0 you will likely get some variability in the output when using the same input - and responses may also vary if the input prompt differs slightly (even though, from a “human” perspective, the input may have exactly the same meaning…).
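For the study it may be enough to put a number on that variability rather than eliminate it: re-run the same prompt several times and measure how often the answers agree. A sketch, where `call_model` is a hypothetical stand-in for whatever LLM node or API you actually use:

```python
from collections import Counter

def repeatability(call_model, prompt: str, n_runs: int = 10) -> float:
    """Fraction of runs that return the single most common answer.

    1.0 means fully repeatable on this prompt; values close to
    1/n_runs mean the output is effectively random.
    """
    answers = [call_model(prompt) for _ in range(n_runs)]
    most_common_count = Counter(answers).most_common(1)[0][1]
    return most_common_count / n_runs

# With a deterministic stub in place of a real model, the score is 1.0:
stub = lambda prompt: "Buy index funds."
print(repeatability(stub, "Advise persona A", n_runs=5))  # 1.0
```

Note this uses exact string matching, which is strict - two answers with identical meaning but different wording count as disagreeing. For a study you might additionally normalise the text or compare extracted key recommendations instead of raw strings.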
An agentic setup may help - you could give one agent two tools: the first tool uses an LLM inside it to create the initial response, the second tool has an LLM inside it with instructions on quality control and examples of how the response should be framed. Your agent then has instructions to first use tool 1 to create an initial response and pass this to tool 2 for QA - the output of tool 2 is then presented back to the user…
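The two-tool pattern above can be sketched without committing to any particular agent framework. Here both LLM calls are stubbed out as hypothetical functions (`draft_llm`, `qa_llm`) so only the control flow is shown - in a real workflow each stub would be a prompt to an actual model:

```python
def draft_llm(persona: str, question: str) -> str:
    """Tool 1: stub for the LLM that drafts the initial advice."""
    return f"Draft advice for {persona}: consider a diversified portfolio."

def qa_llm(draft: str, rubric: str) -> str:
    """Tool 2: stub for the QA LLM that checks the draft against a rubric.

    A real implementation would prompt a second model with the rubric
    plus few-shot examples of well-framed responses.
    """
    if "diversified" in draft:
        return draft + " [QA: passes rubric]"
    return draft + " [QA: needs revision]"

def advise(persona: str, question: str, rubric: str) -> str:
    """Agent logic: draft first, then pass the draft through QA."""
    draft = draft_llm(persona, question)
    return qa_llm(draft, rubric)

result = advise(
    "risk-averse retiree",
    "How should I invest 10k?",
    "Responses must mention diversification and risks.",
)
print(result)
```

The design point is that the QA step is a separate call with its own instructions, so you can tighten the rubric (or add a revision loop) without touching the drafting prompt.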