In short: you can’t. The only approach I can think of is to give (very) detailed instructions and use models that are tuned for instruction following rather than chat. You will have to experiment to find which kind of prompt works best, but unlike with traditional machine learning models, you will have no certainty about the outcome.
You can also experiment with agents, where a final agent checks the result against some benchmark, but that adds another layer of complexity.
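To make the "checking agent" idea a bit more concrete, here is a rough sketch of a generate-then-verify loop. Everything in it is an assumption for illustration: `generate_with_check`, `make_fake_model`, and `is_yes_or_no` are hypothetical names, the "model" is just a stub that replays canned replies, and the "benchmark" is a trivial format check. In practice you would swap in a real model call and whatever check fits your task.

```python
from typing import Callable, Iterable, Optional


def generate_with_check(
    generate: Callable[[str], str],
    check: Callable[[str], bool],
    prompt: str,
    max_attempts: int = 3,
) -> Optional[str]:
    """Call `generate` until `check` accepts the output or attempts run out."""
    for _ in range(max_attempts):
        candidate = generate(prompt)
        if check(candidate):
            return candidate
    # No attempt passed the check; the caller decides what to do next.
    return None


def make_fake_model(replies: Iterable[str]) -> Callable[[str], str]:
    """Hypothetical stand-in for a real model call: replays canned replies."""
    it = iter(replies)

    def fake_model(prompt: str) -> str:
        return next(it)

    return fake_model


def is_yes_or_no(text: str) -> bool:
    """Toy 'benchmark': accept only a bare yes/no answer."""
    return text.strip().lower() in {"yes", "no"}


fake_model = make_fake_model(["Well, it depends...", "Yes"])
result = generate_with_check(
    fake_model, is_yes_or_no, "Answer only yes or no: is 7 prime?"
)
print(result)  # prints "Yes" (the first reply fails the check, the second passes)
```

Note that this only filters outputs after the fact. It does not make the model itself any more predictable, and the loop can still come back with `None` if every attempt fails the check.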