For the fun of it I expanded the example and added a few tasks, like letting the (local) model create a JSON file with the severity level of the accident and also assign a compensation based on the image and description. To be honest, I am not 100% sure the local Qwen3 model actually looks at the picture, and it always agrees with the initial claim, but the reasoning it displays is interesting and I think one could work on the prompts. What is notable is that the model consistently returns well-formed JSON that you can then process further. And all this with a local LLM via Ollama, so no leakage to the web, all running on a standard Apple M1 machine.
Mistral 3.1 (mistral-small3.1) claims to actually ‘see’ the images; one would have to explore further whether that is really the case.
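
For anyone who wants to reproduce this, here is a minimal sketch of how such a call could look with the Python ollama client. The prompt wording, JSON field names, and image path are my own illustrative assumptions, and mistral-small3.1 stands in as the vision-capable model:

```python
import json
import ollama

# Illustrative prompt: ask the model to assess the claim and reply with
# JSON only. Field names ("severity", "compensation_eur") are assumptions.
prompt = (
    "You are an insurance claims assessor. Based on the attached photo "
    "and the description below, return ONLY a JSON object with the fields "
    '"severity" (one of "minor", "moderate", "severe") and '
    '"compensation_eur" (a number).\n\n'
    "Description: Rear bumper dented in a parking-lot collision."
)

response = ollama.chat(
    model="mistral-small3.1",
    messages=[{
        "role": "user",
        "content": prompt,
        "images": ["claim_photo.jpg"],  # hypothetical path to the accident picture
    }],
    format="json",  # ask Ollama to constrain the reply to valid JSON
)

# The consistently well-formed JSON can then be processed further.
claim = json.loads(response["message"]["content"])
print(claim["severity"], claim["compensation_eur"])
```

The `format="json"` option is what makes the structured output reliable enough to parse directly; whether the model genuinely uses the image or just the description is exactly the open question above.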
