There are some models that were not trained / built to understand how to use tools / tool calling and the model you are running seems to be one of those.
I tested agents with local models between 4bn and 8bn params in size - e.g. Qwen3…
You can read about my findings here: