i have toyed around Tiny Llama 1.1B ( https://github.com/jzhang38/TinyLlama ) for the "conversational brain" of NPCs in a game i was dabbling.... it was fun.
Qwen models are generally the best if you want local models in the 100b max size.
I am not sure but I recommend looking at the size of deepseek flash as well to see if it can work for you, perhaps GLM 5 Air model is also around this size.
and if you want something at the larger models scale, then you have glm 5.2, kimi k2.7,mimo v2.5 pro and deepseek v4
i have toyed around Tiny Llama 1.1B ( https://github.com/jzhang38/TinyLlama ) for the "conversational brain" of NPCs in a game i was dabbling.... it was fun.
When I added... and assistant sometimes likes to talk like a pirate to an existing system prompt.
It was quite fun
Qwen models are generally the best if you want local models in the 100b max size.
I am not sure but I recommend looking at the size of deepseek flash as well to see if it can work for you, perhaps GLM 5 Air model is also around this size.
and if you want something at the larger models scale, then you have glm 5.2, kimi k2.7,mimo v2.5 pro and deepseek v4
[flagged]
[flagged]
[flagged]