Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

12B is pretty small, so I’m doubting it’ll be anywhere close to internvl2 however mistral does great work and likely this model is still useful for on device tasks


It appears to be slightly worse than Qwen2VL 7B, a model almost half it's size, if you look at the Qwen's official benchmarks instead of Mistral's.

https://xcancel.com/_philschmid/status/1833954941624615151


But Qwen is not multimodal, or is it?


https://qwen2.org/vl/

>Qwen2-VL is the latest addition to the vision-language models in the Qwen series, building upon the capabilities of Qwen-VL. Compared to its predecessor, Qwen2-VL offers:

>State-of-the-Art Image Understanding

>Extended Video Comprehension

Besides, it'd have been pretty silly for them to mention it on their slides if it wasn't.


I've found llama 3.1 8B to be effective at transforming unstructured text into structured data, now that LM Studio accepts a json schema parameter.

For a general knowledge chatbot it doesn't know much of course, but its a good worker bee.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: