12B is pretty small, so I’m doubting it’ll be anywhere close to internvl2 howeve...

Jackson__ · on Sept 11, 2024

It appears to be slightly worse than Qwen2VL 7B, a model almost half it's size, if you look at the Qwen's official benchmarks instead of Mistral's.

https://xcancel.com/_philschmid/status/1833954941624615151

kaoD · on Sept 11, 2024

But Qwen is not multimodal, or is it?

Jackson__ · on Sept 11, 2024

https://qwen2.org/vl/

>Qwen2-VL is the latest addition to the vision-language models in the Qwen series, building upon the capabilities of Qwen-VL. Compared to its predecessor, Qwen2-VL offers:

>State-of-the-Art Image Understanding

>Extended Video Comprehension

Besides, it'd have been pretty silly for them to mention it on their slides if it wasn't.

jazzyjackson · on Sept 12, 2024

I've found llama 3.1 8B to be effective at transforming unstructured text into structured data, now that LM Studio accepts a json schema parameter.

For a general knowledge chatbot it doesn't know much of course, but its a good worker bee.