The model was primarily trained on English documents, which is why English is listed as the main language. However, the training data did include a smaller proportion of Chinese and various European languages. Additionally, the base model (Qwen-2.5-VL-3B) is multilingual. Someone on Reddit mentioned it worked on Chinese: https://www.reddit.com/r/LocalLLaMA/comments/1l9p54x/comment...
Douglas Adams was onto something when he made the superintelligent servant in Hitchhiker's Guide loudly complain about its endless depression. Maybe then we'll only ask things of it when we actually need it and otherwise avoid interaction.
Why not go the other way and decrease the actions per minute so it learns the overall point of the game, then increase the actions per minute with each game?
Or maybe extend the traditional categories of macro and micro with a third, call it 'nano': the micro agent indicates where each unit ought to be in 9 frames, and the nano agent figures out how to get them there. Since the timescale is so short, the nano agent could brute-force enumerate the possible moves, chess-engine style, and pick the optimal one. Or use a separate network.
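A toy sketch of what that nano-level brute force might look like (everything here is hypothetical: a grid world, a 3-frame horizon instead of 9, and made-up names like plan_nano, no real game API):

    import itertools

    # Hypothetical "nano" planner: given the target square the micro agent
    # wants a unit to reach, exhaustively try every short move sequence and
    # keep the one whose endpoint lands closest to the target.
    MOVES = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]  # wait, or step E/W/N/S

    def plan_nano(start, target, horizon=3):
        best_seq, best_dist = None, float("inf")
        for seq in itertools.product(MOVES, repeat=horizon):  # 5**horizon options
            x, y = start
            for dx, dy in seq:  # roll the sequence forward to its endpoint
                x, y = x + dx, y + dy
            dist = abs(x - target[0]) + abs(y - target[1])  # Manhattan distance
            if dist < best_dist:
                best_seq, best_dist = seq, dist
        return best_seq

    # micro agent: "unit at (0, 0) should be near (2, 1) in 3 frames"
    print(plan_nano((0, 0), (2, 1)))

Even at a 9-frame horizon that's only 5^9 ≈ 2 million sequences per unit, which is the kind of budget a chess-style search can still enumerate; a real version would obviously need collision checks and pruning.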
I guess that's inelegant when a deep network already has its own concept of fine-grained versus coarse-grained layers, and should be able to do this on its own with the right training method.
That sounds like an interesting research angle. The thing about AI research is that there are so many open ends there are essentially unlimited research options. If you can pose something as a problem and identify a reasonable programming approach, then you have an avenue for AI research. Deep Learning isn't the end of AI research. It is the beginning.