GPT-4V is great at reasoning about what is on the screen, but it struggles with precision. For example, when it decides to tap an icon, it cannot reliably produce the pixel coordinates of that icon. That's where object detection and accessibility elements help: with them, we can locate interactive elements precisely. A rough sketch of that division of labor is below.
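Here is a minimal Python sketch of the idea, assuming GPT-4V picks *which* element to tap by label while the accessibility tree or an object detector supplies the *where*. Names like `UIElement` and `resolve_tap_target` are illustrative, not from any actual codebase:

```python
from dataclasses import dataclass

@dataclass
class UIElement:
    label: str      # accessibility label or detector class name
    x: int          # bounding box in screen pixels
    y: int
    width: int
    height: int

    @property
    def center(self) -> tuple[int, int]:
        return (self.x + self.width // 2, self.y + self.height // 2)

def resolve_tap_target(chosen_label: str, elements: list[UIElement]) -> tuple[int, int]:
    """Map the label GPT-4V chose to exact coordinates from the detected elements."""
    matches = [e for e in elements if e.label.lower() == chosen_label.lower()]
    if not matches:
        raise ValueError(f"No on-screen element labeled {chosen_label!r}")
    return matches[0].center

# In practice, `elements` would come from the accessibility tree
# or an object detector run on the screenshot.
elements = [
    UIElement("Settings", x=24, y=880, width=64, height=64),
    UIElement("Camera", x=120, y=880, width=64, height=64),
]
print(resolve_tap_target("Settings", elements))  # (56, 912)
```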
Thanks! Yes, we experimented with that! I think because of the way GPT-4V sees images as a grid of patches, it has a hard time with absolute positioning, but that's just a guess.
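If that guess is right, the intuition is easy to see in a toy example. The 32-pixel patch size below is an arbitrary assumption for illustration: nearby pixels collapse into the same patch token, so the model can only localize to the grid, not to the pixel.

```python
# Illustrative only: if a vision encoder tokenizes a screenshot into
# 32x32-pixel patches, any position the model "sees" is quantized to that grid.
PATCH = 32

def patch_of(x: int, y: int) -> tuple[int, int]:
    """Grid cell containing pixel (x, y)."""
    return (x // PATCH, y // PATCH)

print(patch_of(56, 912))  # (1, 28)
print(patch_of(57, 913))  # (1, 28) -- distinct pixels, same patch token
print(patch_of(80, 900))  # (2, 28) -- only 24px away, but a different cell
```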