Remove the background from an image. Summarize some text. OCR to select text or click links in a screenshot. Relight and center yourself in your webcam. Semantic search for images and files.
A lot of that is in the first party Mac and Windows apps.
iOS Safari user here. The viewport jumped for me only on the active search demo, and only the first time the result content loaded. This could easily be fixed by giving the results box a fixed size from the start.
Either way, it's text instructions used to call a function (via a JSON object for MCP, or a shell command for scripts). Which works better depends on how the model you're using was post-trained and where in the prompt that tool info gets injected.
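To make the comparison concrete, here's a minimal Python sketch of the two conventions. The tool name, arguments, and dispatch helpers are hypothetical, just to illustrate that both paths reduce to text the model emitted and that the harness decides how to interpret it.

```python
# Minimal sketch, assuming a hypothetical harness: an MCP-style tool call
# expressed as a JSON object vs. a script-style call expressed as a shell
# command. Names and arguments here are illustrative, not any real API.
import json
import shlex
import subprocess


def call_via_mcp(tool_name: str, arguments: dict) -> str:
    # MCP-style: the model emits a JSON object naming the tool and its args.
    request = {
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    # In practice this payload would be sent to an MCP server.
    return json.dumps(request)


def call_via_shell(command: str) -> str:
    # Script-style: the model emits a shell command and the harness runs it.
    result = subprocess.run(shlex.split(command), capture_output=True, text=True)
    return result.stdout


# Either way, the model only produced text; what differs is how the harness
# parses it and where the tool descriptions were injected into the prompt.
print(call_via_mcp("summarize_text", {"path": "notes.txt"}))
print(call_via_shell("wc -l notes.txt"))
```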