ScreenAI: A visual language model for UI and visually-situated language understanding
… By combining the natural language capabilities of LLMs with a structured schema, we simulate a wide range of user interactions and scenarios to generate synthetic, realistic tasks. …
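The excerpt does not show how the schema and the LLM are wired together. The sketch below illustrates one way such a pipeline could look. It assumes a screen is described as a list of UI elements, each with a type, its visible text, and a bounding box. It serializes that structured schema into text and pairs it with task instructions, producing prompts an LLM could expand into synthetic tasks. The element labels, the `UIElement` class, the `TASK_TEMPLATES` wording, and the functions `serialize_schema` and `build_prompts` are all hypothetical illustrations, not ScreenAI's actual schema or prompts. The LLM call itself is deliberately left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class UIElement:
    """Hypothetical schema entry for one on-screen element."""
    kind: str                        # assumed labels, e.g. "BUTTON", "TEXT"
    text: str                        # visible text of the element
    bbox: Tuple[int, int, int, int]  # (x0, y0, x1, y1) in assumed integer screen units


def serialize_schema(elements: List[UIElement]) -> str:
    """Render the structured schema as one text line per element."""
    lines = []
    for e in elements:
        x0, y0, x1, y1 = e.bbox
        lines.append(f"{e.kind} {e.text!r} ({x0} {y0} {x1} {y1})")
    return "\n".join(lines)


# Hypothetical instructions that ask an LLM to simulate different user interactions.
TASK_TEMPLATES: Dict[str, str] = {
    "qa": "Given the screen below, write a question a user might ask and its answer.",
    "navigation": "Given the screen below, write an instruction a user might give "
                  "and the element they would tap.",
    "summarization": "Given the screen below, summarize it in one sentence.",
}


def build_prompts(elements: List[UIElement]) -> Dict[str, str]:
    """Combine each task instruction with the serialized schema.

    The returned prompts would then be sent to an LLM (not shown here),
    whose outputs become the synthetic training tasks.
    """
    schema = serialize_schema(elements)
    return {task: f"{instr}\n\nScreen:\n{schema}" for task, instr in TASK_TEMPLATES.items()}


if __name__ == "__main__":
    screen = [
        UIElement("TEXT", "Welcome", (0, 0, 200, 15)),
        UIElement("BUTTON", "Sign in", (10, 20, 110, 60)),
    ]
    for task, prompt in build_prompts(screen).items():
        print(f"--- {task} ---\n{prompt}\n")
```

Keeping the schema as plain, line-oriented text is one design choice that lets the same screen description be reused across many task templates. Adding a template is enough to simulate a new kind of user interaction without touching the schema.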