Implement GUI Interaction and VLM Integration

implementation | Challenge | November 21, 2025

Prompt Content

Develop the core Playwright scripts for navigating a target web application (e.g., a simple e-commerce site or a public form). Integrate your chosen vision-language model (VLM): capture screenshots with Playwright, pass them to the VLM, and generate textual descriptions of the GUI state. Specify how the VLM outputs will be structured for consumption by the downstream LLM.
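To make the expected shape of an answer concrete, here is a minimal Python sketch of one possible pipeline. It is an illustration under stated assumptions, not a reference implementation: the `UIElement` and `GUIState` classes, the `VLM_INSTRUCTIONS` text, and the `vlm` callable (any function that takes PNG bytes plus an instruction string and returns a JSON string) are all hypothetical names invented for this example. Only the Playwright calls (`sync_playwright`, `chromium.launch`, `new_page`, `goto`, `screenshot`) come from the real library, and Playwright is imported lazily so the rest of the sketch runs without it.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Callable, List

# Hypothetical instruction sent to the VLM; the JSON keys are an assumption.
VLM_INSTRUCTIONS = (
    "Describe this screenshot as JSON with keys 'summary' (string) and "
    "'elements' (list of objects with 'role', 'label', 'state')."
)


@dataclass
class UIElement:
    role: str        # e.g. "button", "textbox", "link"
    label: str       # visible text or accessible name
    state: str = ""  # e.g. "disabled", "checked"; empty if not reported


@dataclass
class GUIState:
    url: str
    summary: str
    elements: List[UIElement] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize the state so it can be placed in an LLM prompt."""
        return json.dumps(asdict(self), indent=2)


def capture_screenshot(url: str) -> bytes:
    """Open the page in headless Chromium and return a full-page PNG."""
    from playwright.sync_api import sync_playwright  # lazy import

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url)
        png = page.screenshot(full_page=True)
        browser.close()
    return png


def describe_gui(url: str, png: bytes, vlm: Callable[[bytes, str], str]) -> GUIState:
    """Ask the (hypothetical) VLM callable for JSON and parse it defensively."""
    data = json.loads(vlm(png, VLM_INSTRUCTIONS))
    elements = [
        UIElement(
            role=e.get("role", "unknown"),
            label=e.get("label", ""),
            state=e.get("state", ""),
        )
        for e in data.get("elements", [])
    ]
    return GUIState(url=url, summary=data.get("summary", ""), elements=elements)
```

In use, `describe_gui(url, capture_screenshot(url), my_vlm).to_json()` would yield a JSON document that can be embedded in the LLM prompt. A fixed schema like this is one design choice among several; it trades some descriptive richness for output the LLM can parse and reference reliably.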

Usage Tips

Copy the prompt and paste it into your preferred AI tool (Claude, ChatGPT, Gemini)

Customize placeholder values with your specific requirements and context

For best results, provide clear examples and test different variations