GPT-5 Multimodal Tool Integration for Figures
implementation | Challenge | October 31, 2025
Prompt Content
Implement an 'Image Analysis Agent' that uses GPT-5 (or a specialized VLM integrated with GPT-5) to analyze a base64-encoded patent figure. The agent should extract the key features and objects depicted in the image, along with the relationships between them. If GPT-5's multimodal capabilities are not sufficient for complex diagram interpretation, use MCP to mock a call to a 'Getty Images API' or similar service. The output should be structured textual descriptions suitable for feeding into the graph RAG pipeline.
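As a rough illustration of the control flow this prompt asks for, the following Python sketch decodes the base64 figure, tries a primary analyzer, falls back to a mocked external tool, and returns a structured description. Everything named here is a hypothetical placeholder: `stub_primary_analyzer` stands in for the real multimodal model call, `mock_fallback_tool` stands in for an MCP tool call, and the `FigureDescription` schema is only one possible shape for graph RAG input. No real GPT-5, MCP, or Getty Images API is invoked.

```python
import base64
import binascii
import json
from dataclasses import dataclass, field, asdict
from typing import Callable, Optional


@dataclass
class Relation:
    source: str
    predicate: str
    target: str


@dataclass
class FigureDescription:
    # Hypothetical schema; adapt the fields to whatever the graph RAG expects.
    figure_id: str
    objects: list[str] = field(default_factory=list)
    relations: list[Relation] = field(default_factory=list)
    analyzed_by: str = "primary"

    def to_triples(self) -> list[str]:
        """Render each relation as a plain-text statement for graph ingestion."""
        return [f"{r.source} {r.predicate} {r.target}" for r in self.relations]


# An analyzer takes raw image bytes and returns a description,
# or None when it cannot interpret the figure.
Analyzer = Callable[[bytes], Optional[FigureDescription]]


def stub_primary_analyzer(image_bytes: bytes) -> Optional[FigureDescription]:
    """Placeholder for the multimodal model call; gives up on very small inputs."""
    if len(image_bytes) < 16:
        return None
    return FigureDescription(
        figure_id="FIG. 1",
        objects=["housing 10", "shaft 12"],
        relations=[Relation("shaft 12", "is mounted in", "housing 10")],
    )


def mock_fallback_tool(image_bytes: bytes) -> Optional[FigureDescription]:
    """Mocked external service, standing in for an MCP tool call."""
    return FigureDescription(
        figure_id="FIG. 1",
        objects=["unlabeled component"],
        analyzed_by="fallback",
    )


def analyze_figure(
    image_b64: str, primary: Analyzer, fallback: Analyzer
) -> FigureDescription:
    """Decode the figure, try the primary analyzer, then the fallback."""
    try:
        image_bytes = base64.b64decode(image_b64, validate=True)
    except binascii.Error as exc:
        raise ValueError("figure is not valid base64") from exc
    result = primary(image_bytes)
    if result is None:
        result = fallback(image_bytes)
    if result is None:
        raise RuntimeError("no analyzer could interpret the figure")
    return result


if __name__ == "__main__":
    # Fake image payload: a PNG signature followed by padding bytes.
    fake_png = base64.b64encode(b"\x89PNG\r\n\x1a\n" + b"\x00" * 32).decode("ascii")
    description = analyze_figure(fake_png, stub_primary_analyzer, mock_fallback_tool)
    print(json.dumps(asdict(description), indent=2))
    print(description.to_triples())
```

Passing the analyzers in as plain callables keeps the fallback decision in one place and lets the stubs be swapped for real model or tool clients without touching `analyze_figure`.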
Usage Tips
Copy the prompt and paste it into your preferred AI tool (Claude, ChatGPT, Gemini)
Customize placeholder values with your specific requirements and context
For best results, provide clear examples and test different variations