GPT-5 Multimodal Tool Integration for Figures

implementation | Challenge | October 31, 2025

Prompt Content

Implement an 'Image Analysis Agent' that uses GPT-5 (or a specialized VLM integrated with GPT-5) to analyze a base64-encoded patent figure. The agent should extract the key features and objects shown in the image, along with the relationships between them. If GPT-5's multimodal capabilities are not sufficient for complex diagram interpretation, use MCP to mock a call to a 'Getty Images API' or similar service. The output should be structured textual descriptions suitable for feeding into the graph RAG.
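To make the expected shape of a response concrete, here is a minimal Python sketch of the control flow the prompt describes: decode the figure, ask a vision model, fall back to a mocked external tool when confidence is low, and emit text lines for the graph. Everything in it is an assumption for illustration: `FigureAnalysis`, `stub_vlm`, `mock_external_tool`, the 0.6 confidence threshold, and the `OBJECT:` / `RELATION:` line format are hypothetical names and choices, and the two analyzers are local stubs standing in for a real GPT-5 call and a real MCP tool call.

```python
import base64
import binascii
from dataclasses import dataclass
from typing import Callable


@dataclass
class FigureAnalysis:
    """Hypothetical result container; not part of any real SDK."""
    objects: list[str]
    relations: list[tuple[str, str, str]]  # (subject, predicate, object)
    confidence: float
    source: str = "vlm"


Analyzer = Callable[[bytes], FigureAnalysis]


def decode_figure(figure_b64: str) -> bytes:
    """Decode the base64 payload, rejecting malformed input early."""
    try:
        return base64.b64decode(figure_b64, validate=True)
    except (binascii.Error, ValueError) as exc:
        raise ValueError("figure is not valid base64") from exc


def stub_vlm(image: bytes) -> FigureAnalysis:
    """Stand-in for the GPT-5 / VLM call (assumption: returns low confidence)."""
    return FigureAnalysis(
        objects=["housing 10", "shaft 12"],
        relations=[("shaft 12", "passes through", "housing 10")],
        confidence=0.4,
    )


def mock_external_tool(image: bytes) -> FigureAnalysis:
    """Stand-in for the mocked MCP tool call to an external image service."""
    return FigureAnalysis(
        objects=["housing 10", "shaft 12", "bearing 14"],
        relations=[
            ("shaft 12", "passes through", "housing 10"),
            ("bearing 14", "supports", "shaft 12"),
        ],
        confidence=0.9,
        source="mock_tool",
    )


def analyze_figure(
    figure_b64: str,
    vlm: Analyzer = stub_vlm,
    fallback: Analyzer = mock_external_tool,
    min_confidence: float = 0.6,
) -> FigureAnalysis:
    """Run the VLM first; use the fallback only if confidence is too low."""
    image = decode_figure(figure_b64)
    result = vlm(image)
    if result.confidence < min_confidence:
        result = fallback(image)
    return result


def to_graph_text(analysis: FigureAnalysis) -> list[str]:
    """Flatten the analysis into one text line per node and per edge."""
    lines = [f"OBJECT: {name}" for name in analysis.objects]
    lines += [f"RELATION: {s} -[{p}]-> {o}" for s, p, o in analysis.relations]
    return lines


if __name__ == "__main__":
    payload = base64.b64encode(b"fake-image-bytes").decode()
    for line in to_graph_text(analyze_figure(payload)):
        print(line)
```

Passing the analyzers in as plain callables keeps the fallback logic testable without network access; in a real implementation they would presumably be replaced by the actual model client and MCP tool invocation.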

Usage Tips

Copy the prompt and paste it into your preferred AI tool (Claude, ChatGPT, Gemini)

Customize placeholder values with your specific requirements and context

For best results, provide clear examples and test different variations