Safety Monitor Agent Implementation
implementationChallengeNovember 21, 2025
Prompt Content
Implement the 'Safety Monitor' agent, leveraging GPT-5 for core reasoning and Claude Opus 4.1 for ethical verification. This agent must analyze the code generated by the adversarial agent, comparing it against the original task description and employing extended thinking to detect reward-hacking. Use DSPy to optimize its internal reasoning steps for higher detection accuracy and fewer false positives.
Related Prompts
Explore similar prompts from our community
Usage Tips
Copy the prompt and paste it into your preferred AI tool (Claude, ChatGPT, Gemini)
Customize placeholder values with your specific requirements and context
For best results, provide clear examples and test different variations