Agents · The Decoder · May 16, 2026

New benchmark shows Claude Mythos and GPT-5.5 can develop real browser exploits autonomously

A new benchmark shows that Claude Mythos and GPT-5.5 can develop real browser exploits autonomously. The result suggests that AI models can now carry out complex security work without human intervention, raising concerns about security and ethical use.


More in Agents

Agents · AWS Machine Learning · 1d

Building an agentic AI solution at Bluesight with Amazon Bedrock

Bluesight is building an agentic AI solution on Amazon Bedrock. The integration is meant to automate tasks more efficiently and improve decision-making in the company's operations.

Agents · The Decoder · 2d

Claude Code now has a built-in browser that lets the AI read, click, and type on external websites

Claude Code now includes a built-in browser that lets the AI read, click, and type on external websites. With it, Claude can carry out tasks on the web autonomously.

Agents · The Decoder · 2d

Claude Cowork's biggest use case is the mundane office work nobody wants to own, Anthropic says

Anthropic is positioning Claude Cowork to tackle mundane office tasks that employees typically avoid. This means users can offload routine work to AI, freeing up time for more engaging activities.

Agents · The Decoder · 2d

AI agents win at Slay the Spire 2 after researchers replace growing chat logs with structured memory

Researchers got AI agents to win at Slay the Spire 2 by replacing their growing chat logs with structured memory. The change improves the agents' performance and efficiency on the complex tasks the game demands.