OpenAI’s new flagship crushes coding tests and complex logic, but its creative work is sterile, its safety rails are heavy-handed, and its cramped context window leaves it lagging behind rivals like Claude.