I spun up a new LLM benchmark: how well can different models handle this prompt?

Simon Willison

Generate an SVG of a pelican riding a bicycle

I find the results so far utterly delightful: https://simonwillison.net/2024/Oct/25/pelicans-on-a-bicycle/
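
For anyone who wants to try this at home, here is a minimal sketch of the harvesting step, assuming you already have each model's reply as a plain string. The helper names `extract_svg` and `is_well_formed` and the `sample` reply are hypothetical illustrations, not code from the linked post. It only checks that the output parses as XML with an `svg` root; judging whether it looks like a pelican is still a job for human eyes.

```python
import re
import xml.etree.ElementTree as ET

# The benchmark prompt, exactly as quoted in the thread.
PROMPT = "Generate an SVG of a pelican riding a bicycle"


def extract_svg(response_text):
    """Return the first <svg>...</svg> block in a model reply, or None.

    Hypothetical helper: models often wrap the drawing in chatty prose,
    so we pull out just the SVG element.
    """
    match = re.search(r"<svg\b.*?</svg>", response_text, re.DOTALL)
    return match.group(0) if match else None


def is_well_formed(svg_text):
    """Return True if svg_text parses as XML and its root element is svg."""
    try:
        root = ET.fromstring(svg_text)
    except ET.ParseError:
        return False
    # A namespaced root tag looks like "{http://www.w3.org/2000/svg}svg".
    return root.tag.split("}")[-1] == "svg"


# Made-up reply standing in for real model output (an assumption).
sample = (
    "Here is your pelican:\n"
    '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 100">'
    '<circle cx="50" cy="50" r="10"/></svg>\n'
    "Enjoy!"
)

svg = extract_svg(sample)
print(is_well_formed(svg))
```

Send `PROMPT` to whichever model you are testing, pass the reply through `extract_svg`, and save anything that passes `is_well_formed` to a `.svg` file to view it.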

OpenAI's models are quite good at it (not as good as Claude 3.5 Sonnet though)

The Llama models I tried both did terribly, but Gemini 1.5 Flash 8B wins for weird charm (even if it doesn't really look like a pelican at all)

Paul Calcraft extended this idea into an implementation of Pictionary where different vision LLMs generate SVGs and race to guess what the others are drawing, and it is absolutely brilliant: https://twitter.com/paul_cal/status/1850262678712856764