I spun up a new LLM benchmark: how well can they handle this prompt?
-
Generate an SVG of a pelican riding a bicycle
I find the results so far utterly delightful: https://simonwillison.net/2024/Oct/25/pelicans-on-a-bicycle/
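The drawings here are judged by eye. A hypothetical first step before rendering each model's answer is to pull the SVG markup out of the reply and check that it parses. The sketch below assumes the raw response text is already in hand; how it is fetched from each model is left out. The function names `extract_svg` and `is_well_formed_svg` are illustrative and not part of any existing tool.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional

# Namespaced tag form that ElementTree reports for xmlns="http://www.w3.org/2000/svg"
SVG_NS = "{http://www.w3.org/2000/svg}"


def extract_svg(response: str) -> Optional[str]:
    """Return the first <svg>...</svg> block in a model reply, or None (hypothetical helper)."""
    # Replies often wrap the markup in prose or a code fence, so grab just the element.
    match = re.search(r"<svg\b.*?</svg>", response, re.DOTALL)
    return match.group(0) if match else None


def is_well_formed_svg(svg: str) -> bool:
    """True if the markup parses as XML and its root element is <svg> (hypothetical helper)."""
    try:
        root = ET.fromstring(svg)
    except ET.ParseError:
        return False
    # The root may or may not declare the SVG namespace.
    return root.tag in ("svg", SVG_NS + "svg")


if __name__ == "__main__":
    # Stand-in for a real model reply (made-up example text, not an actual model output).
    reply = (
        'Sure! <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 100">'
        '<circle cx="50" cy="50" r="20"/></svg>'
    )
    svg = extract_svg(reply)
    print(svg is not None and is_well_formed_svg(svg))
```

A reply that parses can then be saved to a `.svg` file and opened in a browser. Whether the result actually looks like a pelican on a bicycle is still a human call.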
-
Simon Willison replied to Simon Willison
OpenAI's models are quite good at it (not as good as Claude 3.5 Sonnet though)
-
Simon Willison replied to Simon Willison
The Llama models I tried both did terribly, but Gemini 1.5 Flash 8B wins for weird charm (even if it doesn't really look like a pelican at all)
-
Simon Willison replied to Simon Willison
Paul Calcraft extended this idea into an implementation of Pictionary where different vision LLMs generate SVGs and race to guess what the others are drawing, and it is absolutely brilliant: https://twitter.com/paul_cal/status/1850262678712856764