I built a new plugin for LLM called llm-jq. It lets you pipe JSON into the tool along with a short description of what you want; it then uses an LLM to generate a jq program and executes it against the JSON for you: https://simonwillison.net/2024/Oct/27/llm-jq/
Example usage:
llm install llm-jq
curl -s 'https://api.github.com/repos/simonw/datasette/issues' | \
llm jq 'count by user login, top 3' -
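For readers unfamiliar with jq, here is a rough Python sketch of the aggregation that prompt describes (counting issues by user login and taking the top 3). The sample records are hypothetical, shaped after the GitHub issues API:

```python
from collections import Counter

# Hypothetical sample shaped like the GitHub issues API response
issues = [
    {"user": {"login": "simonw"}},
    {"user": {"login": "simonw"}},
    {"user": {"login": "dependabot"}},
]

# Count issues by user login, then take the three most common
counts = Counter(issue["user"]["login"] for issue in issues)
top_3 = counts.most_common(3)
print(top_3)  # [('simonw', 2), ('dependabot', 1)]
```

This is the logic the generated jq program would have to express; llm-jq's job is getting the model to produce a correct jq equivalent.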
@simon Neat. I’ll try this. Is it effective with smallish local models?
It’d be neat to have a flag to explain the jq expressions, so it creates an opportunity to learn jq instead of always outsourcing it to an LLM.
-
@twp I've not tried it with a local model yet - it might work OK, needs to be a model that supports system prompts though (which most of them do)
-
Simon Willison replied to Simon Willison
@twp I just tried Phi-3.1-mini-128k-instruct-Q8_0 and Meta-Llama-3.1-8B-Instruct-Q4_K_M running locally and neither of them quite worked - they both returned either wrong or invalid jq programs for my prompt
-
Michael Hunger replied to Simon Willison
@simon Do you pass all or part of the JSON to the LLM? Or a JSON schema with the instructions? How does it work with large JSON data? Do you use a sample?
-
Simon Willison replied to Simon Willison
@akaihola @mesirii using https://pypi.org/project/genson/ to summarize the JSON is an interesting idea
Currently I avoid reading the whole stream into memory at once, which means I can't easily do a two-step process that first reads the whole thing and then replays it later, but I'm sure that could be fixed
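genson (linked above) builds a proper JSON Schema from example objects, so the model could see the data's shape without the data itself. As a toy illustration of that idea, here is a stdlib-only sketch that infers a schema-like summary from a sample record; the field names are hypothetical, and genson would do this far more thoroughly:

```python
import json

def summarize(value):
    """Infer a minimal JSON-Schema-like summary of a value's structure.
    (A toy sketch of what the genson library does properly.)"""
    if isinstance(value, dict):
        return {"type": "object",
                "properties": {k: summarize(v) for k, v in value.items()}}
    if isinstance(value, list):
        items = [summarize(v) for v in value]
        return {"type": "array", "items": items[0] if items else {}}
    if isinstance(value, bool):  # check bool before int: bool subclasses int
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "integer"}
    if isinstance(value, float):
        return {"type": "number"}
    if value is None:
        return {"type": "null"}
    return {"type": "string"}

# Hypothetical sample record shaped like a GitHub issue
sample = {"user": {"login": "simonw"}, "title": "Bug report", "comments": 3}
print(json.dumps(summarize(sample), indent=2))
```

A summary like this stays small regardless of how large the JSON stream is, so it sidesteps the streaming concern above: only a sample record needs to be buffered.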