Video scraping: extracting JSON data from a 35 second screen capture for less than 1/10th of a cent https://simonwillison.net/2024/Oct/17/video-scraping/
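For context, a minimal sketch of how that kind of video-to-JSON extraction could look if run through the Gemini API via the google-generativeai Python package (the filename and prompt below are placeholders, and the post itself worked through Google AI Studio's UI rather than code):

```python
import time
import google.generativeai as genai

genai.configure(api_key="...")  # needs a Gemini API key

# "screen-capture.mp4" is a placeholder for the 35 second recording
video = genai.upload_file(path="screen-capture.mp4")
while video.state.name == "PROCESSING":  # uploaded videos are processed asynchronously
    time.sleep(2)
    video = genai.get_file(video.name)

model = genai.GenerativeModel("gemini-1.5-flash")
response = model.generate_content([
    video,
    # placeholder prompt - describe whatever structure you want back
    "Extract the data shown on screen as a JSON array of objects",
])
print(response.text)
```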
-
@dbreunig I'm still frustrated that Anthropic don't release their tokenizer!
Gemini have an API endpoint for counting tokens but I think it needs an API key
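For reference, a minimal sketch of that endpoint via the google-generativeai Python package - the API key requirement is why it isn't quite a substitute for a local tokenizer:

```python
import google.generativeai as genai

genai.configure(api_key="...")  # counting tokens still requires a key

model = genai.GenerativeModel("gemini-1.5-flash")
# the count goes over the network rather than running locally
print(model.count_tokens("The quick brown fox jumps over the lazy dog").total_tokens)
```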
-
@simon Now that you mention it, I'm curious how different each platform is with tokens and how that might affect pricing (or just be a wash)
-
@dbreunig yeah it's frustratingly difficult to compare tokenizers, which sure makes price per million tokens less directly comparable
-
@dbreunig running a benchmark that processes a long essay and records the input token count for different models could be interesting though
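A rough sketch of that benchmark, assuming tiktoken for the OpenAI side and the Gemini countTokens endpoint for comparison (Anthropic's tokenizer isn't published, so it's left out):

```python
import tiktoken
import google.generativeai as genai

essay = open("essay.txt").read()  # any long essay

# OpenAI tokenizers run locally via tiktoken
enc = tiktoken.encoding_for_model("gpt-4o")
print("gpt-4o:", len(enc.encode(essay)))

# Gemini token counts come back from the countTokens API endpoint
genai.configure(api_key="...")
for name in ("gemini-1.5-pro", "gemini-1.5-flash"):
    print(name + ":", genai.GenerativeModel(name).count_tokens(essay).total_tokens)
```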
-
@simon At what environmental cost though?
-
@axleyjc I'd love to understand that more
In particular, how does the energy usage of running that prompt for a few seconds compare to the energy usage of me running my laptop for a few minutes longer to achieve the task by hand?
-
@axleyjc I was using Gemini Flash here which is a much cheaper, faster and (presumably) less energy intensive model than Gemini Pro
There's also the new Gemini Flash 8B, which is cheaper still, and the "8B" in the name suggests that the parameter count may be low enough that it could run on a laptop if they ever released the weights (I run open 8B models locally all the time)
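As an illustration of running an open 8B model locally, here's a sketch using llama-cpp-python - the GGUF filename and prompt are placeholders:

```python
from llama_cpp import Llama

# model path is a placeholder - any quantized 8B GGUF works
llm = Llama(model_path="llama-3.1-8b-instruct.Q4_K_M.gguf", n_ctx=8192)

output = llm.create_chat_completion(messages=[
    {"role": "user", "content": "Summarize this transcript in one sentence: ..."}
])
print(output["choices"][0]["message"]["content"])
```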
-
@simon great documentation ... Any details on accuracy? How much did you have to clean up the output and did you have to check it all by hand?
-
@th0ma5 I checked it all, didn't take long (I watched the 35s video and scanned the JSON) - it was exactly correct
-
@simon Is it also possible to calculate how much energy these things use, and some comparisons of what that's equivalent to? I hear that AI is energy intensive but I have zero concept of what that means in reality for a single "thing" like this.
-
@philgyford if that's possible I haven't seen anyone do it yet - the industry don't seem to want to talk specifics
GPUs apparently draw a lot more power when they are actively computing than when they are idle, so there's an energy cost associated with running a prompt that wouldn't exist if the hardware was turned on but not doing anything
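On a local NVIDIA GPU that difference is directly observable; a rough sketch using pynvml (numbers will vary, and this obviously says nothing about what happens inside Google's datacenters):

```python
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

def watts():
    # nvmlDeviceGetPowerUsage reports milliwatts
    return pynvml.nvmlDeviceGetPowerUsage(handle) / 1000

print("idle:", watts(), "W")
# ... run a prompt against a local model here, sampling watts() as it goes ...
print("busy:", watts(), "W")

pynvml.nvmlShutdown()
```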
-
Video scraping in Ars Technica! https://arstechnica.com/ai/2024/10/cheap-ai-video-scraping-can-now-extract-data-from-any-screen-recording/
-
@superFelix5000 @th0ma5 right - the single hardest thing about learning to productively work with LLMs is figuring out how to get useful results out of inherently unreliable technology
-
@simon Are you also considering that the low costs are illusory? They are artificially low, either because they are subsidized by VC money or, in Google's case, by advertising revenue. OpenAI is bleeding $ and not making a profit. Just given the energy and hardware costs, the current prices are unsustainable.
There's a serious risk that companies who have hard dependencies on cloud genai will get a big surprise bill down the road or have to pivot to a local model that's not as good.
-
@axleyjc I think about that all the time! It's one of the reasons I've been hesitant to commit to building substantial things on Gemini that assume the price will stay constant