I added multi-modal (image, audio, video) support to my LLM command-line tool and Python library, so now you can use it to run all sorts of content through LLMs such as GPT-4o, Claude and Google Gemini
-
@aburka from my post:
-
@simon I mean, they've been evaluated, they're not suitable. What's left to explore?
-
@aburka since you ask, I did dig around in one of the papers underlying the other story and found it was partly about how much better Whisper v3 was compared to v2 https://fedi.simonwillison.net/@simon/113380266881069878
-
@aburka generally though the most important thing about using LLMs (and AI/machine learning models in general) is figuring out how to make effective and responsible use of inherently unreliable technology
Generating unreviewed medical transcripts and then throwing away the original recordings is NOT responsible
-
@simon Does video work? I tried both Gemini Pro and Flash, but I only got an error message. Do I need a paid account to use video scraping? (Image works as expected.)
-
@xsc video should work, what file format were you trying? Currently needs to be less than 20MB - that's a temporary limitation of my llm-gemini plugin
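For anyone who wants to rule out the size limit before sending a file, here is a minimal sketch of a pre-flight check. It is not part of llm or llm-gemini: the helper name `attachment_fits` is hypothetical, and reading "20MB" as 20 * 1024 * 1024 bytes is an assumption, since the plugin's exact threshold is not stated in this thread.

```python
from pathlib import Path

# Assumption: "less than 20MB" is read as 20 * 1024 * 1024 bytes.
# The real cutoff used by the plugin may differ slightly.
MAX_ATTACHMENT_BYTES = 20 * 1024 * 1024


def attachment_fits(path, limit=MAX_ATTACHMENT_BYTES):
    """Return True if the file at `path` is strictly smaller than `limit` bytes."""
    return Path(path).stat().st_size < limit
```

Calling `attachment_fits("man-in-water.mp4")` on a 5 MB file would return `True`, which suggests the size limit is not the cause of an error in that case.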
-
@simon I was using an MP4 of about 5 MB. The error just says "internal error". I downloaded the video from here: https://www.pexels.com/video/catching-and-releasing-a-big-carp-fish-in-the-lake-5538137/
-
@xsc I've seen a few of those "Internal error" messages too - I think it's Gemini being a little bit flaky, sometimes resubmitting works fine the second time
-
@florenciocano @djh @simon
Just not very useful for solving maths problems that haven't already been solved and scraped into the training data
https://youtu.be/8_Nr5oKIAmI
And students are supposedly using this to cheat on their homework?
-
@bornach @florenciocano @djh right - LLMs are notoriously bad at math (and logic puzzles too)
-
@simon I was using the following command
> llm 'please explain what is happening in the video' -a man-in-water.mp4 -m gemini-1.5-flash-latest
Does it look like it should work?
-
@xsc yes, if you have the llm-gemini plugin installed and configured with an API key
You could try using this script here (or using Google's AI Studio tool) to check it's not an LLM bug: https://til.simonwillison.net/llms/prompt-gemini
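Since resubmitting sometimes clears the "Internal error" mentioned earlier in the thread, a rough sketch of automating that retry might look like this. The helper names `run_with_retries` and `prompt_video` are hypothetical, the attempt count and delay are arbitrary, and it assumes the `llm` command exits with a non-zero status when the API call fails, which is worth verifying on your own setup.

```python
import subprocess
import time


def run_with_retries(run, attempts=3, delay=2.0):
    """Call run() until its result has returncode 0, at most `attempts` times.

    Returns the last result, whether it succeeded or not.
    """
    result = None
    for attempt in range(1, attempts + 1):
        result = run()
        if result.returncode == 0:
            return result
        if attempt < attempts:
            # Wait briefly before resubmitting the same request.
            time.sleep(delay)
    return result


def prompt_video():
    # Same command as quoted in this thread, passed as an argument list.
    # Assumption: llm is installed, with llm-gemini configured with an API key.
    return subprocess.run(
        [
            "llm",
            "please explain what is happening in the video",
            "-a", "man-in-water.mp4",
            "-m", "gemini-1.5-flash-latest",
        ],
        capture_output=True,
        text=True,
    )
```

Calling `run_with_retries(prompt_video)` resubmits the prompt up to three times; the returned object's `stdout` holds the model's answer on success and `stderr` holds the last error otherwise. If the same error comes back every time, flakiness is probably not the explanation.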