Can modern screen readers read academic papers that are published as two column PDFs? Do they know how to separate out the two columns?
-
Raphael Fetzer :kirby: replied to Simon Willison
@simon You can specify the reading order in a PDF document so the screen reader can follow it and doesn’t need to guess.
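One rough way to check whether a given PDF carries that reading-order information is to look for the tagged-PDF markers in its document catalog: a `/StructTreeRoot` entry and a `/MarkInfo` dictionary with `/Marked` set to true. The sketch below is only an illustration under stated assumptions: `looks_tagged` and `check_file` are hypothetical helper names, and `check_file` assumes the third-party `pypdf` package is installed. Finding the markers only shows that a structure tree exists, not that its reading order is correct for a two column layout.

```python
from typing import Any, Mapping


def _as_bool(value: Any) -> bool:
    # pypdf wraps PDF booleans in an object exposing `.value`;
    # plain Python bools have no such attribute and are used as-is.
    return getattr(value, "value", value) is True


def looks_tagged(catalog: Mapping[str, Any]) -> bool:
    """Return True if the catalog has both tagged-PDF markers.

    `catalog` is the PDF document catalog (the /Root dictionary),
    or any plain dict with the same keys.
    """
    if "/StructTreeRoot" not in catalog:
        return False
    if "/MarkInfo" not in catalog:
        return False
    mark_info = catalog["/MarkInfo"]
    return "/Marked" in mark_info and _as_bool(mark_info["/Marked"])


def check_file(path: str) -> bool:
    # Assumption: pypdf is installed (pip install pypdf).
    # Imported here so looks_tagged() stays usable without it.
    from pypdf import PdfReader

    reader = PdfReader(path)
    return looks_tagged(reader.trailer["/Root"])
```

Calling `check_file("paper.pdf")` on a downloaded copy (the filename is a placeholder) would report whether the markers are present; a screen reader test is still the only real confirmation that the columns are read in the right order.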
-
Simon Willison replied to Raphael Fetzer :kirby:
@pheraph that’s reassuring! Do you know if published papers tend to do that? Any way for me to tell if this one works properly? https://storage.googleapis.com/gweb-research2023-media/pubtools/1004848.pdf
-
Simon Willison replied to Simon Willison
As an experiment I downloaded the two column PDF of this new paper from Google Research, "SQL Has Problems. We Can Fix Them: Pipe Syntax In SQL" https://research.google/pubs/sql-has-problems-we-can-fix-them-pipe-syntax-in-sql/
... and uploaded it to Google AI Studio and told Gemini 1.5 Pro "Convert this document to neatly styled semantic HTML" - and the results were pretty good! https://static.simonwillison.net/static/2024/Pipe-Syntax-In-SQL.html
-
@simon I'd be really worried about both hallucination and prompt injection when using an LLM for document conversion, as an accessibility tool for blind or other disabled users. But the tools I've tried on this paper do worse than what you got out of Gemini.
-
@matt yeah, me too. The responsible way to do this would be to use Gemini Pro to create the first draft, then spend significant time and effort checking and verifying it, iterating on the prompts, porting across the figures, etc.