I need to build a big database of video filenames.

anne.💫

How can I do this?

Riley S. Faelan

@ann3nova How big, and what sort of queries do you want to run on them?

FWIW, on a modern computer, a lot of what would have been "databases" twenty years ago can make sense to implement in JSON/YAML instead, and just do linear scans through them.

anne.💫

@riley That kinda sounds more like what I'm wanting to do.

I would be grateful for any suggestions/tutorials.

Riley S. Faelan

@ann3nova Which programming language do you use?

In Ruby, I prefer to parse a YAML file like this:

require 'yaml'

...

# Open the file in text mode, parse it, and keep the result around.
$data = File.open 'my-fancy-database.yaml', 'r' do |f|
  YAML.load f.read
end

but for the most part, especially if you don't have to deal with the pesky text/binary file distinction of Windows (which, with YAML, you may be able to bypass anyway), you can often get away with just

require 'yaml'

...

$data = YAML.load File.read 'my-fancy-database.yaml'
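Once the data is loaded, a "query" can be as simple as a block run over every record. Here is a minimal sketch of such a linear scan; the sample records, the field names (filename, minutes, tags) and the find_videos helper are all made up for illustration, and the YAML is inlined so the snippet runs on its own:

```ruby
require 'yaml'

# Hypothetical sample data; in practice this would come from
# YAML.load File.read 'my-fancy-database.yaml'
SAMPLE_YAML = <<~YAML
  - filename: holiday-2019.mp4
    minutes: 42
    tags: [family, beach]
  - filename: conference-talk.mkv
    minutes: 55
    tags: [work]
  - filename: cat-compilation.webm
    minutes: 7
    tags: [cats, family]
YAML

VIDEOS = YAML.load SAMPLE_YAML

# Hypothetical helper: a query is a block applied to every record
# in turn (a linear scan); matching records are returned in order.
def find_videos(videos, &condition)
  videos.select(&condition)
end

family = find_videos(VIDEOS) { |v| v['tags'].include? 'family' }
puts family.map { |v| v['filename'] }
```

For a list that fits comfortably in memory, a scan like this is often fast enough that an index isn't worth the trouble.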

Depending on what your growth strategy looks like, you might, at some point, decide to switch over to Marshal.dump, which is pretty much the same idea, only with a binary representation and a faster implementation, or alternatively to SQLite. That, of course, doesn't mean you'll have to go all in on relational database theory; you can just split your big dataset up into smaller items, to be queried via SQLite. But then again, depending on how you can cut your dataset up, it might be workable to keep a couple dozen YAML or JSON files around, and use their names as their "primary indices".
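A rough sketch of two of those options: a Marshal round trip, and one YAML file per group with the file name serving as the key. The helper names, the sample records and the "2019" key are assumptions for illustration only. One thing to keep in mind is that Marshal.load should only ever be fed files you wrote yourself, since it can instantiate arbitrary objects.

```ruby
require 'yaml'
require 'tmpdir'

# Hypothetical records standing in for the parsed dataset.
RECORDS = [
  { 'filename' => 'holiday-2019.mp4', 'minutes' => 42 },
  { 'filename' => 'conference-talk.mkv', 'minutes' => 55 }
]

# Write the data in Marshal's binary format (hence the 'wb' mode).
def save_marshal(path, data)
  File.open(path, 'wb') { |f| Marshal.dump(data, f) }
end

# Read it back; only do this with files you trust.
def load_marshal(path)
  File.open(path, 'rb') { |f| Marshal.load(f) }
end

# One YAML file per group; the file name acts as the "primary index".
def save_group(dir, key, data)
  File.write(File.join(dir, "#{key}.yaml"), YAML.dump(data))
end

# Looking a group up is just opening the file named after its key.
def load_group(dir, key)
  YAML.load(File.read(File.join(dir, "#{key}.yaml")))
end

# Try both approaches in a throwaway directory.
Dir.mktmpdir do |dir|
  cache = File.join(dir, 'videos.marshal')
  save_marshal(cache, RECORDS)
  puts load_marshal(cache) == RECORDS

  save_group(dir, '2019', [RECORDS.first])
  puts load_group(dir, '2019').first['filename']
end
```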

Ölbaum

@ann3nova @riley I agree with Riley: without knowing what you’d like to accomplish, it’s difficult to advise you. Do you want a GUI? A command line tool? Plain files? There’s a tool that lets you query and join CSV files with SQL. It achieves this by seamlessly importing them into an SQLite database. It’s pretty cool and I can try to find it again.

Riley S. Faelan

@oscherler I think there's a CPAN module that does SQL queries directly on a bunch of CSV/TSV files. I used it many years ago for a university project.

Of course, you won't have any indices this way, so it's not great for performance, but it might help with system simplicity, depending on what you're doing. Consider that the classic Unix /etc/ is full of small tables of various sorts; they just don't have a single format; each one has its idiosyncrasies. If one was to build a roughly similar system, then having a primitive SQL layer over them, and a common table syntax, might come in quite handy.
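To give a feel for what a common table syntax over such files could look like, here is a minimal sketch in Ruby rather than SQL. It parses colon-separated text in the style of /etc/passwd into hashes and queries them with a linear scan, so there are still no indices. The sample text, the column names and the parse_table helper are illustrative assumptions, not an existing library:

```ruby
# Made-up sample in the style of /etc/passwd; not read from a real system.
SAMPLE_PASSWD = <<~TABLE
  # name:password:uid:gid:gecos:home:shell
  root:x:0:0:root:/root:/bin/sh
  anne:x:1000:1000:Anne:/home/anne:/bin/bash
TABLE

PASSWD_COLUMNS = %w[name password uid gid gecos home shell]

# Hypothetical helper: turn separator-delimited lines into an array
# of hashes keyed by column name, skipping blank and comment lines.
def parse_table(text, columns, separator: ':')
  text.each_line
      .map(&:chomp)
      .reject { |line| line.empty? || line.start_with?('#') }
      .map { |line| columns.zip(line.split(separator, -1)).to_h }
end

users = parse_table(SAMPLE_PASSWD, PASSWD_COLUMNS)

# Roughly "SELECT name FROM passwd WHERE shell = '/bin/bash'".
puts users.select { |u| u['shell'] == '/bin/bash' }.map { |u| u['name'] }
```

All values come back as strings, so numeric columns such as uid would need converting before comparing them as numbers.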

@ann3nova