Why I don’t like databases

I am not a big fan of using “real” databases for tasks where you don’t have a lot of data. By “real” I mean ones with a separate server process, like MySQL, and to a lesser extent embedded ones like SQLite or Berkeley DB. No weblog has that much data. This view normally causes controversy when I come up against one of the LAMP people who use MySQL for everything.
You see, to my mind, a real database process has two things that make it really useful. The first is that it’s got super-optimised access to data. This is important if you’ve got a lot of data. If you haven’t, then it doesn’t matter anywhere near as much, and indeed you may find that the database’s own overheads start to be significant. For example, the difference between grep '^title:' files/*.txt and select title from filelist where filetype = 'txt' (pulling those out of my head) is pretty much insignificant over a hundred files. Over a hundred thousand, or a hundred million, maybe that would be different. However, no-one’s working with data on those scales on a weblog, as far as I’m aware. The Technorati people and so on probably are, but then I bet they’re using a database.
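To make the grep side of that comparison concrete, here is a rough Python sketch of the same scan. The function name grep_titles and the "title:" line convention are just assumptions for illustration, like the grep command itself; this is not how any particular weblog tool does it.

```python
from pathlib import Path


def grep_titles(directory):
    """Roughly what grep '^title:' files/*.txt does, as (filename, line) pairs."""
    matches = []
    # sorted() gives a stable order, much like the shell's glob expansion.
    for path in sorted(Path(directory).glob("*.txt")):
        with path.open(encoding="utf-8", errors="replace") as handle:
            for line in handle:
                # '^title:' anchors the match at the start of the line.
                if line.startswith("title:"):
                    matches.append((path.name, line.rstrip("\n")))
    return matches
```

Over a hundred small files this is a linear scan that finishes quickly enough not to matter; it is only at much larger scales that an index would start to earn its keep.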
The second reason that databases are great is SQL. Again, if you’re using complicated SQL queries, then it’s an amazingly powerful tool. If, on the other hand, all your queries are pretty simple ones, such as select name,time,content from articles where year = 2004 and month = 01, then the overhead of setting up a database and keeping it running is a lot to pay to avoid the equivalent file management statement. This is why I like and use pyblosxom: it fits this philosophy down to a tee. Use the filesystem as your database: to add a new entry, just drop a file in a directory. Easy. You don’t need a special interface to do it; you can go in and edit files any time you like.
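As a sketch of what the "equivalent file management statement" might look like, here is the year-and-month query done against a directory of files in Python. I am assuming one .txt file per entry, dated by its modification time; the function name entries_for_month and that layout are made up for the example, and it is not meant as a description of pyblosxom’s internals.

```python
import time
from pathlib import Path


def entries_for_month(directory, year, month):
    """Return (name, mtime, content) for entries last modified in the given month."""
    entries = []
    for path in Path(directory).glob("*.txt"):
        mtime = path.stat().st_mtime
        stamp = time.localtime(mtime)
        # The filesystem equivalent of "where year = ... and month = ...".
        if stamp.tm_year == year and stamp.tm_mon == month:
            content = path.read_text(encoding="utf-8")
            entries.append((path.stem, mtime, content))
    # Newest first, as a weblog front page would usually want.
    return sorted(entries, key=lambda entry: entry[1], reverse=True)
```

Calling entries_for_month("entries", 2004, 1) would then stand in for the select above, with "entries" being whatever directory you drop your files into.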
