"Filesystems corrupt data"


He that is first in his own cause seemeth just; but his neighbour cometh and searcheth him.

Proverbs 18:17, The Bible

The other day I was going through different blog posts on my RSS feed reader, minding my business, when I came across an article called Why do We Need Databases and SQL. I have already made a post about SQL but I figured it would be nice to see if the author had any legitimate points. Let's dive in. 🏊

The claim

Reading the article was going well at first. He talked about how using files can work for your database, but as you get more users, thousands of people could be performing the same tasks simultaneously. This could lead to race conditions, where files can be modified or deleted in unexpected ways. He then claimed:

At this point, the simplicity of files becomes fragile. Imagine one user is updating a task at the exact moment another tries to delete it. Or maybe two users are editing the same task at the same time.

With a simple file system, you're likely to end up with corrupted or lost data because there's no inherent mechanism to handle such conflicts.

Hearing this claim got me thinking, "Can the problem of corrupted or lost data only be solved by an inherent mechanism to handle such conflicts? Has all hope been lost for the ubiquitous filesystem?"

The truth is that I never really knew how the file system would handle those cases. So I googled. I had to get to the bottom of it.

Google confirmed that his claim was true. But is SQL our only savior? Are we doomed without our precious president of data?

The investigation

On further googling, I came across an important concept called file locking:

File locking is a mechanism that restricts access to a computer file, or to a region of a file, by allowing only one user or process to modify or delete it at a specific time and to prevent reading of the file while it's being modified or deleted.

File locking, Wikipedia article

Hmm, interesting. That definition sounds a lot like what I need. Further investigation showed that Linux supports this feature through applications like flock, a terminal application for file locking that was already installed on my computer. flock is used to manage file locks from shell scripts.

There are two types of locking: advisory and mandatory. Mandatory locking is supposed to be a system-wide type of locking, but it's not advised because of its shortcomings. Advisory locking requires that all participating scripts work in coordination. This means that every script that writes to or reads from the files in question has to be called through flock. If a script makes its calls directly, however, the OS will not stop the operation.
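To make the "advisory" part concrete, here is a small sketch of what I mean. It assumes the flock utility from util-linux is installed; the temporary file and the variable names are made up for illustration. The shell holds a lock on the file, a cooperating process that asks for the lock is turned away, and a process that never asks writes straight through.

```shell
#!/bin/sh
# Sketch: an advisory lock only binds processes that ask for it.
data=$(mktemp)             # hypothetical data file

exec 9>>"$data"            # open the data file on descriptor 9
flock 9                    # take an exclusive advisory lock through it

# A cooperating process asks for the lock first.
# -n means "fail right away instead of waiting".
if flock -n "$data" true; then
    cooperating="got lock"
else
    cooperating="refused"
fi

# A process that skips flock writes directly; the OS does not stop it.
sh -c 'echo "direct write" >> "$1"' sh "$data"
direct=$(cat "$data")

flock -u 9                 # release the lock
exec 9>&-                  # close the descriptor

echo "cooperating process: $cooperating"
echo "file contents: $direct"
rm -f "$data"
```

The cooperating process is refused while the lock is held, but the direct write lands anyway. That is the whole deal with advisory locking: it only works if everybody plays along.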

To demonstrate, imagine I had a script called update_record.sh that operates on a data file called data_file. I could call it in the terminal with file locking like this:

# First process
$ flock data_file ./update_record.sh
# Runs script

If another process tries to call that same script, it must also use flock to lock data_file, so that the file is not touched until the first call is done:

# Second process
$ flock data_file ./update_record.sh
# Waits for previous call to update_record.sh to finish before it executes.
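
The waiting behaviour can be checked with a rough experiment. In this sketch, update_record.sh is a hypothetical stand-in that reads a number from the data file, adds one, and writes it back, which is exactly the kind of read-modify-write that breaks when two processes overlap. The temporary directory and the worker counts are assumptions for the demo, not anything from the original article.

```shell
#!/bin/sh
# Sketch: four workers each run a read-modify-write 25 times, always through flock.
workdir=$(mktemp -d)
data_file="$workdir/data_file"
echo 0 > "$data_file"

# Hypothetical update_record.sh: read the counter, add one, write it back.
cat > "$workdir/update_record.sh" <<'EOF'
n=$(cat "$1")
echo $((n + 1)) > "$1"
EOF

for worker in 1 2 3 4; do
    (
        i=0
        while [ "$i" -lt 25 ]; do
            # Each call waits here until no other call holds the lock.
            flock "$data_file" sh "$workdir/update_record.sh" "$data_file"
            i=$((i + 1))
        done
    ) &
done
wait                        # let every background worker finish

total=$(cat "$data_file")
echo "final count: $total"
rm -rf "$workdir"
```

Because every update goes through flock, the calls run one at a time and no increment is lost, so the counter should end at 100.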

However, if the second process calls the script directly, there is no protection:

# Second process
$ ./update_record.sh
# Doesn't wait for first call to complete. Data integrity is lost.

The process is so simple! Just call all your database-modifying scripts with flock.
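
If remembering to type flock every time sounds error-prone, one option is to let the code take the lock itself. The sketch below follows the file descriptor pattern shown in the flock man page; append_record is a hypothetical helper of my own, and the record format is made up.

```shell
#!/bin/sh
# Sketch: a helper that locks the data file itself, so callers cannot forget.
append_record() {
    # $1 = data file, $2 = record to append
    (
        flock 9 || exit 1              # wait for an exclusive lock on descriptor 9
        printf '%s\n' "$2" >> "$1"     # critical section: modify the file
    ) 9>>"$1"                          # descriptor 9 is the data file, opened for append
}

data_file=$(mktemp)                    # hypothetical data file
for id in 1 2 3 4 5; do
    append_record "$data_file" "task $id" &
done
wait

records=$(grep -c '^task ' "$data_file")
echo "records written: $records"
rm -f "$data_file"
```

The lock is released when the subshell exits and descriptor 9 is closed, so there is nothing to clean up by hand.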

What does flock get us over SQL?

The verdict

As we can see (from my calculations above), SQL is still unnecessary in such a scenario. There are old and proven ways to accomplish the many tasks that SQL performs. We as developers are the chefs; we just have to choose the recipe we're most comfortable with.

We must be careful of snake oil and snake-oil salesmen in software development (the Pharisees). They want everybody to put on the same burden as them, even though there are easier ways to manage data and databases.

The bottom line

We need databases, we don't need SQL. Embrace files and filesystems!

Further reading

If you would like to reply to or comment on this blog post, feel free to email me at efe@mmhq.me.