SQL schemas should have documentation strings

One thing I spend a lot of time doing at work is picking apart applications written by people who have come before me and left leaving nary a trace of their intentions. Software developers should write more comments. It’s either a form of hubris to think that you will be the last person to see your code and you will never need to reread it, or a form laziness to simply not care and take the time to type some simple, high-level documentation into your code.

However, occasionally it’s not the developer’s fault that there are no comments. Sometimes commenting just doesn’t make as much sense. One such place is in SQL schemas. SQL tables and views are generally defined by typing in some SQL code at a command prompt. Alternatively the code can be written to a .sql file and run as a script. In either case, comments can be added through the standard “/* */” syntax available in C, but the comments aren’t saved with the schema in the database. If one saves the .sql file it can be referred to, but only if a .sql file was used and it was saved, which is often not the case. So when exploring a new database for the first time, one is often reduced to divining how it worked by reading table and column names and parsing the source code which is supposed to make use of it.

This is a pain in the ass. Sure, everyone would like to think that their database schema is simple and self documenting; that each table has a descriptive name saying exactly what its purpose is. The thing is, everyone is usually wrong about that. When a schema is first created, it might be simple and clear, but as time goes on it generally accumulates cruft, like temporary tables meant to solve a short term problem but never removed, deprecated tables and views that haven’t been removed because of the hassle of removing a few lingering queries in the code, denormalized views, etc, all of which complicates the schema. One of the most annoying, is a temporary table which was named something like “temp_2009” but was later changed to a permanent part of the database *but still has the temp name because it’s too much hassle to refactor the entire application to use a different one. If one doesn’t already know what’s going on, it can be difficult to parse an old schema.

A simple solution would be to just add documentation strings to SQL. In some programming languages such as Lisp or Smalltalk, one can assign a special strings to any object which explicates that object’s purpose. The strings can be queried at run time so one doesn’t have to dig up the source file to know the purpose of a function, macro, or variable. Why this is not a feature in the SQL standard is beyond me. It would be so simple to store documentation strings in the table which already stores table and schema information in most SQL databases. It would involve literally adding only one column to said tables. Better yet, store documentation strings for tables, views, and optionally for each column of each table. Make the adding of the string part of the table definition:

$ create table foo "This table contains foos" (
    alpha varchar(25) "The first field of a foo",
beta numeric "The second field of a foo"
);
=> Table created.

$ documentation foo
=> "This table contains foos"

And everything is set. We can now specify, in English what a table is used for, and how it’s used, and what part of the application uses it, without future developers having to resort to grepping for everything. Even better, this is programmatically accessible so applications can query an fetch these strings as well, which would be useful for database administration applications or for applications like Cost Point which are little more than a thin interface layered over a database schema. Considering how much support SQL offers for validating data, from types to check constraints, a little bit of high level explanation would seem to make sense. As I already said, why this isn’t already a feature, is beyond me.