SQL schemas should have documentation strings


One thing I spend a lot of time doing at work is pick­ing apart appli­ca­tions writ­ten by peo­ple who have come before me and left leav­ing nary a trace of their inten­tions. Soft­ware devel­op­ers should write more com­ments. It’s either a form of hubris to think that you will be the last per­son to see your code and you will never need to reread it, or a form lazi­ness to sim­ply not care and take the time to type some sim­ple, high­-level doc­u­men­ta­tion into your code.

How­ev­er, occa­sion­ally it’s not the devel­op­er’s fault that there are no com­ments. Some­times com­ment­ing just does­n’t make as much sense. One such place is in SQL schemas. SQL tables and views are gen­er­ally defined by typ­ing in some SQL code at a com­mand prompt. Alter­na­tively the code can be writ­ten to a .sql file and run as a script. In either case, com­ments can be added through the stan­dard “/* */” syn­tax avail­able in C, but the com­ments aren’t saved with the schema in the data­base. If one saves the .sql file it can be referred to, but only if a .sql file was used and it was saved, which is often not the case. So when explor­ing a new data­base for the first time, one is often reduced to divin­ing how it worked by read­ing ta­ble and col­umn names and pars­ing the source code which is sup­posed to make use of it.

This is a pain in the ass. Sure, every­one would like to think that their data­base schema is sim­ple and self documenting; that each ta­ble has a descrip­tive name say­ing exactly what its pur­pose is. The thing is, every­one is usu­ally wrong about that. When a schema is first cre­at­ed, it might be sim­ple and clear, but as time goes on it gen­er­ally accu­mu­lates cruft, like tem­po­rary tables meant to solve a short term prob­lem but never removed, dep­re­cated tables and views that haven’t been removed because of the has­sle of remov­ing a few lin­ger­ing queries in the code, denor­mal­ized views, etc, all of which com­pli­cates the schema. One of the most annoy­ing, is a tem­po­rary ta­ble which was named some­thing like “tem­p_2009” but was later changed to a per­ma­nent part of the data­base *but still has the temp name because it’s too much has­sle to refac­tor the entire appli­ca­tion to use a dif­fer­ent one. If one does­n’t already know what’s going on, it can be dif­fi­cult to parse an old schema.

A sim­ple solu­tion would be to just add doc­u­men­ta­tion strings to SQL. In some pro­gram­ming lan­guages such as Lisp or Smalltalk, one can assign a spe­cial strings to any object which expli­cates that objec­t’s pur­pose. The strings can be queried at run time so one does­n’t have to dig up the source file to know the pur­pose of a func­tion, macro, or vari­able. Why this is not a fea­ture in the SQL stan­dard is beyond me. It would be so sim­ple to store doc­u­men­ta­tion strings in the ta­ble which already stores ta­ble and schema infor­ma­tion in most SQL data­bas­es. It would involve lit­er­ally adding only one col­umn to said tables. Bet­ter yet, store doc­u­men­ta­tion strings for tables, views, and option­ally for each col­umn of each table. Make the adding of the string part of the ta­ble definition:

$ create table foo "This table contains foos" (
    alpha varchar(25) "The first field of a foo",
beta numeric "The second field of a foo"
);
=> Table created.

$ documentation foo
=> "This table contains foos"

And every­thing is set. We can now spec­i­fy, in Eng­lish what a ta­ble is used for, and how it’s used, and what part of the appli­ca­tion uses it, with­out future devel­op­ers hav­ing to resort to grep­ping for every­thing. Even bet­ter, this is pro­gram­mat­i­cally acces­si­ble so appli­ca­tions can query an fetch these strings as well, which would be use­ful for data­base admin­is­tra­tion appli­ca­tions or for appli­ca­tions like Cost Point which are lit­tle more than a thin inter­face lay­ered over a data­base schema. Con­sid­er­ing how much sup­port SQL offers for val­i­dat­ing data, from types to check con­straints, a lit­tle bit of high level expla­na­tion would seem to make sense. As I already said, why this isn’t already a fea­ture, is beyond me.

    Last update: 16/05/2013

    blog comments powered by Disqus