I’m going to expand on something I’ve talked about before. The idea of Unix’s supposed simplicity, and how Unix has deviated from it over the years, rather fascinates me.

Some years ago I remember reading The Art of Unix Programming by the vociferous Eric Raymond. I remember this book making a strong impact on how I thought about system design and the writing of new programs. TAoUP is not, by itself, a revolutionary book. Rather, it is a collection of received wisdom regarding the design of the Unix operating system and of programs intended to be run in the Unix environment. I think that the most important idea put forward in the book is the notion that Unix, rather than simply being a platform on which to run large complicated programs, is a collection of smaller programs, unified by a few metaphors. Specifically, the notion that ‘everything is a file’ and the pipe metaphor which is built on top of that are the glue which holds Unix together. Unix is a collection of small programs, each of which does one thing well, and these programs can be combined, using pipes and shell scripting, to create far more complicated and functional systems. Having small, simple programs makes them easier to debug and get right, while being able to compose them gives application developers a toolkit denied to developers on other systems.

This mode of development is a fundamentally sound idea, I think. There are a few drawbacks, such as the incidental complexity of having so many little tools to work with, as well as the fact that each one used becomes yet another dependency to manage in your application.1 Generally, though, composing larger programs out of simpler parts is a fundamental principle of software development. Arguably, the majority of advancement in programming language design has been in finding new and better ways of doing just this. First we introduced the subroutine, then structured programming had the procedure, OOP came up with the object, and now we’re talking about using functions. Each step of the way we ended up with improvements to our ability to compose programs out of smaller parts of programs and to reuse old code. Concepts like polymorphism and code reuse are core to programming in general.

So, having the ability to reuse things like grep and sed, in conjunction with whatever small applications you write, using pipes and FIFOs and whatnot, is an obvious Good Thing™. Yet, as powerful as the abstractions and metaphors which Unix provides are, they aren’t nearly as powerful as the abstractions provided by ‘actual’ programming languages. Furthermore, as I mentioned above, with constant innovation in programming language design, the gap is widening.

I remember at one point attempting to work with the Scheme Shell (Scsh). I was interested in Lisp at the time and Scsh seemed like a neat idea. I ended up finding it a little impractical for my uses (I was looking for an interactive shell, and Scsh is definitely not that), but reading Olin Shivers’s paper on his shell, A Scheme Shell, was illuminating, this quote especially:

The really compelling advantage of shell languages over other programming languages is the first one mentioned above. Shells provide a powerful notation for connecting processes and files together. In this respect, shell languages are extremely well-adapted to the general paradigm of the Unix operating system. In Unix, the fundamental computational agents are programs, running as processes in individual address spaces. These agents cooperate and communicate among themselves to solve a problem by communicating over directed byte streams called pipes. Viewed at this level, Unix is a data-flow architecture. From this perspective, the shell serves a critical role as the language designed to assemble the individual computational agents to solve a particular task.

As a programming language, this interprocess “glue” aspect of the shell is its key desireable feature. This leads us to a fairly obvious idea: instead of adding weak programming features to a Unix process-control language, why not add process invocation features to a strong programming language?

The key point that Prof. Shivers is getting at here is that shell scripts are programs where the fundamental units of computation are small programs rather than functions or procedures. On the one hand, shell scripting is hugely powerful because the ability to compose applications unifies your entire system rather than just the components in your favorite language and its available libraries. On the other hand, shell scripting is fundamentally flawed because the abstractions generally available in shell scripts are inferior to those in proper programming languages. Prof. Shivers attempted to unify these worlds by writing Scsh, which adds Unix shell features to a proper programming language, Scheme.
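To make the idea of adding process invocation to a general-purpose language concrete, here is a minimal sketch in Python rather than Scheme. It is not how Scsh does it; it only illustrates the same shape. The two child programs are hypothetical stand-ins for small Unix tools (an upper-casing filter and a sorter), run through the current Python interpreter so that the example doesn’t assume any particular utilities are installed.

```python
import subprocess
import sys

# Hypothetical stand-ins for two small Unix tools. Each one runs as a
# separate process and only reads stdin and writes stdout.
UPPER = [sys.executable, "-c",
         "import sys; sys.stdout.write(sys.stdin.read().upper())"]
SORT = [sys.executable, "-c",
        "import sys; sys.stdout.writelines(sorted(sys.stdin))"]


def run_pipeline(text: str) -> str:
    """Send text through the equivalent of `upper | sort` and return the result."""
    first = subprocess.Popen(UPPER, stdin=subprocess.PIPE,
                             stdout=subprocess.PIPE, text=True)
    # The second process reads directly from the first one's stdout pipe.
    second = subprocess.Popen(SORT, stdin=first.stdout,
                              stdout=subprocess.PIPE, text=True)
    # Drop our copy of the read end so only the second process holds it.
    first.stdout.close()

    # Fine for small inputs; large inputs would need concurrent writing.
    first.stdin.write(text)
    first.stdin.close()

    output, _ = second.communicate()
    first.wait()
    return output


if __name__ == "__main__":
    print(run_pipeline("pear\napple\nfig\n"), end="")
```

The plumbing is noticeably wordier than `upper | sort` would be in a shell, which is the trade Shivers describes: the host language brings real functions and data structures, while the shell brings the compact notation for wiring processes together.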

Ultimately, although Scsh gets some use, it hasn’t actually supplanted the Bourne Shell.2 That’s not to say, however, that shell scripting hasn’t been supplanted. Outside of a few traditional uses such as init scripts,3 most of what was once done with Bourne or Bash now tends to be done with something like Perl or one of its spiritual successors such as Ruby or Python. In the end, Olin Shivers’s plan, “instead of adding weak programming features to a Unix process-control language, … add process invocation features to a strong programming language,” worked. It’s just that Perl beat him to the punch. Though, I think that there is more to Perl’s success than just that.

Those of you who know your history,4 will remember that Perl started life as an improved version of Awk. Its programmatic underpinnings were not that strong, at least at first. They did improve over time. The thing that really pushed Perl over the edge, though, wasn’t the introduction of objects or references, but the introduction of modules. You see, the real problem with shell scripting isn’t the inadequacy of the Bash equivalent of a for loop, but the limitations of using pipes and other Unix forms of IO redirection to compose programs. Perl modules solved this problem.

To illustrate what I mean, try this thought experiment: how would one interact with a relational database from a script using only the normal Unix metaphors? There were a number of approaches, depending on the tools your database happened to provide, but commonly what you might do was write a script with Expect. Expect is a DSL for driving shells and other terminal applications. You’d start up your database shell and your script would sort of talk to it by watching for strings corresponding to prompts and outputs to appear. If this sounds error-prone, it was, and so were the other options: sending SQL strings directly and parsing the output manually using a combination of Awk and Sed. If you’ve done any programming with a modern programming language, however, you’ll note that working with databases absolutely does not work this way anymore. Instead, your language likely has a pluggable API. Perl in particular has DBI, which defines a stable API against which a programmer can write Perl scripts. The database can be interacted with at the level of a programming language rather than the level of a text stream, and this is a very good thing. A well-defined and stable binary API nearly always beats a poorly defined text API.5
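The contrast can be sketched in a few lines. The example below uses Python’s built-in sqlite3 module as a stand-in for Perl’s DBI, purely because it runs without a database server; the `parts` table, its contents, and the pipe-delimited dump format are all invented for illustration.

```python
import sqlite3

# Hypothetical table and data, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parts (name TEXT, qty INTEGER)")
conn.executemany(
    "INSERT INTO parts VALUES (?, ?)",
    [("bolt", 3), ("nut|washer", 7)],
)


def query_via_api(conn):
    """Typed rows straight from the driver; no text parsing involved."""
    return conn.execute("SELECT name, qty FROM parts ORDER BY qty").fetchall()


def dump_as_text(conn):
    """Imitate a database shell printing pipe-delimited rows."""
    return "\n".join(f"{name}|{qty}" for name, qty in query_via_api(conn))


def parse_text(dump):
    """Recover rows the way an Awk or Sed script might: split on the delimiter."""
    rows = []
    for line in dump.splitlines():
        fields = line.split("|")
        # Guess that the first field is the name and the last is the quantity.
        rows.append((fields[0], fields[-1]))
    return rows


api_rows = query_via_api(conn)
text_rows = parse_text(dump_as_text(conn))

if __name__ == "__main__":
    print(api_rows)
    print(text_rows)
```

Through the API, the second row comes back with its name intact and its quantity as an integer. Through the text route, the name containing the delimiter is silently truncated to `nut` and every quantity is a string, which is the kind of fragility the Expect and Awk approaches suffered from.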

And that’s what Perl provides that makes it so popular. In fact, it’s famous for this. Using CPAN allows one to spend one’s time composing modules rather than applications, without in turn cutting one off from those applications if needed. Perl works fine as a tool for composing applications, and actually integrates that use into its syntax better than nearly any other programming language, but the fact that later scripting languages de-emphasize this feature suggests to me that the computing community in general has implicitly accepted my premise: composing modules with procedure calls is superior to composing programs with IO redirection. The adoption of Perl signified a huge shift away from the default metaphors of the Unix environment, and later scripting languages have continued the shift. Those metaphors were fine for a while, even revolutionary, but they eventually proved inadequate. Attempts to improve upon them, such as Plan 9 from Bell Labs and its successors, ultimately failed to gain traction, but the use of Perl, layered on top of those Unix abstractions, won out in the end.

That said, there is one significant disadvantage to this approach: there is a lot of duplicate effort in the modules between different programming languages and libraries. Some of this is unavoidable, since one often wants an API adapted to the semantics of one’s chosen programming language, and some of it is illusory, seeing as many modules are really just FFI bindings to a lower-level C library, but much of it is real. As a result, we’ve lost one of the chief advantages of the old Unix way: genuine polyglot programming. Pipes don’t care in what language your application is written. This of course leads to an otherwise unnecessary separation of effort, as well as an increase in the complexity of a complete system in terms of the number of moving parts. It’s a small cost, I think, given the gain, but one that could be avoided if we could somehow make a number of major changes to Unix, standardizing the inputs and outputs of programs. I think I’ll ruminate on this more at a later date.

