Unix, Awk, Perl, and Scsh

I’m going to be expanding on something I’ve talked about before. The idea of Unix’s supposed simplicity, and how Unix has deviated from it over the years, rather fascinates me.

Some years ago I remember reading The Art of Unix Programming by the vociferous Eric Raymond. The book made a strong impact on how I thought about system design and the writing of new programs. TAoUP is not, by itself, a revolutionary book. Rather, it is a collection of received wisdom regarding the design of the Unix operating system and of programs intended to run in the Unix environment. The most important idea put forward in the book, I think, is that Unix, rather than simply being a platform on which to run large complicated programs, is a collection of smaller programs unified by a few metaphors. Specifically, the notion that ‘everything is a file’, and the pipe metaphor built on top of it, are the glue that holds Unix together. Unix is a collection of small programs, each of which does one thing well, and these programs can be combined, using pipes and shell scripting, to create far more complicated and functional systems. Keeping programs small and simple makes them easier to debug and get right, while being able to compose them gives application developers a toolkit denied to developers on other systems.

This mode of development is a fundamentally sound idea, I think. There are a few drawbacks, such as the incidental complexity of having so many little tools to work with, and the fact that each one used becomes yet another dependency to manage in your application.1 Generally, though, composing larger programs out of simpler parts is a fundamental principle of software development. Arguably, the majority of advancement in programming language design has been in finding new and better ways of doing just this. First we introduced the subroutine, then structured programming gave us the procedure, OOP came up with the object, and now we’re talking about composing functions. Each step of the way brought improvements to our ability to build programs out of smaller parts and to reuse old code. Concepts like polymorphism and code reuse are core to programming in general.

So, having the ability to reuse things like grep and sed, in conjunction with whatever small applications you write, using pipes and FIFOs and whatnot, is an obvious Good Thing™. Yet, as powerful as the abstractions and metaphors Unix provides are, they aren’t nearly as powerful as the abstractions provided by ‘actual’ programming languages. Furthermore, as I mentioned above, with constant innovation in programming language design, the gap is widening.

I remember at one point attempting to work with the Scheme Shell (Scsh). I was interested in Lisp at the time, and Scsh seemed like a neat idea. I ended up finding it a little impractical for my uses (I was looking for an interactive shell, and Scsh is definitely not that), but reading Olin Shivers’s whitepaper on his shell, A Scheme Shell, was illuminating, this quote especially:

The really compelling advantage of shell languages over other programming languages is the first one mentioned above. Shells provide a powerful notation for connecting processes and files together. In this respect, shell languages are extremely well-adapted to the general paradigm of the Unix operating system. In Unix, the fundamental computational agents are programs, running as processes in individual address spaces. These agents cooperate and communicate among themselves to solve a problem by communicating over directed byte streams called pipes. Viewed at this level, Unix is a data-flow architecture. From this perspective, the shell serves a critical role as the language designed to assemble the individual computational agents to solve a particular task.

As a programming language, this interprocess “glue” aspect of the shell is its key desirable feature. This leads us to a fairly obvious idea: instead of adding weak programming features to a Unix process-control language, why not add process invocation features to a strong programming language?

The key point Prof. Shivers is getting at here is that shell scripts are programs in which the fundamental units of computation are small programs rather than functions or procedures. On the one hand, shell scripting is hugely powerful, because the ability to compose applications unifies your entire system rather than just the components written in your favorite language and its available libraries. On the other hand, shell scripting is fundamentally flawed, because the abstractions generally available in shell scripts are inferior to those in proper programming languages. Prof. Shivers attempted to unify these worlds by writing Scsh, which adds Unix shell features to a proper programming language, Scheme.
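Shivers did this in Scheme, but the underlying idea, driving processes and pipes from within a full programming language, can be sketched in any language with process-control primitives. Here is a rough illustration in Python (standing in for Scheme; the two inline scripts are made-up stand-ins for small Unix utilities like grep):

```python
import subprocess
import sys

# Two tiny "programs", spawned as real processes and connected by a pipe.
# The inline scripts are hypothetical stand-ins for utilities like grep.
producer = subprocess.Popen(
    [sys.executable, "-c", "print('apple'); print('banana'); print('apricot')"],
    stdout=subprocess.PIPE,
)
consumer = subprocess.Popen(
    [sys.executable, "-c",
     "import sys\n"
     "for line in sys.stdin:\n"
     "    if line.startswith('a'):\n"
     "        sys.stdout.write(line)"],
    stdin=producer.stdout,
    stdout=subprocess.PIPE,
    text=True,
)
producer.stdout.close()  # the consumer now owns the read end of the pipe
output, _ = consumer.communicate()
print(output)  # apple, apricot
```

The pipeline here is an ordinary value in the host language, so it can be composed with real functions, data structures, and error handling, which is exactly the leverage the shell lacks.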

Ultimately, although Scsh gets some use, it hasn’t actually supplanted the Bourne Shell.2 That’s not to say, however, that shell scripting hasn’t been supplanted. Outside of a few traditional uses such as init scripts,3 most of what was once done with Bourne or Bash now tends to be done with something like Perl or one of its spiritual successors, such as Ruby or Python. Ultimately, Olin Shivers’s plan to, “instead of adding weak programming features to a Unix process-control language, … add process invocation features to a strong programming language” worked. It’s just that Perl beat him to the punch. I think, though, that there is more to Perl’s success than that.

Those of you who know your history,4 remember that Perl started life as an improved version of Awk. Its programmatic underpinnings were not that strong, at least at first, though they did improve over time. The thing that really pushed Perl over the edge, though, wasn’t the introduction of objects or references, but the introduction of modules. You see, the real problem with shell scripting isn’t the inadequacy of the Bash equivalent of a for loop, but the limitations of using pipes and other Unix forms of IO redirection to compose programs. Perl modules solved this problem.

To illustrate what I mean, try this thought experiment: how does one interact with a relational database from a script using only the normal Unix metaphors? There were a number of approaches, depending on the tools your database happened to provide, but commonly you might write a script with Expect, a DSL for driving shells and other terminal applications. You’d start up your database shell, and your script would sort of talk to it, watching for strings corresponding to prompts and outputs to appear. If this sounds error prone, it was, and so were the other options, such as sending SQL strings directly and parsing the output manually with a combination of Awk and Sed. If you’ve done any programming in a modern programming language, however, you’ll note that working with databases absolutely does not work this way anymore. Instead, your language likely has a pluggable API. Perl in particular has DBI, which defines a stable API against which a programmer can write Perl scripts. The database can be interacted with at the level of the programming language rather than at the level of a text stream, and this is a very good thing. A well-defined and stable binary API nearly always beats a poorly defined text API.5
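To make the contrast concrete, here is a minimal sketch of the API style of database access, using Python’s built-in sqlite3 module as a stand-in for Perl’s DBI (an in-memory database, so it runs anywhere):

```python
import sqlite3

# The database is driven through a stable programmatic API: no prompts
# to watch for, no output to scrape with Awk and Sed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES (?)", ("Larry",))
conn.execute("INSERT INTO users (name) VALUES (?)", ("Olin",))

# Results come back as structured values, not as a text stream to parse.
rows = conn.execute("SELECT id, name FROM users ORDER BY id").fetchall()
print(rows)  # [(1, 'Larry'), (2, 'Olin')]
conn.close()
```

DBI works analogously: in both cases, queries and results cross the boundary as structured values rather than as terminal text whose format you have to reverse engineer.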

And that’s what Perl provides that makes it so popular; in fact, it’s famous for this. CPAN allows one to spend one’s time composing modules rather than applications, without in turn cutting one off from those applications if needed. Perl works fine as a tool for composing applications, and actually integrates that use into its syntax better than nearly any other programming language, but the fact that later scripting languages de-emphasize this feature suggests to me that the computing community in general has implicitly accepted my premise: composing modules with procedure calls is superior to composing programs with IO redirection. The adoption of Perl signified a huge shift away from the default metaphors of the Unix environment, and later scripting languages have continued that shift. Those metaphors were fine for a while, even revolutionary, but they eventually proved inadequate. Attempts to improve upon them, such as Plan 9 from Bell Labs and its successors, ultimately failed to gain traction, but Perl, layered on top of those Unix abstractions, won out in the end.
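The premise can be shown in miniature. Below, the same job, extracting a field from some structured data, is done once with a procedure call into a module and once by spawning a process and scraping its text output (the inline script is a hypothetical stand-in for some Unix utility):

```python
import json
import subprocess
import sys

# Module composition: a procedure call returns structured data directly.
record = json.loads('{"name": "alice", "id": 7}')
api_result = record["id"]

# IO-redirection composition: run a program and parse its text output.
# The inline script is a made-up placeholder, not a real utility.
out = subprocess.run(
    [sys.executable, "-c", "print('name=alice id=7')"],
    capture_output=True, text=True,
).stdout
fields = dict(pair.split("=", 1) for pair in out.split())
pipe_result = int(fields["id"])

print(api_result, pipe_result)  # both 7, but the second silently breaks
                                # if the output format ever drifts
```

Both routes produce the same answer, but only the first rests on a defined interface; the second depends on an output format the producing program never promised to keep.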

That said, there is one significant disadvantage to this approach: there is a lot of duplicated effort in the modules between different programming languages and libraries. Some of this is unavoidable, since one often wants an API adapted to the semantics of one’s chosen programming language, and some of it is illusory, seeing as many modules are really just FFI bindings to a lower-level C library, but much of it is real. As a result, we’ve lost one of the chief advantages of the old Unix way: genuine polyglot programming. Pipes don’t care what language your application is written in. This of course leads to an otherwise unnecessary duplication of effort, as well as an increase in the complexity of a complete system in terms of number of moving parts. It’s a small cost, I think, given the gain, but an unnecessary one if we could somehow make a number of major changes to Unix, standardizing the inputs and outputs of programs. I think I’ll ruminate on that more at a later date.

  1. This is somewhat mitigated by the fact that most of these tools should be available in your base system, of course. 
  2. Or Bash, really, these days. 
  3. But look at systemd and its equivalents! 
  4. Or were active programmers at the time; I don’t judge. 
  5. The one real advantage of a text API is that when it’s poorly defined, you can use standard tools such as a text editor and telnet to inspect it. Reverse engineering a binary API is more difficult. But make no mistake, if you have to read the text output of a program to figure out how to make a script which uses it, you are reverse engineering the API. You have no guarantee that your visual inspection of the program has resulted in a correct understanding of how it works. 

Last update: 19/01/2015
