Wikipedia:Reference desk/Archives/Computing/2014 September 23


September 23

If I invoke head -1 *.txt, I get something like

==> a.txt <==
file a line 1

==> b.txt <==
file b line 1

==> c.txt <==
file c line 1

But if I invoke the very similar-looking tail -1 *.txt, I am treated to the computing equivalent of a raspberry:

tail: option used in invalid context -- 1

Now, I do know most of the story here:

  1. the Posix-approved way to specify a number of lines other than the default 10 is -n 1, not -1
  2. the backwards-compatibility hack that lets you still use the old -1 notation is now only allowed on one file
  3. if you're a good little do-bee and use the new -n 1 notation, you then are allowed to do it on multiple files at once

What I don't quite remember is, why??? Why does Posix mandate this peculiar behavior? What purpose is served by restricting the obsolete notation to a single filename argument? If I can still get away with the old notation and multiple files with head, why can I not with tail? Is there an environment variable (sort of the inverse of POSIXLY_CORRECT) that will let GNU tail under Linux behave sensibly? (It does seem to behave sensibly on my Mac, where the shell utilities are I believe more bsdish.) I'm getting tired of having to type

for f in *.txt; do echo "==> $f <=="; tail -1 "$f"; echo; done

whenever this happens; must I retrain my fingers (after all these years) to use -n, or is there a better way? --Steve Summit (talk) 21:57, 23 September 2014 (UTC)

P.S. Despite my admittedly somewhat inflammatory tone, I am really just looking for an answer or two here; I am not trying to incite any flamewars. —scs

Nice rant :-) The -count option is considered obsolete; I suppose obsolete versions of head and tail must have been asymmetric, but it would be nice to see the history of that. I tend to use, e.g.,
  head -n13 *.txt
  tail -n13 *.txt
It is just one extra letter and is easily learned. --Mark viking (talk) 22:28, 23 September 2014 (UTC)
Or you could write a script named "tail" and put it in a directory that's in your PATH before /bin; it would just parse out the option arguments in the same way that standard "tail" does, identify the presence of a -count that's an option and not a filename, convert it to -n count, and invoke the real "tail". (Hmm. And now that I've thought of that, I'm tempted not only to do it for myself, but to do the same thing for "sort", so I never have to use the blasted -k option again.) --65.94.51.64 (talk) 22:58, 23 September 2014 (UTC)
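A minimal sketch of the wrapper idea above, written as a shell function rather than a script. The name tail_compat is made up for illustration. It assumes that only the plain numeric -COUNT form needs rewriting (forms such as -5f or +5 are passed through untouched), that -n, -c, -s, --lines and --bytes are the options whose value may arrive as a separate argument, and that rewriting should stop at -- or at the first filename.

```shell
# tail_compat: hypothetical helper, not part of coreutils.
# Rewrites a bare "-COUNT" option as "-n COUNT", then runs the real tail.
# Note: plain sh has no local variables, so these names are global.
tail_compat() {
    remaining=$#    # arguments not yet examined
    in_options=1    # set to 0 after "--" or the first file name
    is_value=0      # 1 when the next argument is an option's value
    while [ "$remaining" -gt 0 ]; do
        # Take the first argument, then append its (possibly rewritten)
        # form to the end; after one full pass the list is rebuilt in order.
        arg=$1
        shift
        remaining=$((remaining - 1))
        if [ "$is_value" -eq 1 ]; then
            is_value=0
            set -- "$@" "$arg"
        elif [ "$in_options" -eq 0 ]; then
            set -- "$@" "$arg"
        else
            case $arg in
                --)                         # explicit end of options
                    in_options=0
                    set -- "$@" "$arg" ;;
                -n|-c|-s|--lines|--bytes)   # value follows separately
                    is_value=1
                    set -- "$@" "$arg" ;;
                -*[!0-9]*)                  # any other option: unchanged
                    set -- "$@" "$arg" ;;
                -[0-9]*)                    # obsolete -COUNT: rewrite
                    set -- "$@" -n "${arg#-}" ;;
                *)                          # first file name (or "-")
                    in_options=0
                    set -- "$@" "$arg" ;;
            esac
        fi
    done
    command tail "$@"
}
```

With this loaded, tail_compat -1 *.txt ends up running tail -n 1 *.txt. If it were saved as a script named tail earlier in PATH, it would have to call the real binary by absolute path to avoid invoking itself, and that path varies by system.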
Interesting you should mention sort and -k. Mark viking asserts that tail -n is "easily learned", but I wasn't kidding about the for f in *.txt invocation, which I've been using for years whenever I'm confronted with a Posix-compliant tail(1); I literally only learned that -n specifically was the approved alternative today. I did finally "learn" sort -k a year or so ago, when too many of the systems I used stopped supporting the admittedly clumsy old 0-based +# options that my fingers were used to, but dammit, my fingers were used to them, and I still have to slow down to about half my usual typing speed to stutter out that "correct" -k. We're dealing with lower spinal cord reflexes, here; it's like asking Mario Andretti to race in a car where you've gratuitously reversed the brake and clutch pedals, a rearrangement which is also easily learned.
But, anyway, back to the questions. Why does Posix mandate this peculiar behavior? Why is it this way for tail and not head? Is there an environment variable that will let GNU tail under Linux behave sensibly? Steve Summit (talk) 23:15, 23 September 2014 (UTC)
Here are the POSIX.1-2008 specifications: head, tail. No behavior is defined for tail with more than one file argument, regardless of the options. So there is a gratuitous inconsistency between head and tail in the spec, but it isn't the one you noticed in the GNU implementations; POSIX doesn't require that tail -# fail with more than one file or that tail -n# succeed. head, for its part, doesn't support the -c option or negative counts to count from the end of the file, even though they would both make as much sense as they do with tail. The specs are surprisingly different. I can only assume that early UNIX developers added features to the command-line utilities as they needed them, with no grand vision, and POSIX chose to document the behavior rather than try to make it consistent. The head page says "There is no -c option because it is not historical practice and because other utilities in this volume of POSIX.1-2008 provide similar functionality", but the other differences are not explained.
The documentation for GNU tail is here. The illegality of -# with more than one file is mentioned but not explained. There's no apparent way to change this behavior. Setting _POSIX2_VERSION doesn't seem to work (not surprising, as the multi-file syntax was never in any version of POSIX). Even sillier is that +# does work with multiple files. You could report this as a bug, but I suspect they'd just close it since they're not technically doing anything wrong. Older versions of GNU tail are documented as supporting multiple files with -# ([1]), so someone apparently made a deliberate choice to break backward compatibility. You might be able to figure out why from old mailing list archives or commit messages. -- BenRG (talk) 03:38, 24 September 2014 (UTC)
can't explain these geeky historical nuances, but let me get this straight. you'd rather type, what, fifty extra characters for that on-the-fly for loop to run tail on multiple files than just learn and use the right -n option? nothing personal, but that's some kind of stubbornness! 71.174.182.157 (talk) 11:51, 24 September 2014 (UTC)
Thanks very much for all that research, Ben. I knew the situation was weird, but I didn't know it was that weird. I don't know why I thought Posix mandated the peculiar behavior, but I'm glad to have that misbelief dispelled.
It appears that the main issue is that someone was worried about the gradual erosion of legal filenames implied by tail's even-more-nonstandard +# option, but the deprecation of that form seems to have swept up -# in the same net. (Me, I'd say that anyone who's crazy enough to give their files names like "+5" and "-5" is just going to have to know how to call them "./+5" and "./-5" when necessary, and once they do, it doesn't matter if head or tail or any other given program otherwise can't accept them as filenames. Or, oh yeah, speaking of modern standards, you can always use -- to signal an end to your option flags and the beginning of pure filename arguments.)
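A small sketch of both escape hatches mentioned above, run in a scratch directory so the oddly named files are harmless. The file contents and variable names are invented for illustration, and mktemp -d is assumed to be available.

```shell
# Create two awkwardly named files in a throwaway directory.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'one\ntwo\n' > ./-5
printf 'three\nfour\n' > ./+5

by_path=$(tail -n 1 ./-5)      # a leading "./" keeps the dash out of first position
by_marker=$(tail -n 1 -- -5)   # "--" ends option parsing, so -5 is a file name
plus_path=$(tail -n 1 ./+5)    # same trick for the "+5" file

echo "$by_path $by_marker $plus_path"
```

Either spelling gets the last line of each file without tail mistaking the name for a count.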
65: A shell script wrapper is a good idea -- lord knows my personal bin directory is littered with various bandaids on top of standard utilities that have become broken, er, evolved against my wishes over the years -- but in this case I think I'll just download a copy of GNU coreutils and fix it instead. (The ability to do so is, after all, one of the heralded benefits of open source software!)
Finally, 71, yes, I can be pretty stubborn about such things, but it's like this: I guess I don't try to tail multiple files with an other-than-10 line count that often, and anyway there are plenty of versions of tail floating around that do behave sensibly even for the problematic case(s), but whenever I've encountered one that didn't, I was probably always in the middle of solving some real problem such that, yes, it was incrementally more expedient to just fire off the one-liner than to embark on the yak shaving exercise of discovering what the strange, er, fine new Standard option is. --Steve Summit (talk) 22:53, 25 September 2014 (UTC)