...making Linux just a little more fun!
By Ben Okopnik
Originally published in Issue 55 of Linux Gazette, May 2000
There are two major products that come out of Berkeley: LSD and UNIX. We don't believe this to be a coincidence. -- Jeremy Anderson
MAILPATH='/usr/spool/mail/bfox?"You have mail":~/shell-mail?"$_ has mail!"'
(He was immediately hired by an Unnamed Company in Silicon Valley for an unstated (but huge) salary... but that's beside the point.)
<Shrug> What the heck; I've already gone parasailing and scuba-diving this month (and will shortly be taking off on a 500-mile sail up the Gulf Stream); let's keep living La Vida Loca!
The built-in parsing capabilities of "bash" are rather minimal as compared to, say, Perl or AWK; in my best estimate, they're not intended for serious processing, just "quick and dirty" minor-task handling. Nevertheless, they can be very handy for that purpose.
As an example, let's say that you need to differentiate between lowercase and capitalized filenames in processing a directory - I ended up doing that with my backgrounds for X, since some of them look best tiled, and others stretched to full-screen size (file size wasn't quite a good-enough guide). I "capped" all the names of the full-sized pics, and "decapped" all the tiles. Then, as part of my random background selector, "bkgr", I wrote the following:
    ...
    # We need _just_ the filename
    fn=$(basename "$fnm")

    # Set the "-max" switch if the test returns 'true'
    [ -z "${fn##[A-Z]*}" ] && MAX="-max"

    # Run "xv" with|without the "-max" option based on the test result
    xv -root -quit $MAX "$fnm" &
    ...
Confusing-looking stuff, isn't it? Well, part of it we already know: the [ -z ... ] is a test for a zero-length string. What about the other part, though?
In order to 'protect' our parameter expansion result from the cold, cruel world (e.g., if you wanted to use the result as part of a filename, you'd need the 'protection' to keep it separate from the trailing characters), we use curly brackets to surround the whole enchilada - $d is the same as ${d} - except that the second variety can be combined with other things without losing its identity, like so:
    ...
    d=Digit

    # "Digitize"
    echo ${d}ize

    # "Digital"
    echo ${d}al

    # "Digits"
    echo ${d}s

    # "Digitalis"
    echo ${d}alis
    ...
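The filename case mentioned above is where the braces earn their keep. Here is a minimal sketch; the names "base" and "stamp" and their values are made up for the illustration. Since an underscore is a legal character in a variable name, the unbraced version quietly asks for a different variable altogether.

```shell
# Hypothetical names, used only for illustration
base=backup
stamp=20000501

# Without braces, Bash reads "$base_" as a variable named "base_",
# which is unset here and therefore expands to nothing
plain="$base_$stamp.tar"

# With braces, the variable name ends exactly where we say it does
braced="${base}_${stamp}.tar"

echo "$plain"     # 20000501.tar
echo "$braced"    # backup_20000501.tar
```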
Now that we have it isolated from the world, friendless and all alone... oops, sorry - that's "shell script", not "horror movie script" - I lose track once in a while... Anyway, now that we've separated the variable out via the curly braces, we can apply a few tools incorporated in Bash to perform some basic parsing. Note that I'm going to show the result after each 'echo' statement as if that statement had been executed.
    ### Let's pick a nice, longish word to play with.
    var="amanuensis"

    ### ${#var} - return length
    echo ${#var}
    10

    ### ${var#word} - cut shortest match from start
    echo ${var#*n}
    uensis

    ### ${var##word} - cut longest match from start
    echo ${var##*n}
    sis

    ### ${var%word} - cut shortest match from end
    echo ${var%n*}
    amanue

    ### ${var%%word} - cut longest match from end
    echo ${var%%n*}
    ama

    ### ${var:offset} - return string starting at 'offset'
    echo ${var:7}
    sis

    ### ${var:offset:length} - return 'length' characters starting at 'offset'
    echo ${var:1:3}
    man

    ### ${var/pattern/string} - replace single match
    echo ${var/amanuen/paralip}
    paralipsis

    ### ${var//pattern/string} - replace all matches
    echo ${var//a/A}
    AmAnuensis
(For the last two operations, if the pattern begins with #, it will match at the beginning of the string; if it begins with %, it will match at the end. If the string is empty, matches will be deleted.)
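The anchoring and deletion rules in that parenthetical are easier to see than to read about. A short sketch, using the same word as before; the variable names on the left are made up for the illustration:

```shell
var="amanuensis"

# '#' anchors the pattern to the start of the string
start=${var/#a/A}      # Amanuensis

# '%' anchors the pattern to the end of the string
end=${var/%sis/ses}    # amanuenses

# An unanchored pattern that only occurs mid-string still matches...
mid=${var/nue/NUE}     # amaNUEnsis
# ...but the anchored version finds nothing and leaves the string alone
none=${var/#nue/NUE}   # amanuensis

# An empty replacement string deletes the matches
gone=${var//a/}        # mnuensis

echo "$start $end $mid $none $gone"
```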
There's actually a bit more to it - things like variable indirection, and parsing arrays - but, gee, I guess you'll just have to study that man page yourself. <grin> Just consider this as motivational material.
So, now that we've looked at the tools, let's look back at the code - [ -z "${fn##[A-Z]*}" ]. Not all that difficult anymore, is it? Or maybe it is; my thought process, in dealing with searches and matches, tends to resemble pretzel-bending. What I did here - and it could be done in a number of other ways, given the above tools - is match for a max-length string (i.e., the entire filename) that begins with an uppercase character. The [ -z ... ] returns 'true' if the resulting string is zero-length (i.e., matches the [A-Z]* pattern), and $MAX is set to "-max".
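One of those "other ways", sketched here as an assumption rather than anything "bkgr" actually does, is a plain 'case' statement. The function name "is_capped" and the sample filenames are made up; the sketch also uses the [[:upper:]] character class instead of [A-Z], since how a range like A-Z behaves can depend on the locale.

```shell
# Hypothetical helper: succeeds if the name starts with an uppercase letter
is_capped() {
    case "$1" in
        [[:upper:]]*) return 0 ;;
        *)            return 1 ;;
    esac
}

for fn in Sunset.jpg tile.gif
do
    # Start with an empty switch, then set it only for "capped" names
    MAX=""
    is_capped "$fn" && MAX="-max"
    echo "$fn: '$MAX'"
done
```

The 'case' version says what it means a bit more directly, at the cost of a few more lines; the one-line [ -z ... ] test is handier inside a short script.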
Note that, since we're matching the entire string, ${fn%%[A-Z]*} would work just as well. If that seems confusing - if all of the above seems confusing - I suggest lots and lots of experimentation to familiarize yourself with it. It's easy: set a parameter value, and experiment, like so -
    Odin:~$ experiment=supercallifragilisticexpialadocious
    Odin:~$ echo ${experiment%l*}
    supercallifragilisticexpia
    Odin:~$ echo ${experiment%%l*}
    superca
    Odin:~$ echo ${experiment#*l}
    lifragilisticexpialadocious
    Odin:~$ echo ${experiment##*l}
    adocious

...and so on. It's the best way to get a feel for what a certain tool does; pick it up, plug it in, put on your safety glasses and gently squuueeeze the trigger. Observe all safety precautions as random deletion of valuable data may occur. Actual results may vary and will often surprise you.
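For a slightly more practical experiment, the same four operators will take a pathname apart. This is only a sketch; the path and the variable names are invented for the purpose.

```shell
# Hypothetical path, used only for illustration
path=/home/ben/pics/Sunset.over.sea.jpg

dir=${path%/*}        # cut shortest "/*" from the end: the directory
file=${path##*/}      # cut longest "*/" from the start: the filename
ext=${file##*.}       # cut longest "*." from the start: the extension
stem=${file%.*}       # cut shortest ".*" from the end: all but the extension
first=${file%%.*}     # cut longest ".*" from the end: up to the first dot

echo "$dir | $file | $ext | $stem | $first"
```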
    joe=

    ### ${parameter:-word} - If parameter is unset or null, substitute "word"
    echo ${joe:-mary}
    mary
    echo $joe

    ### ${parameter:?word} - Display "word" or error if parameter is unset or null
    echo ${joe:?"Not set"}
    bash: joe: Not set
    echo ${joe:?}
    bash: joe: parameter null or not set

    ### ${parameter:=word} - If parameter is unset or null, set it to "word"
    echo ${joe:=mary}
    mary
    echo $joe
    mary

    ### ${parameter:+word} - "word" is substituted only if parameter is set and non-null
    echo ${joe:+blahblah}
    blahblah
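Where these usually end up is in handling script or function arguments. A minimal sketch follows; the function "greet" and the variables "empty" and "nothere" are made up for the example. The last line also shows what the colon buys you: with it, an empty value counts the same as an unset one; without it, only a truly unset parameter triggers the substitution.

```shell
# Hypothetical function: greet someone, falling back to a default name
greet() {
    # Use the first argument, or "world" if it is missing or empty
    local who=${1:-world}
    echo "Hello, $who"
}

greet          # Hello, world
greet Ben      # Hello, Ben

# With the colon: "unset OR null". Without it: "unset" only.
empty=
unset nothere
echo "[${empty:-default}] [${empty-default}] [${nothere-default}]"
```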
Let's look at what this might involve. Here's a clip of a notional phonebook to be used for the job:
Name | Category | Address | Email |
Jim & Fanny Friends | Business | 101101 Digital Dr. LA CA | fr@gnarly.com |
Fred & Wilma Rocks | friends | 12 Cave St. Granite, CT | shale@hill.com |
Joe 'Da Fingers' Lucci | Business | 45 Caliber Av. B-klyn NY | tuff@ny.org |
Yoda Leahy-Hu | Friend | 1 Peak Fribourg Switz | warble@sing.ch |
Cyndi, Wendi, & Myndi | Friends | 5-X Rated St. Holiday FL | 3cuties@fl.net |
Whew. This stuff obviously needs to be read in by fields - word counting won't do; neither will a text search. Arrays to the rescue!
    #!/bin/bash
    # 'nlmail' sends the monthly newsletter to friends listed
    # in the phonebook

    # "bash" would create the arrays automatically, since we'll
    # use the 'name[subscript]' syntax to load the variables -
    # but I happen to like explicit declarations.
    declare -a name category address email

    # A little Deep Hackery to make the 'for' loop read a line at a time
    OLD=$IFS
    IFS='
    '

    for line in $(cat phonelist)
    do
        # Increment the line counter
        ((x++))

        # Load up the 'name' variable
        name[$x]="${line:0:25}"

        # etc. for 'category',
        category[$x]="${line:25:10}"

        # etc. for 'address',
        address[$x]="${line:35:25}"

        # etc. for 'email'...
        email[$x]="${line:60:20}"
    done

    # Undo the line-parsing magic
    IFS=$OLD
    ...

(In the script itself, the closing quote of the IFS assignment sits at the very start of its line, so that IFS holds nothing but a newline.)

At this point, we have the "phonelist" file loaded into the four arrays that we've created, ready for further processing. Each of the fields is easily addressable, thus making the stated problem - that of e-mailing a given file to all my friends - a trivial one (this snippet is a continuation of the previous script):
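A common alternative to the IFS trick, offered here as a sketch and not as what 'nlmail' does, is a 'while read' loop. It leaves IFS alone outside the loop and does not subject each line to filename expansion, which an unquoted $(cat ...) does. To keep the example self-contained, it builds its own two-record sample file; the temporary file and the "phonelist" variable holding its name are assumptions of the sketch.

```shell
# Build a small fixed-width sample file (sample data; the widths match
# the 25/10/25/20 layout used in the script above)
phonelist=$(mktemp)
printf '%-25s%-10s%-25s%-20s\n' \
    "Fred & Wilma Rocks"     "friends"  "12 Cave St. Granite, CT"  "shale@hill.com" \
    "Joe 'Da Fingers' Lucci" "Business" "45 Caliber Av. B-klyn NY" "tuff@ny.org" \
    > "$phonelist"

declare -a name category address email
x=0

# 'IFS=' keeps leading/trailing blanks; '-r' keeps backslashes literal
while IFS= read -r line
do
    # Increment the line counter
    x=$((x + 1))

    name[$x]="${line:0:25}"
    category[$x]="${line:25:10}"
    address[$x]="${line:35:25}"
    email[$x]="${line:60:20}"
done < "$phonelist"

rm -f "$phonelist"

echo "$x records loaded; first name field: '${name[1]}'"
```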
    ...
    for y in $(seq $x)
    do
        # We'll look for the word "friend" in the 'category' field,
        # make it "case-blind", and clip any trailing characters.
        if [ -z "${category[$y]##[Ff]riend*}" ]
        then
            mutt -a Newsletter.pdf -s 'S/V Ulysses News, 6/2000' ${email[$y]}
            echo "Mail sent to ${name[$y]}" >> sent_list.txt
        fi
    done

That should do it, as well as pasting the recipients' names into a file called "sent_list.txt" - a nice double-check feature that lets me see if I missed anyone.
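One detail worth knowing: the sliced fields still carry their padding blanks. The script gets away with it for the address because the unquoted ${email[$y]} is word-split by the shell, which drops the padding, but the names written to "sent_list.txt" keep theirs. If you'd rather trim a field explicitly, the tools from earlier in this article will do it. A sketch, with a made-up sample value:

```shell
# A fixed-width field as the script would have sliced it (sample value)
field="shale@hill.com      "

# ${field##*[! ]} is the run of blanks after the last non-blank character;
# removing that run from the end leaves the trimmed value
trimmed=${field%"${field##*[! ]}"}

echo "[$field] -> [$trimmed]"
```

The inner expansion is quoted so that the blanks are taken literally rather than as a pattern.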
The array processing capabilities of "bash" extend a bit beyond this simple example. Suffice it to say that for simple cases of this sort, with files under, say, a couple of hundred kB, "bash" arrays are the way to go.
Note that the above script can be easily generalized - as an example, you could add the ability to specify different phone-lists, criteria, or actions, right from the command line. Once the data is broken up into an easily-addressable format, the possibilities are endless...
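As a rough idea of what such a generalization might look like, here is a sketch of a function that takes the list file and the category pattern as optional arguments and prints the matching e-mail addresses. Everything named here ("recipients", "list", "want", "kind", "mail", the temporary sample file) is hypothetical, and the function reads the file with a 'while read' loop rather than the 'for' loop above, purely as a choice for this sketch.

```shell
# Hypothetical generalization: print the e-mail field of every record
# whose category matches a pattern. Both arguments are optional.
recipients() {
    local list="${1:-phonelist}"      # file to read
    local want="${2:-[Ff]riend*}"     # category pattern
    local line kind mail

    while IFS= read -r line
    do
        kind=${line:25:10}
        mail=${line:60:20}

        # Skip lines too short to have a category field
        [ -n "$kind" ] || continue

        # Unquoted $want is used as a pattern, as in the original test
        if [ -z "${kind##$want}" ]
        then
            # Print the address with its padding blanks trimmed
            echo "${mail%"${mail##*[! ]}"}"
        fi
    done < "$list"
}

# Try it on a small fixed-width sample (same 25/10/25/20 layout)
sample=$(mktemp)
printf '%-25s%-10s%-25s%-20s\n' \
    "Fred & Wilma Rocks"     "friends"  "12 Cave St. Granite, CT"  "shale@hill.com" \
    "Joe 'Da Fingers' Lucci" "Business" "45 Caliber Av. B-klyn NY" "tuff@ny.org" \
    "Yoda Leahy-Hu"          "Friend"   "1 Peak Fribourg Switz"    "warble@sing.ch" \
    > "$sample"

friends=$(recipients "$sample")
business=$(recipients "$sample" 'Business*')
rm -f "$sample"

echo "$friends"
echo "$business"
```

From there, the output could be fed to whatever action you like, one address per line.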
Until next month -
Happy Linuxing!
"...Yet terrible as UNIX addiction is, there are worse fates. If UNIX is the heroin of operating systems, then VMS is barbiturate addiction, the Mac is MDMA, and MS-DOS is sniffing glue. (Windows is filling your sinuses with lucite and letting it set.) You owe the Oracle a twelve-step program." -- the Usenet Oracle
The "man" pages for 'bash', 'builtins', 'sed', 'mutt'
"Introduction to Shell Scripting - The Basics" by Ben Okopnik, LG #53
"Introduction to Shell Scripting" by Ben Okopnik, LG #54
"Introduction to Shell Scripting" by Ben Okopnik, LG #55
"Introduction to Shell Scripting" by Ben Okopnik, LG #56