Open Computing ``Hands-On'': ``Wizard's Grabbag'' Column:
June, 1994: Listings
Listing 1: The shuffle script processes line-oriented data.
It concatenates the input files, then moves the lines selected
by each rule into the specified file, optionally sorting that file.
A. Listing of the shuffle Korn shell script:
#!/usr/bin/ksh
# @(#) shuffle Version 5 A rule-based list processor
# Author: Thomas Baker <tbaker@unix.amherst.edu>
# Modified by: Becca Thomas, February 1994
$DBG_SH # Dormant debugging directive

trap 'rm -f $Tmpfile $Targetfilenames >|$Devnull 2>&1; \
  exit $Stat' 0
trap 'print -u2 "$(basename $0): Interrupted!"; exit' 1 2 3 15

# CONFIGURATION
Allfiles=combined.dat # File for all catenated input files
Bkupdir=.backup # Unix input-files backup directory
#Bkupdir=backup # MKS input-files backup directory
Devnull="/dev/null" # Unix bit-bucket file
Rulefile=.rules # Unix rule file
#Rulefile=rules # MKS rule file
Usage="Usage: $(basename $0) datafile [datafile ...]" # Correct usage
# Temporary directory-dependent variables:
Tmpdir=/tmp # MKS/Unix temporary directory
#Devnull=$Tmpdir/null # MKS bit-bucket file
Targetfilenames=$Tmpdir/sht$$.tmp # MKS/Unix target-names file
Tmpfile=$Tmpdir/shf$$.tmp # MKS/Unix temporary work file

# FUNCTION DEFINITIONS:
function usage_exit {
  print -u2 "$Usage"; Stat=1 ; exit
}
function movelines { # Args: $Searchkey $Source $Target $Sortcmd
  print -n "Lines with [$1] moved from \""$2"\" to \""$3"\""
  egrep "$1" $2 >>$3; egrep -v "$1" $2 >|$Tmpfile; mv $Tmpfile $2
  [ "$4" ] && print ", ${4}." || print "." # Print sort command
  [ "$4" ] && { eval $4 -o $3 $3 ||
    { print "\aBad rule-file sort command: $4"; Stat=2; exit;};}
}

# PROCESS COMMAND-LINE ARGUMENTS:
case $# in # User must specify at least one file-name argument
  0) usage_exit ;;
esac

# SANITY CHECK: Rule file:
[ -r $Rulefile ] ||
  { print -u2 "\aCannot read \"$Rulefile\" file!"; Stat=4; exit;}
sed 's/#.*$//' $Rulefile | # Remove comments.
egrep -v '^$' | # Remove blank lines.
nawk -F\| ' # Rules separated by vertical bar
  NR == 1 && ($1 != "." || $2 != "$Allfiles") { # Check first rule
    print $0, ": rule 1 is illegal!" }
  NF != 3 && NF != 4 { # All rules have 3 or 4 fields.
    print $0, ": must have 3 or 4 fields!" }
  $2 == $3 { # Source different from target.
    print $0, ": source cannot equal target!" }
  $4 != "" && $4 !~ /^sort/ { # Field 4 is for sort commands.
    print $0, ": field 4 is only for sort!" }
  $1 == "" || $2 == "" || $3 == "" { # First three fields are non-empty.
    print $0, ": 1 of first 3 fields is empty!" }
  { target[$3] = 1 } # Note names of target files
  NR > 1 { # For all lines after the first
    if ($2 in target) # If source file is also a target
      next; # No problem, fetch next input line
    else print $0, ": ", $2, "has no precedent!"
  }' >| $Tmpfile # Save error messages, if any
[ -s $Tmpfile ] &&
  { print -u2 "Bad rule format:\n$(cat $Tmpfile)"; Stat=5; exit;}

# SANITY CHECKS: Current directory, combined data, backup directory:
[ -w "." ] || # Current (data) directory
  { print -u2 "\aCannot write to current directory!"; Stat=6; exit;}
[ -f $Allfiles ] && # Combined data file
  { print -u2 "\a\"$Allfiles\" should not yet exist!"; Stat=7; exit;}
[ -d $Bkupdir ] || mkdir $Bkupdir 2>|$Devnull ||
  { print -u2 "\aCannot make directory \"$Bkupdir\"!"; Stat=8; exit;}
[ "$(ls $Bkupdir)" ] && { # if there are files in backup dir
  print -n "Okay to erase files in $Bkupdir (y*|Y*/n)? "; read ans
  case $ans in
    y*|Y*) rm -f $Bkupdir/* >|$Devnull 2>&1 ;; # Remove old backups
    *) print "Exiting, check $Bkupdir directory."; Stat=0; exit ;;
  esac;}

# CHECK DATA FILES, BACK UP, THEN COMBINE INTO A COMMON FILE:
for File in "$@"; do
  [ -d $File ] && continue # Ignore directories.
  [ "$File" = "$Rulefile" ] && continue # Ignore rules (just data).
  [ "$(dirname $File)" = "." ] || [ "$(dirname $File)" = "$PWD" ] ||
    { print -u2 "\aData files must be in current directory!"
      Stat=9; exit;}
  [ -r $File ] ||
    { print -u2 "\a\"$File\" file not readable."; Stat=10; exit;}
  { file $File | egrep 'text|empty' >|$Devnull 2>&1;} ||
    { print -u2 "\a\"$File\" not text nor empty."; Stat=11; exit;}
  egrep '^[ ]*$' $File >|$Devnull 2>&1 &&
    { print -u2 "\a\"$File\" has blank lines!"; Stat=12; exit;}
  cp $File $Bkupdir || # Copy to backup directory.
    { print -u2 "\aCannot back up $File!"; Stat=13; exit;}
  cat $File >> $Allfiles; rm $File # Combine into common file.
done

# CHECK COMBINED DATA FILE:
[ -s $Allfiles ] || { print -u2 "\aNo data to process!"; Stat=14; exit;}
Beforesize=$(wc -c <$Allfiles | awk '{ print $1 }') # Data size before
print "Data backed up to \"$Bkupdir\", concatenated in \"$Allfiles\"."

# PROCESS DATA FILES under direction of rule file:
OldIFS="$IFS" # Save old internal field separator char(s)
IFS="|" # Rule-file field separator for "read"
sed 's/#.*$//' $Rulefile | # Remove rule-file comments
egrep -v '^$' | # Remove blank lines
while read Searchkey From To Sortcmd ; do # put fields into variables
  eval Source=$From; eval Target=$To # interpolate these var.
  movelines $Searchkey $Source $Target $Sortcmd # Do the shuffle
  print -u3 "$Target" # Output goes to fd 3.
done 3>| $Targetfilenames # Store fd3 output in a file.
IFS="$OldIFS" # Restore original IFS values.
Targetnames=$(sort -u $Targetfilenames) # Place unique list in variable.

# CONCLUSION: Cleanup and exit message:
for File in $Targetnames $Allfiles; do
  [ -s $File ] || rm $File # Erase data files if empty
done
if [ $Beforesize -ne $(cat $Targetnames 2>|$Devnull | wc -c) ]; then
  print -u2 "Warning: data may have been lost--use backup!\a\a\a"
else
  print -u2 "Done: data shuffled and intact!"
fi
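The closing check in the listing above compares the byte count of the combined file with the total byte count of the target files after shuffling. A minimal sketch of that idea in portable sh follows. The scratch directory, the file names (all, now, rest) and the variable names are hypothetical, and grep -E stands in for egrep.

```shell
#!/bin/sh
# Sketch only: confirm that splitting a file into two targets loses no bytes.
# The scratch directory and all file names here are hypothetical.
Work=$(mktemp -d) || exit 1

printf '%s\n' 'NOW Renew passport!' \
              'LATER Read SCILS article on SGML.' \
              'Feb 10 BDAY Sarah (1956)' > "$Work/all"
Before=$(wc -c < "$Work/all")           # size before the split

# Split the way a rule would: matching lines go to one file, the rest to another.
grep -E    '^NOW ' "$Work/all" > "$Work/now"
grep -E -v '^NOW ' "$Work/all" > "$Work/rest"

After=$(cat "$Work/now" "$Work/rest" | wc -c)   # combined size after the split

# Left unquoted on purpose: some wc implementations pad the count with blanks.
if [ $Before -eq $After ]; then
  Result="intact"
else
  Result="lost"
fi
echo "Data is $Result."
rm -rf "$Work"
```

A byte count catches dropped or duplicated lines, but not two errors that happen to cancel out, so the backup directory remains the real safety net.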
B. A sample data file:
- 1994 Feb 23 Smith 01 John Lunch at Panda East.
- 1994 Jan 23 Smith 02 Not coming to session, but writing paper.
- 1994 Feb 23 Smith 03 FOLLOWUP Read Sep 1993 SCILS article on SGML
Smith John 432 E43rd St, New York NY 01002 212-555-5555, fax 666-6666
Feb 10 BDAY Sarah (1956)
LATER Read SCILS article on SGML.
NOW Renew passport!
Beans stock and info 800-221-4221, customer service 800-341-4341
Clothes Shoes Timberland "Blucher" size W12
Convert US Ounces to Grams: 1 oz = 28.35 gm
Wallet [07 Sep 93] NY Drivers' # A01234 56789 123456 78, exp 7/96
- 1993 Dec 20 10am Called John Smith, set appt and faxed letter.
Wallet [07 Sep 93] Visa 1234-5678-1234-5678, lost: 1-800-423-3823
Fastback differential backup of C: c:/fastback/fb ')c)b)d)s))'
Clothes Shoes Adidas Marath.Train.II 1CA, size 12.5(D) 48(F) 13(USA)
C. A sample rule file:
# Rule file for "Shuffle: a rule-based list processor"
# 1. Rules contain: searchkey|source|target|optional_sort_command
# 2. First rule must have "." in first field, "$Allfiles" in second.
# 3. Common sort types:
# sort Straight alphabetic.
# sort +0M -1 +1n -2 Data format: Jun 25
# sort +1n -2 +2M -3 +3n -4 Data format: - 1992 Jun 25
.|$Allfiles|phone|sort
^- |phone|1993|sort +1n -2 +2M -3 +3n -4
^- 1994 |1993|1994|sort +1n -2 +2M -3 +3n -4
^Jan |phone|calendar
^Feb |phone|calendar
^Dec |phone|calendar|sort +0M -1 +1n -2
BDAY|calendar|bday|sort +0M -1 +1n -2
^NOW |phone|now|sort
^LATER |phone|later|sort
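To make the rule format concrete, here is a minimal sketch in portable sh of how a searchkey|source|target rule moves lines from one file to another. It is not the shuffle script itself: the scratch directory, the two-rule file and the apply_rule function are hypothetical, the optional sort field is left out, and grep -E stands in for egrep.

```shell
#!/bin/sh
# Sketch only: apply "searchkey|source|target" rules inside a scratch directory.
# apply_rule and every file name below are hypothetical.
Work=$(mktemp -d) || exit 1
cd "$Work" || exit 1

printf '%s\n' 'Feb 10 BDAY Sarah (1956)' \
              'NOW Renew passport!' \
              'LATER Read SCILS article on SGML.' > phone

cat > rules <<'EOF'
# searchkey|source|target
^NOW |phone|now
BDAY|phone|bday
EOF

apply_rule() { # Args: searchkey source target
  grep -E    -- "$1" "$2" >> "$3"      # Append matching lines to the target.
  grep -E -v -- "$1" "$2" >  "$2.tmp"  # Keep the non-matching lines.
  mv "$2.tmp" "$2"
}

# Strip comments and blank lines, then split each rule on "|".
# With IFS set to "|" only, the trailing blank in "^NOW " survives.
sed 's/#.*$//' rules | grep -v '^$' |
while IFS='|' read -r Key From To; do
  apply_rule "$Key" "$From" "$To"
done

NowLines=$(cat now)
BdayLines=$(cat bday)
PhoneLines=$(cat phone)
printf 'now:   %s\nbday:  %s\nphone: %s\n' "$NowLines" "$BdayLines" "$PhoneLines"

cd / && rm -rf "$Work"
```

Because each rule rewrites its source file, the order of the rules matters: a line is available to a later rule only if no earlier rule has already moved it out of that source.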
D. Another example of a data-file line:
Jan 23 Smith John Lunch at Panda East.
E. Some transformations of the data-file line shown above in Part D:
- 1994 Jan 23 Smith 01 John Lunch at Panda East.
- 1994 Jan 23 Smith 02 Not coming to session, but writing paper.
- 1994 Jan 23 Smith 03 FOLLOWUP Sep 1993 SCILS article on SGML
- 1994 Jan 23 Smith 04 FOLLOWUP Call Joachim Mann 321-4567
Mann Joachim, tel 321-4567
LATER Read Sep 1993 SCILS article on SGML.
F. Another example of a rule-file line:
FOLLOWUP|1994|followup|sort +1n -2 +2M -3 +3n -4
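The sort command in the rule above uses the old +POS -POS key notation, which many current sort implementations no longer accept. As a hedged sketch, the same year, month, day ordering can be written with -k keys, where "+1n -2" corresponds to "-k 2,2n". The sample lines are taken from the data file in Part B (two of them shortened), the variable name Sorted is hypothetical, and the M (month) modifier is a widely available extension rather than part of POSIX.

```shell
#!/bin/sh
# Sketch only: "sort +1n -2 +2M -3 +3n -4" rewritten with -k keys.
# Assumption: the C locale, so that Jan, Feb, ... are recognized as months.
LC_ALL=C; export LC_ALL

Sorted=$(printf '%s\n' \
    '- 1994 Feb 23 Smith 01 John Lunch at Panda East.' \
    '- 1994 Jan 23 Smith 02 Not coming to session.' \
    '- 1993 Dec 20 10am Called John Smith.' |
  sort -k 2,2n -k 3,3M -k 4,4n)   # Year (numeric), month, then day (numeric).

printf '%s\n' "$Sorted"
```

The 1993 line sorts first, followed by the January and then the February line for 1994. By the same conversion, the calendar rules' "sort +0M -1 +1n -2" would become "sort -k 1,1M -k 2,2n".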
G. A version of shuffle written for Coherent that runs under the
Bourne shell with the ``old'' awk:
#!/usr/bin/sh
# @(#) shuffle Version 5 A rule-based list processor
# Author: Thomas Baker <tbaker@unix.amherst.edu>
# Modified by: Becca Thomas, February 1994
# Modified by: Ga'bor Zahemszky, March 1994 to use sh and "old" awk
$DBG_SH # Dormant debugging directive

trap 'rm -f $Tmpfile $Targetfilenames >$Devnull 2>&1; exit $Stat' 0
trap 'echo "`basename $0`: Interrupted!" >&2 ; exit' 1 2 3 15

# CONFIGURATION
Allfiles=combined.dat # File for all catenated input files
Bkupdir=.backup # Unix input-files backup directory
#Bkupdir=backup # MKS input-files backup directory
Devnull="/dev/null" # Unix bit-bucket file
Rulefile=.rules # Unix rule file
#Rulefile=rules # MKS rule file
Usage="Usage: `basename $0` datafile [datafile ...]" # Correct usage
# Temporary directory-dependent variables:
Tmpdir=/tmp # MKS/Unix temporary directory
#Devnull=$Tmpdir/null # MKS bit-bucket file
Targetfilenames=$Tmpdir/sht$$.tmp # MKS/Unix target-names file
Tmpfile=$Tmpdir/shf$$.tmp # MKS/Unix temporary work file

# FUNCTION DEFINITIONS:
usage_exit() {
  echo "$Usage" >&2 ; Stat=1 ; exit
}
movelines() { # Args: $Searchkey $Source $Target $Sortcmd
  # The \c keeps the sort command (or period) on the same output line.
  echo "Lines with [$1] moved from \""$2"\" to \""$3"\"\c"
  egrep "$1" $2 >>$3; egrep -v "$1" $2 >$Tmpfile; mv $Tmpfile $2
  [ "$4" ] && echo ", ${4}." || echo "." # Print sort command
  [ "$4" ] && { eval $4 -o $3 $3 ||
    { echo "\007Bad rule-file sort command: $4"; Stat=2; exit;};}
}

# PROCESS COMMAND-LINE ARGUMENTS:
case $# in # User must specify at least one file-name argument
  0) usage_exit ;;
esac

# SANITY CHECK: Rule file:
[ -r $Rulefile ] ||
  { echo "\007Cannot read \"$Rulefile\" file!" >&2 ; Stat=4; exit;}
sed 's/#.*$//' $Rulefile | # Remove comments.
egrep -v '^$' | # Remove blank lines.
oawk -F\| ' # Rules separated by vertical bar
  NR == 1 && ($1 != "." || $2 != "$Allfiles") { # Check first rule
    print $0, ": rule 1 is illegal!" }
  NF != 3 && NF != 4 { # All rules have 3 or 4 fields.
    print $0, ": must have 3 or 4 fields!" }
  $2 == $3 { # Source different from target.
    print $0, ": source cannot equal target!" }
  $4 != "" && $4 !~ /^sort/ { # Field 4 is for sort commands.
    print $0, ": field 4 is only for sort!" }
  $1 == "" || $2 == "" || $3 == "" { # First three fields are non-empty.
    print $0, ": 1 of first 3 fields is empty!" }
  { target[$3] = 1 } # Note names of target files
  NR > 1 { # For all lines after the first
    ZGvar2 = 0
    for (ZGvar1 in target) {
      if (ZGvar1 == $2) {
        next
      } else {
        ZGvar2 = 1
      }
    }
    if (ZGvar2 == 1) {
      print $0, ": ", $2, "has no precedent!"
    }
  }' > $Tmpfile # Save error messages, if any
[ -s $Tmpfile ] &&
  { echo "Bad rule format:\n`cat $Tmpfile`" >&2 ; Stat=5; exit;}

# SANITY CHECKS: Current directory, combined data, backup directory:
[ -w "." ] || # Current (data) directory
  { echo "\007Cannot write to current directory!" >&2 ; Stat=6; exit;}
[ -f $Allfiles ] && # Combined data file
  { echo "\007\"$Allfiles\" shouldn't exist!" >&2 ; Stat=7; exit;}
[ -d $Bkupdir ] || mkdir $Bkupdir 2>$Devnull ||
  { echo "\007Can't make directory \"$Bkupdir\"!" >&2 ; Stat=8; exit;}
[ "`ls $Bkupdir`" ] && { # if there are files in backup dir
  echo "Okay to erase files in $Bkupdir (y*|Y*/n)? \c"; read ans
  case $ans in
    y*|Y*) rm -f $Bkupdir/* >$Devnull 2>&1 ;; # Remove old backups
    *) echo "Exiting, check $Bkupdir directory."; Stat=0; exit ;;
  esac;}

# CHECK DATA FILES, BACK UP, THEN COMBINE INTO A COMMON FILE:
for File in $*; do
  [ -d $File ] && continue # Ignore directories.
  [ "$File" = "$Rulefile" ] && continue # Ignore rules (just data).
  [ "`dirname $File`" = "." ] || [ "`dirname $File`" = "`pwd`" ] ||
    { echo "\007Data files must be in current directory!" >&2
      Stat=9; exit;}
  [ -r $File ] ||
    { echo "\007\"$File\" file not readable." >&2 ; Stat=10; exit;}
  { file $File | egrep 'text|empty' >$Devnull 2>&1;} ||
    { echo "\007\"$File\" not text nor empty." >&2 ; Stat=11; exit;}
  egrep '^[ ]*$' $File >$Devnull 2>&1 &&
    { echo "\007\"$File\" has blank lines!" >&2 ; Stat=12; exit;}
  cp $File $Bkupdir || # Copy to backup directory.
    { echo "\007Cannot back up $File!" >&2 ; Stat=13; exit;}
  cat $File >> $Allfiles; rm $File # Combine into common file.
done

# CHECK COMBINED DATA FILE:
[ -s $Allfiles ] || { echo "\007No data to process!">&2; Stat=14; exit;}
Beforesize=`wc -c <$Allfiles | oawk '{ print $1 }'` # Data size before
echo "Data backed up to \"$Bkupdir\", concatenated in \"$Allfiles\"."

# PROCESS DATA FILES under direction of rule file:
OldIFS="$IFS" # Save old internal field separator char(s)
IFS="|" # Rule-file field separator for "read"
sed 's/#.*$//' $Rulefile | # Remove rule-file comments
egrep -v '^$' | # Remove blank lines
while read Searchkey From To Sortcmd ; do # put fields into variables
  eval Source=$From; eval Target=$To # interpolate these var.
  movelines $Searchkey $Source $Target $Sortcmd # Do the shuffle
  echo "$Target" >&3 # Output goes to fd 3.
done 3> $Targetfilenames # Store fd3 output in a file.
IFS="$OldIFS" # Restore original IFS values.
Targetnames=`sort -u $Targetfilenames` # Place unique list in variable.

# CONCLUSION: Cleanup and exit message:
for File in $Targetnames $Allfiles; do
  [ -s $File ] || rm $File # Erase data files if empty
done
if [ $Beforesize -ne `cat $Targetnames 2>$Devnull | wc -c` ]; then
  echo "Warning: data may have been lost--use backup!\007" >&2
else
  echo "Done: data shuffled and intact!" >&2
fi
Figure 1: A data-flow diagram for the example discussed in
Tom Baker's introductory letter.

$Allfiles
    |
    V                                   Sorted by year:
  phone ---> 1993 [^- ] -----\-----------> 1994 [^- 1994 ]
    |                         \----------> 1993 (everything else)
    |
    V                                   Sorted by month:
  phone ---> calendar [^Jan,^Feb..] \----> bday [BDAY]
    |                                \---> calendar (everything else)
    |
    V                                   Sorted alphabetically:
  phone ---> now [^NOW ]
    |
  phone ---> later [^LATER ]
    |
    \------> phone (everything else)
Copyright © 1995-1997
The McGraw-Hill Companies, Inc.
All Rights Reserved.
Edited by Becca Thomas / editor@unixworld.com
Last Modified: Thursday, 12-Sep-96 20:50:47