# @(#)POSIX 8.1 (Berkeley) 6/6/93
# $FreeBSD$

Comments on the IEEE P1003.2 Draft 12
             Part 2: Shell and Utilities
          Section 4.55: sed - Stream editor

Diomidis Spinellis <dds@doc.ic.ac.uk>
Keith Bostic <bostic@cs.berkeley.edu>

In the following paragraphs, "wrong" usually means "inconsistent with
historic practice", as most of the following comments refer to
undocumented inconsistencies between the historical versions of sed and
the POSIX 1003.2 standard. All the comments are notes taken while
implementing a POSIX-compatible version of sed, and should not be
interpreted as official opinions or criticism of the POSIX committee.
All uses of "POSIX" refer to section 4.55, Draft 12 of POSIX 1003.2.

 1. 32V- and BSD-derived implementations of sed strip the text
    arguments of the a, c and i commands of their initial blanks.
    For example,

        #!/bin/sed -f
        a\
            foo\
            \ indent\
            bar

    produces:

        foo
         indent
        bar

    POSIX does not specify this behavior, as the System V versions of
    sed do not do this stripping.
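
    The BSD rule can be sketched in Python as follows. This is a minimal
    model, not this implementation's code. It assumes the text argument
    has already been split into physical script lines, and the function
    name strip_text_argument is hypothetical.

```python
from typing import List


def strip_text_argument(lines: List[str]) -> List[str]:
    """Hypothetical sketch of BSD-style blank stripping for a, c and i text.

    Each element of `lines` is one physical script line of the text
    argument, still carrying its trailing continuation backslash.
    """
    result = []
    for line in lines:
        # Drop the blanks used for indentation (BSD historic practice).
        line = line.lstrip(" \t")
        # A trailing backslash only marks that the text continues.
        if line.endswith("\\"):
            line = line[:-1]
        # A leading backslash protects the blank that follows it.
        if line.startswith("\\"):
            line = line[1:]
        result.append(line)
    return result


text = ["\tfoo\\", "\t\\ indent\\", "\tbar"]
print(strip_text_argument(text))  # ['foo', ' indent', 'bar']
```

    Removing the lstrip call would keep the indentation, as the System V
    versions do. Backslashes elsewhere in a line are deliberately not
    handled by this sketch.
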
    The argument against stripping is that it is difficult to write sed
    scripts that have leading blanks if they are stripped. The argument
    for stripping is that it is difficult to write readable sed scripts
    unless indentation is allowed and ignored, and leading whitespace is
    still obtainable by entering a backslash in front of it. This
    implementation follows the BSD historic practice.

 2. Historical versions of sed required that the w flag be the last
    flag to an s command, as it takes an additional argument. This
    is obvious, but it is not specified in POSIX.

 3. Historical versions of sed required that whitespace follow a w
    flag to an s command. This is not specified in POSIX. This
    implementation permits whitespace but does not require it.

 4. Historical versions of sed permitted any number of whitespace
    characters to follow the w command. This is not specified in
    POSIX. This implementation permits whitespace but does not
    require it.

 5. The rule for the l command differs from historic practice. Table
    2-15 includes the various ANSI C escape sequences, including \\
    for backslash. Some historical versions of sed also displayed
    nonprintable characters as two-digit octal numbers, rather than
    the three digits specified by POSIX. The POSIX rule is a cleanup,
    and this implementation follows it.

 6. The POSIX specification for ! does not state that a single command
    following ! must not contain an address specification, whereas a
    command list may contain address specifications. The
    specification for ! implies that "3!/hello/p" works, but
    historically it never has. Note that

        3!{
            /hello/p
        }

    does work.

 7. POSIX does not specify what happens with consecutive ! commands
    (e.g. /foo/!!!p). Historic implementations allow any number of
    !'s without changing the behavior. (It would seem logical for
    each one to reverse the behavior.) This implementation follows
    historic practice.

 8. Historic versions of sed permitted commands to be separated by
    semicolons, e.g. "sed -ne '1p;2p;3q'" printed the first three
    lines of a file. This is not specified by POSIX. Note that the ;
    command separator cannot be used after the commands a, c, i, w, r,
    :, b, t and #, or after a w flag to the s command, because each of
    these takes the rest of the line as its argument. This
    implementation follows historic practice and implements the ;
    separator.

 9. Historic versions of sed terminated the script if EOF was reached
    during the execution of the 'n' command. For example,

        sed -e '
        n
        i\
        hello
        ' </dev/null

    did not produce any output. POSIX does not specify this behavior.
    This implementation follows historic practice.

10. Deleted.

11. Historical implementations do not output the change text of a c
    command in the case of an address range whose first line number
    is greater than the second (e.g. 3,1). POSIX requires that the
    text be output. Since the historic behavior doesn't seem to have
    any particular purpose, this implementation follows the POSIX
    behavior.

12. POSIX does not specify whether address ranges are checked and
    reset if a command is not executed due to a jump. The following
    program will behave in different ways depending on whether the
    'c' command is triggered at the third line, that is, whether the
    text is output even though line 3 of the input never logically
    reaches that command.

        2,4b
        1,3c\
        text

    Historic implementations did not output the text in the above
    example.
    Therefore it was believed that a range whose second address was
    never matched extended to the end of the input. However, the
    current practice adopted by this implementation, as well as by
    those from GNU and Sun, is as follows: the text from the 'c'
    command still isn't output because the second address isn't
    actually matched, but the range is reset after all if its second
    address is a line number. In the above example, only the first
    line of the input will be deleted.

13. Historical implementations allow an output-suppressing #n at the
    beginning of -e arguments as well as in a script file. POSIX
    does not specify this. This implementation follows historical
    practice.

14. POSIX does not explicitly specify how sed behaves if no script is
    specified. Since the sed Synopsis permits this form of the command,
    and the language in the Description section states that the input
    is output, it seems reasonable that it behave like the cat(1)
    command. Historic sed implementations behave differently for
    "ls | sed", where they produce no output, and "ls | sed -e#",
    where they behave like cat. This implementation behaves like cat
    in both cases.

15. The POSIX requirement to open all w files at the beginning makes
    sed behave nonintuitively when the w commands are preceded by
    addresses or are within conditional blocks. By default this
    implementation follows historic practice and POSIX, and it
    provides the -a option, which opens the files only when they are
    needed.

16. POSIX does not specify how escape sequences other than \n and \D
    (where D is the delimiter character) are to be treated. This is
    reasonable; however, it also doesn't state that the backslash is
    to be discarded from the output regardless. A strict reading of
    POSIX would be that "echo xyz | sed 's/./\a/'" would display
    "\ayz". As historic sed implementations always discarded the
    backslash, this implementation does as well.

17. POSIX specifies that an address can be "empty". This implies
    that constructs such as ",d", "1,d" and ",5d" are allowed. This
    is not true for historic implementations or for this
    implementation of sed.

18. The b, t and : commands are documented in POSIX to ignore leading
    white space, but no mention is made of trailing white space.
    Historic implementations of sed assigned different locations to
    the labels "x" and "x ".
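
    The following is a minimal Python sketch of how the two labels come
    to be distinct. It assumes a hypothetical label table that, like the
    historic implementations, skips only leading blanks and keys on the
    remaining text. The names label_name and build_label_table are
    illustrative, not this implementation's functions.

```python
from typing import Dict, List


def label_name(argument: str) -> str:
    """Historic rule (as modeled here): skip leading blanks, keep trailing ones."""
    return argument.lstrip(" \t")


def build_label_table(script: List[str]) -> Dict[str, int]:
    """Map each ':label' line of a one-command-per-line script to its index."""
    table = {}
    for index, line in enumerate(script):
        if line.startswith(":"):
            table[label_name(line[1:])] = index
    return table


labels = build_label_table([":x", "p", ":x ", "d"])
print(sorted(labels.items()))  # [('x', 0), ('x ', 2)]
# A branch written as "b x " (with a trailing blank) resolves to index 2.
print(labels[label_name(" x ")])  # 2
```

    With such a table, a stray trailing blank after a branch target
    silently selects a different label.
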
    Keeping the two labels distinct is not useful, and it leads to
    subtle programming errors, but it is historic practice and changing
    it could theoretically break working scripts. This implementation
    follows historic practice.

19. Although POSIX specifies that reading from files that do not exist
    from within the script must not terminate the script, it does not
    specify what happens if a write command fails. Historic practice
    is to fail immediately if the file cannot be opened or written.
    This implementation follows historic practice.

20. Historic practice is that the \n construct can be used for either
    string1 or string2 of the y command. This is not specified by
    POSIX. This implementation follows historic practice.

21. Deleted.

22. Historic implementations of sed ignore the RE delimiter characters
    within character classes, so that in s/[/]/X/ the / inside the
    brackets does not end the RE. This is not specified in POSIX.
    This implementation follows historic practice.

23. Historic implementations handle empty REs in a special way: the
    empty RE is interpreted as if it were the last RE encountered,
    whether in an address or elsewhere. POSIX does not document this
    behavior. For example, the command:

        sed -e /abc/s//XXX/

    substitutes XXX for the pattern abc.
    The semantics of "the last RE" can be defined in two different ways:

        1. The last RE encountered when compiling (lexical/static scope).
        2. The last RE encountered while running (dynamic scope).

    While many historical implementations fail on programs depending
    on scope differences, the SunOS version exhibited dynamic scope
    behavior. This implementation uses dynamic scoping, both because
    this seems the most useful and in order to remain consistent with
    historical practice.
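
    The difference between the two scopes can be seen in a small Python
    model. This is a sketch under stated assumptions, not this
    implementation's code:

        - the script consists only of s commands with optional RE
          addresses;
        - the empty RE is written as "";
        - only the first match is replaced, and the replacement is taken
          literally;
        - run, resolve_statically and _reuse are hypothetical names.

    The example script corresponds to the sed program /b/s/a/A/ followed
    by s//Z/.

```python
import re
from typing import List, Optional, Tuple

# A toy s command: (address RE or None, pattern RE, replacement).
# An empty string in either RE position stands for "the last RE".
Command = Tuple[Optional[str], str, str]


def _reuse(regex: str, last: Optional[str]) -> str:
    """Return `regex`, or `last` if `regex` is the empty RE."""
    if regex == "":
        if last is None:
            raise ValueError("no previous regular expression")
        return last
    return regex


def resolve_statically(script: List[Command]) -> List[Command]:
    """Lexical scope: bind each empty RE once, in script text order."""
    last: Optional[str] = None
    resolved = []
    for address, pattern, replacement in script:
        if address is not None:
            address = last = _reuse(address, last)
        pattern = last = _reuse(pattern, last)
        resolved.append((address, pattern, replacement))
    return resolved


def run(script: List[Command], lines: List[str], scope: str = "dynamic") -> List[str]:
    """Apply the script to every input line, like sed without -n."""
    if scope == "static":
        script = resolve_statically(script)
    last: Optional[str] = None  # the last RE actually used at run time
    output = []
    for line in lines:
        for address, pattern, replacement in script:
            if address is not None:
                address = last = _reuse(address, last)
                if not re.search(address, line):
                    continue  # the s command, and its RE, are skipped
            pattern = last = _reuse(pattern, last)
            # Replace the first match only; the replacement is literal text.
            line = re.sub(pattern, lambda _m: replacement, line, count=1)
        output.append(line)
    return output


script = [("b", "a", "A"), (None, "", "Z")]
print(run(script, ["aab", "ac"], "dynamic"))  # ['AZb', 'ac']
print(run(script, ["aab", "ac"], "static"))   # ['AZb', 'Zc']
```

    On the input line "ac", the address /b/ is the last RE actually
    evaluated. The dynamic rule therefore makes s//Z/ search for b, and
    the line is left unchanged. The static rule binds s//Z/ to /a/, the
    RE that precedes it in the script text, and produces "Zc". The
    /abc/s//XXX/ example above behaves the same under both rules:
    run([("abc", "", "XXX")], ["xabcx"]) gives ['xXXXx'].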