# @(#)POSIX 8.1 (Berkeley) 6/6/93

Comments on the IEEE P1003.2 Draft 12
     Part 2: Shell and Utilities
  Section 4.55: sed - Stream editor

Diomidis Spinellis <dds@doc.ic.ac.uk>
Keith Bostic <bostic@cs.berkeley.edu>

In the following paragraphs, "wrong" usually means "inconsistent with
historic practice", as most of the following comments refer to
undocumented inconsistencies between the historical versions of sed and
the POSIX 1003.2 standard. All the comments are notes taken while
implementing a POSIX-compatible version of sed, and should not be
interpreted as official opinions or criticism towards the POSIX committee.
All uses of "POSIX" refer to section 4.55, Draft 12 of POSIX 1003.2.

 1.     32V and BSD derived implementations of sed strip the text
        arguments of the a, c and i commands of their initial blanks,
        i.e.

        #!/bin/sed -f
        a\
                foo\
                \ indent\
                bar

        produces:

        foo
         indent
        bar

        POSIX does not specify this behavior as the System V versions of
        sed do not do this stripping. The argument against stripping is
        that it is difficult to write sed scripts that have leading blanks
        if they are stripped. The argument for stripping is that it is
        difficult to write readable sed scripts unless indentation is allowed
        and ignored, and leading whitespace is obtainable by entering a
        backslash in front of it. This implementation follows the BSD
        historic practice.

 2.     Historical versions of sed required that the w flag be the last
        flag to an s command as it takes an additional argument. This
        is obvious, but not specified in POSIX.

 3.     Historical versions of sed required that whitespace follow a w
        flag to an s command. This is not specified in POSIX. This
        implementation permits whitespace but does not require it.

 4.     Historical versions of sed permitted any number of whitespace
        characters to follow the w command. This is not specified in
        POSIX. This implementation permits whitespace but does not
        require it.

 5.     The rule for the l command differs from historic practice. Table
        2-15 includes the various ANSI C escape sequences, including \\
        for backslash. Some historical versions of sed displayed two
        digit octal numbers, too, not three as specified by POSIX. POSIX
        is a cleanup, and is followed by this implementation.

 6.     The POSIX specification for ! does not specify that for a single
        command the command must not contain an address specification
        whereas the command list can contain address specifications. The
        specification for ! implies that "3!/hello/p" works, and it never
        has, historically. Note,

                3!{
                        /hello/p
                }

        does work.

 7.     POSIX does not specify what happens with consecutive ! commands
        (e.g. /foo/!!!p). Historic implementations allow any number of
        !'s without changing the behaviour. (It seems logical that each
        one might reverse the behaviour.) This implementation follows
        historic practice.

 8.     Historic versions of sed permitted commands to be separated
        by semi-colons, e.g. sed -ne '1p;2p;3q' printed the first
        two lines of a file and quit at the third. This is not
        specified by POSIX.
        Note, the ; command separator is not allowed for the commands
        a, c, i, w, r, :, b, t, # and at the end of a w flag in the s
        command. This implementation follows historic practice and
        implements the ; separator.

 9.     Historic versions of sed terminated the script if EOF was reached
        during the execution of the 'n' command, i.e.:

        sed -e '
        n
        i\
        hello
        ' </dev/null

        did not produce any output. POSIX does not specify this behavior.
        This implementation follows historic practice.

10.     Deleted.

11.     Historical implementations do not output the change text of a c
        command in the case of an address range whose first line number
        is greater than the second (e.g. 3,1). POSIX requires that the
        text be output. Since the historic behavior doesn't seem to have
        any particular purpose, this implementation follows the POSIX
        behavior.

12.     POSIX does not specify whether address ranges are checked and
        reset if a command is not executed due to a jump. The following
        program will behave in different ways depending on whether the
        'c' command is triggered at the third line, i.e. will the text
        be output even though line 3 of the input will never logically
        encounter that command.

        2,4b
        1,3c\
                text

        Historic implementations, and this implementation, do not output
        the text in the above example. The general rule, therefore,
        is that a range whose second address is never matched extends to
        the end of the input.

13.     Historical implementations allow an output suppressing #n at the
        beginning of -e arguments as well as in a script file. POSIX
        does not specify this. This implementation follows historical
        practice.

14.     POSIX does not explicitly specify how sed behaves if no script is
        specified. Since the sed Synopsis permits this form of the command,
        and the language in the Description section states that the input
        is output, it seems reasonable that it behave like the cat(1)
        command. Historic sed implementations behave differently for
        "ls | sed", where they produce no output, and "ls | sed -e#",
        where they behave like cat. This implementation behaves like cat
        in both cases.

15.     The POSIX requirement to open all w files at the beginning makes
        sed behave nonintuitively when the w commands are preceded by
        addresses or are within conditional blocks. This implementation
        follows historic practice and POSIX, by default, and provides the
        -a option which opens the files only when they are needed.

16.     POSIX does not specify how escape sequences other than \n and \D
        (where D is the delimiter character) are to be treated. This is
        reasonable; however, it also doesn't state that the backslash is
        to be discarded from the output regardless. A strict reading of
        POSIX would be that "echo xyz | sed 's/./\a/'" would display
        "\ayz". As historic sed implementations always discarded the
        backslash, this implementation does as well.

17.     POSIX specifies that an address can be "empty". This implies
        that constructs like ",d" or "1,d" and ",5d" are allowed. This
        is not true for historic implementations or this implementation
        of sed.

18.     The b, t and : commands are documented in POSIX to ignore leading
        white space, but no mention is made of trailing white space.
        Historic implementations of sed assigned different locations to
        the labels "x" and "x ". This is not useful, and leads to subtle
        programming errors, but it is historic practice and changing it
        could theoretically break working scripts. This implementation
        follows historic practice.

19.     Although POSIX specifies that reading from files that do not exist
        from within the script must not terminate the script, it does not
        specify what happens if a write command fails. Historic practice
        is to fail immediately if the file cannot be opened or written.
        This implementation follows historic practice.

20.     Historic practice is that the \n construct can be used for either
        string1 or string2 of the y command. This is not specified by
        POSIX. This implementation follows historic practice.

21.     Deleted.

22.     Historic implementations of sed ignore the RE delimiter characters
        within character classes. This is not specified in POSIX. This
        implementation follows historic practice.

23.     Historic implementations handle empty RE's in a special way: the
        empty RE is interpreted as if it were the last RE encountered,
        whether in an address or elsewhere. POSIX does not document this
        behavior. For example the command:

                sed -e /abc/s//XXX/

        substitutes XXX for the pattern abc. The semantics of "the last
        RE" can be defined in two different ways:

        1. The last RE encountered when compiling (lexical/static scope).
        2. The last RE encountered while running (dynamic scope).

        While many historical implementations fail on programs depending
        on scope differences, the SunOS version exhibited dynamic scope
        behaviour. This implementation does dynamic scoping, as this seems
        the most useful and in order to remain consistent with historical
        practice.
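
The two readings of "the last RE" in item 23 can be told apart with a
small script. The following is only a sketch, not part of the original
notes: the input strings, the label name "end" and the variable names
are made up for illustration, and the result of the second command
depends on which sed is installed.

```shell
#!/bin/sh
# Case 1: the address /abc/ is the last RE both at compile time and at
# run time, so static and dynamic scoping agree on what s// means.
result=$(printf 'abc\nxyz\n' | sed -e '/abc/s//XXX/')
printf '%s\n' "$result"

# Case 2: the two readings diverge.  On the input line "ab" the branch
# skips the /b/ test, so /b/ is the last RE compiled before s//X/ but
# /a/ is the last RE actually executed.
#   static scope:  s//X/ reuses /b/ and yields "aX"
#   dynamic scope: s//X/ reuses /a/ and yields "Xb"
scoped=$(printf 'ab\n' | sed -e '/a/b end' -e '/b/d' -e ':end' -e 's//X/')
printf '%s\n' "$scoped"
```

Each command of the second script is passed in its own -e argument so
that the label text is not followed by a semicolon or trailing blanks,
which items 8 and 18 describe as problematic.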