            Comments on the IEEE P1003.2 Draft 12
                 Part 2: Shell and Utilities
              Section 4.55: sed - Stream editor

Diomidis Spinellis <dds@doc.ic.ac.uk>
Keith Bostic <bostic@cs.berkeley.edu>

In the following paragraphs, "wrong" usually means "inconsistent with
historic practice", as most of the following comments refer to
undocumented inconsistencies between the historical versions of sed and
the POSIX 1003.2 standard.  All the comments are notes taken while
implementing a POSIX-compatible version of sed, and should not be
interpreted as official opinions or criticism towards the POSIX committee.
All uses of "POSIX" refer to section 4.55, Draft 12 of POSIX 1003.2.

 1. 32V and BSD derived implementations of sed strip the text
    arguments of the a, c and i commands of their initial blanks,
    i.e.

        #!/bin/sed -f
        a\
                foo\
                \  indent\
                bar

    produces:

        foo
          indent
        bar

    POSIX does not specify this behavior, as the System V versions of
    sed do not do this stripping.  The argument against stripping is
    that it is difficult to write sed scripts that have leading blanks
    if they are stripped.  The argument for stripping is that it is
    difficult to write readable sed scripts unless indentation is allowed
    and ignored, and leading whitespace is obtainable by entering a
    backslash in front of it.  This implementation follows the BSD
    historic practice.

 2. Historical versions of sed required that the w flag be the last
    flag to an s command, as it takes an additional argument.  This
    is obvious, but not specified in POSIX.

 3. Historical versions of sed required that whitespace follow a w
    flag to an s command.  This is not specified in POSIX.  This
    implementation permits whitespace but does not require it.

 4. Historical versions of sed permitted any number of whitespace
    characters to follow the w command.  This is not specified in
    POSIX.  This implementation permits whitespace but does not
    require it.

 5. The rule for the l command differs from historic practice.  Table
    2-15 includes the various ANSI C escape sequences, including \\
    for backslash.  Some historical versions of sed displayed two-digit
    octal numbers, too, not three-digit ones as specified by POSIX.
    POSIX is a cleanup, and is followed by this implementation.

 6. The POSIX specification for ! does not specify that for a single
    command the command must not contain an address specification,
    whereas the command list can contain address specifications.  The
    specification for ! implies that "3!/hello/p" works, and it never
    has, historically.  Note that

        3!{
                /hello/p
        }

    does work.

 7. POSIX does not specify what happens with consecutive ! commands
    (e.g. /foo/!!!p).  Historic implementations allow any number of
    !'s without changing the behaviour.  (It seems logical that each
    one might reverse the behaviour.)  This implementation follows
    historic practice.

 8. Historic versions of sed permitted commands to be separated
    by semicolons, e.g. sed -ne '1p;2p;3q' printed the first
    three lines of a file.  This is not specified by POSIX.
    Note, the ; command separator is not allowed for the commands
    a, c, i, w, r, :, b, t, # and at the end of a w flag in the s
    command.  This implementation follows historic practice and
    implements the ; separator.

 9. Historic versions of sed terminated the script if EOF was reached
    during the execution of the 'n' command, i.e.:

        sed -e '
        n
        i\
        hello
        ' </dev/null

    did not produce any output.  POSIX does not specify this behavior.
    This implementation follows historic practice.

10. Deleted.

11. Historical implementations do not output the change text of a c
    command in the case of an address range whose first line number
    is greater than the second (e.g. 3,1).  POSIX requires that the
    text be output.  Since the historic behavior doesn't seem to have
    any particular purpose, this implementation follows the POSIX
    behavior.

12. POSIX does not specify whether address ranges are checked and
    reset if a command is not executed due to a jump.  The following
    program will behave in different ways depending on whether the
    'c' command is triggered at the third line, i.e. whether the text
    will be output even though line 3 of the input will never logically
    encounter that command.

        2,4b
        1,3c\
                text

    Historic implementations did not output the text in the above
    example.  Therefore it was believed that a range whose second
    address was never matched extended to the end of the input.
    However, the current practice adopted by this implementation,
    as well as by those from GNU and SUN, is as follows:  The text
    from the 'c' command still isn't output because the second address
    isn't actually matched; but the range is reset after all if its
    second address is a line number.  In the above example, only the
    first line of the input will be deleted.

13. Historical implementations allow an output suppressing #n at the
    beginning of -e arguments as well as in a script file.  POSIX
    does not specify this.  This implementation follows historical
    practice.

14. POSIX does not explicitly specify how sed behaves if no script is
    specified.  Since the sed Synopsis permits this form of the command,
    and the language in the Description section states that the input
    is output, it seems reasonable that it behave like the cat(1)
    command.  Historic sed implementations behave differently for "ls |
    sed", where they produce no output, and "ls | sed -e#", where they
    behave like cat.  This implementation behaves like cat in both cases.

15. The POSIX requirement to open all w files at the beginning makes
    sed behave nonintuitively when the w commands are preceded by
    addresses or are within conditional blocks.  This implementation
    follows historic practice and POSIX, by default, and provides the
    -a option which opens the files only when they are needed.

16. POSIX does not specify how escape sequences other than \n and \D
    (where D is the delimiter character) are to be treated.  This is
    reasonable; however, it also doesn't state that the backslash is
    to be discarded from the output regardless.  A strict reading of
    POSIX would be that "echo xyz | sed 's/./\a/'" would display "\ayz".
    As historic sed implementations always discarded the backslash,
    this implementation does as well.

17. POSIX specifies that an address can be "empty".  This implies
    that constructs like ",d" or "1,d" and ",5d" are allowed.  This
    is not true for historic implementations or this implementation
    of sed.

18. The b, t and : commands are documented in POSIX to ignore leading
    white space, but no mention is made of trailing white space.
    Historic implementations of sed assigned different locations to
    the labels "x" and "x ".  This is not useful, and leads to subtle
    programming errors, but it is historic practice and changing it
    could theoretically break working scripts.  This implementation
    follows historic practice.

19. Although POSIX specifies that reading from files that do not exist
    from within the script must not terminate the script, it does not
    specify what happens if a write command fails.  Historic practice
    is to fail immediately if the file cannot be opened or written.
    This implementation follows historic practice.

20. Historic practice is that the \n construct can be used for either
    string1 or string2 of the y command.  This is not specified by
    POSIX.  This implementation follows historic practice.

21. Deleted.

22. Historic implementations of sed ignore the RE delimiter characters
    within character classes.  This is not specified in POSIX.  This
    implementation follows historic practice.

23. Historic implementations handle empty RE's in a special way: the
    empty RE is interpreted as if it were the last RE encountered,
    whether in an address or elsewhere.  POSIX does not document this
    behavior.  For example the command:

        sed -e /abc/s//XXX/

    substitutes XXX for the pattern abc.  The semantics of "the last
    RE" can be defined in two different ways:

        1. The last RE encountered when compiling (lexical/static scope).
        2. The last RE encountered while running (dynamic scope).

    While many historical implementations fail on programs depending
    on scope differences, the SunOS version exhibited dynamic scope
    behaviour.  This implementation does dynamic scoping, as this seems
    the most useful and in order to remain consistent with historical
    practice.
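
The empty-RE reuse described in item 23 can be tried directly from a
shell.  The following is only a minimal sketch: the sample input and the
variable name "result" are made up for illustration, and it assumes a
POSIX shell with printf and some sed on the PATH.  It exercises just the
simple case where static and dynamic scope agree, so it says nothing
about which scoping a given sed uses.

```shell
#!/bin/sh
# The address /abc/ selects the first line; the empty RE in s//XXX/
# then reuses that same RE, so "abc" is what gets replaced.
# The input lines below are illustrative only.
result=$(printf 'abc def\nxyz\n' | sed -e '/abc/s//XXX/')
printf '%s\n' "$result"
# prints:
# XXX def
# xyz
```

The second input line is passed through unchanged because the /abc/
address does not select it, so the s command never runs on it.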