Comments on the IEEE P1003.2 Draft 12
     Part 2: Shell and Utilities
  Section 4.55: sed - Stream editor

Diomidis Spinellis <dds@doc.ic.ac.uk>
Keith Bostic <bostic@cs.berkeley.edu>

In the following paragraphs, "wrong" usually means "inconsistent with
historic practice", as most of the following comments refer to
undocumented inconsistencies between the historical versions of sed and
the POSIX 1003.2 standard.  All the comments are notes taken while
implementing a POSIX-compatible version of sed, and should not be
interpreted as official opinions or criticism towards the POSIX committee.
All uses of "POSIX" refer to section 4.55, Draft 12 of POSIX 1003.2.

 1.	32V and BSD derived implementations of sed strip the text
	arguments of the a, c and i commands of their initial blanks,
	i.e.

	#!/bin/sed -f
	a\
		foo\
		\  indent\
		bar

	produces:

	foo
	  indent
	bar

	POSIX does not specify this behavior as the System V versions of
	sed do not do this stripping.  The argument against stripping is
	that it is difficult to write sed scripts that have leading blanks
	if they are stripped.  The argument for stripping is that it is
	difficult to write readable sed scripts unless indentation is allowed
	and ignored, and leading whitespace is obtainable by entering a
	backslash in front of it.  This implementation follows the BSD
	historic practice.

 2.	Historical versions of sed required that the w flag be the last
	flag to an s command as it takes an additional argument.  This
	is obvious, but not specified in POSIX.

 3.	Historical versions of sed required that whitespace follow a w
	flag to an s command.  This is not specified in POSIX.  This
	implementation permits whitespace but does not require it.

 4.	Historical versions of sed permitted any number of whitespace
	characters to follow the w command.  This is not specified in
	POSIX.  This implementation permits whitespace but does not
	require it.

 5.	The rule for the l command differs from historic practice.  Table
	2-15 includes the various ANSI C escape sequences, including \\
	for backslash.  Some historical versions of sed displayed two
	digit octal numbers, too, not three as specified by POSIX.  POSIX
	is a cleanup, and is followed by this implementation.
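	As a rough illustration of the POSIX-style l output, the sketch
	below feeds sed one made-up line holding a tab, a backslash and
	the control byte \001.  The input and the variable name out are
	assumptions chosen for the example only.

```sh
# Hypothetical input: 'a', TAB, 'b', one backslash, byte \001, newline.
out=$(printf 'a\tb\\\001\n' | sed -n l)
printf '%s\n' "$out"
# prints: a\tb\\\001$
```

	The tab and backslash come out as C escapes, the control byte as
	a three digit octal number, and the end of the line is marked by $.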

 6.	The POSIX specification for ! does not specify that for a single
	command the command must not contain an address specification
	whereas the command list can contain address specifications.  The
	specification for ! implies that "3!/hello/p" works, and it never
	has, historically.  Note,

		3!{
			/hello/p
		}

	does work.
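	A minimal sketch of the braced form in use.  The five input lines
	are invented for the example; the variable name out is likewise
	only illustrative.

```sh
# Print lines matching /hello/ everywhere except on line 3.
out=$(printf 'hello 1\nhello 2\nhello 3\nbye 4\nhello 5\n' | sed -n '3!{
/hello/p
}')
printf '%s\n' "$out"
# prints:
# hello 1
# hello 2
# hello 5
```

	Line 3 is skipped by the negated address, and line 4 is skipped by
	the inner /hello/ address.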

 7.	POSIX does not specify what happens with consecutive ! commands
	(e.g. /foo/!!!p).  Historic implementations allow any number of
	!'s without changing the behaviour.  (It seems logical that each
	one might reverse the behaviour.)  This implementation follows
	historic practice.

 8.	Historic versions of sed permitted commands to be separated
	by semi-colons, e.g. "sed -ne '1p;2p;3q'" printed the first
	two lines of a file and quit on reaching the third (with -n,
	the q command does not print the pattern space).  This is not
	specified by POSIX.  Note, the ; command separator is not
	allowed for the commands a, c, i, w, r, :, b, t, # and at the
	end of a w flag in the s command.  This implementation follows
	historic practice and implements the ; separator.
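	A small sketch of the ; separator: the same two p commands are
	written once on a single line and once on separate lines, and
	should select the same input.  The input text and variable names
	are made up for the illustration.

```sh
input=$(printf 'one\ntwo\nthree\nfour\n')

# Two commands separated by a semicolon.
with_semicolons=$(printf '%s\n' "$input" | sed -n -e '1p;3p')

# The same two commands separated by a newline.
with_newlines=$(printf '%s\n' "$input" | sed -n -e '1p
3p')

printf '%s\n' "$with_semicolons"
# prints:
# one
# three
```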

 9.	Historic versions of sed terminated the script if EOF was reached
	during the execution of the 'n' command, i.e.:

	sed -e '
	n
	i\
	hello
	' </dev/null

	did not produce any output.  POSIX does not specify this behavior.
	This implementation follows historic practice.
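	The effect is easier to see with a non-empty input.  The sketch
	below is an illustration on a made-up three line input, assuming
	a sed that is run without -n and in its default mode.

```sh
# On lines 1 and 3 the n command runs; on line 3 it hits EOF.
out=$(printf 'a\nb\nc\n' | sed -e 'n' -e 's/.*/X/')
printf '%s\n' "$out"
# prints:
# a
# X
# c
```

	The s command rewrites "b", which n loaded, but it never runs for
	"c": n finds no further input there and sed stops processing the
	script.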

10.	Deleted.

11.	Historical implementations do not output the change text of a c
	command in the case of an address range whose first line number
	is greater than the second (e.g. 3,1).  POSIX requires that the
	text be output.  Since the historic behavior doesn't seem to have
	any particular purpose, this implementation follows the POSIX
	behavior.
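	A sketch of the POSIX behavior on a made-up four line input.  The
	range 3,1 matches line 3 only, and the change text is expected in
	its place; the variable name out is illustrative.

```sh
# A backwards range: the second address is already behind the first.
out=$(printf '1\n2\n3\n4\n' | sed '3,1c\
text')
printf '%s\n' "$out"
# prints:
# 1
# 2
# text
# 4
```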

12.	POSIX does not specify whether address ranges are checked and
	reset if a command is not executed due to a jump.  The following
	program will behave in different ways depending on whether the
	'c' command is triggered at the third line, i.e. will the text
	be output even though line 3 of the input will never logically
	encounter that command.

	2,4b
	1,3c\
		text

	Historic implementations did not output the text in the above
	example.  Therefore it was believed that a range whose second
	address was never matched extended to the end of the input.
	However, the current practice adopted by this implementation,
	as well as by those from GNU and SUN, is as follows:  The text
	from the 'c' command still isn't output because the second address
	isn't actually matched; but the range is reset after all if its
	second address is a line number.  In the above example, only the
	first line of the input will be deleted.
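	The same program run on a made-up five line input, as a sketch of
	the current practice described above.  The expected output is the
	one this note describes; a sed following the older historic
	behavior would be expected to delete line 5 as well.

```sh
# Line 1 starts the 1,3 range; lines 2-4 jump past the c command.
out=$(printf '1\n2\n3\n4\n5\n' | sed -e '2,4b' -e '1,3c\
text')
printf '%s\n' "$out"
# prints:
# 2
# 3
# 4
# 5
```

	The change text never appears, and line 5 survives because the
	range is reset once the line number has passed 3.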

13.	Historical implementations allow an output suppressing #n at the
	beginning of -e arguments as well as in a script file.  POSIX
	does not specify this.  This implementation follows historical
	practice.
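	A short sketch of #n at the start of a -e argument, on a made-up
	two line input.  It assumes a sed that follows the historical
	practice described above; the #n must be the first two characters
	of the script and be followed directly by a newline.

```sh
# "#n" on the first line acts like the -n option.
out=$(printf 'a\nb\n' | sed -e '#n
1p')
printf '%s\n' "$out"
# prints: a
```

	Without the #n line, "a" would be printed twice and "b" once.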

14.	POSIX does not explicitly specify how sed behaves if no script is
	specified.  Since the sed Synopsis permits this form of the command,
	and the language in the Description section states that the input
	is output, it seems reasonable that it behave like the cat(1)
	command.  Historic sed implementations behave differently for "ls |
	sed", where they produce no output, and "ls | sed -e#", where they
	behave like cat.  This implementation behaves like cat in both cases.

15.	The POSIX requirement to open all w files at the beginning makes
	sed behave nonintuitively when the w commands are preceded by
	addresses or are within conditional blocks.  This implementation
	follows historic practice and POSIX, by default, and provides the
	-a option which opens the files only when they are needed.

16.	POSIX does not specify how escape sequences other than \n and \D
	(where D is the delimiter character) are to be treated.  This is
	reasonable; however, it also doesn't state that the backslash is
	to be discarded from the output regardless.  A strict reading of
	POSIX would be that "echo xyz | sed 's/./\a/'" would display
	"\ayz".  As historic sed implementations always discarded the
	backslash, this implementation does as well.

17.	POSIX specifies that an address can be "empty".  This implies
	that constructs like ",d" or "1,d" and ",5d" are allowed.  This
	is not true for historic implementations or this implementation
	of sed.

18.	The b t and : commands are documented in POSIX to ignore leading
	white space, but no mention is made of trailing white space.
	Historic implementations of sed assigned different locations to
	the labels "x" and "x ".  This is not useful, and leads to subtle
	programming errors, but it is historic practice and changing it
	could theoretically break working scripts.  This implementation
	follows historic practice.

19.	Although POSIX specifies that reading from files that do not exist
	from within the script must not terminate the script, it does not
	specify what happens if a write command fails.  Historic practice
	is to fail immediately if the file cannot be opened or written.
	This implementation follows historic practice.

20.	Historic practice is that the \n construct can be used for either
	string1 or string2 of the y command.  This is not specified by
	POSIX.  This implementation follows historic practice.
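	A minimal sketch of \n in string2 of the y command, turning the
	blanks of a made-up input line into newlines.  The variable name
	out is illustrative.

```sh
# Transliterate every space into a newline.
out=$(printf 'a b c\n' | sed 'y/ /\n/')
printf '%s\n' "$out"
# prints:
# a
# b
# c
```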

21.	Deleted.

22.	Historic implementations of sed ignore the RE delimiter characters
	within character classes.  This is not specified in POSIX.  This
	implementation follows historic practice.

23.	Historic implementations handle empty RE's in a special way: the
	empty RE is interpreted as if it were the last RE encountered,
	whether in an address or elsewhere.  POSIX does not document this
	behavior.  For example the command:

		sed -e /abc/s//XXX/

	substitutes XXX for the pattern abc.  The semantics of "the last
	RE" can be defined in two different ways:

	1. The last RE encountered when compiling (lexical/static scope).
	2. The last RE encountered while running (dynamic scope).

	While many historical implementations fail on programs depending
	on scope differences, the SunOS version exhibited dynamic scope
	behaviour.  This implementation does dynamic scoping, as this seems
	the most useful and in order to remain consistent with historical
	practice.
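	The example command can be tried on a made-up two line input, as
	sketched below.  Here the address and the s command sit side by
	side, so both definitions of "the last RE" give the same answer;
	the sketch does not try to tell the two scopes apart.

```sh
# The empty RE in s//XXX/ reuses /abc/ from the address.
out=$(printf 'abc def\nxyz\n' | sed -e '/abc/s//XXX/')
printf '%s\n' "$out"
# prints:
# XXX def
# xyz
```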