#	@(#)POSIX	8.1 (Berkeley) 6/6/93
#	$FreeBSD$

Comments on the IEEE P1003.2 Draft 12
     Part 2: Shell and Utilities
  Section 4.55: sed - Stream editor

Diomidis Spinellis <dds@doc.ic.ac.uk>
Keith Bostic <bostic@cs.berkeley.edu>

In the following paragraphs, "wrong" usually means "inconsistent with
historic practice", as most of the following comments refer to
undocumented inconsistencies between the historical versions of sed and
the POSIX 1003.2 standard.  All the comments are notes taken while
implementing a POSIX-compatible version of sed, and should not be
interpreted as official opinions or as criticism of the POSIX committee.
All uses of "POSIX" refer to section 4.55, Draft 12 of POSIX 1003.2.

 1.	32V and BSD-derived implementations of sed strip the text
	arguments of the a, c and i commands of their initial blanks,
	e.g.

	#!/bin/sed -f
	a\
		foo\
		\  indent\
		bar

	produces:

	foo
	  indent
	bar

	POSIX does not specify this behavior as the System V versions of
	sed do not do this stripping.  The argument against stripping is
	that it is difficult to write text arguments that have leading
	blanks if they are stripped.  The argument for stripping is that it
	is difficult to write readable sed scripts unless indentation is
	allowed and ignored, and leading whitespace is obtainable by entering
	a backslash in front of it.  This implementation follows the BSD
	historic practice.

 2.	Historical versions of sed required that the w flag be the last
	flag to an s command as it takes an additional argument.  This
	is obvious, but not specified in POSIX.

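	A minimal sketch of the w flag placed last, assuming a POSIX
	shell with mktemp(1) available; the input lines and the
	temporary file are illustrative only:

```shell
# The w flag takes the rest of the line as a file name, so it must come last.
tmp=$(mktemp)
out=$(printf 'apple\nkiwi\n' | sed -n "s/a/A/pw $tmp")
saved=$(cat "$tmp")
rm -f "$tmp"
printf 'stdout: %s\n' "$out"
printf 'file:   %s\n' "$saved"
```
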
 3.	Historical versions of sed required that whitespace follow a w
	flag to an s command.  This is not specified in POSIX.  This
	implementation permits whitespace but does not require it.

 4.	Historical versions of sed permitted any number of whitespace
	characters to follow the w command.  This is not specified in
	POSIX.  This implementation permits whitespace but does not
	require it.

 5.	The rule for the l command differs from historic practice.  Table
	2-15 includes the various ANSI C escape sequences, including \\
	for backslash.  Some historical versions of sed also displayed
	two-digit octal numbers rather than the three digits specified by
	POSIX.  POSIX is a cleanup, and is followed by this implementation.

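	As an illustration of the POSIX form of l, the following sketch
	feeds a tab, a backslash and a control character (octal 001)
	through "sed -n l"; the input string is made up for the example:

```shell
# l writes tab as \t, backslash as \\, other non-printables as three-digit
# octal, and marks the end of the line with $.
out=$(printf 'a\tb\\c\001\n' | sed -n l)
printf '%s\n' "$out"
```
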
 6.	The POSIX specification for ! does not state that a single
	command following ! must not contain an address specification,
	whereas a command list can contain address specifications.  The
	specification for ! implies that "3!/hello/p" works, but it never
	has, historically.  Note that

		3!{
			/hello/p
		}

	does work.

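	The braced form can be tried directly; this sketch uses three
	made-up input lines, two of which contain "hello":

```shell
# Every line except line 3 enters the block; inside it, /hello/p prints matches.
out=$(printf 'hello 1\nother\nhello 3\n' | sed -n '3!{
	/hello/p
}')
printf '%s\n' "$out"
```

	Only the first line is printed: the second does not match
	/hello/ and the third is excluded by 3!.
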
 7.	POSIX does not specify what happens with consecutive ! commands
	(e.g. /foo/!!!p).  Historic implementations allow any number of
	!'s without changing the behaviour.  (It seems logical that each
	one might reverse the behaviour.)  This implementation follows
	historic practice.

 8.	Historic versions of sed permitted commands to be separated
	by semi-colons, e.g. "sed -ne '1p;2p;3q'" printed the first
	two lines of a file and quit at the third.  This is not
	specified by POSIX.
	Note, the ; command separator is not allowed for the commands
	a, c, i, w, r, :, b, t, # and at the end of a w flag in the s
	command.  This implementation follows historic practice and
	implements the ; separator.

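	A small sketch comparing the ; separator with the equivalent
	multiple -e form, on made-up input:

```shell
# Both invocations print lines 1 and 3 only.
with_semicolons=$(printf 'a\nb\nc\n' | sed -n '1p;3p')
with_e=$(printf 'a\nb\nc\n' | sed -n -e 1p -e 3p)
printf '%s\n' "$with_semicolons"
```
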
 9.	Historic versions of sed terminated the script if EOF was reached
	during the execution of the 'n' command, e.g.:

	sed -e '
	n
	i\
	hello
	' </dev/null

	did not produce any output.  POSIX does not specify this behavior.
	This implementation follows historic practice.

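	The same termination can be observed with a one-line input.
	This is a sketch assuming an implementation that follows the
	historic practice described above; the s command is only there
	to show whether the rest of the script ran:

```shell
# n prints the pattern space, finds no next line, and sed quits;
# the s command after it is never executed.
out=$(printf 'only line\n' | sed -e 'n' -e 's/.*/rest of script ran/')
printf '%s\n' "$out"
```
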
10.	Deleted.

11.	Historical implementations do not output the change text of a c
	command in the case of an address range whose first line number
	is greater than the second (e.g. 3,1).  POSIX requires that the
	text be output.  Since the historic behavior doesn't seem to have
	any particular purpose, this implementation follows the POSIX
	behavior.

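	A sketch of the POSIX behavior for a reversed range, assuming
	an implementation that follows POSIX here; the four input lines
	are illustrative:

```shell
# The range 3,1 selects line 3 only, and the change text replaces it.
out=$(printf '1\n2\n3\n4\n' | sed '3,1c\
text')
printf '%s\n' "$out"
```
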
12.	POSIX does not specify whether address ranges are checked and
	reset if a command is not executed due to a jump.  The following
	program will behave in different ways depending on whether the
	'c' command is triggered at the third line, that is, whether the
	text is output even though line 3 of the input will never
	logically encounter that command.

	2,4b
	1,3c\
		text

	Historic implementations did not output the text in the above
	example.  Therefore it was believed that a range whose second
	address was never matched extended to the end of the input.
	However, the current practice adopted by this implementation,
	as well as by those from GNU and Sun, is as follows:  The text
	from the 'c' command still isn't output because the second address
	isn't actually matched; but the range is reset after all.  In the
	above example, only the first line of the input will be deleted.

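	The program above can be run against five numbered lines.  This
	sketch assumes an implementation that follows the current
	practice just described; historic implementations would behave
	differently:

```shell
# Lines 2-4 branch past the c command; line 1 opens the 1,3 range and is
# deleted; by line 5 the range has been reset, so the line passes through.
out=$(printf '1\n2\n3\n4\n5\n' | sed '2,4b
1,3c\
	text')
printf '%s\n' "$out"
```
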
13.	Historical implementations allow an output suppressing #n at the
	beginning of -e arguments as well as in a script file.  POSIX
	does not specify this.  This implementation follows historical
	practice.

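	A sketch of #n at the start of a -e argument, assuming an
	implementation that follows the historical practice:

```shell
# "#n" as the first line of the script acts like the -n option,
# so only the explicit 1p produces output.
out=$(printf 'a\nb\n' | sed -e '#n
1p')
printf '%s\n' "$out"
```
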
14.	POSIX does not explicitly specify how sed behaves if no script is
	specified.  Since the sed Synopsis permits this form of the command,
	and the language in the Description section states that the input
	is output, it seems reasonable that it behave like the cat(1)
	command.  Historic sed implementations behave differently for "ls |
	sed", where they produce no output, and "ls | sed -e#", where they
	behave like cat.  This implementation behaves like cat in both cases.

15.	The POSIX requirement to open all w files at the beginning makes
	sed behave nonintuitively when the w commands are preceded by
	addresses or are within conditional blocks.  This implementation
	follows historic practice and POSIX, by default, and provides the
	-a option which opens the files only when they are needed.

16.	POSIX does not specify how escape sequences other than \n and \D
	(where D is the delimiter character) are to be treated.  This is
	reasonable; however, it also doesn't state that the backslash is
	to be discarded from the output regardless.  A strict reading of
	POSIX would be that "echo xyz | sed 's/./\a/'" would display "\ayz".
	As historic sed implementations always discarded the backslash,
	this implementation does as well.

17.	POSIX specifies that an address can be "empty".  This implies
	that constructs like ",d" or "1,d" and ",5d" are allowed.  This
	is not true for historic implementations or this implementation
	of sed.

18.	The b, t and : commands are documented in POSIX to ignore leading
	white space, but no mention is made of trailing white space.
	Historic implementations of sed assigned different locations to
	the labels "x" and "x ".  This is not useful, and leads to subtle
	programming errors, but it is historic practice and changing it
	could theoretically break working scripts.  This implementation
	follows historic practice.

19.	Although POSIX specifies that reading from files that do not exist
	from within the script must not terminate the script, it does not
	specify what happens if a write command fails.  Historic practice
	is to fail immediately if the file cannot be opened or written.
	This implementation follows historic practice.

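	The read half of this rule can be sketched as follows; the path
	is hypothetical and is assumed not to exist:

```shell
# r on a missing file is silently ignored and the script carries on.
out=$(printf 'a\n' | sed 'r /nonexistent/sed-posix-demo')
printf '%s\n' "$out"
```
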
20.	Historic practice is that the \n construct can be used for either
	string1 or string2 of the y command.  This is not specified by
	POSIX.  This implementation follows historic practice.

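	A sketch of \n in string2 of y, assuming an implementation that
	follows the historic practice:

```shell
# Each space is transliterated to a newline, splitting the line into three.
out=$(printf 'a b c\n' | sed 'y/ /\n/')
printf '%s\n' "$out"
```
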
21.	Deleted.

22.	Historic implementations of sed ignore the RE delimiter characters
	within character classes.  This is not specified in POSIX.  This
	implementation follows historic practice.

23.	Historic implementations handle empty RE's in a special way: the
	empty RE is interpreted as if it were the last RE encountered,
	whether in an address or elsewhere.  POSIX does not document this
	behavior.  For example, the command:

		sed -e /abc/s//XXX/

	substitutes XXX for the pattern abc.  The semantics of "the last
	RE" can be defined in two different ways:

	1. The last RE encountered when compiling (lexical/static scope).
	2. The last RE encountered while running (dynamic scope).

	While many historical implementations fail on programs depending
	on scope differences, the SunOS version exhibited dynamic scope
	behaviour.  This implementation does dynamic scoping, as this seems
	the most useful and remains consistent with historical practice.

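	The empty-RE example can be run as is; the input line here is
	made up and contains the pattern twice:

```shell
# The empty RE in s reuses /abc/ from the address; without the g flag
# only the first occurrence is replaced.
out=$(printf 'abcabc\n' | sed -e '/abc/s//XXX/')
printf '%s\n' "$out"
```

	This case does not distinguish static from dynamic scope, since
	the last RE compiled and the last RE executed are the same.
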