.\" Copyright (c) 1991, 1993
.\"	The Regents of the University of California.  All rights reserved.
.\"
.\" This code is derived from software contributed to Berkeley by
.\" the Institute of Electrical and Electronics Engineers, Inc.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 4. Neither the name of the University nor the names of its contributors
.\"    may be used to endorse or promote products derived from this software
.\"    without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\"     @(#)tr.1	8.1 (Berkeley) 6/6/93
.\" $FreeBSD$
.\"
.Dd October 13, 2006
.Dt TR 1
.Os
.Sh NAME
.Nm tr
.Nd translate characters
.Sh SYNOPSIS
.Nm
.Op Fl Ccsu
.Ar string1 string2
.Nm
.Op Fl Ccu
.Fl d
.Ar string1
.Nm
.Op Fl Ccu
.Fl s
.Ar string1
.Nm
.Op Fl Ccu
.Fl ds
.Ar string1 string2
.Sh DESCRIPTION
The
.Nm
utility copies the standard input to the standard output with substitution
or deletion of selected characters.
.Pp
The following options are available:
.Bl -tag -width Ds
.It Fl C
Complement the set of characters in
.Ar string1 ,
that is
.Dq Fl C Li ab
includes every character except for
.Ql a
and
.Ql b .
.It Fl c
Same as
.Fl C
but complement the set of values in
.Ar string1 .
.It Fl d
Delete characters in
.Ar string1
from the input.
.It Fl s
Squeeze multiple occurrences of the characters listed in the last
operand (either
.Ar string1
or
.Ar string2 )
in the input into a single instance of the character.
This occurs after all deletion and translation is completed.
.It Fl u
Guarantee that any output is unbuffered.
.El
.Pp
In the first synopsis form, the characters in
.Ar string1
are translated into the characters in
.Ar string2
where the first character in
.Ar string1
is translated into the first character in
.Ar string2
and so on.
If
.Ar string1
is longer than
.Ar string2 ,
the last character found in
.Ar string2
is duplicated until
.Ar string1
is exhausted.
.Pp
In the second synopsis form, the characters in
.Ar string1
are deleted from the input.
.Pp
In the third synopsis form, the characters in
.Ar string1
are compressed as described for the
.Fl s
option.
.Pp
In the fourth synopsis form, the characters in
.Ar string1
are deleted from the input, and the characters in
.Ar string2
are compressed as described for the
.Fl s
option.
.Pp
The following conventions can be used in
.Ar string1
and
.Ar string2
to specify sets of characters:
.Bl -tag -width [:equiv:]
.It character
Any character not described by one of the following conventions
represents itself.
.It \eoctal
A backslash followed by 1, 2 or 3 octal digits represents a character
with that encoded value.
To follow an octal sequence with a digit as a character, left zero-pad
the octal sequence to the full 3 octal digits.
.It \echaracter
A backslash followed by certain special characters maps to special
values.
.Bl -column "\ea"
.It "\ea	<alert character>"
.It "\eb	<backspace>"
.It "\ef	<form-feed>"
.It "\en	<newline>"
.It "\er	<carriage return>"
.It "\et	<tab>"
.It "\ev	<vertical tab>"
.El
.Pp
A backslash followed by any other character maps to that character.
.It c-c
For non-octal range endpoints,
represents the range of characters between the range endpoints, inclusive,
in ascending order,
as defined by the collation sequence.
If either or both of the range endpoints are octal sequences, it
represents the range of specific coded values between the
range endpoints, inclusive.
.It [:class:]
Represents all characters belonging to the defined character class.
Class names are:
.Bl -column "phonogram"
.It "alnum	<alphanumeric characters>"
.It "alpha	<alphabetic characters>"
.It "blank	<whitespace characters>"
.It "cntrl	<control characters>"
.It "digit	<numeric characters>"
.It "graph	<graphic characters>"
.It "ideogram	<ideographic characters>"
.It "lower	<lower-case alphabetic characters>"
.It "phonogram	<phonographic characters>"
.It "print	<printable characters>"
.It "punct	<punctuation characters>"
.It "rune	<valid characters>"
.It "space	<space characters>"
.It "special	<special characters>"
.It "upper	<upper-case characters>"
.It "xdigit	<hexadecimal characters>"
.El
.Pp
.\" All classes may be used in
.\" .Ar string1 ,
.\" and in
.\" .Ar string2
.\" when both the
.\" .Fl d
.\" and
.\" .Fl s
.\" options are specified.
.\" Otherwise, only the classes ``upper'' and ``lower'' may be used in
.\" .Ar string2
.\" and then only when the corresponding class (``upper'' for ``lower''
.\" and vice-versa) is specified in the same relative position in
.\" .Ar string1 .
.\" .Pp
When
.Dq Li [:lower:]
appears in
.Ar string1
and
.Dq Li [:upper:]
appears in the same relative position in
.Ar string2 ,
it represents the character pairs from the
.Dv toupper
mapping in the
.Ev LC_CTYPE
category of the current locale.
When
.Dq Li [:upper:]
appears in
.Ar string1
and
.Dq Li [:lower:]
appears in the same relative position in
.Ar string2 ,
it represents the character pairs from the
.Dv tolower
mapping in the
.Ev LC_CTYPE
category of the current locale.
.Pp
With the exception of case conversion,
characters in the classes are in unspecified order.
.Pp
For specific information as to which
.Tn ASCII
characters are included
in these classes, see
.Xr ctype 3
and related manual pages.
.It [=equiv=]
Represents all characters belonging to the same equivalence class as
.Ar equiv ,
ordered by their encoded values.
.It [#*n]
Represents
.Ar n
repeated occurrences of the character represented by
.Ar # .
This
expression is only valid when it occurs in
.Ar string2 .
If
.Ar n
is omitted or is zero, it is interpreted as large enough to extend the
.Ar string2
sequence to the length of
.Ar string1 .
If
.Ar n
has a leading zero, it is interpreted as an octal value; otherwise,
it is interpreted as a decimal value.
.El
.Sh ENVIRONMENT
The
.Ev LANG , LC_ALL , LC_CTYPE
and
.Ev LC_COLLATE
environment variables affect the execution of
.Nm
as described in
.Xr environ 7 .
.Sh EXIT STATUS
.Ex -std
.Sh EXAMPLES
The following examples are shown as given to the shell:
.Pp
Create a list of the words in file1, one per line, where a word is taken to
be a maximal string of letters.
.Pp
.D1 Li "tr -cs \*q[:alpha:]\*q \*q\en\*q < file1"
.Pp
Translate the contents of file1 to upper-case.
.Pp
.D1 Li "tr \*q[:lower:]\*q \*q[:upper:]\*q < file1"
.Pp
(This should be preferred over the traditional
.Ux
idiom of
.Dq Li "tr a-z A-Z" ,
since it works correctly in all locales.)
.Pp
Strip out non-printable characters from file1.
.Pp
.D1 Li "tr -cd \*q[:print:]\*q < file1"
.Pp
Remove diacritical marks from all accented variants of the letter
.Ql e :
.Pp
.Dl "tr \*q[=e=]\*q \*qe\*q"
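.Pp
The following further examples are minimal sketches of the deletion,
squeezing and repetition conventions described in the
.Sx DESCRIPTION
section; as in the examples above, file1 is only a placeholder input file.
.Pp
Delete all carriage returns from file1.
.Pp
.D1 Li "tr -d \*q\er\*q < file1"
.Pp
Squeeze each run of spaces in file1 into a single space.
.Pp
.D1 Li "tr -s \*q \*q < file1"
.Pp
Replace every digit in file1 with the character
.Ql x ,
using the
.Dq Li [#*]
convention to extend
.Ar string2
to the length of
.Ar string1 .
.Pp
.D1 Li "tr \*q0-9\*q \*q[x*]\*q < file1"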
.Sh COMPATIBILITY
Previous
.Fx
implementations of
.Nm
did not order characters in range expressions according to the current
locale's collation order, making it possible to convert unaccented Latin
characters from upper to lower case using
the traditional
.Ux
idiom of
.Dq Li "tr A-Z a-z" .
As noted in the
.Sx EXAMPLES
section above, the character class expressions
.Dq Li [:lower:]
and
.Dq Li [:upper:]
should be used instead of explicit character ranges like
.Dq Li a-z
and
.Dq Li A-Z .
.Pp
The
.Dq Li [=equiv=]
expression is implemented for single byte locales only.
.Pp
System V has historically implemented character ranges using the syntax
.Dq Li [c-c]
instead of the
.Dq Li c-c
used by historic
.Bx
implementations and
standardized by POSIX.
System V shell scripts should work under this implementation as long as
the range is intended to map in another range, i.e., the command
.Dq Li "tr [a-z] [A-Z]"
will work as it will map the
.Ql \&[
character in
.Ar string1
to the
.Ql \&[
character in
.Ar string2 .
However, if the shell script is deleting or squeezing characters as in
the command
.Dq Li "tr -d [a-z]" ,
the characters
.Ql \&[
and
.Ql \&]
will be
included in the deletion or compression list which would not have happened
under a historic System V implementation.
Additionally, any scripts that depended on the sequence
.Dq Li a-z
to
represent the three characters
.Ql a ,
.Ql \-
and
.Ql z
will have to be
rewritten as
.Dq Li a\e-z .
.Pp
The
.Nm
utility has historically not permitted the manipulation of NUL bytes in
its input and, additionally, stripped NULs from its input stream.
This implementation has removed this behavior as a bug.
.Pp
The
.Nm
utility has historically been extremely forgiving of syntax errors;
for example, the
.Fl c
and
.Fl s
options were ignored unless two strings were specified.
This implementation will not permit illegal syntax.
.Sh STANDARDS
The
.Nm
utility conforms to
.St -p1003.1-2001 .
The
.Dq ideogram ,
.Dq phonogram ,
.Dq rune ,
and
.Dq special
character classes are extensions.
.Pp
It should be noted that the feature wherein the last character of
.Ar string2
is duplicated if
.Ar string2
has fewer characters than
.Ar string1
is permitted by POSIX but is not required.
Shell scripts attempting to be portable to other POSIX systems should use
the
.Dq Li [#*]
convention instead of relying on this behavior.
The
.Fl u
option is an extension to the
.St -p1003.1-2001
standard.