.\" Copyright (c) 1991, 1993
.\"	The Regents of the University of California.  All rights reserved.
.\"
.\" This code is derived from software contributed to Berkeley by
.\" the Institute of Electrical and Electronics Engineers, Inc.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 4. Neither the name of the University nor the names of its contributors
.\"    may be used to endorse or promote products derived from this software
.\"    without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\"     @(#)tr.1	8.1 (Berkeley) 6/6/93
.\" $FreeBSD$
.\"
.Dd October 13, 2006
.Dt TR 1
.Os
.Sh NAME
.Nm tr
.Nd translate characters
.Sh SYNOPSIS
.Nm
.Op Fl Ccsu
.Ar string1 string2
.Nm
.Op Fl Ccu
.Fl d
.Ar string1
.Nm
.Op Fl Ccu
.Fl s
.Ar string1
.Nm
.Op Fl Ccu
.Fl ds
.Ar string1 string2
.Sh DESCRIPTION
The
.Nm
utility copies the standard input to the standard output with substitution
or deletion of selected characters.
.Pp
The following options are available:
.Bl -tag -width Ds
.It Fl C
Complement the set of characters in
.Ar string1 ,
that is
.Dq Fl C Li ab
includes every character except for
.Ql a
and
.Ql b .
.It Fl c
Same as
.Fl C
but complement the set of values in
.Ar string1 .
.It Fl d
Delete characters in
.Ar string1
from the input.
.It Fl s
Squeeze multiple occurrences of the characters listed in the last
operand (either
.Ar string1
or
.Ar string2 )
in the input into a single instance of the character.
This occurs after all deletion and translation is completed.
.It Fl u
Guarantee that any output is unbuffered.
.El
.Pp
In the first synopsis form, the characters in
.Ar string1
are translated into the characters in
.Ar string2
where the first character in
.Ar string1
is translated into the first character in
.Ar string2
and so on.
If
.Ar string1
is longer than
.Ar string2 ,
the last character found in
.Ar string2
is duplicated until
.Ar string1
is exhausted.
.Pp
In the second synopsis form, the characters in
.Ar string1
are deleted from the input.
.Pp
In the third synopsis form, the characters in
.Ar string1
are compressed as described for the
.Fl s
option.
.Pp
In the fourth synopsis form, the characters in
.Ar string1
are deleted from the input, and the characters in
.Ar string2
are compressed as described for the
.Fl s
option.
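.Pp
As a brief illustration of the four synopsis forms, the following commands
are a sketch that assumes a POSIX shell, the C locale, and made-up input
strings; the expected output is given in the comment above each command:
.Bd -literal -offset indent
# Translate; string2 is padded with its last character: prints "xyyy"
printf 'abcd\en' | tr 'abcd' 'xy'
# Delete every 'l' and 'o': prints "he wrd"
printf 'hello world\en' | tr -d 'lo'
# Squeeze runs of 'a' and 'b': prints "abccc"
printf 'aaabbbccc\en' | tr -s 'ab'
# Delete 'a', then squeeze runs of 'c': prints "bbc"
printf 'aabbcc\en' | tr -ds 'a' 'c'
.Ed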
.Pp
The following conventions can be used in
.Ar string1
and
.Ar string2
to specify sets of characters:
.Bl -tag -width [:equiv:]
.It character
Any character not described by one of the following conventions
represents itself.
.It \eoctal
A backslash followed by 1, 2 or 3 octal digits represents a character
with that encoded value.
To follow an octal sequence with a digit as a character, left zero-pad
the octal sequence to the full 3 octal digits.
.It \echaracter
A backslash followed by certain special characters maps to special
values.
.Bl -column "\ea"
.It "\ea	<alert character>"
.It "\eb	<backspace>"
.It "\ef	<form-feed>"
.It "\en	<newline>"
.It "\er	<carriage return>"
.It "\et	<tab>"
.It "\ev	<vertical tab>"
.El
.Pp
A backslash followed by any other character maps to that character.
.It c-c
For non-octal range endpoints, this
represents the range of characters between the range endpoints, inclusive,
in ascending order,
as defined by the collation sequence.
If either or both of the range endpoints are octal sequences, it
represents the range of specific coded values between the
range endpoints, inclusive.
.Pp
.Bf Em
See the
.Sx COMPATIBILITY
section below for an important note regarding
the way the current
implementation interprets range expressions differently from
previous implementations.
.Ef
.It [:class:]
Represents all characters belonging to the defined character class.
Class names are:
.Bl -column "phonogram"
.It "alnum	<alphanumeric characters>"
.It "alpha	<alphabetic characters>"
.It "blank	<whitespace characters>"
.It "cntrl	<control characters>"
.It "digit	<numeric characters>"
.It "graph	<graphic characters>"
.It "ideogram	<ideographic characters>"
.It "lower	<lower-case alphabetic characters>"
.It "phonogram	<phonographic characters>"
.It "print	<printable characters>"
.It "punct	<punctuation characters>"
.It "rune	<valid characters>"
.It "space	<space characters>"
.It "special	<special characters>"
.It "upper	<upper-case characters>"
.It "xdigit	<hexadecimal characters>"
.El
.Pp
.\" All classes may be used in
.\" .Ar string1 ,
.\" and in
.\" .Ar string2
.\" when both the
.\" .Fl d
.\" and
.\" .Fl s
.\" options are specified.
.\" Otherwise, only the classes ``upper'' and ``lower'' may be used in
.\" .Ar string2
.\" and then only when the corresponding class (``upper'' for ``lower''
.\" and vice-versa) is specified in the same relative position in
.\" .Ar string1 .
.\" .Pp
When
.Dq Li [:lower:]
appears in
.Ar string1
and
.Dq Li [:upper:]
appears in the same relative position in
.Ar string2 ,
it represents the character pairs from the
.Dv toupper
mapping in the
.Ev LC_CTYPE
category of the current locale.
When
.Dq Li [:upper:]
appears in
.Ar string1
and
.Dq Li [:lower:]
appears in the same relative position in
.Ar string2 ,
it represents the character pairs from the
.Dv tolower
mapping in the
.Ev LC_CTYPE
category of the current locale.
.Pp
With the exception of case conversion,
characters in the classes are in unspecified order.
.Pp
For specific information as to which
.Tn ASCII
characters are included
in these classes, see
.Xr ctype 3
and related manual pages.
.It [=equiv=]
Represents all characters belonging to the same equivalence class as
.Ar equiv ,
ordered by their encoded values.
.It [#*n]
Represents
.Ar n
repeated occurrences of the character represented by
.Ar # .
This
expression is only valid when it occurs in
.Ar string2 .
If
.Ar n
is omitted or is zero, it is interpreted as large enough to extend the
.Ar string2
sequence to the length of
.Ar string1 .
If
.Ar n
has a leading zero, it is interpreted as an octal value; otherwise,
it is interpreted as a decimal value.
.El
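.Pp
The conventions above can be combined as in the following sketch, which
assumes a POSIX shell, the C locale with an ASCII character set, and
illustrative input strings:
.Bd -literal -offset indent
# Octal escape: \e040 is a space in ASCII; prints "a_b"
printf 'a b\en' | tr '\e040' '_'
# Zero-padded octal followed by a literal digit: \e011 is a tab, so
# the tab becomes 'x' and '1' becomes 'y'; prints "axy"
printf 'a\et1\en' | tr '\e0111' 'xy'
# Case conversion with character classes: prints "HELLO"
printf 'hello\en' | tr '[:lower:]' '[:upper:]'
# Repeat counts in string2: two 'x', then 'y' for the rest;
# prints "xxyyy"
printf 'abcde\en' | tr 'abcde' '[x*2][y*]'
.Ed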
.Sh ENVIRONMENT
The
.Ev LANG , LC_ALL , LC_CTYPE
and
.Ev LC_COLLATE
environment variables affect the execution of
.Nm
as described in
.Xr environ 7 .
.Sh EXIT STATUS
.Ex -std
.Sh EXAMPLES
The following examples are shown as given to the shell:
.Pp
Create a list of the words in file1, one per line, where a word is taken to
be a maximal string of letters.
.Pp
.D1 Li "tr -cs \*q[:alpha:]\*q \*q\en\*q < file1"
.Pp
Translate the contents of file1 to upper-case.
.Pp
.D1 Li "tr \*q[:lower:]\*q \*q[:upper:]\*q < file1"
.Pp
(This should be preferred over the traditional
.Ux
idiom of
.Dq Li "tr a-z A-Z" ,
since it works correctly in all locales.)
.Pp
Strip out non-printable characters from file1.
.Pp
.D1 Li "tr -cd \*q[:print:]\*q < file1"
.Pp
Remove diacritical marks from all accented variants of the letter
.Ql e :
.Pp
.Dl "tr \*q[=e=]\*q \*qe\*q"
.Sh COMPATIBILITY
Previous
.Fx
implementations of
.Nm
did not order characters in range expressions according to the current
locale's collation order, making it possible to convert unaccented Latin
characters (esp.\& as found in English text) from upper to lower case using
the traditional
.Ux
idiom of
.Dq Li "tr A-Z a-z" .
Since
.Nm
now obeys the locale's collation order, this idiom may not produce
correct results when there is not a 1:1 mapping between lower and
upper case, or when the order of characters within the two cases differs.
As noted in the
.Sx EXAMPLES
section above, the character class expressions
.Dq Li [:lower:]
and
.Dq Li [:upper:]
should be used instead of explicit character ranges like
.Dq Li a-z
and
.Dq Li A-Z .
.Pp
System V has historically implemented character ranges using the syntax
.Dq Li [c-c]
instead of the
.Dq Li c-c
used by historic
.Bx
implementations and
standardized by POSIX.
System V shell scripts should work under this implementation as long as
the range is intended to map into another range; for example, the command
.Dq Li "tr [a-z] [A-Z]"
will work because it maps the
.Ql \&[
character in
.Ar string1
to the
.Ql \&[
character in
.Ar string2 .
However, if the shell script is deleting or squeezing characters as in
the command
.Dq Li "tr -d [a-z]" ,
the characters
.Ql \&[
and
.Ql \&]
will be
included in the deletion or compression list, which would not have happened
under a historic System V implementation.
Additionally, any scripts that depended on the sequence
.Dq Li a-z
to
represent the three characters
.Ql a ,
.Ql \-
and
.Ql z
will have to be
rewritten as
.Dq Li a\e-z .
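.Pp
The two pitfalls above can be seen in the following sketch, which assumes
a POSIX shell, the C locale, and made-up input:
.Bd -literal -offset indent
# The brackets are ordinary set members, so '[' and ']' are deleted
# along with the letters; prints "12"
printf '[ab]12\en' | tr -d '[a-z]'
# An escaped hyphen is a literal '-', not a range, so 'b' survives;
# prints "b"
printf 'a-zb\en' | tr -d 'a\e-z'
.Ed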
.Pp
The
.Nm
utility has historically not permitted the manipulation of NUL bytes in
its input and, additionally, stripped NULs from its input stream.
This implementation treats that behavior as a bug and has removed it.
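.Pp
For instance, the following sketch (assuming a POSIX shell whose
.Xr printf 1
emits a NUL byte for the \e0 escape) turns a NUL byte into a newline
instead of silently dropping it:
.Bd -literal -offset indent
# Prints "a" and "b" on separate lines
printf 'a\e0b\en' | tr '\e000' '\en'
.Ed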
.Pp
The
.Nm
utility has historically been extremely forgiving of syntax errors;
for example, the
.Fl c
and
.Fl s
options were ignored unless two strings were specified.
This implementation will not permit invalid syntax.
.Sh STANDARDS
The
.Nm
utility conforms to
.St -p1003.1-2001 .
The
.Dq ideogram ,
.Dq phonogram ,
.Dq rune ,
and
.Dq special
character classes are extensions.
.Pp
Note that the feature wherein the last character of
.Ar string2
is duplicated if
.Ar string2
has fewer characters than
.Ar string1
is permitted by POSIX but is not required.
Shell scripts attempting to be portable to other POSIX systems should use
the
.Dq Li [#*]
convention instead of relying on this behavior.
The
.Fl u
option is an extension to the
.St -p1003.1-2001
standard.
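.Pp
A minimal sketch of the portable padding form, assuming a POSIX shell and
the C locale; both commands print
.Dq xyyy
with this implementation, but only the second avoids relying on the
optional padding behavior described above:
.Bd -literal -offset indent
# Relies on string2 being padded with its last character
printf 'abcd\en' | tr 'abcd' 'xy'
# Pads string2 explicitly with the [#*] convention
printf 'abcd\en' | tr 'abcd' 'x[y*]'
.Ed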