.\" xref: /freebsd/usr.bin/tr/tr.1 (revision 884a2a699669ec61e2366e3e358342dbc94be24a)
.\" Copyright (c) 1991, 1993
.\"	The Regents of the University of California.  All rights reserved.
.\"
.\" This code is derived from software contributed to Berkeley by
.\" the Institute of Electrical and Electronics Engineers, Inc.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 4. Neither the name of the University nor the names of its contributors
.\"    may be used to endorse or promote products derived from this software
.\"    without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\"     @(#)tr.1	8.1 (Berkeley) 6/6/93
.\" $FreeBSD$
.\"
.Dd October 13, 2006
.Dt TR 1
.Os
.Sh NAME
.Nm tr
.Nd translate characters
.Sh SYNOPSIS
.Nm
.Op Fl Ccsu
.Ar string1 string2
.Nm
.Op Fl Ccu
.Fl d
.Ar string1
.Nm
.Op Fl Ccu
.Fl s
.Ar string1
.Nm
.Op Fl Ccu
.Fl ds
.Ar string1 string2
.Sh DESCRIPTION
The
.Nm
utility copies the standard input to the standard output with substitution
or deletion of selected characters.
.Pp
The following options are available:
.Bl -tag -width Ds
.It Fl C
Complement the set of characters in
.Ar string1 ,
that is
.Dq Fl C Li ab
includes every character except for
.Ql a
and
.Ql b .
.It Fl c
Same as
.Fl C
but complement the set of values in
.Ar string1 .
.It Fl d
Delete characters in
.Ar string1
from the input.
.It Fl s
Squeeze multiple occurrences of the characters listed in the last
operand (either
.Ar string1
or
.Ar string2 )
in the input into a single instance of the character.
This occurs after all deletion and translation is completed.
.It Fl u
Guarantee that any output is unbuffered.
.El
.Pp
In the first synopsis form, the characters in
.Ar string1
are translated into the characters in
.Ar string2
where the first character in
.Ar string1
is translated into the first character in
.Ar string2
and so on.
If
.Ar string1
is longer than
.Ar string2 ,
the last character found in
.Ar string2
is duplicated until
.Ar string1
is exhausted.
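.Pp
As a minimal sketch of this padding rule, the commands below translate four
characters using a two-character
.Ar string2 .
The sample input and the variable names are illustrative only, and the first
command assumes an implementation that pads as described here; the second
spells the padding out explicitly.

```shell
# Assumes a tr that pads a short string2 with its last character,
# as described above; POSIX does not require this behavior.
padded=$(printf 'abcd\n' | tr 'abcd' 'xy')

# Explicit spelling: [y*] repeats 'y' to cover the rest of string1.
explicit=$(printf 'abcd\n' | tr 'abcd' 'x[y*]')

printf '%s\n%s\n' "$padded" "$explicit"
```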
.Pp
In the second synopsis form, the characters in
.Ar string1
are deleted from the input.
.Pp
In the third synopsis form, the characters in
.Ar string1
are compressed as described for the
.Fl s
option.
.Pp
In the fourth synopsis form, the characters in
.Ar string1
are deleted from the input, and the characters in
.Ar string2
are compressed as described for the
.Fl s
option.
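.Pp
The fourth form can be sketched as follows, assuming an illustrative input
line in which digits are to be removed and runs of spaces collapsed.
The variable name is hypothetical.

```shell
# Delete every digit (string1), then squeeze each run of spaces (string2)
# down to a single space.
cleaned=$(printf 'a1  b22   c\n' | tr -ds '[:digit:]' ' ')
printf '%s\n' "$cleaned"
```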
.Pp
The following conventions can be used in
.Ar string1
and
.Ar string2
to specify sets of characters:
.Bl -tag -width [:equiv:]
.It character
Any character not described by one of the following conventions
represents itself.
.It \eoctal
A backslash followed by 1, 2 or 3 octal digits represents a character
with that encoded value.
To follow an octal sequence with a digit as a character, left zero-pad
the octal sequence to the full 3 octal digits.
.It \echaracter
A backslash followed by certain special characters maps to special
values.
.Pp
.Bl -column "\ea"
.It "\ea	<alert character>
.It "\eb	<backspace>
.It "\ef	<form-feed>
.It "\en	<newline>
.It "\er	<carriage return>
.It "\et	<tab>
.It "\ev	<vertical tab>
.El
.Pp
A backslash followed by any other character maps to that character.
.It c-c
For non-octal range endpoints,
represents the range of characters between the range endpoints, inclusive,
in ascending order,
as defined by the collation sequence.
If either or both of the range endpoints are octal sequences, it
represents the range of specific coded values between the
range endpoints, inclusive.
.Pp
.Bf Em
See the
.Sx COMPATIBILITY
section below for an important note on how the current
implementation interprets range expressions differently from
previous implementations.
.Ef
.It [:class:]
Represents all characters belonging to the defined character class.
Class names are:
.Pp
.Bl -column "phonogram"
.It "alnum	<alphanumeric characters>
.It "alpha	<alphabetic characters>
.It "blank	<whitespace characters>
.It "cntrl	<control characters>
.It "digit	<numeric characters>
.It "graph	<graphic characters>
.It "ideogram	<ideographic characters>
.It "lower	<lower-case alphabetic characters>
.It "phonogram	<phonographic characters>
.It "print	<printable characters>
.It "punct	<punctuation characters>
.It "rune	<valid characters>
.It "space	<space characters>
.It "special	<special characters>
.It "upper	<upper-case characters>
.It "xdigit	<hexadecimal characters>
.El
.Pp
.\" All classes may be used in
.\" .Ar string1 ,
.\" and in
.\" .Ar string2
.\" when both the
.\" .Fl d
.\" and
.\" .Fl s
.\" options are specified.
.\" Otherwise, only the classes ``upper'' and ``lower'' may be used in
.\" .Ar string2
.\" and then only when the corresponding class (``upper'' for ``lower''
.\" and vice-versa) is specified in the same relative position in
.\" .Ar string1 .
.\" .Pp
When
.Dq Li [:lower:]
appears in
.Ar string1
and
.Dq Li [:upper:]
appears in the same relative position in
.Ar string2 ,
it represents the character pairs from the
.Dv toupper
mapping in the
.Ev LC_CTYPE
category of the current locale.
When
.Dq Li [:upper:]
appears in
.Ar string1
and
.Dq Li [:lower:]
appears in the same relative position in
.Ar string2 ,
it represents the character pairs from the
.Dv tolower
mapping in the
.Ev LC_CTYPE
category of the current locale.
.Pp
With the exception of case conversion,
characters in the classes are in unspecified order.
.Pp
For specific information as to which
.Tn ASCII
characters are included
in these classes, see
.Xr ctype 3
and related manual pages.
.It [=equiv=]
Represents all characters belonging to the same equivalence class as
.Ar equiv ,
ordered by their encoded values.
.It [#*n]
Represents
.Ar n
repeated occurrences of the character represented by
.Ar # .
This
expression is only valid when it occurs in
.Ar string2 .
If
.Ar n
is omitted or is zero, it is interpreted as large enough to extend the
.Ar string2
sequence to the length of
.Ar string1 .
If
.Ar n
has a leading zero, it is interpreted as an octal value; otherwise,
it is interpreted as a decimal value.
.El
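.Pp
As an illustration of the
.Dq Li [#*n]
convention, the sketch below maps five input characters using one counted
repeat and one open-ended repeat.
The input and the variable name are assumptions made for the example.

```shell
# [x*2] supplies two copies of 'x'; [y*] then pads string2 with 'y'
# until it is as long as string1.
mapped=$(printf 'abcde\n' | tr 'abcde' '[x*2][y*]')
printf '%s\n' "$mapped"
```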
.Sh ENVIRONMENT
The
.Ev LANG , LC_ALL , LC_CTYPE
and
.Ev LC_COLLATE
environment variables affect the execution of
.Nm
as described in
.Xr environ 7 .
.Sh EXIT STATUS
.Ex -std
.Sh EXAMPLES
The following examples are shown as given to the shell:
.Pp
Create a list of the words in file1, one per line, where a word is taken to
be a maximal string of letters.
.Pp
.D1 Li "tr -cs \*q[:alpha:]\*q \*q\en\*q < file1"
.Pp
Translate the contents of file1 to upper-case.
.Pp
.D1 Li "tr \*q[:lower:]\*q \*q[:upper:]\*q < file1"
.Pp
(This should be preferred over the traditional
.Ux
idiom of
.Dq Li "tr a-z A-Z" ,
since it works correctly in all locales.)
.Pp
Strip out non-printable characters from file1.
.Pp
.D1 Li "tr -cd \*q[:print:]\*q < file1"
.Pp
Remove diacritical marks from all accented variants of the letter
.Ql e :
.Pp
.Dl "tr \*q[=e=]\*q \*qe\*q"
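.Pp
To see the word-listing example at work without a file, the sketch below
feeds it a short literal line instead of file1.
The sample sentence and the variable name are illustrative assumptions.

```shell
# Every non-letter becomes a newline, and each run of newlines is
# squeezed to one, leaving one word per line.
words=$(printf 'Hello, world! Hello.\n' | tr -cs '[:alpha:]' '\n')
printf '%s\n' "$words"
```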
.Sh COMPATIBILITY
Previous
.Fx
implementations of
.Nm
did not order characters in range expressions according to the current
locale's collation order, making it possible to convert unaccented Latin
characters (esp.\& as found in English text) from upper to lower case using
the traditional
.Ux
idiom of
.Dq Li "tr A-Z a-z" .
Since
.Nm
now obeys the locale's collation order, this idiom may not produce
correct results when there is not a 1:1 mapping between lower and
upper case, or when the order of characters within the two cases differs.
As noted in the
.Sx EXAMPLES
section above, the character class expressions
.Dq Li [:lower:]
and
.Dq Li [:upper:]
should be used instead of explicit character ranges like
.Dq Li a-z
and
.Dq Li A-Z .
.Pp
System V has historically implemented character ranges using the syntax
.Dq Li [c-c]
instead of the
.Dq Li c-c
used by historic
.Bx
implementations and
standardized by POSIX.
System V shell scripts should work under this implementation as long as
the range is intended to map onto another range, i.e., the command
.Dq Li "tr [a-z] [A-Z]"
will work as it will map the
.Ql \&[
character in
.Ar string1
to the
.Ql \&[
character in
.Ar string2 .
However, if the shell script is deleting or squeezing characters as in
the command
.Dq Li "tr -d [a-z]" ,
the characters
.Ql \&[
and
.Ql \&]
will be
included in the deletion or compression list, which would not have happened
under a historic System V implementation.
Additionally, any scripts that depended on the sequence
.Dq Li a-z
to
represent the three characters
.Ql a ,
.Ql \-
and
.Ql z
will have to be
rewritten as
.Dq Li a\e-z .
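.Pp
Both pitfalls can be sketched with literal input.
The sample strings and variable names are hypothetical, and
.Ev LC_ALL
is set to
.Dq C
only so that the range in the illustration covers the ASCII lower-case letters.

```shell
# The brackets are ordinary characters here, so '[' and ']' are deleted
# together with the letters a through z; only the hyphen survives.
sysv=$(printf '[abc]-xyz\n' | LC_ALL=C tr -d '[a-z]')

# Escaping the hyphen names the three literal characters 'a', '-' and 'z'.
literal=$(printf 'a-z b\n' | LC_ALL=C tr -d 'a\-z')

printf '%s\n%s\n' "$sysv" "$literal"
```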
.Pp
The
.Nm
utility has historically not permitted the manipulation of NUL bytes in
its input and, additionally, stripped NULs from its input stream.
This implementation has removed this behavior as a bug.
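.Pp
A minimal sketch of NUL handling, assuming two illustrative NUL-terminated
records on standard input and a hypothetical variable name:

```shell
# Translate each NUL byte to a newline so that NUL-separated records
# appear one per line.
records=$(printf 'a\000b\000' | tr '\000' '\n')
printf '%s\n' "$records"
```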
.Pp
The
.Nm
utility has historically been extremely forgiving of syntax errors:
for example, the
.Fl c
and
.Fl s
options were ignored unless two strings were specified.
This implementation will not permit illegal syntax.
.Sh STANDARDS
The
.Nm
utility conforms to
.St -p1003.1-2001 .
The
.Dq ideogram ,
.Dq phonogram ,
.Dq rune ,
and
.Dq special
character classes are extensions.
.Pp
It should be noted that the feature wherein the last character of
.Ar string2
is duplicated if
.Ar string2
has fewer characters than
.Ar string1
is permitted by POSIX but is not required.
Shell scripts attempting to be portable to other POSIX systems should use
the
.Dq Li [#*]
convention instead of relying on this behavior.
The
.Fl u
option is an extension to the
.St -p1003.1-2001
standard.
423standard.
424