.\" Copyright (c) 1991, 1993
.\"	The Regents of the University of California.  All rights reserved.
.\"
.\" This code is derived from software contributed to Berkeley by
.\" the Institute of Electrical and Electronics Engineers, Inc.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 4. Neither the name of the University nor the names of its contributors
.\"    may be used to endorse or promote products derived from this software
.\"    without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" @(#)tr.1	8.1 (Berkeley) 6/6/93
.\" $FreeBSD$
.\"
.Dd October 13, 2006
.Dt TR 1
.Os
.Sh NAME
.Nm tr
.Nd translate characters
.Sh SYNOPSIS
.Nm
.Op Fl Ccsu
.Ar string1 string2
.Nm
.Op Fl Ccu
.Fl d
.Ar string1
.Nm
.Op Fl Ccu
.Fl s
.Ar string1
.Nm
.Op Fl Ccu
.Fl ds
.Ar string1 string2
.Sh DESCRIPTION
The
.Nm
utility copies the standard input to the standard output with substitution
or deletion of selected characters.
.Pp
The following options are available:
.Bl -tag -width Ds
.It Fl C
Complement the set of characters in
.Ar string1 ,
that is
.Dq Fl C Li ab
includes every character except for
.Ql a
and
.Ql b .
.It Fl c
Same as
.Fl C
but complement the set of values in
.Ar string1 .
.It Fl d
Delete characters in
.Ar string1
from the input.
.It Fl s
Squeeze multiple occurrences of the characters listed in the last
operand (either
.Ar string1
or
.Ar string2 )
in the input into a single instance of the character.
This occurs after all deletion and translation is completed.
.It Fl u
Guarantee that any output is unbuffered.
.El
.Pp
In the first synopsis form, the characters in
.Ar string1
are translated into the characters in
.Ar string2
where the first character in
.Ar string1
is translated into the first character in
.Ar string2
and so on.
If
.Ar string1
is longer than
.Ar string2 ,
the last character found in
.Ar string2
is duplicated until
.Ar string1
is exhausted.
.Pp
In the second synopsis form, the characters in
.Ar string1
are deleted from the input.
.Pp
In the third synopsis form, the characters in
.Ar string1
are compressed as described for the
.Fl s
option.
.Pp
In the fourth synopsis form, the characters in
.Ar string1
are deleted from the input, and the characters in
.Ar string2
are compressed as described for the
.Fl s
option.
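.Pp
As a rough illustration of the four synopsis forms just described, the
following shell commands run a fixed input through each one.
The input strings are made up for this sketch, and the outputs noted in the
comments assume the behavior described in this page, including the reuse of
the last character of the second string when it is the shorter one.

```shell
# Form 1: translate. string2 is shorter, so its last character is reused.
printf 'abcde\n' | tr 'abc' 'xy'          # prints xyyde

# Form 2: delete every character listed in string1.
printf 'hello world\n' | tr -d 'lo'       # prints he wrd

# Form 3: squeeze runs of the characters listed in string1.
printf 'aabbccdd\n' | tr -s 'ab'          # prints abccdd

# Form 4: delete string1 characters first, then squeeze string2 characters.
printf 'aaxbbxxcc\n' | tr -ds 'x' 'b'     # prints aabcc
```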
.Pp
The following conventions can be used in
.Ar string1
and
.Ar string2
to specify sets of characters:
.Bl -tag -width [:equiv:]
.It character
Any character not described by one of the following conventions
represents itself.
.It \eoctal
A backslash followed by 1, 2 or 3 octal digits represents a character
with that encoded value.
To follow an octal sequence with a digit as a character, left zero-pad
the octal sequence to the full 3 octal digits.
.It \echaracter
A backslash followed by certain special characters maps to special
values.
.Pp
.Bl -column "\ea"
.It \ea Ta <alert character>
.It \eb Ta <backspace>
.It \ef Ta <form-feed>
.It \en Ta <newline>
.It \er Ta <carriage return>
.It \et Ta <tab>
.It \ev Ta <vertical tab>
.El
.Pp
A backslash followed by any other character maps to that character.
.It c-c
For non-octal range endpoints,
represents the range of characters between the range endpoints, inclusive,
in ascending order,
as defined by the collation sequence.
If either or both of the range endpoints are octal sequences, it
represents the range of specific coded values between the
range endpoints, inclusive.
.Pp
.Bf Em
See the
.Sx COMPATIBILITY
section below for an important note regarding
how the current
implementation interprets range expressions differently from
previous implementations.
.Ef
.It [:class:]
Represents all characters belonging to the defined character class.
Class names are:
.Pp
.Bl -column "phonogram"
.It alnum Ta <alphanumeric characters>
.It alpha Ta <alphabetic characters>
.It blank Ta <whitespace characters>
.It cntrl Ta <control characters>
.It digit Ta <numeric characters>
.It graph Ta <graphic characters>
.It ideogram Ta <ideographic characters>
.It lower Ta <lower-case alphabetic characters>
.It phonogram Ta <phonographic characters>
.It print Ta <printable characters>
.It punct Ta <punctuation characters>
.It rune Ta <valid characters>
.It space Ta <space characters>
.It special Ta <special characters>
.It upper Ta <upper-case characters>
.It xdigit Ta <hexadecimal characters>
.El
.Pp
.\" All classes may be used in
.\" .Ar string1 ,
.\" and in
.\" .Ar string2
.\" when both the
.\" .Fl d
.\" and
.\" .Fl s
.\" options are specified.
.\" Otherwise, only the classes ``upper'' and ``lower'' may be used in
.\" .Ar string2
.\" and then only when the corresponding class (``upper'' for ``lower''
.\" and vice-versa) is specified in the same relative position in
.\" .Ar string1 .
.\" .Pp
When
.Dq Li [:lower:]
appears in
.Ar string1
and
.Dq Li [:upper:]
appears in the same relative position in
.Ar string2 ,
it represents the character pairs from the
.Dv toupper
mapping in the
.Ev LC_CTYPE
category of the current locale.
When
.Dq Li [:upper:]
appears in
.Ar string1
and
.Dq Li [:lower:]
appears in the same relative position in
.Ar string2 ,
it represents the character pairs from the
.Dv tolower
mapping in the
.Ev LC_CTYPE
category of the current locale.
.Pp
With the exception of case conversion,
characters in the classes are in unspecified order.
.Pp
For specific information as to which
.Tn ASCII
characters are included
in these classes, see
.Xr ctype 3
and related manual pages.
.It [=equiv=]
Represents all characters belonging to the same equivalence class as
.Ar equiv ,
ordered by their encoded values.
.It [#*n]
Represents
.Ar n
repeated occurrences of the character represented by
.Ar # .
This
expression is only valid when it occurs in
.Ar string2 .
If
.Ar n
is omitted or is zero, it is interpreted as large enough to extend the
.Ar string2
sequence to the length of
.Ar string1 .
If
.Ar n
has a leading zero, it is interpreted as an octal value; otherwise,
it is interpreted as a decimal value.
.El
.Sh ENVIRONMENT
The
.Ev LANG , LC_ALL , LC_CTYPE
and
.Ev LC_COLLATE
environment variables affect the execution of
.Nm
as described in
.Xr environ 7 .
.Sh EXIT STATUS
.Ex -std
.Sh EXAMPLES
The following examples are shown as given to the shell:
.Pp
Create a list of the words in file1, one per line, where a word is taken to
be a maximal string of letters.
.Pp
.D1 Li "tr -cs \*q[:alpha:]\*q \*q\en\*q < file1"
.Pp
Translate the contents of file1 to upper-case.
.Pp
.D1 Li "tr \*q[:lower:]\*q \*q[:upper:]\*q < file1"
.Pp
(This should be preferred over the traditional
.Ux
idiom of
.Dq Li "tr a-z A-Z" ,
since it works correctly in all locales.)
.Pp
Strip out non-printable characters from file1.
.Pp
.D1 Li "tr -cd \*q[:print:]\*q < file1"
.Pp
Remove diacritical marks from all accented variants of the letter
.Ql e :
.Pp
.Dl "tr \*q[=e=]\*q \*qe\*q"
.Sh COMPATIBILITY
Previous
.Fx
implementations of
.Nm
did not order characters in range expressions according to the current
locale's collation order, making it possible to convert unaccented Latin
characters (esp.\& as found in English text) from upper to lower case using
the traditional
.Ux
idiom of
.Dq Li "tr A-Z a-z" .
Since
.Nm
now obeys the locale's collation order, this idiom may not produce
correct results when there is not a 1:1 mapping between lower and
upper case, or when the order of characters within the two cases differs.
As noted in the
.Sx EXAMPLES
section above, the character class expressions
.Dq Li [:lower:]
and
.Dq Li [:upper:]
should be used instead of explicit character ranges like
.Dq Li a-z
and
.Dq Li A-Z .
.Pp
System V has historically implemented character ranges using the syntax
.Dq Li [c-c]
instead of the
.Dq Li c-c
used by historic
.Bx
implementations and
standardized by POSIX.
System V shell scripts should work under this implementation as long as
the range is intended to map into another range, i.e., the command
.Dq Li "tr [a-z] [A-Z]"
will work as it will map the
.Ql \&[
character in
.Ar string1
to the
.Ql \&[
character in
.Ar string2 .
However, if the shell script is deleting or squeezing characters as in
the command
.Dq Li "tr -d [a-z]" ,
the characters
.Ql \&[
and
.Ql \&]
will be
included in the deletion or compression list, which would not have happened
under a historic System V implementation.
Additionally, any scripts that depended on the sequence
.Dq Li a-z
to
represent the three characters
.Ql a ,
.Ql \-
and
.Ql z
will have to be
rewritten as
.Dq Li a\e-z .
.Pp
The
.Nm
utility has historically not permitted the manipulation of NUL bytes in
its input and, additionally, stripped NULs from its input stream.
This implementation has removed this behavior as a bug.
.Pp
The
.Nm
utility has historically been extremely forgiving of syntax errors;
for example, the
.Fl c
and
.Fl s
options were ignored unless two strings were specified.
This implementation will not permit illegal syntax.
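.Pp
The System V bracket behavior described above can be checked with a short
sketch.
The input string is invented for illustration, and
.Ev LC_ALL
is set to C here only as an assumption, so that the ranges follow coded
values rather than some other locale's collation order.

```shell
# Translating: '[' maps to '[' and ']' maps to ']', so brackets pass through.
printf 'a[b]c\n' | LC_ALL=C tr '[a-z]' '[A-Z]'     # prints A[B]C

# Deleting: '[' and ']' are ordinary members of string1 and are removed too.
printf 'a[b]c-XYZ\n' | LC_ALL=C tr -d '[a-z]'      # prints -XYZ
```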
.Sh STANDARDS
The
.Nm
utility conforms to
.St -p1003.1-2001 .
The
.Dq ideogram ,
.Dq phonogram ,
.Dq rune ,
and
.Dq special
character classes are extensions.
.Pp
The feature wherein the last character of
.Ar string2
is duplicated if
.Ar string2
has fewer characters than
.Ar string1
is permitted by POSIX but is not required.
Shell scripts attempting to be portable to other POSIX systems should use
the
.Dq Li [#*]
convention instead of relying on this behavior.
The
.Fl u
option is an extension to the
.St -p1003.1-2001
standard.