.\" Copyright (c) 1991, 1993
.\"	The Regents of the University of California.  All rights reserved.
.\"
.\" This code is derived from software contributed to Berkeley by
.\" the Institute of Electrical and Electronics Engineers, Inc.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. Neither the name of the University nor the names of its contributors
.\"    may be used to endorse or promote products derived from this software
.\"    without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\"     @(#)tr.1	8.1 (Berkeley) 6/6/93
.\" $FreeBSD$
.\"
.Dd October 13, 2006
.Dt TR 1
.Os
.Sh NAME
.Nm tr
.Nd translate characters
.Sh SYNOPSIS
.Nm
.Op Fl Ccsu
.Ar string1 string2
.Nm
.Op Fl Ccu
.Fl d
.Ar string1
.Nm
.Op Fl Ccu
.Fl s
.Ar string1
.Nm
.Op Fl Ccu
.Fl ds
.Ar string1 string2
.Sh DESCRIPTION
The
.Nm
utility copies the standard input to the standard output with substitution
or deletion of selected characters.
.Pp
The following options are available:
.Bl -tag -width Ds
.It Fl C
Complement the set of characters in
.Ar string1 ,
that is
.Dq Fl C Li ab
includes every character except for
.Ql a
and
.Ql b .
.It Fl c
Same as
.Fl C
but complement the set of values in
.Ar string1 .
.It Fl d
Delete characters in
.Ar string1
from the input.
.It Fl s
Squeeze multiple occurrences of the characters listed in the last
operand (either
.Ar string1
or
.Ar string2 )
in the input into a single instance of the character.
This occurs after all deletion and translation is completed.
.It Fl u
Guarantee that any output is unbuffered.
.El
.Pp
In the first synopsis form, the characters in
.Ar string1
are translated into the characters in
.Ar string2
where the first character in
.Ar string1
is translated into the first character in
.Ar string2
and so on.
If
.Ar string1
is longer than
.Ar string2 ,
the last character found in
.Ar string2
is duplicated until
.Ar string1
is exhausted.
.Pp
In the second synopsis form, the characters in
.Ar string1
are deleted from the input.
.Pp
In the third synopsis form, the characters in
.Ar string1
are compressed as described for the
.Fl s
option.
.Pp
In the fourth synopsis form, the characters in
.Ar string1
are deleted from the input, and the characters in
.Ar string2
are compressed as described for the
.Fl s
option.
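.Pp
As an illustrative sketch of the four synopsis forms (the sample input
strings are arbitrary and were chosen only for demonstration):
.Bd -literal -offset indent
# Form 1: translate; string2 is shorter, so its last character is reused.
echo abcd | tr abcd xy          # prints: xyyy
# Form 2: delete every 'l' and 'o'.
echo 'hello world' | tr -d lo   # prints: he wrd
# Form 3: squeeze runs of 'a' and 'b' into single characters.
echo aabbccdd | tr -s ab        # prints: abccdd
# Form 4: delete 'a', then squeeze runs of 'x'.
echo aaxxbbyy | tr -ds a x      # prints: xbbyy
.Ed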
.Pp
The following conventions can be used in
.Ar string1
and
.Ar string2
to specify sets of characters:
.Bl -tag -width [:equiv:]
.It character
Any character not described by one of the following conventions
represents itself.
.It \eoctal
A backslash followed by 1, 2 or 3 octal digits represents a character
with that encoded value.
To follow an octal sequence with a digit as a character, left zero-pad
the octal sequence to the full 3 octal digits.
.It \echaracter
A backslash followed by certain special characters maps to special
values.
.Bl -column "\ea"
.It "\ea <alert character>"
.It "\eb <backspace>"
.It "\ef <form-feed>"
.It "\en <newline>"
.It "\er <carriage return>"
.It "\et <tab>"
.It "\ev <vertical tab>"
.El
.Pp
A backslash followed by any other character maps to that character.
.It c-c
For non-octal range endpoints, this
represents the range of characters between the range endpoints, inclusive,
in ascending order,
as defined by the collation sequence.
If either or both of the range endpoints are octal sequences, it
represents the range of specific coded values between the
range endpoints, inclusive.
.Pp
.Bf Em
See the
.Sx COMPATIBILITY
section below for an important note regarding
how the current
implementation interprets range expressions differently from
previous implementations.
.Ef
.It [:class:]
Represents all characters belonging to the defined character class.
Class names are:
.Bl -column "phonogram"
.It "alnum <alphanumeric characters>"
.It "alpha <alphabetic characters>"
.It "blank <whitespace characters>"
.It "cntrl <control characters>"
.It "digit <numeric characters>"
.It "graph <graphic characters>"
.It "ideogram <ideographic characters>"
.It "lower <lower-case alphabetic characters>"
.It "phonogram <phonographic characters>"
.It "print <printable characters>"
.It "punct <punctuation characters>"
.It "rune <valid characters>"
.It "space <space characters>"
.It "special <special characters>"
.It "upper <upper-case characters>"
.It "xdigit <hexadecimal characters>"
.El
.Pp
.\" All classes may be used in
.\" .Ar string1 ,
.\" and in
.\" .Ar string2
.\" when both the
.\" .Fl d
.\" and
.\" .Fl s
.\" options are specified.
.\" Otherwise, only the classes ``upper'' and ``lower'' may be used in
.\" .Ar string2
.\" and then only when the corresponding class (``upper'' for ``lower''
.\" and vice-versa) is specified in the same relative position in
.\" .Ar string1 .
.\" .Pp
When
.Dq Li [:lower:]
appears in
.Ar string1
and
.Dq Li [:upper:]
appears in the same relative position in
.Ar string2 ,
it represents the character pairs from the
.Dv toupper
mapping in the
.Ev LC_CTYPE
category of the current locale.
When
.Dq Li [:upper:]
appears in
.Ar string1
and
.Dq Li [:lower:]
appears in the same relative position in
.Ar string2 ,
it represents the character pairs from the
.Dv tolower
mapping in the
.Ev LC_CTYPE
category of the current locale.
.Pp
With the exception of case conversion,
characters in the classes are in unspecified order.
.Pp
For specific information as to which
.Tn ASCII
characters are included
in these classes, see
.Xr ctype 3
and related manual pages.
.It [=equiv=]
Represents all characters belonging to the same equivalence class as
.Ar equiv ,
ordered by their encoded values.
.It [#*n]
Represents
.Ar n
repeated occurrences of the character represented by
.Ar # .
This
expression is only valid when it occurs in
.Ar string2 .
If
.Ar n
is omitted or is zero, it is interpreted as large enough to extend the
.Ar string2
sequence to the length of
.Ar string1 .
If
.Ar n
has a leading zero, it is interpreted as an octal value; otherwise,
it is interpreted as a decimal value.
.El
.Sh ENVIRONMENT
The
.Ev LANG , LC_ALL , LC_CTYPE
and
.Ev LC_COLLATE
environment variables affect the execution of
.Nm
as described in
.Xr environ 7 .
.Sh EXIT STATUS
.Ex -std
.Sh EXAMPLES
The following examples are shown as given to the shell:
.Pp
Create a list of the words in file1, one per line, where a word is taken to
be a maximal string of letters.
.Pp
.D1 Li "tr -cs \*q[:alpha:]\*q \*q\en\*q < file1"
.Pp
Translate the contents of file1 to upper-case.
.Pp
.D1 Li "tr \*q[:lower:]\*q \*q[:upper:]\*q < file1"
.Pp
(This should be preferred over the traditional
.Ux
idiom of
.Dq Li "tr a-z A-Z" ,
since it works correctly in all locales.)
.Pp
Strip out non-printable characters from file1.
.Pp
.D1 Li "tr -cd \*q[:print:]\*q < file1"
.Pp
Remove diacritical marks from all accented variants of the letter
.Ql e :
.Pp
.Dl "tr \*q[=e=]\*q \*qe\*q"
.Sh COMPATIBILITY
Previous
.Fx
implementations of
.Nm
did not order characters in range expressions according to the current
locale's collation order, making it possible to convert unaccented Latin
characters (esp.\& as found in English text) from upper to lower case using
the traditional
.Ux
idiom of
.Dq Li "tr A-Z a-z" .
Since
.Nm
now obeys the locale's collation order, this idiom may not produce
correct results when there is not a 1:1 mapping between lower and
upper case, or when the order of characters within the two cases differs.
As noted in the
.Sx EXAMPLES
section above, the character class expressions
.Dq Li [:lower:]
and
.Dq Li [:upper:]
should be used instead of explicit character ranges like
.Dq Li a-z
and
.Dq Li A-Z .
.Pp
The
.Dq Li [=equiv=]
expression and collation for ranges
are implemented for single byte locales only.
.Pp
System V has historically implemented character ranges using the syntax
.Dq Li [c-c]
instead of the
.Dq Li c-c
used by historic
.Bx
implementations and
standardized by POSIX.
System V shell scripts should work under this implementation as long as
the range is intended to map to another range, i.e., the command
.Dq Li "tr [a-z] [A-Z]"
will work as it will map the
.Ql \&[
character in
.Ar string1
to the
.Ql \&[
character in
.Ar string2 .
However, if the shell script is deleting or squeezing characters as in
the command
.Dq Li "tr -d [a-z]" ,
the characters
.Ql \&[
and
.Ql \&]
will be
included in the deletion or compression list, which would not have happened
under a historic System V implementation.
Additionally, any scripts that depended on the sequence
.Dq Li a-z
to
represent the three characters
.Ql a ,
.Ql \-
and
.Ql z
will have to be
rewritten as
.Dq Li a\e-z .
.Pp
The
.Nm
utility has historically not permitted the manipulation of NUL bytes in
its input and, additionally, stripped NULs from its input stream.
This implementation has removed this behavior as a bug.
.Pp
The
.Nm
utility has historically been extremely forgiving of syntax errors;
for example, the
.Fl c
and
.Fl s
options were ignored unless two strings were specified.
This implementation will not permit illegal syntax.
.Sh STANDARDS
The
.Nm
utility conforms to
.St -p1003.1-2001 .
The
.Dq ideogram ,
.Dq phonogram ,
.Dq rune ,
and
.Dq special
character classes are extensions.
.Pp
It should be noted that the feature wherein the last character of
.Ar string2
is duplicated if
.Ar string2
has fewer characters than
.Ar string1
is permitted by POSIX but is not required.
Shell scripts attempting to be portable to other POSIX systems should use
the
.Dq Li [#*]
convention instead of relying on this behavior.
The
.Fl u
option is an extension to the
.St -p1003.1-2001
standard.
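.Pp
As a minimal sketch of the portable padding convention recommended above
(the input string is arbitrary and serves only as an illustration), the
repeat expression can spell out the padding that would otherwise be implicit:
.Bd -literal -offset indent
# Pad string2 explicitly instead of relying on last-character duplication.
echo abcd | tr abcd 'x[y*]'     # prints: xyyy
# An explicit decimal count works as well: three copies of 'y'.
echo abcd | tr abcd 'x[y*3]'    # prints: xyyy
.Ed
.Pp
The repeat expression is quoted so that the shell does not attempt
pathname expansion on the brackets.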