.\" Copyright (c) 1991, 1993
.\" The Regents of the University of California. All rights reserved.
.\"
.\" This code is derived from software contributed to Berkeley by
.\" the Institute of Electrical and Electronics Engineers, Inc.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\" 4. Neither the name of the University nor the names of its contributors
.\" may be used to endorse or promote products derived from this software
.\" without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" @(#)tr.1 8.1 (Berkeley) 6/6/93
.\" $FreeBSD$
.\"
.Dd October 13, 2006
.Dt TR 1
.Os
.Sh NAME
.Nm tr
.Nd translate characters
.Sh SYNOPSIS
.Nm
.Op Fl Ccsu
.Ar string1 string2
.Nm
.Op Fl Ccu
.Fl d
.Ar string1
.Nm
.Op Fl Ccu
.Fl s
.Ar string1
.Nm
.Op Fl Ccu
.Fl ds
.Ar string1 string2
.Sh DESCRIPTION
The
.Nm
utility copies the standard input to the standard output with substitution
or deletion of selected characters.
.Pp
The following options are available:
.Bl -tag -width Ds
.It Fl C
Complement the set of characters in
.Ar string1 ;
that is,
.Dq Fl C Li ab
includes every character except for
.Ql a
and
.Ql b .
.It Fl c
Same as
.Fl C ,
but complement the set of values in
.Ar string1 .
.It Fl d
Delete characters in
.Ar string1
from the input.
.It Fl s
Squeeze each run of repeated occurrences of a character listed in the last
operand (either
.Ar string1
or
.Ar string2 )
in the input into a single instance of that character.
This occurs after all deletion and translation are completed.
.It Fl u
Guarantee that any output is unbuffered.
.El
.Pp
In the first synopsis form, the characters in
.Ar string1
are translated into the characters in
.Ar string2 ,
where the first character in
.Ar string1
is translated into the first character in
.Ar string2 ,
and so on.
If
.Ar string1
is longer than
.Ar string2 ,
the last character found in
.Ar string2
is duplicated until
.Ar string1
is exhausted.
.Pp
In the second synopsis form, the characters in
.Ar string1
are deleted from the input.
.Pp
In the third synopsis form, the characters in
.Ar string1
are compressed as described for the
.Fl s
option.
.Pp
In the fourth synopsis form, the characters in
.Ar string1
are deleted from the input, and the characters in
.Ar string2
are compressed as described for the
.Fl s
option.
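.Pp
All four synopsis forms can be modeled with a short C sketch.
It assumes both operands have already been expanded to plain byte
strings (no ranges, classes, or escapes) and that the C locale is in
effect.
The helper
.Fn tr_sketch
is hypothetical and is not part of
.Nm
or any library; passing NULL for an operand skips that step.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper modeling the core of tr(1) for byte strings whose
 * operands are already expanded (no ranges, classes or escapes) in the
 * C locale.  Pass NULL for any step that is not wanted:
 *   from/to  - translation (first synopsis form)
 *   del      - characters to delete (-d)
 *   squeeze  - characters to squeeze after translation (-s)
 * Writes the result to out, which must have room for strlen(in) + 1
 * bytes, and returns the number of bytes written before the NUL.
 */
size_t
tr_sketch(const char *in, char *out, const char *from, const char *to,
    const char *del, const char *squeeze)
{
	unsigned char map[256];
	size_t i, n = 0, flen, tlen;
	int last = -1;		/* last byte written; -1 means none yet */

	assert(in != NULL && out != NULL);
	for (i = 0; i < 256; i++)
		map[i] = (unsigned char)i;
	if (from != NULL && to != NULL && (tlen = strlen(to)) > 0) {
		flen = strlen(from);
		for (i = 0; i < flen; i++) {
			/* When string1 is longer, reuse the last byte of string2. */
			size_t j = i < tlen ? i : tlen - 1;
			map[(unsigned char)from[i]] = (unsigned char)to[j];
		}
	}
	for (; *in != 0; in++) {
		unsigned char c = (unsigned char)*in;

		if (del != NULL && strchr(del, c) != NULL)
			continue;	/* -d: drop before translating */
		c = map[c];
		if (squeeze != NULL && c == last && strchr(squeeze, c) != NULL)
			continue;	/* -s: collapse a run to one byte */
		out[n++] = (char)c;
		last = c;
	}
	out[n] = 0;
	return n;
}
```
.Ed
.Pp
For instance, with
.Ar string1
set to
.Ql lo
and
.Ar string2
set to
.Ql L ,
the input
.Ql hello
becomes
.Ql heLLL ,
because the last character of
.Ar string2
is reused for
.Ql o .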
.Pp
The following conventions can be used in
.Ar string1
and
.Ar string2
to specify sets of characters:
.Bl -tag -width [:equiv:]
.It character
Any character not described by one of the following conventions
represents itself.
.It \eoctal
A backslash followed by 1, 2, or 3 octal digits represents a character
with that encoded value.
To follow an octal sequence with a digit as a character, left zero-pad
the octal sequence to the full 3 octal digits.
.It \echaracter
A backslash followed by certain special characters maps to special
values.
.Bl -column "\ea"
.It "\ea" Ta "<alert character>"
.It "\eb" Ta "<backspace>"
.It "\ef" Ta "<form-feed>"
.It "\en" Ta "<newline>"
.It "\er" Ta "<carriage return>"
.It "\et" Ta "<tab>"
.It "\ev" Ta "<vertical tab>"
.El
.Pp
A backslash followed by any other character maps to that character.
.It c-c
When neither endpoint is an octal sequence,
represents the range of characters between the range endpoints, inclusive,
in ascending order
as defined by the collation sequence.
If either or both of the range endpoints are octal sequences, it
represents the range of specific coded values between the
range endpoints, inclusive.
.Pp
.Bf Em
See the
.Sx COMPATIBILITY
section below for an important note on how the current
implementation interprets range expressions differently from
previous implementations.
.Ef
.It [:class:]
Represents all characters belonging to the defined character class.
Class names are:
.Bl -column "phonogram"
.It "alnum" Ta "<alphanumeric characters>"
.It "alpha" Ta "<alphabetic characters>"
.It "blank" Ta "<whitespace characters>"
.It "cntrl" Ta "<control characters>"
.It "digit" Ta "<numeric characters>"
.It "graph" Ta "<graphic characters>"
.It "ideogram" Ta "<ideographic characters>"
.It "lower" Ta "<lower-case alphabetic characters>"
.It "phonogram" Ta "<phonographic characters>"
.It "print" Ta "<printable characters>"
.It "punct" Ta "<punctuation characters>"
.It "rune" Ta "<valid characters>"
.It "space" Ta "<space characters>"
.It "special" Ta "<special characters>"
.It "upper" Ta "<upper-case characters>"
.It "xdigit" Ta "<hexadecimal characters>"
.El
.Pp
.\" All classes may be used in
.\" .Ar string1 ,
.\" and in
.\" .Ar string2
.\" when both the
.\" .Fl d
.\" and
.\" .Fl s
.\" options are specified.
.\" Otherwise, only the classes ``upper'' and ``lower'' may be used in
.\" .Ar string2
.\" and then only when the corresponding class (``upper'' for ``lower''
.\" and vice-versa) is specified in the same relative position in
.\" .Ar string1 .
.\" .Pp
When
.Dq Li [:lower:]
appears in
.Ar string1
and
.Dq Li [:upper:]
appears in the same relative position in
.Ar string2 ,
it represents the character pairs from the
.Dv toupper
mapping in the
.Ev LC_CTYPE
category of the current locale.
When
.Dq Li [:upper:]
appears in
.Ar string1
and
.Dq Li [:lower:]
appears in the same relative position in
.Ar string2 ,
it represents the character pairs from the
.Dv tolower
mapping in the
.Ev LC_CTYPE
category of the current locale.
.Pp
With the exception of case conversion,
characters in the classes are in unspecified order.
.Pp
For specific information as to which
.Tn ASCII
characters are included
in these classes, see
.Xr ctype 3
and related manual pages.
.It [=equiv=]
Represents all characters belonging to the same equivalence class as
.Ar equiv ,
ordered by their encoded values.
.It [#*n]
Represents
.Ar n
repeated occurrences of the character represented by
.Ar # .
This
expression is only valid when it occurs in
.Ar string2 .
If
.Ar n
is omitted or is zero, it is interpreted as large enough to extend
the
.Ar string2
sequence to the length of
.Ar string1 .
If
.Ar n
has a leading zero, it is interpreted as an octal value; otherwise,
it is interpreted as a decimal value.
.El
.Sh ENVIRONMENT
The
.Ev LANG , LC_ALL , LC_CTYPE
and
.Ev LC_COLLATE
environment variables affect the execution of
.Nm
as described in
.Xr environ 7 .
.Sh EXIT STATUS
.Ex -std
.Sh EXAMPLES
The following examples are shown as given to the shell:
.Pp
Create a list of the words in file1, one per line, where a word is taken to
be a maximal string of letters.
.Pp
.D1 Li "tr -cs \*q[:alpha:]\*q \*q\en\*q < file1"
.Pp
Translate the contents of file1 to upper-case.
.Pp
.D1 Li "tr \*q[:lower:]\*q \*q[:upper:]\*q < file1"
.Pp
(This should be preferred over the traditional
.Ux
idiom of
.Dq Li "tr a-z A-Z" ,
since it works correctly in all locales.)
.Pp
Strip out non-printable characters from file1.
.Pp
.D1 Li "tr -cd \*q[:print:]\*q < file1"
.Pp
Remove diacritical marks from all accented variants of the letter
.Ql e :
.Pp
.Dl "tr \*q[=e=]\*q \*qe\*q"
.Sh COMPATIBILITY
Previous
.Fx
implementations of
.Nm
did not order characters in range expressions according to the current
locale's collation order, making it possible to convert unaccented Latin
characters (esp.\& as found in English text) from upper to lower case using
the traditional
.Ux
idiom of
.Dq Li "tr A-Z a-z" .
Since
.Nm
now obeys the locale's collation order, this idiom may not produce
correct results when there is not a 1:1 mapping between lower and
upper case, or when the order of characters within the two cases differs.
As noted in the
.Sx EXAMPLES
section above, the character class expressions
.Dq Li [:lower:]
and
.Dq Li [:upper:]
should be used instead of explicit character ranges like
.Dq Li a-z
and
.Dq Li A-Z .
.Pp
System V has historically implemented character ranges using the syntax
.Dq Li [c-c]
instead of the
.Dq Li c-c
used by historic
.Bx
implementations and
standardized by POSIX.
System V shell scripts should work under this implementation as long as
the range is intended to map into another range; for example, the command
.Dq Li "tr [a-z] [A-Z]"
works, since it maps the
.Ql \&[
character in
.Ar string1
to the
.Ql \&[
character in
.Ar string2 .
However, if the shell script is deleting or squeezing characters as in
the command
.Dq Li "tr -d [a-z]" ,
the characters
.Ql \&[
and
.Ql \&]
will be
included in the deletion or compression list, which would not have happened
under a historic System V implementation.
Additionally, any scripts that depended on the sequence
.Dq Li a-z
to
represent the three characters
.Ql a ,
.Ql \-
and
.Ql z
will have to be
rewritten as
.Dq Li a\e-z .
.Pp
The
.Nm
utility has historically not permitted the manipulation of NUL bytes in
its input and, additionally, stripped NULs from its input stream.
This implementation treats that behavior as a bug and has removed it.
.Pp
The
.Nm
utility has historically been extremely forgiving of syntax errors;
for example, the
.Fl c
and
.Fl s
options were ignored unless two strings were specified.
This implementation will not permit illegal syntax.
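.Pp
The System V bracket behavior described above follows from treating
.Ql \&[
and
.Ql \&]
as ordinary characters.
The following C sketch expands an operand that contains only literal
characters and
.Dq Li c-c
ranges, assuming the C locale so that ranges follow byte values.
The helper
.Fn expand_ranges
is hypothetical; escapes and classes are not handled, and a descending
range is simply copied literally.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: expand a tr operand made only of literal bytes
 * and c-c ranges, assuming the C locale (ranges follow byte values).
 * Brackets have no special meaning here, so "[a-z]" expands to '[',
 * 'a' through 'z', then ']'.  At most cap bytes are written to out;
 * the return value is the number of bytes written.
 */
size_t
expand_ranges(const char *spec, unsigned char *out, size_t cap)
{
	const unsigned char *p = (const unsigned char *)spec;
	size_t n = 0;

	assert(spec != NULL && out != NULL);
	while (*p != 0) {
		if (p[1] == '-' && p[2] != 0 && p[0] <= p[2]) {
			/* c-c: every byte value from p[0] to p[2], inclusive. */
			for (unsigned int c = p[0]; c <= p[2] && n < cap; c++)
				out[n++] = (unsigned char)c;
			p += 3;
		} else {
			/* Any other byte, '[' and ']' included, stands for itself. */
			if (n < cap)
				out[n++] = *p;
			p++;
		}
	}
	return n;
}
```
.Ed
.Pp
Expanding both operands of
.Dq Li "tr [a-z] [A-Z]"
this way yields 28 bytes each, with the brackets in the first and last
positions, so each bracket maps to itself; with
.Fl d ,
the same expansion puts both brackets in the deletion list.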
.Sh STANDARDS
The
.Nm
utility conforms to
.St -p1003.1-2001 .
The
.Dq ideogram ,
.Dq phonogram ,
.Dq rune ,
and
.Dq special
character classes are extensions.
.Pp
Duplicating the last character of
.Ar string2
when
.Ar string2
has fewer characters than
.Ar string1
is permitted by POSIX but is not required.
Shell scripts attempting to be portable to other POSIX systems should use
the
.Dq Li [#*]
convention instead of relying on this behavior.
The
.Fl u
option is an extension to the
.St -p1003.1-2001
standard.
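.Pp
As an illustration of the
.Dq Li [#*n]
convention recommended above, the following C sketch reads the repeat
count
.Ar n
(octal with a leading zero, decimal otherwise, and a fill request when
omitted or zero) and appends the repeated character.
Both
.Fn repeat_count
and
.Fn append_repeat
are hypothetical helpers, and the sketch assumes the
.Dq Li [#*]
expression is the last element of
.Ar string2 ,
so a fill only has to reach the length of
.Ar string1 .
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse the n of a [#*n] expression.  An empty or
 * zero count returns 0, meaning "extend string2 to the length of
 * string1".  A leading zero selects octal, otherwise decimal.  Returns
 * -1 if n is not a valid count.
 */
long
repeat_count(const char *n)
{
	char *end;
	long v;

	assert(n != NULL);
	if (*n == 0)
		return 0;
	if (*n < '0' || *n > '9')
		return -1;	/* reject signs and spaces strtol would accept */
	v = strtol(n, &end, *n == '0' ? 8 : 10);
	return *end == 0 ? v : -1;
}

/*
 * Hypothetical helper: append the expansion of [c*count] to out at
 * offset pos and return the new offset.  A count of 0 pads up to len1,
 * the expanded length of string1 (assumes [c*] ends string2).
 */
size_t
append_repeat(char *out, size_t pos, char c, long count, size_t len1)
{
	size_t want;

	assert(out != NULL && count >= 0);
	want = count > 0 ? (size_t)count : (len1 > pos ? len1 - pos : 0);
	while (want-- > 0)
		out[pos++] = c;
	return pos;
}
```
.Ed
.Pp
In this model,
.Ql x[y*]
as
.Ar string2
against a four-character
.Ar string1
expands to
.Ql xyyy ,
giving the padding described above without relying on the
implementation-specific duplication rule.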