.\" Copyright (c) 1991, 1993
.\"	The Regents of the University of California.  All rights reserved.
.\"
.\" This code is derived from software contributed to Berkeley by
.\" the Institute of Electrical and Electronics Engineers, Inc.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. Neither the name of the University nor the names of its contributors
.\"    may be used to endorse or promote products derived from this software
.\"    without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.Dd October 13, 2006
.Dt TR 1
.Os
.Sh NAME
.Nm tr
.Nd translate characters
.Sh SYNOPSIS
.Nm
.Op Fl Ccsu
.Ar string1 string2
.Nm
.Op Fl Ccu
.Fl d
.Ar string1
.Nm
.Op Fl Ccu
.Fl s
.Ar string1
.Nm
.Op Fl Ccu
.Fl ds
.Ar string1 string2
.Sh DESCRIPTION
The
.Nm
utility copies the standard input to the standard output with substitution
or deletion of selected characters.
.Pp
The following options are available:
.Bl -tag -width Ds
.It Fl C
Complement the set of characters in
.Ar string1 ,
that is
.Dq Fl C Li ab
includes every character except for
.Ql a
and
.Ql b .
.It Fl c
Same as
.Fl C
but complement the set of values in
.Ar string1 .
.It Fl d
Delete characters in
.Ar string1
from the input.
.It Fl s
Squeeze multiple occurrences of the characters listed in the last
operand (either
.Ar string1
or
.Ar string2 )
in the input into a single instance of the character.
This occurs after all deletion and translation is completed.
.It Fl u
Guarantee that any output is unbuffered.
.El
.Pp
In the first synopsis form, the characters in
.Ar string1
are translated into the characters in
.Ar string2
where the first character in
.Ar string1
is translated into the first character in
.Ar string2
and so on.
If
.Ar string1
is longer than
.Ar string2 ,
the last character found in
.Ar string2
is duplicated until
.Ar string1
is exhausted.
.Pp
In the second synopsis form, the characters in
.Ar string1
are deleted from the input.
.Pp
In the third synopsis form, the characters in
.Ar string1
are compressed as described for the
.Fl s
option.
.Pp
In the fourth synopsis form, the characters in
.Ar string1
are deleted from the input, and the characters in
.Ar string2
are compressed as described for the
.Fl s
option.
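.Pp
As a minimal sketch of the four synopsis forms described above,
the following shell functions each feed a fixed string through
.Nm .
The function names and sample inputs are illustrative assumptions
and are not part of
.Nm
itself.
.Bd -literal -offset indent
```shell
# Form 1: translate. Each character of string1 maps to the character
# at the same position in string2 (e becomes i, l becomes p).
translate_demo() { echo hello | tr el ip; }

# Form 1 with a shorter string2. This implementation reuses the last
# character of string2; other systems are not required to do so.
padding_demo() { echo abcde | tr abcde xy; }

# Form 2: delete every l and o from the input.
delete_demo() { echo "hello world" | tr -d lo; }

# Form 3: squeeze each run of a or b down to a single character.
squeeze_demo() { echo aabbccdd | tr -s ab; }

# Form 4: delete every x, then squeeze each run of b.
delete_squeeze_demo() { echo aaxbbxcc | tr -ds x b; }

translate_demo
padding_demo
delete_demo
squeeze_demo
delete_squeeze_demo
```
.Ed
.Pp
The padding case is shown without an expected result because its
behavior may differ on other systems.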
.Pp
The following conventions can be used in
.Ar string1
and
.Ar string2
to specify sets of characters:
.Bl -tag -width [:equiv:]
.It character
Any character not described by one of the following conventions
represents itself.
.It \eoctal
A backslash followed by 1, 2 or 3 octal digits represents a character
with that encoded value.
To follow an octal sequence with a digit as a character, left zero-pad
the octal sequence to the full 3 octal digits.
.It \echaracter
A backslash followed by certain special characters maps to special
values.
.Bl -column "\ea"
.It "\ea" Ta "<alert character>"
.It "\eb" Ta "<backspace>"
.It "\ef" Ta "<form-feed>"
.It "\en" Ta "<newline>"
.It "\er" Ta "<carriage return>"
.It "\et" Ta "<tab>"
.It "\ev" Ta "<vertical tab>"
.El
.Pp
A backslash followed by any other character maps to that character.
.It c-c
For non-octal range endpoints,
represents the range of characters between the range endpoints, inclusive,
in ascending order,
as defined by the collation sequence.
If either or both of the range endpoints are octal sequences, it
represents the range of specific coded values between the
range endpoints, inclusive.
.Pp
.Bf Em
See the
.Sx COMPATIBILITY
section below for an important note on how the current
implementation interprets range expressions differently from
previous implementations.
.Ef
.It [:class:]
Represents all characters belonging to the defined character class.
Class names are:
.Bl -column "phonogram"
.It "alnum" Ta "<alphanumeric characters>"
.It "alpha" Ta "<alphabetic characters>"
.It "blank" Ta "<whitespace characters>"
.It "cntrl" Ta "<control characters>"
.It "digit" Ta "<numeric characters>"
.It "graph" Ta "<graphic characters>"
.It "ideogram" Ta "<ideographic characters>"
.It "lower" Ta "<lower-case alphabetic characters>"
.It "phonogram" Ta "<phonographic characters>"
.It "print" Ta "<printable characters>"
.It "punct" Ta "<punctuation characters>"
.It "rune" Ta "<valid characters>"
.It "space" Ta "<space characters>"
.It "special" Ta "<special characters>"
.It "upper" Ta "<upper-case characters>"
.It "xdigit" Ta "<hexadecimal characters>"
.El
.Pp
.\" All classes may be used in
.\" .Ar string1 ,
.\" and in
.\" .Ar string2
.\" when both the
.\" .Fl d
.\" and
.\" .Fl s
.\" options are specified.
.\" Otherwise, only the classes ``upper'' and ``lower'' may be used in
.\" .Ar string2
.\" and then only when the corresponding class (``upper'' for ``lower''
.\" and vice-versa) is specified in the same relative position in
.\" .Ar string1 .
.\" .Pp
When
.Dq Li [:lower:]
appears in
.Ar string1
and
.Dq Li [:upper:]
appears in the same relative position in
.Ar string2 ,
it represents the character pairs from the
.Dv toupper
mapping in the
.Ev LC_CTYPE
category of the current locale.
When
.Dq Li [:upper:]
appears in
.Ar string1
and
.Dq Li [:lower:]
appears in the same relative position in
.Ar string2 ,
it represents the character pairs from the
.Dv tolower
mapping in the
.Ev LC_CTYPE
category of the current locale.
.Pp
With the exception of case conversion,
characters in the classes are in unspecified order.
.Pp
For specific information as to which
.Tn ASCII
characters are included
in these classes, see
.Xr ctype 3
and related manual pages.
.It [=equiv=]
Represents all characters belonging to the same equivalence class as
.Ar equiv ,
ordered by their encoded values.
.It [#*n]
Represents
.Ar n
repeated occurrences of the character represented by
.Ar # .
This
expression is only valid when it occurs in
.Ar string2 .
If
.Ar n
is omitted or is zero, it is interpreted as large enough to extend the
.Ar string2
sequence to the length of
.Ar string1 .
If
.Ar n
has a leading zero, it is interpreted as an octal value; otherwise,
it is interpreted as a decimal value.
.El
.Sh ENVIRONMENT
The
.Ev LANG , LC_ALL , LC_CTYPE
and
.Ev LC_COLLATE
environment variables affect the execution of
.Nm
as described in
.Xr environ 7 .
.Sh EXIT STATUS
.Ex -std
.Sh EXAMPLES
The following examples are shown as given to the shell:
.Pp
Create a list of the words in file1, one per line, where a word is taken to
be a maximal string of letters.
.Pp
.D1 Li "tr -cs \*q[:alpha:]\*q \*q\en\*q < file1"
.Pp
Translate the contents of file1 to upper-case.
.Pp
.D1 Li "tr \*q[:lower:]\*q \*q[:upper:]\*q < file1"
.Pp
(This should be preferred over the traditional
.Ux
idiom of
.Dq Li "tr a-z A-Z" ,
since it works correctly in all locales.)
.Pp
Strip out non-printable characters from file1.
.Pp
.D1 Li "tr -cd \*q[:print:]\*q < file1"
.Pp
Remove diacritical marks from all accented variants of the letter
.Ql e :
.Pp
.Dl "tr \*q[=e=]\*q \*qe\*q"
.Sh COMPATIBILITY
Previous
.Fx
implementations of
.Nm
did not order characters in range expressions according to the current
locale's collation order, making it possible to convert unaccented Latin
characters (esp.\& as found in English text) from upper to lower case using
the traditional
.Ux
idiom of
.Dq Li "tr A-Z a-z" .
Since
.Nm
now obeys the locale's collation order, this idiom may not produce
correct results when there is not a 1:1 mapping between lower and
upper case, or when the order of characters within the two cases differs.
As noted in the
.Sx EXAMPLES
section above, the character class expressions
.Dq Li [:lower:]
and
.Dq Li [:upper:]
should be used instead of explicit character ranges like
.Dq Li a-z
and
.Dq Li A-Z .
.Pp
The
.Dq Li [=equiv=]
expression and collation for ranges
are implemented for single-byte locales only.
.Pp
System V has historically implemented character ranges using the syntax
.Dq Li [c-c]
instead of the
.Dq Li c-c
used by historic
.Bx
implementations and
standardized by POSIX.
System V shell scripts should work under this implementation as long as
the range is intended to map to another range, i.e., the command
.Dq Li "tr [a-z] [A-Z]"
will work because it maps the
.Ql \&[
character in
.Ar string1
to the
.Ql \&[
character in
.Ar string2 .
However, if the shell script is deleting or squeezing characters as in
the command
.Dq Li "tr -d [a-z]" ,
the characters
.Ql \&[
and
.Ql \&]
will be
included in the deletion or compression list, which would not have happened
under a historic System V implementation.
Additionally, any scripts that depended on the sequence
.Dq Li a-z
to
represent the three characters
.Ql a ,
.Ql \-
and
.Ql z
will have to be
rewritten as
.Dq Li a\e-z .
.Pp
The
.Nm
utility has historically not permitted the manipulation of NUL bytes in
its input and, additionally, stripped NULs from its input stream.
This implementation treats that behavior as a bug and has removed it.
.Pp
The
.Nm
utility has historically been extremely forgiving of syntax errors;
for example, the
.Fl c
and
.Fl s
options were ignored unless two strings were specified.
This implementation will not permit illegal syntax.
.Sh STANDARDS
The
.Nm
utility conforms to
.St -p1003.1-2001 .
The
.Dq ideogram ,
.Dq phonogram ,
.Dq rune ,
and
.Dq special
character classes are extensions.
.Pp
The feature wherein the last character of
.Ar string2
is duplicated if
.Ar string2
has fewer characters than
.Ar string1
is permitted by POSIX but is not required.
Shell scripts attempting to be portable to other POSIX systems should use
the
.Dq Li [#*]
convention instead of relying on this behavior.
The
.Fl u
option is an extension to the
.St -p1003.1-2001
standard.
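.Pp
The portability advice above can be sketched as follows.
This is a minimal illustration only; the function names and the sample
input are assumptions made for the example.
.Bd -literal -offset indent
```shell
# Implicit padding: string2 is shorter than string1, so the result
# depends on an optional behavior and may differ between systems.
implicit_pad() { echo abcde | tr abcde xy; }

# Portable padding: [y*] repeats y until string2 is as long as string1.
explicit_pad() { echo abcde | tr abcde "x[y*]"; }

# An explicit repeat count: [y*3] stands for exactly three y characters.
counted_pad() { echo abcd | tr abcd "x[y*3]"; }

implicit_pad
explicit_pad
counted_pad
```
.Ed
.Pp
The quotes around the second operand keep the shell from treating the
brackets and asterisk as a file name pattern.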