.\" $OpenBSD: awk.1,v 1.44 2015/09/14 20:06:58 schwarze Exp $
.\"
.\" Copyright (C) Lucent Technologies 1997
.\" All Rights Reserved
.\"
.\" Permission to use, copy, modify, and distribute this software and
.\" its documentation for any purpose and without fee is hereby
.\" granted, provided that the above copyright notice appear in all
.\" copies and that both that the copyright notice and this
.\" permission notice and warranty disclaimer appear in supporting
.\" documentation, and that the name Lucent Technologies or any of
.\" its entities not be used in advertising or publicity pertaining
.\" to distribution of the software without specific, written prior
.\" permission.
.\"
.\" LUCENT DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE,
.\" INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS.
.\" IN NO EVENT SHALL LUCENT OR ANY OF ITS ENTITIES BE LIABLE FOR ANY
.\" SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER
.\" IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION,
.\" ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF
.\" THIS SOFTWARE.
.Dd July 30, 2021
.Dt AWK 1
.Os
.Sh NAME
.Nm awk
.Nd pattern-directed scanning and processing language
.Sh SYNOPSIS
.Nm awk
.Op Fl safe
.Op Fl version
.Op Fl d Ns Op Ar n
.Op Fl F Ar fs
.Op Fl v Ar var Ns = Ns Ar value
.Op Ar prog | Fl f Ar progfile
.Ar
.Sh DESCRIPTION
.Nm
scans each input
.Ar file
for lines that match any of a set of patterns specified literally in
.Ar prog
or in one or more files specified as
.Fl f Ar progfile .
With each pattern there can be an associated action that will be performed
when a line of a
.Ar file
matches the pattern.
Each line is matched against the
pattern portion of every pattern-action statement;
the associated action is performed for each matched pattern.
The file name
.Sq -
means the standard input.
Any
.Ar file
of the form
.Ar var Ns = Ns Ar value
is treated as an assignment, not a filename,
and is executed at the time it would have been opened if it were a filename.
.Pp
The options are as follows:
.Bl -tag -width "-safe "
.It Fl d Ns Op Ar n
Debug mode.
Set debug level to
.Ar n ,
or 1 if
.Ar n
is not specified.
A value greater than 1 causes
.Nm
to dump core on fatal errors.
.It Fl F Ar fs
Define the input field separator to be the regular expression
.Ar fs .
.It Fl f Ar progfile
Read program code from the specified file
.Ar progfile
instead of from the command line.
.It Fl safe
Disable file output
.Pf ( Ic print No > ,
.Ic print No >> ) ,
process creation
.Po
.Ar cmd | Ic getline ,
.Ic print | ,
.Ic system
.Pc
and access to the environment
.Pf ( Va ENVIRON ;
see the section on variables below).
This is a first
.Pq and not very reliable
approximation to a
.Dq safe
version of
.Nm .
.It Fl version
Print the version number of
.Nm
to standard output and exit.
.It Fl v Ar var Ns = Ns Ar value
Assign
.Ar value
to variable
.Ar var
before
.Ar prog
is executed;
any number of
.Fl v
options may be present.
.El
.Pp
The input is normally made up of input lines
.Pq records
separated by newlines, or by the value of
.Va RS .
If
.Va RS
is null, then any number of blank lines are used as the record separator,
and newlines are used as field separators
(in addition to the value of
.Va FS ) .
This is convenient when working with multi-line records.
.Pp
An input line is normally made up of fields separated by whitespace,
or by the extended regular expression
.Va FS
as described below.
The fields are denoted
.Va $1 , $2 , ... ,
while
.Va $0
refers to the entire line.
If
.Va FS
is null, the input line is split into one field per character.
While both gawk and mawk have the same behavior, it is unspecified in the
.St -p1003.1-2008
standard.
If
.Va FS
is a single space, then leading and trailing blank and newline characters are
skipped.
Fields are delimited by one or more blank or newline characters.
A blank character is a space or a tab.
If
.Va FS
is a single character, other than space, fields are delimited by each single
occurrence of that character.
The
.Va FS
variable defaults to a single space.
.Pp
Normally, any number of blanks separate fields.
In order to set the field separator to a single blank, use the
.Fl F
option with a value of
.Sq [\ \&] .
If a field separator of
.Sq t
is specified,
.Nm
treats it as if
.Sq \et
had been specified and uses
.Aq TAB
as the field separator.
In order to use a literal
.Sq t
as the field separator, use the
.Fl F
option with a value of
.Sq [t] .
.Pp
A pattern-action statement has the form
.Pp
.D1 Ar pattern Ic \&{ Ar action Ic \&}
.Pp
A missing
.Ic \&{ Ar action Ic \&}
means print the line;
a missing pattern always matches.
Pattern-action statements are separated by newlines or semicolons.
.Pp
Newlines are permitted after a terminating statement or following a comma
.Pq Sq ,\& ,
an open brace
.Pq Sq { ,
a logical AND
.Pq Sq && ,
a logical OR
.Pq Sq || ,
after the
.Sq do
or
.Sq else
keywords,
or after the closing parenthesis of an
.Sq if ,
.Sq for ,
or
.Sq while
statement.
Additionally, a backslash
.Pq Sq \e
can be used to escape a newline between tokens.
.Pp
An action is a sequence of statements.
A statement can be one of the following:
.Pp
.Bl -tag -width Ds -offset indent -compact
.It Ic if Ar ( expression ) Ar statement Op Ic else Ar statement
.It Ic while Ar ( expression ) Ar statement
.It Ic for Ar ( expression ; expression ; expression ) statement
.It Ic for Ar ( var Ic in Ar array ) statement
.It Ic do Ar statement Ic while Ar ( expression )
.It Ic break
.It Ic continue
.It Xo Ic {
.Op Ar statement ...
.Ic }
.Xc
.It Xo Ar expression
.No # commonly
.Ar var No = Ar expression
.Xc
.It Xo Ic print
.Op Ar expression-list
.Op > Ns Ar expression
.Xc
.It Xo Ic printf Ar format
.Op Ar ... , expression-list
.Op > Ns Ar expression
.Xc
.It Ic return Op Ar expression
.It Xo Ic next
.No # skip remaining patterns on this input line
.Xc
.It Xo Ic nextfile
.No # skip rest of this file, open next, start at top
.Xc
.It Xo Ic delete
.Sm off
.Ar array Ic \&[ Ar expression Ic \&]
.Sm on
.No # delete an array element
.Xc
.It Xo Ic delete Ar array
.No # delete all elements of array
.Xc
.It Xo Ic exit
.Op Ar expression
.No # exit immediately; status is Ar expression
.Xc
.El
.Pp
Statements are terminated by
semicolons, newlines or right braces.
An empty
.Ar expression-list
stands for
.Ar $0 .
String constants are quoted
.Li \&"" ,
with the usual C escapes recognized within
(see
.Xr printf 1
for a complete list of these).
Expressions take on string or numeric values as appropriate,
and are built using the operators
.Ic + \- * / % ^
.Pq exponentiation ,
and concatenation
.Pq indicated by whitespace .
The operators
.Ic \&! ++ \-\- += \-= *= /= %= ^=
.Ic > >= < <= == != ?\&:
are also available in expressions.
Variables may be scalars, array elements
(denoted
.Li x[i] )
or fields.
Variables are initialized to the null string.
Array subscripts may be any string,
not necessarily numeric;
this allows for a form of associative memory.
Multiple subscripts such as
.Li [i,j,k]
are permitted; the constituents are concatenated,
separated by the value of
.Va SUBSEP
.Pq see the section on variables below .
.Pp
The
.Ic print
statement prints its arguments on the standard output
(or on a file if
.Pf > Ar file
or
.Pf >> Ar file
is present or on a pipe if
.Pf |\ \& Ar cmd
is present), separated by the current output field separator,
and terminated by the output record separator.
.Ar file
and
.Ar cmd
may be literal names or parenthesized expressions;
identical string values in different statements denote
the same open file.
The
.Ic printf
statement formats its expression list according to the format
(see
.Xr printf 1 ) .
.Pp
Patterns are arbitrary Boolean combinations
(with
.Ic "\&! || &&" )
of regular expressions and
relational expressions.
.Nm
supports extended regular expressions
.Pq EREs .
See
.Xr re_format 7
for more information on regular expressions.
Isolated regular expressions
in a pattern apply to the entire line.
Regular expressions may also occur in
relational expressions, using the operators
.Ic ~
and
.Ic !~ .
.Pf / Ar re Ns /
is a constant regular expression;
any string (constant or variable) may be used
as a regular expression, except in the position of an isolated regular expression
in a pattern.
.Pp
A pattern may consist of two patterns separated by a comma;
in this case, the action is performed for all lines
from an occurrence of the first pattern
through an occurrence of the second.
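.Pp
As an illustrative sketch of such a two-pattern range, the following program
prints every line from one matching
.Sq BEGIN_BLOCK
through the next one matching
.Sq END_BLOCK ,
inclusive, prefixed with its record number.
Both marker strings are hypothetical and chosen only for this example:
.Bd -literal -offset indent
# Range pattern: the action runs for each line from a match of the
# first pattern through the next match of the second, inclusive.
/BEGIN_BLOCK/, /END_BLOCK/ { print NR ": " $0 }
.Ed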
.Pp
A relational expression is one of the following:
.Pp
.Bl -tag -width Ds -offset indent -compact
.It Ar expression matchop regular-expression
.It Ar expression relop expression
.It Ar expression Ic in Ar array-name
.It Xo Ic \&( Ns
.Ar expr , expr , \&... Ns Ic \&) in
.Ar array-name
.Xc
.El
.Pp
where a
.Ar relop
is any of the six relational operators in C, and a
.Ar matchop
is either
.Ic ~
(matches)
or
.Ic !~
(does not match).
A conditional is an arithmetic expression,
a relational expression,
or a Boolean combination
of these.
.Pp
The special patterns
.Ic BEGIN
and
.Ic END
may be used to capture control before the first input line is read
and after the last.
.Ic BEGIN
and
.Ic END
do not combine with other patterns.
.Pp
Variable names with special meanings:
.Pp
.Bl -tag -width "FILENAME " -compact
.It Va ARGC
Argument count, assignable.
.It Va ARGV
Argument array, assignable;
non-null members are taken as filenames.
.It Va CONVFMT
Conversion format when converting numbers
(default
.Qq Li %.6g ) .
.It Va ENVIRON
Array of environment variables; subscripts are names.
.It Va FILENAME
The name of the current input file.
.It Va FNR
Ordinal number of the current record in the current file.
.It Va FS
Regular expression used to separate fields; also settable
by option
.Fl F Ar fs .
.It Va NF
Number of fields in the current record.
.Va $NF
can be used to obtain the value of the last field in the current record.
.It Va NR
Ordinal number of the current record.
.It Va OFMT
Output format for numbers (default
.Qq Li %.6g ) .
.It Va OFS
Output field separator (default blank).
.It Va ORS
Output record separator (default newline).
.It Va RLENGTH
The length of the string matched by the
.Fn match
function.
.It Va RS
Input record separator (default newline).
.It Va RSTART
The starting position of the string matched by the
.Fn match
function.
.It Va SUBSEP
Separates multiple subscripts (default 034).
.El
.Sh FUNCTIONS
The awk language has a variety of built-in functions:
arithmetic, string, input/output, general, and bit-operation.
.Pp
Functions may be defined (at the position of a pattern-action statement)
thusly:
.Pp
.Dl function foo(a, b, c) { ...; return x }
.Pp
Parameters are passed by value if scalar, and by reference if array name;
functions may be called recursively.
Parameters are local to the function; all other variables are global.
Thus local variables may be created by providing excess parameters in
the function definition.
.Ss Arithmetic Functions
.Bl -tag -width "atan2(y, x)"
.It Fn atan2 y x
Return the arctangent of
.Fa y Ns / Ns Fa x
in radians.
.It Fn cos x
Return the cosine of
.Fa x ,
where
.Fa x
is in radians.
.It Fn exp x
Return the exponential of
.Fa x .
.It Fn int x
Return
.Fa x
truncated to an integer value.
.It Fn log x
Return the natural logarithm of
.Fa x .
.It Fn rand
Return a random number,
.Fa n ,
such that
.Sm off
.Pf 0 \*(Le Fa n No \*(Lt 1 .
.Sm on
.It Fn sin x
Return the sine of
.Fa x ,
where
.Fa x
is in radians.
.It Fn sqrt x
Return the square root of
.Fa x .
.It Fn srand expr
Sets seed for
.Fn rand
to
.Fa expr
and returns the previous seed.
If
.Fa expr
is omitted, the time of day is used instead.
.El
.Ss String Functions
.Bl -tag -width "split(s, a, fs)"
.It Fn gsub r t s
The same as
.Fn sub
except that all occurrences of the regular expression are replaced.
.Fn gsub
returns the number of replacements.
.It Fn index s t
The position in
.Fa s
where the string
.Fa t
occurs, or 0 if it does not.
.It Fn length s
The length of
.Fa s
taken as a string,
or of
.Va $0
if no argument is given.
.It Fn match s r
The position in
.Fa s
where the regular expression
.Fa r
occurs, or 0 if it does not.
The variable
.Va RSTART
is set to the starting position of the matched string
.Pq which is the same as the returned value
or zero if no match is found.
The variable
.Va RLENGTH
is set to the length of the matched string,
or \-1 if no match is found.
.It Fn split s a fs
Splits the string
.Fa s
into array elements
.Va a[1] , a[2] , ... , a[n]
and returns
.Va n .
The separation is done with the regular expression
.Ar fs
or with the field separator
.Va FS
if
.Ar fs
is not given.
An empty string as field separator splits the string
into one array element per character.
.It Fn sprintf fmt expr ...
The string resulting from formatting
.Fa expr , ...
according to the
.Xr printf 1
format
.Fa fmt .
.It Fn sub r t s
Substitutes
.Fa t
for the first occurrence of the regular expression
.Fa r
in the string
.Fa s .
If
.Fa s
is not given,
.Va $0
is used.
An ampersand
.Pq Sq &
in
.Fa t
is replaced with the portion of
.Fa s
that matched the regular expression
.Fa r .
A literal ampersand can be specified by preceding it with two backslashes
.Pq Sq \e\e .
A literal backslash can be specified by preceding it with another backslash
.Pq Sq \e\e .
.Fn sub
returns the number of replacements.
.It Fn substr s m n
Return at most the
.Fa n Ns -character
substring of
.Fa s
that begins at position
.Fa m
counted from 1.
If
.Fa n
is omitted, or if
.Fa n
specifies more characters than are left in the string,
the length of the substring is limited by the length of
.Fa s .
.It Fn tolower str
Returns a copy of
.Fa str
with all upper-case characters translated to their
corresponding lower-case equivalents.
.It Fn toupper str
Returns a copy of
.Fa str
with all lower-case characters translated to their
corresponding upper-case equivalents.
.El
.Ss Input/Output and General Functions
.Bl -tag -width "getline [var] < file"
.It Fn close expr
Closes the file or pipe
.Fa expr .
.Fa expr
should match the string that was used to open the file or pipe.
.It Ar cmd | Ic getline Op Va var
Read a record of input from a stream piped from the output of
.Ar cmd .
If
.Va var
is omitted, the variables
.Va $0
and
.Va NF
are set.
Otherwise
.Va var
is set.
If the stream is not open, it is opened.
As long as the stream remains open, subsequent calls
will read subsequent records from the stream.
The stream remains open until explicitly closed with a call to
.Fn close .
.Ic getline
returns 1 for a successful input, 0 for end of file, and \-1 for an error.
.It Fn fflush [expr]
Flushes any buffered output for the file or pipe
.Fa expr ,
or all open files or pipes if
.Fa expr
is omitted.
.Fa expr
should match the string that was used to open the file or pipe.
.It Ic getline
Sets
.Va $0
to the next input record from the current input file.
This form of
.Ic getline
sets the variables
.Va NF ,
.Va NR ,
and
.Va FNR .
.Ic getline
returns 1 for a successful input, 0 for end of file, and \-1 for an error.
.It Ic getline Va var
Sets variable
.Va var
to the next input record from the current input file.
This form of
.Ic getline
sets the variables
.Va NR
and
.Va FNR .
.Ic getline
returns 1 for a successful input, 0 for end of file, and \-1 for an error.
.It Xo
.Ic getline Op Va var
.Pf \ \&< Ar file
.Xc
Reads the next record from
.Ar file .
If
.Va var
is omitted, the variables
.Va $0
and
.Va NF
are set.
Otherwise
.Va var
is set.
If
.Ar file
is not open, it is opened.
As long as the stream remains open, subsequent calls will read subsequent
records from
.Ar file .
.Ar file
remains open until explicitly closed with a call to
.Fn close .
.It Fn system cmd
Executes
.Fa cmd
and returns its exit status.
.El
.Ss Bit-Operation Functions
.Bl -tag -width "lshift(a, b)"
.It Fn compl x
Returns the bitwise complement of integer argument x.
.It Fn and v1 v2 ...
Performs a bitwise AND on all arguments provided, as integers.
There must be at least two values.
.It Fn or v1 v2 ...
Performs a bitwise OR on all arguments provided, as integers.
There must be at least two values.
.It Fn xor v1 v2 ...
Performs a bitwise Exclusive-OR on all arguments provided, as integers.
There must be at least two values.
.It Fn lshift x n
Returns integer argument x shifted by n bits to the left.
.It Fn rshift x n
Returns integer argument x shifted by n bits to the right.
.El
.Sh EXIT STATUS
.Ex -std awk
.Pp
But note that the
.Ic exit
expression can modify the exit status.
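.Pp
As a sketch of how an
.Ic exit
expression sets the status, the following program stops at the first
overlong input line and reports it to the caller through a non-zero status.
The 100-character limit and the status value 2 are arbitrary choices made
for illustration:
.Bd -literal -offset indent
# Stop reading and exit with status 2 at the first line longer
# than 100 characters.
length($0) > 100 { exit 2 }
.Ed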
.Sh EXAMPLES
Print lines longer than 72 characters:
.Pp
.Dl length($0) > 72
.Pp
Print first two fields in opposite order:
.Pp
.Dl { print $2, $1 }
.Pp
Same, with input fields separated by comma and/or blanks and tabs:
.Bd -literal -offset indent
BEGIN { FS = ",[ \et]*|[ \et]+" }
	{ print $2, $1 }
.Ed
.Pp
Add up first column, print sum and average:
.Bd -literal -offset indent
{ s += $1 }
END { print "sum is", s, " average is", s/NR }
.Ed
.Pp
Print all lines between start/stop pairs:
.Pp
.Dl /start/, /stop/
.Pp
Simulate echo(1):
.Bd -literal -offset indent
BEGIN { # Simulate echo(1)
	for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
	printf "\en"
	exit }
.Ed
.Pp
Print an error message to standard error:
.Bd -literal -offset indent
{ print "error!" > "/dev/stderr" }
.Ed
.Sh SEE ALSO
.Xr cut 1 ,
.Xr lex 1 ,
.Xr printf 1 ,
.Xr sed 1 ,
.Xr re_format 7
.Rs
.%A A. V. Aho
.%A B. W. Kernighan
.%A P. J. Weinberger
.%T The AWK Programming Language
.%I Addison-Wesley
.%D 1988
.%O ISBN 0-201-07981-X
.Re
.Sh STANDARDS
The
.Nm
utility is compliant with the
.St -p1003.1-2008
specification,
except
.Nm
does not support {n,m} pattern matching.
.Pp
The flags
.Fl d ,
.Fl safe ,
and
.Fl version
as well as the commands
.Cm fflush , compl , and , or ,
.Cm xor , lshift , rshift ,
are extensions to that specification.
.Sh HISTORY
An
.Nm
utility appeared in
.At v7 .
.Sh BUGS
There are no explicit conversions between numbers and strings.
To force an expression to be treated as a number add 0 to it;
to force it to be treated as a string concatenate
.Li \&""
to it.
.Pp
The scope rules for variables in functions are a botch;
the syntax is worse.
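.Pp
The two conversion idioms described above can be sketched as follows.
The variable names
.Va a ,
.Va b ,
and
.Va n
are chosen only for illustration:
.Bd -literal -offset indent
BEGIN {
	a = "10"; b = "9"	# both assigned from string constants
	if (a < b)		# string comparison: "10" sorts before "9"
		print "compared as strings"
	if (a + 0 > b + 0)	# adding 0 forces numeric comparison
		print "compared as numbers"
	n = 10
	if ((n "") < b)		# concatenating "" forces a string
		print "compared as strings again"
}
.Ed
.Pp
All three conditions hold, so each message is printed once.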
.Sh DEPRECATED BEHAVIOR
One True Awk has accepted
.Fl F Ar t
to mean the same as
.Fl F Ar <TAB>
to make it easier to specify tabs as the separator character.
Upstream One True Awk has deprecated this wart in the name of better
compatibility with other awk implementations like gawk and mawk.
.Pp
Historically,
.Nm
did not accept
.Dq 0x
as a hex string.
However, since One True Awk used strtod to convert strings to floats, and since
.Dq 0x12
is a valid hexadecimal representation of a floating point number,
on
.Fx ,
.Nm
has accepted this notation as an extension since One True Awk was imported in
.Fx 5.0 .
Upstream One True Awk has restored the historical behavior for better
compatibility between the different awk implementations.
Both gawk and mawk already behave similarly.
Starting with
.Fx 14.0
.Nm
will no longer accept this extension.
.Pp
The
.Fx
.Nm
for many years set the locale to match the environment it was running in.
This led to pattern ranges, like
.Dq "[A-Z]"
sometimes matching lower case characters in some locales.
This misbehavior was never in upstream One True Awk and has been removed as a
bug in
.Fx 12.3 ,
.Fx 13.1 ,
and
.Fx 14.0 .
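.Pp
A sketch of forms that avoid the deprecated behaviors above, assuming
tab-separated input whose first field is expected to begin with an
upper-case letter (an input layout invented for this example):
.Bd -literal -offset indent
# Name the tab separator explicitly instead of relying on -F t, and
# use a character class instead of a locale-sensitive range.
BEGIN { FS = "\et" }
$1 ~ /^[[:upper:]]/ { print $1 }
.Ed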