.\" $OpenBSD: awk.1,v 1.44 2015/09/14 20:06:58 schwarze Exp $
.\"
.\" Copyright (C) Lucent Technologies 1997
.\" All Rights Reserved
.\"
.\" Permission to use, copy, modify, and distribute this software and
.\" its documentation for any purpose and without fee is hereby
.\" granted, provided that the above copyright notice appear in all
.\" copies and that both that the copyright notice and this
.\" permission notice and warranty disclaimer appear in supporting
.\" documentation, and that the name Lucent Technologies or any of
.\" its entities not be used in advertising or publicity pertaining
.\" to distribution of the software without specific, written prior
.\" permission.
.\"
.\" LUCENT DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE,
.\" INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS.
.\" IN NO EVENT SHALL LUCENT OR ANY OF ITS ENTITIES BE LIABLE FOR ANY
.\" SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER
.\" IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION,
.\" ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF
.\" THIS SOFTWARE.
.\"
.\" $FreeBSD$
.Dd July 30, 2021
.Dt AWK 1
.Os
.Sh NAME
.Nm awk
.Nd pattern-directed scanning and processing language
.Sh SYNOPSIS
.Nm awk
.Op Fl safe
.Op Fl version
.Op Fl d Ns Op Ar n
.Op Fl F Ar fs
.Op Fl v Ar var Ns = Ns Ar value
.Op Ar prog | Fl f Ar progfile
.Ar
.Sh DESCRIPTION
.Nm
scans each input
.Ar file
for lines that match any of a set of patterns specified literally in
.Ar prog
or in one or more files specified as
.Fl f Ar progfile .
With each pattern there can be an associated action that will be performed
when a line of a
.Ar file
matches the pattern.
Each line is matched against the
pattern portion of every pattern-action statement;
the associated action is performed for each matched pattern.
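.Pp
As an illustration, the following shell command runs a program of two
pattern-action statements over two made-up input lines.
The second statement has no action, so a matching line is printed unchanged;
the line
.Dq banana 5
matches both patterns, so both actions are performed for it:
.Bd -literal -offset indent
awk '$2 > 4 { print "big:", $1 }
     /an/' <<EOF
apple 3
banana 5
EOF
.Ed
.Pp
The output is
.Dq big: banana
followed by
.Dq banana 5 .
.Pp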
The file name
.Sq -
means the standard input.
Any
.Ar file
of the form
.Ar var Ns = Ns Ar value
is treated as an assignment, not a filename,
and is executed at the time it would have been opened if it were a filename.
.Pp
The options are as follows:
.Bl -tag -width "-safe "
.It Fl d Ns Op Ar n
Debug mode.
Set debug level to
.Ar n ,
or 1 if
.Ar n
is not specified.
A value greater than 1 causes
.Nm
to dump core on fatal errors.
.It Fl F Ar fs
Define the input field separator to be the regular expression
.Ar fs .
.It Fl f Ar progfile
Read program code from the specified file
.Ar progfile
instead of from the command line.
.It Fl safe
Disable file output
.Pf ( Ic print No > ,
.Ic print No >> ) ,
process creation
.Po
.Ar cmd | Ic getline ,
.Ic print | ,
.Ic system
.Pc
and access to the environment
.Pf ( Va ENVIRON ;
see the section on variables below).
This is a first
.Pq and not very reliable
approximation to a
.Dq safe
version of
.Nm .
.It Fl version
Print the version number of
.Nm
to standard output and exit.
.It Fl v Ar var Ns = Ns Ar value
Assign
.Ar value
to variable
.Ar var
before
.Ar prog
is executed;
any number of
.Fl v
options may be present.
.El
.Pp
The input is normally made up of input lines
.Pq records
separated by newlines, or by the value of
.Va RS .
If
.Va RS
is null, then any number of blank lines are used as the record separator,
and newlines are used as field separators
(in addition to the value of
.Va FS ) .
This is convenient when working with multi-line records.
.Pp
An input line is normally made up of fields separated by whitespace,
or by the extended regular expression
.Va FS
as described below.
The fields are denoted
.Va $1 , $2 , ... ,
while
.Va $0
refers to the entire line.
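.Pp
For example, the commands below, run on made-up input, use
.Fl F
to split on colons,
.Fl v
to set a variable before the program starts, and an assignment operand that
takes effect just before standard input
.Pq named Sq -
is read.
The variable names
.Va sep
and
.Va greeting
are arbitrary:
.Bd -literal -offset indent
echo 'root:x:0:0' | awk -F: -v sep=/ '{ print $1 sep $3 }'
echo hello | awk '{ print greeting, $0 }' greeting=hi -
.Ed
.Pp
The first command prints
.Dq root/0
and the second prints
.Dq hi hello .
.Pp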
If
.Va FS
is null, the input line is split into one field per character.
Both gawk and mawk behave this way, but this behavior is unspecified by the
.St -p1003.1-2008
standard.
If
.Va FS
is a single space, then leading and trailing blank and newline characters are
skipped.
Fields are delimited by one or more blank or newline characters.
A blank character is a space or a tab.
If
.Va FS
is a single character, other than space, fields are delimited by each single
occurrence of that character.
The
.Va FS
variable defaults to a single space.
.Pp
Normally, any number of blanks separate fields.
In order to set the field separator to a single blank, use the
.Fl F
option with a value of
.Sq [\ \&] .
If a field separator of
.Sq t
is specified,
.Nm
treats it as if
.Sq \et
had been specified and uses
.Aq TAB
as the field separator.
In order to use a literal
.Sq t
as the field separator, use the
.Fl F
option with a value of
.Sq [t] .
.Pp
A pattern-action statement has the form
.Pp
.D1 Ar pattern Ic \&{ Ar action Ic \&}
.Pp
A missing
.Ic \&{ Ar action Ic \&}
means print the line;
a missing pattern always matches.
Pattern-action statements are separated by newlines or semicolons.
.Pp
Newlines are permitted after a terminating statement or following a comma
.Pq Sq ,\& ,
an open brace
.Pq Sq { ,
a logical AND
.Pq Sq && ,
a logical OR
.Pq Sq || ,
after the
.Sq do
or
.Sq else
keywords,
or after the closing parenthesis of an
.Sq if ,
.Sq for ,
or
.Sq while
statement.
Additionally, a backslash
.Pq Sq \e
can be used to escape a newline between tokens.
.Pp
An action is a sequence of statements.
A statement can be one of the following:
.Pp
.Bl -tag -width Ds -offset indent -compact
.It Ic if Ar ( expression ) Ar statement Op Ic else Ar statement
.It Ic while Ar ( expression ) Ar statement
.It Ic for Ar ( expression ; expression ; expression ) statement
.It Ic for Ar ( var Ic in Ar array ) statement
.It Ic do Ar statement Ic while Ar ( expression )
.It Ic break
.It Ic continue
.It Xo Ic {
.Op Ar statement ...
.Ic }
.Xc
.It Xo Ar expression
.No # commonly
.Ar var No = Ar expression
.Xc
.It Xo Ic print
.Op Ar expression-list
.Op > Ns Ar expression
.Xc
.It Xo Ic printf Ar format
.Op Ar ... , expression-list
.Op > Ns Ar expression
.Xc
.It Ic return Op Ar expression
.It Xo Ic next
.No # skip remaining patterns on this input line
.Xc
.It Xo Ic nextfile
.No # skip rest of this file, open next, start at top
.Xc
.It Xo Ic delete
.Sm off
.Ar array Ic \&[ Ar expression Ic \&]
.Sm on
.No # delete an array element
.Xc
.It Xo Ic delete Ar array
.No # delete all elements of array
.Xc
.It Xo Ic exit
.Op Ar expression
.No # exit immediately; status is Ar expression
.Xc
.El
.Pp
Statements are terminated by
semicolons, newlines or right braces.
An empty
.Ar expression-list
stands for
.Ar $0 .
String constants are quoted
.Li \&"" ,
with the usual C escapes recognized within
(see
.Xr printf 1
for a complete list of these).
Expressions take on string or numeric values as appropriate,
and are built using the operators
.Ic + \- * / % ^
.Pq exponentiation ,
and concatenation
.Pq indicated by whitespace .
The operators
.Ic \&! ++ \-\- += \-= *= /= %= ^=
.Ic > >= < <= == != ?\&:
are also available in expressions.
Variables may be scalars, array elements
(denoted
.Li x[i] )
or fields.
Variables are initialized to the null string.
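.Pp
The following sketch combines a
.Ic for
loop, concatenation, and arithmetic.
The variable
.Va n
is deliberately never assigned, to show that an uninitialized variable acts as
the null string in string context and as zero in numeric context:
.Bd -literal -offset indent
awk 'BEGIN {
        for (i = 1; i <= 3; i++)
                s = s i         # concatenation; s starts out null
        print s, s + 0, n + 1, length(n)
}'
.Ed
.Pp
It prints
.Dq 123 123 1 0 .
.Pp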
Array subscripts may be any string,
not necessarily numeric;
this allows for a form of associative memory.
Multiple subscripts such as
.Li [i,j,k]
are permitted; the constituents are concatenated,
separated by the value of
.Va SUBSEP
.Pq see the section on variables below .
.Pp
The
.Ic print
statement prints its arguments on the standard output
(or on a file if
.Pf > Ar file
or
.Pf >> Ar file
is present, or on a pipe if
.Pf |\ \& Ar cmd
is present), separated by the current output field separator,
and terminated by the output record separator.
.Ar file
and
.Ar cmd
may be literal names or parenthesized expressions;
identical string values in different statements denote
the same open file.
The
.Ic printf
statement formats its expression list according to the format
(see
.Xr printf 1 ) .
.Pp
Patterns are arbitrary Boolean combinations
(with
.Ic "\&! || &&" )
of regular expressions and
relational expressions.
.Nm
supports extended regular expressions
.Pq EREs .
See
.Xr re_format 7
for more information on regular expressions.
Isolated regular expressions
in a pattern apply to the entire line.
Regular expressions may also occur in
relational expressions, using the operators
.Ic ~
and
.Ic !~ .
.Pf / Ar re Ns /
is a constant regular expression;
any string (constant or variable) may be used
as a regular expression, except in the position of an isolated regular expression
in a pattern.
.Pp
A pattern may consist of two patterns separated by a comma;
in this case, the action is performed for all lines
from an occurrence of the first pattern
through an occurrence of the second.
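.Pp
A sketch with made-up input: the first command shows that a multiple subscript
is stored as its parts joined by
.Va SUBSEP ,
and the second combines a range pattern with the
.Ic ~
operator:
.Bd -literal -offset indent
awk 'BEGIN { a[1, 2] = "x"; for (k in a) { split(k, p, SUBSEP); print p[1], p[2] } }'
awk '/start/, /stop/ { n++ }
     $1 ~ /^x/      { print NR ": " $0 }
     END            { print "in range:", n }' <<EOF
a
start
x1
stop
x2
EOF
.Ed
.Pp
The first command prints
.Dq 1 2 .
The second counts the three lines from
.Dq start
through
.Dq stop
and prints
.Dq 3: x1 ,
.Dq 5: x2 ,
and
.Dq in range: 3 .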
.Pp
A relational expression is one of the following:
.Pp
.Bl -tag -width Ds -offset indent -compact
.It Ar expression matchop regular-expression
.It Ar expression relop expression
.It Ar expression Ic in Ar array-name
.It Xo Ic \&( Ns
.Ar expr , expr , \&... Ns Ic \&) in
.Ar array-name
.Xc
.El
.Pp
where a
.Ar relop
is any of the six relational operators in C, and a
.Ar matchop
is either
.Ic ~
(matches)
or
.Ic !~
(does not match).
A conditional is an arithmetic expression,
a relational expression,
or a Boolean combination
of these.
.Pp
The special patterns
.Ic BEGIN
and
.Ic END
may be used to gain control before the first input line is read
and after the last.
.Ic BEGIN
and
.Ic END
do not combine with other patterns.
.Pp
Variable names with special meanings:
.Pp
.Bl -tag -width "FILENAME " -compact
.It Va ARGC
Argument count, assignable.
.It Va ARGV
Argument array, assignable;
non-null members are taken as filenames.
.It Va CONVFMT
Conversion format used when converting numbers to strings
(default
.Qq Li %.6g ) .
.It Va ENVIRON
Array of environment variables; subscripts are names.
.It Va FILENAME
The name of the current input file.
.It Va FNR
Ordinal number of the current record in the current file.
.It Va FS
Regular expression used to separate fields; also settable
by option
.Fl F Ar fs .
.It Va NF
Number of fields in the current record.
.Va $NF
can be used to obtain the value of the last field in the current record.
.It Va NR
Ordinal number of the current record.
.It Va OFMT
Output format for numbers (default
.Qq Li %.6g ) .
.It Va OFS
Output field separator (default blank).
.It Va ORS
Output record separator (default newline).
.It Va RLENGTH
The length of the string matched by the
.Fn match
function.
.It Va RS
Input record separator (default newline).
.It Va RSTART
The starting position of the string matched by the
.Fn match
function.
.It Va SUBSEP
Separator placed between multiple subscripts (default octal 034).
.El
.Sh FUNCTIONS
The awk language has a variety of built-in functions:
arithmetic, string, input/output, general, and bit-operation.
.Pp
Functions may be defined (at the position of a pattern-action statement)
as follows:
.Pp
.Dl function foo(a, b, c) { ...; return x }
.Pp
Parameters are passed by value if scalar, and by reference if array name;
functions may be called recursively.
Parameters are local to the function; all other variables are global.
Thus local variables may be created by providing excess parameters in
the function definition.
.Ss Arithmetic Functions
.Bl -tag -width "atan2(y, x)"
.It Fn atan2 y x
Return the arctangent of
.Fa y Ns / Ns Fa x
in radians.
.It Fn cos x
Return the cosine of
.Fa x ,
where
.Fa x
is in radians.
.It Fn exp x
Return the exponential of
.Fa x .
.It Fn int x
Return
.Fa x
truncated to an integer value.
.It Fn log x
Return the natural logarithm of
.Fa x .
.It Fn rand
Return a random number,
.Fa n ,
such that
.Sm off
.Pf 0 \*(Le Fa n No \*(Lt 1 .
.Sm on
.It Fn sin x
Return the sine of
.Fa x ,
where
.Fa x
is in radians.
.It Fn sqrt x
Return the square root of
.Fa x .
.It Fn srand expr
Set the seed for
.Fn rand
to
.Fa expr
and return the previous seed.
If
.Fa expr
is omitted, the time of day is used instead.
.El
.Ss String Functions
.Bl -tag -width "split(s, a, fs)"
.It Fn gsub r t s
The same as
.Fn sub
except that all occurrences of the regular expression are replaced.
.Fn gsub
returns the number of replacements.
.It Fn index s t
The position in
.Fa s
where the string
.Fa t
occurs, or 0 if it does not.
.It Fn length s
The length of
.Fa s
taken as a string,
or of
.Va $0
if no argument is given.
.It Fn match s r
The position in
.Fa s
where the regular expression
.Fa r
occurs, or 0 if it does not.
The variable
.Va RSTART
is set to the starting position of the matched string
.Pq which is the same as the returned value
or zero if no match is found.
The variable
.Va RLENGTH
is set to the length of the matched string,
or \-1 if no match is found.
.It Fn split s a fs
Splits the string
.Fa s
into array elements
.Va a[1] , a[2] , ... , a[n]
and returns
.Va n .
The separation is done with the regular expression
.Ar fs
or with the field separator
.Va FS
if
.Ar fs
is not given.
An empty string as field separator splits the string
into one array element per character.
.It Fn sprintf fmt expr ...
The string resulting from formatting
.Fa expr , ...
according to the
.Xr printf 1
format
.Fa fmt .
.It Fn sub r t s
Substitutes
.Fa t
for the first occurrence of the regular expression
.Fa r
in the string
.Fa s .
If
.Fa s
is not given,
.Va $0
is used.
An ampersand
.Pq Sq &
in
.Fa t
is replaced with the text matched by
.Fa r .
A literal ampersand can be specified by preceding it with two backslashes
.Pq Sq \e\e .
A literal backslash can be specified by preceding it with another backslash
.Pq Sq \e\e .
.Fn sub
returns the number of replacements.
.It Fn substr s m n
Return at most the
.Fa n Ns -character
substring of
.Fa s
that begins at position
.Fa m
counted from 1.
If
.Fa n
is omitted, or if
.Fa n
specifies more characters than are left in the string,
the length of the substring is limited by the length of
.Fa s .
.It Fn tolower str
Returns a copy of
.Fa str
with all upper-case characters translated to their
corresponding lower-case equivalents.
.It Fn toupper str
Returns a copy of
.Fa str
with all lower-case characters translated to their
corresponding upper-case equivalents.
.El
.Ss Input/Output and General Functions
.Bl -tag -width "getline [var] < file"
.It Fn close expr
Closes the file or pipe
.Fa expr .
.Fa expr
should match the string that was used to open the file or pipe.
.It Ar cmd | Ic getline Op Va var
Read a record of input from a stream piped from the output of
.Ar cmd .
If
.Va var
is omitted, the variables
.Va $0
and
.Va NF
are set.
Otherwise
.Va var
is set.
If the stream is not open, it is opened.
As long as the stream remains open, subsequent calls
will read subsequent records from the stream.
The stream remains open until explicitly closed with a call to
.Fn close .
.Ic getline
returns 1 for a successful input, 0 for end of file, and \-1 for an error.
.It Fn fflush [expr]
Flushes any buffered output for the file or pipe
.Fa expr ,
or all open files or pipes if
.Fa expr
is omitted.
.Fa expr
should match the string that was used to open the file or pipe.
.It Ic getline
Sets
.Va $0
to the next input record from the current input file.
This form of
.Ic getline
sets the variables
.Va NF ,
.Va NR ,
and
.Va FNR .
.Ic getline
returns 1 for a successful input, 0 for end of file, and \-1 for an error.
.It Ic getline Va var
Sets variable
.Va var
to the next input record from the current input file.
This form of
.Ic getline
sets the variables
.Va NR
and
.Va FNR .
.Ic getline
returns 1 for a successful input, 0 for end of file, and \-1 for an error.
.It Xo
.Ic getline Op Va var
.Pf \ \&< Ar file
.Xc
Reads the next record from
.Ar file .
If
.Va var
is omitted, the variables
.Va $0
and
.Va NF
are set.
Otherwise
.Va var
is set.
If
.Ar file
is not open, it is opened.
As long as the stream remains open, subsequent calls will read subsequent
records from
.Ar file .
.Ar file
remains open until explicitly closed with a call to
.Fn close .
.It Fn system cmd
Executes
.Fa cmd
and returns its exit status.
.El
.Ss Bit-Operation Functions
.Bl -tag -width "lshift(a, b)"
.It Fn compl x
Returns the bitwise complement of integer argument
.Fa x .
.It Fn and v1 v2 ...
Performs a bitwise AND on all arguments provided, as integers.
There must be at least two values.
.It Fn or v1 v2 ...
Performs a bitwise OR on all arguments provided, as integers.
There must be at least two values.
.It Fn xor v1 v2 ...
Performs a bitwise Exclusive-OR on all arguments provided, as integers.
There must be at least two values.
.It Fn lshift x n
Returns integer argument
.Fa x
shifted by
.Fa n
bits to the left.
.It Fn rshift x n
Returns integer argument
.Fa x
shifted by
.Fa n
bits to the right.
.El
.Sh EXIT STATUS
.Ex -std awk
.Pp
However, the optional expression of the
.Ic exit
statement can change the exit status.
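.Pp
For example, in the following commands the
.Ic exit
statement sets the status seen by the shell.
The check on
.Va NF
is an arbitrary stand-in for real input validation:
.Bd -literal -offset indent
awk 'BEGIN { exit 3 }'; echo "status $?"
echo 'a b c' | awk 'NF != 2 { print "bad line:", NR; exit 1 }'
echo "status $?"
.Ed
.Pp
These print
.Dq status 3 ,
.Dq bad line: 1 ,
and
.Dq status 1 .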
.Sh EXAMPLES
Print lines longer than 72 characters:
.Pp
.Dl length($0) > 72
.Pp
Print the first two fields in opposite order:
.Pp
.Dl { print $2, $1 }
.Pp
Same, with input fields separated by comma and/or blanks and tabs:
.Bd -literal -offset indent
BEGIN { FS = ",[ \et]*|[ \et]+" }
      { print $2, $1 }
.Ed
.Pp
Add up the first column, print sum and average:
.Bd -literal -offset indent
    { s += $1 }
END { print "sum is", s, "average is", s/NR }
.Ed
.Pp
Print all lines between start/stop pairs:
.Pp
.Dl /start/, /stop/
.Pp
Simulate
.Xr echo 1 :
.Bd -literal -offset indent
BEGIN { # Simulate echo(1)
        for (i = 1; i < ARGC; i++)
                printf "%s%s", ARGV[i], (i < ARGC - 1 ? " " : "")
        printf "\en"
        exit }
.Ed
.Pp
Print an error message to standard error:
.Bd -literal -offset indent
{ print "error!" > "/dev/stderr" }
.Ed
.Sh SEE ALSO
.Xr cut 1 ,
.Xr lex 1 ,
.Xr printf 1 ,
.Xr sed 1 ,
.Xr re_format 7
.Rs
.%A A. V. Aho
.%A B. W. Kernighan
.%A P. J. Weinberger
.%T The AWK Programming Language
.%I Addison-Wesley
.%D 1988
.%O ISBN 0-201-07981-X
.Re
.Sh STANDARDS
The
.Nm
utility is compliant with the
.St -p1003.1-2008
specification,
except that
.Nm
does not support {n,m} interval expressions in regular expressions.
.Pp
The flags
.Fl d ,
.Fl safe ,
and
.Fl version
as well as the functions
.Cm fflush , compl , and , or ,
.Cm xor , lshift , rshift ,
are extensions to that specification.
.Sh HISTORY
An
.Nm
utility appeared in
.At v7 .
.Sh BUGS
There are no explicit conversions between numbers and strings.
To force an expression to be treated as a number, add 0 to it;
to force it to be treated as a string, concatenate
.Li \&""
to it.
.Pp
The scope rules for variables in functions are a botch;
the syntax is worse.
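.Pp
A short sketch of both conversions, using arbitrary values.
Because
.Va x
and
.Va y
hold string constants, comparing them directly is a string comparison,
whereas adding 0 forces a numeric one:
.Bd -literal -offset indent
awk 'BEGIN {
        x = "10"; y = "9"
        s = (x < y)           # string comparison: "10" sorts before "9"
        n = (x + 0 < y + 0)   # numeric comparison: 10 is not less than 9
        c = (1 "" 2) + 1      # concatenation yields "12", then 13
        print s, n, c
}'
.Ed
.Pp
It prints
.Dq 1 0 13 .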
.Sh DEPRECATED BEHAVIOR
One True Awk has accepted
.Fl F Ar t
to mean the same as
.Fl F Ar <TAB>
to make it easier to specify tabs as the separator character.
Upstream One True Awk has deprecated this wart in the name of better
compatibility with other awk implementations like gawk and mawk.
.Pp
Historically,
.Nm
did not accept
.Dq 0x
as a hex string.
However, One True Awk used
.Xr strtod 3
to convert strings to floating-point numbers, and
.Dq 0x12
is a valid hexadecimal representation of a floating-point number, so on
.Fx ,
.Nm
has accepted this notation as an extension since One True Awk was imported in
.Fx 5.0 .
Upstream One True Awk has restored the historical behavior for better
compatibility between the different awk implementations.
Both gawk and mawk already behave similarly.
Starting with
.Fx 14.0 ,
.Nm
will no longer accept this extension.
.Pp
For many years, the
.Fx
.Nm
set the locale to match the environment it was running in.
This led to character ranges in bracket expressions, like
.Dq "[A-Z]" ,
sometimes matching lower-case characters in some locales.
This misbehavior was never in upstream One True Awk and has been removed as a
bug in
.Fx 12.3 ,
.Fx 13.1 ,
and
.Fx 14.0 .
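.Pp
Scripts that relied on the hexadecimal extension can convert such strings
explicitly instead.
The function
.Fn hex2num
below is a hypothetical helper, not a built-in; it accepts an optional
.Dq 0x
prefix, ignores case, and does not validate its input:
.Bd -literal -offset indent
awk 'function hex2num(s,    i, n) {
        n = 0
        s = tolower(s)
        if (substr(s, 1, 2) == "0x")
                s = substr(s, 3)
        for (i = 1; i <= length(s); i++)
                n = n * 16 + index("0123456789abcdef", substr(s, i, 1)) - 1
        return n
}
BEGIN { print hex2num("0x12"), hex2num("FF") }'
.Ed
.Pp
It prints
.Dq 18 255 .
The extra parameters
.Va i
and
.Va n
serve as local variables, as described in
.Sx FUNCTIONS .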