.\" $OpenBSD: awk.1,v 1.44 2015/09/14 20:06:58 schwarze Exp $
.\"
.\" Copyright (C) Lucent Technologies 1997
.\" All Rights Reserved
.\"
.\" Permission to use, copy, modify, and distribute this software and
.\" its documentation for any purpose and without fee is hereby
.\" granted, provided that the above copyright notice appear in all
.\" copies and that both that the copyright notice and this
.\" permission notice and warranty disclaimer appear in supporting
.\" documentation, and that the name Lucent Technologies or any of
.\" its entities not be used in advertising or publicity pertaining
.\" to distribution of the software without specific, written prior
.\" permission.
.\"
.\" LUCENT DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE,
.\" INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS.
.\" IN NO EVENT SHALL LUCENT OR ANY OF ITS ENTITIES BE LIABLE FOR ANY
.\" SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER
.\" IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION,
.\" ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF
.\" THIS SOFTWARE.
.\"
.\" $FreeBSD$
.Dd $Mdocdate: June 6 2020 $
.Dt AWK 1
.Os
.Sh NAME
.Nm awk
.Nd pattern-directed scanning and processing language
.Sh SYNOPSIS
.Nm awk
.Op Fl safe
.Op Fl version
.Op Fl d Ns Op Ar n
.Op Fl F Ar fs
.Op Fl v Ar var Ns = Ns Ar value
.Op Ar prog | Fl f Ar progfile
.Ar
.Sh DESCRIPTION
.Nm
scans each input
.Ar file
for lines that match any of a set of patterns specified literally in
.Ar prog
or in one or more files specified as
.Fl f Ar progfile .
With each pattern there can be an associated action that will be performed
when a line of a
.Ar file
matches the pattern.
Each line is matched against the
pattern portion of every pattern-action statement;
the associated action is performed for each matched pattern.
The file name
.Sq -
means the standard input.
Any
.Ar file
of the form
.Ar var Ns = Ns Ar value
is treated as an assignment, not a filename,
and is executed at the time it would have been opened if it were a filename.
.Pp
The options are as follows:
.Bl -tag -width "-safe "
.It Fl d Ns Op Ar n
Debug mode.
Set debug level to
.Ar n ,
or 1 if
.Ar n
is not specified.
A value greater than 1 causes
.Nm
to dump core on fatal errors.
.It Fl F Ar fs
Define the input field separator to be the regular expression
.Ar fs .
.It Fl f Ar progfile
Read program code from the specified file
.Ar progfile
instead of from the command line.
.It Fl safe
Disable file output
.Pf ( Ic print No > ,
.Ic print No >> ) ,
process creation
.Po
.Ar cmd | Ic getline ,
.Ic print | ,
.Ic system
.Pc
and access to the environment
.Pf ( Va ENVIRON ;
see the section on variables below).
This is a first
.Pq and not very reliable
approximation to a
.Dq safe
version of
.Nm .
.It Fl version
Print the version number of
.Nm
to standard output and exit.
.It Fl v Ar var Ns = Ns Ar value
Assign
.Ar value
to variable
.Ar var
before
.Ar prog
is executed;
any number of
.Fl v
options may be present.
.El
.Pp
The input is normally made up of input lines
.Pq records
separated by newlines, or by the value of
.Va RS .
If
.Va RS
is null, then any number of blank lines are used as the record separator,
and newlines are used as field separators
(in addition to the value of
.Va FS ) .
This is convenient when working with multi-line records.
.Pp
An input line is normally made up of fields separated by whitespace,
or by the regular expression
.Va FS .
The fields are denoted
.Va $1 , $2 , ... ,
while
.Va $0
refers to the entire line.
If
.Va FS
is null, the input line is split into one field per character.
.Pp
Normally, any number of blanks separate fields.
In order to set the field separator to a single blank, use the
.Fl F
option with a value of
.Sq [\ \&] .
If a field separator of
.Sq t
is specified,
.Nm
treats it as if
.Sq \et
had been specified and uses
.Aq TAB
as the field separator.
In order to use a literal
.Sq t
as the field separator, use the
.Fl F
option with a value of
.Sq [t] .
.Pp
A pattern-action statement has the form
.Pp
.D1 Ar pattern Ic \&{ Ar action Ic \&}
.Pp
A missing
.Ic \&{ Ar action Ic \&}
means print the line;
a missing pattern always matches.
Pattern-action statements are separated by newlines or semicolons.
.Pp
Newlines are permitted after a terminating statement or following a comma
.Pq Sq ,\& ,
an open brace
.Pq Sq { ,
a logical AND
.Pq Sq && ,
a logical OR
.Pq Sq || ,
after the
.Sq do
or
.Sq else
keywords,
or after the closing parenthesis of an
.Sq if ,
.Sq for ,
or
.Sq while
statement.
Additionally, a backslash
.Pq Sq \e
can be used to escape a newline between tokens.
.Pp
An action is a sequence of statements.
A statement can be one of the following:
.Pp
.Bl -tag -width Ds -offset indent -compact
.It Ic if Ar ( expression ) Ar statement Op Ic else Ar statement
.It Ic while Ar ( expression ) Ar statement
.It Ic for Ar ( expression ; expression ; expression ) statement
.It Ic for Ar ( var Ic in Ar array ) statement
.It Ic do Ar statement Ic while Ar ( expression )
.It Ic break
.It Ic continue
.It Xo Ic {
.Op Ar statement ...
.Ic }
.Xc
.It Xo Ar expression
.No # commonly
.Ar var No = Ar expression
.Xc
.It Xo Ic print
.Op Ar expression-list
.Op > Ns Ar expression
.Xc
.It Xo Ic printf Ar format
.Op Ar ... , expression-list
.Op > Ns Ar expression
.Xc
.It Ic return Op Ar expression
.It Xo Ic next
.No # skip remaining patterns on this input line
.Xc
.It Xo Ic nextfile
.No # skip rest of this file, open next, start at top
.Xc
.It Xo Ic delete
.Sm off
.Ar array Ic \&[ Ar expression Ic \&]
.Sm on
.No # delete an array element
.Xc
.It Xo Ic delete Ar array
.No # delete all elements of array
.Xc
.It Xo Ic exit
.Op Ar expression
.No # exit immediately; status is Ar expression
.Xc
.El
.Pp
Statements are terminated by
semicolons, newlines or right braces.
An empty
.Ar expression-list
stands for
.Ar $0 .
String constants are quoted
.Li \&"" ,
with the usual C escapes recognized within
(see
.Xr printf 1
for a complete list of these).
Expressions take on string or numeric values as appropriate,
and are built using the operators
.Ic + \- * / % ^
.Pq exponentiation ,
and concatenation
.Pq indicated by whitespace .
The operators
.Ic \&! ++ \-\- += \-= *= /= %= ^=
.Ic > >= < <= == != ?\&:
are also available in expressions.
Variables may be scalars, array elements
(denoted
.Li x[i] )
or fields.
Variables are initialized to the null string.
Array subscripts may be any string,
not necessarily numeric;
this allows for a form of associative memory.
Multiple subscripts such as
.Li [i,j,k]
are permitted; the constituents are concatenated,
separated by the value of
.Va SUBSEP
.Pq see the section on variables below .
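.Pp
As a minimal sketch of how multiple subscripts behave, assuming a
POSIX shell for the invocation, the following stores one element under
a two-part subscript and then recovers the constituents by splitting
the stored key on
.Va SUBSEP .
The names
.Va cell ,
.Va part ,
and
.Va k
are chosen purely for illustration.
.Bd -literal -offset indent
awk 'BEGIN {
    cell["x", "y"] = 1            # key is "x" SUBSEP "y"
    for (k in cell) {
        split(k, part, SUBSEP)    # recover the constituents
        print part[1], part[2]
    }
    if (("x", "y") in cell)       # parenthesized membership test
        print "found"
}'
.Ed
.Pp
With the default
.Va SUBSEP
this should print
.Dq x y
followed by
.Dq found .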
.Pp
The
.Ic print
statement prints its arguments on the standard output
(or on a file if
.Pf > Ar file
or
.Pf >> Ar file
is present or on a pipe if
.Pf |\ \& Ar cmd
is present), separated by the current output field separator,
and terminated by the output record separator.
.Ar file
and
.Ar cmd
may be literal names or parenthesized expressions;
identical string values in different statements denote
the same open file.
The
.Ic printf
statement formats its expression list according to the format
(see
.Xr printf 1 ) .
.Pp
Patterns are arbitrary Boolean combinations
(with
.Ic "\&! || &&" )
of regular expressions and
relational expressions.
.Nm
supports extended regular expressions
.Pq EREs .
See
.Xr re_format 7
for more information on regular expressions.
Isolated regular expressions
in a pattern apply to the entire line.
Regular expressions may also occur in
relational expressions, using the operators
.Ic ~
and
.Ic !~ .
.Pf / Ar re Ns /
is a constant regular expression;
any string (constant or variable) may be used
as a regular expression, except in the position of an isolated regular expression
in a pattern.
.Pp
A pattern may consist of two patterns separated by a comma;
in this case, the action is performed for all lines
from an occurrence of the first pattern
through an occurrence of the second.
.Pp
A relational expression is one of the following:
.Pp
.Bl -tag -width Ds -offset indent -compact
.It Ar expression matchop regular-expression
.It Ar expression relop expression
.It Ar expression Ic in Ar array-name
.It Xo Ic \&( Ns
.Ar expr , expr , \&... Ns Ic \&) in
.Ar array-name
.Xc
.El
.Pp
where a
.Ar relop
is any of the six relational operators in C, and a
.Ar matchop
is either
.Ic ~
(matches)
or
.Ic !~
(does not match).
A conditional is an arithmetic expression,
a relational expression,
or a Boolean combination
of these.
.Pp
The special patterns
.Ic BEGIN
and
.Ic END
may be used to capture control before the first input line is read
and after the last.
.Ic BEGIN
and
.Ic END
do not combine with other patterns.
.Pp
Variable names with special meanings:
.Pp
.Bl -tag -width "FILENAME " -compact
.It Va ARGC
Argument count, assignable.
.It Va ARGV
Argument array, assignable;
non-null members are taken as filenames.
.It Va CONVFMT
Conversion format when converting numbers
(default
.Qq Li %.6g ) .
.It Va ENVIRON
Array of environment variables; subscripts are names.
.It Va FILENAME
The name of the current input file.
.It Va FNR
Ordinal number of the current record in the current file.
.It Va FS
Regular expression used to separate fields; also settable
by option
.Fl F Ar fs .
.It Va NF
Number of fields in the current record.
.Va $NF
can be used to obtain the value of the last field in the current record.
.It Va NR
Ordinal number of the current record.
.It Va OFMT
Output format for numbers (default
.Qq Li %.6g ) .
.It Va OFS
Output field separator (default blank).
.It Va ORS
Output record separator (default newline).
.It Va RLENGTH
The length of the string matched by the
.Fn match
function.
.It Va RS
Input record separator (default newline).
.It Va RSTART
The starting position of the string matched by the
.Fn match
function.
.It Va SUBSEP
Separates multiple subscripts (default 034).
.El
.Sh FUNCTIONS
The awk language has a variety of built-in functions:
arithmetic, string, input/output, general, and bit-operation.
.Pp
Functions may be defined (at the position of a pattern-action statement)
thusly:
.Pp
.Dl function foo(a, b, c) { ...; return x }
.Pp
Parameters are passed by value if scalar, and by reference if array name;
functions may be called recursively.
Parameters are local to the function; all other variables are global.
Thus local variables may be created by providing excess parameters in
the function definition.
.Ss Arithmetic Functions
.Bl -tag -width "atan2(y, x)"
.It Fn atan2 y x
Return the arctangent of
.Fa y Ns / Ns Fa x
in radians.
.It Fn cos x
Return the cosine of
.Fa x ,
where
.Fa x
is in radians.
.It Fn exp x
Return the exponential of
.Fa x .
.It Fn int x
Return
.Fa x
truncated to an integer value.
.It Fn log x
Return the natural logarithm of
.Fa x .
.It Fn rand
Return a random number,
.Fa n ,
such that
.Sm off
.Pf 0 \*(Le Fa n No \*(Lt 1 .
.Sm on
.It Fn sin x
Return the sine of
.Fa x ,
where
.Fa x
is in radians.
.It Fn sqrt x
Return the square root of
.Fa x .
.It Fn srand expr
Sets seed for
.Fn rand
to
.Fa expr
and returns the previous seed.
If
.Fa expr
is omitted, the time of day is used instead.
.El
.Ss String Functions
.Bl -tag -width "split(s, a, fs)"
.It Fn gsub r t s
The same as
.Fn sub
except that all occurrences of the regular expression are replaced.
.Fn gsub
returns the number of replacements.
.It Fn index s t
The position in
.Fa s
where the string
.Fa t
occurs, or 0 if it does not.
.It Fn length s
The length of
.Fa s
taken as a string,
or of
.Va $0
if no argument is given.
.It Fn match s r
The position in
.Fa s
where the regular expression
.Fa r
occurs, or 0 if it does not.
The variable
.Va RSTART
is set to the starting position of the matched string
.Pq which is the same as the returned value
or zero if no match is found.
The variable
.Va RLENGTH
is set to the length of the matched string,
or \-1 if no match is found.
.It Fn split s a fs
Splits the string
.Fa s
into array elements
.Va a[1] , a[2] , ... , a[n]
and returns
.Va n .
The separation is done with the regular expression
.Ar fs
or with the field separator
.Va FS
if
.Ar fs
is not given.
An empty string as field separator splits the string
into one array element per character.
.It Fn sprintf fmt expr ...
The string resulting from formatting
.Fa expr , ...
according to the
.Xr printf 1
format
.Fa fmt .
.It Fn sub r t s
Substitutes
.Fa t
for the first occurrence of the regular expression
.Fa r
in the string
.Fa s .
If
.Fa s
is not given,
.Va $0
is used.
An ampersand
.Pq Sq &
in
.Fa t
is replaced in string
.Fa s
with the text matched by the regular expression
.Fa r .
A literal ampersand can be specified by preceding it with two backslashes
.Pq Sq \e\e .
A literal backslash can be specified by preceding it with another backslash
.Pq Sq \e\e .
.Fn sub
returns the number of replacements.
.It Fn substr s m n
Return at most the
.Fa n Ns -character
substring of
.Fa s
that begins at position
.Fa m
counted from 1.
If
.Fa n
is omitted, or if
.Fa n
specifies more characters than are left in the string,
the length of the substring is limited by the length of
.Fa s .
.It Fn tolower str
Returns a copy of
.Fa str
with all upper-case characters translated to their
corresponding lower-case equivalents.
.It Fn toupper str
Returns a copy of
.Fa str
with all lower-case characters translated to their
corresponding upper-case equivalents.
.El
.Ss Input/Output and General Functions
.Bl -tag -width "getline [var] < file"
.It Fn close expr
Closes the file or pipe
.Fa expr .
.Fa expr
should match the string that was used to open the file or pipe.
.It Ar cmd | Ic getline Op Va var
Read a record of input from a stream piped from the output of
.Ar cmd .
If
.Va var
is omitted, the variables
.Va $0
and
.Va NF
are set.
Otherwise
.Va var
is set.
If the stream is not open, it is opened.
As long as the stream remains open, subsequent calls
will read subsequent records from the stream.
The stream remains open until explicitly closed with a call to
.Fn close .
.Ic getline
returns 1 for a successful input, 0 for end of file, and \-1 for an error.
.It Fn fflush [expr]
Flushes any buffered output for the file or pipe
.Fa expr ,
or all open files or pipes if
.Fa expr
is omitted.
.Fa expr
should match the string that was used to open the file or pipe.
.It Ic getline
Sets
.Va $0
to the next input record from the current input file.
This form of
.Ic getline
sets the variables
.Va NF ,
.Va NR ,
and
.Va FNR .
.Ic getline
returns 1 for a successful input, 0 for end of file, and \-1 for an error.
.It Ic getline Va var
Sets
.Va var
to the next input record from the current input file.
This form of
.Ic getline
sets the variables
.Va NR
and
.Va FNR .
.Ic getline
returns 1 for a successful input, 0 for end of file, and \-1 for an error.
.It Xo
.Ic getline Op Va var
.Pf \ \&< Ar file
.Xc
Reads the next record from
.Ar file .
If
.Va var
is omitted, the variables
.Va $0
and
.Va NF
are set.
Otherwise
.Va var
is set.
If
.Ar file
is not open, it is opened.
As long as the stream remains open, subsequent calls will read subsequent
records from
.Ar file .
.Ar file
remains open until explicitly closed with a call to
.Fn close .
.It Fn system cmd
Executes
.Fa cmd
and returns its exit status.
.El
.Ss Bit-Operation Functions
.Bl -tag -width "lshift(a, b)"
.It Fn compl x
Returns the bitwise complement of integer argument x.
.It Fn and v1 v2 ...
Performs a bitwise AND on all arguments provided, as integers.
There must be at least two values.
.It Fn or v1 v2 ...
Performs a bitwise OR on all arguments provided, as integers.
There must be at least two values.
.It Fn xor v1 v2 ...
Performs a bitwise Exclusive-OR on all arguments provided, as integers.
There must be at least two values.
.It Fn lshift x n
Returns integer argument x shifted by n bits to the left.
.It Fn rshift x n
Returns integer argument x shifted by n bits to the right.
.El
.Sh EXIT STATUS
.Ex -std awk
.Pp
But note that the
.Ic exit
expression can modify the exit status.
.Sh EXAMPLES
Print lines longer than 72 characters:
.Pp
.Dl length($0) > 72
.Pp
Print first two fields in opposite order:
.Pp
.Dl { print $2, $1 }
.Pp
Same, with input fields separated by comma and/or blanks and tabs:
.Bd -literal -offset indent
BEGIN { FS = ",[ \et]*|[ \et]+" }
      { print $2, $1 }
.Ed
.Pp
Add up first column, print sum and average:
.Bd -literal -offset indent
    { s += $1 }
END { print "sum is", s, " average is", s/NR }
.Ed
.Pp
Print all lines between start/stop pairs:
.Pp
.Dl /start/, /stop/
.Pp
Simulate echo(1):
.Bd -literal -offset indent
BEGIN { # Simulate echo(1)
        for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
        printf "\en"
        exit }
.Ed
.Pp
Print an error message to standard error:
.Bd -literal -offset indent
{ print "error!" > "/dev/stderr" }
.Ed
.Sh SEE ALSO
.Xr cut 1 ,
.Xr lex 1 ,
.Xr printf 1 ,
.Xr sed 1 ,
.Xr re_format 7
.Rs
.%A A. V. Aho
.%A B. W. Kernighan
.%A P. J. Weinberger
.%T The AWK Programming Language
.%I Addison-Wesley
.%D 1988
.%O ISBN 0-201-07981-X
.Re
.Sh STANDARDS
The
.Nm
utility is compliant with the
.St -p1003.1-2008
specification,
except
.Nm
does not support {n,m} pattern matching.
.Pp
The flags
.Fl d ,
.Fl safe ,
and
.Fl version
as well as the commands
.Cm fflush , compl , and , or ,
.Cm xor , lshift , rshift ,
are extensions to that specification.
.Sh HISTORY
An
.Nm
utility appeared in
.At v7 .
.Sh BUGS
There are no explicit conversions between numbers and strings.
To force an expression to be treated as a number add 0 to it;
to force it to be treated as a string concatenate
.Li \&""
to it.
.Pp
The scope rules for variables in functions are a botch;
the syntax is worse.
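.Pp
The two conversion idioms described at the start of this section can be
sketched as follows, assuming a POSIX shell for the invocation.
The variable names
.Va x ,
.Va y ,
and
.Va s
are illustrative only.
.Bd -literal -offset indent
awk 'BEGIN {
    x = 10; y = 9             # numeric values
    if (x > y)                # compared as numbers
        print "numeric: 10 > 9"
    if ((x "") < (y ""))      # "" appended: compared as strings
        print "string: 10 < 9"
    s = "3x"                  # a string with a numeric prefix
    print s + 0               # 0 added: treated as a number
}'
.Ed
.Pp
Both comparisons should succeed, because
.Dq 10
sorts before
.Dq 9
as a string, and the last statement should print 3.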