.\"	$OpenBSD: awk.1,v 1.44 2015/09/14 20:06:58 schwarze Exp $
.\"
.\" Copyright (C) Lucent Technologies 1997
.\" All Rights Reserved
.\"
.\" Permission to use, copy, modify, and distribute this software and
.\" its documentation for any purpose and without fee is hereby
.\" granted, provided that the above copyright notice appear in all
.\" copies and that both that the copyright notice and this
.\" permission notice and warranty disclaimer appear in supporting
.\" documentation, and that the name Lucent Technologies or any of
.\" its entities not be used in advertising or publicity pertaining
.\" to distribution of the software without specific, written prior
.\" permission.
.\"
.\" LUCENT DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE,
.\" INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS.
.\" IN NO EVENT SHALL LUCENT OR ANY OF ITS ENTITIES BE LIABLE FOR ANY
.\" SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER
.\" IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION,
.\" ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF
.\" THIS SOFTWARE.
.\"
.\"	$FreeBSD$
.Dd $Mdocdate: September 14 2015 $
.Dt AWK 1
.Os
.Sh NAME
.Nm awk
.Nd pattern-directed scanning and processing language
.Sh SYNOPSIS
.Nm awk
.Op Fl safe
.Op Fl V
.Op Fl d Ns Op Ar n
.Op Fl F Ar fs
.Op Fl v Ar var Ns = Ns Ar value
.Op Ar prog | Fl f Ar progfile
.Ar
.Sh DESCRIPTION
.Nm
scans each input
.Ar file
for lines that match any of a set of patterns specified literally in
.Ar prog
or in one or more files specified as
.Fl f Ar progfile .
With each pattern there can be an associated action that will be performed
when a line of a
.Ar file
matches the pattern.
Each line is matched against the
pattern portion of every pattern-action statement;
the associated action is performed for each matched pattern.
The file name
.Sq -
means the standard input.
Any
.Ar file
of the form
.Ar var Ns = Ns Ar value
is treated as an assignment, not a filename,
and is executed at the time it would have been opened if it were a filename.
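.Pp
As an illustration, in the following sketch the operand
.Li v=1
is processed before the standard input
.Pq Sq -
is read, so the action sees the assigned value.
It assumes a POSIX shell; the variable name
.Va v
is arbitrary.
.Bd -literal -offset indent
echo x | awk '{ print v }' v=1 -    # prints: 1
.Ed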
.Pp
The options are as follows:
.Bl -tag -width "-safe "
.It Fl d Ns Op Ar n
Debug mode.
Set debug level to
.Ar n ,
or 1 if
.Ar n
is not specified.
A value greater than 1 causes
.Nm
to dump core on fatal errors.
.It Fl F Ar fs
Define the input field separator to be the regular expression
.Ar fs .
.It Fl f Ar progfile
Read program code from the specified file
.Ar progfile
instead of from the command line.
.It Fl safe
Disable file output
.Pf ( Ic print No > ,
.Ic print No >> ) ,
process creation
.Po
.Ar cmd | Ic getline ,
.Ic print | ,
.Ic system
.Pc
and access to the environment
.Pf ( Va ENVIRON ;
see the section on variables below).
This is a first
.Pq and not very reliable
approximation to a
.Dq safe
version of
.Nm .
.It Fl V
Print the version number of
.Nm
to standard output and exit.
.It Fl v Ar var Ns = Ns Ar value
Assign
.Ar value
to variable
.Ar var
before
.Ar prog
is executed;
any number of
.Fl v
options may be present.
.El
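.Pp
For example, the following sketch splits its input on colons with
.Fl F
and passes a separator string in with
.Fl v .
It assumes a POSIX shell; the variable name
.Va sep
is arbitrary.
.Bd -literal -offset indent
echo "a:b:c" | awk -F : -v sep=- '{ print $1 sep $3 }'    # prints: a-c
.Ed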
.Pp
The input is normally made up of input lines
.Pq records
separated by newlines, or by the value of
.Va RS .
If
.Va RS
is null, records are separated by one or more blank lines,
and newlines are used as field separators
(in addition to the value of
.Va FS ) .
This is convenient when working with multi-line records.
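.Pp
A minimal sketch of multi-line records, assuming a POSIX shell:
with
.Va RS
set to the null string, the blank line splits the input into two records,
and the newline inside the first record acts as a field separator.
.Bd -literal -offset indent
# prints "1 3", then "2 1"
{ echo "a b"; echo c; echo; echo d; } |
    awk 'BEGIN { RS = "" } { print NR, NF }'
.Ed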
.Pp
An input line is normally made up of fields separated by whitespace,
or by the regular expression
.Va FS .
The fields are denoted
.Va $1 , $2 , ... ,
while
.Va $0
refers to the entire line.
If
.Va FS
is null, the input line is split into one field per character.
.Pp
By default, any number of blanks separate fields.
To set the field separator to a single blank, use the
.Fl F
option with a value of
.Sq [\ \&] .
If a field separator of
.Sq t
is specified,
.Nm
treats it as if
.Sq \et
had been specified and uses
.Aq TAB
as the field separator.
To use a literal
.Sq t
as the field separator, use the
.Fl F
option with a value of
.Sq [t] .
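.Pp
The single-blank and null field separators can be checked with the following
sketch, which assumes a POSIX shell.
With a single-blank separator, two adjacent blanks delimit an empty field;
with a null
.Va FS ,
each character becomes a field.
.Bd -literal -offset indent
echo "a  b" | awk -F '[ ]' '{ print NF }'              # prints: 3
echo "abc" | awk 'BEGIN { FS = "" } { print NF, $2 }'  # prints: 3 b
.Ed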
.Pp
A pattern-action statement has the form
.Pp
.D1 Ar pattern Ic \&{ Ar action Ic \&}
.Pp
A missing
.Ic \&{ Ar action Ic \&}
means print the line;
a missing pattern always matches.
Pattern-action statements are separated by newlines or semicolons.
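.Pp
A sketch of both defaults, assuming a POSIX shell:
the first program has no action, so matching lines are printed;
the second has no pattern, so its action runs for every line.
.Bd -literal -offset indent
{ echo apple; echo banana; } | awk '/an/'          # prints: banana
{ echo apple; echo banana; } | awk '{ print NR }'  # prints: 1, then 2
.Ed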
.Pp
Newlines are permitted after a terminating statement or following a comma
.Pq Sq ,\& ,
an open brace
.Pq Sq { ,
a logical AND
.Pq Sq && ,
a logical OR
.Pq Sq || ,
after the
.Sq do
or
.Sq else
keywords,
or after the closing parenthesis of an
.Sq if ,
.Sq for ,
or
.Sq while
statement.
Additionally, a backslash
.Pq Sq \e
can be used to escape a newline between tokens.
.Pp
An action is a sequence of statements.
A statement can be one of the following:
.Pp
.Bl -tag -width Ds -offset indent -compact
.It Ic if Ar ( expression ) Ar statement Op Ic else Ar statement
.It Ic while Ar ( expression ) Ar statement
.It Ic for Ar ( expression ; expression ; expression ) statement
.It Ic for Ar ( var Ic in Ar array ) statement
.It Ic do Ar statement Ic while Ar ( expression )
.It Ic break
.It Ic continue
.It Xo Ic {
.Op Ar statement ...
.Ic }
.Xc
.It Xo Ar expression
.No # commonly
.Ar var No = Ar expression
.Xc
.It Xo Ic print
.Op Ar expression-list
.Op > Ns Ar expression
.Xc
.It Xo Ic printf Ar format
.Op Ar ... , expression-list
.Op > Ns Ar expression
.Xc
.It Ic return Op Ar expression
.It Xo Ic next
.No # skip remaining patterns on this input line
.Xc
.It Xo Ic nextfile
.No # skip rest of this file, open next, start at top
.Xc
.It Xo Ic delete
.Sm off
.Ar array Ic \&[ Ar expression Ic \&]
.Sm on
.No # delete an array element
.Xc
.It Xo Ic delete Ar array
.No # delete all elements of array
.Xc
.It Xo Ic exit
.Op Ar expression
.No # exit immediately; status is Ar expression
.Xc
.El
.Pp
Statements are terminated by
semicolons, newlines or right braces.
An empty
.Ar expression-list
stands for
.Ar $0 .
String constants are quoted
.Li \&"" ,
with the usual C escapes recognized within
(see
.Xr printf 1
for a complete list of these).
Expressions take on string or numeric values as appropriate,
and are built using the operators
.Ic + \- * / % ^
.Pq exponentiation ,
and concatenation
.Pq indicated by whitespace .
The operators
.Ic \&! ++ \-\- += \-= *= /= %= ^=
.Ic > >= < <= == != ?:
are also available in expressions.
Variables may be scalars, array elements
(denoted
.Li x[i] )
or fields.
Variables are initialized to the null string.
Array subscripts may be any string,
not necessarily numeric;
this allows for a form of associative memory.
Multiple subscripts such as
.Li [i,j,k]
are permitted; the constituents are concatenated,
separated by the value of
.Va SUBSEP
.Pq see the section on variables below .
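.Pp
A minimal sketch of associative arrays, assuming a POSIX shell:
each word is used directly as a subscript,
and an element that was never assigned holds the null string,
which is 0 when used as a number.
The array name
.Va count
is arbitrary.
.Bd -literal -offset indent
echo "a b a c a" | awk '{
        for (i = 1; i <= NF; i++)
                count[$i]++     # subscripts are strings
        print count["a"], count["b"], count["z"] + 0
}'
# prints: 3 1 0
.Ed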
.Pp
The
.Ic print
statement prints its arguments on the standard output
(or on a file if
.Pf > Ar file
or
.Pf >> Ar file
is present, or on a pipe if
.Pf |\ \& Ar cmd
is present), separated by the current output field separator,
and terminated by the output record separator.
.Ar file
and
.Ar cmd
may be literal names or parenthesized expressions;
identical string values in different statements denote
the same open file or pipe.
The
.Ic printf
statement formats its expression list according to the format
(see
.Xr printf 1 ) .
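.Pp
The following sketch assumes a POSIX shell and
.Xr sort 1 .
Both
.Ic print
statements name the same command string, so they write to a single pipe, and
.Fn close
flushes that pipe and waits for
.Xr sort 1
to finish.
.Bd -literal -offset indent
awk 'BEGIN {
        print "b" | "sort"      # opens the pipe
        print "a" | "sort"      # same string, same pipe
        close("sort")           # flush and wait for sort
}'
# prints: a, then b
.Ed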
.Pp
Patterns are arbitrary Boolean combinations
(with
.Ic "\&! || &&" )
of regular expressions and
relational expressions.
.Nm
supports extended regular expressions
.Pq EREs .
See
.Xr re_format 7
for more information on regular expressions.
Isolated regular expressions
in a pattern apply to the entire line.
Regular expressions may also occur in
relational expressions, using the operators
.Ic ~
and
.Ic !~ .
.Pf / Ar re Ns /
is a constant regular expression;
any string (constant or variable) may be used
as a regular expression, except in the position of an isolated regular expression
in a pattern.
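.Pp
A sketch of a regular expression held in a variable and applied with
.Ic ~ ,
assuming a POSIX shell; the variable name
.Va re
is arbitrary.
.Bd -literal -offset indent
echo "foo123" | awk '{
        re = "^[a-z]+[0-9]+$"   # a string used as a regular expression
        if ($0 ~ re)
                print "match"
}'
# prints: match
.Ed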
.Pp
A pattern may consist of two patterns separated by a comma;
in this case, the action is performed for all lines
from a line matching the first pattern
through the next line matching the second, inclusive.
.Pp
A relational expression is one of the following:
.Pp
.Bl -tag -width Ds -offset indent -compact
.It Ar expression matchop regular-expression
.It Ar expression relop expression
.It Ar expression Ic in Ar array-name
.It Xo Ic \&( Ns
.Ar expr , expr , \&... Ns Ic \&) in
.Ar array-name
.Xc
.El
.Pp
where a
.Ar relop
is any of the six relational operators in C, and a
.Ar matchop
is either
.Ic ~
(matches)
or
.Ic !~
(does not match).
A conditional is an arithmetic expression,
a relational expression,
or a Boolean combination
of these.
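.Pp
A sketch of the
.Ic in
forms, assuming a POSIX shell; the array names
.Va a
and
.Va m
are arbitrary.
A membership test does not create the element it looks for, and
.Li "(1, 2) in m"
looks up the subscript formed by joining 1 and 2 with
.Va SUBSEP .
.Bd -literal -offset indent
awk 'BEGIN {
        a["x"] = 1
        m[1, 2] = "y"           # stored under 1 SUBSEP 2
        if ("x" in a) print "x is in a"
        if (!("z" in a)) print "z is not in a"
        if ((1, 2) in m) print "(1, 2) is in m"
        if (!((2, 1) in m)) print "(2, 1) is not in m"
}'
.Ed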
.Pp
The special patterns
.Ic BEGIN
and
.Ic END
may be used to gain control before the first input line is read
and after the last one has been read.
.Ic BEGIN
and
.Ic END
do not combine with other patterns.
.Pp
Variable names with special meanings:
.Pp
.Bl -tag -width "FILENAME " -compact
.It Va ARGC
Argument count, assignable.
.It Va ARGV
Argument array, assignable;
non-null members are taken as filenames.
.It Va CONVFMT
Conversion format used when converting numbers to strings
(default
.Qq Li %.6g ) .
.It Va ENVIRON
Array of environment variables; subscripts are names.
.It Va FILENAME
The name of the current input file.
.It Va FNR
Ordinal number of the current record in the current file.
.It Va FS
Regular expression used to separate fields; also settable
by option
.Fl F Ar fs .
.It Va NF
Number of fields in the current record.
.Va $NF
can be used to obtain the value of the last field in the current record.
.It Va NR
Ordinal number of the current record, counted across all input files.
.It Va OFMT
Output format for numbers (default
.Qq Li %.6g ) .
.It Va OFS
Output field separator (default blank).
.It Va ORS
Output record separator (default newline).
.It Va RLENGTH
The length of the string matched by the
.Fn match
function.
.It Va RS
Input record separator (default newline).
.It Va RSTART
The starting position of the string matched by the
.Fn match
function.
.It Va SUBSEP
Separates multiple subscripts (default octal 034).
.El
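.Pp
A sketch exercising a few of these variables, assuming a POSIX shell;
the environment variable
.Ev GREETING
is an arbitrary example.
Assigning to a field, here
.Li "$1 = $1" ,
rebuilds
.Va $0
using
.Va OFS .
.Bd -literal -offset indent
echo "a b c" | awk '{ print NR, NF, $NF }'                  # prints: 1 3 c
echo "a b c" | awk 'BEGIN { OFS = "-" } { $1 = $1; print }' # prints: a-b-c
GREETING=hi awk 'BEGIN { print ENVIRON["GREETING"] }'       # prints: hi
.Ed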
.Sh FUNCTIONS
The awk language has a variety of built-in functions:
arithmetic, string, input/output, general, and bit-operation.
.Pp
Functions may be defined (at the position of a pattern-action statement)
as follows:
.Pp
.Dl function foo(a, b, c) { ...; return x }
.Pp
Parameters are passed by value if scalar, and by reference if an array name;
functions may be called recursively.
Parameters are local to the function; all other variables are global.
Thus local variables may be created by providing excess parameters in
the function definition.
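.Pp
A minimal sketch of both rules, assuming a POSIX shell; the function names
.Fn fact
and
.Fn fill
are hypothetical.
The extra parameter
.Fa r
of
.Fn fact
serves as a local variable, and the array passed to
.Fn fill
is modified in place.
.Bd -literal -offset indent
awk '
# fact and fill are hypothetical example functions
function fact(n,    r) {        # r is local: an "excess" parameter
        r = 1
        while (n > 1)
                r *= n--
        return r
}
function fill(arr) { arr["k"] = "set" } # arr is passed by reference
BEGIN { print fact(5); a["k"] = "old"; fill(a); print a["k"] }'
# prints: 120, then set
.Ed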
.Ss Arithmetic Functions
.Bl -tag -width "atan2(y, x)"
.It Fn atan2 y x
Return the arctangent of
.Fa y Ns / Ns Fa x
in radians.
.It Fn cos x
Return the cosine of
.Fa x ,
where
.Fa x
is in radians.
.It Fn exp x
Return the exponential of
.Fa x ,
that is, e raised to the power
.Fa x .
.It Fn int x
Return
.Fa x
truncated toward zero to an integer value.
.It Fn log x
Return the natural logarithm of
.Fa x .
.It Fn rand
Return a random number,
.Fa n ,
such that
.Sm off
.Pf 0 \*(Le Fa n No \*(Lt 1 .
.Sm on
.It Fn sin x
Return the sine of
.Fa x ,
where
.Fa x
is in radians.
.It Fn sqrt x
Return the square root of
.Fa x .
.It Fn srand [expr]
Set the seed for
.Fn rand
to
.Fa expr
and return the previous seed.
If
.Fa expr
is omitted, the time of day is used instead.
.El
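.Pp
A sketch of several of these functions, assuming a POSIX shell;
non-integer results are printed using the default
.Va OFMT
of
.Qq Li %.6g .
Reseeding with the same value is expected to repeat the sequence from
.Fn rand .
.Bd -literal -offset indent
awk 'BEGIN {
        print int(3.9), int(-3.9)       # truncation: 3 -3
        print sqrt(16), exp(0), log(1)  # prints: 4 1 0
        print atan2(0, -1)              # pi: 3.14159
        srand(42); x = rand()
        srand(42); y = rand()
        print (x == y)                  # same seed, same sequence: 1
}'
.Ed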
.Ss String Functions
.Bl -tag -width "split(s, a, fs)"
.It Fn gsub r t s
The same as
.Fn sub
except that all occurrences of the regular expression are replaced.
.Fn gsub
returns the number of replacements.
.It Fn index s t
The position in
.Fa s
where the string
.Fa t
first occurs, or 0 if it does not occur.
.It Fn length s
The length of
.Fa s
taken as a string,
or of
.Va $0
if no argument is given.
.It Fn match s r
The position in
.Fa s
where the leftmost match of the regular expression
.Fa r
begins, or 0 if there is no match.
The variable
.Va RSTART
is set to the starting position of the matched string
.Pq which is the same as the returned value ,
or zero if no match is found.
The variable
.Va RLENGTH
is set to the length of the matched string,
or \-1 if no match is found.
.It Fn split s a fs
Splits the string
.Fa s
into array elements
.Va a[1] , a[2] , ... , a[n]
and returns
.Va n .
The separation is done with the regular expression
.Fa fs
or with the field separator
.Va FS
if
.Fa fs
is not given.
An empty string as field separator splits the string
into one array element per character.
.It Fn sprintf fmt expr ...
The string resulting from formatting
.Fa expr , ...
according to the
.Xr printf 1
format
.Fa fmt .
.It Fn sub r t s
Substitutes
.Fa t
for the first occurrence of the regular expression
.Fa r
in the string
.Fa s .
If
.Fa s
is not given,
.Va $0
is used.
An ampersand
.Pq Sq &
in
.Fa t
is replaced by the text in
.Fa s
that was matched by
.Fa r .
A literal ampersand can be specified in a string constant
by preceding it with two backslashes
.Pq Sq \e\e .
A literal backslash can be specified by preceding it with another backslash
.Pq Sq \e\e .
.Fn sub
returns the number of replacements.
.It Fn substr s m n
Return at most the
.Fa n Ns -character
substring of
.Fa s
that begins at position
.Fa m
counted from 1.
If
.Fa n
is omitted, or if
.Fa n
specifies more characters than are left in the string,
the length of the substring is limited by the length of
.Fa s .
.It Fn tolower str
Returns a copy of
.Fa str
with all upper-case characters translated to their
corresponding lower-case equivalents.
.It Fn toupper str
Returns a copy of
.Fa str
with all lower-case characters translated to their
corresponding upper-case equivalents.
.El
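.Pp
A sketch of the string functions, assuming a POSIX shell; the variable and
array names
.Va s ,
.Va n ,
.Va k ,
and
.Va parts
are arbitrary.
.Bd -literal -offset indent
awk 'BEGIN {
        s = "hello world"
        n = gsub(/o/, "[&]", s)         # & is the matched text
        print n, s                      # 2 hell[o] w[o]rld
        print substr("awk", 2), index("awk", "k"), toupper("awk") # wk 3 AWK
        k = split("a:b:c", parts, ":")
        print k, parts[3]               # 3 c
        if (match("foobar", /ob+/))
                print RSTART, RLENGTH   # 3 2
}'
.Ed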
.Ss Input/Output and General Functions
.Bl -tag -width "getline [var] < file"
.It Fn close expr
Closes the file or pipe
.Fa expr .
.Fa expr
should match the string that was used to open the file or pipe.
.It Ar cmd | Ic getline Op Va var
Reads a record of input from a stream piped from the output of
.Ar cmd .
If
.Va var
is omitted, the variables
.Va $0
and
.Va NF
are set.
Otherwise
.Va var
is set.
If the stream is not open, it is opened.
As long as the stream remains open, subsequent calls
will read subsequent records from the stream.
The stream remains open until explicitly closed with a call to
.Fn close .
.Ic getline
returns 1 for a successful input, 0 for end of file, and \-1 for an error.
.It Fn fflush [expr]
Flushes any buffered output for the file or pipe
.Fa expr ,
or all open files or pipes if
.Fa expr
is omitted.
.Fa expr
should match the string that was used to open the file or pipe.
.It Ic getline
Sets
.Va $0
to the next input record from the current input file.
This form of
.Ic getline
sets the variables
.Va NF ,
.Va NR ,
and
.Va FNR .
.Ic getline
returns 1 for a successful input, 0 for end of file, and \-1 for an error.
.It Ic getline Va var
Sets
.Va var
to the next input record from the current input file.
This form of
.Ic getline
sets the variables
.Va NR
and
.Va FNR .
.Ic getline
returns 1 for a successful input, 0 for end of file, and \-1 for an error.
.It Xo
.Ic getline Op Va var
.Pf \ \&< Ar file
.Xc
Reads the next record from
.Ar file .
If
.Va var
is omitted, the variables
.Va $0
and
.Va NF
are set.
Otherwise
.Va var
is set.
If
.Ar file
is not open, it is opened.
As long as the stream remains open, subsequent calls will read subsequent
records from
.Ar file .
.Ar file
remains open until explicitly closed with a call to
.Fn close .
.Ic getline
returns 1 for a successful input, 0 for end of file, and \-1 for an error.
.It Fn system cmd
Executes
.Fa cmd
and returns its exit status.
.El
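.Pp
A sketch of reading a command's output with
.Ic getline ,
assuming a POSIX shell; the names
.Va cmd ,
.Va line ,
and
.Va n
are arbitrary.
When
.Ic getline
returns 0 at end of file,
.Va line
keeps the last record that was read.
.Bd -literal -offset indent
awk 'BEGIN {
        cmd = "echo one; echo two"
        while ((cmd | getline line) > 0)   # 1 per record, 0 at end
                n++
        close(cmd)                         # so cmd could be run again
        print n, line                      # prints: 2 two
}'
.Ed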
.Ss Bit-Operation Functions
.Bl -tag -width "lshift(a, b)"
.It Fn compl x
Returns the bitwise complement of the integer argument
.Fa x .
.It Fn and v1 v2 ...
Performs a bitwise AND on all arguments provided, as integers.
There must be at least two values.
.It Fn or v1 v2 ...
Performs a bitwise OR on all arguments provided, as integers.
There must be at least two values.
.It Fn xor v1 v2 ...
Performs a bitwise exclusive OR on all arguments provided, as integers.
There must be at least two values.
.It Fn lshift x n
Returns the integer argument
.Fa x
shifted left by
.Fa n
bits.
.It Fn rshift x n
Returns the integer argument
.Fa x
shifted right by
.Fa n
bits.
.El
.Sh EXIT STATUS
.Ex -std awk
.Pp
Note, however, that an
.Ic exit
statement with an
.Ar expression
sets the exit status to the value of that expression.
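.Pp
A minimal sketch, assuming a POSIX shell, in which the value given to
.Ic exit
becomes the status reported by the shell:
.Bd -literal -offset indent
awk 'BEGIN { exit 3 }'; echo $?    # prints: 3
.Ed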
.Sh EXAMPLES
Print lines longer than 72 characters:
.Pp
.Dl length($0) > 72
.Pp
Print the first two fields in reverse order:
.Pp
.Dl { print $2, $1 }
.Pp
Same, with input fields separated by a comma and/or blanks and tabs:
.Bd -literal -offset indent
BEGIN { FS = ",[ \et]*|[ \et]+" }
      { print $2, $1 }
.Ed
.Pp
Add up the first column, then print the sum and average:
.Bd -literal -offset indent
{ s += $1 }
END { if (NR > 0) print "sum is", s, "average is", s/NR }
.Ed
.Pp
Print all lines between start/stop pairs:
.Pp
.Dl /start/, /stop/
.Pp
Simulate
.Xr echo 1 :
.Bd -literal -offset indent
BEGIN { # Simulate echo(1)
        for (i = 1; i < ARGC; i++)
                printf "%s%s", ARGV[i], (i < ARGC - 1 ? " " : "")
        printf "\en"
        exit }
.Ed
.Pp
Print an error message to standard error:
.Bd -literal -offset indent
{ print "error!" > "/dev/stderr" }
.Ed
.Sh SEE ALSO
.Xr cut 1 ,
.Xr lex 1 ,
.Xr printf 1 ,
.Xr sed 1 ,
.Xr re_format 7 ,
.Xr script 7
.Rs
.%A A. V. Aho
.%A B. W. Kernighan
.%A P. J. Weinberger
.%T The AWK Programming Language
.%I Addison-Wesley
.%D 1988
.%O ISBN 0-201-07981-X
.Re
.Sh STANDARDS
The
.Nm
utility is compliant with the
.St -p1003.1-2008
specification,
except that
.Nm
does not support {n,m} repetition in regular expressions.
.Pp
The flags
.Op Fl \&dV
and
.Op Fl safe ,
as well as the functions
.Cm fflush , compl , and , or ,
.Cm xor , lshift , rshift ,
are extensions to that specification.
.Sh HISTORY
An
.Nm
utility appeared in
.At v7 .
.Sh BUGS
There are no explicit conversions between numbers and strings.
To force an expression to be treated as a number add 0 to it;
to force it to be treated as a string concatenate
.Li \&""
to it.
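.Pp
The following sketch, assuming a POSIX shell, shows these idioms at work:
two string values compare as strings, so
.Qq 10
sorts before
.Qq 9
until 0 is added to each.
The variable names are arbitrary.
.Bd -literal -offset indent
awk 'BEGIN {
        x = "10"; y = "9"
        print (x < y)           # string comparison: 1
        print (x + 0 < y + 0)   # numeric comparison: 0
        print length(12 "")     # 12 "" is the string "12": 2
}'
.Ed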
.Pp
The scope rules for variables in functions are a botch;
the syntax is worse.
798