The GNU Awk User’s Guide
Table of Contents
- Preface
- Getting Started
- Running `awk` and `gawk`
- Regular Expressions
- Reading Input Files
- Printing Output
- Expressions
- Patterns, Actions, and Variables
- Arrays in `awk`
- Functions
- Problem Solving with `awk`
Preface
The GNU implementation of awk is called gawk; if you invoke it with the proper options or environment variables, it is fully compatible with the POSIX specification of the awk language and with the Unix version of awk maintained by Brian Kernighan.
Getting Started
- Programs in awk consist of pattern–action pairs.
- An action without a pattern always runs.
- The default action for a pattern without one is `{ print $0 }`.
- If several patterns match, then several actions execute in the order in which they appear in the awk program.
- If no patterns match, then no actions run.
- Use either `awk 'program' files` or `awk -f program-file files` to run awk.
- You may use the special `#! /bin/awk -f` header line.
- You can add the extension `.awk` to the file name.
- Comments in awk programs start with `#`.
- You may use backslash continuation to continue a source line.
- Lines are automatically continued after `,`, `{`, `?`, `:`, `||`, `&&`, `do`, and `else` (continuation after `?` or `:` is a gawk extension).
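The pattern–action rules above can be tried with a small sketch. The sample data and the variable names (`out`, `count`) are made up for illustration; only POSIX awk features are assumed.

```shell
out=$(printf 'apple 3\nbanana 7\ncherry 5\n' | awk '
# A pattern with no action: the default action { print $0 } runs.
$2 > 4
# An action with no pattern: runs for every record.
{ count++ }
# Rules that match the same record run in program order.
END { print count, "records" }')
printf '%s\n' "$out"
```

Only the records whose second field exceeds 4 are printed, while the counting rule sees all three records.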
Running awk and gawk
awk [options] -f progfile [--] file …
awk [options] [--] 'program' file …
`-F fs` / `--field-separator fs`
- Set the `FS` variable to `fs`.
`-f source-file` / `--file source-file`
- May be given multiple times; the program texts are concatenated.
`-v var=val` / `--assign var=val`
- Assign `val` to `var` before the program starts. Don't override built-in variable names.
`--`
- Signal the end of the command-line options. Useful if you have file names that start with `-`.
- All nonoption command-line arguments, excluding the program text, are placed in the `ARGV` array.
- Adjusting `ARGC` and `ARGV` affects how awk processes input.
- Any additional arguments on the command line are normally treated as input files to be processed in the order specified.
- However, an argument of the form `var=value` assigns the value `value` to the variable `var`. It does not specify a file at all.
- This variable assignment is evaluated between input files, when awk reaches it in the argument list, while `-v var=val` is evaluated before `BEGIN`.
$ printf "\n" > pass1.txt
$ printf "\n\n" > pass2.txt
$ awk 'pass == 1 { print "hello" }
> pass == 2 { print "world" }' pass=1 pass1.txt pass=2 pass2.txt
-| hello
-| world
-| world
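To see what ends up in `ARGV`, and when each kind of assignment takes effect, here is a minimal sketch. The file names `first.txt` and `second.txt` and the variables `greeting` and `n` are hypothetical; because the program has only a `BEGIN` rule, awk never tries to open the files.

```shell
out=$(awk -v greeting=hi 'BEGIN {
    # ARGV holds the nonoption arguments; the program text and -v are excluded.
    for (i = 1; i < ARGC; i++)
        print i, ARGV[i]
    # greeting was set by -v before BEGIN; n=2 has not been processed yet.
    print "greeting=" greeting, "n=" n
}' first.txt n=2 second.txt)
printf '%s\n' "$out"
```

The last line of output shows `greeting` already set and `n` still empty, since the `n=2` argument would only be evaluated once awk reached it while walking through its input files.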
- `AWKPATH` and `AWKLIBPATH` specify the library search paths.
- They correspond to `@include` and `@load`, respectively.
This is script test1.
This is script test2.
Regular Expressions
$ awk '$1 ~ /J/' inventory-shipped
-| Jan 13 25 15 115
-| Jun 31 42 75 492
-| Jul 24 34 67 436
-| Jan 21 36 64 620
$ awk '$1 !~ /J/' inventory-shipped
-| Feb 15 32 24 226
-| Mar 15 24 34 228
-| Apr 31 52 63 420
-| May 16 34 29 208
…
- `\<symbol>` (no special meaning)
- `^`, `$`, `.`
- `[...]`, `[^...]`
- `(...)`, `|`
- `*`, `+`, `?`
- `{n}`, `{n,}`, `{n,m}`
- `[:alpha:]`, `[:alnum:]`, `[:digit:]`, `[:xdigit:]`
- `[:lower:]`, `[:upper:]`
- `[:blank:]` (spaces and tabs)
- `[:space:]` (whitespace characters)
- `[:print:]` (printable, including spaces)
- `[:graph:]` (printable and visible, excluding spaces)
- `[:punct:]` (punctuation)
- `[:cntrl:]` (control)
<A>bcd
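A few of the operators listed above can be combined as in the following sketch. The input lines and labels are invented for illustration. Note that a character class such as `[:digit:]` is only valid inside a bracket expression, hence `[[:digit:]]`; this assumes an awk with POSIX character class support (gawk, and recent versions of mawk and BusyBox awk).

```shell
out=$(printf 'Jan 13\nFeb 15\nJun 31\nx9 7\n' | awk '
$1 ~ /^J(an|un)$/                { print "alternation: " $1 }
$1 ~ /^[[:alpha:]][[:digit:]]+$/ { print "classes: " $1 }
$2 !~ /^1/                       { print "no leading 1: " $2 }')
printf '%s\n' "$out"
```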
- https://www.gnu.org/software/gawk/manual/html_node/Regexp.html#Regexp
- https://www.gnu.org/software/gawk/manual/html_node/Escape-Sequences.html#Escape-Sequences
- https://www.gnu.org/software/gawk/manual/html_node/Regexp-Operators.html#Regexp-Operators
- https://www.gnu.org/software/gawk/manual/html_node/GNU-Regexp-Operators.html#GNU-Regexp-Operators
Reading Input Files
Value of `RS` | Records are split on … | awk / gawk |
---|---|---|
Any single character | That character | awk |
The empty string (`""`) | Runs of two or more newlines | awk |
- By default, the record separator is the newline character.
- `FNR` indicates how many records have been read from the current input file; `NR` indicates how many records have been read in total.
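The difference between `FNR` and `NR`, and the effect of setting `RS` to the empty string, can be checked with a short sketch. The temporary files and the variable names `counts` and `paragraphs` are assumptions made for this example.

```shell
dir=$(mktemp -d)
printf 'a\nb\n' > "$dir/first.txt"
printf 'c\n' > "$dir/second.txt"

# FNR restarts at 1 for each file; NR keeps counting across files.
counts=$(awk '{ print $0, FNR, NR }' "$dir/first.txt" "$dir/second.txt")

# RS = "" splits records on blank lines (paragraph mode).
paragraphs=$(printf 'one\ntwo\n\n\nthree\n' |
    awk 'BEGIN { RS = "" } { print NR ": " $1 }')

rm -rf "$dir"
printf '%s\n' "$counts" "$paragraphs"
```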
Field separator value | Fields are split … | awk / gawk |
---|---|---|
`FS == " "` | On runs of whitespace | awk |
`FS ==` any single character | On that character | awk |
`FS ==` regexp | On text matching the regexp | awk |
- By default, fields are separated by whitespace.
- `FS` may be set from the command line using the `-F` option.
- `$1` refers to the first field, `$2` to the second, and so on; `$0` is the whole record.
- `NF` is a predefined variable whose value is the number of fields in the current record.
- So, `$NF` refers to the last field.
- Fields may also be assigned values, which causes the value of `$0` to be recomputed when it is later referenced.
- `OFS`, the output field separator, is used to recompute the record.
- Use `getline` in its various forms to read additional records from the default input stream, from a file, or from a pipe or coprocess.
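As a small illustration of the field rules above, the following sketch splits a colon-separated record and then assigns to a field. The sample record and the replacement value `hidden` are made up.

```shell
out=$(printf 'root:x:0:0\n' | awk -F: 'BEGIN { OFS = "-" }
{
    print NF          # number of fields in the record
    print $NF         # the last field
    $2 = "hidden"     # assigning a field rebuilds $0 using OFS
    print $0
}')
printf '%s\n' "$out"
```

The rebuilt record uses `-` between fields because `$0` is recomputed with `OFS`, not with the original input separator.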
Printing Output
- The simple statement `print` with no items is equivalent to `print $0`.
- To print a blank line, use `print ""`.
- If you use `print` with a list separated by commas, the items are output separated by single spaces (the default `OFS`), followed by a newline.
- If you provide a list separated by spaces to `print`, the items are concatenated into a single string.
foo bar
foobar
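The difference between a comma-separated list and juxtaposed items, and the role of the output separators, can be sketched as follows. The separator values `-` and `!` are arbitrary choices for the example.

```shell
out=$(awk 'BEGIN {
    print "foo", "bar"      # comma: items joined by OFS (a space by default)
    print "foo" "bar"       # juxtaposition: string concatenation
    OFS = "-"; ORS = "!\n"  # change the field and record output separators
    print "foo", "bar"
}')
printf '%s\n' "$out"
```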
- `OFS` and `ORS` are the output field and record separators.
- Use `printf` to print formatted values.
- `print` and `printf` can redirect output to files with `>` and `>>`, or to a command with `|`.
- Use `|&` to communicate with external processes. (`&` seems to stand for a kind of background process)
- Use `close()` to clean up the communication with external processes.
test
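A minimal sketch of `printf`, redirection, and `close()` follows. The temporary file, the awk variable `file`, and the format string are assumptions for illustration; the file is closed after writing so that it can be read back from the start with `getline`.

```shell
tmp=$(mktemp)
out=$(awk -v file="$tmp" 'BEGIN {
    # Left-justified string, padded integer, two decimal places.
    printf "%-5s|%3d|%.2f\n", "ab", 7, 3.14159 > file
    close(file)                      # flush and close before reading
    while ((getline line < file) > 0)
        print "read back: " line
    close(file)
}')
rm -f "$tmp"
printf '%s\n' "$out"
```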
Expressions
- `awk` supplies three kinds of constants: numeric, string, and regexp.
- Numbers are automatically converted to strings, and strings to numbers, as needed by `awk`.
- Comparing an unassigned variable with `""` or with `0` yields true in both cases.
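The conversions and the behavior of unassigned variables can be observed with a short sketch. The variable names `x`, `empty`, and `zero` are illustrative only.

```shell
out=$(awk 'BEGIN {
    print "3" + 4                # string converted to a number
    print 3 4                    # numbers converted to strings and concatenated
    empty = (x == "")            # x was never assigned
    zero = (x == 0)
    print empty, zero
}')
printf '%s\n' "$out"
```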
Patterns, Actions, and Variables
`/regular expression/`
- match
`expression`
- boolean
`begpat, endpat`
- range
`BEGIN`, `END`
- per program
`BEGINFILE`, `ENDFILE`
- per file (`gawk` specific)
`<empty>`
- all
foo
on
bar
off
baz
on
bar
off
- Supports `if`, `for`, `while`, `switch`, etc. They are similar to those of C.
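Several of the pattern kinds listed in this section can be combined in one program, as in this sketch. The input lines and the message labels are invented; `BEGINFILE`, `ENDFILE`, and `switch` are left out so that the example stays within POSIX awk.

```shell
out=$(printf 'a\non\nb\nc\noff\nd\n' | awk '
BEGIN         { print "start" }
/on/, /off/   { print "in range: " $0 }
$0 == "d"     { print "expression matched: " $0 }
END           { print NR " records" }')
printf '%s\n' "$out"
```

The range pattern switches on at the first record matching `/on/` and stays on through the next record matching `/off/`, inclusive.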
- https://www.gnu.org/software/gawk/manual/html_node/Pattern-Overview.html#Pattern-Overview
- https://www.gnu.org/software/gawk/manual/html_node/User_002dmodified.html#User_002dmodified
- https://www.gnu.org/software/gawk/manual/html_node/Auto_002dset.html#Auto_002dset
- https://www.gnu.org/software/gawk/manual/html_node/Pattern-Action-Summary.html#Pattern-Action-Summary
Arrays in awk
- Arrays in `awk` are associative.
- Arrays are indexed by string values.
- Use the `isarray()` built-in function to determine if an array element is itself a subarray.
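A short sketch of associative arrays follows, using made-up contents. It relies only on POSIX features (`in`, `delete`, `for (k in a)`), so `isarray()` is not exercised here.

```shell
out=$(awk 'BEGIN {
    a[3] = 30; a[1] = "foo"; a[0] = 8; a[2] = ""
    a["color"] = "red"               # any string can be an index
    # Numeric subscripts are converted to strings, so 1 and "1" match.
    if ("1" in a) print "same element: " a["1"]
    if (!(5 in a)) print "5 is not an index"
    delete a[2]
    n = 0
    for (k in a) n++                 # traversal order is unspecified
    print n, "elements left"
}')
printf '%s\n' "$out"
```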
Index | Value |
---|---|
3 | 30 |
1 | "foo" |
0 | 8 |
2 | "" |
Functions
- Function parameters cannot have the same name as one of the special predefined variables.
5
awk has no local variable declarations, so mimic local variables with extra parameters. By convention, put extra spaces before the extra parameters to set them apart from the actual arguments. In this case, `i` is used as a local variable.
When arrays are passed to functions, they are not copied, so this behavior can be used to mimic call by reference.
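Both conventions can be sketched together. The function names `fill` and `sum` and the array `sq` are hypothetical; the extra parameters after the wide gap act as local variables, and the array argument is modified in place.

```shell
out=$(awk '
# i is an extra parameter used as a local variable.
function fill(arr, n,    i) {
    for (i = 1; i <= n; i++)
        arr[i] = i * i           # modifies the array passed by the caller
}
# i and total are locals; total starts out empty and acts as 0.
function sum(arr, n,    i, total) {
    for (i = 1; i <= n; i++)
        total += arr[i]
    return total
}
BEGIN {
    i = 99
    fill(sq, 3)
    print sum(sq, 3)
    print i                      # the global i is untouched
}')
printf '%s\n' "$out"
```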
Problem Solving with awk
This part of the guide is dedicated to worked examples.