The GNU Awk User’s Guide

Preface

The GNU implementation of awk is called gawk; if you invoke it with the proper options or environment variables, it is fully compatible with the POSIX specification of the awk language and with the Unix version of awk maintained by Brian Kernighan.

Getting Started

pattern { action }
pattern { action }

Running awk and gawk

awk [options] -f progfile [--] file …
awk [options] [--] 'program' file …
-F fs / --field-separator fs
Set the FS variable to fs.
-f source-file / --file source-file
Read the program from source-file. May be given multiple times; the files are concatenated in order.
-v var=val / --assign var=val
Set the variable var to val before the program starts running. Avoid using it to override built-in variables.
--
Signal the end of the command-line options. Useful when file names start with -.
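
As a minimal sketch of how -F and -v combine, the commands below filter a small colon-separated file. The file name stock.txt, its contents, and the variable name min are assumptions made up for this illustration.

```shell
# Hypothetical sample data: colon-separated name:quantity pairs.
printf 'apple:3\npear:12\nplum:7\n' > stock.txt

# -F sets FS to ":"; -v assigns min before the program starts running.
out=$(awk -F: -v min=5 '$2 >= min { print $1 }' stock.txt)
echo "$out"
```

With this data the command should print pear and plum, the two names whose quantity is at least 5.
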
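# A var=val argument among the file names is assigned when awk reaches it, just before the next file is read.
printf "\n" > pass1.txt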
printf "\n\n" > pass2.txt
awk 'pass == 1  { print "hello" }
     pass == 2  { print "world" }' pass=1 pass1.txt pass=2 pass2.txt
hello
world
world
# "-" stands for standard input, which is read here as the second file.
some_command | awk -f myprog.awk file1 - file2
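
The position of - among the file arguments can be checked with a small experiment. This is only a sketch; part1.txt and part3.txt are hypothetical files created for the demonstration.

```shell
# Hypothetical files, created only for this demonstration.
printf 'first\n' > part1.txt
printf 'third\n' > part3.txt

# "-" marks where standard input is read among the file arguments.
out=$(printf 'second\n' | awk '{ print NR ": " $0 }' part1.txt - part3.txt)
echo "$out"
```

The records should come out numbered in the order first, second, third, showing that standard input is consumed between the two files.
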
test1.awk
BEGIN {
    print "This is script test1."
}
test2.awk
@include "test1"
BEGIN {
  print "This is script test2."
}
gawk -f test2.awk   # @include is a gawk extension
This is script test1.
This is script test2.

Regular Expressions

/li/ { print $2 }  # match against the whole record ($0)
exp ~ /regexp/     # match against the value of exp; use `!~` for "does not match"
$ awk '$1 ~ /J/' inventory-shipped
-| Jan  13  25  15 115
-| Jun  31  42  75 492
-| Jul  24  34  67 436
-| Jan  21  36  64 620
$ awk '$1 !~ /J/' inventory-shipped
-| Feb  15  32  24 226
-| Mar  15  24  34 228
-| Apr  31  52  63 420
-| May  16  34  29 208
-| …
echo aaaabcd | awk '{ sub(/a+/, "<A>"); print }'
<A>bcd
BEGIN { digits_regexp = "[[:digit:]]+" }
$0 ~ digits_regexp    { print }
x = "aB"
if (x ~ /ab/) …   # this test will fail

IGNORECASE = 1    # gawk only; other awks treat this as an ordinary variable
if (x ~ /ab/) …   # now it will succeed
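
Because IGNORECASE is specific to gawk, a portable way to get a similar effect is to fold the text to one case before matching. The sketch below uses the standard tolower() function; the variable x mirrors the snippet above, and the printed messages are made up for illustration.

```shell
out=$(awk 'BEGIN {
    x = "aB"
    # Case-sensitive test: "aB" does not contain "ab", so nothing is printed.
    if (x ~ /ab/) print "case-sensitive match"
    # Fold to lowercase first, then match against a lowercase regexp.
    if (tolower(x) ~ /ab/) print "match after tolower()"
}')
echo "$out"
```

Only the second test should succeed. Unlike IGNORECASE, this approach affects a single comparison rather than every regexp operation in the program.
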

Reading Input Files

Value of RS              Records are split on …          awk / gawk
Any single character     That character                  awk
The empty string ("")    Runs of two or more newlines    awk

Field separator value        Fields are split …             awk / gawk
FS == " "                    On runs of whitespace          awk
FS == any single character   On that character              awk
FS == regexp                 On text matching the regexp    awk
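
To make the RS and FS rows above concrete, here is a small sketch that reads blank-line-separated records and treats each line as a field. The file addresses.txt and its contents are hypothetical.

```shell
# Hypothetical data: two records separated by a blank line.
printf 'Jane Doe\n123 Main St\n\nJohn Roe\n9 Elm Rd\n' > addresses.txt

# RS = "" splits records on blank lines; FS = "\n" makes each line a field.
out=$(awk 'BEGIN { RS = ""; FS = "\n" }
           { print NR ": " $1 " / " $2 }' addresses.txt)
echo "$out"
```

Each output line should combine the two lines of one address, prefixed by the record number.
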

Printing Output

awk 'BEGIN { print "foo", "bar" }'
foo bar
awk 'BEGIN { print "foo" "bar" }'
foobar
awk 'BEGIN { print "test" > "test.txt" }'
cat 'test.txt'
test
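
The space between foo and bar in the comma form comes from the output field separator, OFS, whose default value is a single space. A minimal sketch, with "-" chosen arbitrarily as the separator:

```shell
# Items separated by commas are joined with OFS; adjacent strings are concatenated.
out=$(awk 'BEGIN { OFS = "-"; print "foo", "bar"; print "foo" "bar" }')
echo "$out"
```

The first print should now produce foo-bar, while the concatenated form is unaffected by OFS.
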

Expressions

Patterns, Actions, and Variables

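Pattern                 Matches
/regular expression/    records that match the regexp
expression              records for which the expression is true (nonzero or non-null)
begpat, endpat          the range of records from one matching begpat through one matching endpat
BEGIN, END              once per program: before the first record and after the last
BEGINFILE, ENDFILE      once per file: before its first record and after its last (gawk specific)
<empty>                 all records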
onoff.txt
foo
on
bar
off
baz
awk '/^on$/, /^off$/ { print }' onoff.txt
on
bar
off
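
The other pattern kinds can be combined in one program. The following sketch uses BEGIN, an expression pattern, and END together; the file items.txt, its contents, and the counter n are assumptions for illustration only.

```shell
# Hypothetical data: a name and a number on each line.
printf 'foo 3\nbar 12\nbaz 7\n' > items.txt

out=$(awk 'BEGIN  { print "start" }           # runs once, before any input
           $2 > 5 { n++; print $1 }           # expression pattern
           END    { print n " matched" }      # runs once, after all input
          ' items.txt)
echo "$out"
```

The output should be start, then bar and baz, then the count of matching records.
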

Arrays in awk

Index   Value
3       30
1       "foo"
0       8
2       ""
if (2 in frequencies)
    print "Subscript 2 is present."
# The iteration order is undefined.
# gawk can control it through PROCINFO["sorted_in"].
for (var in array)
    body
delete array[index-expression]
delete array  # deletes all elements, not the array itself
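
The membership test, the for loop, and delete can be seen working together in a short word count. This is a sketch under assumed input; the array name freq and the sample words are made up.

```shell
out=$(printf 'a b a\nc a b\n' | awk '
# Count every word on every line.
{ for (i = 1; i <= NF; i++) freq[$i]++ }
END {
    if ("a" in freq) print "a seen " freq["a"] " times"
    delete freq["a"]                      # remove one element
    if (!("a" in freq)) print "a deleted"
    n = 0
    for (w in freq) n++                   # order is undefined, so only count
    print n " words left"
}')
echo "$out"
```

Because the iteration order is undefined, the loop only counts the remaining elements rather than printing them.
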

Functions

function name([parameter-list])
{
    body-of-function
}
foo.awk
function foo(s) {
    return length(s)
}
echo "apple" | gawk -f foo.awk -e '{ print foo($0) }'
5

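awk has no way to declare local variables: every variable is global except function parameters. To get a local variable, declare an extra parameter that callers never pass. By convention, separate these extra parameters from the real ones with additional spaces in the parameter list. In the example below, i is used as a local variable.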

function foo(j,    i)
{
    i = j + 1
    print "foo's i=" i
    bar()    # bar() is defined elsewhere in the program
    print "foo's i=" i
}
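
To see the effect end to end, the function can be placed in a complete program. In the sketch below, the definition of bar() and the BEGIN rule are assumptions added for illustration, and locals.awk is a hypothetical file name.

```shell
cat > locals.awk <<'EOF'
function foo(j,    i)    # i: extra parameter used as a local variable
{
    i = j + 1
    print "foo's i=" i
    bar()
    print "foo's i=" i
}

function bar()           # assumed definition: sees only the global i
{
    print "bar's i=" i
}

BEGIN {
    i = 10               # global i
    print "top's i=" i
    foo(5)
    print "top's i=" i
}
EOF
out=$(awk -f locals.awk)
echo "$out"
```

The global i should stay at 10 throughout, while foo works with its own i equal to 6.
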

When an array is passed to a function, it is not copied: the function works directly on the caller's array. This behavior can be used to mimic call by reference.

Problem Solving with awk

This part of the guide is devoted to worked examples.