Control statements such as if, while, and so on control the flow of execution in awk programs. Most of the control statements in awk are patterned on similar statements in C.
All the control statements start with special keywords such as if and while, to distinguish them from simple expressions.
Many control statements contain other statements; for example, the if statement contains another statement which may or may not be executed. The contained statement is called the body. If you want to include more than one statement in the body, group them into a single compound statement with curly braces, separating them with newlines or semicolons.
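Here is a small sketch of that grouping. The shell function name count_long_lines and the sample input are made up for illustration. The if body is a compound statement made of two statements in curly braces, separated by a semicolon.

```shell
# Count lines with more than two fields and echo each such line.
# The if body is a compound statement: two statements grouped in braces,
# separated here by a semicolon (a newline would work as well).
count_long_lines() {
  awk '{
    if (NF > 2) { count++; print "long:", $0 }
  }
  END { print count + 0, "long line(s)" }'
}

printf 'a b c\nd e\nf g h i\n' | count_long_lines
```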
The if Statement
The if-else statement is awk's decision-making statement. It looks like this:
if (condition) then-body [else else-body]
condition is an expression that controls what the rest of the statement will do. If condition is true, then-body is executed; otherwise, else-body is executed (assuming that the else clause is present). The else part of the statement is optional. The condition is considered false if its value is zero or the null string, and true otherwise.
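You can check these rules by testing a few constant values directly. This sketch uses a hypothetical function named truthiness. Each if-else is written on one line, so a semicolon is required before else. A negative number is nonzero and "abc" is a non-null string, so both count as true.

```shell
# Show which constant values count as true in an if condition.
# Each if-else sits on one line, so a semicolon precedes else.
truthiness() {
  awk 'BEGIN {
    if (0)     print "zero is true";        else print "zero is false"
    if ("")    print "null string is true"; else print "null string is false"
    if (-1)    print "minus one is true";   else print "minus one is false"
    if ("abc") print "abc is true";         else print "abc is false"
  }'
}

truthiness
```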
Here is an example:
if (x % 2 == 0)
    print "x is even"
else
    print "x is odd"
In this example, if the expression x % 2 == 0 is true (that is, the value of x is divisible by 2), then the first print statement is executed; otherwise, the second print statement is executed.
If the else appears on the same line as then-body, and then-body is not a compound statement (i.e., not surrounded by curly braces), then a semicolon must separate then-body from else. To illustrate this, let's rewrite the previous example:
awk '{ if (x % 2 == 0) print "x is even"; else print "x is odd" }'
If you forget the `;', awk won't be able to parse the statement, and you will get a syntax error.
We would not actually write this example this way, because a human reader might fail to see the else if it were not the first thing on its line.
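When else starts its own line, no semicolon is needed after then-body and the else is easy to spot. Below is the same even/odd test as a runnable sketch. It reads numbers from the input instead of using a preset x, and the function name parity is hypothetical.

```shell
# Classify the first field of each record as even or odd.
# else begins its own line, so no semicolon is needed after the then-body.
parity() {
  awk '{
    x = $1
    if (x % 2 == 0)
      print x, "is even"
    else
      print x, "is odd"
  }'
}

printf '3\n8\n' | parity
```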
The while Statement
In programming, a loop means a part of a program that is (or at least can be) executed two or more times in succession.
The while statement is the simplest looping statement in awk. It repeatedly executes a statement as long as a condition is true. It looks like this:
while (condition)
  body
Here body is a statement that we call the body of the loop, and condition is an expression that controls how long the loop keeps running.
The first thing the while statement does is test condition. If condition is true, it executes the statement body. (condition is true when the value is not zero and not a null string.) After body has been executed, condition is tested again, and if it is still true, body is executed again. This process repeats until condition is no longer true. If condition is initially false, the body of the loop is never executed.
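The following sketch makes the test-first behavior visible by counting passes through the body. The function name count_passes is hypothetical. A starting value of 0 makes the condition false immediately, so the body runs zero times.

```shell
# Count how many times the while body runs for each starting value.
count_passes() {
  awk '{
    n = $1
    passes = 0
    while (n > 0) {     # condition is tested before every pass, including the first
      passes++
      n--
    }
    print "start", $1, "->", passes, "pass(es)"
  }'
}

printf '3\n0\n' | count_passes
```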
This example prints the first three fields of each record, one per line.
awk '{ i = 1
       while (i <= 3) {
           print $i
           i++
       }
}'
Here the body of the loop is a compound statement enclosed in braces, containing two statements.
The loop works like this: first, the value of i is set to 1. Then, the while tests whether i is less than or equal to three. This is the case when i equals one, so the i-th field is printed. Then the i++ increments the value of i and the loop repeats. The loop terminates when i reaches 4.
As you can see, a newline is not required between the condition and the body; but using one makes the program clearer unless the body is a compound statement or is very simple. The newline after the open-brace that begins the compound statement is not required either, but the program would be hard to read without it.
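The same pattern works with a counter that runs downward. This illustrative sketch starts with i set to NF and prints the fields of each record in reverse order. The function name reverse_fields is made up.

```shell
# Print the fields of each record in reverse order, counting i down from NF.
reverse_fields() {
  awk '{
    i = NF
    while (i >= 1) {
      printf "%s", $i
      if (i > 1)
        printf " "      # separator between fields, not after the last one
      i--
    }
    print ""            # end the output line
  }'
}

echo 'a b c' | reverse_fields
```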
The do-while Statement
The do loop is a variation of the while looping statement. The do loop executes the body once, then repeats body as long as condition is true. It looks like this:
do
  body
while (condition)
Even if condition is false at the start, body is executed at least once (and only once, unless executing body makes condition true). Contrast this with the corresponding while statement:
while (condition)
  body
This statement does not execute body even once if condition is false to begin with.
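The difference shows up when the condition is false from the start. In this sketch n is 0, so the do body runs exactly once and the while body never runs. The function name do_vs_while is hypothetical.

```shell
# With n = 0 the condition is false from the start:
# the do body still runs once, the while body never runs.
do_vs_while() {
  awk 'BEGIN {
    n = 0
    do {
      print "do body ran, n =", n
    } while (n > 0)
    while (n > 0) {
      print "while body ran, n =", n
    }
    print "finished"
  }'
}

do_vs_while
```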
Here is an example of a do statement:
awk '{ i = 1
       do {
          print $0
          i++
       } while (i <= 10)
}'
This program prints each input record ten times. It isn't a very realistic example, since in this case an ordinary while would do just as well. But this reflects actual experience; there is only occasionally a real use for a do statement.
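Counting the decimal digits of a number is one case where do fits naturally. Zero still has one digit, so the body must run at least once. This sketch assumes each record's first field is a non-negative integer, and the function name count_digits is hypothetical.

```shell
# Count the decimal digits of a non-negative integer in the first field.
# do guarantees one pass, so 0 is correctly reported as having one digit.
count_digits() {
  awk '{
    n = $1; digits = 0
    do {
      digits++
      n = int(n / 10)
    } while (n > 0)
    print $1, "has", digits, "digit(s)"
  }'
}

printf '0\n7\n12345\n' | count_digits
```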
The for Statement
The for statement makes it more convenient to count iterations of a loop. The general form of the for statement looks like this:
for (initialization; condition; increment)
  body
This statement starts by executing initialization. Then, as long as condition is true, it repeatedly executes body and then increment. Typically initialization sets a variable to either zero or one, increment adds 1 to it, and condition compares it against the desired number of iterations.
Here is an example of a for statement:
awk '{ for (i = 1; i <= 3; i++) print $i }'
This prints the first three fields of each input record, one field per line.
In the for statement, body stands for any statement, but initialization, condition and increment are just expressions. You cannot set more than one variable in the initialization part unless you use a multiple assignment statement such as x = y = 0, which is possible only if all the initial values are equal. (But you can initialize additional variables by writing their assignments as separate statements preceding the for loop.)
The same is true of the increment part; to increment additional variables, you must write separate statements at the end of the loop. The C compound expression, using C's comma operator, would be useful in this context, but it is not supported in awk.
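To run a second counter downward while i counts up, initialize it before the loop and update it with a separate statement in the body. Here is a sketch using a hypothetical function named number_fields.

```shell
# Number each field from both ends. awk has no comma operator, so the
# second counter j is set before the loop and decremented inside the body.
number_fields() {
  awk '{
    j = NF
    for (i = 1; i <= NF; i++) {
      print i, j, $i
      j--
    }
  }'
}

echo 'a b c' | number_fields
```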
Most often, increment is an increment expression, as in the example above. But this is not required; it can be any expression whatever. For example, this statement prints all the powers of 2 between 1 and 100:
for (i = 1; i <= 100; i *= 2)
  print i
Any of the three expressions in the parentheses following the for may be omitted if there is nothing to be done there. Thus, `for (;x > 0;)' is equivalent to `while (x > 0)'. If the condition is omitted, it is treated as true, effectively yielding an infinite loop (i.e., a loop that will never terminate).
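Here is `for (;x > 0;)' in use. This sketch repeatedly halves the first field. Only the condition appears in the for header, and the update happens in the body. The function name halve is hypothetical.

```shell
# Repeatedly halve the first field; only the condition is present in the
# for header, so this loop behaves exactly like while (x > 0).
halve() {
  awk '{
    x = $1
    sep = ""
    for (; x > 0;) {
      printf "%s%d", sep, x
      sep = " "
      x = int(x / 2)
    }
    print ""
  }'
}

echo 20 | halve
```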
In most cases, a for loop is an abbreviation for a while loop, as shown here:
initialization
while (condition) {
  body
  increment
}
The only exception is when the continue statement (see section The continue Statement) is used inside the loop; changing a for statement to a while statement in this way can change the effect of the continue statement inside the loop.
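For a loop that contains no continue, the two forms behave the same. This sketch runs a three-step counting loop both ways, and the outputs match. The function names for_version and while_version are hypothetical.

```shell
# The same counting loop written with for and with the equivalent while form.
for_version() {
  awk 'BEGIN { for (i = 1; i <= 3; i++) print "step", i }'
}

while_version() {
  awk 'BEGIN {
    i = 1                # initialization
    while (i <= 3) {     # condition
      print "step", i    # body
      i++                # increment
    }
  }'
}

for_version
while_version
```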
There is an alternate version of the for loop, for iterating over all the indices of an array:
for (i in array)
  do something with array[i]
See section Arrays in awk, for more information on this version of the for loop.
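Here is a brief sketch of that form, using a hypothetical function named word_counts. It counts how often each word occurs and prints the totals. The order in which for (w in count) visits the indices is not specified, so the usage example sorts the output.

```shell
# Count word occurrences, then visit every index of the count array.
word_counts() {
  awk '{
    for (i = 1; i <= NF; i++)
      count[$i]++
  }
  END {
    for (w in count)       # visiting order is unspecified
      print w, count[w]
  }'
}

echo 'b a b' | word_counts | sort
```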
The awk language has a for statement in addition to a while statement because often a for loop is both less work to type and more natural to think of. Counting the number of iterations is very common in loops. It can be easier to think of this counting as part of looping rather than as something to do inside the loop.
The next section has more complicated examples of for loops.
The break Statement
The break statement jumps out of the innermost for, while, or do-while loop that encloses it. The following example finds the smallest divisor of any integer, and also identifies prime numbers:
awk '# find smallest divisor of num
     { num = $1
       for (div = 2; div*div <= num; div++)
         if (num % div == 0)
           break
       if (num % div == 0)
         printf "Smallest divisor of %d is %d\n", num, div
       else
         printf "%d is prime\n", num
     }'
When the remainder is zero in the first if statement, awk immediately breaks out of the containing for loop. This means that awk proceeds immediately to the statement following the loop and continues processing. (This is very different from the exit statement, which stops the entire awk program. See section The exit Statement.)
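Because break leaves only the innermost loop, any enclosing loop keeps running. In this sketch the inner loop stops as soon as j exceeds i, and the outer loop moves on to the next i. The function name triangle is hypothetical.

```shell
# break exits only the inner for loop; the outer loop continues.
triangle() {
  awk 'BEGIN {
    for (i = 1; i <= 3; i++) {
      for (j = 1; j <= 3; j++) {
        if (j > i)
          break          # leaves the j loop, not the i loop
        print i, j
      }
    }
  }'
}

triangle
```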
Here is another program equivalent to the previous one. It illustrates how the condition of a for or while could just as well be replaced with a break inside an if:
awk '# find smallest divisor of num
     { num = $1
       for (div = 2; ; div++) {
         if (num % div == 0) {
           printf "Smallest divisor of %d is %d\n", num, div
           break
         }
         if (div*div > num) {
           printf "%d is prime\n", num
           break
         }
       }
     }'
The continue Statement
The continue statement, like break, is used only inside for, while, and do-while loops. It skips over the rest of the loop body, causing the next cycle around the loop to begin immediately. Contrast this with break, which jumps out of the loop altogether. Here is an example:
# print names that don't contain the string "ignore"

# first, save the text of each line
{ names[NR] = $0 }

# print what we're interested in
END {
   for (x in names) {
       if (names[x] ~ /ignore/)
           continue
       print names[x]
   }
}
If one of the input records contains the string `ignore', this example skips the print statement for that record, and continues back to the first statement in the loop.
This is not a practical example of continue, since it would be just as easy to write the loop like this:
for (x in names)
  if (names[x] !~ /ignore/)
    print names[x]
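continue reads more naturally when skipping values inside a counting loop. This sketch adds up only the positive fields of each record. For zero or negative fields, continue jumps straight to i++ and the next field. The function name sum_positive is hypothetical.

```shell
# Sum only the positive fields of each record.
sum_positive() {
  awk '{
    total = 0
    for (i = 1; i <= NF; i++) {
      if ($i <= 0)
        continue          # skip the rest of the body; i++ still runs
      total += $i
    }
    print total
  }'
}

echo '4 -1 0 6' | sum_positive
```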
The continue statement in a for loop directs awk to skip the rest of the body of the loop, and resume execution with the increment-expression of the for statement. The following program illustrates this fact:
awk 'BEGIN {
     for (x = 0; x <= 20; x++) {
         if (x == 5)
             continue
         printf ("%d ", x)
     }
     print ""
}'
This program prints all the numbers from 0 to 20, except for 5, for which the printf is skipped. Since the increment x++ is not skipped, x does not remain stuck at 5. Contrast the for loop above with the while loop:
awk 'BEGIN {
     x = 0
     while (x <= 20) {
         if (x == 5)
             continue
         printf ("%d ", x)
         x++
     }
     print ""
}'
This program loops forever once x gets to 5.
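One way to repair the while version is to advance x before continue. That reproduces what the for loop's increment did automatically. Here is a sketch with a hypothetical function named skip_five.

```shell
# Print 0..20 except 5 with a while loop that cannot get stuck:
# x is advanced before continue, as the for loop's increment would do.
skip_five() {
  awk 'BEGIN {
    x = 0
    while (x <= 20) {
      if (x == 5) {
        x++
        continue
      }
      printf ("%d ", x)
      x++
    }
    print ""
  }'
}

skip_five
```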
As described above, the continue statement has no meaning when used outside the body of a loop. However, although it was never documented, historical implementations of awk have treated the continue statement outside of a loop as if it were a next statement (see section The next Statement).
By default, gawk silently supports this usage. However, if `-W posix' has been specified on the command line (see section Invoking awk), it will be treated as an error, since the POSIX standard specifies that continue should only be used inside the body of a loop.
The next Statement
The next statement forces awk to immediately stop processing the current record and go on to the next record. This means that no further rules are executed for the current record. The rest of the current rule's action is not executed either.
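In this sketch the first rule discards comment lines with next, so the second rule never sees them. The function name strip_comments is hypothetical.

```shell
# Drop comment lines before any later rule sees them.
strip_comments() {
  awk '/^#/ { next }               # stop processing this record here
       { print NR ": " $0 }'
}

printf '# header\nfoo\nbar\n' | strip_comments
```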
Contrast this with the effect of the getline function (see section Explicit Input with getline). That too causes awk to read the next record immediately, but it does not alter the flow of control in any way. So the rest of the current action executes with a new input record.
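In this sketch getline reads a second record into $0, and the same action keeps running with it. The effect is to join input lines in pairs. The function name pair_lines is hypothetical.

```shell
# Join input lines in pairs; getline replaces $0 and the action continues.
pair_lines() {
  awk '{
    first = $0
    if ((getline) > 0)          # reads the next record into $0
      print first " + " $0
    else                        # no record left to pair with
      print first " (unpaired)"
  }'
}

printf 'a\nb\nc\n' | pair_lines
```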
At the highest level, awk program execution is a loop that reads an input record and then tests each rule's pattern against it. If you think of this loop as a for statement whose body contains the rules, then the next statement is analogous to a continue statement: it skips to the end of the body of this implicit loop, and executes the increment (which reads another record).
For example, if your awk program works only on records with four fields, and you don't want it to fail when given bad input, you might use this rule near the beginning of the program:
NF != 4 {
  printf("line %d skipped: doesn't have 4 fields\n", FNR) > "/dev/stderr"
  next
}
so that the following rules will not see the bad record. The error message is redirected to the standard error output stream, as error messages should be. See section Standard I/O Streams.
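Here is that rule in a complete, runnable sketch. The function name four_fields and the summing rule are illustrative. The message avoids an apostrophe because the program is wrapped in shell single quotes.

```shell
# Skip (and report) records without exactly four fields; sum the rest.
four_fields() {
  awk 'NF != 4 {
         printf("line %d skipped: does not have 4 fields\n", FNR) > "/dev/stderr"
         next
       }
       { print $1 + $2 + $3 + $4 }'
}

printf '1 2 3 4\n5 6\n' | four_fields
```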
According to the POSIX standard, the behavior is undefined if the next statement is used in a BEGIN or END rule. gawk will treat it as a syntax error.
If the next statement causes the end of the input to be reached, then the code in the END rules, if any, will be executed. See section BEGIN and END Special Patterns.
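This sketch shows the END rule still running when next is applied to the last record, which here is a blank line. The function name count_blank is hypothetical.

```shell
# Count blank and nonblank lines; the last record is blank, so next
# reaches the end of input and the END rule then runs.
count_blank() {
  awk '$0 == "" { blank++; next }
       { lines++ }
       END { print lines + 0, "nonblank,", blank + 0, "blank" }'
}

printf 'a\nb\n\n' | count_blank
```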
The next file Statement
The next file statement is similar to the next statement. However, instead of abandoning processing of the current record, the next file statement instructs awk to stop processing the current data file.
Upon execution of the next file statement, FILENAME is updated to the name of the next data file listed on the command line, FNR is reset to 1, and processing starts over with the first rule in the program. See section Built-in Variables.
If the next file statement causes the end of the input to be reached, then the code in the END rules, if any, will be executed. See section BEGIN and END Special Patterns.
The next file statement is a gawk extension; it is not (currently) available in any other awk implementation. You can simulate its behavior by creating a library file named `nextfile.awk', with the following contents. (This sample program uses user-defined functions, a feature that has not been presented yet. See section User-defined Functions, for more information.)
# nextfile -- function to skip remaining records in current file

# this should be read in before the "main" awk program

function nextfile() { _abandon_ = FILENAME; next }

_abandon_ == FILENAME && FNR > 1  { next }
_abandon_ == FILENAME && FNR == 1 { _abandon_ = "" }
The nextfile function simply sets a "private" variable(3) to the name of the current data file, and then retrieves the next record. Since this file is read before the main awk program, the rules that follow the function definition will be executed before the rules in the main program. The first rule continues to skip records as long as the name of the input file has not changed, and this is not the first record in the file. This rule is sufficient most of the time. But what if the same data file is named twice in a row on the command line? This rule would not process the data file the second time. The second rule catches this case: If the data file name is what was being skipped, but FNR is 1, then this is the second time the file is being processed, and it should not be skipped.
The next file statement would be useful if you have many data files to process, and due to the nature of the data, you expect that you would not want to process every record in each file. Without it, in order to move on to the next data file, you would have to continue scanning the unwanted records (as described above). The next file statement accomplishes this much more efficiently.
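For comparison, here is the record-by-record approach in awk code that does not use next file. The function name headers is hypothetical, and so is the input format, in which a header block ends at the first blank line. The sketch prints each file's header block. The rest of each file is still read one record at a time, only to be discarded with next.

```shell
# Print each file's header block (up to its first blank line). The rest of
# the file is still scanned record by record and discarded with next.
# usage: headers file1 file2 ...
headers() {
  awk 'FNR == 1  { skipping = 0 }          # a new file has started
       skipping  { next }                  # read, then discard, the remainder
       $0 == ""  { skipping = 1; next }    # end of the header block
       { print FILENAME ": " $0 }' "$@"
}
```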
The exit Statement
The exit statement causes awk to immediately stop executing the current rule and to stop processing input; any remaining input is ignored.
If an exit statement is executed from a BEGIN rule, the program stops processing everything immediately. No input records are read. However, if an END rule is present, it is executed (see section BEGIN and END Special Patterns).
If exit is used as part of an END rule, it causes the program to stop immediately.
An exit statement that is part of an ordinary rule (that is, not part of a BEGIN or END rule) stops the execution of any further automatic rules, but the END rule is executed if there is one.
If you do not want the END rule to do its job in this case, you can set a variable to nonzero before the exit statement, and check that variable in the END rule.
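Here is a sketch of that flag technique, using a hypothetical function named first_match. The first matching line ends input processing with exit. The END rule then checks found before reporting that nothing matched.

```shell
# Report the first line containing "error"; the flag keeps END quiet
# when exit was taken.
first_match() {
  awk '/error/ { print "first error on line " NR; found = 1; exit }
       END     { if (!found) print "no errors" }'
}

printf 'ok\nerror x\nerror y\n' | first_match
printf 'ok\n' | first_match
```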
If an argument is supplied to exit, its value is used as the exit status code for the awk process. If no argument is supplied, exit returns status zero (success).
For example, let's say you've discovered an error condition you really don't know how to handle. Conventionally, programs report this by exiting with a nonzero status. Your awk program can do this using an exit statement with a nonzero argument. Here's an example of this:
BEGIN {
  if (("date" | getline date_now) <= 0) {
    print "Can't get system date" > "/dev/stderr"
    exit 4
  }
}
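This sketch combines a nonzero exit status with the flag technique described earlier. The function name check_fields and the status value 2 are arbitrary choices. A line without exactly two fields ends processing with exit 2. The END rule still runs but stays silent, and the shell sees status 2.

```shell
# Validate that every line has exactly two fields.
# On the first bad line: report it, set a flag, and exit with status 2.
check_fields() {
  awk 'NF != 2 { print "bad line " NR > "/dev/stderr"; bad = 1; exit 2 }
       END     { if (!bad) print "all lines ok" }'
}

printf 'a b\nc d\n' | check_fields
printf 'a b\nc\n' | check_fields || echo "exit status: $?"
```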