One of the most common things that actions do is to output or print
some or all of the input. For simple output, use the print statement.
For fancier formatting, use the printf statement. Both are described
in this chapter.
The print Statement
The print statement does output with simple, standardized formatting.
You specify only the strings or numbers to be printed, in a list
separated by commas. They are output, separated by single spaces,
followed by a newline. The statement looks like this:
print item1, item2, ...
The entire list of items may optionally be enclosed in parentheses. The
parentheses are necessary if any of the item expressions uses a
relational operator; otherwise it could be confused with a redirection
(see section Redirecting Output of print and printf). The relational
operators are `==', `!=', `<', `>', `>=', `<=', `~' and `!~' (see
section Comparison Expressions).
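The difference the parentheses make can be seen in a small shell session. This is only a sketch, assuming a POSIX shell with mktemp and some awk on the search path; the scratch directory and the variable names are made up for illustration.

```shell
# Work in a throwaway directory (hypothetical) so the stray file is easy to spot.
scratch=$(mktemp -d)

# Parenthesized: '>' is a comparison, so awk prints 1 (true) on standard output.
with_parens=$(cd "$scratch" && awk 'BEGIN { print (2 > 1) }')

# Unparenthesized: '>' is a redirection, so "2" is written to a file named "1"
# and nothing appears on standard output.
without_parens=$(cd "$scratch" && awk 'BEGIN { print 2 > 1 }')

echo "with parentheses:    [$with_parens]"
echo "without parentheses: [$without_parens]"
echo "contents of file 1:  [$(cat "$scratch/1")]"
```

The second command is silent on the terminal, which is exactly why the mistake is easy to miss.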
The items printed can be constant strings or numbers, fields of the
current record (such as $1), variables, or any awk expressions. The
print statement is completely general for computing what values to
print. With two exceptions, you cannot specify how to print them--how
many columns, whether to use exponential notation or not, and so on.
(See section Output Separators, and section Controlling Numeric Output
with print.) For that, you need the printf statement (see section Using
printf Statements for Fancier Printing).
The simple statement `print' with no items is equivalent to
`print $0': it prints the entire current record. To print a blank
line, use `print ""', where "" is the null, or empty, string.
To print a fixed piece of text, use a string constant such as
"Hello there" as one item. If you forget to use the double-quote
characters, your text will be taken as an awk expression, and you will
probably get an error. Keep in mind that a space is printed between
any two items.
Most often, each print statement makes one line of output. But it
isn't limited to one line. If an item value is a string that contains a
newline, the newline is output along with the rest of the string. A
single print can make any number of lines this way.
Examples of print Statements
Here is an example of printing a string that contains embedded newlines:
awk 'BEGIN { print "line one\nline two\nline three" }'
produces output like this:
line one
line two
line three
Here is an example that prints the first two fields of each input record, with a space between them:
awk '{ print $1, $2 }' inventory-shipped
Its output looks like this:
Jan 13
Feb 15
Mar 15
...
A common mistake in using the print statement is to omit the comma
between two items. This often has the effect of making the items run
together in the output, with no space. The reason for this is that
juxtaposing two string expressions in awk means to concatenate them.
For example, without the comma:
awk '{ print $1 $2 }' inventory-shipped
prints:
Jan13
Feb15
Mar15
...
Neither example's output makes much sense to someone unfamiliar with the
file `inventory-shipped'. A heading line at the beginning would make
it clearer. Let's add some headings to our table of months ($1) and
green crates shipped ($2). We do this using the BEGIN pattern (see
section BEGIN and END Special Patterns) to force the headings to be
printed only once:
awk 'BEGIN { print "Month Crates"
             print "----- ------" }
           { print $1, $2 }' inventory-shipped
Did you already guess what happens? This program prints the following:
Month Crates
----- ------
Jan 13
Feb 15
Mar 15
...
The headings and the table data don't line up! We can fix this by printing some spaces between the two fields:
awk 'BEGIN { print "Month Crates"
             print "----- ------" }
           { print $1, " ", $2 }' inventory-shipped
You can imagine that this way of lining up columns can get pretty
complicated when you have many columns to fix. Counting spaces for two
or three columns can be simple, but more than this and you can get
"lost" quite easily. This is why the printf statement was created (see
section Using printf Statements for Fancier Printing); one of its
specialties is lining up columns of data.
Output Separators
As mentioned previously, a print statement contains a list of items,
separated by commas. In the output, the items are normally separated
by single spaces. But they do not have to be spaces; a single space is
only the default. You can specify any string of characters to use as
the output field separator by setting the built-in variable OFS. The
initial value of this variable is the string " ", that is, just a
single space.
The output from an entire print statement is called an output record.
Each print statement outputs one output record and then outputs a
string called the output record separator. The built-in variable ORS
specifies this string. The initial value of the variable is the string
"\n" containing a newline character; thus, normally each print
statement makes a separate line.
You can change how output fields and records are separated by assigning
new values to the variables OFS and/or ORS. The usual place to do this
is in the BEGIN rule (see section BEGIN and END Special Patterns), so
that it happens before any input is processed. You may also do this
with assignments on the command line, before the names of your input
files.
The following example prints the first and second fields of each input record separated by a semicolon, with a blank line added after each line:
awk 'BEGIN { OFS = ";"; ORS = "\n\n" } { print $1, $2 }' BBS-list
If the value of ORS does not contain a newline, all your output will
be run together on a single line, unless you output newlines some
other way.
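Both ways of setting the separators can be tried side by side. The following is a sketch only: the two sample records and the temporary data file are stand-ins invented for the illustration, and the record separator is deliberately chosen without a newline so that the output runs together on one line.

```shell
# Two sample records standing in for a real data file (hypothetical).
data=$(mktemp)
printf 'aardvark 555-5553\nalpo-net 555-3412\n' > "$data"

# Separators set in a BEGIN rule, before any input is read.
from_begin=$(awk 'BEGIN { OFS = ";"; ORS = " | " } { print $1, $2 }' "$data")

# The same separators given as command-line assignments before the file name.
from_cmdline=$(awk '{ print $1, $2 }' OFS=";" ORS=" | " "$data")

echo "$from_begin"
echo "$from_cmdline"
```

Both commands print `aardvark;555-5553 | alpo-net;555-3412 | ' on a single line, since neither separator contains a newline.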
Controlling Numeric Output with print
When you use the print statement to print numeric values, awk
internally converts the number to a string of characters, and prints
that string. awk uses the sprintf function to do this conversion. For
now, it suffices to say that the sprintf function accepts a format
specification that tells it how to format numbers (or strings), and
that there are a number of different ways that numbers can be
formatted. The different format specifications are discussed more
fully in section Using printf Statements for Fancier Printing.
The built-in variable OFMT contains the default format specification
that print uses with sprintf when it wants to convert a number to a
string for printing. By supplying different format specifications as
the value of OFMT, you can change how print will print your numbers.
As a brief example:
awk 'BEGIN { OFMT = "%d"  # print numbers as integers
             print 17.23 }'
will print `17'.
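A slightly longer sketch shows where OFMT applies and where it does not. It assumes a POSIX-conforming awk, in which integral values are printed as integers whatever OFMT says and strings are never reformatted; the variable name is made up for the example.

```shell
result=$(awk 'BEGIN {
    OFMT = "%.2f"        # two digits after the decimal point
    print 3.14159        # a non-integral number goes through OFMT
    print 17             # an integral value still prints as an integer
    print "3.14159"      # a string is printed unchanged
}')
echo "$result"
```

The three lines of output are `3.14', `17' and `3.14159'.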
Using printf Statements for Fancier Printing
If you want more precise control over the output format than print
gives you, use printf. With printf you can specify the width to use
for each item, and you can specify various stylistic choices for
numbers (such as what radix to use, whether to print an exponent,
whether to print a sign, and how many digits to print after the
decimal point). You do this by specifying a string, called the format
string, which controls how and where to print the other arguments.
Introduction to the printf Statement
The printf statement looks like this:
printf format, item1, item2, ...
The entire list of arguments may optionally be enclosed in parentheses.
The parentheses are necessary if any of the item expressions uses a
relational operator; otherwise it could be confused with a redirection
(see section Redirecting Output of print and printf). The relational
operators are `==', `!=', `<', `>', `>=', `<=', `~' and `!~' (see
section Comparison Expressions).
The difference between printf and print is the argument format. This
is an expression whose value is taken as a string; it specifies how to
output each of the other arguments. It is called the format string.
The format string is the same as in the ANSI C library function
printf. Most of format is text to be output verbatim. Scattered among
this text are format specifiers, one per item. Each format specifier
says to output the next item at that place in the format.
The printf statement does not automatically append a newline to its
output. It outputs only what the format specifies. So if you want a
newline, you must include one in the format. The output separator
variables OFS and ORS have no effect on printf statements.
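The contrast with print is easy to check. In this sketch the separators are set to deliberately odd values (chosen only for the illustration) so that their effect, or lack of it, is visible:

```shell
result=$(awk 'BEGIN {
    OFS = "-"; ORS = "!\n"
    printf "%s %s\n", "a", "b"   # only what the format says: "a b" and a newline
    print "a", "b"               # uses OFS and ORS: "a-b!" and a newline
}')
echo "$result"
```

The first line of output is `a b' and the second is `a-b!'.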
A format specifier starts with the character `%' and ends with a
format-control letter; it tells the printf statement how to output one
item. (If you actually want to output a `%', write `%%'.) The
format-control letter specifies what kind of value to print. The rest
of the format specifier is made up of optional modifiers which are
parameters such as the field width to use.
Here is a list of the format-control letters:
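`e'
This prints a number in scientific (exponential) notation. For example,
printf "%4.3e", 1950
prints `1.950e+03'. The `.3' asks for three digits after the decimal
point; the `4' is only a minimum field width, which the nine-character
result already exceeds. The `4.3' are modifiers, discussed below.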
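Modifiers for printf Formats
A format specification can also include modifiers that can control how
much of the item's value is printed and how much space it gets. The
modifiers come between the `%' and the format-control letter. Here are
the possible modifiers, in the order in which they may appear:
`-'
The minus sign, used before the width modifier, says to left-justify
the item within its specified width: the output is padded with spaces
on the right, instead of on the left. Thus,
printf "%-4s", "foo"
prints `foo '.
`width'
This is a number giving the desired width of a field. By default the
item is right-justified, that is, padded with spaces on the left. For
example,
printf "%4s", "foo"
prints ` foo'.
The value of width is a minimum width, not a maximum. If the item value
requires more than width characters, it can be as wide as necessary.
Thus,
printf "%4s", "foobar"
prints `foobar'.
`.prec'
This is a number that specifies the precision to use when printing. For
a number, it gives how many digits to print after the decimal point;
for a string, it gives the maximum number of characters from the string
to print.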
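The modifiers can be compared in one short program. This is a sketch; the square brackets in each format are there only to make the padding visible, and the precision forms (`%.3s' and `%8.2f') behave as in the C library printf.

```shell
result=$(awk 'BEGIN {
    printf "[%-6s]\n", "foo"      # left-justified in 6 columns
    printf "[%6s]\n", "foo"       # right-justified in 6 columns
    printf "[%2s]\n", "foobar"    # width is a minimum, nothing is cut off
    printf "[%.3s]\n", "foobar"   # precision caps a string at 3 characters
    printf "[%8.2f]\n", 3.14159   # 8 columns wide, 2 digits after the point
}')
echo "$result"
```

This prints `[foo   ]', `[   foo]', `[foobar]', `[foo]' and `[    3.14]', one per line.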
The C library printf's dynamic width and prec capability (for example,
"%*.*s") is supported. Instead of supplying explicit width and/or prec
values in the format string, you pass them in the argument list. For
example:
w = 5
p = 3
s = "abcdefg"
printf "<%*.*s>\n", w, p, s
is exactly equivalent to
s = "abcdefg"
printf "<%5.3s>\n", s
Both programs output `<••abc>'. (We have used the bullet symbol "•" to represent a space, to clearly show you that there are two spaces in the output.)
Earlier versions of awk did not support this capability. You may
simulate it by using concatenation to build up the format string,
like so:
w = 5
p = 3
s = "abcdefg"
printf "<%" w "." p "s>\n", s
This is not particularly easy to read, however.
Examples of Using printf
Here is how to use printf to make an aligned table:
awk '{ printf "%-10s %s\n", $1, $2 }' BBS-list
prints the names of bulletin boards ($1) of the file `BBS-list' as a
string of 10 characters, left justified. It also prints the phone
numbers ($2) afterward on the line. This produces an aligned
two-column table of names and phone numbers:
aardvark   555-5553
alpo-net   555-3412
barfly     555-7685
bites      555-1675
camelot    555-0542
core       555-2912
fooey      555-1234
foot       555-6699
macfoo     555-6480
sdace      555-3430
sabafoo    555-2127
Did you notice that we did not specify that the phone numbers be printed as numbers? They had to be printed as strings because the numbers are separated by a dash. This dash would be interpreted as a minus sign if we had tried to print the phone numbers as numbers. This would have led to some pretty confusing results.
We did not specify a width for the phone numbers because they are the last things on their lines. We don't need to put spaces after them.
We could make our table look even nicer by adding headings to the tops
of the columns. To do this, use the BEGIN pattern (see section BEGIN
and END Special Patterns) to force the header to be printed only once,
at the beginning of the awk program:
awk 'BEGIN { print "Name       Number"
             print "----       ------" }
           { printf "%-10s %s\n", $1, $2 }' BBS-list
Did you notice that we mixed print and printf statements in the above
example? We could have used just printf statements to get the same
results:
awk 'BEGIN { printf "%-10s %s\n", "Name", "Number"
             printf "%-10s %s\n", "----", "------" }
           { printf "%-10s %s\n", $1, $2 }' BBS-list
By outputting each column heading with the same format specification used for the elements of the column, we have made sure that the headings are aligned just like the columns.
The fact that the same format specification is used three times can be emphasized by storing it in a variable, like this:
awk 'BEGIN { format = "%-10s %s\n"
             printf format, "Name", "Number"
             printf format, "----", "------" }
           { printf format, $1, $2 }' BBS-list
See if you can use the printf statement to line up the headings and
table data for our `inventory-shipped' example covered earlier in the
section on the print statement (see section The print Statement).
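One possible answer, offered as a sketch rather than the only solution, reuses the stored-format idea. The three records written to the temporary file are stand-ins for `inventory-shipped', and the column widths (5 and 6) are simply the widths of the two headings; both are choices made for this illustration.

```shell
# Three sample records standing in for inventory-shipped (hypothetical data file).
inv=$(mktemp)
printf 'Jan 13\nFeb 15\nMar 15\n' > "$inv"

# One format for headings and data: months left-justified, crates right-justified.
table=$(awk 'BEGIN { format = "%-5s %6s\n"
             printf format, "Month", "Crates"
             printf format, "-----", "------" }
           { printf format, $1, $2 }' "$inv")
echo "$table"
```

Right-justifying the second column makes the crate counts line up on their last digit, which is usually what you want for numbers.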
Redirecting Output of print and printf
So far we have been dealing only with output that prints to the standard
output, usually your terminal. Both print and printf can also send
their output to other places. This is called redirection.
A redirection appears after the print or printf statement.
Redirections in awk are written just like redirections in shell
commands, except that they are written inside the awk program.
Here are the three forms of output redirection. They are all shown for
the print statement, but they work identically for printf also.
print items > output-file
When this type of redirection is used, the output-file is erased before the first output is written to it. Subsequent writes do not erase output-file, but append to it. If output-file does not exist, then it is created.
For example, here is how one awk program can write a list of BBS names
to a file `name-list' and a list of phone numbers to a file
`phone-list'. Each output file contains one name or number per line.
awk '{ print $2 > "phone-list"
       print $1 > "name-list" }' BBS-list
print items >> output-file
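With this type of redirection, the old contents (if any) of output-file
are not erased. Instead, the awk output is appended to the file.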
print items | command
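This type of redirection opens a pipe to command and writes the values
of items through this pipe to another process created to execute
command. The redirection argument command is actually an awk
expression. Its value is converted to a string, whose contents give the
shell command to be run.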
For example, this produces two files, one unsorted list of BBS names and one list sorted in reverse alphabetical order:
awk '{ print $1 > "names.unsorted"
       print $1 | "sort -r > names.sorted" }' BBS-list
Here the unsorted list is written with an ordinary redirection while
the sorted list is written by piping through the sort utility.
Here is an example that uses redirection to mail a message to a mailing
list `bug-system'. This might be useful when trouble is encountered
in an awk script run periodically for system maintenance.
report = "mail bug-system"
print "Awk script failed:", $0 | report
print "at record number", FNR, "of", FILENAME | report
close(report)
We call the close function here because it's a good idea to close
the pipe as soon as all the intended output has been sent to it.
See section Closing Output Files and Pipes, for more information
on this. This example also illustrates the use of a variable to represent
a file or command: it is not necessary to always use a string
constant. Using a variable is generally a good idea, since awk
requires you to spell the string value identically every time.
Redirecting output using `>', `>>', or `|' asks the system to open a file or pipe only if the particular file or command you've specified has not already been written to by your program, or if it has been closed since it was last written to.
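The open-once behavior of `>' and the never-erase behavior of `>>' can be checked with a short session. This is a sketch, assuming a POSIX shell with mktemp; the file and the variable name `out' passed with `-v' are made up for the illustration.

```shell
log=$(mktemp)
echo "old contents" > "$log"

# '>' erases the file at the first write only; the second print appends,
# because the file is already open in this run.
awk -v out="$log" 'BEGIN { print "one" > out; print "two" > out }'
after_first_run=$(cat "$log")

# '>>' never erases, so a later run adds to what is already there.
awk -v out="$log" 'BEGIN { print "three" >> out }'
after_second_run=$(cat "$log")

echo "$after_first_run"
echo "--"
echo "$after_second_run"
```

After the first run the file holds `one' and `two' (the old contents are gone); after the second it holds `one', `two' and `three'.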
Closing Output Files and Pipes
When a file or pipe is opened, the file name or command associated with
it is remembered by awk and subsequent writes to the same file or
command are appended to the previous writes. The file or pipe stays
open until awk exits. This is usually convenient.
Sometimes there is a reason to close an output file or pipe earlier
than that. To do this, use the close function, as follows:
close(filename)
or
close(command)
The argument filename or command can be any expression. Its value must exactly equal the string used to open the file or pipe to begin with--for example, if you open a pipe with this:
print $1 | "sort -r > names.sorted"
then you must close it with this:
close("sort -r > names.sorted")
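Keeping the command in a variable avoids having to retype the string exactly. The sketch below, which assumes the standard sort utility is available, also shows what closing the pipe does: sort only produces its output once its input ends, so everything it prints appears before the final line.

```shell
result=$(awk 'BEGIN {
    cmd = "sort -r"            # one variable, so every use spells the command the same
    print "apple"  | cmd
    print "cherry" | cmd
    print "banana" | cmd
    close(cmd)                 # sort sees end of input and writes its output now
    print "done"
}')
echo "$result"
```

The output is `cherry', `banana', `apple' and then `done', one per line.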
Here are some reasons why you might need to close an output file:
To write a file and read it back later on in the same awk program.
Close the file when you are finished writing it; then you can start
reading it with getline (see section Explicit Input with getline).
To write numerous files, successively, in the same awk program. If you
don't close the files, eventually you may exceed a system limit on the
number of open files in one process. So close each one when you are
finished writing it.
To make a command finish. The command reading a pipe normally keeps
waiting for input as long as the pipe is open; for example, if you
redirect output to the mail program, the message is not actually sent
until the pipe is closed.
For example, suppose you pipe output to the mail program. If you
output several lines redirected to this pipe without closing it, they
make a single message of several lines. By contrast, if you close the
pipe after each line of output, then each line makes a separate
message.
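The first of these reasons can be sketched as follows. The temporary file and the variable names are invented for the example; the point is only that close comes between the last write and the first read.

```shell
scratch_file=$(mktemp)
result=$(awk -v out="$scratch_file" 'BEGIN {
    print "first"  > out
    print "second" > out
    close(out)                          # finish writing before reading
    while ((getline line < out) > 0)    # read the same file back
        n++
    print n, line                       # record count and the last record read
}')
echo "$result"
```

This prints `2 second'. Without the call to close, the data might still be sitting in an output buffer when getline starts reading.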
close returns a value of zero if the close succeeded. Otherwise, the
value will be non-zero. In this case, gawk sets the variable ERRNO to a
string describing the error that occurred.
Running programs conventionally have three input and output streams already available to them for reading and writing. These are known as the standard input, standard output, and standard error output. These streams are, by default, terminal input and output, but they are often redirected with the shell, via the `<', `<<', `>', `>>', `>&' and `|' operators. Standard error is used only for writing error messages; the reason we have two separate streams, standard output and standard error, is so that they can be redirected separately.
In other implementations of awk, the only way to write an error
message to standard error in an awk program is as follows:
print "Serious error detected!\n" | "cat 1>&2"
This works by opening a pipeline to a shell command which can access the
standard error stream which it inherits from the awk process. This is
far from elegant, and is also inefficient, since it requires a
separate process. So people writing awk programs have often neglected
to do this. Instead, they have sent the error messages to the
terminal, like this:
NF != 4 { printf("line %d skipped: doesn't have 4 fields\n", FNR) > "/dev/tty" }
This has the same effect most of the time, but not always: although the
standard error stream is usually the terminal, it can be redirected, and
when that happens, writing to the terminal is not correct. In fact, if
awk is run from a background job, it may not have a terminal at all.
Then opening `/dev/tty' will fail.
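gawk provides special file names for accessing the three standard
streams. When you redirect input or output in gawk, if the file name
matches one of these special names, then gawk directly uses the stream
it stands for.
`/dev/stdin'
The standard input (file descriptor 0).
`/dev/stdout'
The standard output (file descriptor 1).
`/dev/stderr'
The standard error output (file descriptor 2).
`/dev/fd/N'
The file associated with file descriptor N. Such a file must have been
opened by the program initiating the awk execution (typically the
shell). Unless you take special pains, only descriptors 0, 1 and 2 are
available.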
The file names `/dev/stdin', `/dev/stdout', and `/dev/stderr' are aliases for `/dev/fd/0', `/dev/fd/1', and `/dev/fd/2', respectively, but they are more self-explanatory.
The proper way to write an error message in a gawk program is to use
`/dev/stderr', like this:
NF != 4 { printf("line %d skipped: doesn't have 4 fields\n", FNR) > "/dev/stderr" }
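Because the complaints go to standard error, the shell can separate them from the real output. The sketch below assumes that awk is gawk, or that the system itself provides `/dev/stderr'; the sample input and the shortened message are invented for the illustration.

```shell
input='a b c d
x y
e f g h'

# Complain on standard error about short records, print field 1 of the rest.
prog='NF != 4 { printf("line %d skipped\n", FNR) > "/dev/stderr"; next }
      { print $1 }'

# Keep standard output, discard standard error.
good=$(printf '%s\n' "$input" | awk "$prog" 2>/dev/null)

# Keep standard error, discard standard output.
complaints=$(printf '%s\n' "$input" | awk "$prog" 2>&1 >/dev/null)

echo "output: $good"
echo "errors: $complaints"
```

Here the output stream carries `a' and `e', while the error stream carries only `line 2 skipped'.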
gawk also provides special file names that give access to information
about the running gawk process. Each of these "files" provides a
single record of information. To read them more than once, you must
first close them with the close function (see section Closing Input
Files and Pipes). The filenames are:
$1
The return value of the getuid system call.
$2
The return value of the geteuid system call.
$3
The return value of the getgid system call.
$4
The return value of the getegid system call.
If there are any additional fields, they are the group IDs returned by
the getgroups system call. (Multiple groups may not be supported on all
systems.)
These special file names may be used on the command line as data
files, as well as for I/O redirections within an awk program. They may
not be used as source files with the `-f' option.
Recognition of these special file names is disabled if gawk is in
compatibility mode (see section Invoking awk).
Caution: Unless your system actually has a `/dev/fd' directory (or any of the other above listed special files), the interpretation of these file names is done by gawk itself. For example, using `/dev/fd/4' for output will actually write on file descriptor 4, and not on a new file descriptor that was dup'ed from file descriptor 4. Most of the time this does not matter; however, it is important to not close any of the files related to file descriptors 0, 1, and 2. If you do close one of these files, unpredictable behavior will result.