In the typical awk
program, all input is read either from the
standard input (by default the keyboard, but often a pipe from another
command) or from files whose names you specify on the awk
command
line. If you specify input files, awk
reads them in order, reading
all the data from one before going on to the next. The name of the current
input file can be found in the built-in variable FILENAME
(see section Built-in Variables).
The input is read in units called records, and processed by the rules one record at a time. By default, each record is one line. Each record is split automatically into fields, to make it more convenient for a rule to work on its parts.
On rare occasions you will need to use the getline command, which can do explicit input from any number of files (see section Explicit Input with getline).
The awk
language divides its input into records and fields.
Records are separated by a character called the record separator.
By default, the record separator is the newline character, defining
a record to be a single line of text.
Sometimes you may want to use a different character to separate your
records. You can use a different character by changing the built-in variable RS. The value of RS is a string that says how to separate records; the default value is "\n", the string containing just a newline character. This is why records are, by default, single lines.
RS
can have any string as its value, but only the first character
of the string is used as the record separator. The other characters are
ignored. RS
is exceptional in this regard; awk
uses the
full value of all its other built-in variables.
You can change the value of RS
in the awk
program with the
assignment operator, `=' (see section Assignment Expressions).
The new record-separator character should be enclosed in quotation marks to make
a string constant. Often the right time to do this is at the beginning
of execution, before any input has been processed, so that the very
first record will be read with the proper separator. To do this, use
the special BEGIN
pattern
(see section BEGIN
and END
Special Patterns). For
example:
awk 'BEGIN { RS = "/" } ; { print $0 }' BBS-list
changes the value of RS to "/" before reading any input.
This is a string whose first character is a slash; as a result, records
are separated by slashes. Then the input file is read, and the second
rule in the awk
program (the action with no pattern) prints each
record. Since each print
statement adds a newline at the end of
its output, the effect of this awk
program is to copy the input
with each slash changed to a newline.
Another way to change the record separator is on the command line,
using the variable-assignment feature
(see section Invoking awk).
awk '{ print $0 }' RS="/" BBS-list
This sets RS
to `/' before processing `BBS-list'.
Reaching the end of an input file terminates the current input record, even if the last character in the file is not the character in RS.
The empty string, "" (a string of no characters), has a special meaning as the value of RS: it means that records are separated only by blank lines. See section Multiple-Line Records, for more details.
The awk utility keeps track of the number of records that have been read so far from the current input file. This value is stored in a built-in variable called FNR. It is reset to zero when a new file is started. Another built-in variable, NR, is the total number of input records read so far from all files. It starts at zero but is never automatically reset to zero.
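For instance, here is a minimal illustration; the file names `file1' and `file2' are hypothetical, each assumed to contain two lines:

awk '{ print FILENAME, FNR, NR }' file1 file2

This would print `file1 1 1', `file1 2 2', `file2 1 3', and `file2 2 4': FNR restarts at one for each file, while NR keeps counting across both.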
If you change the value of RS
in the middle of an awk
run,
the new value is used to delimit subsequent records, but the record
currently being processed (and records already processed) are not
affected.
When awk
reads an input record, the record is
automatically separated or parsed by the interpreter into chunks
called fields. By default, fields are separated by whitespace,
like words in a line.
Whitespace in awk means any string of one or more spaces and/or tabs; other characters, such as newline and formfeed, that are considered whitespace by other languages are not considered whitespace by awk.
The purpose of fields is to make it more convenient for you to refer to
these pieces of the record. You don't have to use them--you can
operate on the whole record if you wish--but fields are what make
simple awk
programs so powerful.
To refer to a field in an awk
program, you use a dollar-sign,
`$', followed by the number of the field you want. Thus, $1
refers to the first field, $2
to the second, and so on. For
example, suppose the following is a line of input:
This seems like a pretty nice example.
Here the first field, or $1, is `This'; the second field, or $2, is `seems'; and so on. Note that the last field, $7, is `example.'. Because there is no space between the `e' and the `.', the period is considered part of the seventh field.
No matter how many fields there are, the last field in a record can be represented by $NF. So, in the example above, $NF would be the same as $7, which is `example.'. Why this works is explained below (see section Non-constant Field Numbers).
If you try to refer to a field beyond the last one, such as $8
when the record has only 7 fields, you get the empty string.
Plain NF, with no `$', is a built-in variable whose value is the number of fields in the current record.
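As a quick illustration, this pipeline (the input is supplied inline with echo) prints the field count followed by the last field:

echo 'a b c' | awk '{ print NF, $NF }'

The output is `3 c'.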
$0, which looks like an attempt to refer to the zeroth field, is a special case: it represents the whole input record. This is what you would use if you weren't interested in fields.
Here are some more examples:
awk '$1 ~ /foo/ { print $0 }' BBS-list
This example prints each record in the file `BBS-list' whose first
field contains the string `foo'. The operator `~' is called a
matching operator (see section Comparison Expressions);
it tests whether a string (here, the field $1) matches a given regular expression.
By contrast, the following example:
awk '/foo/ { print $1, $NF }' BBS-list
looks for `foo' in the entire record and prints the first field and the last field for each input record containing a match.
The number of a field does not need to be a constant. Any expression in
the awk
language can be used after a `$' to refer to a
field. The value of the expression specifies the field number. If the
value is a string, rather than a number, it is converted to a number.
Consider this example:
awk '{ print $NR }'
Recall that NR
is the number of records read so far: 1 in the
first record, 2 in the second, etc. So this example prints the first
field of the first record, the second field of the second record, and so
on. For the twentieth record, field number 20 is printed; most likely,
the record has fewer than 20 fields, so this prints a blank line.
Here is another example of using expressions as field numbers:
awk '{ print $(2*2) }' BBS-list
The awk
language must evaluate the expression (2*2)
and use
its value as the number of the field to print. The `*' sign
represents multiplication, so the expression 2*2
evaluates to 4.
The parentheses are used so that the multiplication is done before the
`$' operation; they are necessary whenever there is a binary
operator in the field-number expression. This example, then, prints the
hours of operation (the fourth field) for every line of the file
`BBS-list'.
If the field number you compute is zero, you get the entire record. Thus, $(2-2) has the same value as $0. Negative field numbers are not allowed.
The number of fields in the current record is stored in the built-in
variable NF
(see section Built-in Variables). The expression
$NF
is not a special feature: it is the direct consequence of
evaluating NF
and using its value as a field number.
You can change the contents of a field as seen by awk
within an
awk
program; this changes what awk
perceives as the
current input record. (The actual input is untouched: awk
never
modifies the input file.)
Consider this example:
awk '{ $3 = $2 - 10; print $2, $3 }' inventory-shipped
The `-' sign represents subtraction, so this program reassigns field three, $3, to be the value of field two minus ten, $2 - 10. (See section Arithmetic Operators.) Then field two, and the new value for field three, are printed.
In order for this to work, the text in field $2
must make sense
as a number; the string of characters must be converted to a number in
order for the computer to do arithmetic on it. The number resulting
from the subtraction is converted back to a string of characters which
then becomes field three.
See section Conversion of Strings and Numbers.
When you change the value of a field (as perceived by awk), the text of the input record is recalculated to contain the new field where the old one was. Therefore, $0 changes to reflect the altered field. Thus,
awk '{ $2 = $2 - 10; print $0 }' inventory-shipped
prints a copy of the input file, with 10 subtracted from the second field of each line.
You can also assign contents to fields that are out of range. For example:
awk '{ $6 = ($5 + $4 + $3 + $2) ; print $6 }' inventory-shipped
We've just created $6, whose value is the sum of fields $2, $3, $4, and $5. The `+' sign represents addition. For the file `inventory-shipped', $6 represents the total number of parcels shipped for a particular month.
Creating a new field changes the internal awk copy of the current input record--the value of $0. Thus, if you do `print $0' after adding a field, the record printed includes the new field, with the appropriate number of field separators between it and the previously existing fields.
This recomputation affects and is affected by several features not yet discussed, in particular, the output field separator, OFS, which is used to separate the fields (see section Output Separators), and NF (the number of fields; see section Examining Fields). For example, the value of NF is set to the number of the highest field you create.
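Here is a quick sketch of this recomputation, with the input supplied inline:

echo 'a b' | awk '{ $4 = "d"; print NF; print $0 }'

This prints `4' and then `a b  d': assigning $4 raises NF to 4, and the rebuilt $0 contains an empty third field between two field separators.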
Note, however, that merely referencing an out-of-range field does not change the value of either $0 or NF. Referencing an out-of-range field merely produces a null string. For example:
if ($(NF+1) != "")
    print "can't happen"
else
    print "everything is normal"
should print `everything is normal', because NF+1 is certain to be out of range. (See section The if Statement, for more information about awk's if-else statements.)
It is important to note that assigning to a field will change the value of $0, but will not change the value of NF, even when you assign the null string to a field. For example:
echo a b c d | awk '{ OFS = ":"; $2 = "" ; print ; print NF }'
prints
a::c:d
4
The field is still there; it just has an empty value. You can tell because there are two colons in a row.
(This section is rather long; it describes one of the most fundamental operations in awk. If you are a novice with awk, we recommend that you re-read this section after you have studied the section on regular expressions, section Regular Expressions as Patterns.)
The way awk
splits an input record into fields is controlled by
the field separator, which is a single character or a regular
expression. awk
scans the input record for matches for the
separator; the fields themselves are the text between the matches. For
example, if the field separator is `oo', then the following line:
moo goo gai pan
would be split into three fields: `m', ` g' and ` gai pan'.
The field separator is represented by the built-in variable FS. Shell programmers take note! awk does not use the name IFS, which is used by the shell.
You can change the value of FS
in the awk
program with the
assignment operator, `=' (see section Assignment Expressions).
Often the right time to do this is at the beginning of execution,
before any input has been processed, so that the very first record
will be read with the proper separator. To do this, use the special
BEGIN
pattern
(see section BEGIN
and END
Special Patterns).
For example, here we set the value of FS to the string ",":
awk 'BEGIN { FS = "," } ; { print $2 }'
Given the input line,
John Q. Smith, 29 Oak St., Walamazoo, MI 42139
this awk
program extracts the string ` 29 Oak St.'.
Sometimes your input data will contain separator characters that don't separate fields the way you thought they would. For instance, the person's name in the example we've been using might have a title or suffix attached, such as `John Q. Smith, LXIX'. From input containing such a name:
John Q. Smith, LXIX, 29 Oak St., Walamazoo, MI 42139
the previous sample program would extract ` LXIX', instead of ` 29 Oak St.'. If you were expecting the program to print the address, you would be surprised. So choose your data layout and separator characters carefully to prevent such problems.
As you know, by default, fields are separated by whitespace sequences
(spaces and tabs), not by single spaces: two spaces in a row do not
delimit an empty field. The default value of the field separator is a
string " "
containing a single space. If this value were
interpreted in the usual way, each space character would separate
fields, so two spaces in a row would make an empty field between them.
The reason this does not happen is that a single space as the value of
FS
is a special case: it is taken to specify the default manner
of delimiting fields.
If FS is any other single character, such as ",", then each occurrence of that character separates two fields. Two consecutive occurrences delimit an empty field. If the character occurs at the beginning or the end of the line, that too delimits an empty field. The space character is the only single character which does not follow these rules.
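A short illustration of these rules, using an inline input line:

echo 'a,,b,' | awk 'BEGIN { FS = "," } { print NF }'

This prints `4': the two consecutive commas delimit an empty second field, and the trailing comma delimits an empty fourth field.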
More generally, the value of FS
may be a string containing any
regular expression. Then each match in the record for the regular
expression separates fields. For example, the assignment:
FS = ", \t"
makes every area of an input line that consists of a comma followed by a space and a tab, into a field separator. (`\t' stands for a tab.)
For a less trivial example of a regular expression, suppose you want single spaces to separate fields the way single commas were used above. You can set FS to "[ ]". This regular expression matches a single space and nothing else.
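To see the difference, compare these two pipelines, whose input contains two spaces in a row:

echo 'a  b' | awk '{ print NF }'
echo 'a  b' | awk 'BEGIN { FS = "[ ]" } { print NF }'

The first prints `2', since the default field separator treats the run of spaces as a single separator; the second prints `3', because each individual space separates fields, leaving an empty field in the middle.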
FS can be set on the command line. You use the `-F' argument to do so. For example:
awk -F, 'program' input-files
sets FS to be the `,' character. Notice that the argument uses a capital `F'. Contrast this with `-f', which specifies a file containing an awk program. Case is significant in command options: the `-F' and `-f' options have nothing to do with each other. You can use both options at the same time to set the FS variable and get an awk program from a file.
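For example, a command line such as the following (the program file name `myprog.awk' here is hypothetical) reads the program from `myprog.awk' and splits fields at colons:

awk -F: -f myprog.awk /etc/passwd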
The value used for the argument to `-F' is processed in exactly the same way as assignments to the built-in variable FS. This means that if the field separator contains special characters, they must be escaped appropriately. For example, to use a `\' as the field separator, you would have to type:
# same as FS = "\\"
awk -F\\\\ '...' files ...
Since `\' is used for quoting in the shell, awk
will see
`-F\\'. Then awk
processes the `\\' for escape
characters (see section Constant Expressions), finally yielding
a single `\' to be used for the field separator.
As a special case, in compatibility mode (see section Invoking awk), if the argument to `-F' is `t', then FS is set to the tab character. (This is because if you type `-F\t', without the quotes, at the shell, the `\' gets deleted, so awk figures that you really want your fields to be separated with tabs, and not `t's. Use `-v FS="t"' on the command line if you really do want to separate your fields with `t's.)
For example, let's use an awk program file called `baud.awk' that contains the pattern /300/, and the action `print $1'. Here is the program:
/300/ { print $1 }
Let's also set FS
to be the `-' character, and run the
program on the file `BBS-list'. The following command prints a
list of the names of the bulletin boards that operate at 300 baud and
the first three digits of their phone numbers:
awk -F- -f baud.awk BBS-list
It produces this output:
aardvark     555
alpo
barfly       555
bites        555
camelot      555
core         555
fooey        555
foot         555
macfoo       555
sdace        555
sabafoo      555
Note the second line of output. If you check the original file, you will see that the second line looked like this:
alpo-net 555-3412 2400/1200/300 A
The `-' as part of the system's name was used as the field separator, instead of the `-' in the phone number that was originally intended. This demonstrates why you have to be careful in choosing your field and record separators.
The following program searches the system password file, and prints the entries for users who have no password:
awk -F: '$2 == ""' /etc/passwd
Here we use the `-F' option on the command line to set the field separator. Note that fields in `/etc/passwd' are separated by colons. The second field represents a user's encrypted password, but if the field is empty, that user has no password.
According to the POSIX standard, awk
is supposed to behave
as if each record is split into fields at the time that it is read.
In particular, this means that you can change the value of FS
after a record is read, but before any of the fields are referenced.
The value of the fields (i.e., how they were split) should reflect the old value of FS, not the new one.
However, many implementations of awk do not do this. Instead, they defer splitting the fields until a field reference actually happens, using the current value of FS! This behavior can be difficult to diagnose. The following example illustrates the results of the two methods. (The sed command prints just the first line of `/etc/passwd'.)
sed 1q /etc/passwd | awk '{ FS = ":" ; print $1 }'
will usually print
root
on an incorrect implementation of awk, while gawk will print something like
root:nSijPlPhZZwgE:0:0:Root:/:
There is an important difference between the two cases of `FS = " "' (a single blank) and `FS = "[ \t]+"' (which is a regular expression matching one or more blanks or tabs). For both values of FS, fields are separated by runs of blanks and/or tabs. However, when the value of FS is " ", awk will strip leading and trailing whitespace from the record, and then decide where the fields are.
For example, the following expression prints `b':
echo ' a b c d ' | awk '{ print $2 }'
However, the following prints `a':
echo ' a b c d ' | awk 'BEGIN { FS = "[ \t]+" } ; { print $2 }'
In this case, the first field is null.
The stripping of leading and trailing whitespace also comes into
play whenever $0
is recomputed. For instance, this pipeline
echo ' a b c d' | awk '{ print; $2 = $2; print }'
produces this output:
 a b c d
a b c d
The first print statement prints the record as it was read, with leading whitespace intact. The assignment to $2 rebuilds $0 by concatenating $1 through $NF together, separated by the value of OFS. Since the leading whitespace was ignored when finding $1, it is not part of the new $0. Finally, the last print statement prints the new $0.
The following table summarizes how fields are split, based on the value of FS.

FS == " "
    Fields are separated by runs of whitespace. Leading and trailing whitespace are ignored. This is the default.
FS == any single character
    Fields are separated by each occurrence of the character. Multiple successive occurrences delimit empty fields, as do occurrences at the beginning or end of the line.
FS == regexp
    Fields are separated by occurrences of characters that match regexp. Leading and trailing matches delimit empty fields.
(This section discusses an advanced, experimental feature. If you are
a novice awk
user, you may wish to skip it on the first reading.)
gawk 2.13 introduced a new facility for dealing with fixed-width fields with no distinctive field separator. Data of this nature arises typically in one of at least two ways: the input for old FORTRAN programs where numbers are run together, and the output of programs that did not anticipate the use of their output as input for other programs.
An example of the latter is a table where all the columns are lined up by the use of a variable number of spaces and empty fields are just spaces. Clearly, awk's normal field splitting based on FS will not work well in this case. (Although a portable awk program can use a series of substr calls on $0, this is awkward and inefficient for a large number of fields.)
The splitting of an input record into fixed-width fields is specified by assigning a string containing space-separated numbers to the built-in variable FIELDWIDTHS. Each number specifies the width of the field, including columns between fields. If you want to ignore the columns between fields, you can specify the width as a separate field that is subsequently ignored.
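Here is a tiny sketch of the idea (FIELDWIDTHS is specific to gawk; the input is an inline string):

echo 'abcdefgh' | gawk 'BEGIN { FIELDWIDTHS = "3 2 3" } { print $2 }'

This prints `de': the record is split into a three-character field, a two-character field, and a three-character field, regardless of the characters themselves.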
The following data is the output of the w utility. It is useful to illustrate the use of FIELDWIDTHS.
 10:06pm  up 21 days, 14:04,  23 users
User     tty       login  idle   JCPU   PCPU  what
hzuo     ttyV0     8:58pm            9      5  vi p24.tex
hzang    ttyV3     6:37pm    50                -csh
eklye    ttyV5     9:53pm            7      1  em thes.tex
dportein ttyV6     8:17pm  1:47                -csh
gierd    ttyD3    10:00pm     1                elm
dave     ttyD4     9:47pm            4      4  w
brent    ttyp0    26Jun91  4:46  26:46   4:41  bash
dave     ttyq4    26Jun9115days     46     46  wnewmail
The following program takes the above input, converts the idle time to
number of seconds and prints out the first two fields and the calculated
idle time. (This program uses a number of awk
features that
haven't been introduced yet.)
BEGIN  { FIELDWIDTHS = "9 6 10 6 7 7 35" }
NR > 2 {
    idle = $4
    sub(/^ */, "", idle)   # strip leading spaces
    if (idle == "")
        idle = 0
    if (idle ~ /:/) {
        split(idle, t, ":")
        idle = t[1] * 60 + t[2]
    }
    if (idle ~ /days/)
        idle *= 24 * 60 * 60
    print $1, $2, idle
}
Here is the result of running the program on the data:
hzuo      ttyV0  0
hzang     ttyV3  50
eklye     ttyV5  0
dportein  ttyV6  107
gierd     ttyD3  1
dave      ttyD4  0
brent     ttyp0  286
dave      ttyq4  1296000
Another (possibly more practical) example of fixed-width input data
would be the input from a deck of balloting cards. In some parts of
the United States, voters make their choices by punching holes in computer
cards. These cards are then processed to count the votes for any particular
candidate or on any particular issue. Since a voter may choose not to
vote on some issue, any column on the card may be empty. An awk
program for processing such data could use the FIELDWIDTHS
feature
to simplify reading the data.
This feature is still experimental, and will likely evolve over time.
In some data bases, a single line cannot conveniently hold all the information in one entry. In such cases, you can use multi-line records.
The first step in doing this is to choose your data format: when records are not defined as single lines, how do you want to define them? What should separate records?
One technique is to use an unusual character or string to separate records. For example, you could use the formfeed character (written \f in awk, as in C) to separate them, making each record a page of the file. To do this, just set the variable RS to "\f" (a string containing the formfeed character). Any other character could equally well be used, as long as it won't be part of the data in a record.
Another technique is to have blank lines separate records. By a special
dispensation, a null string as the value of RS
indicates that
records are separated by one or more blank lines. If you set RS
to the null string, a record always ends at the first blank line
encountered. And the next record doesn't start until the first nonblank
line that follows--no matter how many blank lines appear in a row, they
are considered one record-separator. (End of file is also considered
a record separator.)
The second step is to separate the fields in the record. One way to do this is to put each field on a separate line: to do this, just set the variable FS to the string "\n". (This simple regular expression matches a single newline.)
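For example, the following sketch (assuming a hypothetical file `addresses' whose entries are separated by blank lines) prints the first line of each entry:

awk 'BEGIN { RS = "" ; FS = "\n" } { print "Entry " NR ": " $1 }' addresses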
Another way to separate fields is to divide each of the lines into fields in the normal manner. This happens by default as a result of a special feature: when RS is set to the null string, the newline character always acts as a field separator. This is in addition to whatever field separations result from FS.
The original motivation for this special exception was probably so that you get useful behavior in the default case (i.e., FS == " ").
This feature can be a problem if you really don't want the newline character to separate fields, since there is no way to prevent it. However, you can work around this by using the split function to break up the record manually (see section Built-in Functions for String Manipulation).
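For instance, this sketch (the file name `data' is hypothetical) ignores the automatic field splitting entirely and breaks each blank-line-separated record at commas, even when a record spans several lines:

awk 'BEGIN { RS = "" } { n = split($0, piece, ","); print piece[1] }' data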
Explicit Input with getline
So far we have been getting our input files from awk's main input stream--either the standard input (usually your terminal) or the files specified on the command line. The awk language has a special built-in command called getline that can be used to read input under your explicit control.
This command is quite complex and should not be used by
beginners. It is covered here because this is the chapter on input.
The examples that follow the explanation of the getline
command
include material that has not been covered yet. Therefore, come back
and study the getline
command after you have reviewed the
rest of this manual and have a good knowledge of how awk
works.
getline returns 1 if it finds a record, and 0 if the end of the file is encountered. If there is some error in getting a record, such as a file that cannot be opened, then getline returns -1. In this case, gawk sets the variable ERRNO to a string describing the error that occurred.
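These return values are what make the common read loop work. Here is a minimal sketch, using a form of getline described below and a hypothetical file name `somefile':

while ((getline line < "somefile") > 0)
    print line

Testing for a value greater than zero stops the loop at end of file (0) and also on an error (-1), instead of looping forever.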
In the following examples, command stands for a string value that represents a shell command.
getline
The getline command can be used without arguments to read input from the current input file. All it does in this case is read the next input record and split it up into fields. This is useful if you've finished processing the current record, but you want to do some special processing right now on the next record. Here's an example:
awk '{
    if (t = index($0, "/*")) {
        if (t > 1)
            tmp = substr($0, 1, t - 1)
        else
            tmp = ""
        u = index(substr($0, t + 2), "*/")
        while (u == 0) {
            getline
            t = -1
            u = index($0, "*/")
        }
        if (u <= length($0) - 2)
            $0 = tmp substr($0, t + u + 3)
        else
            $0 = tmp
    }
    print $0
}'
This awk
program deletes all C-style comments, `/* ...
*/', from the input. By replacing the `print $0' with other
statements, you could perform more complicated processing on the
decommented input, like searching for matches of a regular
expression. (This program has a subtle problem--can you spot it?)
This form of the getline command sets NF (the number of fields; see section Examining Fields), NR (the number of records read so far; see section How Input is Split into Records), FNR (the number of records read from this input file), and the value of $0.
Note: the new value of $0 is used in testing the patterns of any subsequent rules. The original value of $0 that triggered the rule which executed getline is lost. By contrast, the next statement reads a new record but immediately begins processing it normally, starting with the first rule in the program. See section The next Statement.
getline var
This form of getline reads a record into the variable var. This is useful when you want your program to read the next record from the current input file, but you don't want to subject the record to the normal input processing.
For example, suppose the next line is a comment, or a special string,
and you want to read it, but you must make certain that it won't trigger
any rules. This version of getline
allows you to read that line
and store it in a variable so that the main
read-a-line-and-check-each-rule loop of awk
never sees it.
The following example swaps every two lines of input. For example, given:
wan
tew
free
phore
it outputs:
tew
wan
phore
free
Here's the program:
awk '{
    if ((getline tmp) > 0) {
        print tmp
        print $0
    } else
        print $0
}'
The getline function used in this way sets only the variables NR and FNR (and of course, var). The record is not split into fields, so the values of the fields (including $0) and the value of NF do not change.
getline < file
This form of the getline function takes its input from the file file. Here file is a string-valued expression that specifies the file name. `< file' is called a redirection since it directs input to come from a different place.
This form is useful if you want to read your input from a particular file, instead of from the main input stream. For example, the following program reads its input record from the file `foo.input' when it encounters a first field with a value equal to 10 in the current input file.
awk '{
    if ($1 == 10) {
        getline < "foo.input"
        print
    } else
        print
}'
Since the main input stream is not used, the values of NR and FNR are not changed. But the record read is split into fields in the normal manner, so the values of $0 and other fields are changed. So is the value of NF.
This does not cause the record to be tested against all the patterns in the awk program, in the way that would happen if the record were read normally by the main processing loop of awk. However, the new record is tested against any subsequent rules, just as when getline is used without a redirection.
getline var < file
This form of the getline function takes its input from the file file and puts it in the variable var. As above, file is a string-valued expression that specifies the file from which to read. In this version of getline, none of the built-in variables are changed, and the record is not split into fields. The only variable changed is var.
For example, the following program copies all the input files to the output, except for records that say `@include filename'. Such a record is replaced by the contents of the file filename.
awk '{
    if (NF == 2 && $1 == "@include") {
        while ((getline line < $2) > 0)
            print line
        close($2)
    } else
        print
}'
Note here how the name of the extra input file is not built into the program; it is taken from the data, from the second field on the `@include' line.
The close function is called to ensure that if two identical `@include' lines appear in the input, the entire specified file is included twice. See section Closing Input Files and Pipes.
One deficiency of this program is that it does not process nested `@include' statements the way a true macro preprocessor would.
command | getline
You can pipe the output of a command into getline. A pipe is simply a way to link the output of one program to the input of another. In this case, the string command is run as a shell command and its output is piped into awk to be used as input. This form of getline reads one record from the pipe.
For example, the following program copies input to output, except for lines that begin with `@execute', which are replaced by the output produced by running the rest of the line as a shell command:
awk '{ if ($1 == "@execute") { tmp = substr($0, 10) while ((tmp | getline) > 0) print close(tmp) } else print }'
The close function is called to ensure that if two identical `@execute' lines appear in the input, the command is run for each one. See section Closing Input Files and Pipes.
Given the input:
foo
bar
baz
@execute who
bletch
the program might produce:
foo
bar
baz
hack     ttyv0   Jul 13 14:22
hack     ttyp0   Jul 13 14:23    (gnu:0)
hack     ttyp1   Jul 13 14:23    (gnu:0)
hack     ttyp2   Jul 13 14:23    (gnu:0)
hack     ttyp3   Jul 13 14:23    (gnu:0)
bletch
Notice that this program ran the command who
and printed the result.
(If you try this program yourself, you will get different results, showing
you who is logged in on your system.)
This variation of getline splits the record into fields, sets the value of NF and recomputes the value of $0. The values of NR and FNR are not changed.
command | getline var
The output of the shell command command is sent through a pipe to getline and into the variable var. For example, the following program reads the current date and time into the variable current_time, using the date utility, and then prints it.
awk 'BEGIN { "date" | getline current_time close("date") print "Report printed on " current_time }'
In this version of getline, none of the built-in variables are changed, and the record is not split into fields.
If the same file name or the same shell command is used with getline more than once during the execution of an awk program, the file is opened (or the command is executed) only the first time. At that time, the first record of input is read from that file or command. The next time the same file or command is used in getline, another record is read from it, and so on.
This implies that if you want to start reading the same file again from
the beginning, or if you want to rerun a shell command (rather than
reading more output from the command), you must take special steps.
What you must do is use the close
function, as follows:
close(filename)
or
close(command)
The argument filename or command can be any expression. Its value must exactly equal the string that was used to open the file or start the command--for example, if you open a pipe with this:
"sort -r names" | getline foo
then you must close it with this:
close("sort -r names")
Once this function call is executed, the next getline from that file or command will reopen the file or rerun the command.
close returns a value of zero if the close succeeded. Otherwise, the value will be non-zero. In this case, gawk sets the variable ERRNO to a string describing the error that occurred.
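A short sketch of checking that value, reusing the `sort -r names' pipe from above:

if (close("sort -r names") != 0)
    print "close failed"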