The general format of the program call is one of:
recode [option]... [charset]
recode [option]... [before]:[after] [file]...
The second form is the common case. Each file is read assuming it is coded with charset before, and is recoded over itself so as to use charset after. If no file is given, the program acts as a filter instead and recodes standard input to standard output.
The available options are:
-C
--copyright
-a
--auto-check
recode diagnoses itself by analyzing the connectivity of the various charsets and reports on standard output. No file will be recoded.
There might be one non-option argument, in which case it is interpreted as a charset name, possibly abbreviated to any unambiguous prefix. recode will then study all recodings having the given charset as a starting or ending point. If there is no such non-option argument, recode will study all possible recodings.
For each possible pair of different charsets, it prints on standard output how many single steps are needed for achieving the recoding and how many can be saved by step merging. If a recoding cannot be done, the word `UNACHIEVABLE' is printed instead. However, this special line is suppressed entirely if option -x specifies some charset to ignore.
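The shape of such a report can be illustrated with a minimal sketch. It assumes that the available single steps form a directed graph and that the number of steps needed is the length of the shortest path; this is an assumption for illustration, not a description of how recode actually computes its report, and step merging is not modeled. The charset names A, B and C and the function names are hypothetical.

```python
from collections import deque


def steps_needed(graph, start, goal):
    """Return the fewest single steps from start to goal, or None if unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == goal:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def report(graph):
    """Build one report line per ordered pair of different charsets."""
    charsets = sorted(set(graph) | {t for targets in graph.values() for t in targets})
    lines = []
    for before in charsets:
        for after in charsets:
            if before == after:
                continue
            count = steps_needed(graph, before, after)
            result = "UNACHIEVABLE" if count is None else f"{count} step(s)"
            lines.append(f"{before} -> {after}: {result}")
    return lines


if __name__ == "__main__":
    # Hypothetical single steps: A can be recoded to B, and B to C.
    for line in report({"A": ["B"], "B": ["C"], "C": []}):
        print(line)
```

Since every ordered pair is examined, the amount of output grows with the square of the number of charsets.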
The option -hname affects the resulting output, because there are more merging rules when this option is in effect. Other options affect the result: -d, -g and, notably, -s.
There was a time, in GNU recode development, when this option was reasonably interesting. With the greater number of handled charsets, it became inordinately slow, taking on the order of one hour of wall clock time, while generating a great deal of output. This option is no longer practical when used without a charset parameter. However, it can be made slightly more usable together with option `-x.', which effectively disables most RFC 1345 charsets from the report.
-c
--colons
With the Texte Easy French conventions, use the colon `:' instead of the double quote `"' for marking diaeresis.
See section ASCII with easy French conventions.
-d
--diacritics
When converting to or from the HTML or LaTeX charset, limit conversion to diacritics only. This is particularly useful when people write what would be valid HTML, TeX or LaTeX files, if only they were using the provided sequences for applying diacritics instead of using the diacriticized characters directly from the underlying character set.
While converting to the HTML or LaTeX charset, this option assumes that non-diacriticized special characters are properly coded or protected, so recode will transmit them literally. While converting the other way, this option prevents all attempts at recognizing coded or protected versions of non-diacriticized special characters of the other charset. See section World Wide Web representations. See section ASCII with LaTeX codes.
-f
--force
recode will protect you against recoding a file irreversibly over itself. However, please keep vividly in mind that this protection is not yet active in recode. Once the protection is enforced, option `-f' will become mandatory for a file to be replaced by some recoding of its contents, if such conversion loses information. For now, recode acts as if option `-f' was always selected. In preparation for the time this option becomes mandatory, you may start using `-f' right away in scripts calling recode, when you know this is the reasonable thing to do.
-g
--graphics
In the IBM-PC charset, characters 176 to 223 are used for constructing rulers and boxes, using simple or double horizontal or vertical lines. This option forces the automatic selection of ASCII characters for approximating these rulers and boxes, at the cost of making the transformation irreversible. Option -g implies -f.
-h[name]
--header[=name]
recode writes a C source file on standard output and exits. This source is meant to be included in a regular C program: its purpose is to declare and initialize an array, named name, which represents the requested recoding. If name is not specified, then it defaults to before_to_after, where before is the starting charset and after is the goal charset.
Even though recode tries its best, this option does not always succeed in producing the requested C table. It will succeed, however, provided the recoding can be internally represented by only one step after the optimization phase, and this merged step conveys a one-to-one or a one-to-many explicit table. This is all fairly technical; it is best to try and see.
Beware that other options might affect the produced C tables: -d, -g and, particularly, -s.
-i
--sequence=files
-l[format]
--list[=format]
If there are no non-option arguments, recode ignores the format value of the option and writes a sorted list of charset names on standard output, one per line. When a charset name has aliases or synonyms, they follow the true charset name on its line, presented in lexicographical order from left to right. This list is over one hundred lines long. It is best used with grep, as in:
recode -l | grep greek
There might be one non-option argument, in which case it is interpreted as a charset name, possibly abbreviated to any unambiguous prefix. This particular usage of the -l option is obeyed only for charsets having an RFC 1345 style internal description. Even though most charsets have this property, some do not, and option -l cannot be used to detail these particular charsets. To find out whether a particular charset can be listed this way, simply try it and see whether it works. The format value of the option is a keyword from the following list. Keywords may be abbreviated by dropping suffix letters, and even reduced to the first letter only:
decimal
octal
hexadecimal
full
When option -l is used together with a charset argument, the format defaults to decimal.
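The abbreviation rule for the format keyword can be sketched as a simple prefix lookup. This is only an illustration of the rule as stated above, not recode's own code; the function name resolve_format is hypothetical, and rejecting unknown values with an error is an assumption.

```python
# The four keywords listed in this section.
FORMATS = ("decimal", "octal", "hexadecimal", "full")


def resolve_format(value=None):
    """Expand an abbreviated format keyword; no value means decimal."""
    if not value:
        return "decimal"
    matches = [keyword for keyword in FORMATS if keyword.startswith(value)]
    if len(matches) != 1:
        # Assumption: anything that is not a prefix of a keyword is rejected.
        raise ValueError(f"unrecognized format: {value!r}")
    return matches[0]


if __name__ == "__main__":
    for abbreviation in (None, "d", "o", "hex", "f"):
        print(abbreviation, "->", resolve_format(abbreviation))
```

Because the four keywords start with different letters, a single letter is always enough to select one of them.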
popen(3) library call, all operating in parallel. In filter mode, at the cost of the overhead of multiple program initializations, recoded data will be available soon after the program starts, even if many elementary recoding steps are required.
If, at installation time, the popen(3) call is said to be unavailable, selecting option -o is equivalent to selecting option -i.
pipe(2) system call. All copies of the program operate in parallel. This method is similar to the method used through option -o, but is more efficient because the program initializes only once. This is the default behavior in filter mode. If this option is used when files are recoded over themselves, this should also save disk space because some temporary files might not be needed, at the cost of more system overhead.
If, at installation time, the pipe(2) call is said to be unavailable, selecting option -p is equivalent to selecting option -o. If both pipe(2) and popen(3) are unavailable, selecting option -p is equivalent to selecting option -i.
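The fallback between options -p, -o and -i described above can be summarized in a short sketch. The function effective_sequence and its parameters are hypothetical names used only for illustration; the sketch models the documented equivalences and nothing more.

```python
def effective_sequence(requested, have_pipe=True, have_popen=True):
    """Return the option letter ('p', 'o' or 'i') that actually takes effect.

    requested is the option letter selected by the user; have_pipe and
    have_popen stand for what was found at installation time.
    """
    if requested == "p" and not have_pipe:
        requested = "o"  # without pipe(2), -p is equivalent to -o
    if requested == "o" and not have_popen:
        requested = "i"  # without popen(3), -o is equivalent to -i
    return requested


if __name__ == "__main__":
    print(effective_sequence("p", have_pipe=False))
    print(effective_sequence("p", have_pipe=False, have_popen=False))
```

Applying the two rules in this order also covers the case where both calls are unavailable, since -p first degrades to -o and then to -i.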
This option is set automatically for the child processes when recode splits itself into many collaborating copies. Doing so, the diagnostic is issued only once by the parent. See options -o and -p.
With this option, recode is very strict while recoding a file, simply losing in the transformation any character which is not explicitly mapped from one charset to another. This option renders the recoding less likely to be reversible, so it also implies option -f. See also section Reversibility issues.
This information will appear once or twice. It is shown a second time only when the optimization and step merging phase succeeds in creating a new single step.
This option also has a second effect. The program will print on `stderr' one message per file recoded, so as to keep the user informed of the progress of the command.
An easy way to know beforehand the sequence or quality of a recoding is to use a command such as:
recode -v before:after < /dev/null
This relies on the fact that, so far in recode, an empty input file produces an empty output file.
recode in using an alternate recoding path.
charset may be abbreviated to any unambiguous prefix. For convenience, the value `.' is an alias for `RFC 1345', so the option `-x.' effectively disables all RFC 1345 tables at once.
The before:after argument specifies the start charset and the goal charset. The allowable values for before or after are described in the remainder of this document. Charsets may have predefined alternate names, or aliases, which are equally acceptable.
In the before:after argument only, a backslash may be used to quote the next character of a charset name. This might be useful for preventing a colon from being mistakenly interpreted as the separator between before and after. Alternatively, the colon could simply be omitted from the name, because while recognizing a charset name or alias, GNU recode ignores all characters besides letters and digits. There is also no distinction between upper and lower case. Charset names or aliases may always be abbreviated to any unambiguous prefix.
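The matching rules above (only letters and digits count, case is ignored, any unambiguous prefix is accepted) can be sketched as follows. This is a minimal illustration, not GNU recode's actual lookup code: the function names are hypothetical, the list of known names is just a sample taken from this section, and preferring an exact match over a prefix match is an assumption.

```python
def normalize(name):
    """Keep only ASCII letters and digits, folded to lower case."""
    return "".join(ch for ch in name if ch.isascii() and ch.isalnum()).lower()


def resolve(name, known):
    """Return the single known charset name that `name` designates."""
    key = normalize(name)
    table = {normalize(candidate): candidate for candidate in known}
    if key in table:
        # Assumption: an exact match wins over any longer name it prefixes.
        return table[key]
    matches = sorted(c for norm, c in table.items() if norm.startswith(key))
    if not matches:
        raise LookupError(f"unknown charset: {name!r}")
    if len(matches) > 1:
        raise LookupError(f"ambiguous charset {name!r}: {', '.join(matches)}")
    return matches[0]


if __name__ == "__main__":
    # Sample names only; the real list is much longer.
    sample = ["Latin-1", "IBM-PC", "HTML", "LaTeX", "Texte"]
    print(resolve("LATIN_1", sample))
    print(resolve("ibm", sample))
```

With this sample list, `lat` is rejected as ambiguous because it is a prefix of both `latin1` and `latex` once the names are normalized, whereas `lati` is enough to select Latin-1.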
One or both of the before or after keywords may be omitted, but the colon which separates them cannot. An omitted keyword implies the usual or default code in usage on the system where this program is installed. Usually, this default code is Latin-1 for UNIX systems or IBM-PC for MS-DOS machines.
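How a before:after argument splits into its two keywords can be sketched as below. This is an illustrative model under stated assumptions, not the program's parser: the name split_request is hypothetical, the default charset is passed in as a parameter, and treating any further unquoted colon as part of the after keyword is an assumption.

```python
def split_request(arg, default="Latin-1"):
    """Split a before:after argument into a (before, after) pair.

    A backslash quotes the next character, so a quoted colon does not
    separate. An omitted keyword is replaced by `default`.
    """
    parts = [""]
    chars = iter(arg)
    for ch in chars:
        if ch == "\\":
            # Take the next character literally (nothing if at end of input).
            parts[-1] += next(chars, "")
        elif ch == ":" and len(parts) == 1:
            parts.append("")  # first unquoted colon: start the after keyword
        else:
            parts[-1] += ch
    if len(parts) != 2:
        raise ValueError("the colon separating before and after is required")
    before, after = parts
    return (before or default, after or default)


if __name__ == "__main__":
    print(split_request("latin1:ibmpc"))
    print(split_request(":ibmpc"))
    print(split_request("latin1:", default="IBM-PC"))
```

The default parameter stands for the system default described above, for example Latin-1 on UNIX systems or IBM-PC on MS-DOS machines.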