
How to use this program

The general format of the program call is one of:

recode [option]... [charset]
recode [option]... [before]:[after] [file]...

The second form is the common case. Each file is read assuming it is coded with charset before, and is recoded over itself so as to use charset after. If no file is given, the program acts as a filter instead, recoding standard input to standard output.
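The two modes described above can be pictured with a minimal Python sketch. It does not call recode: Python's built-in codecs serve as a stand-in for the recoding itself, and the names `recode_bytes` and `recode_in_place` are hypothetical, chosen only for this illustration.

```python
import os
import tempfile


def recode_bytes(data, before, after):
    """Filter mode: decode with charset `before`, re-encode with `after`."""
    return data.decode(before).encode(after)


def recode_in_place(path, before, after):
    """File mode: replace the contents of `path` with their recoded form."""
    with open(path, "rb") as handle:
        original = handle.read()
    recoded = recode_bytes(original, before, after)
    # Write to a temporary file in the same directory, then swap it in,
    # so a failure part-way through does not leave a half-written file.
    directory = os.path.dirname(os.path.abspath(path))
    fd, temp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as handle:
        handle.write(recoded)
    os.replace(temp_path, path)
```

In this sketch the file form is simply the filter form wrapped in a read and a write back to the same path, which mirrors how the two call forms relate.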

The available options are:

-C
--copyright
Given this option, all other parameters and options are ignored. The program briefly prints the copyright and copying conditions. See the file `COPYING' in the distribution for the full statement of these conditions.

-a
--auto-check
In this special mode, recode diagnoses itself by analyzing the connectivity of the various charsets and reporting on standard output. No file will be recoded.

There may be one non-option argument, in which case it is interpreted as a charset name, possibly abbreviated to any unambiguous prefix. recode will then study all recodings having the given charset as a starting or ending point. If there is no such non-option argument, recode will study all possible recodings.

For each possible pair of different charsets, it prints on standard output how many single steps are needed for achieving the recoding and how many can be saved by step merging. If a recoding cannot be done, the word `UNACHIEVABLE' is printed instead. However, this special line is completely suppressed if option -x specified some charset to ignore.

The option -hname affects the resulting output, because there are more merging rules when this option is in effect. Other options affect the result: -d, -g and, notably, -s.

There was a time, in GNU recode development, when this option was reasonably interesting. As the number of handled charsets grew, it became inordinately slow, taking on the order of one hour of wall clock time while generating a great deal of output. This option is no longer practical when used without a charset parameter. It can be made slightly more usable, however, by combining it with option `-x.', which effectively disables most RFC 1345 charsets from the report.

-c
--colons
With Texte Easy French conventions, use the colon : instead of the double-quote " for marking diaeresis. See section ASCII with easy French conventions.

-d
--diacritics
While converting to or from the HTML or LaTeX charsets, limit conversion to diacritics only. This is particularly useful when people write what would be valid HTML, TeX or LaTeX files, if only they had used the provided sequences for applying diacritics instead of taking the diacriticized characters directly from the underlying character set.

While converting to HTML or LaTeX charset, this option assumes that non-diacriticized special characters are properly coded or protected, so recode will transmit them literally. While converting the other way, this option prevents all attempts at recognizing coded or protected versions of non-diacriticized special characters of the other charset. See section World Wide Web representations. See section ASCII with LaTeX codes.

-f
--force
It is planned that some future version of recode will protect you against recoding a file irreversibly over itself. However, please keep vividly in mind that this protection is not yet active in recode. Once the protection is enforced, option `-f' will become mandatory for a file to be replaced by some recoding of its contents, if such a conversion loses information. For now, recode acts as if option `-f' were always selected.

In preparation for the time this option becomes mandatory, you may start using `-f' right away in scripts calling recode, when you know this is the reasonable thing to do.

-g
--graphics
This option is only meaningful while getting out of the IBM-PC charset. In this charset, characters 176 to 223 are used for constructing rulers and boxes, using single or double horizontal or vertical lines. This option forces the automatic selection of ASCII characters for approximating these rulers and boxes, at the cost of making the transformation irreversible. Option -g implies -f.
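To see why such an approximation cannot be reversed, here is a small Python sketch. It is not recode's actual table: the choice of `-`, `|`, `+` and `#` as replacements is an assumption made for illustration, Python's `cp437` codec stands in for the IBM-PC charset, and the function name `approximate_graphics` is hypothetical.

```python
import unicodedata

# Assumed approximations; recode's real choices may differ.
HORIZONTAL = "\u2500\u2550"   # single and double horizontal lines
VERTICAL = "\u2502\u2551"     # single and double vertical lines


def approximate_graphics(data):
    """Decode code page 437 bytes, flattening codes 176 to 223 to ASCII."""
    out = []
    for byte in data:
        char = bytes([byte]).decode("cp437")
        if not 176 <= byte <= 223:
            out.append(char)      # outside the graphics range: keep as is
        elif char in HORIZONTAL:
            out.append("-")
        elif char in VERTICAL:
            out.append("|")
        elif unicodedata.name(char).startswith("BOX DRAWINGS"):
            out.append("+")       # corners, tees and crossings
        else:
            out.append("#")       # shades and solid blocks
    return "".join(out)
```

Every corner, tee and crossing collapses to the same `+` in this sketch, so the original code can no longer be recovered from the output. That many-to-one mapping is what makes the transformation irreversible.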

-h[name]
--header[=name]
Instead of recoding files, recode writes a C source file on standard output and exits. This source is meant to be included in a regular C program: its purpose is to declare and initialize an array, named name, which represents the requested recoding. If name is not specified, then it defaults to before_to_after, where before is the starting charset and after is the goal charset.

Even though recode tries its best, this option does not always succeed in producing the requested C table. It does succeed, however, provided the recoding can be internally represented by only one step after the optimization phase, and provided this merged step conveys a one-to-one or a one-to-many explicit table. But this is all fairly technical. Better try and see!

Beware that other options might affect the produced C tables: -d, -g and, particularly, -s.

-i
--sequence=files
When the recoding requires a combination of two or more elementary recoding steps, this option forces many passes over the data, using intermediate files between passes. This is the default behavior when files are recoded over themselves. If this option is selected in filter mode, that is, when the program reads standard input and writes standard output, it might take longer for programs further down the pipe chain to start receiving some recoded data.
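The multi-pass behavior can be sketched in Python as follows. This is an illustration under stated assumptions, not recode's implementation: each elementary step is modeled as a function from bytes to bytes, and the name `run_in_passes` is hypothetical.

```python
import os
import tempfile


def run_in_passes(source_path, target_path, steps):
    """Apply each step in its own pass, with a temporary file between passes."""
    current = source_path
    temporaries = []
    try:
        for index, step in enumerate(steps):
            if index == len(steps) - 1:
                out_path = target_path          # last pass writes the result
            else:
                fd, out_path = tempfile.mkstemp()
                os.close(fd)
                temporaries.append(out_path)    # intermediate file
            with open(current, "rb") as src:
                data = src.read()
            with open(out_path, "wb") as dst:
                dst.write(step(data))
            current = out_path
    finally:
        for path in temporaries:
            os.remove(path)
```

Because each pass must finish before the next one starts, nothing reaches the final output until the last pass runs. This is why a consumer further down a pipe chain may wait longer in this mode.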

-l[format]
--list[=format]
This option asks for information about all charsets, or about one particular charset. No file will be recoded.

If there are no non-option arguments, recode ignores the format value of the option and writes a sorted list of charset names on standard output, one per line. When a charset name has aliases or synonyms, they follow the true charset name on its line, presented in lexicographical order from left to right. This list is over one hundred lines long. It is best used with grep, as in:

recode -l | grep greek

There may be one non-option argument, in which case it is interpreted as a charset name, possibly abbreviated to any unambiguous prefix. This particular usage of the -l option is obeyed only for charsets having an RFC 1345 style internal description. Although most charsets have this property, some do not, and option -l cannot be used to detail those particular charsets. To find out whether a particular charset can be listed this way, simply try it and see if it works.

The format value of the option is a keyword from the following list. Keywords may be abbreviated by dropping suffix letters, and even reduced to the first letter only:

decimal
This format asks for the production on standard output of a concise tabular display of the charset, in which character code values are expressed in decimal.

octal
This format uses octal instead of decimal in the concise tabular display of the charset.

hexadecimal
This format uses hexadecimal instead of decimal in the concise tabular display of the charset.

full
This format requests an extensive display of the charset on standard output, using one line per character showing its decimal, hexadecimal and octal code values, together with a descriptive comment which is the ISO 10646 character name.

When option -l is used together with a charset argument, the format defaults to decimal.
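The abbreviation rule for format keywords can be sketched in a few lines of Python. This is only an illustration of the rule as stated above, not recode's own parsing code, and the function name `resolve_format` is hypothetical.

```python
FORMATS = ("decimal", "octal", "hexadecimal", "full")


def resolve_format(value=None):
    """Return the full keyword for a possibly abbreviated format value."""
    if not value:
        return "decimal"  # documented default when a charset is given
    matches = [name for name in FORMATS if name.startswith(value)]
    if len(matches) != 1:
        raise ValueError("unknown or ambiguous format: %r" % value)
    return matches[0]
```

Since the four keywords start with four different letters, a single letter is always enough to select one of them.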

  • -o
  • --sequence=popen
    When the recoding requires a combination of two or more elementary recoding steps, this option forces the creation of a chain of program instances initiated through the popen(3) library call, all operating in parallel. In filter mode, at the cost of multiple program initializations, recoded data becomes available soon after the program starts, even if many elementary recoding steps are required.

    If, at installation time, the popen(3) call is said to be unavailable, selecting option -o is equivalent to selecting option -i.

  • -p
  • --sequence=pipe
    When the recoding requires a combination of two or more elementary recoding steps, this option forces the program to fork itself into a few copies interconnected with pipes, using the pipe(2) system call. All copies of the program operate in parallel. This method is similar to the method used through option -o, but is more efficient because the program initializes only once. This is the default behavior in filter mode. If this option is used when files are recoded over themselves, it should also save disk space because some temporary files might not be needed, at the cost of more system overhead.

    If, at installation time, the pipe(2) call is said to be unavailable, selecting option -p is equivalent to selecting option -o. If both pipe(2) and popen(3) are unavailable, selecting option -p is equivalent to selecting option -i.

  • -q
  • --quiet
  • --silent
    This option has the sole purpose of inhibiting diagnostic messages about irreversible recodings.

    This option is set automatically for the child processes when recode splits itself into several collaborating copies. That way, the diagnostic is issued only once, by the parent. See options -o and -p.

  • -s
  • --strict
    By using this option, the user requests that recode be very strict while recoding a file, simply losing in the transformation any character which is not explicitly mapped from one charset to the other. This option makes the recoding less likely to be reversible, so it also implies option -f. See also section Reversibility issues.

  • -t
  • --touch
    The touch option is meaningful only when files are recoded over themselves. Without it, the time-stamps associated with files are preserved, to reflect the fact that changing the code of a file does not really alter its informational contents. When the user wants the recoded files to be time-stamped at the recoding time, this option inhibits the automatic protection of the time-stamps.

  • -v
  • --verbose
    Before doing any recoding, the program will first print on `stderr' the list of all intermediate charsets planned for recoding, starting with the before charset and ending with the after charset. It also prints an indication of the recoding quality, as one of the words `reversible', `one to one', `one to many', `many to one' or `many to many'.

    This information will appear once or twice. It is shown a second time only when the optimization and step merging phase succeeds in creating a new single step.

    This option also has a second effect. The program will print on `stderr' one message per file recoded, to keep the user informed of the progress of the command.

    An easy way to know beforehand the sequence or quality of a recoding is to use a command such as:

    recode -v before:after < /dev/null
    

    using the fact that, so far in recode, an empty input file produces an empty output file.

  • -x=charset
  • --ignore=charset
    This option tells the program to ignore any recoding path through the specified charset, thereby disabling any single step using this charset as a start or end point. This may be used when the user wants to force recode to use an alternate recoding path.

    charset may be abbreviated to any unambiguous prefix. For convenience, the value `.' is an alias for `RFC 1345', so the option -x. effectively disables all RFC 1345 tables at once.

  • --help
    The program merely prints a page of help on standard output, and exits without doing any recoding.

  • --version
    The program merely prints its version numbers on standard output, and exits without doing anything else.

  • The before:after argument specifies the start charset and the goal charset. The allowable values for before or after are described in the remainder of this document. Charsets may have predefined alternate names, or aliases, which are equally acceptable.

    In the before:after argument only, a backslash may be used to quote the next character of a charset name. This might be useful to prevent a colon from being mistakenly interpreted as the separator between before and after. Alternatively, such a colon could simply be omitted from the name, because while recognizing a charset name or alias, GNU recode ignores all characters besides letters and digits. There is also no distinction between upper and lower case. Charset names or aliases may always be abbreviated to any unambiguous prefix.

    One or both of the before or after keywords may be omitted, but the colon which separates them cannot. An omitted keyword implies the usual or default code in use on the system where this program is installed. Usually, this default code is Latin-1 for UNIX systems or IBM-PC for MS-DOS machines.
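The rules above for reading the before:after argument can be summarized in a short Python sketch. It is an illustration only, not recode's parser. The function names are hypothetical, the list of known charsets passed in is supplied by the caller, and letting an exact match win over a longer name is an assumption, not something the text above states.

```python
def split_request(argument):
    """Split 'before:after' at the first colon not quoted by a backslash."""
    parts, current, chars = [], [], iter(argument)
    for ch in chars:
        if ch == "\\":
            current.append(next(chars, ""))   # quoted: take next char literally
        elif ch == ":" and not parts:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    if not parts:
        raise ValueError("the colon between before and after is required")
    return parts[0], "".join(current)


def normalize(name):
    """Keep only letters and digits, and ignore case."""
    return "".join(ch for ch in name if ch.isalnum()).lower()


def resolve_charset(name, known, default):
    """Return the entry of `known` that `name` denotes or abbreviates."""
    key = normalize(name)
    if not key:
        return default                        # omitted keyword
    exact = [c for c in known if normalize(c) == key]
    if exact:
        return exact[0]                       # assumption: exact match wins
    matches = [c for c in known if normalize(c).startswith(key)]
    if len(matches) != 1:
        raise ValueError("unknown or ambiguous charset: %r" % name)
    return matches[0]
```

With an illustrative list such as `["Latin-1", "Latin-2", "IBM-PC", "ASCII"]`, the request `:ibm` would split into an empty before keyword, which falls back to the default, and `ibm`, which resolves to `IBM-PC`. The bare prefix `latin` would be rejected as ambiguous.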
