Even though GNU recode tries hard to keep recodings reversible, you
should not develop unconditional confidence in its ability to do so.
Keep only reasonable expectations about reverse recodings. In
particular, consider:
-s is specified.
IBM-PC to Latin-1. End of lines are represented as `\r\n' in IBM-PC
and as `\n' in Latin-1. There is no way by which a faulty IBM-PC file
containing a `\n' not preceded by `\r' can be translated into a Latin-1
file, and then back.
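The end-of-line case can be illustrated with a minimal Python sketch. It models only the line-ending part of the recoding, and the function names are hypothetical; it is not how recode itself is implemented.

```python
def ibmpc_to_latin1_eol(data: bytes) -> bytes:
    # Forward direction: each CR LF pair becomes a single LF.
    return data.replace(b"\r\n", b"\n")


def latin1_to_ibmpc_eol(data: bytes) -> bytes:
    # Reverse direction: each LF becomes a CR LF pair.
    return data.replace(b"\n", b"\r\n")


def round_trip(data: bytes) -> bytes:
    # Recode to Latin-1 conventions, then back to IBM-PC conventions.
    return latin1_to_ibmpc_eol(ibmpc_to_latin1_eol(data))


well_formed = b"one\r\ntwo\r\n"
faulty = b"one\ntwo\r\n"  # the first line ends with a bare LF

print(round_trip(well_formed) == well_formed)  # conforming input survives
print(round_trip(faulty) == faulty)            # the bare LF does not
```

The bare LF is indistinguishable from a converted CR LF once the file is in Latin-1 form, so the reverse step cannot know it should leave it alone.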
LaTeX charset file, the string `\^\i{}' could be recoded back and
forth through another charset and become `\^{\i}'. Even if the
resulting file is equivalent to the original one, it is not identical.
Unless option -s is used, recode automatically tries to fill mappings
with invented correspondences, often making them fully reversible.
This filling is not done at random. The algorithm tries to stick to
the identity mapping and, when this is not possible, it prefers
generating many small permutation cycles, each involving only a few
codes.
For example, here is how IBM-PC code 186 gets translated to control-U
in Latin-1. Control-U is code 21. Code 21 is the IBM-PC section sign,
which is 167 in Latin-1. recode cannot reciprocate by mapping 167 back
to 21, because 167 is the masculine ordinal indicator on the IBM-PC,
which is 186 in Latin-1. Code 186 on the IBM-PC has no Latin-1
equivalent; by assigning it back to 21, recode closes this short
permutation loop.
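The filling strategy and the 186 example can be sketched in Python as follows. This is only one plausible reading of the description above, not recode's actual implementation: the function `fill_mapping' is hypothetical, and the partial table holds just the two correspondences mentioned in the text instead of the full IBM-PC to Latin-1 table.

```python
def fill_mapping(known, size=256):
    """Extend a partial one-to-one mapping into a permutation of range(size).

    Assumed strategy: a code that nothing maps to is sent to itself
    (identity); a code that ends a chain of known correspondences is
    sent to the start of that chain, which closes a short cycle.
    """
    inverse = {dst: src for src, dst in known.items()}
    if len(inverse) != len(known):
        raise ValueError("the known mapping must be one-to-one")
    filled = dict(known)
    for code in range(size):
        if code in known:
            continue
        # Walk backwards along the chain of known correspondences
        # ending at this code; stop at a code that has no preimage.
        start = code
        while start in inverse:
            start = inverse[start]
        # If nothing maps to `code`, then start == code (identity).
        filled[code] = start
    return filled


# Only the two correspondences quoted in the text (IBM-PC -> Latin-1):
# the section sign 21 -> 167 and the masculine ordinal 167 -> 186.
known = {21: 167, 167: 186}
table = fill_mapping(known)

print(table[186])  # 21: the loop 21 -> 167 -> 186 -> 21 is closed
print(table[65])   # 65: untouched codes keep the identity mapping
```

Because every chain is closed onto its own starting code, the filled table is a permutation, and so it can be inverted to undo the recoding.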
As a consequence of this map filling, recode may sometimes produce
funny characters. They may look annoying, but they are nevertheless
helpful when you change your mind and want to revert to the prior
recoding. If you cannot stand these, use option -s, which asks for a
very strict recoding.
This map filling sometimes has another surprising consequence. In some
cases, recode seems to copy a file without recoding it, but in fact it
does recode it. As an illuminating example, suppose you requested:

    recode l1:us < File-Latin1 > File-ASCII
    cmp File-Latin1 File-ASCII

then cmp will not report any difference. This is quite normal.
Latin-1 gets correctly recoded to ASCII for the characters the two
charsets have in common (the first 128 characters, in this case). The
remaining 128 Latin-1 characters have no ASCII counterpart. Instead of
losing them, recode elects to map them to unspecified characters of
ASCII, thus making the recoding reversible. The simplest way of
achieving this is merely to keep those last 128 characters unchanged.
The overall effect is copying the file verbatim.
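A small Python model shows why the output ends up byte-for-byte identical. The table below is built on the assumption stated in the text, namely that the upper 128 codes are simply kept unchanged; `recode_bytes' is a hypothetical helper, not part of recode.

```python
# Latin-1 and ASCII agree on codes 0..127.
table = {code: code for code in range(128)}

# Assumed filling: the upper 128 codes have no ASCII counterpart,
# so each one is mapped to itself to keep the table reversible.
for code in range(128, 256):
    table.setdefault(code, code)


def recode_bytes(data: bytes, mapping: dict) -> bytes:
    # Apply the mapping to every byte of the input.
    return bytes(mapping[byte] for byte in data)


latin1_text = "caf\u00e9 d\u00e9j\u00e0 vu\n".encode("latin-1")
output = recode_bytes(latin1_text, table)

print(output == latin1_text)  # the "recoded" file equals its input
```

The recoding did take place; the table just happens to be the identity on all 256 codes.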
If you feel this behavior is too generous and you do not care about
reversibility, simply use option -s. recode will then strictly map
only those Latin-1 characters which have an ASCII equivalent, and will
merely drop those which do not. You are then more likely to observe a
difference between the input and the output file.
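The strict behavior described here can be modelled in the same spirit. This sketch assumes, as the text states, that characters without an ASCII equivalent are dropped; `recode_strict' is a hypothetical helper that only mimics that description.

```python
# Strict table: only the codes shared by Latin-1 and ASCII are mapped.
strict_table = {code: code for code in range(128)}


def recode_strict(data: bytes, mapping: dict) -> bytes:
    # Keep mapped bytes and drop every byte that has no entry.
    return bytes(mapping[byte] for byte in data if byte in mapping)


latin1_text = "caf\u00e9\n".encode("latin-1")
output = recode_strict(latin1_text, strict_table)

print(output)                 # the e-acute byte is gone
print(output == latin1_text)  # input and output now differ
```

Once a byte has been dropped, no reverse recoding can restore it, which is the price paid for strictness.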