
Reversibility issues

Although GNU recode tries hard to keep recodings reversible, you should not place unconditional confidence in its ability to do so. Keep only reasonable expectations about reverse recodings. In particular, consider:

Most transformations are fully reversible for all inputs, but lose this property whenever -s is specified.
A few transformations are not meant to be reversible, by design.
Reversibility sometimes depends on actual file contents and cannot be ascertained beforehand, without reading the file.
Reversibility is never absolute across successive versions of this program. Even correcting a small bug in a mapping could induce slight discrepancies later.
Reversibility is easily lost by merging. This is best explained through an example. If you reversibly recode a file from charset `A' to charset `B', then you reversibly recode the result from charset `B' to charset `C', you cannot expect to recover the original file by merely recoding from charset `C' directly to charset `A'. You will instead have to recode from charset `C' back to charset `B', and only then from charset `B' to charset `A'.
Faulty files create a particular problem. Consider an example, recoding from IBM-PC to Latin-1. End of lines are represented as `\r\n' in IBM-PC and as `\n' in Latin-1. There is no way for a faulty IBM-PC file containing a `\n' not preceded by `\r' to be translated into a Latin-1 file and then back to the identical original.
There is another difficulty arising from code equivalences. For example, in a LaTeX charset file, the string `\^\i{}' could be recoded back and forth through another charset and become `\^{\i}'. Even if the resulting file is equivalent to the original one, it is not identical.
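
The faulty-file case above can be illustrated with a minimal Python sketch. It models only the end-of-line step of an IBM-PC to Latin-1 recoding and is not recode's actual implementation; the function names `ibmpc_to_latin1_eol` and `latin1_to_ibmpc_eol` are hypothetical.

```python
def ibmpc_to_latin1_eol(data: bytes) -> bytes:
    """Hypothetical helper: turn IBM-PC line endings (CR LF) into LF."""
    return data.replace(b"\r\n", b"\n")


def latin1_to_ibmpc_eol(data: bytes) -> bytes:
    """Hypothetical helper: turn LF line endings into CR LF."""
    return data.replace(b"\n", b"\r\n")


# A well-formed IBM-PC file: every LF is preceded by CR.
well_formed = b"one\r\ntwo\r\n"
# A faulty IBM-PC file: the first LF has no CR in front of it.
faulty = b"one\ntwo\r\n"

round_trip_well_formed = latin1_to_ibmpc_eol(ibmpc_to_latin1_eol(well_formed))
round_trip_faulty = latin1_to_ibmpc_eol(ibmpc_to_latin1_eol(faulty))

print(round_trip_well_formed == well_formed)  # the well-formed file survives
print(round_trip_faulty == faulty)            # the faulty file does not
```

Both inputs produce the same intermediate Latin-1 text, so the reverse step has no way to tell which of the two it started from.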

Unless option -s is used, recode automatically tries to fill mappings with invented correspondences, often making them fully reversible. This filling is not made at random. The algorithm tries to stick to the identity mapping and, when this is not possible, it prefers generating many small permutation cycles, each involving only a few codes.

For example, here is how IBM-PC code 186 gets translated to control-U in Latin-1. Control-U is code 21. IBM-PC code 21 is the section sign, which is 167 in Latin-1. recode cannot map 167 back to 21, because IBM-PC code 167 is the masculine ordinal indicator, which is 186 in Latin-1. IBM-PC code 186 has no Latin-1 equivalent; by assigning it to 21, recode closes this short permutation loop.
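
The loop described above can be reproduced with a small Python sketch. This is only one simple way of closing chains into cycles while preferring the identity, written under the assumption that the known correspondences form an injective partial mapping; it is not recode's actual algorithm, and `fill_mapping` and `cycle_from` are hypothetical names. The toy table holds just the three codes from the example plus the letter `A' (code 65 in both charsets).

```python
def fill_mapping(known, codes):
    """Complete an injective partial mapping into a permutation of `codes`.

    A code without an image is mapped to the start of the chain that
    leads to it. If no chain leads to it, that start is the code itself,
    which yields the identity mapping for that code.
    """
    filled = dict(known)
    inverse = {target: source for source, target in known.items()}
    for code in codes:
        if code in filled:
            continue
        # Walk backward along the chain ending at `code` to find its start.
        start = code
        while start in inverse:
            start = inverse[start]
        # Close the chain into a cycle (or a fixed point when start == code).
        filled[code] = start
        inverse[start] = code
    return filled


def cycle_from(mapping, code):
    """List the codes visited when following `mapping` from `code` back to it."""
    cycle = [code]
    current = mapping[code]
    while current != code:
        cycle.append(current)
        current = mapping[current]
    return cycle


# Known IBM-PC -> Latin-1 correspondences taken from the text above.
known = {65: 65, 21: 167, 167: 186}
codes = [21, 65, 167, 186]

filled = fill_mapping(known, codes)
print(filled[186])              # 21
print(cycle_from(filled, 186))  # [186, 21, 167]
```

Because the filled table is a permutation, inverting it gives the reverse recoding for every code, including the invented correspondence for 186.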

As a consequence of this map filling, recode may sometimes produce funny characters. They may look annoying, but they are nevertheless helpful when you change your mind and want to revert to the prior recoding. If you cannot stand these, use option -s, which asks for a very strict recoding.

This map filling sometimes has another surprising consequence. In some cases, recode seems to copy a file without recoding it, although it actually does recode it. As an illuminating example, suppose you requested:

recode l1:us < File-Latin1 > File-ASCII
cmp File-Latin1 File-ASCII

then cmp will not report any difference. This is quite normal. Latin-1 gets correctly recoded to ASCII for the characters the two charsets have in common (the first 128 characters, in this case). The remaining 128 Latin-1 characters have no ASCII counterpart. Instead of losing them, recode elects to map them to unspecified characters of ASCII, thus making the recoding reversible. The simplest way of achieving this is merely to keep those last 128 characters unchanged. The overall effect is copying the file verbatim.
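
As a rough illustration of why the output equals the input, the following Python sketch builds the 256-entry table the paragraph above describes: the first 128 codes are shared, and the last 128 are assumed to be kept unchanged by the filling. The table is reconstructed from that description, not extracted from recode, and the sample bytes are made up for the example.

```python
# Assumed Latin-1 -> ASCII table after map filling: every code maps to itself.
table = bytes(range(256))

# Sample Latin-1 input containing two characters above code 127.
latin1_text = b"d\xe9j\xe0 vu\n"

recoded = latin1_text.translate(table)

# Since the table is a permutation, it can be inverted for the reverse recoding.
inverse_table = bytes(table.index(code) for code in range(256))
restored = recoded.translate(inverse_table)

print(recoded == latin1_text)   # the recoding copied the data verbatim
print(restored == latin1_text)  # and the reverse recoding recovers it
```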

If you feel this behavior is too generous and you do not care about reversibility, simply use option -s. recode will then strictly map only those Latin-1 characters which have an ASCII equivalent, and will merely drop those which do not. You are then more likely to observe a difference between the input and the output file.
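
The strict behavior described above can be sketched in Python as follows. This only models the description given here (keep the characters that have an ASCII equivalent, drop the others) and is not recode's implementation; `strict_latin1_to_ascii` is a hypothetical name and the sample bytes are made up for the example.

```python
# Latin-1 codes 128..255 have no ASCII equivalent.
NON_ASCII = bytes(range(128, 256))


def strict_latin1_to_ascii(data: bytes) -> bytes:
    """Hypothetical helper: keep codes 0..127 as they are, drop the rest."""
    return data.translate(None, NON_ASCII)


latin1_text = b"d\xe9j\xe0 vu\n"
strict_output = strict_latin1_to_ascii(latin1_text)

print(strict_output)                 # b'dj vu\n'
print(strict_output == latin1_text)  # False: the dropped characters are gone
```

Once the two accented characters are dropped, nothing in the output records where they were, so this recoding cannot be reversed.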
