
Reversibility issues

Although GNU recode tries hard to keep recodings reversible, you should not place unconditional confidence in its ability to do so. Keep your expectations about reverse recodings reasonable. In particular, consider:

Unless option -s is used, recode automatically tries to fill mappings with invented correspondences, often making them fully reversible. This filling is not made at random. The algorithm tries to stick to the identity mapping and, when this is not possible, it prefers generating many small permutation cycles, each involving only a few codes.

For example, here is how IBM-PC code 186 comes to be translated to control-U in Latin-1. Control-U is code 21. On the IBM PC, code 21 is the section sign, which is 167 in Latin-1, so 21 maps to 167. recode cannot map 167 back to 21 in return, because code 167 on the IBM PC is the masculine ordinal indicator, which is 186 in Latin-1, so 167 maps to 186. IBM-PC code 186 has no Latin-1 equivalent; by assigning it to 21, recode closes this short permutation loop.
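The loop described above can be written out as a small table. The following is only an illustrative sketch in Python, not recode's own code: the names `ibmpc_to_latin1`, `invert` and `cycle_from` are hypothetical, and the table holds just the three codes mentioned in the text.

```python
# Partial IBM-PC -> Latin-1 table, limited to the three codes discussed above.
# The first two entries are genuine character correspondences; the third is
# the invented one that closes the loop.
ibmpc_to_latin1 = {
    21: 167,   # section sign
    167: 186,  # masculine ordinal indicator
    186: 21,   # no Latin-1 equivalent: filled in, lands on control-U
}


def invert(table):
    """Return the reverse mapping, failing if the table is not one-to-one."""
    inverse = {target: source for source, target in table.items()}
    if len(inverse) != len(table):
        raise ValueError("mapping is not reversible")
    return inverse


def cycle_from(table, start):
    """Follow the mapping from start until it comes back to start."""
    cycle = [start]
    code = table[start]
    while code != start:
        cycle.append(code)
        code = table[code]
    return cycle


latin1_to_ibmpc = invert(ibmpc_to_latin1)
print(cycle_from(ibmpc_to_latin1, 186))  # [186, 21, 167]
```

Because the three codes form a closed cycle, the table inverts without collisions, which is what makes the reverse recoding possible for these codes.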

As a consequence of this map filling, recode may sometimes produce funny characters. They may look annoying, but they are nevertheless helpful when you change your mind and want to revert a recoding. If you cannot stand them, use option -s, which asks for a very strict recoding.

This map filling sometimes has another surprising consequence. In some cases, recode seems to copy a file without recoding it, when in fact it does recode it. As an illuminating example, suppose you requested:

recode l1:us < File-Latin1 > File-ASCII
cmp File-Latin1 File-ASCII

then cmp will not report any difference. This is quite normal. Latin-1 gets correctly recoded to ASCII for the characters the two charsets have in common (the first 128 characters, in this case). The remaining 128 Latin-1 characters have no ASCII counterpart. Instead of losing them, recode elects to map them to unspecified characters of ASCII, thereby making the recoding reversible. The simplest way of achieving this is merely to keep those last 128 characters unchanged. The overall effect is that the file is copied verbatim.
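To see why the output is byte-for-byte identical, the filled table can be modelled in a few lines of Python. This is a sketch of the behavior as described here, not of recode's implementation; `table`, `translation` and `sample` are names made up for the illustration.

```python
# Codes 0-127 are common to Latin-1 and ASCII, so they map to themselves.
table = {code: code for code in range(128)}

# Fill in the 128 codes that have no ASCII equivalent. Keeping each one
# unchanged is the simplest choice that leaves the table one-to-one.
for code in range(128, 256):
    table.setdefault(code, code)

# bytes.translate() expects a 256-byte table indexed by input byte value.
translation = bytes(table[code] for code in range(256))

sample = b"d\xe9j\xe0 vu"  # "deja vu" with its accents, encoded in Latin-1
recoded = sample.translate(translation)
print(recoded == sample)  # True
```

Every byte maps to itself, so the recoded data compares equal to the input, just as cmp reports.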

If you feel this behavior is too generous and you do not care about reversibility, simply use option -s. With it, recode strictly maps only those Latin-1 characters which have an ASCII equivalent, and merely drops those which do not. You are then more likely to observe a difference between the input and the output file.
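The strict behavior just described can be approximated as follows. Again this is a hedged Python sketch of the description above rather than recode's actual code, and `strict_latin1_to_ascii` is a hypothetical helper.

```python
def strict_latin1_to_ascii(data: bytes) -> bytes:
    """Keep bytes that have an ASCII equivalent (below 128); drop the rest."""
    return bytes(byte for byte in data if byte < 128)


sample = b"d\xe9j\xe0 vu"  # "deja vu" with its accents, encoded in Latin-1
print(strict_latin1_to_ascii(sample))  # b'dj vu'
```

The two accented letters are gone from the output, so comparing it with the input now shows a difference, and the original text can no longer be recovered from it.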
