
Charsets based on ASCII

Usual ASCII

This charset is available in recode under the name ASCII. Its true name is ANSI_X3.4-1968 as per RFC 1345; accepted aliases are ANSI_X3.4-1986, ASCII, IBM367, ISO646-US, ISO_646.irv:1991, US-ASCII, cp367, iso-ir-6 and us. The shortest way of specifying it in recode is us.

This documentation used to include ASCII tables. They have been removed since recode can now recreate these (and a lot of others) easily:

recode -lf us                   for commented ASCII
recode -ld us                   for concise decimal table
recode -lo us                   for concise octal table
recode -lh us                   for concise hexadecimal table

ASCII extended by Latin Alphabets

This charset is available in recode under the name Latin-1. Its true name is ISO_8859-1:1987 as per RFC 1345; accepted aliases are CP819, IBM819, ISO-8859-1, ISO_8859-1, iso-ir-100, l1 and Latin-1. The shortest way of specifying it in recode is l1.

This charset corresponds to the ISO Latin Alphabet 1. It is an eight-bit code which coincides with ASCII for the lower half.
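The overlap with ASCU can be checked outside of recode. The following is a minimal Python sketch, assuming that Python's built-in `ascii` and `latin-1` codecs are acceptable stand-ins for the two charsets; the helper name `fits_in_ascii` is made up for this illustration and is not part of recode.

```python
# Bytes 0..127 form the lower half shared by ASCII and Latin-1.
lower_half = bytes(range(128))

# Both codecs should decode the lower half to exactly the same text.
same_lower_half = lower_half.decode("ascii") == lower_half.decode("latin-1")

# The upper half exists only in Latin-1: e with acute accent is one byte there.
e_acute = "\u00e9".encode("latin-1")


def fits_in_ascii(text):
    """Return True when every character of text is a seven-bit ASCII character."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


print(same_lower_half)          # True
print(e_acute.hex())            # e9
print(fits_in_ascii("\u00e9"))  # False
```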

This documentation used to include Latin-1 tables. They have been removed since recode can now recreate these (and a lot of others) easily:

recode -lf l1                   for commented ISO Latin-1
recode -ld l1                   for concise decimal table
recode -lo l1                   for concise octal table
recode -lh l1                   for concise hexadecimal table

The following is from `lasko@video.dec.com' (Tim Lasko), undated.

ISO Latin-1, or more completely ISO Latin Alphabet No 1, is now an international standard as of February 1987 (IS 8859, Part 1). For those American USEnet'rs that care, the 8-bit ASCII standard, which is essentially the same code, is going through the final administrative processes prior to publication.

ISO Latin-1 (IS 8859/1) is actually one of an entire family of eight-bit one-byte character sets, all having ASCII on the left hand side, and with varying repertoires on the right hand side:

  1. Latin Alphabet No 1 (caters to Western Europe - now approved).
  2. Latin Alphabet No 2 (caters to Eastern Europe - now approved).
  3. Latin Alphabet No 3 (caters to SE Europe + others - in draft ballot).
  4. Latin Alphabet No 4 (caters to Northern Europe - in draft ballot).
  5. Latin-Cyrillic alphabet (right half all Cyrillic - processing currently suspended pending USSR input).
  6. Latin-Arabic alphabet (right half all Arabic - now approved).
  7. Latin-Greek alphabet (right half Greek + symbols - in draft ballot).
  8. Latin-Hebrew alphabet (right half Hebrew + symbols - proposed).

ASCII 7-bits, BS to overstrike

This charset is available in recode under the name ASCII-BS, with BS as an acceptable alias.

The file is straight ASCII, seven bits only. According to the definition of ASCII, diacritics are applied by a sequence of three characters: the letter, one BS, the diacritic mark. We deviate slightly from this by exchanging the diacritic mark and the letter so that, on a screen device, the diacritic disappears and leaves the letter alone. At recognition time, both orderings are accepted.
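The two orderings can be pictured with a short Python sketch. It is only an illustration under stated assumptions: the function names `overstrike` and `split_overstrike` are hypothetical, and the set of diacritic marks is a guess made for the example, not recode's actual table. Only the apostrophe standing for the acute accent is taken from the examples in this section.

```python
BS = "\b"  # ASCII backspace, octal 010


def overstrike(letter, mark, mark_first=True):
    """Build a three-character overstrike sequence for one accented letter.

    mark_first=True gives the mark, BS, letter ordering (the deviation
    described above); mark_first=False gives letter, BS, mark.
    """
    if mark_first:
        return mark + BS + letter
    return letter + BS + mark


def split_overstrike(seq, marks="'`^\"~"):
    """Recognize either ordering and return (letter, mark), or None.

    The default marks are an illustrative assumption, not recode's table.
    """
    if len(seq) != 3 or seq[1] != BS:
        return None
    first, last = seq[0], seq[2]
    if first in marks and last.isalpha():
        return (last, first)
    if last in marks and first.isalpha():
        return (first, last)
    return None
```

With these definitions, `split_overstrike(overstrike("e", "'"))` and `split_overstrike(overstrike("e", "'", mark_first=False))` both return the same letter and mark pair, which is the sense in which both methods are acceptable at recognition time.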

The French quotes are coded by the sequences: < BS " or " BS < for the opening quote and > BS " or " BS > for the closing quote. This artificial convention was inherited in straight ASCII-BS from habits around Bang-Bang entry, and is not well known. But we decided to stick to it so that the ASCII-BS charset will not lose French quotes.
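As a rough illustration of this convention, the Python sketch below recognizes the four sequences and replaces them with the Latin-1 guillemets (code points 0xAB and 0xBB). The table and the function name `decode_french_quotes` are hypothetical and written for this example; recode's own implementation is not shown here.

```python
BS = "\b"  # ASCII backspace

# The four sequences listed above, mapped to the Latin-1 guillemets.
FRENCH_QUOTES = {
    "<" + BS + '"': "\u00ab",  # opening quote
    '"' + BS + "<": "\u00ab",  # opening quote, other ordering
    ">" + BS + '"': "\u00bb",  # closing quote
    '"' + BS + ">": "\u00bb",  # closing quote, other ordering
}


def decode_french_quotes(text):
    """Replace each three-character quote sequence with a single guillemet."""
    out = []
    i = 0
    while i < len(text):
        chunk = text[i:i + 3]
        if chunk in FRENCH_QUOTES:
            out.append(FRENCH_QUOTES[chunk])
            i += 3
        else:
            out.append(text[i])
            i += 1
    return "".join(out)
```

Text without any of the four sequences passes through unchanged, so an isolated `<` or `"` is left alone.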

The ASCII-BS charset is independent of ASCII, and different. The following examples demonstrate this, knowing in advance that `!2' is the Bang-Bang way of representing an e with an acute accent. Compare:

% echo \!2 | recode -v bang:us | od -bc
Bang-Bang -> ISO_8859-1:1987 -> RFC 1345 -> ANSI_X3.4-1968 (many to one)
Simplified to: Bang-Bang -> ISO_8859-1:1987 -> ANSI_X3.4-1968 (many to one)
0000000 351 012
        351  \n
0000002

with:

% echo \!2 | recode -v bang:bs | od -bc
Bang-Bang -> ISO_8859-1:1987 -> ASCII-BS (many to many)
0000000 047 010 145 012
          '  \b   e  \n
0000004

In the first case, the e with an acute accent is merely passed through by the Latin-1:ASCII mapping, which has no special recoding rule for it. In the Latin-1:ASCII-BS case, the acute accent is applied over the e with a backspace: diacriticized characters have special rules. For the ASCII-BS charset, reversibility is still possible, but there might be difficult cases.
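The octal values in the two dumps can be reproduced without recode. The following Python sketch only re-encodes the bytes shown above; `octal_dump` is a hypothetical helper that imitates the numeric line of `od -b`, and Python's `latin-1` codec is assumed to stand in for ISO_8859-1:1987.

```python
def octal_dump(data):
    """Return the bytes of data as three-digit octal strings, like od -b."""
    return [format(byte, "03o") for byte in data]


# e with acute accent followed by newline, as one Latin-1 byte (first example).
passed_through = "\u00e9\n".encode("latin-1")

# Apostrophe, backspace, e, newline (second example, ASCII-BS).
overstruck = b"'\be\n"

print(octal_dump(passed_through))  # ['351', '012']
print(octal_dump(overstruck))      # ['047', '010', '145', '012']

# Only the overstruck form stays within seven bits.
print(all(byte < 128 for byte in passed_through))  # False
print(all(byte < 128 for byte in overstruck))      # True
```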

ASCII without diacritics nor underline

This charset is available in recode under the name flat.

This code is ASCII expunged of all diacritics and underlines, as long as they are applied using three-character sequences with BS in the middle. Also, although it is only loosely related, each control character is represented by a sequence of two or three graphic characters. The newline character, however, keeps its functionality and is not represented this way.
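The general idea can be sketched in Python as follows. Several details are assumptions made for the example, not recode's documented behavior: the function name `flatten` is hypothetical, the caret notation (`^I` for a tab) is just one plausible way of spelling a control character with graphic characters, and the rule for choosing which character of an overstrike survives (the letter or digit if the first one is one, otherwise the last) is a simplification.

```python
BS = "\b"  # ASCII backspace


def is_graphic(ch):
    """True for printable ASCII characters, space included."""
    return 32 <= ord(ch) < 127


def flatten(text):
    """Drop overstruck marks and spell out control characters (sketch only)."""
    out = []
    i = 0
    while i < len(text):
        # Three-character overstrike: graphic, BS, graphic.
        if (i + 2 < len(text) and text[i + 1] == BS
                and is_graphic(text[i]) and is_graphic(text[i + 2])):
            first, last = text[i], text[i + 2]
            # Assumption: keep the letter or digit, drop the mark or underline.
            out.append(first if first.isalnum() else last)
            i += 3
            continue
        ch = text[i]
        if ch == "\n":
            out.append(ch)                        # newline keeps its function
        elif ord(ch) < 32:
            out.append("^" + chr(ord(ch) + 64))   # hypothetical caret notation
        elif ord(ch) == 127:
            out.append("^?")                      # hypothetical spelling of DEL
        else:
            out.append(ch)
        i += 1
    return "".join(out)
```

For instance, both orderings of an overstruck e reduce to a plain `e`, an underlined letter loses its underline, and a tab inside a line becomes two graphic characters while the final newline is preserved.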

Note that charset flat is a terminal charset. We can convert to flat, but not from it.
