
Some other charsets

Even though these charsets were originally added to recode for handling texts written in French, they find other uses. We used them a lot for writing French diacriticized texts in the past, so recode knows how to handle them particularly well for French texts.

ASCII with LaTeX codes

This charset is available in recode under the name LaTeX and has ltex as an alias. It is used for ASCII files coded to be read by LaTeX or, in certain cases, by TeX.

Whenever you recode from another charset to LaTeX, beware that all occurrences of backslashes \ are usually translated into the string `\backslash{}'. However, in practice, people often use backslashes in the other charset for introducing TeX commands, compromising it: it is not pure TeX, nor is it pure other charset. Because this translation of backslashes into `\backslash{}' can be rather inconvenient, it may be inhibited through the command option -d.
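The effect of this translation, and of inhibiting it, can be sketched in Python. This is only an illustration and not recode's implementation: the function name to_latex, the keep_backslashes parameter and the three-letter accent table are assumptions made for the example.

```python
# Illustrative sketch only: not recode's actual code or tables.
# A tiny, assumed subset of accented letters and LaTeX sequences for them.
ACCENTS = {"é": "\\'e", "è": "\\`e", "ê": "\\^e"}


def to_latex(text, keep_backslashes=False):
    """Translate text to ASCII with LaTeX codes.

    keep_backslashes=True mimics the effect described for -d:
    backslashes already present in the input pass through untouched.
    """
    out = []
    for ch in text:
        if ch == "\\" and not keep_backslashes:
            out.append("\\backslash{}")
        else:
            out.append(ACCENTS.get(ch, ch))
    return "".join(out)


print(to_latex("\\emph{été}"))                         # \backslash{}emph{\'et\'e}
print(to_latex("\\emph{été}", keep_backslashes=True))  # \emph{\'et\'e}
```

The first call shows how an embedded TeX command is neutralized by the default translation; the second shows the input command surviving when the translation is inhibited.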

ASCII with easy French conventions

This charset is available in recode under the name Texte and has txte for an alias.

This charset is a seven-bit code, identical to ASCII-BS, save for French diacritics, which are noted using a slightly different convention.

At text entry time, these conventions provide a little speed-up. At read time, they slightly improve readability over a few alternate ways of coding diacritics. Of course, it would be better to have a specialized keyboard to make direct eight-bit entries, and fonts for immediately displaying eight-bit ISO Latin-1 characters. But not everybody is so fortunate. In several mailing environments, the eighth bit is often willfully destroyed.

Easy French has been in use in France for a while. I only slightly adapted it (the diaeresis option) to make it more comfortable for several usages in Québec originating from Université de Montréal. In fact, the main problem for me was not necessarily to invent Easy French, but to recognize the "best" convention to use ("best" is not defined here), and to try to solve the main pitfalls associated with the selected convention.

Diacritics

French quotes (sometimes called "angle quotes") are noted the same way English quotes are noted in TeX, that is, by `` and ''.

No effort has been made to preserve Latin ligatures (ae, oe), which are representable in several other charsets. So, these ligatures may be lost through Easy French conventions.

This is almost the French convention for simplified diacritics entry:

e': Acute accent
e`: Grave accent
e^: Circumflex accent
e": Diaeresis
c,: Cedilla

In some countries, : is used instead of " to mark diaeresis. recode supports only one convention per call, depending on the -c option of the recode command.
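The entry conventions listed above can be sketched as a small Python encoder. This is an illustration under stated assumptions, not recode's code: the name to_easy_french and the colon_diaeresis parameter (standing in for the effect of -c) are hypothetical, and the sketch relies on Unicode decomposition rather than on any recode table.

```python
import unicodedata

# Combining marks and the ASCII character written after the base letter.
MARKS = {
    "\u0301": "'",   # acute accent
    "\u0300": "`",   # grave accent
    "\u0302": "^",   # circumflex accent
    "\u0308": '"',   # diaeresis (':' in the alternate convention)
    "\u0327": ",",   # cedilla
}


def to_easy_french(text, colon_diaeresis=False):
    """Write accented letters as base letter plus an ASCII mark.

    colon_diaeresis is a hypothetical switch mimicking the -c option.
    Characters outside the five conventions are left as decomposed.
    """
    out = []
    # NFD splits each accented letter into base letter + combining mark.
    for ch in unicodedata.normalize("NFD", text):
        if ch == "\u0308" and colon_diaeresis:
            out.append(":")
        else:
            out.append(MARKS.get(ch, ch))
    return "".join(out)


print(to_easy_french("problème"))                    # proble`me
print(to_easy_french("Noël", colon_diaeresis=True))  # Noe:l
```

Only one diaeresis convention is applied per call, which mirrors the single-convention behaviour described above.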

The convention is prone to losing information, because the diacritic meaning overloads some characters that already have other uses. To alleviate this, some knowledge of the French language is built into the recognition routines. So, the following subtleties are systematically obeyed by the various recognizers.

A single quote which follows an e does not necessarily mean an acute accent if it is followed by a single other one. For example:

e'
will give an e with an acute accent.
e''
will give a simple e, with a closing quotation mark.
e'''
will give an e with an acute accent, followed by a closing quotation mark.

There is a problem induced by this convention if there are English quotations within a French text. In sentences like:
```
There's a meeting at Archie's restaurant.
```
the single quotes will be mistaken twice for acute accents. So English contractions and suffix possessives could be mangled.
A double quote or colon, depending on the -c option, which follows a vowel is interpreted as a diaeresis only if it is followed by another letter. But French has several words that end with a diaeresis, and the program also recognizes them. See section List of words ending with diaeresis, for a study of all the problematic cases.
A comma which follows a c is interpreted as a cedilla only if it is followed by one of the vowels a, o and u.
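The three recognition rules above can be sketched in Python as follows. This is a hedged illustration, not recode's recognizer: the function name from_easy_french, the mark parameter (standing in for -c), and the short TRAILING_DIAERESIS set are assumptions, and grave and circumflex accents are deliberately left out because no special rule is stated for them.

```python
import re

# A few words that may end with a diaeresis (assumed subset, not the full list).
TRAILING_DIAERESIS = {"aigue", "ambigue", "cigue", "contigue", "exigue",
                      "canoe", "inoui"}

ACUTE = {"e": "é", "E": "É"}
DIAERESIS = {"e": "ë", "i": "ï", "u": "ü", "E": "Ë", "I": "Ï", "U": "Ü"}
CEDILLA = {"c": "ç", "C": "Ç"}


def from_easy_french(text, mark='"'):
    """Decode acute accents, diaereses and cedillas from Easy French.

    mark is the diaeresis character: '"' by default, ':' otherwise.
    """

    def diaeresis(m):
        word = m.group(1)
        following = m.string[m.end():m.end() + 1]
        # Accept the mark only before another letter, or after a listed word.
        if following.isalpha() or word.lower() in TRAILING_DIAERESIS:
            return word[:-1] + DIAERESIS[word[-1]]
        return m.group(0)

    def acute(m):
        quotes = m.group(2)
        # e' and e''' carry an accent; e'' is a plain e before a closing quote.
        if len(quotes) in (1, 3):
            return ACUTE[m.group(1)] + quotes[1:]
        return m.group(0)

    text = re.sub(r"([A-Za-z]*[eiuEIU])" + re.escape(mark), diaeresis, text)
    text = re.sub(r"([eE])('+)", acute, text)
    # A comma after c is a cedilla only before a, o or u.
    text = re.sub(r"([cC]),(?=[aouAOU])", lambda m: CEDILLA[m.group(1)], text)
    return text


print(from_easy_french("Noe\"l, garc,on, e'te'"))  # Noël, garçon, été
print(from_easy_french("There's"))                 # Therés
```

The second call reproduces the pitfall mentioned above: an English contraction after an e is taken for an acute accent.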

List of words ending with diaeresis

Here is a classification of all cases of a diaeresis at the end of a French word:

Words ending in "igue"
- Feminine words without a relative masculine: `besaigue"' and `cigue"'.
- Feminine words with a relative masculine (1): `aigue"', `ambigue"', `contigue"', `exigue"', `subaigue"' and `suraigue"'.
Words not ending in "igue"
- Ended by "i" (2): `ai"', `congai"', `goi"', `hai"kai"', `inoui"', `sai"', `samurai"', `thai"' and `tokai"'.
- Ended by "e": `canoe"'.
- Ended by "u" (3): `Esau"'.

Notes:

(1) There are supposed to be seven words in this case. So, one is missing.
(2) Look at one of the following sentences (the second has to be interpreted with the -c option):
```
"Ai"e!  Voici le proble`me que j'ai"
Ai:e!  Voici le proble`me que j'ai:
```
There is an ambiguity between an `ai"', the small animal, and the present indicative of avoir (first person singular), when followed by what could be a diaeresis mark. Fortunately, the case is solved by the fact that an apostrophe always precedes the verb and almost never the animal.
(3) I did not pay attention to proper nouns, but this one showed up as being fairly evident.

Just to complete this topic, note that it would be wrong to make a rule for all words ending in "igue" as needing a diaeresis. Here are counter-examples: `becfigue', `be`sigue', `bigue', `bordigue', `bourdigue', `brigue', `contre-digue', `digue', `d'intrigue', `fatigue', `figue', `garrigue', `gigue', `igue', `intrigue', `ligue', `prodigue', `sarigue' and `zigue'.

World Wide Web representations

This charset is available in recode under the name HTML and has w3 and WWW for aliases.

HTML texts used by the World Wide Web limit themselves to 7-bit characters internally; special sequences beginning with an ampersand & and ending with a semicolon ; are used for representing characters from Latin-1 having the 8th bit set.

When you recode from another charset to HTML, beware that all occurrences of ampersands are usually translated into the string `&amp;'; similarly, left angle brackets < are translated into `&lt;' and right angle brackets > are translated into `&gt;'. However, in practice, people often use ampersands and angle brackets in the other charset for introducing HTML commands, compromising it: it is not pure HTML, nor is it pure other charset. Because these three translations can be rather inconvenient, they may be specifically inhibited through the command option -d.
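As a rough Python sketch of these translations, and of inhibiting them, consider the following. It is not recode's implementation: the name to_html and the keep_markup parameter (mimicking the effect described for -d) are hypothetical, and the entity names come from Python's standard html.entities table.

```python
from html.entities import codepoint2name


def to_html(text, keep_markup=False):
    """Replace &, <, > and 8-bit Latin-1 characters by HTML entities.

    keep_markup=True mimics the effect described for -d: ampersands and
    angle brackets already in the input pass through untouched.
    """
    out = []
    for ch in text:
        code = ord(ch)
        if ch in "&<>" and keep_markup:
            out.append(ch)
        elif ch in "&<>" or 0xA0 <= code <= 0xFF:
            # Every code point in this range has a named entity.
            out.append("&" + codepoint2name[code] + ";")
        else:
            out.append(ch)
    return "".join(out)


print(to_html("<b>été & hiver</b>"))
print(to_html("<b>été & hiver</b>", keep_markup=True))
```

With the default, the markup in the input is escaped along with the accented letters; with keep_markup=True only the accented letters are replaced, so existing tags keep working.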
