Even if these charsets were originally added to recode for handling texts written in French, they find other uses. We did use them a lot for writing French diacriticized texts in the past, so recode knows how to handle these particularly well for French texts.
This charset is available in recode under the name LaTeX and has ltex as an alias. It is used for ASCII files coded to be read by LaTeX or, in certain cases, by TeX.
Whenever you recode from another charset to LaTeX, beware that all occurrences of backslashes \ are usually translated into the string `\backslash{}'. However, in practice, people often use backslashes in the other charset for introducing TeX commands, compromising it: it is not pure TeX, nor is it pure other charset. This translation of backslashes into `\backslash{}' can be rather inconvenient; it may be inhibited through the command option -d.
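The effect of the backslash translation on a mixed file can be illustrated with a minimal Python sketch. This is not recode's implementation: the function name and the keep_backslashes parameter are hypothetical, the parameter merely standing in for the inhibiting option described above.

```python
def escape_backslashes(text, keep_backslashes=False):
    """Translate every backslash into the string \\backslash{}.

    keep_backslashes is a hypothetical stand-in for the option that
    inhibits this translation; it is not recode's actual interface.
    """
    if keep_backslashes:
        return text
    return text.replace("\\", "\\backslash{}")


# A line that already contains TeX commands is compromised by the translation:
mixed = r"\emph{caf\'e}"
print(escape_backslashes(mixed))
print(escape_backslashes(mixed, keep_backslashes=True))
```

With the translation active, the TeX commands in the mixed line are turned into literal text, which is the inconvenience described above.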
This charset is available in recode under the name Texte and has txte for an alias.
This charset is a seven-bit code, identical to ASCII-BS, save for French diacritics which are noted using a slightly different convention.
At text entry time, these conventions provide a little speed up. At read time, they slightly improve the readability over a few alternate ways of coding diacritics. Of course, it would be better to have a specialized keyboard to make direct eight-bit entries and fonts for immediately displaying eight-bit ISO Latin-1 characters. But not everybody is so fortunate. In several mailing environments, the eighth bit is often willfully destroyed.
Easy French has been in use in France for a while. I only slightly adapted it (the diaeresis option) to make it more comfortable for several usages in Québec originating from Université de Montréal. In fact, the main problem for me was not necessarily to invent Easy French, but to recognize the "best" convention to use ("best" is not defined here) and to try to solve the main pitfalls associated with the selected convention.
French quotes (sometimes called "angle quotes") are noted the same way English quotes are noted in TeX, id est by `` and ''.
No effort has been made to preserve Latin ligatures (ae, oe), which are representable in several other charsets. So, these ligatures may be lost through Easy French conventions.
This is almost the French convention for simplified diacritics entry:
In some countries, : is used instead of " to mark diaeresis. recode supports only one convention on a single call, depending on the -c option of the recode command.
The convention is prone to losing information, because the diacritic meaning overloads some characters that already have other uses. To alleviate this, some knowledge of the French language is built into the recognition routines. So, the following subtleties are systematically obeyed by the various recognizers.
There is a problem induced by this convention if there are English quotations within a French text. In sentences like:
There's a meeting at Archie's restaurant.
the single quotes will be mistaken twice for acute accents. So English contractions and suffix possessives could be mangled.
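The pitfall can be reproduced with a deliberately naive Python sketch. It is not recode's recognizer, which applies knowledge of French: the function name and the two-entry table are assumptions, limited to the e' (acute) and e` (grave) pairs that appear in this text.

```python
import re

# Hypothetical, partial table: only the two pairs attested in this section.
ACCENTS = {"e'": "é", "e`": "è"}


def naive_decode(text):
    """Replace every e' and e` pair, with no knowledge of French or English."""
    return re.sub(r"e['`]", lambda m: ACCENTS[m.group(0)], text)


# A French word is decoded as intended.
print(naive_decode("Voici le proble`me"))
# The English contraction and the possessive are both mangled.
print(naive_decode("There's a meeting at Archie's restaurant."))
```

Both apostrophes in the English sentence follow an e, so a recognizer that looks only at character pairs cannot tell them apart from acute accents.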
A double quote or a colon, depending on the -c option, which follows a vowel is interpreted as a diaeresis only if it is followed by another letter. But there are in French several words that end with a diaeresis, and the program also recognizes them. See section List of words ending with diaeresis, for a study of all the problematic cases.
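The "vowel, mark, then another letter" rule can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name and the use_colon parameter are hypothetical (the parameter stands in for the choice of convention), and the list of words ending with a diaeresis is left out entirely.

```python
import re

# Map each plain vowel to its counterpart carrying a diaeresis.
DIAERESIS = str.maketrans("aeiouAEIOU", "äëïöüÄËÏÖÜ")


def decode_diaeresis(text, use_colon=False):
    """Treat the mark as a diaeresis only between a vowel and a letter."""
    mark = ":" if use_colon else '"'
    pattern = re.compile(r"([aeiouAEIOU])" + re.escape(mark) + r"(?=[A-Za-z])")
    return pattern.sub(lambda m: m.group(1).translate(DIAERESIS), text)


print(decode_diaeresis('nai"f'))                    # mark between vowel and letter
print(decode_diaeresis('Il dit "oui" et sort'))     # ordinary quotation marks
print(decode_diaeresis("nai:f", use_colon=True))    # colon convention
```

Because the mark must be followed by a letter, a closing quotation mark or a colon used as punctuation after a vowel is left alone. Words that genuinely end with a diaeresis are not handled by this sketch.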
Here is a classification of all cases of a diaeresis at the end of a French word:
Notes:
-c option):
"Ai"e! Voici le proble`me que j'ai"
Ai:e! Voici le proble`me que j'ai:
There is an ambiguity between an `ai"', the small animal, and the present indicative of avoir (first person singular), when followed by what could be a diaeresis mark. Fortunately, the case is resolved by the fact that an apostrophe always precedes the verb and almost never the animal.
Just to complete this topic, note that it would be wrong to make a rule for all words ending in "igue" as needing a diaeresis. Here are counter-examples: `becfigue', `be`sigue', `bigue', `bordigue', `bourdigue', `brigue', `contre-digue', `digue', `d'intrigue', `fatigue', `figue', `garrigue', `gigue', `igue', `intrigue', `ligue', `prodigue', `sarigue' and `zigue'.
This charset is available in recode under the name HTML and has w3 and WWW for aliases.
HTML texts used by the World Wide Web limit themselves to 7-bit characters internally. Special sequences beginning with an ampersand & and ending with a semicolon ; are used for representing characters from Latin-1 having the 8th bit set.
When you recode from another charset to HTML, beware that all occurrences of ampersands are usually translated into the string `&amp;'. Similarly, left angle brackets < are translated into `&lt;' and right angle brackets > are translated into `&gt;'. However, in practice, people often use ampersands and angle brackets in the other charset for introducing HTML commands, compromising it: it is not pure HTML, nor is it pure other charset. These three translations can be rather inconvenient; they may be specifically inhibited through the command option -d.
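As a rough illustration, the following Python sketch mimics the behavior described above using the standard html.entities table. It is not recode's code: the function name and the escape_markup parameter are hypothetical, the parameter standing in for the option that inhibits the three translations. Characters without a named entity in that table are passed through unchanged.

```python
from html.entities import codepoint2name


def to_html(text, escape_markup=True):
    """Write 8-bit characters, and optionally & < >, as named entities."""
    out = []
    for ch in text:
        code = ord(ch)
        if ch in "&<>":
            # The three translations that may be inhibited.
            if escape_markup:
                out.append("&" + codepoint2name[code] + ";")
            else:
                out.append(ch)
        elif code > 127 and code in codepoint2name:
            out.append("&" + codepoint2name[code] + ";")
        else:
            out.append(ch)
    return "".join(out)


print(to_html("café & <b>"))
print(to_html("<b>café</b>", escape_markup=False))
```

The second call shows why inhibiting the translations is useful when the input already contains HTML commands: the tags survive, while the accented letter is still converted.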