Firefox 29 was released half a year ago, so this post is long overdue. Nevertheless I wanted to pause for a second to discuss the Internationalization API first shipped on desktop in that release, implemented by Norbert Lindenberg and reviewed by me. (Android support coming soon! b2g perhaps slightly longer, due to some b2g-specific hurdles. Stay tuned.)

What's internationalization?

Internationalization (i18n for short — i, eighteen characters, n) is the process of writing applications in a way that allows them to be easily adapted for audiences from varied places, using varied languages. It's easy to get this wrong by inadvertently assuming one's users will come from one place and speak a single language, especially if you don't even know you've made an assumption.

function formatDate(d)
{
  // Everyone uses month/date/year…right?
  var month = d.getMonth() + 1;
  var date = d.getDate();
  var year = d.getFullYear();
  return month + "/" + date + "/" + year;
}

function formatMoney(amount)
{
  // All money is dollars with two fractional digits…right?
  return "$" + amount.toFixed(2);
}

function sortNames(names)
{
  function sortAlphabetically(a, b)
  {
    var l = a.toLowerCase(), r = b.toLowerCase();
    if (l < r)
      return -1;
    if (l > r)
      return 1;
    return 0;
  }

  // Names always sort alphabetically…right?
  names.sort(sortAlphabetically);
}

JS's historical i18n support is poor

i18n-aware formatting in traditional JS used the various toLocaleString() methods. The resulting strings contained whatever details the implementation chose to provide, with no way to pick and choose (did you need a weekday in that formatted date? is the year irrelevant?). Even if the proper details were included, the format might be wrong, e.g. decimal when percentage was desired. And you were stuck formatting for the user's locale only.

As for sorting (collation, when sorting text), JS provided almost no useful locale-sensitive text-comparison functions. localeCompare() existed but with a very awkward interface unsuited for use with sort. And it too didn't permit choosing a locale or sort order.

These limitations are bad enough that — this surprised me greatly when I learned it! — serious web applications that need internationalization capabilities (most commonly, financial sites displaying currencies) will box up their data, send it to a server, have the server perform the operation, and send it back to the client. Server roundtrips just to format amounts of money. Yeesh.

A new JS Internationalization API

The new ECMAScript Internationalization API greatly improves JavaScript's i18n capabilities. It provides all the flourishes one could want for formatting dates and numbers and sorting text. The locale is selectable, with fallback if the requested locale is unsupported. Formatting requests can specify the particular components to include. Custom formats for percentages, significant digits, and currencies are supported. Numerous collation options are exposed for use in sorting text. And if you care about performance, the up-front work to select a locale and process options can now be done once, instead of once every time a locale-dependent operation is performed.
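As a rough sketch of what this looks like in practice, the naive helpers from the start of this post could be rewritten along the following lines. The "en-US" locale, the USD currency, and the UTC time zone are hard-coded assumptions for illustration only (a real application would take them from the user), and console.log stands in for the shell's print. Each formatter and collator is built once and reused.

```javascript
// Build each formatter/collator once; reuse it for every call.
var dateFormatter = new Intl.DateTimeFormat("en-US", { timeZone: "UTC" });
var moneyFormatter =
  new Intl.NumberFormat("en-US", { style: "currency", currency: "USD" });
var nameCollator = new Intl.Collator("en-US");

function formatDate(d)
{
  // Component order and separators come from the locale.
  return dateFormatter.format(d);
}

function formatMoney(amount)
{
  // Symbol, grouping, and fractional digits come from locale + currency.
  return moneyFormatter.format(amount);
}

function sortNames(names)
{
  // Sorts in place using locale-aware comparison.
  return names.sort(nameCollator.compare);
}

console.log(formatDate(new Date(Date.UTC(2014, 6, 17))));
console.log(formatMoney(1234.5));
console.log(sortNames(["eve", "Émile", "Adam"]).join(", "));
```

With common locale data this tends to print something like 7/17/2014, $1,234.50, and Adam, Émile, eve, though exact output is not guaranteed by the specification.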

That said, the API is not a panacea. The API is "best effort" only. Precise outputs are almost always deliberately unspecified. An implementation could legally support only the oj locale, or it could ignore (almost all) provided formatting options. Most implementations will have high-quality support for many locales, but it's not guaranteed (particularly on resource-constrained systems such as mobile).

The `Intl` interface

The i18n API lives on the global Intl object. Intl contains three constructors (with more coming): Intl.Collator, Intl.DateTimeFormat, and Intl.NumberFormat. Each constructor creates an object exposing the relevant operation, caching the provided locale and options-processing information for efficiency. Creating any of these objects follows a simple pattern:

var instance = new Intl.ctor(locales, options);

locales is a string specifying a single language tag or an arraylike object containing a list of language tags. Language tags are strings like en (English generally), de-AT (German as used in Austria), or zh-Hant-TW (Chinese as used in Taiwan, using the traditional Chinese script). Language tags can also include a "Unicode extension", of the form -u-key1-value1-key2-value2..., where each key is an "extension key". The various constructors interpret these specially.

options is an object whose properties (or their absence, by evaluating to undefined) determine how the formatter or collator behaves. Its exact interpretation is determined by the individual constructor. The implementation will try to produce behavior matching the formatting/collating requests, in the way that most closely follows locale rules. But there's no guarantee particular formatting/collation rules will be used.
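Here is a small sketch of the locales argument in its different shapes. The specific tags are just examples, and which locale is actually selected depends on what the implementation supports, so the resolved values are printed rather than assumed.

```javascript
// A single tag, or a priority list of tags to try.
var single = new Intl.NumberFormat("de-AT");
var list = new Intl.NumberFormat(["de-AT", "de", "en"]);

// resolvedOptions() reports what the implementation actually selected,
// which may differ from what was requested.
console.log(single.resolvedOptions().locale);
console.log(list.resolvedOptions().locale);

// Omitting locales entirely uses the implementation's default locale.
var fallback = new Intl.NumberFormat();
console.log(fallback.resolvedOptions().locale);

// A structurally invalid tag is an error, not a silent fallback.
var threw = false;
try
{
  new Intl.NumberFormat("not a language tag");
}
catch (e)
{
  threw = e instanceof RangeError;
}
console.log(threw);
```

Note the distinction: an unsupported but well-formed tag falls back quietly, while a malformed tag throws a RangeError.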

Date/time formatting

Options

The primary options properties for date/time formatting are as follows:

weekday, era (that is, BC/AD): "narrow", "short", or "long"
month: "2-digit", "numeric", "narrow", "short", or "long"
year, day, hour, minute, second: "2-digit" or "numeric"
timeZoneName: "short" or "long"

The values don't map to particular formats: remember, the Intl API almost never specifies exact behavior. But the intent is that "narrow", "short", and "long" produce output of corresponding size — "S" or "Sa", "Sat", and "Saturday", for example. (Output may be ambiguous: Saturday and Sunday both could produce "S".) "2-digit" and "numeric" map to two-digit number strings or full-length numeric strings: "70" and "1970", for example.

The final used options are largely the requested options. However, if you don't specifically request any weekday/year/month/day/hour/minute/second, then year/month/day will be added to your provided options.
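The defaulting rule can be observed through resolvedOptions(). This is a minimal sketch, assuming an implementation that supports "en-US"; the variable names are illustrative only.

```javascript
// No date/time components requested: year/month/day get filled in.
var noComponents = new Intl.DateTimeFormat("en-US", { timeZone: "UTC" });
var resolvedDefault = noComponents.resolvedOptions();
console.log(resolvedDefault.year, resolvedDefault.month, resolvedDefault.day);
console.log(resolvedDefault.hour); // undefined: no time components added

// Some component requested: nothing extra is added.
var timeOnly =
  new Intl.DateTimeFormat("en-US",
                          { hour: "numeric", minute: "2-digit",
                            timeZone: "UTC" });
var resolvedTime = timeOnly.resolvedOptions();
console.log(resolvedTime.year); // undefined: only the time was requested
console.log(resolvedTime.hour, resolvedTime.minute);
```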

Beyond these basic options are a few special options:

timeZone: If a time zone is included in the format, case-insensitive "UTC" formats the time as UTC. This is the only value that must be supported: values like "CEST" or "America/New_York" aren't guaranteed to work (and, presently, won't work in Firefox).
hour12: Specifies whether hours will be in 12-hour or 24-hour format. The default is typically locale-dependent. (Details such as whether midnight is zero-based or twelve-based and whether leading zeroes are present are also locale-dependent.)
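A short sketch of these two options together, using "en-US" and a fixed afternoon timestamp as assumptions. The precise strings (spacing, AM/PM marker style) are up to the implementation, so the comments describe the shape of the output rather than its exact text.

```javascript
var afternoon = new Date(Date.UTC(2014, 6, 17, 15, 30));

var twelveHour =
  new Intl.DateTimeFormat("en-US",
                          { hour: "numeric", minute: "2-digit",
                            hour12: true, timeZone: "UTC" });
var twentyFourHour =
  new Intl.DateTimeFormat("en-US",
                          { hour: "numeric", minute: "2-digit",
                            hour12: false, timeZone: "UTC" });

console.log(twelveHour.format(afternoon));     // 3:30 with a PM marker
console.log(twentyFourHour.format(afternoon)); // 15:30, no marker

// "UTC" is matched case-insensitively and reported in canonical form.
var lowerCaseZone = new Intl.DateTimeFormat("en-US", { timeZone: "utc" });
console.log(lowerCaseZone.resolvedOptions().timeZone);
```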

There are also two special properties, localeMatcher (taking either "lookup" or "best fit") and formatMatcher (taking either "basic" or "best fit"), each defaulting to "best fit". These affect how the right locale and format are selected. They're esoteric, and you should probably ignore them.

Locale-centric options

DateTimeFormat also allows formatting using customized calendaring and numbering systems. These details are effectively part of the locale, so they're specified in the Unicode extension in the language tag.

For example, Thai as spoken in Thailand has the language tag th-TH. Recall that a Unicode extension has the format -u-key1-value1-key2-value2.... The calendaring system key is ca, and the numbering system key is nu. The Thai numbering system has the value thai, and the Buddhist calendaring system has the value buddhist. Thus to format dates in this overall manner, we tack a Unicode extension containing both these key/value pairs onto the end of the language tag: th-TH-u-ca-buddhist-nu-thai.

For more information on the various calendaring and numbering systems, see the full DateTimeFormat documentation.
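Because support for any given calendar or numbering system isn't guaranteed, it can be worth checking what was actually honored. A minimal sketch, assuming an implementation that ships Thai locale data:

```javascript
var thai =
  new Intl.DateTimeFormat("th-TH-u-ca-buddhist-nu-thai",
                          { timeZone: "UTC" });
var resolvedThai = thai.resolvedOptions();

// If the extension keys were understood and supported, they show up here.
// An implementation lacking the data would report its fallbacks instead.
console.log(resolvedThai.calendar);
console.log(resolvedThai.numberingSystem);
console.log(resolvedThai.locale);
```

The resolved locale string may omit extension values that match the locale's defaults, so the calendar and numberingSystem properties are the more reliable things to inspect.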

Examples

After creating a DateTimeFormat object, the next step is to use it to format dates via the handy format() function. Conveniently, this function is a bound function: you don't have to call it on the DateTimeFormat directly. Then provide it a timestamp or Date object.
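Because format is bound, it can be passed around as a plain callback. The following sketch (with "en-US" and the sample timestamps as assumptions) hands it straight to Array.prototype.map, which would fail for an ordinary unbound method.

```javascript
var formatDay =
  new Intl.DateTimeFormat("en-US",
                          { year: "numeric", month: "short", day: "numeric",
                            timeZone: "UTC" }).format;

// Timestamps (milliseconds since the epoch) work as well as Date objects.
var timestamps = [0, Date.UTC(2014, 6, 17)];

// No wrapper function or .bind() needed.
var labels = timestamps.map(formatDay);
console.log(labels.join(" | "));
```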

Putting it all together, here are some examples of how to create DateTimeFormat options for particular uses, with possible behaviors.

var msPerDay = 24 * 60 * 60 * 1000;

// July 17, 2014 00:00:00 UTC.
var july172014 = new Date(msPerDay * (44 * 365 + 11 + 197));

Let's format a date for English as used in the United States. Let's include two-digit month/day/year, plus two-digit hours/minutes, and a short time zone to clarify that time. (The result would obviously be different in another time zone.)

var americanDateTime =
  new Intl.DateTimeFormat("en-US",
                          { year: "2-digit", month: "2-digit", day: "2-digit",
                             hour: "2-digit", minute: "2-digit",
                             timeZoneName: "short" }).format;

print(americanDateTime(july172014)); // 07/16/14, 5:00 PM PDT

Or let's do something similar for Portuguese — ideally as used in Brazil, but in a pinch Portugal works. Let's go for a little longer format, with full year and spelled-out month, but make it UTC for portability.

var portugueseTime =
  new Intl.DateTimeFormat(["pt-BR", "pt-PT"],
                           { year: "numeric", month: "long", day: "numeric",
                             hour: "2-digit", minute: "2-digit",
                             timeZoneName: "short", timeZone: "UTC" });

print(portugueseTime.format(july172014)); // 17 de julho de 2014 00:00 GMT

How about a compact weekly Swiss train schedule entry? We'll try the official languages from most to least popular to choose the one that's most likely to be readable. Let's use UTC to be unambiguous.

var swissTime =
  new Intl.DateTimeFormat(["de-CH", "fr-CH", "it-CH", "rm-CH"],
                           { weekday: "short",
                             hour: "numeric", minute: "numeric",
                             timeZone: "UTC", timeZoneName: "short" }).format;

print(swissTime(july172014)); // Do. 00:00 GMT

And for something completely different, a longer date for use in Thai as used in Thailand, using the Thai numbering system and Buddhist calendar:

var thaiDate =
  new Intl.DateTimeFormat("th-TH-u-nu-thai-ca-buddhist",
                           { year: "numeric", month: "long", day: "numeric" });

print(thaiDate.format(july172014)); // ๑๖ กรกฎาคม ๒๕๕๗

Calendar and numbering system bits aside, it's relatively simple. Just pick what you want and the length you want it.

Number formatting

Options

The primary options properties for number formatting are as follows:

style: "currency", "percent", or "decimal" (the default) to format a value of that kind.
currency: A three-letter currency code, e.g. USD or CHF. Required if style is "currency", otherwise meaningless.
currencyDisplay: "code", "symbol", or "name", defaulting to "symbol". "code" will use the three-letter currency code in the formatted string. "symbol" will use a currency symbol such as $ or £. "name" typically uses some sort of spelled-out version of the currency. (Firefox currently only supports "symbol", but this will be fixed soon.)
minimumIntegerDigits: An integer in the range 1 to 21 (inclusive at both ends), defaulting to 1. The resulting string is front-padded with zeroes until its integer component contains at least this many digits. (For example, if this value were 2, formatting 3 might produce "03".)
minimumFractionDigits, maximumFractionDigits: Integers in the range 0 to 20 (inclusive at both ends). The resulting string will have at least minimumFractionDigits, and no more than maximumFractionDigits, fractional digits. The default minimum is currency-dependent (usually 2, rarely 0 or 3) if style is "currency", otherwise 0. The default maximum is 0 for percents, 3 for decimals, and currency-dependent for currencies, or the minimum if it's larger.
minimumSignificantDigits, maximumSignificantDigits: Integers in the range 1 to 21 (inclusive at both ends). If present, these override the integer/fraction digit control above to determine the minimum/maximum significant figures in the formatted number string, as determined in concert with the number of decimal places required to accurately specify the number. (Note that in a multiple of 10 the significant digits may be ambiguous, as in "100" with its one, two, or three significant digits.)
useGrouping: Boolean (defaulting to true) determining whether the formatted string will contain grouping separators (for example, "," as English thousands separator).
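To get a feel for how these options interact, here is a small sketch using "en-US". The fmt helper is hypothetical and builds a fresh formatter per call purely for brevity; the outputs in the comments are what implementations with common locale data produce, not something the specification guarantees.

```javascript
// Hypothetical helper for illustration; real code should reuse formatters.
function fmt(options, value)
{
  return new Intl.NumberFormat("en-US", options).format(value);
}

console.log(fmt({ minimumIntegerDigits: 2 }, 3));          // 03
console.log(fmt({ minimumFractionDigits: 2 }, 5));         // 5.00
console.log(fmt({ maximumFractionDigits: 1 }, 3.14159));   // 3.1
console.log(fmt({ maximumSignificantDigits: 3 }, 123456)); // 123,000
console.log(fmt({ useGrouping: false }, 1234567.891));     // 1234567.891
console.log(fmt({ style: "percent" }, 0.4382));            // 44%
```

The last line shows the default maximum of zero fractional digits for percents: 43.82% rounds to 44% unless minimumFractionDigits or maximumFractionDigits says otherwise.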

NumberFormat also recognizes an esoteric, mostly ignorable localeMatcher property.

Locale-centric options

Just as DateTimeFormat permitted use of a custom numbering system, so too does NumberFormat. As before, this information is in the Unicode extension using the nu key. For example, the language tag for Chinese as used in China is zh-CN. The value for the Tibetan numbering system is tibt. To format numbers for these systems, we tack a Unicode extension onto the language tag: zh-CN-u-nu-tibt.

For complete information on specifying the various numbering systems, see the full NumberFormat documentation.

Examples

NumberFormat objects have a format function property just as DateTimeFormat objects do. And as there, the format function is a bound function that may be used in isolation from the NumberFormat.

Here are some examples of how to create NumberFormat options for particular uses, with possible behaviors. First let's format some money for use in Chinese as used in China. Select the "currency" style, then use the code for Chinese renminbi (yuan), grouping by default, with the usual number of fractional digits.

var tibetanRMBInChina =
  new Intl.NumberFormat("zh-CN-u-nu-tibt",
                        { style: "currency", currency: "CNY" });

print(tibetanRMBInChina.format(1314.25)); // ￥ ༡,༣༡༤.༢༥

Or let's format a United States-style gas price, with its peculiar thousandths-place 9, for use in English as used in the United States.

var gasPrice =
  new Intl.NumberFormat("en-US",
                        { style: "currency", currency: "USD",
                          minimumFractionDigits: 3 });

print(gasPrice.format(5.259)); // $5.259

Or let's try a percentage in Arabic, meant for use in Egypt. Make sure the percentage has at least two fractional digits.

var arabicPercent =
  new Intl.NumberFormat("ar-EG",
                        { style: "percent",
                          minimumFractionDigits: 2 }).format;

print(arabicPercent(0.4382)); // ٤٣٫٨٢٪

Or suppose we're formatting for Farsi as used in Afghanistan, and we want at least two integer digits and no more than two fractional digits.

var farsiDecimal =
  new Intl.NumberFormat("fa-AF",
                        { minimumIntegerDigits: 2,
                          maximumFractionDigits: 2 });

print(farsiDecimal.format(3.1416)); // ۰۳٫۱۴

Finally, let's format an amount of Iraqi dinars, for Kurdish as used in Iraq. Unusually compared to most currencies, Iraqi dinars are formatted by default to the thousandth. We haven't overridden that, so that's what we'll get.

var iraqiDinars =
  new Intl.NumberFormat("ku-IQ",
                        { style: "currency", currency: "IQD" });

print(iraqiDinars.format(3.175)); // three fractional digits expected; symbol and digit shapes vary by implementation

Collation

Options

The primary options properties for collation are as follows:

usage: "sort" or "search" (defaulting to "sort"), specifying the intended use of this Collator. (It's sometimes possible to perform comparisons for searching faster than comparisons for sorting. This property lets implementations optimize if properly informed.)
sensitivity: "base", "accent", "case", or "variant". This affects how sensitive the collator is to characters that have the same base letter (e.g. "o" for "ö" and "Ó") but differ in terms of accents/diacritics and/or case. "base" sensitivity will consider only the base letter, ignoring all modifications to it (so "a", "A", and "Ä" are considered the same). "accent" considers the base letter and accents but ignores case (so "a" and "A" are the same, but "â" differs from both). "case" considers the base letter and its case, but ignores accents and such (so "a" and "ä" are the same, but "A" differs from both). Finally, "variant" considers base letter, accents, and case (so "a", "ä", "Ä" and "A" all differ). If usage is "sort", the default is "variant"; otherwise it's locale-dependent.
numeric: Boolean (defaulting to false) determining whether complete numbers embedded in strings are considered when sorting. For example, numeric sorting might produce "F-4 Phantom II", "F-14 Tomcat", "F-35 Lightning II"; non-numeric sorting might produce "F-14 Tomcat", "F-35 Lightning II", "F-4 Phantom II".
caseFirst: "upper", "lower", or "false" (the default). Determines how case is considered when sorting: "upper" places uppercase letters first ("B", "a", "c"), "lower" places lowercase first ("a", "c", "B"), and "false" ignores case entirely ("a", "B", "c"). (Note: Firefox currently ignores this property.)
ignorePunctuation: Boolean (defaulting to false) determining whether to ignore embedded punctuation when performing the comparison (for example, so that "biweekly" and "bi-weekly" compare equivalent).
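The sensitivity levels and numeric sorting described above can be sketched as follows, assuming "en-US" is supported. The same helper is hypothetical and creates a collator per call only to keep the example short.

```javascript
// Hypothetical helper: do a and b compare equal at this sensitivity?
function same(sensitivity, a, b)
{
  var collator = new Intl.Collator("en-US", { sensitivity: sensitivity });
  return collator.compare(a, b) === 0;
}

console.log(same("base", "a", "Ä"));    // true: only the base letter counts
console.log(same("accent", "a", "A"));  // true: case ignored
console.log(same("accent", "a", "â"));  // false: accents matter
console.log(same("case", "a", "ä"));    // true: accents ignored
console.log(same("case", "a", "A"));    // false: case matters
console.log(same("variant", "a", "A")); // false: everything matters

var planes = ["F-35 Lightning II", "F-4 Phantom II", "F-14 Tomcat"];
var numericOrder =
  planes.slice().sort(new Intl.Collator("en-US", { numeric: true }).compare);
var plainOrder = planes.slice().sort(new Intl.Collator("en-US").compare);
console.log(numericOrder.join(", ")); // F-4, F-14, F-35
console.log(plainOrder.join(", "));   // F-14, F-35, F-4
```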

And there's that localeMatcher property that you can probably ignore.

Locale-centric options

The main Collator option specified as part of the locale's Unicode extension is co, selecting the kind of sorting to perform: phone book (phonebk), dictionary (dict), and many others.

Additionally, the keys kn and kf may, optionally, duplicate the numeric and caseFirst properties of the options object. But they're not guaranteed to be supported in the language tag, and options is much clearer than language tag components. So it's best to only adjust these options through options.

These key-value pairs are included in the Unicode extension the same way they've been included for DateTimeFormat and NumberFormat; refer to those sections for how to specify these in a language tag.

Examples

Collator objects have a compare function property. This function accepts two arguments x and y and returns a number less than zero if x compares less than y, 0 if x compares equal to y, or a number greater than zero if x compares greater than y. As with the format functions, compare is a bound function that may be extracted for standalone use.

Let's try sorting a few German surnames, for use in German as used in Germany. There are actually two different sort orders in German, phonebook and dictionary. Phonebook sort emphasizes sound, and it's as if "ä", "ö", and so on were expanded to "ae", "oe", and so on prior to sorting.

var names =
  ["Hochberg", "Hönigswald", "Holzman"];

var germanPhonebook = new Intl.Collator("de-DE-u-co-phonebk");

// as if sorting ["Hochberg", "Hoenigswald", "Holzman"]
print(names.sort(germanPhonebook.compare).join(", ")); // Hochberg, Hönigswald, Holzman

Some German words conjugate with extra umlauts, so in dictionaries it's sensible to order ignoring umlauts (except when ordering words differing only by umlauts: schon before schön).

var germanDictionary = new Intl.Collator("de-DE-u-co-dict");

// as if sorting ["Hochberg", "Honigswald", "Holzman"]
print(names.sort(germanDictionary.compare).join(", ")); // Hochberg, Holzman, Hönigswald

Or let's sort a list of Firefox versions with various typos (different capitalizations, random accents and diacritical marks, extra hyphenation), in English as used in the United States. We want to sort respecting version number, so do a numeric sort so that numbers in the strings are compared, not considered character-by-character.

var firefoxen =
  ["FireFox 3.6", "Fire-fox 1.0", "Firefox 29", "FÍrefox 3.5", "Fírefox 18"];

var usVersion =
  new Intl.Collator("en-US",
                    { sensitivity: "base",
                      numeric: true,
                      ignorePunctuation: true });

// Fire-fox 1.0, FÍrefox 3.5, FireFox 3.6, Fírefox 18, Firefox 29
print(firefoxen.sort(usVersion.compare).join(", "));

Last, let's do some locale-aware string searching that ignores case and accents, again in English as used in the United States.

// ["A͢maya", "CH͛rôme", "FirefÓx", "sAfàri", "ọpERA", "I͒E"],
// but with both composed and decomposed forms because
// comparisons work regardless.
var decoratedBrowsers =
  ["A\u0362maya", "CH\u035Brôme", "FirefÓx", "sAfàri", "o\u0323pERA", "I\u0352E"];

var fuzzySearch =
  new Intl.Collator("en-US", { usage: "search", sensitivity: "base" });

function findBrowser(browser)
{
  return function(v) { return fuzzySearch.compare(v, browser) === 0; };
}

print(decoratedBrowsers.findIndex(findBrowser("Firêfox"))); // 2
print(decoratedBrowsers.findIndex(findBrowser("Safåri")));  // 3
print(decoratedBrowsers.findIndex(findBrowser("Ãmaya")));   // 0
print(decoratedBrowsers.findIndex(findBrowser("Øpera")));   // 4
print(decoratedBrowsers.findIndex(findBrowser("Chromè")));  // 1
print(decoratedBrowsers.findIndex(findBrowser("IË")));      // 5

Odds and ends

It may be useful to determine whether support for some operation is provided for particular locales, or to determine whether a locale is supported. Intl provides supportedLocalesOf() functions on each constructor, and resolvedOptions() functions on each prototype, to expose this information.

print(Intl.Collator.supportedLocalesOf(["nv"], { usage: "sort" }).length > 0
      ? "Navajo collation supported"
      : "Navajo collation not supported");

var germanFakeRegion = new Intl.DateTimeFormat("de-XX", { timeZone: "UTC" });
var usedOptions = germanFakeRegion.resolvedOptions();
print(usedOptions.locale);   // de
print(usedOptions.timeZone); // UTC

Legacy behavior

The ES5 toLocaleString-style and localeCompare functions previously had no particular semantics, accepted no particular options, and were largely useless. So the i18n API reformulates them in terms of Intl operations. Each method now accepts additional trailing locales and options arguments, interpreted just as the Intl constructors would do. (Except that for toLocaleTimeString and toLocaleDateString, different default components are used if options aren't provided.)

For brief use where precise behavior doesn't matter, the old methods are fine to use. But if you need more control or are formatting or comparing many times, it's best to use the Intl primitives directly.
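For reference, here is a brief sketch of the legacy methods taking the new trailing arguments. The "en-US" locale and sample values are assumptions, and the outputs in the comments reflect common locale data rather than guaranteed behavior.

```javascript
var amount = 1234.5;
var price =
  amount.toLocaleString("en-US", { style: "currency", currency: "USD" });
console.log(price); // $1,234.50

// localeCompare takes locales and options after the string to compare.
var order = "a".localeCompare("A", "en-US", { sensitivity: "base" });
console.log(order); // 0

var midnight = new Date(Date.UTC(2014, 6, 17));

// With no components requested, each method picks its own defaults:
// date parts for toLocaleDateString, time parts for toLocaleTimeString.
var dateOnly = midnight.toLocaleDateString("en-US", { timeZone: "UTC" });
var timeOnly = midnight.toLocaleTimeString("en-US", { timeZone: "UTC" });
console.log(dateOnly); // 7/17/2014
console.log(timeOnly); // 12:00:00 plus an AM marker
```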

Conclusion

Internationalization is a fascinating topic whose complexity is bounded only by the varied nature of human communication. The Internationalization API addresses a small but quite useful portion of that complexity, making it easier to produce locale-sensitive web applications. Go use it!
