author Russ Cox <rsc@golang.org> 2020-03-09 23:54:35 -0400
committer Russ Cox <rsc@golang.org> 2020-03-17 20:58:37 +0000
commit af5018f64e406aaa646dae066f28de57321ea5ce
tree 8db7b1f049d83d215fa9abf68851efce7b5ccadb /content/matchlang.article
parent 86e424fac66fa90ddcb7e8d7febd4c2b07d7c59e
content: convert to Markdown-enabled present inputs
Converted blog to Markdown-enabled present (CL 222846)
using present2md (CL 222847).

For golang/go#33955.

Change-Id: Ib39fa1ddd9a46f9c7a62a2ca7b96e117635553e8
Reviewed-on: https://go-review.googlesource.com/c/blog/+/222848
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Andrew Bonventre <andybons@golang.org>
Diffstat (limited to 'content/matchlang.article')
-rw-r--r-- content/matchlang.article 43
1 files changed, 22 insertions, 21 deletions
diff --git a/content/matchlang.article b/content/matchlang.article
index 47473ea..ea64dbd 100644
--- a/content/matchlang.article
+++ b/content/matchlang.article
@@ -1,10 +1,11 @@
-Language and Locale Matching in Go
-09 Feb 2016
+# Language and Locale Matching in Go
+9 Feb 2016
Tags: language, locale, tag, BCP 47, matching
+Summary: Consider an application, such as a web site, with support for multiple languages in its user interface. When a user arrives with a list of preferred languages, the application must decide which language it should use in its presentation to the user. This requires finding the best match between the languages the application supports and those the user prefers. This post explains why this is a difficult decision and how Go can help.
Marcel van Lohuizen
-* Introduction
+## Introduction
Consider an application, such as a web site, with support for multiple languages
in its user interface.
@@ -14,7 +15,7 @@ This requires finding the best match between the languages the application suppo
and those the user prefers.
This post explains why this is a difficult decision and how Go can help.
-* Language Tags
+## Language Tags
Language tags, also known as locale identifiers, are machine-readable
identifiers for the language and/or dialect being used.
@@ -47,14 +48,14 @@ This involves a different kind of matching. For example, as there is no specific
sorting order for Portuguese, a collate package may fall back to the sorting
order for the default, or “root”, language.
-* The Messy Nature of Matching Languages
+## The Messy Nature of Matching Languages
Handling language tags is tricky.
This is partly because the boundaries of human languages are not well defined
and partly because of the legacy of evolving language tag standards.
In this section we will show some of the messy aspects of handling language tags.
-__Tags_with_different_language_codes_can_indicate_the_same_language_
+_ Tags with different language codes can indicate the same language_
For historical and political reasons, many language codes have changed over
time, leaving languages with an older legacy code as well as a new one.
@@ -66,7 +67,7 @@ the group of Chinese languages.
Tags for macro languages are often used interchangeably with the most-spoken
language in the group.
-_Matching_language_code_alone_is_not_sufficient_
+_Matching language code alone is not sufficient_
Azerbaijani (“az”), for example, is written in different scripts depending on
the country in which it is spoken: "az-Latn" for Latin (the default script),
@@ -82,7 +83,7 @@ A similar thing can be said for Kyrgyz and other languages.
If you ignore subtags, you might as well present Greek to the user.
-_The_best_match_might_be_a_language_not_listed_by_the_user_
+_The best match might be a language not listed by the user_
The most common written form of Norwegian (“nb”) looks an awful lot like Danish.
If Norwegian is not available, Danish may be a good second choice.
@@ -93,7 +94,7 @@ Other examples abound.
If a user-requested language is not supported, falling back to English is often
not the best thing to do.
-_The_choice_of_language_decides_more_than_translation_
+_The choice of language decides more than translation_
Suppose a user asks for Danish, with German as a second choice.
If an application chooses German, it must not only use German translations
@@ -105,10 +106,10 @@ handshaking algorithm: first you determine which protocol to communicate in (the
language) and then you stick with this protocol for all communication for the
duration of a session.
-_Using_a_“parent”_of_a_language_as_fallback_is_non-trivial_
+_Using a “parent” of a language as fallback is non-trivial_
Suppose your application supports Angolan Portuguese (“pt-AO”).
-Packages in [[https://golang.org/x/text]], like collation and display, may not
+Packages in [golang.org/x/text](https://golang.org/x/text), like collation and display, may not
have specific support for this dialect.
The correct course of action in such cases is to match the closest parent dialect.
Languages are arranged in a hierarchy, with each specific language having a more
@@ -127,9 +128,9 @@ To give a few more examples, the parent of “es-CL” is “es-419”, the pare
If you compute the parent by simply removing subtags, you may select a “dialect”
that is incomprehensible to the user.
-* Language Matching in Go
+## Language Matching in Go
-The Go package [[https://golang.org/x/text/language]] implements the BCP 47
+The Go package [golang.org/x/text/language](https://golang.org/x/text/language) implements the BCP 47
standard for language tags and adds support for deciding which language to use
based on data published in the Unicode Common Locale Data Repository (CLDR).
@@ -138,7 +139,7 @@ preferences against an application's supported languages:
.code -edit matchlang/complete.go
-** Creating Language Tags
+### Creating Language Tags
The simplest way to create a language.Tag from a user-given language code string
is with language.Make.
@@ -164,7 +165,7 @@ Canonicalization is handled in the Matcher instead.
A full array of canonicalization options are available if the programmer still
desires to do so.
-** Matching User-Preferred Languages to Supported Languages
+### Matching User-Preferred Languages to Supported Languages
A Matcher matches user-preferred languages to supported languages.
Users are strongly advised to use it if they don’t want to deal with all the
@@ -193,7 +194,7 @@ common English (“en”), which defaults to American.
It is all the same for the Matcher. An application may even add both, allowing
for more specific American slang for “en-US”.
-** Matching Example
+### Matching Example
Consider the following Matcher and lists of supported languages:
@@ -245,7 +246,7 @@ collation order).
German is the best match for Swiss German in the server's language list, and the
option for phone-book collation order has been carried over.
-** Confidence Scores
+### Confidence Scores
Go uses coarse-grained confidence scoring with rule-based elimination.
A match is classified as Exact, High (not exact, but no known ambiguity), Low
@@ -264,9 +265,9 @@ We found that using coarse-grained scoring in the Go implementation ended up
simpler to implement, more maintainable, and faster, meaning that we could
handle more rules.
-** Displaying Supported Languages
+### Displaying Supported Languages
-The [[https://golang.org/x/text/language/display]] package allows naming language
+The [golang.org/x/text/language/display](https://golang.org/x/text/language/display) package allows naming language
tags in many languages.
It also contains a “Self” namer for displaying a tag in its own language.
@@ -287,7 +288,7 @@ prints
In the second column, note the differences in capitalization, reflecting the
rules of the respective language.
-* Conclusion
+## Conclusion
At first glance, language tags look like nicely structured data, but because
they describe human languages, the structure of relationships between language
@@ -297,5 +298,5 @@ ad-hoc language matching using nothing other than string manipulation of the
language tags.
As described above, this can produce awful results.
-Go's [[https://golang.org/x/text/language]] package solves this complex problem
+Go's [golang.org/x/text/language](https://golang.org/x/text/language) package solves this complex problem
while still presenting a simple, easy-to-use API. Enjoy.