fn:matches-language-range from John Cowan on 2009-03-30 (public-rdf-text@w3.org from January to March 2009)

From: John Cowan <cowan@ccil.org>
Date: Mon, 30 Mar 2009 11:44:30 -0400
To: public-rdf-text@w3.org
Message-ID: <20090330154430.GB12552@mercury.ccil.org>

The current wiki version of rdf:text speaks of "the algorithm for
'Matching of Language Tags' which is part of BCP 47".  However, that
document contains three distinct matching algorithms.  The editorial
comment suggests that the filtering algorithm (either basic or extended)
is intended.  I am writing to you, however, to urge that at least two
and if possible all three algorithms be provided.

Briefly, the basic and extended filtering algorithms treat the range as an
underspecification of the tag, in the familiar manner of regex matching.
A tag matches iff it provides at least the subtags present in the range.
Thus the range "en" matches the tags "en" and "en-us", whereas the range
"en-us" matches "en-us" but not "en".  Basic filtering is HTTP-compatible
and simply truncates the tag from the right; extended filtering copes
better with more complex tags and treats missing subtags in the range
as wildcards.  Thus the range "en-us" will match the tag "en-us" using
either basic or extended filtering, but will match "en-Latn-us" with
extended filtering but not with basic filtering.
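
As a rough illustration of the two filtering behaviours described above,
here is a minimal Python sketch.  The function names basic_filter and
extended_filter are hypothetical, and the code assumes well-formed,
hyphen-separated tags and ranges; it is not a normative implementation.

```python
def basic_filter(lang_range, tag):
    """Basic filtering: the range must equal the tag or be a prefix of it
    that ends on a subtag boundary.  Comparison is case-insensitive."""
    r, t = lang_range.lower(), tag.lower()
    return r == "*" or t == r or t.startswith(r + "-")


def extended_filter(lang_range, tag):
    """Extended filtering: subtags of the range must appear in order in the
    tag, but the tag may carry extra subtags in between."""
    r = lang_range.lower().split("-")
    t = tag.lower().split("-")
    # The first subtags must agree unless the range starts with a wildcard.
    if r[0] != "*" and r[0] != t[0]:
        return False
    i, j = 1, 1
    while i < len(r):
        if r[i] == "*":
            i += 1                  # wildcard in the range: skip it
        elif j >= len(t):
            return False            # tag ran out of subtags
        elif r[i] == t[j]:
            i += 1
            j += 1                  # subtags agree: advance both
        elif len(t[j]) == 1:
            return False            # never skip over a singleton such as "x"
        else:
            j += 1                  # skip an intervening tag subtag
    return True
```

With these definitions, basic_filter("en-us", "en-Latn-us") is false while
extended_filter("en-us", "en-Latn-us") is true, which is the difference
noted above.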

The lookup algorithm has a quite different behavior: it treats the range
as a possible overspecification of the tag.  Thus the range "en-us"
matches the tags "en" and "en-us", but the range "en" matches "en" but not
"en-us".  Truncation from the right is applied to the range rather than
to the tag.  There is another difference between filtering and lookup:
when applied to a sequence of language tags, filtering returns all matches
whereas lookup returns only the longest match.
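
The lookup behaviour can be sketched in the same way.  The function name
lookup and the choice of returning None when nothing matches are
assumptions made for this illustration; a real application would
typically substitute its own default.

```python
def lookup(lang_range, tags):
    """Return the single most specific tag obtained by truncating the
    range from the right, or None if no truncation matches."""
    available = {t.lower(): t for t in tags}
    parts = lang_range.lower().split("-")
    while parts:
        candidate = "-".join(parts)
        if candidate in available:
            return available[candidate]
        parts.pop()                          # drop the rightmost subtag
        if parts and len(parts[-1]) == 1:
            parts.pop()                      # a trailing singleton goes with it
    return None
```

Here lookup("en-us", ["en", "en-us"]) returns "en-us", lookup("en-us",
["en"]) falls back to "en", and lookup("en", ["en-us"]) finds nothing.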

Filtering, as its name suggests, is used to filter out tags that do not
meet the minimal constraint of the range; lookup is used to find the most
specific tag that is no more specific than what the range prescribes.
HTTP servers apply filtering when the client supplies language ranges in
an Accept-Language header; if there is more than one match, a special HTTP
status code is returned and the possibilities are listed.  Some servers, such as
Apache, will apply lookup if filtering does not return any results.
Lookup alone is used by Java, for example, in looking for the most
appropriate localized properties: if en-us properties cannot be found,
then en properties are used instead.
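
The "filter first, then fall back to lookup" strategy mentioned above
might look roughly like the following Python sketch.  The function name
filter_then_lookup is hypothetical, basic filtering is assumed for the
first step, and the code does not reproduce what Apache or any other
server actually implements.

```python
def filter_then_lookup(lang_range, tags):
    """Return every tag that passes basic filtering; if there is none,
    fall back to the single best lookup match (as a one-element list)."""
    wanted = lang_range.lower()
    matches = [t for t in tags
               if t.lower() == wanted or t.lower().startswith(wanted + "-")]
    if matches:
        return matches
    # Filtering found nothing, so the full range is not available:
    # start the lookup from the range with its last subtag removed.
    by_lower = {t.lower(): t for t in tags}
    parts = wanted.split("-")[:-1]
    while parts:
        if len(parts[-1]) == 1:
            parts.pop()              # do not leave a dangling singleton
            continue
        candidate = "-".join(parts)
        if candidate in by_lower:
            return [by_lower[candidate]]
        parts.pop()
    return []
```

For example, the range "en" against ["en-us", "en-gb", "fr"] yields both
English tags through filtering, while the range "en-us" against
["en", "fr"] yields ["en"] through the lookup fallback.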

It is not obvious that all applications of rdf:text will prefer filtering.
In particular, if rdf:text is used for localization, as seems likely,
lookup will prove useful.  Therefore, both algorithms should be supplied.
The choice between basic and extended filtering is a matter of backward
compatibility vs. more intuitive results: I suggest that if only one
filtering algorithm is supplied, it should be extended filtering.

Further, I would suggest allowing the function(s) to accept a sequence
of language tags rather than just a single language tag, to provide the
full functionality of matching.

I speak as a member of the IETF LTRU WG and the ietf-languages mailing
list, but not for them.

-- 
But you, Wormtongue, you have done what you could for your true master.  Some
reward you have earned at least.  Yet Saruman is apt to overlook his bargains.
I should advise you to go quickly and remind him, lest he forget your faithful
service.  --Gandalf             John Cowan <cowan@ccil.org>
Received on Monday, 30 March 2009 15:45:05 UTC