- From: John Cowan <cowan@ccil.org>
- Date: Mon, 30 Mar 2009 11:44:30 -0400
- To: public-rdf-text@w3.org
The current wiki version of rdf:text speaks of "the algorithm for 'Matching of Language Tags' which is part of BCP 47". However, that document contains three distinct matching algorithms. The editorial comment suggests that the filtering algorithm (either basic or extended) is intended. I am writing, however, to urge that at least two, and if possible all three, algorithms be provided.

Briefly, the basic and extended filtering algorithms treat the range as an underspecification of the tag, in the familiar manner of regex matching: a tag matches iff it provides at least the subtags present in the range. Thus the range "en" matches the tags "en" and "en-us", whereas the range "en-us" matches "en-us" but not "en". Basic filtering is HTTP-compatible and simply truncates the tag from the right; extended filtering copes better with more complex tags and treats missing subtags in the range as wildcards. Thus the range "en-us" matches the tag "en-us" under either basic or extended filtering, but matches "en-Latn-us" only under extended filtering.

The lookup algorithm behaves quite differently: it treats the range as a possible overspecification of the tag. Thus the range "en-us" matches the tags "en" and "en-us", but the range "en" matches "en" and not "en-us". Truncation from the right is applied to the range rather than to the tag. There is another difference between filtering and lookup: when applied to a sequence of language tags, filtering returns all matches, whereas lookup returns only the longest match.

Filtering, as its name suggests, is used to filter out tags that do not meet the minimal constraint of the range; lookup is used to find the most specific tag that is no more specific than what the range prescribes. HTTP servers apply filtering when the client supplies an Accept-Language header; if there is more than one match, a special HTTP status code is returned and the possibilities are listed.
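The three behaviours can be sketched in Python. This is a minimal, non-authoritative sketch assuming a straightforward reading of RFC 4647 (the "Matching of Language Tags" part of BCP 47); the function names (basic_filter, extended_filter, lookup) are hypothetical and are not an rdf:text or library API. It deliberately omits prioritized lists of ranges, quality weights, and special handling of "*" in lookup.

```python
"""Illustrative sketch of RFC 4647 language-tag matching.

Function names are hypothetical; this is not an rdf:text or library API.
All comparisons are case-insensitive.
"""
from __future__ import annotations


def basic_match(lang_range: str, tag: str) -> bool:
    """Basic filtering: range equals the tag, or a prefix of it ending at '-'."""
    r, t = lang_range.lower(), tag.lower()
    if r == "*":
        return True  # the lone wildcard range matches every tag
    return t == r or t.startswith(r + "-")


def extended_match(lang_range: str, tag: str) -> bool:
    """Extended filtering: subtags missing from the range act as wildcards."""
    r = lang_range.lower().split("-")
    t = tag.lower().split("-")
    # The first subtags must match unless the range starts with '*'.
    if r[0] != "*" and r[0] != t[0]:
        return False
    i = j = 1
    while i < len(r):
        if r[i] == "*":
            i += 1  # an explicit wildcard subtag in the range is skipped
        elif j >= len(t):
            return False  # the tag ran out before the range did
        elif r[i] == t[j]:
            i += 1
            j += 1
        elif len(t[j]) == 1:
            return False  # never skip past a singleton such as 'x'
        else:
            j += 1  # skip an extra tag subtag, e.g. a script like 'latn'
    return True


def basic_filter(lang_range: str, tags: list[str]) -> list[str]:
    """Return every tag that matches under basic filtering."""
    return [t for t in tags if basic_match(lang_range, t)]


def extended_filter(lang_range: str, tags: list[str]) -> list[str]:
    """Return every tag that matches under extended filtering."""
    return [t for t in tags if extended_match(lang_range, t)]


def lookup(lang_range: str, tags: list[str],
           default: str | None = None) -> str | None:
    """Lookup: truncate the *range* from the right until it equals a tag."""
    available = {t.lower(): t for t in tags}
    subtags = lang_range.lower().split("-")
    while subtags:
        key = "-".join(subtags)
        if key in available:
            return available[key]  # most specific tag the range allows
        subtags.pop()
        # Do not leave a dangling singleton (e.g. 'x') at the end.
        while subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return default


if __name__ == "__main__":
    tags = ["en", "en-US", "en-Latn-US", "fr"]
    print(basic_filter("en", tags))        # ['en', 'en-US', 'en-Latn-US']
    print(basic_filter("en-us", tags))     # ['en-US']
    print(extended_filter("en-us", tags))  # ['en-US', 'en-Latn-US']
    print(lookup("en-us", ["en", "fr"]))   # en
    print(lookup("en", ["en-US", "fr"]))   # None
```

Both filters take a whole list of tags and return every match, while lookup returns at most one tag, which mirrors the difference between the two families described above.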
Some servers, such as Apache, will apply lookup if filtering does not return any results. Lookup alone is used by Java, for example, when looking for the most appropriate localized properties: if en-us properties cannot be found, en properties are used instead.

It is not obvious that all applications of rdf:text will prefer filtering. In particular, if rdf:text is used for localization, as seems likely, lookup will prove useful. Therefore, both algorithms should be supplied. The choice between basic and extended filtering is a matter of backward compatibility versus more intuitive results; if only one filtering algorithm is supplied, I suggest that it be extended filtering. I would further suggest allowing the function(s) to accept a sequence of language tags rather than a single language tag, to provide the full functionality of matching.

I speak as a member of the IETF LTRU WG and the ietf-languages mailing list, but not for them.

-- 
But you, Wormtongue, you have done what you could for your true master. Some reward you have earned at least. Yet Saruman is apt to overlook his bargains. I should advise you to go quickly and remind him, lest he forget your faithful service. --Gandalf
John Cowan <cowan@ccil.org>
Received on Monday, 30 March 2009 15:45:05 UTC