- From: Jeremy Carroll <jjc@hpl.hp.com>
- Date: Fri, 11 May 2007 17:35:14 +0100
- To: "Eric Prud'hommeaux" <eric@w3.org>
- CC: public-rdf-dawg-comments@w3.org
Thank you for the detailed consideration you've given to my comment.

Jeremy

Eric Prud'hommeaux wrote:
> You indicated this was being addressed to your satisfaction, but some
> extra text has been added to LANG(?x) and I want to make sure you're
> content with that as well.
>
> * Jeremy Carroll <jjc@hpl.hp.com> [2007-04-05 10:34+0100]
>>
>> This is a comment on:
>> http://www.w3.org/TR/2007/WD-rdf-sparql-query-20070326/#func-langMatches
>>
>> specifically the text:
>> [[
>> matches language-tag (first argument) per Matching of Language Tags
>> [RFC4647] section 2.1.
>> ]]
>>
>> Contents of comment:
>> - issue statement
>> - suggested editorial textual change
>> - further analysis and options (the bulk of the message, which to large
>> part can be ignored)
>>
>> Issue
>> =====
>> Section 2.1 of RFC 4647 defines basic language ranges, without giving
>> any semantics, nor defining an algorithm for "matches". Hence the word
>> "matches" in the quoted text is unbound, and without clear meaning.
>>
>> Sections 3.3.1 and 3.3.2 and 3.4 each describe different matching
>> algorithms that can be used with basic language ranges.
>>
>> Suggested text:
>> ===============
>> Replace
>> [[
>> Returns true if language-range (second argument) matches language-tag
>> (first argument) per Matching of Language Tags [RFC4647] section 2.1.
>> ]]
>> with
>> [[
>> Returns true if language-range (second argument) matches language-tag
>> (first argument).
>> language-range is a basic language range
>> per Matching of Language Tags [RFC4647] section 2.1.
>> 'matches' is defined as basic filtering in [RFC4647] section 3.3.1.
>> ]]
>
> http://www.w3.org/2001/sw/DataAccess/rq23/rq25#func-langMatches
> [[
> 11.4.12 langMatches
>
> xsd:boolean langMatches (simple literal language-tag, simple literal language-range)
>
> Returns true if language-tag (first argument) matches language-range
> (second argument) per the basic filtering scheme defined in [RFC4647]
> section 3.3.1.
> language-range is a basic language range per Matching
> of Language Tags [RFC4647] section 2.1. A language-range of "*"
> matches any non-empty language-tag string.
> ]]
>
> I believe this addresses your comments above.
>
> Due to a request from Addison Phillips on behalf of the i18n WG,
> a sentence has been added to the definition of LANG(?x):
>
> http://www.w3.org/2001/sw/DataAccess/rq23/rq25#func-lang
> [[
> simple literal lang (literal ltrl)
>
> Returns the language tag of ltrl, if it has one. It returns "" if ltrl
> has no language tag. Note that the RDF data model does not include
> literals with an empty language tag.
> ]]
>
> This comes from the text about how literals are represented in RDF.
> http://www.w3.org/TR/rdf-syntax-grammar/#section-literal-node
> [[
> If ·literal-language· is the empty string then the value is the
> concatenation of """ (1 double quote), the escaped value of the
> ·literal-value· accessor and """ (1 double quote).
>
> Otherwise the value is the concatenation of """ (1 double quote), the
> escaped value of the ·literal-value· accessor ""@" (1 double quote and
> a '@'), and the value of the ·literal-language· accessor.
> ]]
>
> I read this as saying that the object in
> <rdf:Description><some:predicate xml:lang="">abc</...></...>
> exactly equals
> <rdf:Description><some:predicate >abc</...></...>
>
> I will feel better about this interpretation with your blessing.
>
>> Analysis
>> ========
>> The algorithm of section 3.4 is not suitable since it is scoped as
>> [[select[ing] the single language tag that best matches the [...]
>> request]]. i.e.
>> it always gives exactly one result, when matching
>> against any non-empty set of languages - it does not define a boolean
>> function: lang-tag x lang-range => boolean, but a selection function
>> non-empty-list-of-lang-tags x lang-range => lang-tag
>>
>> The algorithm of section 3.3.2 is designed for extended language ranges,
>> which are more appropriate for the new features of RFC 4646 (such as
>> script subtags).
>>
>> The reference to section 2.1 is indicative that SPARQL is more
>> interested in basic language ranges, which were already specified in RFC
>> 3066, and are suited to matching lang tags that conform with RFC 3066
>> (and hence also with RFC 4646). The algorithm of section 3.3.1 is hence
>> (IMO) currently the closest 'reading' of the SPARQL WD.
>>
>> Technically, the choice is:
>> a) use basic language ranges (section 2.1) and basic filtering (3.3.1)
>> or
>> b) use extended language ranges (section 2.2) and extended filtering (3.3.2)
>>
>> FYGI, the extended language ranges are like language ranges except they
>> permit a "*" in any subtag position, e.g.
>> de-*-DE
>> de-DE-*
>> (but not de-DE*)
>> When used with extended filtering, any -*- is effectively ignored, and
>> treated as -, but note that an initial *- is significant.
>>
>> Then (simplifying by ignoring private use and other extensions) a lang
>> range matches a lang tag if both
>> a) the first subtags match
>> b) (ignoring the *'s) the "-"-separated sequence of the language range
>> is a subsequence (allowing arbitrary deletions) of the "-"-separated
>> sequence of the language tag.
>>
>> The reason this is more appropriate for new RFC 4646 style tags is that
>> RFC 4646 allows additional information, such as script subtags, to be
>> inserted in the appropriate place in a tag.
>>
>> So, the example given in RFC 4647 is that
>>
>> de-DE basic matches de-DE (i.e. German as spoken in Germany)
>> de-DE basic matches de-DE-1996 (i.e.
>> German as spoken in Germany,
>> written with the orthography of 1996)
>> de-DE does not basic match de-Latf-DE (i.e. German, as spoken in
>> Germany, written in the Fraktur variant of the Latin script)
>>
>> whereas
>> both the basic matches are extended matches (indeed, any basic match is
>> an extended match), but also
>> de-DE extended matches de-Latf-DE
>> which is probably more consistent behaviour from the end user's point of
>> view when using such new features of RFC 4646 style tags.
>>
>> It is plausible that some semantic web applications may well have a need
>> for using extended language ranges like "*-Latn", for example, to
>> populate some part of a web page, when no content exactly matching the
>> current language preferences has been found. Many users have a
>> preference for text in a script they can read, even if they don't
>> understand it, over a perhaps intelligible word, written in a script
>> that is not intelligible. This use case however, depends on widespread
>> use of RFC 4646 script subtags, which, while possibly desirable is not a
>> current actuality. Moreover, code that worked to end user satisfaction
>> would also depend on appropriate deployment of section 4.1 of RFC 4646
>> (choice of language tag) either in the code or the processes of
>> constructing the semantic web data or both, so that script codes were
>> used consistently.
>>
>> Thus, I have suggested the more conservative change, but would be
>> equally satisfied if the SPARQL WG wanted to embrace extended language
>> ranges!
>
> Thank you for the thorough analysis. It certainly prepared me to
> discuss this text with the i18n group. If you are satisfied with it,
> please reply with [CLOSED], and thanks again.

-- 
Hewlett-Packard Limited
registered Office: Cain Road, Bracknell, Berks RG12 1HN
Registered No: 690597 England
Received on Friday, 11 May 2007 16:35:43 UTC
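
For readers who want to see the difference between the two filtering schemes discussed in the message, here is a minimal Python sketch. It follows my reading of RFC 4647 sections 3.3.1 (basic filtering) and 3.3.2 (extended filtering); the function names `basic_filter` and `extended_filter` are hypothetical, inputs are assumed to be syntactically valid tags and ranges, and the rule that an empty tag never matches is taken from the langMatches text quoted above rather than from the RFC. It is an illustration, not a reference implementation.

```python
def basic_filter(tag: str, lang_range: str) -> bool:
    """Basic filtering (RFC 4647 section 3.3.1), compared case-insensitively."""
    if not tag:
        # Assumption from the quoted langMatches text: "*" matches only
        # non-empty tags, so an empty tag never matches here.
        return False
    if lang_range == "*":
        return True
    tag, lang_range = tag.lower(), lang_range.lower()
    # The range must equal the tag, or be a prefix that ends at a "-".
    return tag == lang_range or tag.startswith(lang_range + "-")


def extended_filter(tag: str, lang_range: str) -> bool:
    """Extended filtering (RFC 4647 section 3.3.2), compared case-insensitively."""
    if not tag:
        return False
    tag_parts = tag.lower().split("-")
    range_parts = lang_range.lower().split("-")
    # Step 1: the first subtags must match, unless the range starts with "*".
    if range_parts[0] != "*" and range_parts[0] != tag_parts[0]:
        return False
    ti = 1  # index of the current subtag in the tag
    for sub in range_parts[1:]:
        if sub == "*":
            continue  # a non-initial wildcard is ignored
        # Skip tag subtags until one equals the current range subtag.
        while True:
            if ti >= len(tag_parts):
                return False  # tag exhausted before the range
            if tag_parts[ti] == sub:
                ti += 1
                break
            if len(tag_parts[ti]) == 1:
                return False  # singletons (e.g. "x") may not be skipped
            ti += 1
    return True


# The examples from the message:
print(basic_filter("de-DE-1996", "de-DE"))     # True
print(basic_filter("de-Latf-DE", "de-DE"))     # False
print(extended_filter("de-Latf-DE", "de-DE"))  # True
```

The subsequence behaviour described in the analysis shows up in the inner loop of `extended_filter`: subtags of the tag that the range does not mention are skipped, which is why `de-DE` reaches `de-Latf-DE` under extended filtering but not under basic filtering.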