[CLOSED] Re: 11.4.12 langMatches from Jeremy Carroll on 2007-05-11 (public-rdf-dawg-comments@w3.org from May 2007)

From: Jeremy Carroll <jjc@hpl.hp.com>
Date: Fri, 11 May 2007 17:35:14 +0100
To: "Eric Prud'hommeaux" <eric@w3.org>
CC: public-rdf-dawg-comments@w3.org
Message-ID: <46449B42.5040508@hpl.hp.com>

Thank you for the detailed consideration you've given to my comment.

Jeremy

Eric Prud'hommeaux wrote:
> You indicated this was being addressed to your satisfaction, but some
> extra text has been added to LANG(?x) and I want to make sure you're
> content with that as well.
> 
> * Jeremy Carroll <jjc@hpl.hp.com> [2007-04-05 10:34+0100]
>>
>> This is a comment on:
>> http://www.w3.org/TR/2007/WD-rdf-sparql-query-20070326/#func-langMatches
>>
>> specifically the text:
>> [[
>> matches language-tag (first argument) per Matching of Language Tags
>> [RFC4647] section 2.1.
>> ]]
>>
>> Contents of comment:
>> - issue statement
>> - suggested editorial textual change
>> - further analysis and options (the bulk of the message, which for the
>> most part can be ignored)
>>
>> Issue
>> =====
>> Section 2.1 of RFC 4647 defines basic language ranges, without giving
>> any semantics, nor defining an algorithm for "matches". Hence the word
>> "matches" in the quoted text is unbound, and without clear meaning.
>>
>> Sections 3.3.1 and 3.3.2 and 3.4 each describe different matching
>> algorithms that can be used with basic language ranges.
>>
>> Suggested text:
>> ===============
>> Replace
>> [[
>> Returns true if language-range (second argument) matches language-tag
>> (first argument) per Matching of Language Tags [RFC4647] section 2.1.
>> ]]
>> with
>> [[
>> Returns true if language-range (second argument) matches language-tag
>> (first argument).
>> language-range is a basic language range
>>  per Matching of Language Tags [RFC4647] section 2.1.
>> 'matches' is defined as basic filtering in [RFC4647] section 3.3.1.
>> ]]
> 
> http://www.w3.org/2001/sw/DataAccess/rq23/rq25#func-langMatches
> [[
> 11.4.12 langMatches
> 
>  xsd:boolean   langMatches (simple literal language-tag, simple literal language-range)
> 
> Returns true if language-tag (first argument) matches language-range
> (second argument) per the basic filtering scheme defined in [RFC4647]
> section 3.3.1. language-range is a basic language range per Matching
> of Language Tags [RFC4647] section 2.1. A language-range of "*"
> matches any non-empty language-tag string.
> ]]
> 
> I believe this addresses your comments above.
> 
> Due to a request from Addison Phillips on behalf of the i18n WG,
> a sentence has been added to the definition of LANG(?x):
> 
> http://www.w3.org/2001/sw/DataAccess/rq23/rq25#func-lang
> [[
> simple literal   lang (literal ltrl)
> 
> Returns the language tag of ltrl, if it has one. It returns "" if ltrl
> has no language tag. Note that the RDF data model does not include
> literals with an empty language tag.
> ]]
> 
> This comes from the text about how literals are represented in RDF.
> http://www.w3.org/TR/rdf-syntax-grammar/#section-literal-node
> [[
> If ·literal-language· is the empty string then the value is the
> concatenation of """ (1 double quote), the escaped value of the
> ·literal-value· accessor and """ (1 double quote).
> 
> Otherwise the value is the concatenation of """ (1 double quote), the
> escaped value of the ·literal-value· accessor ""@" (1 double quote and
> a '@'), and the value of the ·literal-language· accessor.
> ]]
> 
> I read this as saying that the object in
>   <rdf:Description><some:predicate xml:lang="">abc</...></...>
> exactly equals
>    <rdf:Description><some:predicate            >abc</...></...>
> 
> I will feel better about this interpretation with your blessing.
> 
>> Analysis
>> ========
>> The algorithm of section 3.4 is not suitable since it is scoped as
>> [[select[ing] the single language tag that best matches the [...]
>> request]]. i.e. it always gives exactly one result, when matching
>> against any non-empty set of languages - it does not define a boolean
>> function: lang-tag x lang-range => boolean, but a selection function
>> non-empty-list-of-lang-tags x lang-range => lang-tag
>>
>> The algorithm of section 3.3.2 is designed for extended language ranges,
>> which are more appropriate for the new features of RFC 4646 (such as
>> script subtags).
>>
>> The reference to section 2.1 is indicative that SPARQL is more
>> interested in basic language ranges, which were already specified in RFC
>> 3066, and are suited to matching lang tags that conform with RFC 3066
>> (and hence also with RFC 4646). The algorithm of section 3.3.1 is hence
>> (IMO) currently the closest 'reading' of the SPARQL WD.
>>
>> Technically, the choice is:
>> a) use basic language ranges (section 2.1) and basic filtering (3.3.1)
>> or
>> b) use extended language ranges (section 2.2) and extended filtering (3.3.2)
>>
>> FYGI, the extended language ranges are like basic language ranges except
>> they permit a "*" in any subtag position, e.g.
>> de-*-DE
>> de-DE-*
>> (but not de-DE*)
>> When used with extended filtering, any -*- is effectively ignored, and
>> treated as -, but note that an initial *- is significant.
>>
>> Then (simplifying by ignoring private use and other extensions) a lang
>> range matches a lang tag if both
>> a) the first subtags match
>> b) (ignoring the *'s) the "-"-separated sequence of the language range
>> is a subsequence (allowing arbitrary deletions) of the "-"-separated
>> sequence of the language tag.
>>
>> The reason this is more appropriate for new RFC 4646 style tags is that
>> RFC 4646 allows additional information, such as script subtags, to be
>> inserted in the appropriate place in a tag.
>>
>> So, the example given in RFC 4647 is that
>>
>> de-DE basic matches de-DE (i.e. German as spoken in Germany)
>> de-DE basic matches de-DE-1996 (i.e. German as spoken in Germany,
>> written with the orthography of 1996)
>> de-DE does not basic match de-Latf-DE (i.e. German, as spoken in
>> Germany, written in the Fraktur variant of the Latin script)
>>
>> whereas
>> both the basic matches are extended matches (indeed, any basic match is
>> an extended match), but also
>> de-DE extended matches de-Latf-DE
>> which is probably more consistent behaviour from the end user's point of
>> view when using such new features of RFC 4646 style tags.
>>
>> It is plausible that some semantic web applications may well have a need
>> for using extended language ranges like "*-Latn", for example, to
>> populate some part of a web page, when no content exactly matching the
>> current language preferences has been found. Many users have a
>> preference for text in a script they can read, even if they don't
>> understand it, over a perhaps intelligible word, written in a script
>> that is not intelligible. This use case however, depends on widespread
>> use of RFC 4646 script subtags, which, while possibly desirable is not a
>> current actuality. Moreover, code that worked to end user satisfaction
>> would also depend on appropriate deployment of section 4.1 of RFC 4646
>> (choice of language tag) either in the code or the processes of
>> constructing the semantic web data or both, so that script codes were
>> used consistently.
>>
>> Thus, I have suggested the more conservative change, but would be
>> equally satisfied if the SPARQL WG wanted to embrace extended language
>> ranges!
> 
> Thank you for the thorough analysis. It certainly prepared me to
> discuss this text with the i18n group. If you are satisfied with it,
> please reply with [CLOSED], and thanks again.
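
For readers who want to see the difference between the two filtering schemes discussed in the thread, here is a minimal Python sketch of basic filtering (RFC 4647 section 3.3.1) and extended filtering (section 3.3.2). The function names basic_filter and extended_filter are hypothetical and are not taken from the RFC or from any SPARQL implementation. Treating an empty language tag as matching nothing follows the quoted langMatches text for "*"; applying it to every range is an assumption of this sketch.

```python
def basic_filter(language_tag: str, language_range: str) -> bool:
    """Sketch of basic filtering: exact match or prefix followed by '-'."""
    tag = language_tag.lower()      # comparisons are case-insensitive
    rng = language_range.lower()
    if not tag:
        return False                # assumption: no language tag, no match
    if rng == "*":
        return True                 # "*" matches any non-empty tag
    return tag == rng or tag.startswith(rng + "-")


def extended_filter(language_tag: str, language_range: str) -> bool:
    """Sketch of extended filtering: range subtags form a subsequence of the tag."""
    if not language_tag:
        return False                # assumption: no language tag, no match
    tag_parts = language_tag.lower().split("-")
    range_parts = language_range.lower().split("-")

    # The first subtags must match, unless the range starts with "*".
    if range_parts[0] != "*" and range_parts[0] != tag_parts[0]:
        return False

    ti = 1                          # index of the next tag subtag to examine
    for sub in range_parts[1:]:
        if sub == "*":
            continue                # a non-initial "*" is effectively ignored
        while True:
            if ti >= len(tag_parts):
                return False        # ran out of tag subtags
            if tag_parts[ti] == sub:
                ti += 1
                break               # matched, move on to the next range subtag
            if len(tag_parts[ti]) == 1:
                return False        # singleton (e.g. "x") may not be skipped
            ti += 1                 # skip this tag subtag and keep looking
    return True


if __name__ == "__main__":
    for tag in ("de-DE", "de-DE-1996", "de-Latf-DE"):
        print(tag, basic_filter(tag, "de-DE"), extended_filter(tag, "de-DE"))
```

With the range de-DE, both functions accept de-DE and de-DE-1996, while only extended_filter accepts de-Latf-DE, which is the distinction drawn in the analysis quoted above.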

-- 
Hewlett-Packard Limited
Registered Office: Cain Road, Bracknell, Berks RG12 1HN
Registered No: 690597 England
Received on Friday, 11 May 2007 16:35:43 UTC