Re: 11.4.12 langMatches from Eric Prud'hommeaux on 2007-04-26 (public-rdf-dawg-comments@w3.org from April 2007)

From: Eric Prud'hommeaux <eric@w3.org>
Date: Thu, 26 Apr 2007 09:50:21 -0700
To: Jeremy Carroll <jjc@hpl.hp.com>
Cc: public-rdf-dawg-comments@w3.org
Message-ID: <20070426165021.GB14194@w3.org>
You indicated this was being addressed to your satisfaction, but some
extra text has been added to LANG(?x) and I want to make sure you're
content with that as well.

* Jeremy Carroll <jjc@hpl.hp.com> [2007-04-05 10:34+0100]
> 
> 
> This is a comment on:
> http://www.w3.org/TR/2007/WD-rdf-sparql-query-20070326/#func-langMatches
> 
> specifically the text:
> [[
> matches language-tag (first argument) per Matching of Language Tags
> [RFC4647] section 2.1.
> ]]
> 
> Contents of comment:
> - issue statement
> - suggested editorial textual change
> - further analysis and options (the bulk of the message, which in large
> part can be ignored)
> 
> Issue
> =====
> Section 2.1 of RFC 4647 defines basic language ranges, without giving
> any semantics, nor defining an algorithm for "matches". Hence the word
> "matches" in the quoted text is unbound, and without clear meaning.
> 
> Sections 3.3.1 and 3.3.2 and 3.4 each describe different matching
> algorithms that can be used with basic language ranges.
> 
> Suggested text:
> ===============
> Replace
> [[
> Returns true if language-range (second argument) matches language-tag
> (first argument) per Matching of Language Tags [RFC4647] section 2.1.
> ]]
> with
> [[
> Returns true if language-range (second argument) matches language-tag
> (first argument).
> language-range is a basic language range
>  per Matching of Language Tags [RFC4647] section 2.1.
> 'matches' is defined as basic filtering in [RFC4647] section 3.3.1.
> ]]

http://www.w3.org/2001/sw/DataAccess/rq23/rq25#func-langMatches
[[
11.4.12 langMatches

 xsd:boolean   langMatches (simple literal language-tag, simple literal language-range)

Returns true if language-tag (first argument) matches language-range
(second argument) per the basic filtering scheme defined in [RFC4647]
section 3.3.1. language-range is a basic language range per Matching
of Language Tags [RFC4647] section 2.1. A language-range of "*"
matches any non-empty language-tag string.
]]

I believe this addresses your comments above.
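
For illustration, the basic filtering rule can be sketched in Python.
This is only a minimal sketch of one reading of [RFC4647] section 3.3.1
combined with the "*" sentence quoted above. The function name
lang_matches is hypothetical (it is not taken from any SPARQL
implementation), and the sketch does not check that its arguments are
well-formed language tags or basic language ranges.

```python
def lang_matches(language_tag: str, language_range: str) -> bool:
    """Sketch of langMatches using basic filtering (RFC 4647, 3.3.1).

    Hypothetical helper: argument order mirrors the SPARQL signature
    (tag first, range second). No validation of either argument.
    """
    # Language tags and ranges compare case-insensitively.
    tag = language_tag.lower()
    rng = language_range.lower()

    # "*" matches any non-empty language-tag string.
    if rng == "*":
        return tag != ""

    # Otherwise the range must equal the tag, or be a prefix of the tag
    # that is immediately followed by "-".
    return tag == rng or tag.startswith(rng + "-")
```

Under this reading, lang_matches("de-DE", "de-DE") holds, while
lang_matches("de-Latf-DE", "de-DE") does not, because "de-DE" is not a
prefix of "de-Latf-DE".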

Due to a request from Addison Phillips on behalf of the i18n WG,
a sentence has been added to the definition of LANG(?x):

http://www.w3.org/2001/sw/DataAccess/rq23/rq25#func-lang
[[
simple literal   lang (literal ltrl)

Returns the language tag of ltrl, if it has one. It returns "" if ltrl
has no language tag. Note that the RDF data model does not include
literals with an empty language tag.
]]

This comes from the text about how literals are represented in RDF.
http://www.w3.org/TR/rdf-syntax-grammar/#section-literal-node
[[
If ·literal-language· is the empty string then the value is the
concatenation of """ (1 double quote), the escaped value of the
·literal-value· accessor and """ (1 double quote).

Otherwise the value is the concatenation of """ (1 double quote), the
escaped value of the ·literal-value· accessor ""@" (1 double quote and
a '@'), and the value of the ·literal-language· accessor.
]]

I read this as saying that the object in
  <rdf:Description><some:predicate xml:lang="">abc</...></...>
exactly equals
   <rdf:Description><some:predicate            >abc</...></...>

I will feel better about this interpretation with your blessing.
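
To make that reading concrete, here is a small Python sketch. It
assumes a literal is modelled as a lexical value plus a language string
in which "" stands for "no language tag". The names Literal,
literal_from_rdfxml, lang and string_value are all hypothetical, and
the escaping shown is a simplification rather than the full escaping
rules of the RDF syntax specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Literal:
    lexical: str
    language: str = ""  # assumption: "" means "no language tag"


def literal_from_rdfxml(text: str, xml_lang: Optional[str]) -> Literal:
    # Assumed interpretation: xml:lang="" and a missing xml:lang both
    # produce the same literal, one without a language tag.
    return Literal(text, xml_lang or "")


def lang(ltrl: Literal) -> str:
    # Returns the language tag if there is one, "" otherwise.
    return ltrl.language


def string_value(ltrl: Literal) -> str:
    # Simplified escaping (backslash and double quote only).
    escaped = ltrl.lexical.replace("\\", "\\\\").replace('"', '\\"')
    if ltrl.language == "":
        return '"' + escaped + '"'
    return '"' + escaped + '"@' + ltrl.language
```

In this sketch literal_from_rdfxml("abc", "") and
literal_from_rdfxml("abc", None) compare equal, and lang() returns ""
for both.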

> Analysis
> ========
> The algorithm of section 3.4 is not suitable since it is scoped as
> [[select[ing] the single language tag that best matches the [...]
> request]]. i.e. it always gives exactly one result, when matching
> against any non-empty set of languages - it does not define a boolean
> function: lang-tag x lang-range => boolean, but a selection function
> non-empty-list-of-lang-tags x lang-range => lang-tag
> 
> The algorithm of section 3.3.2 is designed for extended language ranges,
> which are more appropriate for the new features of RFC 4646 (such as
> script subtags).
> 
> The reference to section 2.1 is indicative that SPARQL is more
> interested in basic language ranges, which were already specified in RFC
> 3066, and are suited to matching lang tags that conform with RFC 3066
> (and hence also with RFC 4646). The algorithm of section 3.3.1 is hence
> (IMO) currently the closest 'reading' of the SPARQL WD.
> 
> Technically, the choice is:
> a) use basic language ranges (section 2.1) and basic filtering (3.3.1)
> or
> b) use extended language ranges (section 2.2) and extended filtering (3.3.2)
> 
> FYGI, the extended language ranges are like language ranges except they
> permit a "*" in any subtag position, e.g.
> de-*-DE
> de-DE-*
> (but not de-DE*)
> When used with extended filtering, any -*- is effectively ignored, and
> treated as -, but note that an initial *- is significant.
> 
> Then (simplifying by ignoring private use and other extensions) a lang
> range matches a lang tag if  both
> a) the first subtags match
> b) (ignoring the *'s) the "-"-separated sequence of the language range
> is a subsequence (allowing arbitrary deletions) of the "-"-separated
> sequence of the language tag.
> 
> The reason this is more appropriate for new RFC 4646 style tags is that
> RFC 4646 allows additional information, such as script subtags, to be
> inserted in the appropriate place in a tag.
> 
> So, the example given in RFC 4647 is that
> 
> de-DE basic matches de-DE (i.e. German as spoken in Germany)
> de-DE basic matches de-DE-1996 (i.e. German as spoken in Germany,
> written with the orthography of 1996)
> de-DE does not basic match de-Latf-DE (i.e. German, as spoken in
> Germany, written in the Fraktur variant of the Latin script)
> 
> whereas
> both the basic matches are extended matches (indeed, any basic match is
> an extended match), but also
> de-DE extended matches de-Latf-DE
> which is probably more consistent behaviour from the end user's point of
> view when using such new features of RFC 4646 style tags.
> 
> It is plausible that some semantic web applications may well have a need
> for using extended language ranges like "*-Latn", for example, to
> populate some part of a web page, when no content exactly matching the
> current language preferences has been found. Many users have a
> preference for text in a script they can read, even if they don't
> understand it, over a perhaps intelligible word, written in a script
> that is not intelligible. This use case however, depends on widespread
> use of RFC 4646 script subtags, which, while possibly desirable is not a
> current actuality. Moreover, code that worked to end user satisfaction
> would also depend on appropriate deployment of section 4.1 of RFC 4646
> (choice of language tag) either in the code or the processes of
> constructing the semantic web data or both, so that script codes were
> used consistently.
> 
> Thus, I have suggested the more conservative change, but would be
> equally satisfied if the SPARQL WG wanted to embrace extended language
> ranges!
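
For comparison, extended filtering can be sketched in Python along the
lines of the steps in [RFC4647] section 3.3.2 as I understand them. The
function name extended_filter_matches is hypothetical, and the argument
order (tag first, range second) simply mirrors langMatches. Unlike the
simplified description quoted above, the sketch includes the rule that
a single-character subtag in the tag (such as "x") cannot be skipped.
It does not validate its arguments.

```python
def extended_filter_matches(language_tag: str, language_range: str) -> bool:
    """Sketch of extended filtering (RFC 4647, section 3.3.2)."""
    # Comparison is case-insensitive; both sides are split into subtags.
    tag_subtags = language_tag.lower().split("-")
    range_subtags = language_range.lower().split("-")

    # The first subtags must match; an initial "*" matches any first subtag.
    if range_subtags[0] != "*" and range_subtags[0] != tag_subtags[0]:
        return False

    r, t = 1, 1
    while r < len(range_subtags):
        if range_subtags[r] == "*":
            # A non-initial wildcard is skipped.
            r += 1
        elif t >= len(tag_subtags):
            # The range still has subtags but the tag has run out.
            return False
        elif range_subtags[r] == tag_subtags[t]:
            r += 1
            t += 1
        elif len(tag_subtags[t]) == 1:
            # Singletons (including "x") may not be skipped over.
            return False
        else:
            # Skip a non-matching subtag in the tag (e.g. a script subtag).
            t += 1
    return True
```

Here extended_filter_matches("de-Latf-DE", "de-DE") holds, because the
script subtag "Latf" is skipped, whereas basic filtering rejects the
same pair.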

Thank you for the thorough analysis. It certainly prepared me to
discuss this text with the i18n group. If you are satisfied with it,
please reply with [CLOSED], and thanks again.
-- 
-eric

office: +1.617.258.5741 NE43-344, MIT, Cambridge, MA 02144 USA
mobile: +1.617.599.3509

(eric@w3.org)
Feel free to forward this message to any list for any purpose other than
email address distribution.
Received on Thursday, 26 April 2007 16:50:54 UTC