[OK?] Re: [SPARQL] i18n comment: Modification in description of langMatches operator

* Addison Phillips <addison@inter-locale.com> [2007-04-25 14:03+0100]
> Eric Prud'hommeaux wrote:
> 
> >>>1. Language matching in RFC 4647 is defined in terms of "language 
> >>>priority lists" made up of "language ranges". It may be useful to 
> >>>incorporate this concept into SPARQL query. If necessary, you may 
> >>>limit the list to a single range.
> >>>     
> >>>
> >
> >That is the intention. Multiple ones may be expressed as multiple
> >langMatches tests:
> >
> > FILTER (langMatches(lang(?x), "en") || langMatches(lang(?x), "es"))
> >
> > 
> >
> 
> The problem I see with this is that implementations of matching may 
> already be in terms of language priority lists. Also, note that the 
> range can be an expression---taking its value, for example, from HTTP 
> Accept-Language. Ideally I'd like to see a language priority list here.

The last SPARQL draft was defined in terms of RFC 3066 matching (a
single language range). Changing this would require all the
implementations to revisit their code. I will bring this up in the next DAWG telecon,
noting that your comment can be satisfied even if it isn't adopted. To
that end, let me make sure I understand your arguments:

Were you arguing that SPARQL implementations may already take priority
lists? Or that, if some use out-of-the-box libraries, those libraries
might take priority lists?

Are there any applications that use basic language priority lists
today (I know, classic chicken-and-egg problem)? HTTP Accept-Language
headers are more complex, as they include quality values (q-values).
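
To make my reading concrete, here is a minimal Python sketch (not taken
from any SPARQL implementation; the function names are hypothetical) of
RFC 4647 basic filtering for a single range, and of a priority list read
as "any range in the list matches". It models plain RFC 4647 only and
leaves the SPARQL-specific "*" rule aside.

```python
from typing import Iterable


def basic_filter(tag: str, language_range: str) -> bool:
    """Basic filtering for ONE range (RFC 4647 section 3.3.1).

    Comparison is case-insensitive. A range matches a tag if it equals
    the tag, or equals a prefix of the tag that is followed by "-".
    """
    tag, language_range = tag.lower(), language_range.lower()
    if language_range == "*":
        return True  # plain RFC 4647: "*" matches any tag
    return tag == language_range or tag.startswith(language_range + "-")


def matches_priority_list(tag: str, priority_list: Iterable[str]) -> bool:
    """Assumed reading of a priority list: the tag passes if any range matches."""
    return any(basic_filter(tag, r) for r in priority_list)


# Same answers as langMatches(lang(?x), "en") || langMatches(lang(?x), "es")
for tag in ["en", "en-GB", "es-MX", "de"]:
    print(tag, matches_priority_list(tag, ["en", "es"]))
```

If that reading is right, a list-valued argument would be a convenience
over the ||-joined form rather than a change in what can be expressed.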

> >>>2. The special range "*" usually matches all language tags, including 
> >>>the empty tag. If it didn't, you would have the problem of not being 
> >>>able to select contents with no tag except explicitly. That is, to 
> >>>select everything, you'd need two queries: one for "*" and one for the 
> >>>empty tag. (Obviously, omitting the langmatches statement has the same 
> >>>effect, so your current text may be by design??)
> >>>     
> >>>
> >
> >Yes, lang("abc") returns an empty string as giving type errors would
> >make the language more cumbersome. The use case for looking for
> >anything with a language tag drove langMatches("", "*") => false.
> > 
> >
> 
> Okay, that makes sense. But it should be documented clearly, since it 
> isn't quite RFC 4647. This suggests, please note, something that I 
> should take back to the LTRU WG at IETF (where 4647 is maintained).

Agreed, though I quibble with the wording below: I believe there
are no literals with a language tag of "", as they are represented as
literals with no language tag. I propose pointing that out in the
definition of the lang function. I've added a sentence to the editor's draft:

http://www.w3.org/2001/sw/DataAccess/rq23/rq25#func-lang
[[
simple literal   lang (literal ltrl)

Returns the language tag of ltrl, if it has one. It returns "" if ltrl
has no language tag. Note that the RDF data model does not include
literals with an empty language tag.
]]

> >>>3. You don't have a way of specifying the empty tag, or at least you 
> >>>don't enumerate it. The empty tag only matches itself. That is:
> >>>
> >>>FILTER langMatches( lang(?title), "")
> >>>
> >>>only matches items with an xml:lang=""
> >>>
> >>>You should call this fact out.
> >>>     
> >>>
> >
> >RDF literals with empty language tags are treated as literals
> >with no language tag.
> > http://www.w3.org/TR/rdf-syntax-grammar/#section-literal-node
> >so <rdf:Description><some:predicate xml:lang="">abc</...></...>
> >exactly equals 
> >  <rdf:Description><some:predicate            >abc</...></...>
> >
> > 
> >
> Yes, but you have no way to select *only* the items with no language 
> tag? ("*" is available to find any non-empty value).
> 
> I know that your examples are equal: I want to select those distinct from:
> 
>  <rdf:Description><some:predicate xml:lang="de">foo</...></...>

You can't distinguish them by using langMatches, but you can by
testing lang(?x)="".

In fact, this works for queries for literals with or without a
language tag. Ultimately, I think the motivation for the "*" rule is
to be an intuitive interpretation of "matches any tag", and it is
more intuitive that it not match literals with no language tag. That
is an artifact of its being used on data both with and without
language tags. That said, I can't promise that that was the extent of
the wisdom. Some of these things require sending mail and regretting.

http://rfc.net/rfc4647.html#s3.3.1.
[[
The special range "*" in a language priority list matches any tag.  A
protocol that uses language ranges MAY specify additional rules about
the semantics of "*"; for instance, HTTP/1.1 [RFC2616] specifies that
the range "*" matches only languages not matched by any other range
within an "Accept-Language" header.
]]
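
Given that latitude, the split I described above can be sketched in
Python. This is only an illustration: the sample literals and the
helper names are made up, and each literal is modelled as a
(lexical form, language tag) pair with "" standing for "no tag".

```python
# Hypothetical sample data; "" means the literal has no language tag.
literals = [("abc", ""), ("foo", "de"), ("bar", "en-GB")]


def lang(literal):
    """Mimics lang(): the tag, or "" for a literal without one."""
    return literal[1]


def star_matches(tag):
    """The "*" rule as SPARQL uses it: any non-empty tag matches."""
    return tag != ""


# FILTER (lang(?x) = "")
untagged = [lit for lit in literals if lang(lit) == ""]

# FILTER langMatches(lang(?x), "*")
tagged = [lit for lit in literals if star_matches(lang(lit))]

print(untagged)
print(tagged)
```

The two filters partition the data, so no literal needs two queries and
none is selected by both.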

> >[[
> >Returns true if language-range (second argument) matches language-tag
> >(first argument) according to the Basic Filter matching scheme in
> >Matching of Language Tags [RFC 4647] Section 3.3.1. language-range is
> >a basic language range per RFC 4647 Section 2.1. The special range "*"
> >matches any non-empty language-tag string.
> >]]
> >
> >I am content with either of these configurations, though slightly
> >prefer the one just uttered. If you are content with this wording,
> >please respond with a Subject: prefixed by "[CLOSED]". If not, let's
> >negotiate some more.
> > 
> >
> The wording is not the big issue to me. It's fine as long as technically 
> correct: it's editorial and I'm not concerned about how you phrase it so 
> much. I would reverse the range and tag in the first sentence (as a 
> nit). Maybe the following (text in {{{}}} is optional per above):
> 
> --
>  Returns true if the language-tag (first argument) matches the 
> language-range {{{s in the language priority list}}} (second argument). 
> The matching scheme is based on Basic Filtering from Matching of 
> Language Tags [RFC 4647, Section 3.3.1], with some minor modifications. 
> The special range "*" matches any non-empty language-tag string. Unlike 
> in RFC 4647, it does not match the empty string. The empty range matches 
> only items with an empty language-tag or lacking the language attribute 
> altogether.

Given the 4647 "matches any tag" wording, and the new "Note that the
RDF data model does not include literals with an empty language tag"
in 11.4.6 lang, I propose:

http://www.w3.org/2001/sw/DataAccess/rq23/rq25#func-langMatches
[[
Returns true if language-tag (first argument) matches language-range
(second argument) per the basic filtering scheme defined in [RFC4647]
section 3.3.1. language-range is a basic language range per Matching
of Language Tags [RFC4647] section 2.1. A language-range of "*"
matches any non-empty language-tag string.
]]
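
For reviewers who prefer code to prose, here is how I read that text,
as a small Python sketch. It is not the normative definition and not
any implementation's code; lang_matches is a hypothetical name. One
point is my assumption rather than something the wording states: a
literal without a language tag (lang() returning "") matches no range
at all, not just "*".

```python
def lang_matches(language_tag: str, language_range: str) -> bool:
    """Sketch of langMatches(language-tag, language-range).

    Basic filtering per RFC 4647 section 3.3.1, case-insensitive,
    with "*" restricted to non-empty tags.
    """
    tag = language_tag.lower()
    rng = language_range.lower()
    if tag == "":
        # Assumption: untagged literals are selected with lang(?x) = "",
        # never through langMatches.
        return False
    if rng == "*":
        return True  # tag is known to be non-empty here
    # The range equals the tag, or a prefix of it that is followed by "-".
    return tag == rng or tag.startswith(rng + "-")


print(lang_matches("en-GB", "en"))
print(lang_matches("eng", "en"))
print(lang_matches("", "*"))
```

Note that "en" does not match "eng": the prefix must end at a subtag
boundary.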

<insert usual, feel-free-to-close clause here>
-- 
-eric

office: +1.617.258.5741 NE43-344, MIT, Cambridge, MA 02144 USA
mobile: +1.617.599.3509

(eric@w3.org)
Feel free to forward this message to any list for any purpose other than
email address distribution.

Received on Thursday, 26 April 2007 03:51:31 UTC