More fun with xml:lang

On 2002-02-15 15:59, "ext Dave Beckett" <dave.beckett@Bristol.ac.uk> wrote:

>>>> Patrick Stickler said:
>> On 2002-02-15 12:10, "ext Brian McBride" <bwm@hplb.hpl.hp.com> wrote:
>> 
>>> At 11:04 15/02/2002 +0200, Patrick Stickler wrote:
>>> [...]
>>>>    <dc:date rdf:value="2002-02-14" rdf:dtype="&xsd;date"/>
>>> 
>>> Am I right that under the current proposal this can be more
>>> compactly written:
>>> 
>>>  <dc:date xsd:date="2002-02-14"/>
>> 
>> Yes and no. It is more compact, but is less local (thus it
>> is not exactly an equivalent variant of the doublet idiom).
>> 
>> In this case, it is not clear from the RDF that xsd:date is a
>> datatype. It could be any kind of property at all. It has
>> no more datatyping clarity to the parser than
>> 
>>    <dc:date foo:bar="2002-02-14"/>
> 
> Yeah, that is an issue, but maybe just one of education - choose
> clearly the prefixes and local names for data type URIs.
> 
> If the foo:bar property isn't given an RDF datatype meaning via some
> extra info, then it remains a property?

Right. If it is not clear from some other statements
that it is a datatyping class, than it is just a property.

>> And one would not, I think, expect xml:lang to apply to
>> all attributes of the element -- or really to any of
>> the attributes, but rather only the content of the element,
>> and it's just a trick of rdf:value that the content can
>> be hidden (contracted) into an attribute.
> 
> "A special attribute named xml:lang may be inserted in documents to
> specify the language used in the contents and attribute values of
> any element in an XML document. "
> 
> -- http://www.w3.org/TR/2000/REC-xml-20001006#sec-lang-tag
> 
> so it is clear that xml:lang affects attribute values.

Well, I certainly stand corrected on this technically, though
see below...

> 
>> If I have
>> 
>>  <dc:title rdf:value="Foo" xml:lang="en" x:scope="237a87"/>
>> 
>> we're saying that "Foo" is English, but not "237a87".
> 
> XML says both are.

That may be so, but that doesn't necessarily mean they should be
(or that it is practical in a KR context).

Consider the following two separate graphs, adapted from the DC
rec on DCQ in RDF:

<?xml version="1.0" ?>
<RDF xmlns="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
     xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:dcterms="http://purl.org/dc/terms/"
     xml:lang="en">  <!-- *** -->
  <Description>
    <dc:subject>
      <Description>
         <value>19D10</value>
         <rdfs:label>Algebraic K-Theory of spaces</rdfs:label>
      </Description>
    </dc:subject>
  </Description>
</RDF>

and then

<?xml version="1.0" ?>
<RDF xmlns="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
     xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:dcterms="http://purl.org/dc/terms/"
     xml:lang="fi">   <!-- *** -->
  <Description>
    <dc:subject>
      <Description>
         <value>19D10</value>
         <rdfs:label>Algebraic K-Theory of spaces</rdfs:label>
      </Description>
    </dc:subject>
  </Description>
</RDF>

in these two cases, because the "language" of the two
RDF instances is specified as being different, all literals
will be pairs of string + language and ("19D10","en")
will never match ("19D10","fi") in any query even though they are
logically supposed to be the same value.

Yes, it is possible to constrain the xml:lang attribute to
the narrowest possible scope -- but

(a) we would have to be very careful to educate users about
    this potential gotcha

(b) the whole logic of xml:lang as applying to all content
    and all attribute values is just wrong

There is a big difference between defining a "primary language"
scope for a document and saying that every last bit of data
content is expressed in that language. The broad and absolute
scope of applicability of xml:lang is too fuzzy and imprecise to be
meaningful in a KR context and clearly reflects the general
"documentation" mindset of the original XML authors.

It's clear that RDF users have real needs to be able to qualify
literals by language (Nokia is certainly no exception to that!).

I'm just wondering of xml:lang really is the proper mechanism
for that, given it's lack of precision and potential for unintended
language qualification of non-linguistic data.

We may have far more headaches with string+lang structured
literals than some other methodology.

Eh?

Patrick
   
--
               
Patrick Stickler              Phone: +358 50 483 9453
Senior Research Scientist     Fax:   +358 7180 35409
Nokia Research Center         Email: patrick.stickler@nokia.com

Received on Saturday, 16 February 2002 11:01:34 UTC