RE: Datatyping: moving away from "literal as 3-part thing" to "literal as dt+opaque bit" from Patrick.Stickler@nokia.com on 2002-09-03 (w3c-rdfcore-wg@w3.org from September 2002)

From: <Patrick.Stickler@nokia.com>
Date: Tue, 3 Sep 2002 09:02:53 +0300
To: <jjc@hpl.hp.com>, <w3c-rdfcore-wg@w3.org>
Message-ID: <A03E60B17132A84F9B4BB5EEDE57957B5FBAB7@trebe006.europe.nokia.com>

> -----Original Message-----
> From: ext Jeremy Carroll [mailto:jjc@hpl.hp.com]
> Sent: 02 September, 2002 21:15
> To: w3c-rdfcore-wg@w3.org
> Subject: Re: Datatyping: moving away from "literal as 3-part thing" to
> "literal as dt+opaque bit"
> 
> 
> 
> >[Patrick said, at the telecon, "xml:lang infects everything" as an
> >example of this view]
> 
> >There should be no "infection"
> >of new types by stuff like language properties,
> 
> The Unicode string in an XML document which gives the lexical form of a
> datatype literal may well be in scope of an xml:lang declaration.
> 
> But the current proposals expect the parser to know whether it is parsing
> an old-style literal (in which case xml:lang is significant) or a new-style
> literal, in which case it is not.

Hmmmm, interesting. I've yet to see any proposal that spells this out,
i.e. that xml:lang information is discarded entirely.

Now, I agree that xml:lang does not affect the L2V (lexical-to-value)
mapping in any way. But if it is specified, it must still be presumed to be
information about the literal that is relevant to applications, and it must
not be discarded (even though it is known that xml:lang can overgenerate
such information).


> Thus
> 
>   <a:prop xml:lang="en" rdf:ltype="&xsd;string">banana</a:prop>
> 
> would deliver the value <xsd:string>"banana" and the language declaration
> has no effect. (If you want an xsd:string, you don't get a langstring.)
> 
> Jeremy

Unfortunately, this precludes being able to use xml:lang with
explicitly typed xsd:string values, which I consider unacceptable.

Consider the following use case:

   <rdf:Description rdf:about="#TheEnglishLanguage">
      <rdfs:label xml:lang="en" rdfd:type="&xsd;string">English</rdfs:label>
      <rdfs:label xml:lang="fi" rdfd:type="&xsd;string">Englanti</rdfs:label>
      <rdfs:label xml:lang="es" rdfd:type="&xsd;string">Ingles</rdfs:label>
   </rdf:Description>

which I would expect to produce

   <#TheEnglishLanguage> rdfs:label xsd:string"English"-en .
   <#TheEnglishLanguage> rdfs:label xsd:string"Englanti"-fi .
   <#TheEnglishLanguage> rdfs:label xsd:string"Ingles"-es .

so that my RDF application can choose which label is most appropriate,
per the intentionally specified language. If all I get is 

   <#TheEnglishLanguage> rdfs:label xsd:string"English" .
   <#TheEnglishLanguage> rdfs:label xsd:string"Englanti" .
   <#TheEnglishLanguage> rdfs:label xsd:string"Ingles" .

then I have lost crucial information needed for, e.g., the autogeneration
of GUIs.
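To make the loss concrete, here is a minimal Python sketch, assuming a
hypothetical TypedLiteral class (not taken from any of the current
proposals) that carries datatype, lexical form and an optional xml:lang
code. A hypothetical pick_label() helper chooses a label by preferred
language; once the language codes are dropped it has nothing to go on.

```python
from dataclasses import dataclass
from typing import Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"

@dataclass(frozen=True)
class TypedLiteral:
    """Hypothetical typed-literal node: datatype URI, lexical form, optional xml:lang."""
    datatype: str
    lexical: str
    lang: Optional[str] = None

def pick_label(labels, preferred_langs):
    """Return the lexical form of the first label matching a preferred language, else None."""
    for wanted in preferred_langs:
        for lit in labels:
            if lit.lang == wanted:
                return lit.lexical
    return None

# Labels as the parser would deliver them if xml:lang is kept.
with_lang = [
    TypedLiteral(XSD_STRING, "English", "en"),
    TypedLiteral(XSD_STRING, "Englanti", "fi"),
]
# The same labels after the parser has discarded xml:lang.
without_lang = [TypedLiteral(lit.datatype, lit.lexical) for lit in with_lang]

print(pick_label(with_lang, ["fi", "en"]))     # Englanti
print(pick_label(without_lang, ["fi", "en"]))  # None
```

With the language codes stripped, the application can no longer tell which
label suits a Finnish-speaking user; it can only guess.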

And if literals can be neither subjects nor tidy, then how else can I
assert the language of the particular string values?

To be quite honest, I find hiding the xml:lang information in the structure
of the literal, rather than generating triples, to be highly distasteful, but
that's what we've got at the moment, so ... Ideally, we'd just let literals
be subjects and have a trivially simple solution to all this mess, for both
datatyping and xml:lang attribution:

   <rdf:Description rdf:about="#TheEnglishLanguage">
      <rdfs:label xml:lang="en" rdf:type="&xsd;string">English</rdfs:label>
   </rdf:Description> 

would give us

   <#TheEnglishLanguage> rdfs:label _:a"English" .
   _:a"English" rdf:type xsd:string .
   _:a"English" xml:lang _:b"en" .

and so on, even if for now we don't extend RDF/XML to allow explicit
statements about literal subjects, but simply leave such statements as
parser output... but I guess we shouldn't go there anymore. Maybe in
RDF 2.x ... maybe ...

--

At the very least, for now, xml:lang codes must be part of the typed literal
node structure just as they are for the untyped literal node structure, and
just as it is specified in the restructured document.
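As a rough illustration of that node structure, the following Python sketch
assumes a hypothetical LiteralNode class (names are illustrative only, not
from the restructured document) in which the language code is part of the
node's identity for typed and untyped literals alike, while a toy L2V
function for xsd:string ignores it entirely.

```python
from dataclasses import dataclass
from typing import Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"

@dataclass(frozen=True)
class LiteralNode:
    """Hypothetical literal node; datatype is None for an untyped (old-style) literal."""
    lexical: str
    datatype: Optional[str] = None
    lang: Optional[str] = None  # kept for typed and untyped literals alike

def l2v(node: LiteralNode) -> str:
    """Toy lexical-to-value mapping: only xsd:string is handled, and xml:lang is ignored."""
    if node.datatype == XSD_STRING:
        return node.lexical
    raise ValueError("datatype not handled in this sketch")

en = LiteralNode("banana", XSD_STRING, "en")
bare = LiteralNode("banana", XSD_STRING)
untyped = LiteralNode("banana", lang="en")

print(en == bare)            # False: the nodes differ because xml:lang is kept
print(l2v(en) == l2v(bare))  # True: both denote the same string value
print(untyped.lang)          # en
```

The point of the design is that keeping xml:lang in the node costs nothing
at the value level: the datatype mapping is untouched, but applications
can still see the language.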

Regards,

Patrick

Received on Tuesday, 3 September 2002 02:02:56 UTC