Re: Literals: language and xml (was: Comments on new datatyping document, part 1) from Patrick Stickler on 2002-09-11 (w3c-rdfcore-wg@w3.org from September 2002)

From: Patrick Stickler <patrick.stickler@nokia.com>
Date: Wed, 11 Sep 2002 10:04:16 +0300
To: "ext Jeremy Carroll" <jjc@hplb.hpl.hp.com>, "Graham Klyne" <GK@NineByNine.org>
Cc: "RDF core WG" <w3c-rdfcore-wg@w3.org>
Message-ID: <002101c25961$71b3e530$864416ac@NOE.Nokia.com>
[Patrick Stickler, Nokia/Finland, (+358 50) 483 9453, patrick.stickler@nokia.com]


----- Original Message ----- 
From: "ext Jeremy Carroll" <jjc@hplb.hpl.hp.com>
To: "Graham Klyne" <GK@NineByNine.org>; "Patrick Stickler" <patrick.stickler@nokia.com>
Cc: "RDF core WG" <w3c-rdfcore-wg@w3.org>
Sent: 10 September, 2002 16:35
Subject: RE: Literals: language and xml (was: Comments on new datatyping document, part 1)


> (agreeing with Patrick I think)


Mostly, I think, but a few questions/comments below to test that ;-)


> My view is that the abstract syntax will say something like:
> 
> A Literal Node is labelled with one of:
> (a) - A datatype value

It cannot be labeled by a datatype value. It can only be
labeled with a URIref denoting the datatype and a lexical
form -- which together denote a datatype value.

URIref nodes are not labeled with the resources they
denote, neither are typed literal nodes.

There are no native datatype values in the RDF graph, 
only labeled nodes which denote datatype values.

Perhaps we are in agreement on this, and it's just a matter
of getting the wording right (though I think you are suggesting
something different).
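
To make the label/denotation distinction concrete, here is a rough
Python sketch. The class name, the L2V table, and the choice of
xsd:integer as the example are my own assumptions for illustration,
not anything taken from the abstract syntax draft.

```python
from dataclasses import dataclass

XSD = "http://www.w3.org/2001/XMLSchema#"


@dataclass(frozen=True)
class TypedLiteralLabel:
    """What the node carries: a datatype URIref plus a lexical form."""
    datatype: str      # URIref denoting the datatype
    lexical_form: str  # the character string stored in the graph


# Hypothetical lexical-to-value (L2V) mappings, keyed by datatype URIref.
L2V = {
    XSD + "string": lambda s: s,
    XSD + "integer": int,
}


def denoted_value(label):
    """The value lives outside the graph; it is computed from the label."""
    return L2V[label.datatype](label.lexical_form)


a = TypedLiteralLabel(XSD + "integer", "10")
b = TypedLiteralLabel(XSD + "integer", "010")
```

Here a and b are different labels (the lexical forms differ), yet
both denote the same integer, which is the sense in which the node
is not labeled with the value itself.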

> (b) - An rdf string literal

It may be useful to say "a non-explicitly typed string literal".

> (c) - An rdf xml literal

I would rephrase the above list as

(a) an explicitly typed string literal    (<xsd:string>, "xyz")
(b) a non-explicitly typed string literal (_:a, "xyz")
(c) an XML literal                        (xml"xyz")

and if XML literals can be typed (and I don't see
why they couldn't):

(a) an explicitly typed string literal    (<xsd:string>, "xyz")
(b) a non-explicitly typed string literal (_:a, "xyz")
(c) an explicitly typed XML literal       (<xhtml:h1>, xml"<h1>Foo</h1>")
(d) a non-explicitly typed XML literal    (_:b, xml"<blarg>belch</blarg>")
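
As a sketch only, the four cases could be modelled as one label
shape with two independent dimensions. The field names are
hypothetical, and using None for the "_:a" slot is merely a
stand-in for "no explicit datatype".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LiteralLabel:
    lexical_form: str
    datatype: Optional[str] = None  # None stands in for the "_:a" slot
    is_xml: bool = False            # the XML bit/flag


def kind(label):
    """Classify a label into one of the four cases (a) through (d)."""
    typing = ("explicitly typed" if label.datatype is not None
              else "non-explicitly typed")
    form = "XML" if label.is_xml else "string"
    return f"{typing} {form} literal"


examples = [
    LiteralLabel("xyz", datatype="xsd:string"),
    LiteralLabel("xyz"),
    LiteralLabel("<h1>Foo</h1>", datatype="xhtml:h1", is_xml=True),
    LiteralLabel("<blarg>belch</blarg>", is_xml=True),
]
```

Calling kind() on each entry of examples yields the four
descriptions in the same order as the list above.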

> Typical RDF/XML giving rise to (a) is:
> 
> <rdf:Description>
>   <eg:prop rdf:datatype="&xsd;string">val</eg:prop>
> </rdf:Description>
> 
> (Label is <xsd:string>"val")

OK.

> (b)
> 
> <rdf:Description>
>   <eg:prop>val</eg:prop>
> </rdf:Description>
> 
> (Label is "val")

I thought it should be _:x"val"

Isn't that what you meant by syntactically untidy?

> (c)
> <rdf:Description>
>   <eg:prop rdf:parseType="Literal">val</eg:prop>
> </rdf:Description>
> 
> (Label is xml"val")

Well, perhaps it should be _:yxml"val" or such. Of
course, we then have the problem of maintaining an explicit
partition between _:y and xml, as I've pointed out before.

And then also

(d)
<rdf:Description>
   <eg:prop rdf:datatype="&ex;someComplexType" rdf:parseType="Literal">val</eg:prop>
</rdf:Description>

(Label is <ex:someComplexType>xml"val")

> 
> Adding an xml:lang we get:
> (a)
> <rdf:Description xml:lang="en">
>   <eg:prop rdf:datatype="&xsd;string">val</eg:prop>
> </rdf:Description>
> 
> (Label is "val"
> It has to be an xsd:string, and so the language tag must be lost)

No. If the primary mechanism for specifying language for literal
content is xml:lang, then that information must not be lost from
the literal node.

The label here should be <xsd:string>"val"-en 

We *have* to have a mechanism for attributing language qualification
to literals.

Since literals can't be subjects, I see no other mechanism than
to attach it to the literal node label itself, as was decided
at the Bristol f2f.

Just because a datatype is specified here does not mean
the language is no longer considered valid. I may wish to
say *both* that the property value is a string, *and* that
the string contains e.g. Finnish content.

Granted, the semantics of xsd:string do not care about the language
qualification, and the xml:lang value does not affect the L2V
mapping, but applications will likely want to have that information.
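
A small Python sketch of that separation, under my own assumptions
(the Literal class, l2v_string, and select_by_lang are hypothetical
names, not part of any proposal): the language tag rides along on
the label, the L2V mapping ignores it, and an application can still
use it for selection.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Literal:
    datatype: str
    lexical_form: str
    lang: Optional[str] = None  # carried on the label, e.g. "en" or "fi"


def l2v_string(lit):
    """L2V for a string datatype: deliberately never looks at lit.lang."""
    return lit.lexical_form


def select_by_lang(literals, lang):
    """Application-level use of the tag, outside datatyping semantics."""
    return [lit for lit in literals if lit.lang == lang]


en = Literal("xsd:string", "val", "en")
fi = Literal("xsd:string", "val", "fi")
```

The two labels are distinct, they denote the same string value, and
the tag is still there when an application asks for Finnish content.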

> (b)
> 
> <rdf:Description xml:lang="en">
>   <eg:prop>val</eg:prop>
> </rdf:Description>
> 
> Label is "val"-en

Or rather _:x"val"-en

> (c)
> <rdf:Description xml:lang="en">
>   <eg:prop rdf:parseType="Literal">val</eg:prop>
> </rdf:Description>
> 
> Label is xml"val"-en

OK.

> The only choice is whether we allow:
> 
> <rdf:Description xml:lang="en">
>   <eg:prop rdf:parseType="Literal" rdf:datatype="&xsd;string">val</eg:prop>
> </rdf:Description>

In which case, we'd have

   <xsd:string>xml"val"-en

Fine.

> 
> If we did then the following would be problematic
> 
> <rdf:Description xml:lang="en">
>   <eg:prop rdf:parseType="Literal"
> rdf:datatype="&xsd;string"><b>val</b></eg:prop>
> </rdf:Description>
> 
> My take is that it is a syntax error.

I would say it correlates to

   <xsd:string>xml"<b>val</b>"-en

Where's the problem?

At the RDF level, there is no distinction between simple and
complex datatypes, as is made by XML Schema. For RDF, a datatype
simply has a lexical space, a value space, and a mapping from
the former to the latter; RDF neither cares nor can say what
resides in the value space. If the datatype in question is
a complex datatype, and the lexical space contains XML serializations
(fragments), there is no problem with applying the RDF
datatyping mechanisms to associate any datatype (complex
or otherwise) with an XML literal. It is up to the definition
of the datatype itself whether or not the XML literal is an
acceptable representation of some member of its value space
(whatever that is -- and RDF doesn't have to say).

So whether or not either

   <xsd:string>xml"<b>val</b>"-en
or
   <xsd:string>xml"val"-en

validly denote datatype values of xsd:string is not RDF's
concern. That's up to the definition of xsd:string.

Taking a similar example to the above, but with a known
complex type, again, it is not RDF's concern whether either of

   <xhtml:h1>xml"<h1>val</h1>"-en
or
   <xhtml:h1>xml"val"-en

validly denote members of the value space of xhtml:h1 (and
we can presume that the latter example does not).

What matters here is what is being *asserted*. And the
mechanisms for making assertions about the datatype by
which the above literals are to be interpreted are all being
used correctly, even if some of the assertions turn out
to be false. 
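
As an illustration only, a datatype can be sketched as a lexical
space test plus a mapping, with the truth of a typed-literal
assertion decided entirely by the datatype. The Datatype class and
the H1 instance below are hypothetical; the regular expression is a
crude stand-in for whatever xhtml:h1 would really accept.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Datatype:
    uri: str
    in_lexical_space: Callable[[str], bool]  # membership test, owned by the datatype
    l2v: Callable[[str], object]             # lexical-to-value mapping


# Hypothetical stand-in for a complex type such as xhtml:h1.
H1 = Datatype(
    uri="http://example.org/h1",
    in_lexical_space=lambda s: re.fullmatch(r"<h1>.*</h1>", s, re.DOTALL) is not None,
    l2v=lambda s: s,  # the value space is opaque to RDF; identity is a placeholder
)


def assertion_holds(datatype, lexical_form):
    """Both pairings are well-formed assertions; only the datatype says which is true."""
    return datatype.in_lexical_space(lexical_form)
```

With this sketch, pairing H1 with "<h1>val</h1>" gives a true
assertion and pairing it with "val" gives a false one, but neither
is a syntax error at the RDF level.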

And most importantly, the presence of the XML bit/flag and
the xml:lang value are totally *irrelevant* to the datatyping
semantics -- but nevertheless are necessary for RDF applications
in many contexts where those typed literals are to be used.

In the case of the xml:lang value, that is not relevant to
the L2V mapping, but is relevant to applications in how
the value might be selected or displayed -- i.e. the xml:lang
value is a "hidden statement" about the value denoted *by*
the literal, not about the literal itself, so it's no surprise
that it does not affect the interpretation of the literal.

In the case of the XML bit/flag, it is saying something about
the legacy RDF/XML representation of the lexical form -- and
one can very well express complex typed values without it:

   <rdf:Description>
      <ex:prop rdf:datatype="&xhtml;h1">&lt;h1&gt;val&lt;/h1&gt;</ex:prop>
   </rdf:Description>

providing

   <xhtml:h1>"<h1>val</h1>"

which is every bit as valid a typed literal representation as

   <xhtml:h1>xml"<h1>val</h1>"

the only difference being that in the latter case one specified
parseType=Literal instead, for convenience's sake, but that's all
just syntactic sugar, no? And it's no surprise that variation in
the RDF/XML representation of the literal does not affect the
interpretation of that literal.
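
The "syntactic sugar" point can be checked with a toy extraction in
Python. This is not a conforming RDF/XML parser; the lexical_form
helper, the example.org namespace, and the h1 datatype URI are
assumptions made up for the sketch, and the combination of
rdf:datatype with rdf:parseType is used here only because it is the
case under discussion.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX = "http://example.org/"  # hypothetical namespace for the property and datatype

ESCAPED = f"""<rdf:Description xmlns:rdf="{RDF}" xmlns:ex="{EX}">
  <ex:prop rdf:datatype="{EX}h1">&lt;h1&gt;val&lt;/h1&gt;</ex:prop>
</rdf:Description>"""

PARSE_LITERAL = f"""<rdf:Description xmlns:rdf="{RDF}" xmlns:ex="{EX}">
  <ex:prop rdf:datatype="{EX}h1" rdf:parseType="Literal"><h1>val</h1></ex:prop>
</rdf:Description>"""


def lexical_form(rdfxml):
    """Recover the lexical form of ex:prop from either serialization."""
    prop = ET.fromstring(rdfxml).find(f"{{{EX}}}prop")
    if prop.get(f"{{{RDF}}}parseType") == "Literal":
        # Re-serialize the element content as markup.
        children = "".join(ET.tostring(c, encoding="unicode") for c in prop)
        return (prop.text or "") + children
    # Escaped content arrives as plain character data, already unescaped.
    return prop.text or ""
```

Both serializations yield the same lexical form, so any difference
between them is confined to the RDF/XML surface syntax.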


Patrick
Received on Wednesday, 11 September 2002 03:04:43 UTC