Re: Literals: language and xml (was: Comments on new datatyping document, part 1) from Patrick Stickler on 2002-09-11 (w3c-rdfcore-wg@w3.org from September 2002)

From: Patrick Stickler <patrick.stickler@nokia.com>
Date: Wed, 11 Sep 2002 15:44:59 +0300
To: "ext Jeremy Carroll" <jjc@hplb.hpl.hp.com>, "Graham Klyne" <GK@NineByNine.org>
Cc: "RDF core WG" <w3c-rdfcore-wg@w3.org>
Message-ID: <002801c25991$0a1c6c00$864416ac@NOE.Nokia.com>
[Patrick Stickler, Nokia/Finland, (+358 50) 483 9453, patrick.stickler@nokia.com]


----- Original Message ----- 
From: "ext Jeremy Carroll" <jjc@hplb.hpl.hp.com>
To: "Patrick Stickler" <patrick.stickler@nokia.com>; "ext Jeremy Carroll" <jjc@hplb.hpl.hp.com>; "Graham Klyne" <GK@NineByNine.org>
Cc: "RDF core WG" <w3c-rdfcore-wg@w3.org>
Sent: 11 September, 2002 14:32
Subject: RE: Literals: language and xml (was: Comments on new datatyping document, part 1)


> 
> > > My view is that the abstract syntax will say something like:
> > >
> > > A Literal Node is labelled with one of:
> > > (a) - A datatype value
> >
> > It cannot be labeled by a datatype value. It can only be
> > labeled with a URIref denoting the datatype and a lexical
> > form -- which together denote a datatype value.
> >
> > URIref nodes are not labeled with the resources they
> > denote, neither are typed literal nodes.
> >
> > There are no native datatype values in the RDF graph,
> > only labeled nodes which denote datatype values.
> >
> > Perhaps we are in agreement on this,
> 
> No, I don't think we are.
> The abstract syntax is not a serialization.
> It is quite possible to use integers in it.

I disagree. It could be done as far as mathematics is
concerned, but not as far as RDF is concerned.

Firstly, in the abstract syntax, nodes are labeled with
names that denote resources, not with the resources
themselves, and datatype values are resources, not names.

Secondly, in order to get to values as labels, the RDF MT must
have full knowledge of the datatype's L2V mapping, and that
is not possible if we are to maintain RDF as a generic and
neutral tool for knowledge interchange.

> My understanding is that the node you would label with
> <xsd:int>"10" is in fact labelled with the integer 10.

No. It can't be.

Because the RDF MT does not know what xsd:int *means*
(other than being a datatype) it cannot provide the value
label 10 for the typed literal node <xsd:int>"10". The RDF
MT does not and cannot *know* that "10"->10.
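
To make the distinction concrete, here is a minimal sketch in Python.
It is only an illustration under my own assumptions: the names
TypedLiteralLabel, denote and the registry dictionaries are hypothetical
and come from no RDF specification or library. The label is a pair of
names (datatype URIref, lexical form); the value is only reachable by
a party that knows the datatype's L2V mapping.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class TypedLiteralLabel:
    """Hypothetical node label: a datatype URIref plus a lexical form.

    Both parts are names. Neither is the datatype value itself.
    """
    datatype: str
    lexical: str


# Hypothetical registry type: datatype URIref -> L2V mapping.
L2VRegistry = Dict[str, Callable[[str], object]]


def denote(label: TypedLiteralLabel, known: L2VRegistry) -> Optional[object]:
    """Return the denoted value, or None when the L2V mapping is unknown."""
    mapping = known.get(label.datatype)
    return mapping(label.lexical) if mapping is not None else None


XSD_INT = "http://www.w3.org/2001/XMLSchema#int"
label = TypedLiteralLabel(XSD_INT, "10")

generic_rdf: L2VRegistry = {}            # generic RDF: no datatype knowledge
xsd_aware: L2VRegistry = {XSD_INT: int}  # an XML Schema aware application

print(denote(label, generic_rdf))  # None: the label is all there is
print(denote(label, xsd_aware))    # 10: value obtained via the L2V mapping
```

The label is identical in both cases. Only the datatype-aware party
gets from "10" to 10.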

> > and it's just a matter
> > of getting the wording right (though I think you are suggesting
> > something different).
> >
> > > (b) - An rdf string literal
> >
> > It may be useful to say "a non-explicitly typed string literal".
> >
> > > (c) - An rdf xml literal
> >
> > I would rephrase the above list as
> >
> > (a) an explicitly typed string literal    (<xsd:string>, "xyz")
> > (b) a non-explicitly typed string literal (_:a, "xyz")
> 
> This presupposes that it is waiting to be typed.


Not at all. It is simply preserving the uniqueness of the occurrence
of the literal.

Whether or not one asserts a MT where

   I(<_:a>"xyz") = I("xyz")

or 

   I(<_:a>"xyz") = I(L2V(_:a))("xyz") where _:a rdf:type rdfs:Datatype

is a secondary matter.

The untidy syntax neither presumes nor excludes either interpretation.
And if the WG decides to say nothing at all about the semantics
of inline literals, it provides useful machinery for those applications
which wish to assert untidy semantics.
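
As a rough sketch of why the untidy label is neutral between the two
readings (Python, all names hypothetical and chosen only for this
illustration): the label keeps each occurrence distinct, and either
interpretation can be layered on top of it afterwards.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class UntidyLabel:
    """Hypothetical label for an inline literal: occurrence id plus lexical form."""
    node_id: str   # e.g. "_:a", unique per occurrence
    lexical: str


def interpret_as_string(label: UntidyLabel) -> str:
    """First reading: the literal denotes its own string."""
    return label.lexical


def interpret_via_datatype(
    label: UntidyLabel, l2v: Dict[str, Callable[[str], object]]
) -> object:
    """Second reading: the id names a datatype whose L2V mapping is applied."""
    return l2v[label.node_id](label.lexical)


a = UntidyLabel("_:a", "10")
b = UntidyLabel("_:b", "10")

# The two occurrences stay distinct at the label level.
print(a == b)

# Under the first reading both denote the same string.
print(interpret_as_string(a) == interpret_as_string(b))

# Under the second reading, an application that has assigned a datatype
# to _:a (here an integer mapping, purely as an assumption) gets a value.
print(interpret_via_datatype(a, {"_:a": int}))
```

Nothing in UntidyLabel itself commits to either function.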


> > (c) an XML literal                        (xml"xyz")
> >
> > and if XML literals can be typed (and I don't see
> > why they couldn't):
> 
> That is a Part II issue.

True, but ...

As I explained, I don't see that any distinction need be made
between XML and non-XML literals insofar as datatyping is
concerned.

So making that distinction may be artificial and incorrect,
and thus datatyping of XML literals may very well be implicitly
provided for in Part 1 as defined.

> > >
> > > (Label is <xsd:string>"val")
> >
> > OK.
> 
> Perhaps I should be more explicit.
> The label is the Unicode string "val" understood as a member of the
> xsd:string value space.

Oh, in that case, no. I see neither the necessity nor the benefit of
having actual values as labels in the abstract syntax.

Do we say that the URIref nodes in the abstract syntax are labeled
by the actual resources they denote?! I think not.

The label <xsd:string>"val" denotes a resource, a datatype value,
and we don't put resources as labels on nodes in the abstract
syntax -- rather we put the name that denotes those resources,
and the name that denotes the unicode string "val" that is a
member of the value space of xsd:string is <xsd:string>"val".

How about if I have the following URIref:

   val:(xsd:integer)10

why not then label the URIref node with 10 and make it
tidy with some other node that was denoted as a typed
literal <xsd:integer>"10"??? (don't answer ;-)

Resources are not the labels of nodes, and a value is a resource.

> 
> >
> > (d)
> > <rdf:Description>
> >    <eg:prop rdf:datatype="&ex;someComplexType"
> > rdf:parseType="Literal">val</eg:prop>
> > </rdf:Description>
> >
> > (Label is <ex:someComplexType>xml"val")
> 
> Part II

Yes, but...

> >
> > >
> > > Adding an xml:lang we get:
> > > (a)
> > > <rdf:Description xml:lang="en">
> > >   <eg:prop rdf:datatype="&xsd;string">val</eg:prop>
> > > </rdf:Description>
> > >
> > > (Label is "val"
> > > It has to be an xsd:string, and so the language tag must be lost)
> >
> > No. If the primary mechanism for specifying language for literal
> > content is xml:lang, then that information must not be lost from
> > the literal node.
> >
> > The label here should be <xsd:string>"val"-en
> 
> 
> No. My understanding is that on a datatyped literal the label is taken from
> the value space of the datatype. The value space of xsd:string is Unicode
> strings; thus <xsd:string>"val"-en is not in that value space.

OK, well it would have been clearer if you hadn't overloaded
the proposed abstract/N-Triples label syntax with some other
meaning.

Per my comments above, no, the label can't be the actual value.

> >
> > We *have* to have a mechanism for attributing language qualification
> > to literals.
> 
> We have. XML Schema datatypes does not provide a mechanism for using lang
> codes with datatype values and explicitly suggests using XML, which we support.
> I have earlier sent the references.

OK, I agree that if you are using the value itself as the label,
then the xml:lang code should be omitted. It wasn't clear that
you were using the value.

Which is another reason not to use the value, if doing so means the
language information is discarded and unavailable at the application level.
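
A small Python sketch of the point, under my own assumptions (the Label
class and xsd_string_value function are hypothetical names, not part of
any proposal): if the label is a name carrying the language tag, the
xsd:string value is unaffected by the tag, yet the tag is still there
for applications that want it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Label:
    """Hypothetical literal label: datatype name, lexical form, optional language."""
    datatype: str
    lexical: str
    lang: Optional[str] = None


def xsd_string_value(label: Label) -> str:
    """L2V for xsd:string in this sketch: the language tag plays no part."""
    return label.lexical


en = Label("xsd:string", "val", "en")
fi = Label("xsd:string", "val", "fi")

# Same value under the datatype's mapping ...
print(xsd_string_value(en) == xsd_string_value(fi))

# ... but the labels differ, so the language is still available.
print(en == fi, en.lang, fi.lang)
```

Labeling the node with the bare value would leave only the result of
xsd_string_value, and the two cases above could no longer be told apart.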

> >
> > Since literals can't be subjects, I see no other mechanism than
> > to attach it to the literal node label itself, as was decided
> > at the Bristol f2f.
> >
> > Here, just because there is a datatype specified, does not
> > mean the language is not considered valid. I may wish to
> > say *both* that the property value is a string, *and* that
> > the string contains e.g. Finnish content.
> >
> > No, the semantics of xsd:string does not care about the language
> > qualification and the xml:lang value does not affect the L2V
> > mapping, but applications will likely want to have that information.
> >
> > > (b)
> > >
> > > <rdf:Description xml:lang="en">
> > >   <eg:prop>val</eg:prop>
> > > </rdf:Description>
> > >
> > > Label is "val"-en
> >
> > Or rather _:x"val"-en
> >
> > > (c)
> > > <rdf:Description xml:lang="en">
> > >   <eg:prop rdf:parseType="Literal">val</eg:prop>
> > > </rdf:Description>
> > >
> > > Label is xml"val"-en
> >
> > OK.
> >
> > > The only choice is whether we allow:
> > >
> > > <rdf:Description xml:lang="en">
> > >   <eg:prop rdf:parseType="Literal"
> > >            rdf:datatype="&xsd;string">val</eg:prop>
> > > </rdf:Description>
> >
> > In which case, we'd have
> >
> >    <xsd:string>xml"val"-en
> >
> > Fine.
> 
> No, it's not. xml"val"-en is not a string.

Well, I didn't think you were talking about values...

> I think we probably need to move to test cases.

And not use the same syntax to represent the proposed
name of the value and the value itself ;-)

Patrick
Received on Wednesday, 11 September 2002 08:45:14 UTC