Re: weekly call for agenda items from Patrick Stickler on 2002-11-21 (w3c-rdfcore-wg@w3.org from November 2002)

From: Patrick Stickler <patrick.stickler@nokia.com>
Date: Thu, 21 Nov 2002 11:39:29 +0200
To: "ext Jeremy Carroll" <jjc@hplb.hpl.hp.com>, "pat hayes" <phayes@ai.uwf.edu>
Cc: "Brian McBride" <bwm@hplb.hpl.hp.com>, <w3c-rdfcore-wg@w3.org>
Message-ID: <00e101c29141$e35d75d0$149316ac@NOE.Nokia.com>
[Patrick Stickler, Nokia/Finland, (+358 40) 801 9690, patrick.stickler@nokia.com]


----- Original Message ----- 
From: "ext Jeremy Carroll" <jjc@hplb.hpl.hp.com>
To: "Patrick Stickler" <patrick.stickler@nokia.com>; "ext Jeremy Carroll" <jjc@hplb.hpl.hp.com>; "pat hayes" <phayes@ai.uwf.edu>
Cc: "Brian McBride" <bwm@hplb.hpl.hp.com>; <w3c-rdfcore-wg@w3.org>
Sent: 21 November, 2002 11:01
Subject: RE: weekly call for agenda items


> 
> Replying to both of Patrick's messages in one go.
> >
> > I find this root tag a rather ugly hack
> 
> correct.
> 
> > that seems unnecessary, if we
> > take the value space to be the set of infosets,
> 
> equality is not defined on infosets.

Really? That's a surprise. I thought the whole point of infosets
was to provide an interpretation of an XML serialization, according
to its semantics (per XML), that is consistent across platforms and
applications.

My understanding was that equality is based on tree-equality,
which is definitely (and easily) testable.
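
To make the tree-equality idea concrete, here is a minimal sketch in Python.
It is an illustration only: the Infoset specification is not being quoted
here, `trees_equal` is a hypothetical helper, and ElementTree is only an
approximation of an infoset (it discards comments, processing instructions
and namespace prefixes by default). Under those assumptions, two
serializations that differ only in quoting and attribute order compare equal:

```python
import xml.etree.ElementTree as ET


def trees_equal(a, b):
    """Structural comparison of two elements (hypothetical helper).

    Compares tag, attributes (order-insensitive), text, tail and then
    the children pairwise, in document order.
    """
    if a.tag != b.tag or a.attrib != b.attrib:
        return False
    if (a.text or "") != (b.text or "") or (a.tail or "") != (b.tail or ""):
        return False
    if len(a) != len(b):
        return False
    return all(trees_equal(x, y) for x, y in zip(a, b))


# Same tree, different surface syntax (quote style, attribute order).
left = ET.fromstring('<p xmlns="http://example.org/ns" a="1" b="2">hi <b>there</b></p>')
right = ET.fromstring("<p b='2' a='1' xmlns='http://example.org/ns'>hi <b>there</b></p>")
# A genuinely different tree: the child element has another name.
other = ET.fromstring('<p xmlns="http://example.org/ns" a="1" b="2">hi <i>there</i></p>')

print(trees_equal(left, right))
print(trees_equal(left, other))
```

Whether this is the equality the WG would want is a separate question; the
sketch only shows that some structural equality is cheap to test.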

> > which would (magically)
> > include the specified xml:lang value at their root scope.
> >
> > I really would like to see the root tag treatment go away. I think it
> > will confuse a lot of folks and is rather kludgy.
> 
> The earlier treatment had it but in a slightly less in-your-face manner. On
> reflection I think the hoops that treatment jumped were more likely to
> confuse, and being blunt, kludgy and ugly gains clarity.
> At some level we inherited something that wasn't clear - we have made it
> clear. If we were to start from scratch we would not have the rdf-wrapper,
> e.g. change rdf:parseType="Literal" to be
>  <eg:prop parseType="Literal" >
>    <!--whitespace comment or PI -->
>    <xml-content>
>      <!-- just one element allowed, being the root element of the
>        corresponding value. -->
>    </xml-content>
>    <!--whitespace comment or PI -->
>  </eg:prop>

This wouldn't be acceptable, as there are many valid cases that
call for mixed content in XML literals.

I think the real problem here is the xml:lang attribute. Having
the xml:lang scope infect XML literals is IMO a breakdown in the
division between RDF semantics and XML semantics, and is one
of those key issues regarding the relationship between RDF and XML.
RDF does indeed use XML for its serialization, and to that extent
the xml:lang scope is valid *for the RDF parser* but it does not
need to be semantically valid (IMO) in the graph and thus need not
infect XML literals.

I.e., the RDF parser should presume a fixed attribute xml:lang=""
on every element having a parseType="Literal" attribute.
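
A rough sketch of what that would mean for a parser, in Python. Everything
here is an assumption made for illustration: `literal_langs` is a
hypothetical helper, the `eg:` vocabulary is invented, and the real RDF/XML
grammar is ignored apart from tracking xml:lang scope. With `isolate=True`
the literal is treated as if xml:lang="" were fixed on the property element;
with `isolate=False` the enclosing scope leaks in.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
RDF_PARSETYPE = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}parseType"

DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:eg="http://example.org/terms#" xml:lang="en">
  <rdf:Description rdf:about="http://example.org/item">
    <eg:prop rdf:parseType="Literal"><b>hello</b></eg:prop>
  </rdf:Description>
</rdf:RDF>"""


def literal_langs(elem, inherited="", isolate=True):
    """Return (property tag, language applied to the XML literal) pairs.

    Hypothetical helper: walks the tree tracking xml:lang scope and stops
    at each parseType="Literal" property element.
    """
    lang = elem.get(XML_LANG, inherited)
    if elem.get(RDF_PARSETYPE) == "Literal":
        # isolate=True: behave as if xml:lang="" were fixed here, so the
        # outer scope never reaches the literal's content.
        return [(elem.tag, "" if isolate else lang)]
    found = []
    for child in elem:
        found.extend(literal_langs(child, lang, isolate))
    return found


root = ET.fromstring(DOC)
isolated = literal_langs(root, isolate=True)
inherited = literal_langs(root, isolate=False)
print(isolated)
print(inherited)
```

An xml:lang attribute written inside the literal's own content is untouched
by either policy, since the walk does not descend into the literal.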

Since an XML literal is in fact XML, the content creator is free
to specify an xml:lang attribute *in* the XML literal itself.

Semantically, the lang tag should have no significance in the graph,
for any kind of literal, XML or otherwise. And to that end, even though
less convenient, I've already conceded that it could be omitted from
typed literals. 

I can go further and concede that it could (possibly even should) be
omitted from all literals entirely, that xml:lang does not apply to
literal values at all and M&S "got it wrong" because the semantics
of XML and that of RDF are not equivalent and xml:lang values are
significant only in the domain of the RDF parser, not RDF applications
operating on the graph.

I think excluding lang tags from all literals would simplify things
enormously, remove the need for any kludgy wrapper element for XML
literals, and make the treatment of all types of literals consistent.

> We still have the problems associated with invisibly used namespaces 

Well, again, I consider XML literals to be fragments of XML serializations
and thus having contextual interpretation. Using XML literals to e.g.
build modular content where XML literals are boilerplate fragments seems
at odds with the present treatment of XML literals which seems to want
to interpret them in the context of the RDF/XML serialization, to which
they likely have no semantic relationship whatsoever.

> 
> >
> > Also, I think more folks will be able to relate to an infoset rather
> > than a canonical serialization.
> 
> Another problem with infoset is which one?
> In your other message you talk about Post Schema Validation Infoset, which
> since we don't have a schema, presents particular difficulties ...

A fair point.

> Also, Infoset includes a lot of things which we currently don't do, e.g.
> entities are preserved, all namespace attributes are preserved. (Does
> Infoset include document initial whitespace, or the XML declaration?)

I don't believe so, but I'd have to check.

> > Of course, I don't myself think that an xml:lang attribute in an
> > RDF/XML instance should infect an XML Literal as the literal is not
> > part of the RDF language and is only there as XML as a convenience
> > but not for any semantic reason, but...
> >
> > (i.e. XML literals should be treated as other literals where the
> > lang tag, if present, does not affect the denotation of the literal)
> 
> Just put a rdf:parseType="Literal" in a chat or shopping example. The
> langtag needs to be there.

If it is relevant to the XML literal, then it can be put *into* the
XML literal. But an XML literal (nearly always) has no semantic relation
whatsoever to the RDF/XML serialization, and should be kept semantically
isolated from the RDF/XML, and that means preventing an xml:lang value
at the RDF/XML level from infecting the XML literal.

The problem here is that RDF provides no proper (or at least recommended)
mechanisms for scoping. Reification can be used, but it's not "blessed"
for such use. And so there is the current practice of using xml:lang to
scope literal values per language -- which, while I admit we use it and
that reification is more work, I will also admit is a very ugly hack.

The more I ponder the semantic irrelevance of lang tags in typed
literals, the more I question whether it would be better for the RDF
community in the long run (however painful in the short term for us)
to get rid of them -- there, and in all other literals as well.

It's simply not the right way to do scoping and it is mucking up literals
and datatyping too much.

At the very least, it should have a consistent interpretation for all
literals such that, if present, it constitutes "syntactic residue" from
the RDF/XML serialization which is available to RDF applications but
does not affect the denotation of the literal or equality. Such an approach
would reasonably capture the current practice based on M&S while keeping
the semantics of the RDF graph free of infection from the XML layer.
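
As a sketch of what "syntactic residue" could look like to an application,
assuming a hypothetical `Literal` class (not any existing RDF API): the lang
tag is carried along and readable, but excluded from equality and hashing.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Literal:
    """Hypothetical literal node: lang is kept but ignored for equality."""
    lexical: str
    datatype: Optional[str] = None
    # compare=False drops the field from __eq__ and from the generated
    # __hash__, so it cannot affect identity within the graph.
    lang: Optional[str] = field(default=None, compare=False)


a = Literal("chat", lang="en")
b = Literal("chat", lang="fr")
c = Literal("cat", lang="en")

print(a == b)       # same lexical form and datatype, different residue
print(a.lang)       # the residue is still available to applications
print(len({a, b}))  # both collapse to a single node in a set
print(a == c)
```

Applications that do care about language can still read `lang`; they just
cannot rely on it to distinguish two otherwise identical literals.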

Patrick
Received on Thursday, 21 November 2002 04:39:32 UTC