Re: Summary of strings, markup, and language tagging in RDF (resend) from Patrick Stickler on 2003-07-03 (w3c-rdfcore-wg@w3.org from July 2003)

From: Patrick Stickler <patrick.stickler@nokia.com>
Date: Thu, 3 Jul 2003 13:39:24 +0300
To: <w3c-rdfcore-wg@w3.org>, "Brian_McBride" <bwm@hplb.hpl.hp.com>, <jjc@hplb.hpl.hp.com>, "ext pat hayes" <phayes@ihmc.us>
Cc: "Martin Duerst" <duerst@w3.org>
Message-ID: <008c01c3414f$667bff50$580ea20a@NOE.Nokia.com>
----- Original Message -----
From: "ext pat hayes" <phayes@ihmc.us>
To: <w3c-rdfcore-wg@w3.org>; "Brian_McBride" <bwm@hplb.hpl.hp.com>;
<jjc@hplb.hpl.hp.com>
Cc: "Martin Duerst" <duerst@w3.org>
Sent: 03 July, 2003 04:52
Subject: Re: Summary of strings, markup, and language tagging in RDF
(resend)


>
> >
> >The wrapper is one solution to carry the language information.
> >Of course you can choose whatever solution fits you best, but
> >you should not forget that there are other solutions. One of
> >them would be to handle XML Literals in the same way as plain
> >literals, carrying the language information separately. If that
> >can be done for plain literals, why can it not be done for
> >XML Literals?
> >
>
> Martin puts his finger on the key point. It could be, but we chose a
> design for XML literals in which the XML 'label' is treated as a
> built-in datatype; which then puts a strong design constraint on us
> to treat it uniformly with the other datatypes, and that in turn
> requires either that it not have lang tags or that all other datatype
> namespaces have lang tags.  The latter option is unworkable, so we
> chose the former.
>
> Since this issue seems to be so centrally important, and since our
> design now appears to people like Martin to be so completely
> brain-damaged,

I think that is perhaps a bit strong. It may not be ideal for
folks having an XML-centric view, but it's certainly not
brain-damaged.

I think perspective counts for a lot in understanding the
tradeoffs inherent in the present solution.

See my recent long post touching on this.

> let me propose that we re-open this issue

Please, don't.

> and change
> our design slightly, by reverting to an older design. The trouble
> seems to arise from our insisting that XML literals are treated
> uniformly with typed literals: so let us abandon that idea, in spite
> of its being very neat,

It is much more than just being "very neat". It has direct and substantial
gains for distributed, metadata-driven, modular content management,
and it lays the foundation for full support of XML Schema complex
typing in RDF. It's not simply our favorite color.

> and revert to the state where the XML
> literals are treated as a special syntactic case in the RDF graph, so
> that there would be five kinds of literal: plain and XML with and
> without lang tags, plus datatyped literals.

I agree that this would, in a way, be a bit closer to the original M&S,
but I don't consider the present solution to be contrary to M&S
either, and it is a much more useful long-range solution.

Again, there are better ways to model language qualification than
xml:lang (even though at the expense of additional triples) and
the fact that lang tags for plain literals are invisible to generic
inference rules is IMO a far greater shortcoming of the final
solution than not having lang tags on XML literals. But that's another
(and probably needless, at this moment) discussion.

Literals are, after all, literals, so it seems to me to be pretty shoddy
engineering to allow the contextual characteristics of the serialization
syntax to infect *literals* in any way, shape, or form. That includes
plain literals. IMO, the original M&S decision to allow xml:lang to apply
at all insofar as the semantics of the RDF graph is concerned is the
real mistake. That decision is of course understandable, I think,
considering the newness of both XML and RDF at the time, and the lack
of a distinct MT, but regrettable, and we struggle with that legacy.

At least, for now, that error is limited only to plain literals. Let's not
make the impact of that error broader (again) by reintroducing it
into the treatment of XML literals.

> In detail, the proposal is as follows.
>
> 1. There are five kinds of literal in an RDF graph, indicated in
> Ntriples as follows:
> "string"                        plain
> "string"@tag                    plain plus lang tag
> "string"^^rdf:XMLLiteral        XML
> "string"@tag^^rdf:XMLLiteral    XML plus lang tag
> "string"^^foo:baz               typed, where foo:baz is any
>                                 URI other than 'rdf:XMLLiteral'
>
> Notice that the Ntriples way of indicating the XML case is just as it
> is now, but that's just a syntactic decision to save work;
> rdf:XMLLiteral isn't a datatype and XML literals are not typed
> literals in this design, so the possibility of having lang tags in
> its lexical space isn't going to cause any headaches.

See my recent post about why I think this is less optimal than
the present solution, insofar as modular content management is
concerned.
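
For reference, the five forms listed in the quoted proposal can be told
apart mechanically. The following is only a rough sketch of that
proposal (not of the present design): the function name `classify`, the
returned labels, and the abbreviated `rdf:XMLLiteral` / `foo:baz` names
are illustrative assumptions taken from the quoted text, whereas real
N-Triples writes datatype URIs in full.

```python
import re

# Assumption: datatype names are abbreviated as in the quoted message.
XML_LITERAL = "rdf:XMLLiteral"

# "text", optional @lang, optional ^^datatype
_LITERAL = re.compile(
    r'^"(?P<text>(?:[^"\\]|\\.)*)"'
    r'(?:@(?P<lang>[A-Za-z0-9-]+))?'
    r'(?:\^\^(?P<dtype>\S+))?$'
)

def classify(token):
    """Return which of the five proposed literal kinds a token is."""
    m = _LITERAL.match(token)
    if m is None:
        raise ValueError("not a literal: %r" % token)
    lang, dtype = m.group("lang"), m.group("dtype")
    if dtype is None:
        return "plain+lang" if lang else "plain"
    if dtype == XML_LITERAL:
        # Under the proposal this marker is syntax only, not a datatype,
        # so a lang tag is permitted here.
        return "xml+lang" if lang else "xml"
    if lang:
        raise ValueError("lang tag on a datatyped literal")
    return "typed"
```

Note that the XML case has to be singled out by name before the general
typed case, which is exactly the special-casing discussed above.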

The problem with xml:lang has always been that it is intended for
the consumption of XML content by XML applications. The only
real world XML application that operates on RDF/XML is an
RDF parser -- and for the *parser* the xml:lang scope for XML
literals is visible and relevant BUT the parser is not required to
convey all information that might be available regarding XML content
to the RDF graph. The RDF spec specifies what in the XML instance
is considered relevant to RDF applications, and how that will be
organized for RDF applications, and we are free to discard whatever
we like in the XML, so long as we are clear about what we are doing.

Our specs are very clear about the non-relevance of xml:lang scope
on XML literals insofar as the semantics of *RDF* (not *XML*) are
concerned. XML semantics are relevant only to an RDF parser.
End of story. *RDF* users (as opposed to *XML*
users) will be aware of this (if they've read even the primer) and should
act accordingly.
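
As a rough illustration of that division of labour, here is a toy
reader (nothing like a conforming RDF/XML parser) that sees xml:lang in
the XML, passes it on for plain literals, and deliberately drops it for
parseType="Literal" content. The `ex:` namespace, the sample document,
and the function name `literals` are made up for the example, and only
an xml:lang on the root element is handled.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample document; the ex: vocabulary is invented.
DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/ns#" xml:lang="en">
  <rdf:Description rdf:about="http://example.org/doc">
    <ex:title>Hello</ex:title>
    <ex:body rdf:parseType="Literal">Hello <em>world</em></ex:body>
  </rdf:Description>
</rdf:RDF>"""

def literals(doc):
    """Yield (property, kind, lexical form, lang) for each literal."""
    root = ET.fromstring(doc)
    lang = root.get(XML_LANG)  # toy: only root-level xml:lang is read
    out = []
    for description in root:
        for prop in description:
            name = prop.tag.split("}")[1]
            if prop.get("{%s}parseType" % RDF_NS) == "Literal":
                content = (prop.text or "") + "".join(
                    ET.tostring(child, encoding="unicode") for child in prop
                )
                # The parser can see the xml:lang scope here, but it is
                # not conveyed to the graph for XML literals.
                out.append((name, "xml", content, None))
            else:
                out.append((name, "plain", prop.text, lang))
    return out
```

Both properties sit inside the same xml:lang="en" scope, yet only the
plain literal comes out with a language tag.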

The only argument against this particular solution that may be valid
is that XML users who have no clue about RDF might be confused
about the non-relevance of xml:lang for certain RDF constructs, namely
typed literals and thus, XML literals.

Well, they can grab the specs and learn a bit about the tool they are
using. At the risk of offending, and apologies in advance, this really
is a case of RTFM.

We are not violating the XML specs by disregarding xml:lang in
the way we do, and in fact, we are simply making mandatory what was
otherwise optional for RDF applications in M&S, that the xml:lang scope
could be disregarded.

And the decisions are not residue from the WG process. There are
solid reasons for modelling XML Literals as typed literals.

> ...
>
> 4. Regarding Martin's other beef, that some XML without any markup in
> it is 'really' just plain text,

I'm not 100% sure that this is in fact Martin's position, but if it is,
or is anyone else's position, then my reply is that this is simply wrong.

The difference between a plain literal and an XML literal, regardless
of the presence of markup, is that an RDF application is free to
presume that an XML literal constitutes well-formed XML, whereas
a plain literal need not. Period. It's as simple as that.

An XML literal is a string that conforms to XML well-formedness
conditions. Furthermore, the comparison of XML literals is not
based on string-equality. These important distinctions from plain
strings are captured semantically in the definition of the datatype
rdf:XMLLiteral and in the RDF datatyping model.
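
A minimal sketch of those two distinctions, assuming canonicalization
is the basis for comparison. Python's C14N 2.0 support (3.8 or later)
is used here as a stand-in and is not necessarily the exact canonical
form the RDF specs call for; the helper names and the wrapper element
are hypothetical.

```python
import xml.etree.ElementTree as ET

# Literal content may be mixed content without a single root element,
# so it is wrapped before parsing. This is a shortcut: content such as
# "</wrapper><wrapper>" would wrongly pass.
_WRAP = "<wrapper>{}</wrapper>"

def is_well_formed(content):
    """True if the content parses as XML once wrapped."""
    try:
        ET.fromstring(_WRAP.format(content))
    except ET.ParseError:
        return False
    return True

def xml_equal(a, b):
    """Compare two XML literal forms by canonical form, not by string."""
    return ET.canonicalize(_WRAP.format(a)) == ET.canonicalize(_WRAP.format(b))
```

For instance, `<i a="1" b="2"/>` and `<i b="2" a="1"></i>` differ as
strings but compare equal under `xml_equal`, while `a <b>bold word`
fails `is_well_formed` even though it would be a perfectly good plain
literal.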

Though this is not pointed out anywhere explicitly in the RDF specs
(which is a shame, but understandable, since it needs a bit more
testing to ensure there are no major dragons) the present RDF datatyping
solution, with XML Literals modelled as a datatype, allows us
to support the entire range of XML Schema types, including complex
types! And thereby, define property ranges to be e.g. xhtml:title,
asserting that all property values conform to the content model
constraining the lexical space of xhtml:title elements. Such benefits
simply disappear if XML Literals are not modelled as typed
literals -- and we certainly don't want to go back to treating
rdf:XMLLiteral as a special case of datatype with lang tag.

Again, the WG has chosen between many "least of all evils"
or "best of all options" and IMO has chosen well.

We can't make everyone happy, even if we very much want to,
and I also think that the dissatisfaction of some regarding the
present solution is mostly a matter of perspective or perception
of the relationship between the RDF graph and its XML
representation and not actual technical or practical shortcomings
in the solution itself.

While I'm very sympathetic to Martin's concerns about
consistency and agreement between standards and the risk of
misunderstanding to XML-only users, I have not noted any
problems with the present solution that I would consider show
stoppers.

I feel that this issue should remain closed, and that we should
wrap up.

Regards,

Patrick

--

Patrick Stickler
Nokia, Finland
patrick.stickler@nokia.com
Received on Thursday, 3 July 2003 06:40:07 UTC