Re: Summary of strings, markup, and language tagging in RDF (resend) from Martin Duerst on 2003-07-03 (w3c-rdfcore-wg@w3.org from July 2003)

From: Martin Duerst <duerst@w3.org>
Date: Thu, 03 Jul 2003 11:04:18 -0400
To: "Patrick Stickler" <patrick.stickler@nokia.com>, <w3c-rdfcore-wg@w3.org>, "Brian_McBride" <bwm@hplb.hpl.hp.com>, <jjc@hplb.hpl.hp.com>, "ext pat hayes" <phayes@ihmc.us>
Message-Id: <4.2.0.58.J.20030703093836.04ec18d8@localhost>
Hello Patrick,

At 13:39 03/07/03 +0300, Patrick Stickler wrote:

>I think perspective counts for a lot in understanding the
>tradeoffs inherent in the present solution.

This mail certainly makes your perspective clear. I agree
with some of your points, but I have to disagree on quite
a few details.


>The problem with xml:lang has always been that it is intended for
>the consumption of XML content by XML applications. The only
>real world XML application that operates on RDF/XML is an
>RDF parser -- and for the *parser* the xml:lang scope for XML
>literals is visible and relevant BUT the parser is not required to
>convey all information that might be available regarding XML content
>to the RDF graph.

Well, it doesn't convey whether an attribute was single-quoted or
double-quoted, but then even the XML parser may not convey that
information. It does eliminate, in certain well-defined ways,
the difference between attributes and elements, because that is
seen as a syntactic distinction. But I'm not aware of any
actual information in attributes or elements that is just
thrown out of the window.


>Our specs are very clear about the non-relevance of xml:lang scope
>on XML literals insofar as the semantics of *RDF* (not *XML*) are
>concerned. XML semantics only is relevant to an RDF parser.
>End of story. *RDF* users (as opposed to *XML*
>users) will be aware of this (if they've read even the primer) and should
>act accordingly.

Well, assuming the primer actually gets around to talking about
parseType="Literal" :-(.


>The only argument against this particular solution that may be valid
>is that XML users who have no clue about RDF might be confused
>about the non-relevance of xml:lang for certain RDF constructs, namely
>typed literals and thus, XML literals.

I'm quite sure many readers and users will understand the treatment
of XML literals as typed literals as just a technical mechanism,
and therefore actually be surprised. That will be the case in particular
because the users that use XML literals will be familiar with XML.
After all, it's XML, not something else.


>Well, they can grab the specs and learn a bit about the tool they are
>using. At the risk of offending, and apologies in advance, this really
>is a case of RTFM.

Have you actually read the primer?


> > ...
> >
> > 4. Regarding Martin's other beef, that some XML without any markup in
> > it is 'really' just plain text,
>
>I'm not 100% sure that this is in fact Martin's position, but if it is,
>or is anyone else's position,

It indeed is.


>then my reply is that this is simply wrong.
>
>The difference between a plain literal and an XML literal, regardless
>of the presence of markup, is that an RDF application is free to
>presume that an XML literal constitutes well-formed XML, whereas
>a plain literal need not. Period. It's as simple as that.

Wrong. You are confusing a particular representation with
the abstraction behind it. A plain literal can always be represented
as a well-formed XML fragment. In RDF/XML, it actually is represented
that way. In an RDF store, you may choose a different representation,
but that's an implementation detail, which should not affect the
design.
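
To illustrate the representation point, here is a minimal Python sketch. It
assumes the string contains only characters that XML permits and no carriage
returns, which an XML parser would normalize. The wrapper element name "w"
and both helper names are made up for this example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def as_xml_fragment(text: str) -> str:
    """Escape &, < and > so the text is well-formed XML character data."""
    return escape(text)


def round_trip(text: str) -> str:
    """Wrap the escaped text in a dummy element, parse it, and read it back."""
    element = ET.fromstring(f"<w>{as_xml_fragment(text)}</w>")
    return element.text or ""


if __name__ == "__main__":
    sample = "a < b & c > d"
    print(as_xml_fragment(sample))
    print(round_trip(sample) == sample)
```

The escaped form is just one representation; parsing it gives back the same
abstract string.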


>An XML literal is a string

read 'can be represented as a string'


>that conforms to XML well-formedness
>conditions.

There are many other ways to represent XML literals. This is
an implementation detail, the same way as it is an implementation
detail whether that string is represented as UTF-8 or as UTF-16.


>Furthermore, the comparison of XML literals is not
>based on string-equality. These important distinctions from plain
>strings are captured semantically in the definition of the datatype
>rdf:XMLLiteral and in the RDF datatyping model.

Is there any case where
    <foo:prop>Some text here</foo:prop>
and
    <foo:prop rdf:parseType="Literal">Some text here</foo:prop>
actually behave differently with respect to equality?
I.e. is it possible to construct two text strings A and B,
both conforming to XML #PCDATA, where
    <foo:prop>A</foo:prop> and
    <foo:prop>B</foo:prop> are equal
but
    <foo:prop rdf:parseType="Literal">A</foo:prop> and
    <foo:prop rdf:parseType="Literal">B</foo:prop> are not
(or the other way round)?

I do not know of any such case. If you know of any, please tell us.
If there is none, the above argument is moot.
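
One way to go looking for such a case is a small experiment like the
following Python sketch. It is only an approximation under stated
assumptions: the plain literal value is taken to be the parsed character
data, and the XML literal is approximated by the C14N 2.0 output of
xml.etree.ElementTree.canonicalize (Python 3.8+), which is not necessarily
the canonicalization the RDF specs call for. The wrapper element "w" and
the function names are made up. Finding no difference for a few samples
proves nothing in general.

```python
import xml.etree.ElementTree as ET


def plain_value(pcdata: str) -> str:
    """Approximate a plain literal: the character data after XML parsing."""
    return ET.fromstring(f"<w>{pcdata}</w>").text or ""


def xml_literal_form(pcdata: str) -> str:
    """Approximate an XML literal: canonical form of the element content."""
    canonical = ET.canonicalize(f"<w>{pcdata}</w>")
    # Strip the dummy wrapper's start and end tags again.
    return canonical[len("<w>"):-len("</w>")]


def behaves_differently(a: str, b: str) -> bool:
    """True if A and B are equal in one reading but not in the other."""
    plain_equal = plain_value(a) == plain_value(b)
    xml_equal = xml_literal_form(a) == xml_literal_form(b)
    return plain_equal != xml_equal


if __name__ == "__main__":
    candidates = [("&#65;BC", "ABC"), ("a &gt; b", "a > b"), ("x", "y")]
    for a, b in candidates:
        print(repr(a), repr(b), behaves_differently(a, b))
```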


>Though this is not pointed out anywhere explicitly in the RDF specs
>(which is a shame, but understandable, since it needs a bit more
>testing to ensure there are no major dragons) the present RDF datatyping
>solution, with XML Literals modelled as a datatype, allows us
>to support the entire range of XML Schema types, including complex
>types! And thereby, define property ranges to be e.g. xhtml:title,
>asserting that all property values conform to the content model
>constraining the lexical space of xhtml:title elements.

First, xhtml:title is really a boring example, as it is currently
only #PCDATA. But that should change with XHTML2, so let's assume
we are there for the sake of this example.

It is important to point out that there is a distinction between
xhtml:title, and the content model of xhtml:title. The former
looks like
   <foo:prop rdf:parseType="Literal"><xhtml:title
       >Your <xhtml:em>Title</xhtml:em> Here</xhtml:title></foo:prop>
The latter looks like
   <foo:prop rdf:parseType="Literal"
       >Your <xhtml:em>Title</xhtml:em> Here</foo:prop>

I am quite convinced that the latter will be much more frequent
in the context of RDF, because it is better to model the
fact that this is a title as an RDF property. This is also
the more important case for I18N.


>Such benefits
>simply disappear if XML Literals are not modeled as typed
>literals

Well, they may.


>-- and we certainly don't want to go back to treating
>rdf:XMLLiteral as a special case of datatype with lang tag.

Why not? It is special, so why don't you treat it specially?


Regards,    Martin.
Received on Thursday, 3 July 2003 11:10:03 UTC