Re: Updated rationale for RDFCore's current design

Hello Brian, others,

In this mail, I'm trying to reply to Brian's list of arguments
on the RDF Core side. I'll send out a separate mail with the
arguments on the i18n side, but that may have to wait for
tomorrow.

At 18:48 03/07/08 +0100, Brian McBride wrote:

>I have had another attempt at stating a rationale for the current design
>based on what Patrick, Pat, Jeremy et al have written recently, and some
>thoughts of my own.  I'm just trying to capture RDFCore thinking in one
>place, so as usual, please correct, amend, clarify etc.
>
>1. For RDF, its abstract syntax, i.e. the graph, is its primary
>representation.  RDF/XML is a concrete syntax for representing graphs,
>i.e. from an RDF perspective, the goal is to figure out how best to
>represent graphs in RDF/XML, not how to represent RDF/XML in graphs.
>
>The typical use case for XML Literals is where an XML literal will be
>written into the middle of an XML document by an application.  This is
>simplest if the xml literal is a standalone fragment that can be simply
>written into the XML document.

- It *can* be written as a standalone fragment, thanks to xml:lang=""
- It is not actually a standalone fragment, because canonicalization
   requires looking at namespace declarations that may be higher up
   in the tree.
- Our main concern is not the xml:lang info in RDF/XML, but the
   info in the graph.
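
A minimal sketch of the namespace point, assuming Python's standard
xml.etree.ElementTree parser; the namespace URI, the element names and
the helper parses_standalone are made up for illustration. The literal's
content parses inside its document, but not once it is cut out as text,
because the eg prefix is declared higher up in the tree.

```python
import xml.etree.ElementTree as ET

# Hypothetical example data; http://example.org/ is an assumed namespace.
DOCUMENT = (
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"'
    ' xmlns:eg="http://example.org/">'
    '<eg:prop rdf:parseType="Literal"><eg:item>text</eg:item></eg:prop>'
    '</rdf:RDF>'
)

# The literal's content, cut out of the document as plain text.
FRAGMENT = '<eg:item>text</eg:item>'


def parses_standalone(xml_text: str) -> bool:
    """Return True if xml_text can be parsed on its own."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        # Raised here when a prefix such as "eg" has no declaration in scope.
        return False


if __name__ == "__main__":
    print(parses_standalone(DOCUMENT))
    print(parses_standalone(FRAGMENT))
```

Copying the declaration onto the fragment's outer element, as
canonicalization does, makes it parse on its own again.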


>2. RDFCore agrees with feedback that it received, that
>building an XML specific mechanism into its core model is architecturally
>inappropriate - it mixes things that should be independent.  Accepting
>this implies that parseType="Literal" values must use one of the
>existing mechanisms - i.e. either plain literals or typed literals, or a
>new more general mechanism must be invented, e.g. a new triple
>structure.  An XML specific mechanism is undesirable.

- Who gave you this feedback, and how did they justify it?
- For i18n, text-with-markup, and the continuity between plain text
   and text-with-markup, are very important. As an example, for all
   XML DTD designs, we strongly recommend not using attributes for
   any kind of natural language text. Among other things, this has
   led to some changes from XHTML 1.0 to XHTML 2.0.
- On a higher level, we do not really care whether text-with-markup
   is represented as XML (a very natural choice) or in some other way,
   as long as this is clearly specified in RDF, rather than just
   left to the application. What we care about is having a consistent
   model for text in all the forms we use, which we had in M&S, but
   which we seem to be moving away from more and more.


>3. For the common use case, where applications embed a literal in an XML
>document, it is preferable to distinguish, in the graph, between plain
>and XML literals, so that e.g. different escaping conventions can be
>applied.

The different escaping conventions are implementation details.
They don't look like a justification for major design decisions.
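
To give a rough idea of how small this difference can be at the
implementation level, here is a sketch that assumes an application
already knows which kind of literal it holds. The function
embed_literal and its is_xml flag are hypothetical; escape is the
standard xml.sax.saxutils helper.

```python
from xml.sax.saxutils import escape


def embed_literal(value: str, is_xml: bool) -> str:
    """Return text that is safe to write into the middle of an XML document.

    Hypothetical helper: an XML literal is written as-is, while a plain
    literal has its markup characters (&, <, >) escaped first.
    """
    return value if is_xml else escape(value)


if __name__ == "__main__":
    print(embed_literal("a < b & c", is_xml=False))
    print(embed_literal("<b>bold</b>", is_xml=True))
```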


>4. Taking the datatype approach creates the opportunity for future
>applications to subclass the datatype XMLLiteral, so that the value of a
>property may be restricted to a specific form of XML Literal, possibly
>specified using XML Schema.

We do not deny this possibility, but:
- It does not necessarily affect the question of language tagging.
- It looks like i18n gets punished now for some benefits that
   others *might* get in the future.
- Integration of plain literals into the datatype system
   would probably solve some of the problems.


>5. The equality rules are different for plain and XML literals.
>
>   "<eg:prop eg:a='a' eg:b='b'/>" and "<eg:prop eg:b='b' eg:a='a'/>"
>
>are different plain literals, but equal XML literals.

That seems true, but only if we ignore the implementation details
mentioned in point 3. We are not asking that a plain literal
that by chance happens to look like some XML (somebody on
our call today mentioned that Martians might have such names)
and the similar-looking XML be considered the same. What we are
asking for is that plain literals and XML literals that consist
only of the same plain text be considered the same.
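
The difference in equality rules from point 5 can be sketched as
follows, assuming Python 3.8 or later, whose xml.etree.ElementTree
module provides canonicalize() (C14N 2.0, which may differ in detail
from the canonicalization RDF specifies). The namespace URI and the
two helper functions are assumptions made for the example; the
declaration is added only so that the fragments parse on their own.

```python
import xml.etree.ElementTree as ET

# Assumed namespace declaration, needed to parse the fragments standalone.
NS = 'xmlns:eg="http://example.org/"'

A = f"<eg:prop {NS} eg:a='a' eg:b='b'/>"
B = f"<eg:prop {NS} eg:b='b' eg:a='a'/>"


def same_as_plain(x: str, y: str) -> bool:
    """Plain literals: compare the strings character by character."""
    return x == y


def same_as_xml(x: str, y: str) -> bool:
    """XML literals: compare canonical forms, so attribute order is ignored."""
    return ET.canonicalize(x) == ET.canonicalize(y)


if __name__ == "__main__":
    print(same_as_plain(A, B))
    print(same_as_xml(A, B))
```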


>6. The notion that the literal in the RDF/XML fragment below
>
>  <eg:prop xml:lang="en" rdf:parseType="Literal">
>    <span xml:lang="fr">chat</span>
>  </eg:prop>
>
>contains the English string "chat" as a substring seems bizarre.

The fragment above says that "chat" is French, not English.
Maybe a typo? If so, what's bizarre about that?
Calling it bizarre would make both HTML 4.0 and XML 1.0 very bizarre indeed.
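
A small sketch of how xml:lang scoping works for the fragment in
point 6, using Python's standard xml.etree.ElementTree. The namespace
declarations are added so that the fragment parses on its own, the eg
URI is made up, and text_with_lang is a hypothetical helper: each run
of text takes the language of the nearest enclosing element that
carries xml:lang.

```python
import xml.etree.ElementTree as ET

# ElementTree's expanded name for the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# The fragment from point 6; namespace declarations are assumptions.
FRAGMENT = (
    '<eg:prop xmlns:eg="http://example.org/"'
    ' xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"'
    ' xml:lang="en" rdf:parseType="Literal">'
    '<span xml:lang="fr">chat</span>'
    '</eg:prop>'
)


def text_with_lang(elem, inherited=None):
    """Yield (text, language) pairs for non-blank text, in document order."""
    # An element's own xml:lang overrides the inherited one.
    lang = elem.get(XML_LANG, inherited)
    if elem.text and elem.text.strip():
        yield elem.text.strip(), lang
    for child in elem:
        yield from text_with_lang(child, lang)
        # Text after a child belongs to the parent, so it uses the parent's lang.
        if child.tail and child.tail.strip():
            yield child.tail.strip(), lang


if __name__ == "__main__":
    for text, lang in text_with_lang(ET.fromstring(FRAGMENT)):
        print(text, lang)
```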


>2, 3, 4, 5  and 6 argue for using the datatyping mechanism to represent
>xml literals.
>
>7. The XSD datatyping model does not support the notion that the value
>of a literal is affected by a language tag.  RDFCore's attempts to
>introduce this notion caused considerable complexity and difficulty in
>the model theory and met with strong negative feedback.
>
>Thus, if language is to affect the value of an xml literal it must be
>part of the members of the lexical space of the datatype.
>
>This can be accomplished by the parser generating a wrapper element to
>hold the lang tag.

That's one way. Some seem to like it, others not.
We think that it's better than what we have now, at least.


>8. The generation of a wrapper element is undesirable for the following
>reasons:
>
>  - it is unhelpful in a primary use case where one wants to simply embed
>the literal in another XML document - the application has to get rid of
>the wrapper element, and find another enclosing element on which to hang
>the lang tag.

The application does not have to do anything. The wrapper is
to make the model theory work. Nobody says APIs have to use wrappers.


>  - implementation complexity in general, caused by introducing and
>removing the wrapper element
>
>  - the value of a property cannot be an arbitrary XML fragment - it must
>always have an outer wrapper
>
>  - the user may be surprised that the XML fragment is not identical to
>the one represented in the RDF/XML, e.g. XPATH expressions won't work as
>expected.

The same things apply here as above. The wrapper is something for
model theory. No need to store it, no need to consider it for XPath,
and so on. Nobody requires that an RDF application store integers
as strings, so it's perfectly natural to assume that different
datatypes have different ways to be stored and accessed.
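
One way to picture this is the sketch below. Everything in it is
hypothetical, including the class name XmlLiteral and the wrapper
element name: the fragment and the language are stored separately, and
the wrapped form is generated only when a lexical form is asked for.

```python
from dataclasses import dataclass
from xml.sax.saxutils import quoteattr

# Hypothetical wrapper element name; no specification defines it.
WRAPPER = "wrapper"


@dataclass(frozen=True)
class XmlLiteral:
    """An XML literal stored without any wrapper element."""

    fragment: str  # the XML exactly as the application sees it
    lang: str      # the language tag, kept separately

    def lexical_form(self) -> str:
        """Build the wrapped form on demand, e.g. for the model theory."""
        return (
            f"<{WRAPPER} xml:lang={quoteattr(self.lang)}>"
            f"{self.fragment}</{WRAPPER}>"
        )


if __name__ == "__main__":
    literal = XmlLiteral(fragment="<span>chat</span>", lang="fr")
    print(literal.fragment)        # what an API or XPath would work on
    print(literal.lexical_form())  # what the formal model would see
```

Under this assumption, applications and XPath expressions only ever
touch fragment, and the wrapper never has to be stored or removed.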


>Thus we are left with the current RDFCore proposal.
>
>The practical experience of WG members suggests that thinking of
>parseType="Literal" values as isolated fragments of XML that do not
>inherit language from their context, i.e. the current RDFCore design, is
>appropriate in practice.

>Martin:
>
>  - leaving aside whether you agree with the value judgements it makes,
>would you accept that the above represents close to a coherent rationale
>for the current RDFCore proposal?

It makes it understandable why things got the way they are now.
And it is a good summary of the arguments that have come up.


>  - do you find it at all persuasive?

Not really, no, sorry.


Regards,    Martin.

Received on Tuesday, 8 July 2003 18:19:12 UTC