Re: [Fwd: Re: Summary of strings, markup, and language tagging in RDF (resend)]

On Fri, 2003-07-04 at 11:30, Brian McBride wrote:

Updating the arguments as I've understood them ...

[...]

> 
> So, if I can try to summarize (at least my understanding) of the details
> you gave (I've included some detailed comments amongst your text below,
> but they are not greatly relevant, I think).
> 
>   - users familiar with XML will be surprised that the lang tag does not
> affect an xml literal
>   - users will be confused that plain literals are treated differently
> from XML literals
>   - the common case is that the user wishes an enclosing lang tag to
> apply to an xml literal, so why burden the user with duplicating the
> information
>   - not all XML languages have neutral elements such as <span> that can
> be added to hold extra type information
>   - conversion from other XML languages to RDF/XML will require more
> complex code

Martin wished to add a concern that RDFCore has exceeded its charter in
changing XML literals as defined in M&S.

I have had another attempt at stating a rationale for the current design,
based on what Patrick, Pat, Jeremy et al. have written recently, and some
thoughts of my own.  I'm just trying to capture RDFCore thinking in one
place, so as usual, please correct, amend, clarify etc.

1. For RDF, the abstract syntax, i.e. the graph, is its primary
representation.  RDF/XML is a concrete syntax for representing graphs,
so from an RDF perspective the goal is to figure out how best to
represent graphs in RDF/XML, not how to represent RDF/XML in graphs.

The typical use case for XML literals is where an XML literal will be
written into the middle of an XML document by an application.  This is
simplest if the XML literal is a standalone fragment that can be
written into the XML document as is.

2. RDFCore agrees with the last call feedback it received that building
an XML-specific mechanism into its core model is architecturally
inappropriate: it mixes things that should be independent.  Accepting
this implies that parseType="Literal" values must either use one of the
existing mechanisms, i.e. plain literals or typed literals, or a new,
more general mechanism must be invented, e.g. a new triple structure.
An XML-specific mechanism is undesirable.

3. For the common use case, where applications embed a literal in an XML
document, it is preferable to distinguish, in the graph, between plain
and XML literals, so that e.g. different escaping conventions can be
applied.

4. Taking the datatype approach creates the opportunity to subclass the
datatype XMLLiteral, so that the value of a property may be restricted
to a specific form of XML Literal, possibly specified using XML Schema.

5. The equality rules are different for plain and XML literals.

  "<eg:prop eg:a='a' eg:b='b'/>" and "<eg:prop eg:b='b' eg:a='a'/>"

are different plain literals, but equal XML literals.
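
To make point 5 concrete, here is a minimal Python sketch.  It assumes
Python 3.8 or later and uses the standard library's C14N 2.0
canonicalizer as a stand-in for whatever canonical form an RDF
implementation actually applies to XML literals.  The namespace URI
bound to the eg prefix is made up so that the fragments parse on their
own.

```python
from xml.etree.ElementTree import canonicalize

# Assumption: the eg prefix is bound to a placeholder namespace URI.
NS = "xmlns:eg='http://example.org/eg#'"

a = f"<eg:prop {NS} eg:a='a' eg:b='b'/>"
b = f"<eg:prop {NS} eg:b='b' eg:a='a'/>"

# Compared as plain literals, the two are simply different strings.
plain_equal = (a == b)

# Compared as XML, attribute order is not significant, so the
# canonical forms of the two fragments coincide.
xml_equal = (canonicalize(a) == canonicalize(b))

print(plain_equal, xml_equal)
```

The string comparison fails while the canonical comparison succeeds,
which is the difference in equality rules described above.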

6. The notion that the literal in the RDF/XML fragment below

 <eg:prop xml:lang="en" rdf:parseType="Literal">
   <span xml:lang="fr">chat</span>
 </eg:prop>

contains the English string "chat" as a substring seems bizarre.

Points 2, 3, 4, 5 and 6 argue for using the datatyping mechanism to
represent XML literals.

7. The XSD datatyping model does not support the notion that the value
of a literal is affected by a language tag.  RDFCore's attempts to
introduce this notion caused considerable complexity and difficulty in
the model theory and met with strong negative feedback.

Thus, if language is to affect the value of an XML literal, it must be
carried within the members of the lexical space of the datatype.

This can be accomplished by the parser generating a wrapper element to
hold the lang tag.

8. The generation of a wrapper element is undesirable for the following
reasons:

 - it is unhelpful in a primary use case where one wants to simply embed
the literal in another XML document - the application has to get rid of
the wrapper element, and find another enclosing element on which to hang
the lang tag.

 - implementation complexity in general, caused by introducing and
removing the wrapper element

 - the value of a property cannot be an arbitrary XML fragment - it must
always have an outer wrapper

 - the user may be surprised that the XML fragment is not identical to
the one represented in the RDF/XML, e.g. XPath expressions won't work as
expected.
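
The extra work described in the list above can be sketched in Python.
This is only an illustration: the element name "wrapper" and the
functions wrap() and unwrap() are hypothetical, not part of any RDFCore
proposal or existing parser.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
WRAPPER = "wrapper"  # hypothetical element name, chosen for illustration

def wrap(fragment, lang):
    """What a parser would emit: the fragment inside a lang-bearing wrapper."""
    return f"<{WRAPPER} xml:lang={quoteattr(lang)}>{fragment}</{WRAPPER}>"

def unwrap(wrapped):
    """What an embedding application must do to get the fragment back."""
    root = ET.fromstring(wrapped)
    lang = root.get(XML_LANG)
    # Re-serialise the wrapper's content: leading text, then each child
    # (ET.tostring also emits the text that follows each child).
    inner = escape(root.text or "") + "".join(
        ET.tostring(child, encoding="unicode") for child in root
    )
    return lang, inner

fragment = '<span xml:lang="fr">chat</span>'
wrapped = wrap(fragment, "en")
lang, inner = unwrap(wrapped)
print(wrapped)
print(lang, inner)
```

Even in this small case the application has to parse the literal, pull
the language off the wrapper, and re-serialise the content before it
can embed the fragment elsewhere.  The wrapped form also has one more
level of nesting than the fragment the user wrote.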

Thus we are left with the current RDFCore proposal.

The practical experience of WG members suggests that thinking of
parseType="Literal" values as isolated fragments of XML that do not
inherit language from their context, i.e. the current RDFCore design, is
appropriate in practice.

It has also been suggested that it is easier to integrate data from
different sources when xml:lang is not inherited from context.  I am
finding that one hard to follow, unless the integration is being done
by hand with cut and paste, since it is easy to always put an
xml:lang="" next to every rdf:parseType="Literal" to ensure language
isolation.  Perhaps someone can provide an example to show the advantage.
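
For reference, the isolation step mentioned above (adding xml:lang=""
beside every rdf:parseType="Literal") could be automated along these
lines.  This is a rough sketch using Python's standard ElementTree.
The function name isolate_literals, the eg namespace URI and the sample
document are all invented for illustration, and an explicit xml:lang
already present on the property element is left untouched.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XML_NS = "http://www.w3.org/XML/1998/namespace"
PARSE_TYPE = f"{{{RDF_NS}}}parseType"
XML_LANG = f"{{{XML_NS}}}lang"

def isolate_literals(root):
    """Add xml:lang="" to each parseType="Literal" element lacking xml:lang."""
    for elem in root.iter():
        if elem.get(PARSE_TYPE) == "Literal" and XML_LANG not in elem.attrib:
            elem.set(XML_LANG, "")
    return root

# Hypothetical input: the document-level xml:lang="en" would otherwise
# be in scope for the literal.
doc = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:eg="http://example.org/eg#" xml:lang="en">
  <rdf:Description rdf:about="http://example.org/thing">
    <eg:prop rdf:parseType="Literal"><span>chat</span></eg:prop>
  </rdf:Description>
</rdf:RDF>"""

root = isolate_literals(ET.fromstring(doc))
prop = root.find(".//{http://example.org/eg#}prop")
print(repr(prop.get(XML_LANG)))
```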

Martin:

 - leaving aside whether you agree with the value judgements it makes,
would you accept that the above represents close to a coherent rationale
for the current RDFCore proposal?

 - do you find it at all persuasive?

Brian

Received on Sunday, 6 July 2003 14:50:56 UTC