Re: rdfms-literal-is-xml-structure: Why? from Dan Connolly on 2001-07-12 (w3c-rdfcore-wg@w3.org from July 2001)

From: Dan Connolly <connolly@w3.org>
Date: Thu, 12 Jul 2001 00:58:45 -0500
To: Ron Daniel <rdaniel@interwoven.com>
CC: Dave Beckett <dave.beckett@bristol.ac.uk>, RDF Core <w3c-rdfcore-wg@w3.org>
Message-ID: <3B4D3C95.9D87F9FA@w3.org>
Ron Daniel wrote:
> 
> Dan Connolly raised this issue, stating it as:
> 
>    A statement with a parseType of 'Literal' has as its object
>    an XML structure, not a simple string. For example, the first
>    character of the literal <foo>bar</foo> is not '<'.
> 
> This is an interesting suggestion. It raises several questions.
> I'll confine myself to one (at least for now)...
> 
> 1) What evidence is there that this was the intent of the
> M&S 1.0 specification?

Er... this looks like pretty direct evidence:

[[[
If the content of E contains no XML markup or if parseType="Literal" is
specified in the start tag of E then v is the content of E (a literal).
]]]

--        Resource Description Framework (RDF) Model and Syntax Specification
http://www.w3.org/TR/1999/REC-rdf-syntax-19990222/
Wed, 24 Feb 1999 14:45:07 GMT


E refers to an XML element in that bit of the spec.
The content of an XML element isn't (in general) a string; it's
a sequence of data characters and/or elements, PIs, and comments:

[[[
Content of Elements

   [43]   content   ::=   CharData? ((element | Reference | CDSect | PI | Comment) CharData?)*
]]]

--        Extensible Markup Language (XML) 1.0 (Second Edition)
http://www.w3.org/TR/REC-xml#dt-content
Thu, 05 Oct 2000 12:19:51 GMT


Let's take this example from the RDF spec:

-------------
In the following example, the value of the Title property is a literal
containing some MATHML markup.


  <rdf:Description
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/metadata/dublin_core#"
    xmlns="http://www.w3.org/TR/REC-mathml"
    rdf:about="http://mycorp.com/papers/NobelPaper1">

    <dc:Title rdf:parseType="Literal">
      Ramifications of
         <apply>
        <power/>
        <apply>
          <plus/>
          <ci>a</ci>
          <ci>b</ci>
        </apply>
        <cn>2</cn>
      </apply>
      to World Peace
    </dc:Title>
    <dc:Creator>David Hume</dc:Creator>
  </rdf:Description>
-------------

What do you suggest is the value of the dc:Title (sic) property?
I suggest it's a structured thing, a la the XML infoset
or XPath data model; it's got some characters, a
mathml:apply element, and some more characters. No?
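
To make the "structured thing" reading concrete, here is a minimal sketch,
assuming Python's xml.etree.ElementTree as a stand-in for an infoset-like
view. The helper name content_items is hypothetical, the MathML markup is
compacted onto one line, and ElementTree drops comments and PIs by default,
so this only approximates the XML 1.0 notion of content.

```python
import xml.etree.ElementTree as ET

# The example from the RDF spec, with the MathML markup compacted.
RDF_XML = """\
<rdf:Description
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/metadata/dublin_core#"
    xmlns="http://www.w3.org/TR/REC-mathml"
    rdf:about="http://mycorp.com/papers/NobelPaper1">
  <dc:Title rdf:parseType="Literal">
    Ramifications of
    <apply><power/><apply><plus/><ci>a</ci><ci>b</ci></apply><cn>2</cn></apply>
    to World Peace
  </dc:Title>
  <dc:Creator>David Hume</dc:Creator>
</rdf:Description>
"""

DC = "{http://purl.org/metadata/dublin_core#}"


def content_items(elem):
    """List the top-level content of elem as ('text', str) / ('element', tag)."""
    items = []
    if elem.text and elem.text.strip():
        items.append(("text", elem.text.strip()))
    for child in elem:
        # child.tag carries the namespace name, e.g. '{uri}apply'
        items.append(("element", child.tag))
        if child.tail and child.tail.strip():
            items.append(("text", child.tail.strip()))
    return items


root = ET.fromstring(RDF_XML)
items = content_items(root.find(DC + "Title"))
for kind, value in items:
    print(kind, value)
```

The three items it reports are some characters, one element whose tag
includes the MathML namespace name, and some more characters.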


> Searching through the archives of the w3c-rdf-syntax-wg
> list for 'infoset' turns up VERY few messages.

The infoset hardly existed then.

> Re-reading those messages, IMHO, supports a very different
> interpretation of the WG intent - that parseType="Literal" was a
> stop-gap measure to let us deal with embedded XML content through
> the simple expedient of turning off RDF parsing of that content.
> In fact, the phrase "generates no tuples" is used in the emails
> above in a manner that seems to indicate that the WG wanted to
> completely ignore the content and markup in the Literal, and treat it
> as a simple string. Later applications might do something with the
> markup.

I could live with encoding the structured thing as a string
provided
	(1) namespace info isn't lost; in the example above,
	the resulting string must capture the namespace
	name associated with <apply/> etc.

	(2) it remains distinguishable from a string that
	happens to have the same characters.

i.e. I'm OK with "delaying" the parsing, so long as we don't
lose information.
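
One way the two conditions could be met, as a sketch only: keep the string,
but carry a flag saying it came from parseType="Literal", and serialize the
content so the namespace declarations travel with it. The Literal class, the
is_xml flag and literal_from are hypothetical names, not anything M&S 1.0
defines, and the unqualified parseType attribute is an assumption that
follows the shorthand used in this thread.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from xml.sax.saxutils import escape


@dataclass(frozen=True)
class Literal:
    lexical: str          # the string form of the value
    is_xml: bool = False  # True when it came from parseType="Literal"


def literal_from(prop):
    """Build a Literal from a property element (hypothetical helper)."""
    if prop.get("parseType") == "Literal":
        # Re-serialize the content; ET.tostring re-declares the namespaces
        # each child needs, so condition (1) holds for the string.
        parts = [escape(prop.text or "")]
        parts += [ET.tostring(child, encoding="unicode") for child in prop]
        return Literal("".join(parts), is_xml=True)
    return Literal(prop.text or "", is_xml=False)


MATHML = "http://www.w3.org/TR/REC-mathml"
structured = literal_from(ET.fromstring(
    '<prop parseType="Literal" xmlns:m="%s"><m:apply/></prop>' % MATHML))
markup = literal_from(ET.fromstring(
    '<prop parseType="Literal"><foo>bar</foo></prop>'))
plain = literal_from(ET.fromstring(
    "<prop>&lt;foo&gt;bar&lt;/foo&gt;</prop>"))

print(MATHML in structured.lexical)    # namespace name survives
print(markup.lexical == plain.lexical) # same characters
print(markup == plain)                 # still distinguishable
```

Here markup and plain have the same characters but compare unequal, which is
condition (2); the prefix chosen on re-serialization may differ from the
source, but the namespace name is kept.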


> If that is the case, then the clarification document can't say
> that M&S 1.0 requires the generation of tuples for the infoset of
> the embedded content. That seems the opposite of the intent.

I agree that M&S 1.0 doesn't give the URIs of the relevant
properties, so it would be more than clarification to specify them.

But this seems like another bug in the spec: "anybody can
say anything about anything; but if you want to give the language
of a Literal or model the structure of XML content, don't use
RDF properties to do it!" I don't think that was the intent.

> Dan's suggestion could be within the scope of a 2.0 revisitation
> of M&S, but clearly seems to exceed our chartered tasks.
> 
> (At that time, there may be an approach we can take which reconciles
> the views. We might say that in 1.0, a Literal is just a String,
> but that in 2.0, we have some extra info in the model so that
> we not only have the string, we have a URI for it. (We should also
> agree on just what those URIs are). That URI can be used as the
> subject for all sorts of statements. We could use it in statements
> which have a predicate called something like 'rdf2:hasInfoset'. The
> rest is left as an exercise for the future.)
> 
> But for now, I think that as far as RDF 1.0 processors are concerned,
> Literals are just strings, and the first character of a string
> like "<foo>bar</foo>" would be '<'.

As long as we somehow distinguish
	<prop parseType="Literal"><foo>bar</foo></prop>
from
	<prop>&lt;foo&gt;bar&lt;/foo&gt;</prop>
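
For what it's worth, any XML parser already sees these two differently. A
small sketch, assuming Python's xml.etree.ElementTree (the describe helper
is a hypothetical name):

```python
import xml.etree.ElementTree as ET

as_literal = ET.fromstring('<prop parseType="Literal"><foo>bar</foo></prop>')
as_escaped = ET.fromstring('<prop>&lt;foo&gt;bar&lt;/foo&gt;</prop>')


def describe(prop):
    """Summarize the content of prop: its leading text and child element names."""
    return {"text": prop.text, "children": [child.tag for child in prop]}


print(describe(as_literal))  # {'text': None, 'children': ['foo']}
print(describe(as_escaped))  # {'text': '<foo>bar</foo>', 'children': []}
```

The first has a foo element and no character data; the second has only
character data, whose first character is '<'.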

-- 
Dan Connolly, W3C http://www.w3.org/People/Connolly/
Received on Thursday, 12 July 2001 01:58:56 UTC