
Re: rdfms-literal-is-xml-structure: Why?

From: Dan Connolly <connolly@w3.org>
Date: Thu, 12 Jul 2001 00:58:45 -0500
Message-ID: <3B4D3C95.9D87F9FA@w3.org>
To: Ron Daniel <rdaniel@interwoven.com>
CC: Dave Beckett <dave.beckett@bristol.ac.uk>, RDF Core <w3c-rdfcore-wg@w3.org>
Ron Daniel wrote:
> Dan Connolly raised this issue, stating it as:
>    A statement with a parseType of 'Literal' has as its object
>    an XML structure, not a simple string. For example, the first
>    character of the literal <foo>bar</foo> is not '<'.
> This is an interesting suggestion. It raises several questions.
> I'll confine myself to one (at least for now)...
> 1) What evidence is there that this was the intent of the
> M&S 1.0 specification?

Er... this looks like pretty direct evidence:

     If the content of E contains no XML markup or if
     parseType="Literal" is specified in the start tag of E
     then v is the content of E (a literal).

--        Resource Description Framework (RDF) Model and Syntax
Wed, 24 Feb 1999 14:45:07 GMT

E refers to an XML element in that bit of the spec.
The content of an XML element isn't (in general) a string; it's
a sequence of data characters and/or elements, PIs, and comments:

Content of Elements

     content ::= CharData? ((element | Reference | CDSect | PI | Comment) CharData?)*

--        Extensible Markup Language (XML) 1.0 (Second Edition)
Thu, 05 Oct 2000 12:19:51 GMT
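
To make the point about element content concrete, here is a minimal sketch in Python using the standard library's xml.dom.minidom. The helper name content_items and the sample <prop>/<foo> markup are assumptions chosen for illustration; nothing here is prescribed by the RDF or XML specs. It lists the content of an element as a sequence of items rather than as a single string:

```python
from xml.dom import Node, minidom


def content_items(xml_text):
    """Return the content of the document element as (kind, value) pairs."""
    root = minidom.parseString(xml_text).documentElement
    items = []
    for child in root.childNodes:
        if child.nodeType == Node.TEXT_NODE:
            items.append(("chars", child.data))        # data characters
        elif child.nodeType == Node.ELEMENT_NODE:
            items.append(("element", child.tagName))   # nested element
        elif child.nodeType == Node.COMMENT_NODE:
            items.append(("comment", child.data))
        elif child.nodeType == Node.PROCESSING_INSTRUCTION_NODE:
            items.append(("pi", child.target))
    return items


# Hypothetical sample: characters, an element, then more characters.
print(content_items("<prop>some <foo>bar</foo> text</prop>"))
```

Under this reading, the first item of the content of <prop><foo>bar</foo></prop> is an element, not the character '<'.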

Let's take this example from the RDF spec:

In the following example, the value of the Title property is a literal
containing some MATHML markup.


    <dc:Title rdf:parseType="Literal">
      Ramifications of
      to World Peace
    <dc:Creator>David Hume</dc:Creator>

what do you suggest is the value of the dc:Title (sic) property?
I suggest it's a structured thing, ala the XML infoset
or XPath data model; it's got some characters, a
mathml:apply element, and some more characters. No?

> Searching through the archives of the w3c-rdf-syntax-wg
> list for 'infoset' turns up VERY few messages.

The infoset hardly existed then.

> Re-reading those messages, IMHO, supports a very different
> interpretation of the WG intent - that parseType="Literal" was a
> stop-gap measure to let us deal with embedded XML content through
> the simple expedient of turning off RDF parsing of that content.
> In fact, the phrase "generates no tuples" is used in the emails
> above in a manner that seems to indicate that the WG wanted to
> completely ignore the content and markup in the Literal, and treat it
> as a simple string. Later applications might do something with the
> markup.

I could live with encoding the structured thing as a string, as long as:
	(1) namespace info isn't lost; in the example above,
	the resulting string must capture the namespace
	name associated with <apply/> etc.

	(2) it remains distinguishable from a string that
	happens to have the same characters.

i.e. I'm OK with "delaying" the parsing, so long as we don't
lose information.
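
One way to picture "delaying the parsing without losing information" is the sketch below, in Python with the standard library's xml.etree.ElementTree. The Literal class, its is_xml flag, the literal_of helper, and the http://example.org/math namespace are all hypothetical names introduced for illustration, not anything M&S 1.0 defines. The idea is only that the string form carries its namespace declarations and stays tagged as markup:

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    text: str             # string encoding of the property's content
    is_xml: bool = False  # True when the content came from parseType="Literal"


def literal_of(prop_xml):
    """Build a Literal from one property element given as XML text."""
    prop = ET.fromstring(prop_xml)
    if prop.get("parseType") == "Literal":
        # Serialize each child element; ElementTree writes an xmlns
        # declaration on it, so the namespace name survives in the string.
        # ET.tostring also includes the text that follows the element.
        parts = [prop.text or ""]
        parts.extend(ET.tostring(child, encoding="unicode") for child in prop)
        return Literal("".join(parts), is_xml=True)
    return Literal("".join(prop.itertext()))


markup = literal_of('<prop parseType="Literal"><foo>bar</foo></prop>')
plain = literal_of("<prop>&lt;foo&gt;bar&lt;/foo&gt;</prop>")
print(markup.text == plain.text, markup == plain)
```

The two values hold the same characters yet compare unequal, which is condition (2); serializing a namespaced child such as <m:apply/> keeps the namespace name in the string, which is condition (1).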

> If that is the case, then the clarification document can't say
> that M&S 1.0 requires the generation of tuples for the infoset of
> the embedded content. That seems the opposite of the intent.

I agree that M&S 1.0 doesn't give the URIs of the relevant
properties, so it would be more than clarification to specify them.

But this seems like another bug in the spec: "anybody can
say anything about anything; but if you want to give the language
of a Literal or model the structure of XML content, don't use
RDF properties to do it!" I don't think that was the intent.

> Dan's suggestion could be within the scope of a 2.0 revisitation
> of M&S, but clearly seems to exceed our chartered tasks.
> (At that time, there may be an approach we can take which reconciles
> the views. We might say that in 1.0, a Literal is just a String,
> but that in 2.0, we have some extra info in the model so that
> we not only have the string, we have a URI for it. (We should also
> agree on just what those URIs are). That URI can be used as the
> subject for all sorts of statements. We could use it in statements
> which have a predicate called something like 'rdf2:hasInfoset'. The
> rest is left as an exercise for the future.)
> But for now, I think that as far as RDF 1.0 processors are concerned,
> Literals are just strings, and the first character of a string
> like "<foo>bar</foo>" would be '<'.

As long as we somehow distinguish
	<prop parseType="Literal"><foo>bar</foo></prop>

Dan Connolly, W3C http://www.w3.org/People/Connolly/
Received on Thursday, 12 July 2001 01:58:56 UTC
