after-hours conversation (#literal-as-resources #literal-is-xml-structure #xmllang #graph #identity-anon-resources #literal-subjects) from Ron Daniel on 2001-10-12 (w3c-rdfcore-wg@w3.org from October 2001)

From: Ron Daniel <rdaniel@interwoven.com>
Date: Fri, 12 Oct 2001 13:53:50 -0700
To: <w3c-rdfcore-wg@w3.org>
Message-ID: <EMEKICCGFEKJFGKMFLEPGEBKDMAA.rdaniel@interwoven.com>

A few of us remained on the call after the official close
of the 2001-10-12 teleconference. We had some more discussion
of parseType="Literal" and other non-controversial topics.
I thought the rest of the group might be interested in one
suggestion that was made for handling literals (all
literals, not just XML literals). I do not recall any
objections being made to this proposal. It was:

1) With respect to literals, let's define a solution based on where
   we want to be with RDF 2, then treat RDF 1 M&S as a special case,
   or simple mapping, down from that.

2) In RDF 2, let each occurrence of a literal be a prince/b/whatever_node,
   identified in whatever way we decide to handle the things we used to
   call anonymous resources.

3) That node will have an rdf:value property whose value is the
   literal's character string.
   (A corollary might be that rdf:value properties are the only ones
    that actually have character strings as values. That would be
    conceptually cleanest. However, it might be a pain in practice as
    things like xml:lang and some other things might be best served
    with immediate string values. TBD.)

4) The xml:lang is another property of that node.

5) If the literal had rdf:parseType="Literal", this will be reflected
   in the model by giving that node an rdf:type property with an
   appropriate value, perhaps rdf2:xmlLiteral.

6) The namespace bindings in effect for XML literals will appear
   as another property (or set of properties) of that node.

7) This mechanism will allow 'Literals as subjects' in RDF 2.

8) Literals as subjects are not part of the RDF 1 M&S.

9) The 2.0 model can be rendered in 1.0 syntax by:
   a) Rendering the xml:lang property as an attribute on an element at
      an appropriate scoping level
   b) Rendering the rdf:type property whose value is "rdf2:xmlLiteral" as
      an rdf:parseType attribute with the value "Literal".
   c) Rendering the xmlns properties as attributes at the appropriate
      scoping level (which probably means 'as high as possible').
   d) Not rendering any other properties of the literal.
      (This means that a 2.0 model cannot be round-tripped through the
       1.0 syntax. That is OK. A 1.0 syntax will still be
       round-trippable through the 2.0 model.)

10) We still have the question of how to express the language and parse
    type in the 1.0 model (i.e. n-triples). We have at least the following
    choices:
   a) Leave n-triples with three fields. Literals can only appear in
      the third, object, field. Literals follow some grammar like:
         Literal := QUOTE literal_string (DELIM1 lang_string)?
                      (DELIM2 xmlns_string)? UNQUOTE
      and we argue for a while over the characters we actually use for
      the terminals QUOTE, DELIM1, DELIM2, and UNQUOTE.
   b) Let statements in an n-triples document which have literal values
      contain more than 3 fields (which, to me, seems no different
      than (a) since we still have to argue over how things will be
      delimited).
   c) Say that the 1.0 model was never defined clearly, and just start with
      the 2.0 model, letting the 1.0 syntax be the thing that requires
      various restrictions. (In other words, the n-triples representation
      would use the p/b/anon_nodes to carry xml:lang, rdf:type, and
      namespace properties as separate statements.)
   d) something else?

Currently, I'd be OK with 10(c).
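
For comparison, here is a minimal Python sketch of what option 10(a)
might look like. The terminal characters are exactly the part we have
not agreed on, so the ones below (a double quote for QUOTE and UNQUOTE,
'@' for DELIM1, '^' for DELIM2, backslash as the escape) are
placeholders of my own, as are the function names.

```python
# Hypothetical terminals; the proposal leaves the actual characters undecided.
QUOTE, DELIM1, DELIM2, UNQUOTE = '"', '@', '^', '"'
SPECIAL = {QUOTE, DELIM1, DELIM2, UNQUOTE, '\\'}


def _escape(text):
    """Backslash-escape every character that doubles as a terminal."""
    return ''.join('\\' + ch if ch in SPECIAL else ch for ch in text)


def encode_literal(value, lang=None, xmlns=None):
    """Literal := QUOTE literal_string (DELIM1 lang)? (DELIM2 xmlns)? UNQUOTE"""
    out = QUOTE + _escape(value)
    if lang is not None:
        out += DELIM1 + _escape(lang)
    if xmlns is not None:
        out += DELIM2 + _escape(xmlns)
    return out + UNQUOTE


def decode_literal(field):
    """Split an encoded literal back into (value, lang, xmlns)."""
    if len(field) < 2 or field[0] != QUOTE or field[-1] != UNQUOTE:
        raise ValueError("not a quoted literal")
    body = field[1:-1]
    parts = {"value": [], "lang": None, "xmlns": None}
    slot = "value"
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == '\\':
            # An escaped character is always taken literally.
            if i + 1 >= len(body):
                raise ValueError("dangling escape")
            parts[slot].append(body[i + 1])
            i += 2
            continue
        if ch == DELIM1 and slot == "value":
            slot = "lang"
            parts[slot] = []
        elif ch == DELIM2 and slot in ("value", "lang"):
            slot = "xmlns"
            parts[slot] = []
        elif ch == QUOTE:
            raise ValueError("unescaped quote inside literal")
        else:
            parts[slot].append(ch)
        i += 1
    return tuple(None if parts[k] is None else ''.join(parts[k])
                 for k in ("value", "lang", "xmlns"))
```

Whatever characters we pick, every one of them has to be escaped inside
all three strings, which is where most of the arguing in 10(a) and
10(b) would go.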

As an example, here's an XML document with embedded RDF, followed
by a possible n-triples representation of the RDF portion.

<?xml version="1.0" encoding="ISO-8859-1"?>
<m:article  xmlns="the XHTML namespace URI"
            xmlns:rdf="the RDF 1.0 namespace URI"
            xmlns:dc="the Dublin Core namespace URI"
            xmlns:prism="the PRISM namespace URI"
            xmlns:m="a magazine article message namespace"
            xml:lang="en-US">
 <rdf:RDF>
   <rdf:Description rdf:about="">
     <dc:title rdf:parseType="Literal"><i>CRN</i> Interview: Ellen Hancock, Exodus Communications</dc:title>
     <dc:subject rdf:resource="http://example.org/subject_codes/networks"/>
     <prism:releaseTime>2001-10-12</prism:releaseTime>
   </rdf:Description>
 </rdf:RDF>
 <body>
   <m:headline>Interview: Ellen Hancock, Exodus Communications</m:headline>
   <p>If this were a real story, there would be lots of stuff here.</p>
   <p>Some of that stuff would include pithy quotes from Ms. Hancock,
      such as <quote prism:speaker="Ellen Hancock">Like Mark Twain said,
      <quote prism:speaker="Samuel Clemens">It's better to keep one's
      mouth shut and appear a fool, than to open it and remove all doubt</quote>.
      Too bad that Ron Daniel guy doesn't follow that advice</quote>.</p>
   <p>But it's not a real story, so there isn't.</p>
 </body>
</m:article>

Assuming the file was called hancock.article, the n-triples might look
like the following (modulo the use of QNames instead of full URIs because
I'm lazy and think full URIs are hard to read and harder to type):


<hancock.article> <dc:title> _:lit1.
<hancock.article> <dc:subject> <http://example.org/subject_codes/networks>.
<hancock.article> <prism:releaseTime> _:lit2.

_:lit2 <rdf:value> "2001-10-12".

_:lit1 <rdf:value> "<i>CRN</i> Interview: Ellen Hancock, Exodus Communications".
_:lit1 <xml:lang> "en-US".
_:lit1 <rdf:type> <rdf2:xmlLiteral>.
_:lit1 <rdf2:ns> _:gen3.

_:gen3 <rdf:type> <rdf:Bag>.
_:gen3 <rdf:_1> "xmlns=\"the XHTML namespace URI\"".
_:gen3 <rdf:_2> "xmlns:rdf=\"the RDF 1.0 namespace URI\"".
_:gen3 <rdf:_3> "xmlns:dc=\"the Dublin Core namespace URI\"".
_:gen3 <rdf:_4> "xmlns:prism=\"the PRISM namespace URI\"".
_:gen3 <rdf:_5> "xmlns:m=\"a magazine article message namespace\"".

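Statements of this shape could be generated mechanically. The following
is a minimal Python sketch, assuming the QName shorthand used above and
the tentative rdf2:xmlLiteral and rdf2:ns names; the function names and
the way node identifiers are passed in are my own invention for
illustration.

```python
def nt_string(text):
    """Quote a string for the object field, escaping backslash and quote."""
    return '"' + text.replace('\\', '\\\\').replace('"', '\\"') + '"'


def literal_node_triples(node, value, lang=None, xml_literal=False,
                         namespaces=(), bag="_:gen1"):
    """Return the statements describing one literal occurrence.

    node       -- identifier of the literal's node, e.g. "_:lit1"
    namespaces -- (prefix, uri) pairs in effect; prefix "" is the default
    bag        -- identifier to use for the rdf:Bag of namespace bindings
    """
    lines = [f"{node} <rdf:value> {nt_string(value)}."]
    if lang:
        lines.append(f"{node} <xml:lang> {nt_string(lang)}.")
    if xml_literal:
        # Reflects rdf:parseType="Literal" in the model (point 5).
        lines.append(f"{node} <rdf:type> <rdf2:xmlLiteral>.")
    if namespaces:
        # Namespace bindings become members of a Bag (point 6).
        lines.append(f"{node} <rdf2:ns> {bag}.")
        lines.append(f"{bag} <rdf:type> <rdf:Bag>.")
        for n, (prefix, uri) in enumerate(namespaces, start=1):
            attr = f'xmlns:{prefix}="{uri}"' if prefix else f'xmlns="{uri}"'
            lines.append(f"{bag} <rdf:_{n}> {nt_string(attr)}.")
    return lines


if __name__ == "__main__":
    for line in literal_node_triples(
            "_:lit1",
            "<i>CRN</i> Interview: Ellen Hancock, Exodus Communications",
            lang="en-US", xml_literal=True,
            namespaces=[("", "the XHTML namespace URI")], bag="_:gen3"):
        print(line)
```
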
(Not sure what to do about the character encoding. I assume
that we don't specify it, requiring instead that all Unicode
strings in an n-triples file are carried in some mandatory
encoding.)

Note that the generated identifiers should distinguish between
the IDs of the nodes for literal strings and the IDs for generic
anonymous nodes which happen to contain an rdf:value. Otherwise
we won't be able to round-trip things like:

  <dc:creator>John Smith</dc:creator>
  <dc:subject rdf:parseType="Resource">
    <rdf:value>Dogs</rdf:value>
  </dc:subject>
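
To make the point concrete, here is a minimal Python sketch of a
serializer that follows step 9 and relies on the identifier to choose
between the two forms. The "_:lit" prefix convention and the function
name are assumptions for illustration only; any reliable way of marking
literal nodes would do.

```python
from xml.sax.saxutils import escape, quoteattr


def render_property(prop, node_id, props):
    """Render one property element in 1.0 syntax.

    props maps property QNames to string values for the object node.
    Nodes whose identifier starts with "_:lit" are treated as literals.
    """
    if node_id.startswith("_:lit"):
        value = props.get("rdf:value", "")
        attrs = ""
        if "xml:lang" in props:
            # Step 9(a): xml:lang becomes an attribute.
            attrs += f" xml:lang={quoteattr(props['xml:lang'])}"
        if props.get("rdf:type") == "rdf2:xmlLiteral":
            # Step 9(b): the type becomes rdf:parseType="Literal".
            attrs += ' rdf:parseType="Literal"'
            body = value  # already XML markup, emitted as-is
        else:
            body = escape(value)
        # Step 9(c), the xmlns attributes, belongs on an enclosing
        # element and is left out here. Step 9(d): any other property
        # of the literal is not rendered.
        return f"<{prop}{attrs}>{body}</{prop}>"
    # A generic anonymous node keeps all of its properties.
    inner = "".join(f"<{p}>{escape(v)}</{p}>" for p, v in props.items())
    return f'<{prop} rdf:parseType="Resource">{inner}</{prop}>'
```

Both nodes in the example carry nothing but an rdf:value, so without
the identifier (or some equivalent marker) the serializer has no way to
tell which of the two forms it started from.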


Ron Daniel Jr.
Standards Architect
Tel: +1 415 778 3113
Fax: +1 415 778 3131
Email: rdaniel@interwoven.com 

Received on Friday, 12 October 2001 16:56:19 UTC