Re: after-hours conversation (#literal-as-resources #literal-is-xml-structure #xmllang #graph #identity-anon-resources #literal-subjects) from Pat Hayes on 2001-10-13 (w3c-rdfcore-wg@w3.org from October 2001)

From: Pat Hayes <phayes@ai.uwf.edu>
Date: Fri, 12 Oct 2001 21:16:15 -0500
To: "Ron Daniel" <rdaniel@interwoven.com>
Cc: w3c-rdfcore-wg@w3.org
Message-Id: <p05101018b7ed49aa576f@[205.160.76.193]>
>A few of us remained on the call after the official close
>of the 2001-10-12 teleconference. We had some more discussion
>of parseType="Literal" and other non-controversial topics.
>I thought the rest of the group might be interested in one
>suggestion that was made for handling literals (all
>literals, not just XML literals). I do not recall any
>objections being made to this proposal.

I may not have fully understood it at the time :-)

>It was:
>
>1) With respect to literals, let's define a solution based on where
>    we want to be with RDF 2, then treat RDF 1 M&S as a special case,
>    or simple mapping, down from that.
>
>2) In RDF 2, let each occurrence of a literal be a prince/b/whatever_node,
>    identified in whatever way we decide to handle the things we used to
>    call anonymous resources.

I don't like that. I want to label the node with the literal itself, 
just like we do now, not be forced into writing two nodes where one 
will do.

>3) That node will have an rdf:value property whose value is the
>    literal's character string.
>    (A corollary might be that rdf:value properties are the only ones
>     that actually have character strings as values. That would be
>     conceptually cleanest. However, it might be a pain in practice as
>     things like xml:lang and some other things might be best served
>     with immediate string values. TBD.)
>
>4) The xml:lang is another property of that node.

So for example where we would now write

aaa example:property lll .

you want us to write

aaa example:property _:1 .
_:1 rdf:value lll .

and maybe also

_:1 xml:lang "fr" .

Nope, I object. Clunky and unnecessary.  I want to be able to just write

aaa example:property lll .

or, if I really need to, in a slight extension of Ntriples:

aaa example:property _:1:lll .

_:1 xml:lang "fr" .

"_:1:lll", by the way, is a node, identified in an Ntriples++ document 
with a nodeID (formerly called a bNode :-) which is *not* blank, but 
is labelled with a literal.
That Ntriples++ document indicates an RDF graph with three nodes 
labelled aaa, lll, and "fr", and with two edges labelled with 
example:property and xml:lang.

Notice that

aaa example:property lll .

and

aaa example:property _:1:lll .

are the *same* graph. The node IDs don't occur in the graph, they are 
only used by Ntriples++  to construct a graph from a lexicalization, 
in just the way that Ntriples does; the rule is: make a graph from 
the triples in which URI nodes are merged and any two nodes with the 
same nodeID are merged, but no other merges are done, then throw away 
the nodeIDs. If that would give a node more than one label, the 
Ntriples++ parser barfs.
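
To make that rule concrete, here is a minimal sketch in Python. It is 
only an illustration under some assumptions: Term and build_graph are 
made-up names, not part of any Ntriples spec, and the terms are handed 
in already parsed rather than read from Ntriples++ text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Term:
    kind: str                       # "uri", "literal", or "node" (hypothetical tags)
    label: Optional[str] = None     # URI or literal text; None for a blank node
    node_id: Optional[str] = None   # the "1" in _:1 or _:1:lll; never stored in the graph


def build_graph(triples):
    """Return (labels, edges).

    labels[i] is the label of node i (None if the node is blank);
    edges is a list of (subject_index, property, object_index).
    """
    labels = []   # one entry per graph node
    index = {}    # merge key -> node index
    edges = []

    def node_for(term):
        if term.kind == "uri":
            key = ("uri", term.label)      # URI nodes are merged
        elif term.node_id is not None:
            key = ("id", term.node_id)     # same nodeID -> same node
        else:
            key = None                     # bare literal occurrence: never merged
        if key is not None and key in index:
            i = index[key]
            if term.label is not None:
                if labels[i] is None:
                    labels[i] = term.label
                elif labels[i] != term.label:
                    # a node may carry at most one label: the parser "barfs"
                    raise ValueError(
                        f"node _:{term.node_id} has two labels: "
                        f"{labels[i]!r} and {term.label!r}"
                    )
            return i
        labels.append(term.label)
        i = len(labels) - 1
        if key is not None:
            index[key] = i
        return i

    for s, p, o in triples:
        edges.append((node_for(s), p, node_for(o)))
    return labels, edges   # the nodeIDs have been thrown away at this point


aaa = Term("uri", "aaa")
plain = build_graph([(aaa, "example:property", Term("literal", "lll"))])
tagged = build_graph([(aaa, "example:property", Term("node", "lll", "1"))])
```

In this sketch plain and tagged come out equal, since the nodeID is 
used only while building the graph and is not recorded in the result.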

>5) If the literal had rdf:parseType="Literal", this will be reflected
>    in the model by giving that node an rdf:type property with an
>    appropriate value, perhaps rdf2:xmlLiteral.
>
>6) The namespace bindings in effect for XML literals will appear
>    as another property (or set of properties) of that node.

I like Peter P-S's idea better, to allow them to be inferred from 
range assertions involving namespace IDs. In fact, I like having both 
options, which we could have with the new MT extension :-)

>7) This mechanism will allow 'Literals as subjects' in RDF 2.

That can be done independently of this. However, having literals as 
subjects will require some way to identify different occurrences of 
the same literal in N-triples (as in Ntriples++, e.g.)

>8) Literals as subjects are not part of the RDF 1 M&S.

Another good argument for using range information to specify datatypes.

>
>9) The 2.0 model can be rendered in 1.0 syntax by:
>    a) Rendering the xml:lang property as an attribute on an element at
>       an appropriate scoping level
>    b) Rendering the rdf:type property whose value is "rdf2:xmlLiteral" as
>       an rdf:parseType attribute with the value "Literal".
>    c) Rendering the xmlns properties as attributes at the appropriate
>       scoping level (which probably means 'as high as possible').
>    d) Not rendering any other properties of the literal.
>       (This means that a 2.0 model cannot be round-tripped through the
>        1.0 syntax. That is OK. A 1.0 syntax will still be
>        round-trippable through the 2.0 model.)
>
>10) We still have the question of how to express the language and parse
>     type in the 1.0 model (i.e. n-triples). We have at least the following
>     choices:
>    a) Leave n-triples with three fields. Literals can only appear in
>       the third, object, field. Literals follow some grammar like:
>          Literal := QUOTE literal_string (DELIM1 lang_string)?
>                       (DELIM2 xmlns_string)? UNQUOTE
>       and we argue for awhile over the characters we actually use for
>       the terminals QUOTE, DELIM1, DELIM2, and UNQUOTE.
>    b) Let statements in an n-triples document which have literal values
>       contain more than 3 fields (which, to me, seems no different
>       than (a) since we still have to argue over how things will be
>       delimited).
>    c) Say that the 1.0 model was never defined clearly, and just start with
>       the 2.0 model, letting the 1.0 syntax be the thing that requires
>       various restrictions. (In other words, the n-triples representation
>       would use the p/b/anon_nodes to carry xml:lang, rdf:type, and
>       namespace properties as separate statements.)
>    d) something else?
>
>Currently, I'd be OK with 10(c).
>
>As an example, here's an XML document with embedded RDF, followed
>by a possible n-triples representation of the RDF portion.
>
><?xml version="1.0" encoding="ISO-8859-1"?>
><m:article  xmlns="the XHTML namespace URI"
>             xmlns:rdf="the RDF 1.0 URI"
>             xmlns:dc="the Dublin Core namespace URI"
>             xmlns:prism="the PRISM namespace URI"
>             xmlns:m="a magazine article message namespace"
>             xml:lang="en-US">
>  <rdf:RDF>
>    <rdf:Description rdf:about="">
>      <dc:title rdf:parseType="Literal"><i>CRN</i> Interview: Ellen Hancock, Exodus Communications</dc:title>
>      <dc:subject rdf:resource="http://example.org/subject_codes/networks"/>
>      <prism:releaseTime>2001-10-12</prism:releaseTime>
>    </rdf:Description>
>  </rdf:RDF>
>  <body>
>    <m:headline>Interview: Ellen Hancock, Exodus Communications</m:headline>
>    <p>If this were a real story, there would be lots of stuff here.</p>
>    <p>Some of that stuff would include pithy quotes from Ms. Hancock,
>       such as <quote prism:speaker="Ellen Hancock">Like Mark Twain said,
>       <quote prism:speaker="Samuel Clemens">It's better to keep one's
>       mouth shut and appear a fool, than to open it and remove all doubt</quote>.
>       Too bad that Ron Daniel guy doesn't follow that advice</quote>.</p>
>    <p>But it's not a real story, so there isn't.</p>
>  </body>
></m:article>
>
>Assuming the file was called hancock.article, the n-triples might look
>like the following (modulo the use of QNames instead of full URIs because
>I'm lazy and think full URIs are hard to read and harder to type):
>
>
><hancock.article> <dc:title> _:lit1.
><hancock.article> <dc:subject> <http://example.org/subject_codes/networks>.
><hancock.article> <prism:releaseTime> _:lit2.
>
>_:lit2 <rdf:value> "2001-10-12".
>
>_:lit1 <rdf:value> "<i>CRN</i> Interview: Ellen Hancock, Exodus Communications".
>_:lit1 <xml:lang> "en-US".
>_:lit1 <rdf:type> <rdf2:xmlLiteral>.
>_:lit1 <rdf2:ns> _:gen3.
>
>_:gen3 <rdf:type> <rdf:Bag>.
>_:gen3 <rdf:_1> "xmlns=\"the XHTML namespace URI\"".
>_:gen3 <rdf:_2> "xmlns:rdf=\"the RDF 1.0 namespace URI\"".
>_:gen3 <rdf:_3> "xmlns:dc=\"the Dublin Core namespace URI\"".
>_:gen3 <rdf:_4> "xmlns:prism=\"the PRISM namespace URI\"".

This all seems like a lot of bother. If literals are just strings, it 
ought to be easier than this. In RDF2 we will want to have at least 
full XML datatyping for literals; DAML+OIL already has something 
close.

Why is 2001-10-12 a character string, and not a date? I'd rather have:

<hancock.article> <prism:releaseTime>  "2001-10-12" .
<prism:releaseTime> rdfs:range xsdd:YMDDate .

or some such, and be able then to infer that this particular 
"2001-10-12" is a date rather than, say, an integer in some weird 
notation, or a string.
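
As a rough sketch of that inference in Python (illustration only: 
infer_literal_types is a made-up name, triples are plain tuples of 
strings, and a term counts as a literal here simply because it starts 
with a double quote):

```python
RDFS_RANGE = "rdfs:range"


def is_literal(term):
    # Assumption for this sketch: literals are written with double quotes.
    return term.startswith('"')


def infer_literal_types(triples):
    """Map each (subject, property, literal) triple to the datatype named
    by an rdfs:range assertion on its property, when there is one."""
    # property -> declared range
    ranges = {s: o for s, p, o in triples if p == RDFS_RANGE}
    typed = {}
    for s, p, o in triples:
        if p != RDFS_RANGE and is_literal(o) and p in ranges:
            typed[(s, p, o)] = ranges[p]
    return typed


triples = [
    ("<hancock.article>", "<prism:releaseTime>", '"2001-10-12"'),
    ("<prism:releaseTime>", RDFS_RANGE, "xsdd:YMDDate"),
]
inferred = infer_literal_types(triples)
```

Here inferred pairs the releaseTime triple with xsdd:YMDDate; a literal 
whose property has no range assertion is simply left untyped.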

Pat Hayes
-- 
---------------------------------------------------------------------
IHMC					(850)434 8903   home
40 South Alcaniz St.			(850)202 4416   office
Pensacola,  FL 32501			(850)202 4440   fax
phayes@ai.uwf.edu 
http://www.coginst.uwf.edu/~phayes
Received on Friday, 12 October 2001 22:16:31 UTC