Datatyping, reification, syntactic tidyness

Proposal:

The RDF specification explicits says that implementations of the RDF graph
may represent literal nodes with the same label as a single node or as
multiple nodes; and that nothing in the specs allow these different
implementations to be distinguished. Hence, an operation like:

  RDFGraph.countLiteralNodes()

cannot be defined in a way that conforms with our recommendation.

========================


Consider

<rdf:Description rdf:bagID="Reify">
  <eg:p1 rdf:datatype="&xsd;int">10</eg:p1>
  <eg:p2 rdf:datatype="&xsd;int">10</eg:p2>
  <eg:p3 >10</eg:p1>
  <eg:p4 >10</eg:p2>
</rdf:Description>

This creates a graph with:

 four initial triples
 sixteen triples reifying those four triples
 five triples forming the bag

This message is about:
- how many Literal nodes are there?
- do we care?

My preference is to be able to systematically say we do not care.

There are at least two literal nodes, one labelled with an int 10, the other
labelled with a RDF String Literal "10". Since these labels are different
the nodes must be different.

Of the twentyfive triples in the graph eight have literal objects, thus
there are at most eight literal nodes.

A syntactically tidy implementation would stop at two nodes.

A thorough untidy one would have eight nodes.
Some would argue that the object of the rdf:object triple in the reification
is the same node as the object of the original triple. Thus an
implementation following this rationale would get four literals.

Of course, sensible implementations could choose to treat datatyped literals
tidyly and RDF String Literals untidyly (or vice versa) which suggests that
maybe six is also a plausible number of literals.


If in fact, our normative  serialization  of the graph does not allow us to
distinguish these cases then we do not need to, and in fact, SHOULD NOT say
either way.

The model theory needs to reflect this inability to represent the two
different cases and not depend on some hidden node identity that we cannot
serialize (this only rules out certain types of untidiness in the model
theory).

In fact, we should explicitly say that we are not saying, and that this is
deliberately underspecified, since nothing depends on it.

I believe that these two RDF/XML documents are entirely equivalent:

<rdf:Description rdf:bagID="Reify">
  <eg:p1 rdf:datatype="&xsd;int">10</eg:p1>
</rdf:Description>


<rdf:Description rdf:nodeID="subj">
  <eg:p1 rdf:datatype="&xsd;int">10</eg:p1>
</rdf:Description>
<rdf:Bag rdf:ID="Reify">
  <rdf:li>
   <rdf:Statement>
      <rdf:subject rdf:nodeID="subj"/>
      <rdf:predicate rdf:resource="&eg;p1/>
      <rdf:object rdf:datatype="&xsd;int">10</rdf:object>
    </rdf:Statement>
  </rdf:li>
</rdf:Bag>

And I buy Guha's point at the Bristol F2F that with untidy literal semantics
rdf:object refers to the syntax of the triple not its semantics.

Jeremy

Received on Wednesday, 11 September 2002 05:27:05 UTC