Re: rdfms-graph: Food for thought from pat hayes on 2001-07-24 (w3c-rdfcore-wg@w3.org from July 2001)

From: pat hayes <phayes@ai.uwf.edu>
Date: Mon, 23 Jul 2001 18:17:02 -0700
To: Graham Klyne <Graham.Klyne@Baltimore.com>
Cc: w3c-rdfcore-wg@w3.org
Message-Id: <v04210110b782788e96ea@[130.107.66.237]>
>At 02:15 PM 7/12/01 -0700, pat hayes wrote:
>>>At 01:01 PM 7/11/01 -0500, Aaron Swartz wrote:
>>>>Here's some questions (with proposed answers) to get us thinking 
>>>>about rdfms-graph. I'm also curious whether there are other 
>>>>questions that should be considered part of the resolution -- the 
>>>>issue description isn't really enough for me to tell.
>>>>
>>>>1) Does an RDF graph have a URI?
>>>>
>>>>It is a Resource, and it can. M&S does not define a specific one.
>>>
>>>I agree.  I'm not sure anything more needs to be said.  Any such 
>>>URI is not part of the abstract syntax/model.
>>
>>I agree except with the last sentence. The abstract syntax does not 
>>exclude URIs that may happen to denote an RDF graph, so we need to 
>>say what it means when they do occur. I think we can simply be 
>>agnostic about it.
>
>I think I was a bit sloppy there -- I meant to say they don't have a 
>distinguished or required place in the abstract syntax -- which I 
>think is consistent with what you say.

Right, I think we agree on this.

>
>>>>2) Is an RDF graph a set or a bag?
>>>>
>>>>A set, as it has conjunctive assertion semantics, or whatever 
>>>>they're called:
>>>>     (A && A) => (A)
>>>
>>>Again, I agree.  I'm not sure anything more needs to be said.
>>
>>The fact that it has this semantics doesn't mean that it is a set. 
>>Logical expressions are not sets, but (and A A ) still implies A 
>>and vice versa. The issue is whether we want to say that an RDF 
>>graph cannot contain two copies of the same triple, not what we 
>>interpret those triples to be saying. I would urge that it would be 
>>harmless to let them be bags, and insisting that they are sets 
>>places an unnecessary burden on a parser (which would need to 
>>remove all duplications whenever it merged two graphs), so let them 
>>be bags.
>
>Hmmm... I think this is a design choice that could go either way, 
>each of which puts some kind of constraint on the parser:
>
>Bag - requires that the parser accurately create a separate copy of 
>each triple defined by the input;  this might be a burden for some 
>implementations -- e.g. a relational database that uses the 3 triple 
>components as the primary key in a table.
>
>Set - requires that the parser detects duplicates, as you say.

It's not so much the burden on a parser that worries me, but whether 
saying that a syntax (even an abstract syntax) consists of a *set* of 
expressions even makes sense. RDF is a language, after all, and it is 
even a language which is supposed to be adapted to the situation 
where content written in the language might be found in many sources 
scattered (literally) around the planet. So what if a triple occurs 
on a web page in China and the same triple occurs on another page in 
Oklahoma? Are they the same triple or not? It seems clear that they 
are not. So if an RDF document is a set of triples, does that mean 
that any well-formed RDF document cannot contain both of them? (Or is 
it just wrongheaded to even talk about a triple - as opposed to a 
lexicalisation of a triple - being on a web page? That would make 
sense, but then I don't see the utility of the entire RDF graph 
model, since it isn't needed and it seems to be at odds with the 
basic philosophy of RDF.)

Linguistics has long used a distinction between an expression (e.g. a 
word) and a token of that expression. It makes sense to say that 
there is a single English word 'beauty', say, but it would be 
foolhardy to define a syntax in which there could only be one *token* 
of that word, since as soon as you wrote it twice you would be 
breaking the rules. It makes sense to talk about sets of expressions 
when those are Platonic abstractions, but not when they are physical 
tokens. I am still unsure what the RDF M&S authors had in mind when 
they said that an RDF graph was a set of triples. If the graph is a 
mathematical abstraction, then that makes sense; but if it is 
anything like a syntax, then it makes a lot less sense.
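
To make the expression/token distinction concrete, here is a minimal 
sketch in Python. The names Triple and Token and the two page URLs are 
made up for illustration; nothing like them appears in M&S, and this is 
only one way the distinction might be modelled.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """An abstract expression: equal whenever its three parts are equal."""
    subject: str
    predicate: str
    obj: str


class Token(NamedTuple):
    """One occurrence of a triple at a particular (hypothetical) source."""
    triple: Triple
    source: str


t = Triple("http://example.org/#Subject", "http://example.org/#property", "Value")

# The same expression written down on two different pages.
tokens = [
    Token(t, "http://example.org/china-page"),     # hypothetical page in China
    Token(t, "http://example.org/oklahoma-page"),  # hypothetical page in Oklahoma
]

# Collapsing tokens to their expressions discards where each one occurred.
expressions = {tok.triple for tok in tokens}

print(len(tokens))       # number of tokens
print(len(expressions))  # number of distinct expressions
```

The two tokens compare unequal because their sources differ, while the 
expression they carry is one and the same. A set of expressions is 
unproblematic; a "set of tokens" that forbids the second page is not.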

>In defining the abstract syntax, I think we are trying to capture 
>something of the *intent* (read "meaning") of the RDF language used, 
>rather than to prescribe a particular implementation.  Calling an 
>RDF graph a "set" seems to more closely reflect the intent.
>
>One might say it doesn't matter which it is (set or bag), and in a 
>sense that would be true.  But then we lose a small practical 
>benefit of the approach this group has been following, viz. to 
>prescribe a collection of triples that correspond to any RDF graph.

We can have this either way. If the graph allows duplicate arcs then 
the collection is a bag; if not, it is a set. The bag interpretation 
makes the correspondence between graphs and documents 1:1, which 
seems to me to be an advantage.
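
As a rough illustration of the two readings, the following Python 
sketch merges two small collections of triples both ways. The triples 
and the names doc_a and doc_b are invented for the example, and using 
collections.Counter as the bag is just a convenient assumption, not a 
proposal for how an RDF store ought to work.

```python
from collections import Counter

# Triples as plain (subject, predicate, object) tuples; all values are made up.
doc_a = [
    ("ex:Subject", "ex:property", "Value"),
    ("ex:Subject", "ex:other", "Thing"),
]
doc_b = [
    ("ex:Subject", "ex:property", "Value"),  # repeats a statement from doc_a
]

# Bag reading: merging adds up occurrences, so every written statement survives.
bag = Counter(doc_a) + Counter(doc_b)

# Set reading: merging is union, so the repeated statement collapses.
as_set = set(doc_a) | set(doc_b)

print(sum(bag.values()))   # 3 statements were written, 3 are kept
print(len(as_set))         # 2 distinct statements
print(set(bag) == as_set)  # True: both readings assert the same things
```

Either way the same things are asserted; only the bag still has one 
entry per statement written in the documents.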

>In defining and formalizing N-triples, I think the Bag approach of 
>not forbidding duplicates is right, because that means the semantics 
>is explicitly defined by the formalism.  In defining the mapping 
>from RDF/XML to N-triples, the choice is less clear (to me).

OK, I agree it isn't *clear* :-)

>So I'll offer a test case:
>
><rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>         xmlns:ex="http://example.org/#">
>  <rdf:Description about="http://example.org/#Subject" >
>    <ex:property>Value</ex:property>
>  </rdf:Description>
>  <rdf:Description about="http://example.org/#Subject" >
>    <ex:property>Value</ex:property>
>  </rdf:Description>
></rdf:RDF>
>
>For which, SiRPAC says the triples are:
>
><http://example.org/#Subject> <http://example.org/#property> "Value" .
><http://example.org/#Subject> <http://example.org/#property> "Value" .
>
>Similarly, the graph displayed has two identical arcs.
>
>This seems to be at odds with RDFM&S, section 5, (P162) which says 
>"There is a set called Statements, each element of which is a triple 
>of the form ...":
>
>http://lists.w3.org/Archives/Public/www-archive/2001Jun/att-0021/00-part#162

Yes, but see above. I have never understood exactly what this is 
supposed to mean. If those statements are merely abstractions, then 
what does it mean to say that they occur in a document? If they are 
anything like statements (lower-case), then one would expect a 
document to be made of tokens rather than abstractions.
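
For what it is worth, the difference shows up as soon as one reads the 
two reported lines into a program. The Python sketch below is only an 
illustration: the regular expression handles nothing beyond the simple 
<uri> <uri> "literal" . shape of this test case, the parse function is 
hypothetical, and the URIs are spelled out in full with the http:// 
scheme, which is my assumption about what was intended.

```python
import re

# Matches only the simple shape used here: <uri> <uri> "literal" .
LINE = re.compile(r'^<([^>]*)>\s+<([^>]*)>\s+"([^"]*)"\s*\.$')

ntriples = '''\
<http://example.org/#Subject> <http://example.org/#property> "Value" .
<http://example.org/#Subject> <http://example.org/#property> "Value" .
'''


def parse(text):
    """Return one (subject, predicate, object) tuple per non-blank line."""
    triples = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        m = LINE.match(line)
        if m is None:
            raise ValueError(f"unsupported line: {line!r}")
        triples.append(m.groups())
    return triples


tokens = parse(ntriples)   # what the document contains, line by line
statements = set(tokens)   # what M&S section 5 would call Statements

print(len(tokens), len(statements))  # 2 1
```

The list is made of occurrences in the document; the set is made of 
abstractions. Both are easy to compute, which is why I think the real 
question is which of them we mean by "the graph".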

>>>>3) Can a node exist in a graph without any properties?
>>>>
>>>>Yes. This is indicated in the current XML syntax with an empty 
>>>>Description element.
>>>
>>>Here, I disagree:  there is no obvious way to represent an 
>>>isolated node in an abstract syntax/model based on triples.  I 
>>>think an empty <Description> adds nothing to the semantics so 
>>>should not appear in the abstract syntax/model.
>>
>>As a matter of general methodology, the question to ask is whether 
>>allowing it would cause any harm. I can't see any harm, so would 
>>opt for not forbidding it.
>
>The "harm" I see is that (a) not forbidding would seem to suggest a 
>requirement to represent it, and (b) it's not clear to me how one 
>would represent (in N-triples) a node on its own.  Also, I think a 
>node not part of a property has no "meaning", so allowing this would 
>seem to introduce spurious alternatives for the same semantics.  It 
>seems simpler to just ignore isolated nodes at the abstract syntax 
>level.

OK, I now agree with you on this point and withdraw my earlier preference.
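
A small Python sketch of why the triple-based view cannot carry an 
isolated node. The structure with a separate node set, the function 
nodes_of, and the name ex:Lonely are all hypothetical, chosen only to 
show the effect.

```python
def nodes_of(triples):
    """Every node that is recoverable from a collection of triples."""
    found = set()
    for subject, _predicate, obj in triples:
        found.add(subject)
        found.add(obj)
    return found


# A hypothetical richer graph structure that stores nodes separately from arcs.
graph_nodes = {"ex:Subject", "Value", "ex:Lonely"}  # ex:Lonely has no arcs
graph_arcs = [("ex:Subject", "ex:property", "Value")]

# Writing the graph out as triples and reading it back loses the isolated node.
recovered = nodes_of(graph_arcs)
lost = graph_nodes - recovered

print(lost)  # {'ex:Lonely'}
```

Since nothing in the triples mentions the isolated node, two graphs 
that differ only in such nodes would have the same triples, which is 
the spurious alternative Graham describes.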

Pat Hayes

---------------------------------------------------------------------
(650)859 6569 w
(650)494 3973 h (until September)
phayes@ai.uwf.edu 
http://www.coginst.uwf.edu/~phayes
Received on Monday, 23 July 2001 21:16:56 UTC