Re: RDF-ISSUE-25 (Deprecate Reification): Should we deprecate (RDF 2004) reification? [Cleanup tasks] from Richard Cyganiak on 2011-04-08 (public-rdf-wg@w3.org from April 2011)

From: Richard Cyganiak <richard@cyganiak.de>
Date: Fri, 8 Apr 2011 20:12:24 +0100
To: Sandro Hawke <sandro@w3.org>
Cc: Ivan Herman <ivan@w3.org>, Eric Prud'hommeaux <eric@w3.org>, David Wood <dpw@talis.com>, RDF Working Group WG <public-rdf-wg@w3.org>
Message-Id: <951D5E0B-7C07-4934-B992-E316F6339A3A@cyganiak.de>

Sandro,

I am sorry but I will not respond to the substance of this message.

There was a survey whose results you know well. A charter was written up. It says that the WG must standardize the multigraph stuff, and that it must deprecate (whatever that means) reification.

Now you are suggesting that reification could be the way to address the multigraph stuff. I find this perverse and strongly object to the proposal. I think discussing it at all is a waste of working group resources.

Richard


On 8 Apr 2011, at 19:04, Sandro Hawke wrote:

> On Fri, 2011-04-08 at 11:02 +0100, Richard Cyganiak wrote:
>> On 8 Apr 2011, at 05:42, Sandro Hawke wrote:
>>> <u1> { <a> <b> 1, 2 }
>>> <u2> { <a> <c> 3, 4 }
>>> 
>>> would be:
>>> 
>>> <u1> eg:hasTriple [ rdf:subject <a>; rdf:predicate <b>; rdf:object 1 ],
>>>                 [ rdf:subject <a>; rdf:predicate <b>; rdf:object 2 ].
>>> 
>>> <u2> eg:hasTriple [ rdf:subject <a>; rdf:predicate <c>; rdf:object 3 ],
>>>                 [ rdf:subject <a>; rdf:predicate <c>; rdf:object 4 ].
>>> 
>>> So, why do SPARQL folks prefer TriG and N-Quads to these forms?  I don't
>>> know.    
>> 
>> The second is about five times more verbose. 
> 
> I'm not sure what you mean by "the second", but yes, these forms *feel*
> verbose.    But are they?
> 
> Compared to TriG, these forms use about 10 bytes more per graph and
> about 45 bytes more per triple.   I expect when gzip'd those numbers
> would drop to about 2 bytes and 5 bytes.   Probably not a big deal.
> 
> If you're storing the reified graphs in a g-box, then yes, you have a
> ~5x expansion of the original graphs, but is that a fair comparison?
> With the other approaches, you CAN'T store the result in a g-box.  So
> this is comparing apples to ... empty space.
> 
>> It is unsuitable for hand-writing.
> 
> Agreed, much like N-Triples or N-Quads.   Or RDFa or RDF/XML. 
> 
>> To be even remotely readable and efficiently processable, it relies on something that is not significant in RDF: order of statements.
> 
> I don't agree with your characterization here.   This only depends on
> the ordering of statements the way lots of RDF data which serializes
> objects does -- if the parser hands you the triples with good locality,
> the data can be streamed; otherwise it must be buffered or queried.
> 
>> It is brittle because it raises the question of what to do with incomplete reified triples. 
> 
> It seems to me this is just like with any other error or omission in
> the inputs.  You can ignore it, ask a human for help, or whatever, as
> appropriate to your situation.    The semantics are clear enough: some
> graph has some triples, but you haven't been told exactly what they are.
> 
>> Its verbosity explodes exponentially when one wants to say that Alice said that Bob said that Charlie said something.
> 
> I don't think so.   
> 
> How would you do this in TriG, which also doesn't support nesting?
> Something like this:
> 
> <Alice> said <AlicesClaim>.
> <AlicesClaim> { <Bob> said <BobsClaim> }
> <BobsClaim>  {<Charlie> said <CharliesClaim> } 
> 
> That structure translates exactly to what I'm describing above, with
> just the constant-factor expansion.
> 
>>> If you put that into N-Triples and sort it by predicate, performing the import is going to
>>> require holding the entire structure in memory.  But a valid response might be, "don't do that".
>> 
>> "Don't do that" is not a practical response. The order of statements is not significant in RDF, and not maintained by many systems.
> 
> I disagree that it's not a practical response.   The point here is that
> if you want to convey a multi-graph dataset in a format which allows it
> to be parsed with constant memory and linear time, you will need to pay
> attention to the order in which triples occur in that serialization.
> And what you have to do is exactly what you have to do in serializing
> Turtle to avoid extraneous node IDs -- using [] and () whenever possible
> -- it's not a new or special or obscure thing.
> 
> This approach has the feature that you *can* store your multi-graph
> dataset in a single graph, but that doesn't mean you ever have to do
> that.  Just as you parse TriG with a special parser and load the data
> into a quad-store, if you're doing serious multi-graph work, you would
> presumably keep using a special parsing system and special store, for
> efficiency, even though with this approach you don't really have to.
> 
>    -- Sandro (who can't believe he's defending RDF reification...)
> 
>
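
For reference, the expansion Sandro describes above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not anything proposed in the thread: eg:hasTriple is the placeholder predicate from his example, while the blank-node labels (_:t1, ...), the function names reify and unreify, and the use of plain strings instead of parsed RDF terms are all made up here.

```python
from collections import defaultdict
from itertools import count

PARTS = ("rdf:subject", "rdf:predicate", "rdf:object")

def reify(quads):
    """Expand (graph, s, p, o) quads into plain triples via eg:hasTriple."""
    ids = count(1)
    triples = []
    for g, s, p, o in quads:
        node = f"_:t{next(ids)}"  # hypothetical blank-node label
        triples.append((g, "eg:hasTriple", node))
        triples.append((node, "rdf:subject", s))
        triples.append((node, "rdf:predicate", p))
        triples.append((node, "rdf:object", o))
    return triples

def unreify(triples):
    """Rebuild quads from reified triples given in any order.

    Everything is buffered, because statement order is not significant.
    Incomplete reified triples are skipped (one possible policy).
    """
    owner = {}                 # blank node -> graph name
    parts = defaultdict(dict)  # blank node -> {predicate: value}
    for s, p, o in triples:
        if p == "eg:hasTriple":
            owner[o] = s
        elif p in PARTS:
            parts[s][p] = o
    quads = set()
    for node, g in owner.items():
        part = parts.get(node, {})
        if len(part) == len(PARTS):
            quads.add((g, *(part[k] for k in PARTS)))
    return quads

# The two-graph example from the thread, with terms kept as strings.
quads = [
    ("<u1>", "<a>", "<b>", "1"),
    ("<u1>", "<a>", "<b>", "2"),
    ("<u2>", "<a>", "<c>", "3"),
    ("<u2>", "<a>", "<c>", "4"),
]
triples = reify(quads)

# Sorting by predicate destroys locality, but the data still round-trips.
by_predicate = sorted(triples, key=lambda t: t[1])
assert unreify(by_predicate) == set(quads)
```

In this encoding each original triple becomes four, and unreify has to hold the whole input in memory once locality is lost, which is the trade-off argued over above. Skipping incomplete reified triples is just one of the options mentioned; a real system might report them instead.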

Received on Friday, 8 April 2011 19:12:54 UTC