Re: Problem with auto-generated fragment IDs for graph names

* Pat Hayes <phayes@ihmc.us> [2013-02-13 23:16-0600]
> Manu, let me try to put the other case, in terms that approximate your self-confidence that you must be right. Obviously I am speaking here as an individual, not on behalf of the WG.
> 
> On Feb 13, 2013, at 9:24 PM, Manu Sporny wrote:
> 
> > On 02/13/2013 05:11 PM, Richard Cyganiak wrote:
> >> PROPOSAL: Put @id on all graphs.
> >> 
> >> Why the aversion against simple and obvious solutions?
> > 
> > The simple and obvious solution you propose is wrong for developers.
> 
> For all developers? That seems like a rather strong claim. 
> 
> > 
> > It attempts to side-step an arbitrary constraint imposed on developers
> > by RDF Concepts by making developers' lives harder. Worse, it ignores the
> > reality of transient messages, including transient RDF Datasets that
> > must be identified with document-local identifiers if the digital
> > signatures are going to work out.
> 
> Well, this is the first time I have heard of "transient RDF". RDF, as far as I have always understood, was never intended to be transient. It is intended for publishing data on the Web. So it sounds as though you are simply using it for a purpose for which it was not designed, and never intended to be used. Perhaps your problems may arise from this mismatch between the intentions of the designers and your planned use.

I'd characterize this more as "quoting RDF", which we've been wrestling with since the beginning. I'm motivated to fix this not because of an interest in JSON-LD or Web Payments, but because quoting is a universal need:
  Bob says "the moon is made of green cheese".

In the old days, the party line was that one uses reification for signing:
  _:statement1 dc:author "Bob" ;
               rdf:subject :TheMoon ;
               rdf:predicate :madeOf ;
	       rdf:object :greenCheese .

The analog in named graphs would be a bnode-labeled graph:
  _:statement1 dc:author "Bob" . _:statement1 { :TheMoon :madeOf :greenCheese } .

Except we've recently decided not to allow bnodes as graph labels, so:
  <statement1> dc:author "Bob" . <statement1> { :TheMoon :madeOf :greenCheese } .
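
To make the two shapes concrete, here is a minimal sketch in plain Python. It uses tuples of strings rather than a real RDF library, and the helper names reify and quote_as_graph are hypothetical, invented only for this illustration:

```python
# Terms are plain strings; this is an illustration, not an RDF library API.

def reify(statement_id, triple, author):
    """Describe `triple` with the reification vocabulary, plus dc:author."""
    s, p, o = triple
    return [
        (statement_id, "dc:author", f'"{author}"'),
        (statement_id, "rdf:subject", s),
        (statement_id, "rdf:predicate", p),
        (statement_id, "rdf:object", o),
    ]

def quote_as_graph(graph_label, triple, author):
    """Place `triple` in a graph named `graph_label` and attribute that
    graph to `author` with one triple in the default graph."""
    s, p, o = triple
    default_graph = [(graph_label, "dc:author", f'"{author}"')]
    named_graph = [(s, p, o, graph_label)]  # a quad: triple + graph label
    return default_graph, named_graph

claim = (":TheMoon", ":madeOf", ":greenCheese")
reified = reify("_:statement1", claim, "Bob")
default_graph, named_graph = quote_as_graph("<statement1>", claim, "Bob")
```

Either way the quoted claim needs some label to hang dc:author on; the only difference is whether that label is a bnode or an IRI.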


Normally, we shake a finger at someone who invents URLs that they don't intend to honor. Why is this case different?


> > Look at this from the standpoint of a Web Payments message. Something
> > that is completely transient, but needs to be digitally signed:
> > 
> > [{
> >  "@graph": {
> >    "source": "http://mybank.com/accounts/manu",
> >    "destination": "http://yourbank.com/accounts/richard",
> >    "amount": "5.00",
> >    "currency": "USD"
> >  }
> > },{
> >  "@graph": {
> >    "source": "http://mybank.com/accounts/manu",
> >    "destination": "http://yourbank.com/accounts/kingsley",
> >    "amount": "5.00",
> >    "currency": "USD"
> >  }
> > }]
> > 
> > You are stating that, instead of doing the thing above, we now have to
> > require all developers to generate identifiers for that dataset by
> > specifying an IRI for each graph:
> 
> But is this such an onerous condition? Applications autogenerate IRIs all the time. All that is *required* is that they satisfy the syntax requirements of the IRI spec, which are quite remarkably slack. There is no requirement that your generated IRIs have to be anything like as long as the ones in your examples, below. You were going to have to generate bnodeIDs in any case. Seems to me that the difference between generating "_:<somethingUnique>" and "baz:<somethingUnique>" (or for that matter "http://<somethingUnique>") is pretty unimportant. 
> 
> The other observation I would make is that if these things really are transient, and if they will be used only internally to your application system and will never be published on the Web, then the RDF and SPARQL normative conditions really do not apply. You can make up your own syntax and use it any way you want. Why are you even talking about the RDF specifications, when you are apparently not going to use this stuff in any context - that is, Web publication - where the RDF specifications apply?
> 
> > [{
> >  "@id": "http://payswarm.com/transients#graph-38234jlkfsj9834u",
> >  "@graph": {
> >    "source": "http://mybank.com/accounts/manu",
> >    "destination": "http://yourbank.com/accounts/richard",
> >    "amount": "5.00",
> >    "currency": "USD"
> >  }
> > },{
> >  "@id": "http://payswarm.com/transients#graph-38234jlkfsj9834u",
> >  "@graph": {
> >    "source": "http://mybank.com/accounts/manu",
> >    "destination": "http://yourbank.com/accounts/kingsley",
> >    "amount": "5.00",
> >    "currency": "USD"
> >  }
> > }]
> > 
> > Why make developers jump through hoops because of some deficiency in
> > RDF?
> 
> Why do you characterize it as a deficiency? It is not a deficiency to require named graphs to have names. What you want to do is something that does not make sense (apparently, anyway) in the context of RDF datasets. It is not a deficiency in a hammer that it cannot be easily used as a screwdriver. 
> 
> > They don't have to do this for JSON. What we're proposing is that
> > we can auto-generate the IDs to get around RDF's deficiency by using
> > "graph:" IRIs, but only when we HAVE to serialize down to another RDF
> > serialization format (like NQuads, which we have to do when doing the
> > RDF Graph Normalization stuff). So, JSON-LD developers can happily use
> > the first bit of markup and can remain completely unaware that graph
> > name identifiers are automatically created for them when they normalize
> > to the NQuad serialization format:
> 
> I like the idea of allowing JSON developers to stay with JSON. That seems to solve the problems quite nicely. Why do we need to discuss this any further?
> 
> > 
> > _:c14n1
> >  <http://example.com/vocab#source>
> >    <http://mybank.com/accounts/manu>
> >      <graph:1> .
> > _:c14n1
> >  <http://example.com/vocab#destination>
> >    <http://yourbank.com/accounts/richard>
> >      <graph:1> .
> > _:c14n1
> >  <http://example.com/vocab#amount>
> >    "5.00"
> >      <graph:1> .
> > _:c14n1
> >  <http://example.com/vocab#currency>
> >    "USD"
> >      <graph:1> .
> > _:c14n2
> >  <http://example.com/vocab#source>
> >    <http://mybank.com/accounts/manu>
> >      <graph:2> .
> > _:c14n2
> >  <http://example.com/vocab#destination>
> >    <http://yourbank.com/accounts/kingsley>
> >      <graph:2> .
> > _:c14n2
> >  <http://example.com/vocab#amount>
> >    "5.00"
> >      <graph:2> .
> > _:c14n2
> >  <http://example.com/vocab#currency>
> >    "USD"
> >      <graph:2> .
> > 
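
For readers following along, the auto-naming step described above can be sketched roughly as follows. This is only an illustration under loud assumptions: the vocabulary IRI, the "value starts with http means IRI" heuristic, and the counter-based bnode and graph labels are all simplifications of mine, not the JSON-LD or normalization algorithms.

```python
VOCAB = "http://example.com/vocab#"  # assumed vocabulary IRI

def term(value):
    """Render a value as an N-Quads IRI or plain literal (simplified)."""
    if value.startswith(("http://", "https://")):
        return f"<{value}>"
    # Only backslash and double quote are escaped in this sketch.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'

def to_nquads(dataset):
    """Emit one quad per property, inventing graph:N for unnamed graphs."""
    lines = []
    for n, entry in enumerate(dataset, start=1):
        graph_name = entry.get("@id") or f"graph:{n}"  # auto-generated name
        subject = f"_:c14n{n}"  # stand-in for a canonical bnode label
        for key, value in entry["@graph"].items():
            lines.append(
                f"{subject} <{VOCAB}{key}> {term(value)} <{graph_name}> ."
            )
    return lines

dataset = [
    {"@graph": {"source": "http://mybank.com/accounts/manu",
                "destination": "http://yourbank.com/accounts/richard",
                "amount": "5.00", "currency": "USD"}},
    {"@graph": {"source": "http://mybank.com/accounts/manu",
                "destination": "http://yourbank.com/accounts/kingsley",
                "amount": "5.00", "currency": "USD"}},
]
quads = to_nquads(dataset)
```

The author of the JSON never sees a graph name; one appears only at the point where the quads are written out.
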
> >> You seem to consistently choose the path of greatest resistance.
> > 
> > I consistently reject solutions that are anti-developer or anti-author. :)
> > 
> > I want people to look at RDF and say "Oh, that makes sense." instead of
> > "WTF? Why do I have to explicitly name graphs in certain cases when that
> > requirement doesn't exist at all for blank nodes?!"
> 
> Anyone who asks this question apparently does not fully understand what is meant by a blank node. Maybe you need to re-read the RDF specs.
> 
> > 
> > This WG is punting on trying to solve the problem of document-local
> > identifiers. I get that.
> 
> No, the WG has taken the idea of a named graph, used by the SPARQL WG to help define datasets, and run with it. You apparently have a completely different idea in mind, not a dataset. I encourage you to take this new idea and run with it. But it is a new idea. 
> 
> > There are, however, repercussions for doing so.
> > I was asked to go back and think about using fragment identifiers as
> > auto-generated graph names. After discussing it with our CTO, it became
> > clear that fragment identifiers for graph names expose a particularly
> > problematic serialization issue when serializing without a document
> > base. That is, it isn't clear whether this will be viewed as valid in a
> > quad-store:
> > 
> > _:foo <http://example.org/bar> _:baz <#_graph:1> .
> > 
> > The quad above is digitally signed in the Web Payments work without a
> > base IRI. It is important that all processors that process it DO NOT
> > add a base IRI, otherwise the signatures will no longer match the
> > data in the quad store. However, <#_graph:1> isn't an absolute IRI and
> > is thus invalid in the RDF model. So, the only solution that we can see
> > is to use an absolute IRI that is meant to be interpreted as a
> > document-local identifier:
> > 
> > _:foo <http://example.org/bar> _:baz <graph:1> .
> > 
> > The above works, but has the downside of needing a new IRI scheme, which
> > none of us want, but hey, that's the best option we have right now
> > beside this one:
> > 
> > _:foo <http://example.org/bar> _:baz <_:graph1> .
> > 
> > ... which is what we had been using for the past two years before
> > realizing that RDF Concepts forbids that sort of thing.
> 
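
The difference between the three candidate graph labels above comes down to whether the string starts with an IRI scheme. A minimal sketch of that check, assuming the RFC 3986 scheme grammar and nothing else (having a scheme is necessary for an absolute IRI, not a full validity test; has_scheme is a name made up here):

```python
import re

# RFC 3986: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ), then ":"
_SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*:")

def has_scheme(iri):
    """True if `iri` begins with a syntactically valid scheme."""
    return _SCHEME.match(iri) is not None

candidates = ["#_graph:1", "graph:1", "_:graph1", "http://example.org/bar"]
results = {c: has_scheme(c) for c in candidates}
```

Under this check "#_graph:1" and "_:graph1" have no scheme (neither "#" nor "_" may start one), while "graph:1" does, which is why it parses as an absolute IRI even though no such scheme is registered.
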
> RDF Concepts doesn't forbid anything. It may say that this is not a legal RDF dataset, but still *you* can use it. Just don't publish it and call it a dataset. 
> 
> > This would be
> > the ideal solution if it weren't for the limitation imposed by the set
> > of RDF documents that assign special meaning to "_:" and restrict its
> > usage to be only for blank node identifiers and not also for
> > document-local identifiers.
> 
> The "_:" syntax is not part of the RDF model and it is not part of the graph syntax. RDF does not "restrict its use to not be used for document-local identifiers". RDF simply does not have ANY notion of a document-local identifier AT ALL. Maybe it should, but that would be a major change to the RDF model and would have repercussions all the way through the specs. 
> 
> It would have been nice if you had raised this earlier. I appreciate your frustration at only recently discovering that you apparently had not properly read the RDF specs, but that is hardly a problem that the WG has to take responsibility for fixing. 
> 
> Pat
> 
> > 
> > -- manu
> > 
> > -- 
> > Manu Sporny (skype: msporny, twitter: manusporny, G+: +Manu Sporny)
> > Founder/CEO - Digital Bazaar, Inc.
> > blog: Aaron Swartz, PaySwarm, and Academic Journals
> > http://manu.sporny.org/2013/payswarm-journals/
> > 
> > 
> 
> ------------------------------------------------------------
> IHMC                                     (850)434 8903 or (650)494 3973   
> 40 South Alcaniz St.           (850)202 4416   office
> Pensacola                            (850)202 4440   fax
> FL 32502                              (850)291 0667   mobile
> phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
> 
> 
> 
> 
> 
> 

-- 
-ericP

Received on Thursday, 14 February 2013 14:02:34 UTC