Re: Problem with auto-generated fragment IDs for graph names from Richard Cyganiak on 2013-02-13 (public-rdf-wg@w3.org from February 2013)

From: Richard Cyganiak <richard@cyganiak.de>
Date: Wed, 13 Feb 2013 22:11:19 +0000
To: Manu Sporny <msporny@digitalbazaar.com>
Cc: RDF WG <public-rdf-wg@w3.org>, Linked JSON <public-linked-json@w3.org>
Message-Id: <B79343E9-431A-43C2-9C75-D08AE0C2675A@cyganiak.de>
Manu,

PROPOSAL: Put @id on all graphs.

Why the aversion against simple and obvious solutions? You seem to consistently choose the path of greatest resistance.

Best,
Richard



On 13 Feb 2013, at 19:50, Manu Sporny wrote:

> We had a conversation about using auto-generated fragment identifiers
> for graph names during the call today. We have found a problem with
> that solution - it's incompatible with RDF when the document doesn't
> have a base IRI. In the case of the Web Payments work, the document
> MUST NOT have a base IRI because the message is transient.
> 
> [Wed 12:34] <manu> cygri: Can you express a relative IRI in an RDF
> serialization w/o a base? Is this valid: <#foo> foaf:knows _:bar . (if
> there is no base set for the document?)
> [Wed 12:34] <manu> cygri, gkellogg, markus: This is what we're thinking
> right now - for RDF Dataset Normalization: Fragment identifiers may be
> used to name graphs that do not have a name associated with them in the
> model. If a name is generated for a graph, the prefix '#_graph:' MUST be
> used and that document-local identifier MAY be changed by processing
> algorithms such as the RDF Dataset Normalization Algorithm.
> [Wed 12:34] <cygri> manu, it is valid, however the document has an
> implicit base in this case
> [Wed 12:35] <manu> cygri: What's the implicit base?
> [Wed 12:35] <cygri> manu, short answer: the document location
> [Wed 12:35] <manu> cygri: What happens if the document doesn't have a
> location? It's a transient message. :)
> [Wed 12:35] <cygri> the IRI RFC has a long section about it
> [Wed 12:36] <cygri> in that case different implementations do different
> things
> [Wed 12:36] <cygri> (actually it's in the URI RFC - 3986)
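A minimal Python sketch of the resolution step being discussed. It uses the standard library's `urllib.parse.urljoin` as a stand-in for RFC 3986 reference resolution; real RDF parsers may behave differently when no base is set, as noted above. The base `http://example.com/msg` is a hypothetical document location.

```python
from urllib.parse import urljoin

GRAPH_REF = "#_graph:1"  # auto-generated fragment reference from the proposal


def resolve(base: str, ref: str) -> str:
    """Resolve ref against base (RFC 3986 style); an empty base leaves ref as-is."""
    return urljoin(base, ref)


# With a (hypothetical) document location, the fragment becomes an absolute IRI.
with_base = resolve("http://example.com/msg", GRAPH_REF)
# Without a base, urljoin returns the relative reference unchanged.
without_base = resolve("", GRAPH_REF)

print(with_base)     # http://example.com/msg#_graph:1
print(without_base)  # #_graph:1
```

Leaving the reference relative is only one of the behaviours an implementation might choose when there is no base.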
> [Wed 12:37] |<-- davidwood has left irc.w3.org:6665 (Ping timeout: 60
> seconds)
> [Wed 12:37] <cygri> manu, you say: "If a name is generated for a graph,
> the prefix 'irc://irc.w3.org/#_graph:' MUST be used"
> [Wed 12:37] <cygri> who is doing the generation in that case?
> [Wed 12:37] <manu> cygri: Well, what I really want to know is if this is
> okay to output in NQuads during RDF Dataset Normalization: _:foo
> foaf:knows _:bar <#_graph:1> . ?
> [Wed 12:37] <manu> cygri: Well, we have to ensure that those sorts of
> graph names MUST be able to be renamed by the RDF Dataset Normalization
> algorithm.
> [Wed 12:38] <manu> also, we have use cases where it doesn't make sense
> to have any sort of document base - everything is transient (as in a
> financial message sent from point A to point B.
> [Wed 12:38] <cygri> i would say that in JSON-LD, fragments of the shape
> #_graph: are reserved for the algorithm
> [Wed 12:39] <manu> cygri: Yes, but that wouldn't apply to just JSON-LD,
> it would apply to RDF Dataset Normalization as well.
> [Wed 12:39] <manu> If we choose that, all of RDF will have to use it.
> [Wed 12:40] <gkellogg> I commonly output things such as <#_graph:1> in
> Turtle, and count on the parser having a document base to make it absolute.
> [Wed 12:40] <manu> and, we would generate the following NQuads (and it
> would have to be viewed as valid - no base document): _:foo foaf:knows
> _:bar <#_graph:1> .
> [Wed 12:40] <gkellogg> I don't tend to do this in NTriples, though, or
> in N-Quads, but I don't see why not.
> [Wed 12:40] <manu> gkellogg: Yes, but in our case, we specifically don't
> want a document base because there isn't one.
> [Wed 12:41] <manu> gkellogg: My concern is more theoretical - is
> <#_graph:1> valid RDF if there is no base document?
> [Wed 12:41] <gkellogg> My processors, if they don't have a base, just
> continue to use relative IRIs.
> [Wed 12:41] <cygri> classic n-triples doesn't have relative IRIs, so you
> need to write out the full ones. we talked about changing that but i'm
> not sure where that went, so am not sure about n-quads
> [Wed 12:41] <manu> gkellogg: Right, we can do that too - but is it valid?
> [Wed 12:41] <cygri> however in turtle and rdf/xml you can simply write
> relative IRIs in your doc, and not specify a base, and it will work
> [Wed 12:41] <gkellogg> In a document it's fine, it's just when parsed
> without a base that there is an issue.
> [Wed 12:41] <gkellogg> Right.
> [Wed 12:42] <cygri> i'm not sure what it means when you say, "if we do
> that, all of RDF has to use that"
> [Wed 12:42] <gkellogg> Most processors should be able to parse documents
> without an explicit base. It is certainly done in testing all the time.
> [Wed 12:42] <cygri> the issue seems to be JSON-LD specific because no
> other syntax wants to have named graphs without explicit names
> [Wed 12:42] <manu> cygri: We're defining the "RDF Dataset Normalization
> Algorithm", not the "JSON-LD Normalization Algorithm"
> [Wed 12:42] <gkellogg> I don't think RDF requires that IRIs be absolute,
> does it?
> [Wed 12:43] <cygri> gkellogg, the RDF data model requires IRIs to be
> absolute
> [Wed 12:43] <cygri> but that doesn't mean they have to be absolute in
> surface syntaxes
> [Wed 12:43] <cygri> it means if you want to know what RDF graph exactly
> it is, you need a base
> [Wed 12:43] <gkellogg> Well, then there is an issue for normalization?
> [Wed 12:44] <cygri> you can define normalization in terms of relative IRIs
> [Wed 12:44] <manu> our CTO: This is not inspiring confidence in the
> decision to use fragment identifiers as auto-generated graph names.
> [Wed 12:45] <gkellogg> Actually, I think it's fine; just use N-Quads as
> the serialization format. The normalization can determine how to handle
> base-less documents.
> [Wed 12:46] <cygri> sorry, i have to run
> [Wed 12:46] <manu> gkellogg: What does it do for base-less documents?
> [Wed 12:46] <manu> cygri, no problem - thanks for the input.
> [Wed 12:46] -->| davidwood (~Adium@public.cloak) has joined #rdf-wg
> [Wed 12:46] <manu> gkellogg: base-less documents are invalid from an RDF
> data model perspective.
> [Wed 12:47] <gkellogg> N-Quads has no problem, you just need to figure
> out what to do in the normalization algorithm. You probably don't
> require that N-Quads express absolute IRIs in this case.
> [Wed 12:47] <cygri> there's a note on relative IRIs in this section:
> http://www.w3.org/TR/rdf11-concepts/#section-IRIs
> [Wed 12:47] <gkellogg> Only as an abstract model, not a concrete model,
> AFAIU
> [Wed 12:47] <cygri> not sure if that helps
> [Wed 12:47] <cygri> anyway, ttyl
> [Wed 12:47] <manu> gkellogg: Yeah, we can do that - but it's invalid
> RDF, isn't it? To flip the argument - you can express everything as
> document-local using blank nodes.
> [Wed 12:47] |<-- cygri has left irc.w3.org:6665 (cygri)
> [Wed 12:49] <gkellogg> It's not syntactically invalid. It's just a
> matter of how you go from concrete (N-Quads) to abstract (RDF-Concepts).
> I either provide a base IRI to the parser, or I just continue to use
> relative IRIs, which isn't a problem in practice.
> [Wed 12:49] <gkellogg> What do you do with an RDFa document if you don't
> have a document base?
> [Wed 12:49] <manu> but there is no way to express everything as
> document-local if you have two graphs... that is, RDF has no way to
> express transient messages if there is more than one graph in the dataset.
> [Wed 12:50] <manu> that's the crux of the problem, here. I think.
> [Wed 12:50] -->| cygri (~cygri@public.cloak) has joined #rdf-wg
> [Wed 12:50] <manu> gkellogg: Well, the problem here is that all the
> normalizers need to agree on what to do in this case... but the "right
> thing" to do is to not associate it with a base document, because there
> isn't one for the transient message.
> [Wed 12:50] <gkellogg> What I heard is that FragIDs are considered to be
> document-local.
> [Wed 12:50] <manu> yes, that's true, but that's not the issue. :)
> [Wed 12:50] <manu> Here's the problem:
> [Wed 12:51] <manu> In PaySwarm, we have many digitally signed messages
> that are completely transient.
> [Wed 12:51] <gkellogg> Why can't the normalization algorithm be written
> to work on relative IRIs?
> [Wed 12:51] <manu> That is, there is absolutely no base - there could
> never be a base.
> [Wed 12:51] <manu> because the message is transient.
> [Wed 12:52] <gkellogg> Understood. Express it as N-Quads using relative
> IRIs, and allow Normalization to work with relative IRIs.
> [Wed 12:52] <manu> We then express that transient message in RDF in
> order to digitally sign it... something like (and this is exactly what
> would be signed): _:foo foaf:knows _:bar <#_graph:1> .
> [Wed 12:52] <gkellogg> From concepts, that just means that the graph is
> not "well-defined"
> [Wed 12:53] <manu> (well, except "foaf:knows" would be an absolute IRI)
> [Wed 12:53] <gkellogg> I'm not saying there are no absolute IRIs, I'm
> just saying that you tolerate relative IRIs, and that normalization is
> defined to work in the absence of an implicit base IRI.
> [Wed 12:53] <manu> and technically, the NQuad output would be this:
> _:c14n1 <http://xmlns.com/foaf/0.1/knows> _:c14n2 <#_graph:1> .
> [Wed 12:53] <gkellogg> Right
> [Wed 12:54] <manu> right, but I'm concerned that RDF WG members are
> going to argue that the above is an invalid document if base isn't defined.
> [Wed 12:55] <manu> (because relative IRIs are not allowed in the RDF
> data model) ... or they're "not well-defined".
> [Wed 12:55] <gkellogg> It's explicitly not an invalid document,
> according to concepts. It just isn't a "well-defined" graph without it.
> [Wed 12:55] <gkellogg> Doesn't mean it's invalid.
> [Wed 12:55] <manu> Okay, but if we did this:
> [Wed 12:55] |<-- cygri has left irc.w3.org:6665 (cygri)
> [Wed 12:55] <manu> _:foo foaf:knows _:bar <urn:graph:1> .
> [Wed 12:55] <manu> everything would be just fine.
> [Wed 12:55] <manu> (from an RDF perspective)
> [Wed 12:55] <gkellogg> Ues
> [Wed 12:55] <gkellogg> Yes
> [Wed 12:56] <manu> right, so, are fragment identifiers the correct
> solution in this case?
> [Wed 12:56] <gkellogg> Requires minting an IRI (or URN) scheme, which
> there was some resistance to.
> [Wed 12:56] <manu> because they lead to graphs that are not well defined.
> [Wed 12:56] <gkellogg> I think using fragids is the direction of the
> group, and IMO the right way to go.
> [Wed 12:56] <manu> well, at least we don't end up with well-defined
> graphs in the case where we mint a new IRI/URN scheme...
> [Wed 12:57] <gkellogg> They're only not well-defined if you don't
> provide a document base. You can define normalization to not require a
> document base.
> [Wed 12:57] |<-- davidwood has left irc.w3.org:6665 (Client closed
> connection)
> [Wed 12:57] <manu> seems very hacky to me, the solution isn't very
> clean... too much uncertainty about what it means in RDF.
> [Wed 12:57] <gkellogg> If you think about it, normalization is just a
> concrete normalization step. The resulting graph will be well-defined if
> it is used with a base IRI
> [Wed 12:58] <manu> gkellogg: yes, but if you use a base IRI, none of the
> digital signatures will work anymore - the data will be wrong.
> [Wed 12:58] <manu> gkellogg: In order for the digital signature to work
> out, you must specifically NOT use a base IRI.
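A minimal sketch of why grounding against a base breaks the signature. SHA-256 stands in for the actual signing step, and the receiver's base `http://example.com/inbox` is hypothetical; the point is only that any change to the signed bytes changes the digest.

```python
import hashlib
from urllib.parse import urljoin


def nquad(s: str, p: str, o: str, g: str) -> str:
    """Serialize one quad as an N-Quads-style line (terms passed pre-formatted)."""
    return f"{s} {p} {o} {g} .\n"


def digest(doc: str) -> str:
    # Stand-in for signing: a hash over the exact serialized bytes.
    return hashlib.sha256(doc.encode("utf-8")).hexdigest()


KNOWS = "<http://xmlns.com/foaf/0.1/knows>"
graph_ref = "#_graph:1"

# What the sender signs: the graph name left as a relative reference.
signed = nquad("_:c14n1", KNOWS, "_:c14n2", f"<{graph_ref}>")

# What a receiver produces if its parser grounds the reference against a
# (hypothetical) base IRI before re-serializing.
resolved = urljoin("http://example.com/inbox", graph_ref)
received = nquad("_:c14n1", KNOWS, "_:c14n2", f"<{resolved}>")

print(digest(signed) == digest(received))  # False: verification would fail
```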
> [Wed 12:59] <gkellogg> The point is, it's fine if the dataset is not
> well-defined. It can always be well-defined at a theoretical later date.
> [Wed 12:59] <manu> or it could be defined in a way that breaks all the
> digital signatures at a later date.
> [Wed 12:59] <gkellogg> Just specify that in the algorithm.
> [Wed 12:59] <gkellogg> Normalization MUST NOT use a base IRI to ground
> the input document.
> [Wed 13:01] <gkellogg> I'd say, write it up, send it to the RDF WG
> mailing list, and see if someone raises objections. Given that it was
> the direction given today, I think it's a reasonable way to go.
> [Wed 13:02] <manu> gkellogg: yeah, will do that - thanks.
> [Wed 13:08] <markus> manu: who creates the named graphs? i.e., where do
> they come from?
> [Wed 13:09] -->| davidwood (~Adium@public.cloak) has joined #rdf-wg
> [Wed 13:10] <markus> I don't feel comfortable with minting frag IDs for
> unlabeled named graphs at all
> [Wed 13:12] <Zakim> SW_RDFWG()11:00AM has ended
> [Wed 13:12] <Zakim> Attendees were
> [Wed 13:12] <Zakim> Zakim-bot will be restarted in 3 minutes to recover
> caller state; please save your agenda status. Apologies for the
> inconvenience
> [Wed 13:12] <markus> manu, gkellogg: actually there's another issue. if
> you mint fragIds for unlabeled named graphs, you mint a fragId for the
> blank node at the same time.. { "property": "this is a blank node that
> is also a graph", "@graph": [ ... ] }
> [Wed 13:13] <gkellogg> Yes, that's true no matter what we do.
> [Wed 13:14] <markus> so it won't work AFAICS.. the only clean solution
> is to require named graphs to be labeled with an IRI (since we are not
> allowed to use bnodeIds)
> [Wed 13:14] <markus> It's still not clear to me where the unlabeled
> named graphs come from in the first place.. who is the creator of the
> dataset containing them?
> [Wed 13:18] |<-- davidwood has left irc.w3.org:6665 (Client closed
> connection)
> [Wed 13:18] * Zakim is departing
> [Wed 13:18] |<-- Zakim has left irc.w3.org:6665 ("Leaving")
> [Wed 13:22] -->| cygri (~cygri@public.cloak) has joined #rdf-wg
> [Wed 13:23] |<-- SteveH has left irc.w3.org:6665 (SteveH)
> [Wed 13:24] -->| Zakim (zakim@public.cloak) has joined #rdf-wg
> [Wed 13:28] <manu> markus: The PaySwarm software creates the "unnamed"
> named graphs when it needs to communicate with another peer on the
> network. The message is purely transient, there is no base document.
> [Wed 13:29] <manu> markus: requiring that all named graphs be labeled
> with an IRI doesn't make sense at all to us - every message sent across
> PaySwarm now needs to have a name associated with it? Why do that when
> we can automatically generate a name?
> [Wed 13:29] <markus> can't the software create an IRI that is guaranteed
> to not collide?
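One way to read markus's suggestion, sketched under the assumption that a random `urn:uuid:` name (RFC 4122, version 4) is acceptable as a graph name. Such a name is a global IRI, so, per manu's second requirement, a normalization algorithm would have no licence to rename it.

```python
import uuid


def mint_graph_name() -> str:
    """Mint a collision-resistant absolute IRI for an otherwise unnamed graph.

    Uses a random (version 4) UUID in the urn:uuid: namespace.
    """
    return uuid.uuid4().urn


name_a = mint_graph_name()
name_b = mint_graph_name()
print(name_a)
print(name_b)
```

Each call yields a different name, which is exactly why two otherwise identical messages would no longer normalize to the same bytes.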
> [Wed 13:29] -->| davidwood (~Adium@public.cloak) has joined #rdf-wg
> [Wed 13:30] <manu> Markus: Be more specific about the IRI that is being
> created - is it something like this: http://example.com/data#graph1 or
> is it something like this: graph:1 ?
> [Wed 13:30] |<-- cygri has left irc.w3.org:6665 (Ping timeout: 60 seconds)
> [Wed 13:31] <markus> I agree, it should be possible to have multiple
> *un*named graphs.. but that's apparently not happening in this version
> of RDF
> [Wed 13:32] <manu> markus: The software can create an IRI, but that IRI
> must have two very special properties to work for the digital signature
> case: 1) It MUST be a document-local identifier that when expressed in
> NQuads is valid in the RDF data model, 2) It must be able to be re-named
> by the RDF Dataset Normalization Algorithm.
> [Wed 13:32] <markus> well.. couldn't the payswarm software mint some
> IRIs in a dedicated space.. e.g., http://payswarm.org/.../names/....
> [Wed 13:32] <markus> what requires it to be document-local?
> [Wed 13:32] <manu> markus: This isn't about PaySwarm - it's about RDF
> Dataset Normalization - what does /that/ algorithm do... PaySwarm can do
> anything it wants to, but we want to do something that is going to
> eventually be standardized.
> [Wed 13:32] <manu> markus: The message is transient, it has no document.
> [Wed 13:32] <markus> if it isn't, there is no reason to re-name it
> [Wed 13:33] <manu> markus: You /have/ to be able to rename it for the
> RDF Dataset Normalization algorithm to work...
> [Wed 13:33] <markus> In RDF datasets all named graphs are required to
> have names, so the problem is not there but in JSON-LD (and in payswarm)
> [Wed 13:33] <manu> the whole purpose of the algorithm is to re-name
> anything that is document local in a very specific way.
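A toy illustration of "re-naming anything that is document local", assuming only blank node labels count as local. This is NOT the RDF Dataset Normalization algorithm: it labels terms by order of first appearance, so unlike a real canonicalizer its output depends on input order.

```python
from typing import Dict, List, Tuple

Quad = Tuple[str, str, str, str]  # (subject, predicate, object, graph)


def is_local(term: str) -> bool:
    """Treat only blank node labels as document-local (an assumption for this sketch)."""
    return term.startswith("_:")


def relabel(quads: List[Quad]) -> List[Tuple[str, ...]]:
    """Rename local terms to _:c14nN in order of first appearance."""
    mapping: Dict[str, str] = {}

    def rename(term: str) -> str:
        if not is_local(term):
            return term  # global IRIs are left untouched
        if term not in mapping:
            mapping[term] = f"_:c14n{len(mapping) + 1}"
        return mapping[term]

    return [tuple(rename(t) for t in quad) for quad in quads]


KNOWS = "<http://xmlns.com/foaf/0.1/knows>"
a = [("_:foo", KNOWS, "_:bar", "_:g")]
b = [("_:x", KNOWS, "_:y", "_:h")]
c = [("_:foo", KNOWS, "_:bar", "<urn:graph:1>")]

print(relabel(a) == relabel(b))  # True: only the local labels differed
print(relabel(c)[0][3])          # <urn:graph:1>: a global name is never renamed
```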
> [Wed 13:34] <markus> yes, but in RDF graph names are *not* document local
> [Wed 13:34] <manu> that's exactly the problem!
> [Wed 13:34] <markus> so you'll never ever have to rename them
> [Wed 13:34] <markus> which brings us back to my previous question. what
> requires them to be document-local?
> [Wed 13:34] <manu> Markus, this is what you're suggesting: Transient
> messages that are transmitted from point A to point B MUST be given names.
> [Wed 13:35] <manu> Is that what you're asserting?
> [Wed 13:35] <markus> no.. the graphs in those messages must be given names
> [Wed 13:35] <manu> Why?
> [Wed 13:35] <manu> I don't have to do that with any other transient
> protocol I use.
> [Wed 13:36] <manu> I definitely don't have to do that w/ JSON - so why
> do I have to do that with RDF?
> [Wed 13:36] <markus> well.. that's the underlying data model.. other
> data models don't require IRIs at all e.g. We drop properties which are
> not mapped to IRIs e.g.
> [Wed 13:36] <markus> I see only one solution to that.. change the data model
> [Wed 13:37] <manu> Yes, but the underlying data model is completely
> flawed if I can't express messages transiently! :)
> [Wed 13:37] <markus> JSON-LD's data model supports it but then obviously
> you can't round-trip to RDF
> [Wed 13:37] <manu> No, there are multiple solutions to this problem...
> [Wed 13:38] <manu> The only thing we need to make sure is that the
> auto-generated graph identifier 1) MUST be a document-local identifier
> that when expressed in NQuads is valid in the RDF data model, 2) MUST be
> able to be re-named by the RDF Dataset Normalization Algorithm.
> [Wed 13:38] <markus> can you enumerate them?
> [Wed 13:38] <manu> There are two possibilities here: <#_graph:1> and graph:1
> [Wed 13:39] <manu> The first fails requirement #1
> [Wed 13:39] <manu> The second passes both requirement #1 and #2
> [Wed 13:39] <markus> 1) is impossible, because graph names MUST be
> absolute IRIs in RDF (unless you change the RDF data model)
> [Wed 13:39] <manu> graph:1 is an absolute IRI :)
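A syntactic check of manu's claim, sketched against the RFC 3986 scheme grammar (section 3.1). It only tests for a leading scheme, not the rest of the IRI, and `graph` is treated here as a hypothetical, unregistered scheme. Syntax alone says `graph:1` is absolute and `#_graph:1` is a relative reference; whether a scheme may carry document-local meaning is the semantic question markus raises about global scope, which no regex settles.

```python
import re

# RFC 3986 section 3.1: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.\-]*:")


def has_scheme(ref: str) -> bool:
    """True if ref begins with a syntactically valid scheme (not a relative reference)."""
    return SCHEME.match(ref) is not None


for ref in ["graph:1", "#_graph:1", "urn:graph:1", "http://payswarm.org/graph/1"]:
    print(ref, has_scheme(ref))
```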
> [Wed 13:40] <markus> I've never heard of document local IRIs.. the whole
> point of IRIs is that they are global
> [Wed 13:40] <markus> s/document local IRIs/document-local absolute IRIs/
> [Wed 13:41] <markus> there might be a third option.. keep everything as
> is but perform the RDF Dataset serialization algorithm on flattened JSON-LD
> [Wed 13:42] <markus> data coming from RDF will never have bnodes as
> graph names
> [Wed 13:42] <markus> data coming from JSON-LD might, but that's all
> handled within JSON-LD
> [Wed 13:43] <markus> the byte-stream you sign would look slightly
> different.. but who cares?
> [Wed 13:43] <markus> since JSON-LD is a superset of RDF that would work
> in all situations I can think of
> [Wed 13:45] <markus> the only thing that wouldn't.. is to represent data
> using bnodes in graph names in plain-RDF.. but that's due to a
> limitation of RDF
> [Wed 13:47] * manu is thinking about markus' suggestion.
> [Wed 13:50] <manu> markus: I think you're wrong re: "the whole point of
> IRIs is that they are global" - search RFC 3987 for the word "global" or
> "universal" and you won't find it used in the way that you use it, IIRC.
> [Wed 13:52] |<-- davidwood has left irc.w3.org:6665 (Client closed
> connection)
> [Wed 13:53] <markus> for these kind of things you should always look at
> the URI RFC
> [Wed 13:54] <markus> RFC 3986: "URIs have a global scope and are
> interpreted consistently regardless of context, though the result of
> that interpretation may be in relation to the end-user's context"
> [Wed 13:54] <markus> I think that's quite clear
> [Wed 13:55] <manu> global scope !== global identifier
> [Wed 13:55] <manu> while the scope may be global (it is)
> [Wed 13:55] <markus> ... "For example, "http://localhost/" has the same
> interpretation for every user of that reference, even though the network
> interface corresponding to "localhost" may be different for each
> end-user: interpretation is independent of access"
> [Wed 13:55] <manu> the end-users' context in this case is the document.
> [Wed 13:55] <manu> and graph:1 is interpreted via that context.
> [Wed 13:55] <markus> no.. it's an identifier that has a global scope
> [Wed 13:56] <manu> markus: Yes, exactly the same case as localhost.
> [Wed 13:56] <manu> localhost is always interpreted via your network.
> [Wed 13:56] <manu> graph:1 is always interpreted via your JSON-LD processor.
> [Wed 13:56] <manu> (and the JSON-LD processor chooses to interpret it as
> document-local)
> [Wed 13:56] <markus> no.. the interpretation (and that's what RDF is all
> about) is the same.. it's the local machine
> [Wed 13:56] <markus> accessing it will lead to different results
> [Wed 13:57] <markus> thus "interpretation is independent of access"
> [Wed 13:58] -->| dlongley (~dlongley@public.cloak) has joined #rdf-wg
> [Wed 13:59] <dlongley> markus: manu let me know about the graph naming
> discussion going on in here
> [Wed 13:59] <dlongley> and your suggestion to normalize using JSON-LD as
> the serialization
> [Wed 14:00] <dlongley> the problem with that approach is that you
> couldn't transmit the data you signed via another RDF serialization
> [Wed 14:00] <dlongley> because it's a data model problem
> [Wed 14:00] <dlongley> you signed data that can't be appropriately
> represented in RDF
> [Wed 14:00] <markus> yes, that's the whole point
> [Wed 14:00] <dlongley> that isn't a solution to the problem
> [Wed 14:01] <dlongley> particularly for payswarm... where RDFa is used
> heavily as a serialization
> [Wed 14:01] <markus> well in RDF you can't do it because no
> document-local identifiers are allowed as graph names
> [Wed 14:01] <dlongley> for previously signed graphs
> [Wed 14:01] <markus> rdf is not a dataset syntax
> [Wed 14:01] <markus> sorry, I meant RDFa
> [Wed 14:01] <dlongley> if you generated some data with unnamed graphs
> and then signed it using JSON-LD ...
> [Wed 14:02] <dlongley> how could you represent the signed data using RDFa?
> [Wed 14:02] <markus> you can't represent graphs at all in RDFa
> [Wed 14:02] <dlongley> at this time you can't put named graphs in RDFa
> [Wed 14:02] <dlongley> but that will likely not always be the case
> [Wed 14:02] <markus> it's a graph syntax, not a dataset syntax
> [Wed 14:02] <dlongley> ok, red herring.
> [Wed 14:02] -->| davidwood (~Adium@public.cloak) has joined #rdf-wg
> [Wed 14:02] <dlongley> pick a dataset syntax.
> [Wed 14:03] <dlongley> now you can't transmit the data using that syntax.
> [Wed 14:03] <manu> markus: RDFa /will/ be a Dataset syntax eventually -
> within a couple of years.
> [Wed 14:03] <markus> manu: with bnodes as graph names?
> [Wed 14:04] <manu> markus: With dataset-local identifiers, hopefully, yes.
> [Wed 14:04] <markus> the point is, in any RDF dataset syntax there won't
> exist any named graphs without an absolute IRI
> [Wed 14:04] <markus> at least not before the RDF data model is changed
> [Wed 14:04] <manu> markus: Why do you think that graph:1 isn't an
> absolute IRI?
> [Wed 14:05] <markus> it is an absolute IRI
> [Wed 14:05] <manu> Just because it's a dataset-local identifier doesn't
> mean it isn't also an absolute IRI (stretching definitions here, I know)
> [Wed 14:05] <manu> Okay, then the RDF data model doesn't need to change?
> [Wed 14:05] <markus> we had that discussion before.. absolute IRIs have
> global scope, see RFC3986
> [Wed 14:06] <manu> you can have global scope and interpret the
> identifier based on a local context (the document context, in this example)
> [Wed 14:06] <dlongley> "An identifier embodies the information required
> to distinguish what is being identified from all other things within its
> scope of identification."
> [Wed 14:06] <markus> what I'm saying is that if you stay within RDF you
> won't have a problem normalizing/signing since no unlabeled named graphs
> exist
> [Wed 14:07] <dlongley> the scope of identification for "graph:" is the
> local document
> [Wed 14:07] <manu> and in this case, the scope is the document itself.
> [Wed 14:07] <markus> the problem arises since you wanna create named
> graphs but don't want to name them
> [Wed 14:07] |<-- davidwood has left irc.w3.org:6665 (Client closed
> connection)
> [Wed 14:07] <manu> markus: No, we never said we don't want to name them
> /when they're serialized to NQuads/
> [Wed 14:07] <manu> we just don't want to name them before that.
> [Wed 14:08] <markus> dlongley: manu and I discussed this before. RFC
> 3986: "URIs have a global scope and are interpreted consistently
> regardless of context, though the result of that interpretation may be
> in relation to the end-user's context"
> [Wed 14:08] <manu> naming them is a part of the RDF Dataset
> Normalization Algorithm.
> [Wed 14:08] <markus> .. "For example, "http://localhost/" has the same
> interpretation for every user of that reference, even though the network
> interface corresponding to "localhost" may be different for each
> end-user: interpretation is independent of access"
> [Wed 14:08] <markus> the interpretation (and that's what RDF is all
> about) is the same.. it's the local machine
> [Wed 14:08] <markus> accessing it will lead to different results
> [Wed 14:08] <markus> thus "interpretation is independent of access"
> [Wed 14:08] <manu> markus: Yes, exactly
> [Wed 14:09] <manu> you are interpreting it via the JSON-LD processor,
> not "The Web"
> [Wed 14:09] <markus> manu: I disagree.. naming them is part of JSON-LD
> to RDF transformation
> [Wed 14:10] <markus> that's also the reason why we currently can't
> roundtrip that kind of data
> [Wed 14:10] <markus> because you can't represent it in RDF
> [Wed 14:10] <manu> What can't you represent in RDF?
> [Wed 14:11] <markus> I know I asked that already some time ago.. but are
> you really dealing with datasets in payswarm or with graphs?
> [Wed 14:11] <markus> I still haven't looked at the specs
> [Wed 14:11] <markus> but it seems that the graph name isn't important
> [Wed 14:12] <manu> We are deciding to throw an error if somebody tries
> to use something other than the default graph for now, because I can't
> imagine this problem will be solved soon.
> [Wed 14:12] <markus> are there multiple graphs that you need to sign?
> [Wed 14:12] <manu> however, if you are to do digital signatures
> correctly (and represent them in RDF correctly), you should use named
> graphs and sign the named graph.
> [Wed 14:12] <manu> and yes, there may be multiple graphs that we need to
> sign.
> [Wed 14:12] <markus> in one document
> [Wed 14:12] <markus> ?
> [Wed 14:12] <manu> yep
> [Wed 14:13] <dlongley> we need to sign arbitrary JSON-LD.
> [Wed 14:13] <manu> for example: a multi-party digital contract that is
> counter-signed by the PaySwarm Authority.
> [Wed 14:13] <dlongley> if JSON-LD supports it, we need to be able to
> sign it.
> [Wed 14:13] <markus> ok.. what if we would drop support for bnode IDs as
> graph names in JSON-LD?
> [Wed 14:14] <dlongley> graphs have to be given names in order to
> normalize them
> [Wed 14:14] <markus> have you a flowchart or something where I could
> quickly get an idea of the data-flows between the participants?
> [Wed 14:14] <manu> markus: We can't use BNode IDs as graph names in
> JSON-LD, right?
> [Wed 14:14] <markus> we can, currently
> [Wed 14:14] <manu> markus: Ha - no, unfortunately not right now.
> [Wed 14:15] <markus> but it doesn't round-trip to RDF
> [Wed 14:15] <manu> markus: Well, the RDF WG isn't going to let that fly
> because it doesn't match the definition of a blank node identifier.
> [Wed 14:15] <manu> at least, that's what I think a LC comment is going to be
> [Wed 14:15] <dlongley> here's what matters: in payswarm, we must be able
> to sign arbitrary JSON-LD documents.
> [Wed 14:15] <manu> you can't name graphs using bnode identifiers.
> [Wed 14:15] <dlongley> if someone can put an unnamed graph into a
> JSON-LD document, then that's a problem.
> [Wed 14:15] <markus> well.. when I presented the data model some time
> ago and enumerated the differences no one seemed to object
> [Wed 14:15] <markus> they accepted that JSON-LD will be a superset of RDF
> [Wed 14:15] <manu> (and digital signatures has almost nothing to do with
> blank node identifiers for graph names, btw)
> [Wed 14:16] <dlongley> there are 2 solutions: disallow unnamed graphs in
> JSON-LD, come up with a way to name the unnamed graphs using
> document-local identifiers that works for RDF.
> [Wed 14:16] <markus> that's what I proposed to manu earlier.. disallow
> unnamed graphs in JSON-LD
> [Wed 14:16] <dlongley> yes, and that's not the preferred solution
> [Wed 14:16] <dlongley> it's the fallback.
> [Wed 14:16] <manu> markus: I didn't object because I thought blank node
> identifiers could be used to name graphs for RDF (and that they were
> updating the spec to reflect that).
> [Wed 14:17] <dlongley> it would be much nicer if we didn't force people
> to name their unnamed graphs.
> [Wed 14:17] <markus> manu: I'm talking about half an hour ago :-P
> [Wed 14:17] <manu> I think it's ridiculous to tell people to include
> syntax that is completely unnecessary. :)
> [Wed 14:17] <manu> Why force people to name graphs when they don't need to?
> [Wed 14:17] <markus> dlongley: completely agree.. but that's apparently
> not something the RDF WG is going to accept
> [Wed 14:17] <dlongley> there may be a case where it also generates an
> issue for comparing two datasets
> [Wed 14:17] <manu> Right now, the answer is: Because the RDF data model
> says so - which is a really bad argument.
> [Wed 14:18] <manu> in fact, I outright reject that argument.
> [Wed 14:18] <markus> yes, but you can't have both.. either you change
> the RDF data model (which won't happen).. or you accept that the data
> won't round-trip
> [Wed 14:18] <dlongley> i'm not convinced that graph:1 won't work.
> [Wed 14:19] <manu> I think the real reason is that nobody in the RDF WG
> believes that we'll come to a consensus on this and that the group is
> exhausted after discussing the topic. There is no desire to address the
> problem.
> [Wed 14:19] <dlongley> i'm still trying to wrap my mind around it.
> [Wed 14:19] <manu> markus: Yes, I don't see why graph:1 can't work, and
> be compatible with the RDF 1.1 Concepts/Data Model
> [Wed 14:19] <markus> it works.. but you are automatically creating
> *global* identifiers.. nothing is there to prevent collisions..
> [Wed 14:19] <dlongley> i'm not convinced of that.
> [Wed 14:20] <manu> I do see why #_graph:1 is problematic (it's not valid
> for transient messages)
> [Wed 14:20] <dlongley> that's what i'm trying to wrap my mind around.
> [Wed 14:20] <dlongley> the analogy of "localhost" having a "global
> meaning" doesn't necessarily preclude the use case here
> [Wed 14:20] <markus> graph:1 is the same as minting
> http://payswarm.org/graph/1
> [Wed 14:20] <dlongley> "graph:1" has a global meaning ...
> [Wed 14:21] <markus> yes.. just as http://payswarm.org/graph/1
> [Wed 14:21] <dlongley> it's an identifier for the first graph in the
> document you're looking at.
> [Wed 14:21] <dlongley> it always means that.
> [Wed 14:21] * manu nods.
> [Wed 14:21] <dlongley> now... if you go and actually look at its data...
> [Wed 14:21] <markus> :-)
> [Wed 14:21] <dlongley> then you're talking about the result of the
> end-user's interpretation.
> [Wed 14:21] <dlongley> and that can change.
> [Wed 14:21] <dlongley> so, to me, that seems to work for RFC 3986
> [Wed 14:22] <markus> and if someone else makes statements about
> http://payswarm.org/graph/1 which conflict with your statements?
> [Wed 14:22] <markus> say, you put it in a quad store?
> [Wed 14:22] <markus> sorry.. same applies to graph:1
> [Wed 14:22] <dlongley> you mean like if someone says: "localhost/foo"
> and i don't have that on my machine?
> [Wed 14:23] <dlongley> seems like the same situation to me.
> [Wed 14:23] <markus> no.. that's accessing it.. not interpreting it
> [Wed 14:23] <markus> those are two different things
> [Wed 14:23] <dlongley> you're saying that someone can make a statement
> about localhost/foo ...
> [Wed 14:23] <markus> and have been debated to death (httpRange-14)
> [Wed 14:23] <dlongley> and it won't conflict with my own statements?
> [Wed 14:23] <dlongley> ever?
> [Wed 14:23] <markus> yes, whatever statement he likes
> [Wed 14:24] <markus> it will conflict with yours
> [Wed 14:24] <dlongley> right...
> [Wed 14:24] <markus> because URIs are global
> [Wed 14:24] <dlongley> and it's not a problem
> [Wed 14:24] <dlongley> you know what "localhost" means.
> [Wed 14:24] <dlongley> you know that "localhost", when accessed, means
> your local machine, nothing else
> [Wed 14:24] <dlongley> how is that any different for "graph:1"?
> [Wed 14:24] <markus> simplest thing.. import two datasets using those
> graph names into an RDF quad store
> [Wed 14:25] <markus> then do a SPARQL query for that graph name
> [Wed 14:25] <dlongley> localhost/1 and localhost/2
> [Wed 14:25] <markus> what will you get back?
> [Wed 14:25] <dlongley> everything that matches those graph names
> [Wed 14:25] <dlongley> just like you would with localhost
> [Wed 14:25] <markus> all statements made about every "first graph in a
> document" ever imported
> [Wed 14:26] <markus> exactly
> [Wed 14:26] <dlongley> what happens when someone uploads a dataset to a
> quad store that has a bunch of localhost URIs in it?
> [Wed 14:26] <markus> exactly the same thing
> [Wed 14:26] <dlongley> right
> [Wed 14:27] <dlongley> you are losing the "dataset"
> [Wed 14:27] <dlongley> when you do that.
> [Wed 14:27] <markus> you are losing the local scope you need
> [Wed 14:27] <dlongley> yeah, you can't use a quad store to solve that
> problem.
> [Wed 14:28] <markus> but to illustrate it
> [Wed 14:28] <dlongley> the problem here is that a graph isn't a node.
> [Wed 14:28] <dlongley> which, IMO, is the wrong way to go.
> [Wed 14:28] <markus> if bnodeIds were allowed, they would be changed
> during the import.. so that clashes would never occur
> [Wed 14:29] <dlongley> right
> [Wed 14:29] <markus> yes, I completely agree with that.. and I'm not
> happy with the RDF WG's decision about that
> [Wed 14:29] <dlongley> there's already a requirement that you can do
> that with "graph:1"
> [Wed 14:29] <dlongley> otherwise it doesn't work anyway
> [Wed 14:29] <dlongley> so a quad store that understood "graph:1" would do so
> [Wed 14:30] <dlongley> but, i understand how that is no longer analogous
> to localhost.
> [Wed 14:30] <markus> IRIs are opaque.. they are global identifiers
> [Wed 14:30] <dlongley> right
> [Wed 14:31] <dlongley> for the same reasons (w/quad store storage)
> <#_graph:1> won't work.
> [Wed 14:31] <markus> exactly
> [Wed 14:31] <dlongley> which means there is no solution other than
> forcing people to name their graphs
> [Wed 14:31] <markus> so without introducing bnodes as graph names I
> can't see a solution
> [Wed 14:31] <markus> yes.. at least I can't see any
> [Wed 14:32] <markus> or you live with the fact that it won't round-trip
> to RDF
> [Wed 14:32] <dlongley> well, we can't do that
> [Wed 14:32] <dlongley> we will have to start rejecting data
> [Wed 14:32] <dlongley> which may be unexpected
> [Wed 14:32] <dlongley> (will be unexpected)
> [Wed 14:33] <markus> no, you can accept all data from RDF.. but you
> can't output it in RDF
> [Wed 14:33] <dlongley> well, we have to be able to normalize
> [Wed 14:33] <markus> RDF -> JSON-LD works without problems.. the other
> direction doesn't.. same as for bnodes in properties
> [Wed 14:33] <dlongley> and the data we normalize must be compatible with RDF
> [Wed 14:33] <dlongley> right
> [Wed 14:34] <markus> then there's no way I see without requiring named
> graphs to be named (with an absolute IRI)
> [Wed 14:34] <dlongley> yeah, i can't think of another solution
> [Wed 14:36] <dlongley> ugh, it seems so easily solved by allowing graph
> names to be bnode IDs.
> [Wed 14:36] <markus> nevertheless, I think we should keep supporting
> bnode IDs as graph names in JSON-LD (but mark the feature as at-risk)
> [Wed 14:36] <dlongley> i would like to know the drawbacks to that approach
> [Wed 14:36] <markus> yes.. everything is already there
> [Wed 14:36] <dlongley> the practical ones ...
> [Wed 14:36] <markus> ask the RDF WG :-P
> [Wed 14:37] <markus> or the SPARQL guys
> [Wed 14:37] <dlongley> well, i've been passed along the information that
> there isn't a practical drawback, it is a definition issue
> [Wed 14:37] <dlongley> seems like it would work fine for quad stores and
> sparql
> [Wed 14:38] <dlongley> anyway, i've got to get back to doing other
> stuff, thanks for the discussion.
> [Wed 14:38] <markus> I think so, yes.. I'm not sure about the
> implications on the semantics but AFAIK no semantics have been defined
> for named graphs
> 
> -- manu
> 
> -- 
> Manu Sporny (skype: msporny, twitter: manusporny, G+: +Manu Sporny)
> President/CEO - Digital Bazaar, Inc.
> blog: Aaron Swartz, PaySwarm, and Academic Journals
> http://manu.sporny.org/2013/payswarm-journals/
>
Received on Wednesday, 13 February 2013 22:11:47 UTC