Re: Problem with auto-generated fragment IDs for graph names

Agreed. The apparent desire to have unnamed named graphs is pretty strange. It appears that it's caused by wrangling the surface syntax to save a few bytes, at the cost of making your logical representation more complex. However, I've not seen a real-world example of why you want to do this.

This, combined with the equally strange desire to have graphs with no base URI (contrary to the URI RFC), is causing your problems.
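
To illustrate the base URI point, here is a minimal sketch using Python's urllib.parse.urljoin. It shows that a fragment-only reference such as <#_graph:1> only becomes an absolute IRI once a base is supplied, whereas a reference that already carries a scheme needs no base. The helper names (has_scheme, resolve) and the example.com base are made up for illustration; they don't come from any spec.

```python
import re
from urllib.parse import urljoin

# RFC 3986 scheme syntax: ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ) ":"
_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*:")


def has_scheme(ref: str) -> bool:
    """True if the reference starts with a URI scheme (hypothetical helper)."""
    return _SCHEME.match(ref) is not None


def resolve(ref: str, base: str = "") -> str:
    """Resolve ref against base; leave it untouched if there is no base."""
    if has_scheme(ref):
        return ref  # already absolute, the base is irrelevant
    if not base:
        return ref  # nothing to resolve against, so it stays relative
    return urljoin(base, ref)


with_base = resolve("#_graph:1", "http://example.com/doc")
without_base = resolve("#_graph:1")
urn_name = resolve("urn:graph:1", "http://example.com/doc")

print(with_base)     # http://example.com/doc#_graph:1
print(without_base)  # #_graph:1
print(urn_name)      # urn:graph:1
```

With no base the reference comes back unchanged, which is exactly the "different implementations do different things" situation: the string is still relative, so it doesn't yet name anything in the RDF data model.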

If you hadn't made either one of those odd decisions you would have no problem.

NB we create *lots* of transient (i.e. not in any store, maybe you mean something else?) graphs in our system, but we give them base URIs (UUIDs, I think); otherwise you have no consistent way to refer to the graph while it's in flight in the system.
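
As a rough sketch of what naming a transient graph can look like (this is not our actual code; the urn:uuid: form and the helper names mint_graph_name and to_nquad are assumptions for illustration):

```python
import uuid


def mint_graph_name() -> str:
    """Return a fresh absolute IRI for a transient graph (hypothetical helper)."""
    # uuid4() is random, so no registry or document location is needed.
    return uuid.uuid4().urn  # has the form 'urn:uuid:<hex digits>'


def to_nquad(subject: str, predicate_iri: str, obj: str, graph_iri: str) -> str:
    """Format one N-Quads statement; subject and obj are passed pre-formatted."""
    return f"{subject} <{predicate_iri}> {obj} <{graph_iri}> ."


graph_name = mint_graph_name()
line = to_nquad("_:foo", "http://xmlns.com/foaf/0.1/knows", "_:bar", graph_name)
print(line)
```

The graph name is absolute from the moment it's minted, so the statement can be handed around without anyone needing to know where (or whether) the message was ever stored.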

- Steve

On 2013-02-13, at 22:11, Richard Cyganiak <richard@cyganiak.de> wrote:

> Manu,
> 
> PROPOSAL: Put @id on all graphs.
> 
> Why the aversion against simple and obvious solutions? You seem to consistently choose the path of greatest resistance.
> 
> Best,
> Richard
> 
> 
> 
> On 13 Feb 2013, at 19:50, Manu Sporny wrote:
> 
>> We had a conversation about using auto-generated fragment identifiers
>> for graph names during the call today. We have found a problem with
>> that solution - it's incompatible with RDF when the document doesn't
>> have a base IRI. In the case of the Web Payments work, the document
>> MUST NOT have a base IRI because the message is transient.
>> 
>> [Wed 12:34] <manu> cygri: Can you express a relative IRI in an RDF
>> serialization w/o a base? Is this valid: <#foo> foaf:knows _:bar . (if
>> there is no base set for the document?)
>> [Wed 12:34] <manu> cygri, gkellogg, markus: This is what we're thinking
>> right now - for RDF Dataset Normalization: Fragment identifiers may be
>> used to name graphs that do not have a name associated with them in the
>> model. If a name is generated for a graph, the prefix '#_graph:' MUST be
>> used and that document-local identifier MAY be changed by processing
>> algorithms such as the RDF Dataset Normalization Algorithm.
>> [Wed 12:34] <cygri> manu, it is valid, however the document has an
>> implicit base in this case
>> [Wed 12:35] <manu> cygri: What's the implicit base?
>> [Wed 12:35] <cygri> manu, short answer: the document location
>> [Wed 12:35] <manu> cygri: What happens if the document doesn't have a
>> location? It's a transient message. :)
>> [Wed 12:35] <cygri> the IRI RFC has a long section about it
>> [Wed 12:36] <cygri> in that case different implementations do different
>> things
>> [Wed 12:36] <cygri> (actually it's in the URI RFC - 3986)
>> [Wed 12:37] |<-- davidwood has left irc.w3.org:6665 (Ping timeout: 60
>> seconds)
>> [Wed 12:37] <cygri> manu, you say: "If a name is generated for a graph,
>> the prefix '#_graph:' MUST be used"
>> [Wed 12:37] <cygri> who is doing the generation in that case?
>> [Wed 12:37] <manu> cygri: Well, what I really want to know is if this is
>> okay to output in NQuads during RDF Dataset Normalization: _:foo
>> foaf:knows _:bar <#_graph:1> . ?
>> [Wed 12:37] <manu> cygri: Well, we have to ensure that those sorts of
>> graph names MUST be able to be renamed by the RDF Dataset Normalization
>> algorithm.
>> [Wed 12:38] <manu> also, we have use cases where it doesn't make sense
>> to have any sort of document base - everything is transient (as in a
>> financial message sent from point A to point B).
>> [Wed 12:38] <cygri> i would say that in JSON-LD, fragments of the shape
>> #_graph: are reserved for the algorithm
>> [Wed 12:39] <manu> cygri: Yes, but that wouldn't apply to just JSON-LD,
>> it would apply to RDF Dataset Normalization as well.
>> [Wed 12:39] <manu> If we choose that, all of RDF will have to use it.
>> [Wed 12:40] <gkellogg> I commonly output things such as <#_graph:1> in
>> Turtle, and count on the parser having a document base to make it absolute.
>> [Wed 12:40] <manu> and, we would generate the following NQuads (and it
>> would have to be viewed as valid - no base document): _:foo foaf:knows
>> _:bar <#_graph:1> .
>> [Wed 12:40] <gkellogg> I don't tend to do this in NTriples, though, or
>> in N-Quads, but I don't see why not.
>> [Wed 12:40] <manu> gkellogg: Yes, but in our case, we specifically don't
>> want a document base because there isn't one.
>> [Wed 12:41] <manu> gkellogg: My concern is more theoretical - is
>> <#_graph:1> valid RDF if there is no base document?
>> [Wed 12:41] <gkellogg> My processors, if they don't have a base, just
>> continue to use relative IRIs.
>> [Wed 12:41] <cygri> classic n-triples doesn't have relative IRIs, so you
>> need to write out the full ones. we talked about changing that but i'm
>> not sure where that went, so am not sure about n-quads
>> [Wed 12:41] <manu> gkellogg: Right, we can do that too - but is it valid?
>> [Wed 12:41] <cygri> however in turtle and rdf/xml you can simply write
>> relative IRIs in your doc, and not specify a base, and it will work
>> [Wed 12:41] <gkellogg> In a document it's fine, it's just when parsed
>> without a base that there is an issue.
>> [Wed 12:41] <gkellogg> Right.
>> [Wed 12:42] <cygri> i'm not sure what it means when you say, "if we do
>> that, all of RDF has to use that"
>> [Wed 12:42] <gkellogg> Most processors should be able to parse documents
>> without an explicit base. It is certainly done in testing all the time.
>> [Wed 12:42] <cygri> the issue seems to be JSON-LD specific because no
>> other syntax wants to have named graphs without explicit names
>> [Wed 12:42] <manu> cygri: We're defining the "RDF Dataset Normalization
>> Algorithm", not the "JSON-LD Normalization Algorithm"
>> [Wed 12:42] <gkellogg> I don't think RDF requires that IRIs be absolute,
>> does it?
>> [Wed 12:43] <cygri> gkellogg, the RDF data model requires IRIs to be
>> absolute
>> [Wed 12:43] <cygri> but that doesn't mean they have to be absolute in
>> surface syntaxes
>> [Wed 12:43] <cygri> it means if you want to know what RDF graph exactly
>> it is, you need a base
>> [Wed 12:43] <gkellogg> Well, then there is an issue for normalization?
>> [Wed 12:44] <cygri> you can define normalization in terms of relative IRIs
>> [Wed 12:44] <manu> our CTO: This is not inspiring confidence in the
>> decision to use fragment identifiers as auto-generated graph names.
>> [Wed 12:45] <gkellogg> Actually, I think it's fine; just use N-Quads as
>> the serialization format. The normalization can determine how to handle
>> base-less documents.
>> [Wed 12:46] <cygri> sorry, i have to run
>> [Wed 12:46] <manu> gkellogg: What does it do for base-less documents?
>> [Wed 12:46] <manu> cygri, no problem - thanks for the input.
>> [Wed 12:46] -->| davidwood (~Adium@public.cloak) has joined #rdf-wg
>> [Wed 12:46] <manu> gkellogg: base-less documents are invalid from an RDF
>> data model perspective.
>> [Wed 12:47] <gkellogg> N-Quads has no problem, you just need to figure
>> out what to do in the normalization algorithm. You probably don't
>> require that N-Quads express absolute IRIs in this case.
>> [Wed 12:47] <cygri> there's a note on relative IRIs in this section:
>> http://www.w3.org/TR/rdf11-concepts/#section-IRIs
>> [Wed 12:47] <gkellogg> Only as an abstract model, not a concrete model,
>> AFAIU
>> [Wed 12:47] <cygri> not sure if that helps
>> [Wed 12:47] <cygri> anyway, ttyl
>> [Wed 12:47] <manu> gkellogg: Yeah, we can do that - but it's invalid
>> RDF, isn't it? To flip the argument - you can express everything as
>> document-local using blank nodes.
>> [Wed 12:47] |<-- cygri has left irc.w3.org:6665 (cygri)
>> [Wed 12:49] <gkellogg> It's not syntactically invalid. It's just a
>> matter of how you go from concrete (N-Quads) to abstract (RDF-Concepts).
>> I either provide a base IRI to the parser, or I just continue to use
>> relative IRIs, which isn't a problem in practice.
>> [Wed 12:49] <gkellogg> What do you do with an RDFa document if you don't
>> have a document base?
>> [Wed 12:49] <manu> but there is no way to express everything as
>> document-local if you have two graphs... that is, RDF has no way to
>> express transient messages if there is more than one graph in the dataset.
>> [Wed 12:50] <manu> that's the crux of the problem, here. I think.
>> [Wed 12:50] -->| cygri (~cygri@public.cloak) has joined #rdf-wg
>> [Wed 12:50] <manu> gkellogg: Well, the problem here is that all the
>> normalizers need to agree on what to do in this case... but the "right
>> thing" to do is to not associate it with a base document, because there
>> isn't one for the transient message.
>> [Wed 12:50] <gkellogg> What I heard is that FragIDs are considered to be
>> document-local.
>> [Wed 12:50] <manu> yes, that's true, but that's not the issue. :)
>> [Wed 12:50] <manu> Here's the problem:
>> [Wed 12:51] <manu> In PaySwarm, we have many digitally signed messages
>> that are completely transient.
>> [Wed 12:51] <gkellogg> Why can't the normalization algorithm be written
>> to work on relative IRIs?
>> [Wed 12:51] <manu> That is, there is absolutely no base - there could
>> never be a base.
>> [Wed 12:51] <manu> because the message is transient.
>> [Wed 12:52] <gkellogg> Understood. Express it as N-Quads using relative
>> IRIs, and allow Normalization to work with relative IRIs.
>> [Wed 12:52] <manu> We then express that transient message in RDF in
>> order to digitally sign it... something like (and this is exactly what
>> would be signed): _:foo foaf:knows _:bar <#_graph:1> .
>> [Wed 12:52] <gkellogg> From concepts, that just means that the graph is
>> not "well-defined"
>> [Wed 12:53] <manu> (well, except "foaf:knows" would be an absolute IRI)
>> [Wed 12:53] <gkellogg> I'm not saying there are no absolute IRIs, I'm
>> just saying that you tolerate relative IRIs, and that normalization is
>> defined to work in the absence of an implicit base IRI.
>> [Wed 12:53] <manu> and technically, the NQuad output would be this:
>> _:c14n1 <http://xmlns.com/foaf/0.1/knows> _:c14n2 <#_graph:1> .
>> [Wed 12:53] <gkellogg> Right
>> [Wed 12:54] <manu> right, but I'm concerned that RDF WG members are
>> going to argue that the above is an invalid document if base isn't defined.
>> [Wed 12:55] <manu> (because relative IRIs are not allowed in the RDF
>> data model) ... or they're "not well-defined".
>> [Wed 12:55] <gkellogg> It's explicitly not an invalid document,
>> according to concepts. It just isn't a "well-defined" graph without it.
>> [Wed 12:55] <gkellogg> Doesn't mean it's invalid.
>> [Wed 12:55] <manu> Okay, but if we did this:
>> [Wed 12:55] |<-- cygri has left irc.w3.org:6665 (cygri)
>> [Wed 12:55] <manu> _:foo foaf:knows _:bar <urn:graph:1> .
>> [Wed 12:55] <manu> everything would be just fine.
>> [Wed 12:55] <manu> (from an RDF perspective)
>> [Wed 12:55] <gkellogg> Ues
>> [Wed 12:55] <gkellogg> Yes
>> [Wed 12:56] <manu> right, so, are fragment identifiers the correct
>> solution in this case?
>> [Wed 12:56] <gkellogg> Requires minting an IRI (or URN) scheme, which
>> there was some resistance to.
>> [Wed 12:56] <manu> because they lead to graphs that are not well defined.
>> [Wed 12:56] <gkellogg> I think using fragids is the direction of the
>> group, and IMO the right way to go.
>> [Wed 12:56] <manu> well, at least we don't end up with well-defined
>> graphs in the case where we mint a new IRI/URN scheme...
>> [Wed 12:57] <gkellogg> They're only not well-defined if you don't
>> provide a document base. You can define normalization to not require a
>> document base.
>> [Wed 12:57] |<-- davidwood has left irc.w3.org:6665 (Client closed
>> connection)
>> [Wed 12:57] <manu> seems very hacky to me, the solution isn't very
>> clean... too much uncertainty about what it means in RDF.
>> [Wed 12:57] <gkellogg> If you think about it, normalization is just a
>> concrete normalization step. The resulting graph will be well-defined if
>> it is used with a base IRI
>> [Wed 12:58] <manu> gkellogg: yes, but if you use a base IRI, none of the
>> digital signatures will work anymore - the data will be wrong.
>> [Wed 12:58] <manu> gkellogg: In order for the digital signature to work
>> out, you must specifically NOT use a base IRI.
>> [Wed 12:59] <gkellogg> The point is, it's fine if the dataset is not
>> well-defined. It can always be well-defined at a theoretical later date.
>> [Wed 12:59] <manu> or it could be defined in a way that breaks all the
>> digital signatures at a later date.
>> [Wed 12:59] <gkellogg> Just specify that in the algorithm.
>> [Wed 12:59] <gkellogg> Normalization MUST NOT use a base IRI to ground
>> the input document.
>> [Wed 13:01] <gkellogg> I'd say, write it up, send it to the RDF WG
>> mailing list, and see if someone raises objections. Given that it was
>> the direction given today, I think it's a reasonable way to go.
>> [Wed 13:02] <manu> gkellogg: yeah, will do that - thanks.
>> [Wed 13:08] <markus> manu: who creates the named graphs? i.e., where do
>> they come from?
>> [Wed 13:09] -->| davidwood (~Adium@public.cloak) has joined #rdf-wg
>> [Wed 13:10] <markus> I don't feel comfortable with minting frag IDs for
>> unlabeled named graphs at all
>> [Wed 13:12] <Zakim> SW_RDFWG()11:00AM has ended
>> [Wed 13:12] <Zakim> Attendees were
>> [Wed 13:12] <Zakim> Zakim-bot will be restarted in 3 minutes to recover
>> caller state; please save your agenda status. Apologies for the
>> inconvenience
>> [Wed 13:12] <markus> manu, gkellogg: actually there's another issue. if
>> you mint fragIds for unlabeled named graphs, you mint a fragId for the
>> blank node at the same time.. { "property": "this is a blank node that
>> is also a graph", "@graph": [ ... ] }
>> [Wed 13:13] <gkellogg> Yes, that's true no matter what we do.
>> [Wed 13:14] <markus> so it won't work AFAICS.. the only clean solution
>> is to require named graphs to be labeled with an IRI (since we are not
>> allowed to use bnodeIds)
>> [Wed 13:14] <markus> It's still not clear to me where the unlabeled
>> named graphs come from in the first place.. who is the creator of the
>> dataset containing them?
>> [Wed 13:18] |<-- davidwood has left irc.w3.org:6665 (Client closed
>> connection)
>> [Wed 13:18] * Zakim is departing
>> [Wed 13:18] |<-- Zakim has left irc.w3.org:6665 ("Leaving")
>> [Wed 13:22] -->| cygri (~cygri@public.cloak) has joined #rdf-wg
>> [Wed 13:23] |<-- SteveH has left irc.w3.org:6665 (SteveH)
>> [Wed 13:24] -->| Zakim (zakim@public.cloak) has joined #rdf-wg
>> [Wed 13:28] <manu> markus: The PaySwarm software creates the "unnamed"
>> named graphs when it needs to communicate with another peer on the
>> network. The message is purely transient, there is no base document.
>> [Wed 13:29] <manu> markus: requiring that all named graphs be labeled
>> with an IRI doesn't make sense at all to us - every message sent across
>> PaySwarm now needs to have a name associated with it? Why do that when
>> we can automatically generate a name?
>> [Wed 13:29] <markus> can't the software create an IRI that is guaranteed
>> to not collide?
>> [Wed 13:29] -->| davidwood (~Adium@public.cloak) has joined #rdf-wg
>> [Wed 13:30] <manu> Markus: Be more specific about the IRI that is being
>> created - is it something like this: http://example.com/data#graph1 or
>> is it something like this: graph:1 ?
>> [Wed 13:30] |<-- cygri has left irc.w3.org:6665 (Ping timeout: 60 seconds)
>> [Wed 13:31] <markus> I agree, it should be possible to have multiple
>> *un*named graphs.. but that's apparently not happening in this version
>> of RDF
>> [Wed 13:32] <manu> markus: The software can create an IRI, but that IRI
>> must have two very special properties to work for the digital signature
>> case: 1) It MUST be a document-local identifier that when expressed in
>> NQuads is valid in the RDF data model, 2) It must be able to be re-named
>> by the RDF Dataset Normalization Algorithm.
>> [Wed 13:32] <markus> well.. couldn't the payswarm software mint some
>> IRIs in a dedicated space.. e.g., http://payswarm.org/.../names/....
>> [Wed 13:32] <markus> what requires it to be document-local?
>> [Wed 13:32] <manu> markus: This isn't about PaySwarm - it's about RDF
>> Dataset Normalization - what does /that/ algorithm do... PaySwarm can do
>> anything it wants to, but we want to do something that is going to
>> eventually be standardized.
>> [Wed 13:32] <manu> markus: The message is transient, it has no document.
>> [Wed 13:32] <markus> if it isn't, there is no reason to re-name it
>> [Wed 13:33] <manu> markus: You /have/ to be able to rename it for the
>> RDF Dataset Normalization algorithm to work...
>> [Wed 13:33] <markus> In RDF datasets all named graphs are required to
>> have names, so the problem is not there but in JSON-LD (and in payswarm)
>> [Wed 13:33] <manu> the whole purpose of the algorithm is to re-name
>> anything that is document local in a very specific way.
>> [Wed 13:34] <markus> yes, but in RDF graph names are *not* document local
>> [Wed 13:34] <manu> that's exactly the problem!
>> [Wed 13:34] <markus> so you'll never ever have to rename them
>> [Wed 13:34] <markus> which brings us back to my previous question. what
>> requires them to be document-local?
>> [Wed 13:34] <manu> Markus, this is what you're suggesting: Transient
>> messages that are transmitted from point A to point B MUST be given names.
>> [Wed 13:35] <manu> Is that what you're asserting?
>> [Wed 13:35] <markus> no.. the graphs in those messages must be given names
>> [Wed 13:35] <manu> Why?
>> [Wed 13:35] <manu> I don't have to do that with any other transient
>> protocol I use.
>> [Wed 13:36] <manu> I definitely don't have to do that w/ JSON - so why
>> do I have to do that with RDF?
>> [Wed 13:36] <markus> well.. that's the underlying data model.. other
>> data models don't require IRIs at all e.g. We drop properties which are
>> not mapped to IRIs e.g.
>> [Wed 13:36] <markus> I see only one solution to that.. change the data model
>> [Wed 13:37] <manu> Yes, but the underlying data model is completely
>> flawed if I can't express messages transiently! :)
>> [Wed 13:37] <markus> JSON-LD's data model supports it but then obviously
>> you can't round-trip to RDF
>> [Wed 13:37] <manu> No, there are multiple solutions to this problem...
>> [Wed 13:38] <manu> The only thing we need to make sure is that the
>> auto-generated graph identifier 1) MUST be a document-local identifier
>> that when expressed in NQuads is valid in the RDF data model, 2) MUST be
>> able to be re-named by the RDF Dataset Normalization Algorithm.
>> [Wed 13:38] <markus> can you enumerate them?
>> [Wed 13:38] <manu> There are two possibilities here: <#_graph:1> and graph:1
>> [Wed 13:39] <manu> The first fails requirement #1
>> [Wed 13:39] <manu> The second passes both requirement #1 and #2
>> [Wed 13:39] <markus> 1) is impossible, because graph names MUST be
>> absolute IRIs in RDF (unless you change the RDF's data model)
>> [Wed 13:39] <manu> graph:1 is an absolute IRI :)
>> [Wed 13:40] <markus> I've never heard of document local IRIs.. the whole
>> point of IRIs is that they are global
>> [Wed 13:40] <markus> s/document local IRIs/document-local absolute IRIs/
>> [Wed 13:41] <markus> there might be a third option.. keep everything as
>> is but perform the RDF Dataset serialization algorithm on flattened JSON-LD
>> [Wed 13:42] <markus> data coming from RDF will never have bnodes as
>> graph names
>> [Wed 13:42] <markus> data coming from JSON-LD might, but that's all
>> handled within JSON-LD
>> [Wed 13:43] <markus> the byte-stream you sign would look slightly
>> different.. but who cares?
>> [Wed 13:43] <markus> since JSON-LD is a superset of RDF that would work
>> in all situations I can think of
>> [Wed 13:45] <markus> the only thing that wouldn't.. is to represent data
>> using bnodes in graph names in plain-RDF.. but that's due to a
>> limitation of RDF
>> [Wed 13:47] * manu is thinking about markus' suggestion.
>> [Wed 13:50] <manu> markus: I think you're wrong re: "the whole point of
>> IRIs is that they are global" - search RFC 3987 for the word "global" or
>> "universal" and you won't find it used in the way that you use it, IIRC.
>> [Wed 13:52] |<-- davidwood has left irc.w3.org:6665 (Client closed
>> connection)
>> [Wed 13:53] <markus> for these kind of things you should always look at
>> the URI RFC
>> [Wed 13:54] <markus> RFC 3986: "URIs have a global scope and are
>> interpreted consistently regardless of context, though the result of
>> that interpretation may be in relation to the end-user's context"
>> [Wed 13:54] <markus> I think that's quite clear
>> [Wed 13:55] <manu> global scope !== global identifier
>> [Wed 13:55] <manu> while the scope may be global (it is)
>> [Wed 13:55] <markus> ... "For example, "http://localhost/" has the same
>> interpretation for every user of that reference, even though the network
>> interface corresponding to "localhost" may be different for each
>> end-user: interpretation is independent of access"
>> [Wed 13:55] <manu> the end-users' context in this case is the document.
>> [Wed 13:55] <manu> and graph:1 is interpreted via that context.
>> [Wed 13:55] <markus> no.. it's an identifier that has a global scope
>> [Wed 13:56] <manu> markus: Yes, exactly the same case as localhost.
>> [Wed 13:56] <manu> localhost is always interpreted via your network.
>> [Wed 13:56] <manu> graph:1 is always interpreted via your JSON-LD processor.
>> [Wed 13:56] <manu> (and the JSON-LD processor chooses to interpret it as
>> document-local)
>> [Wed 13:56] <markus> no.. the interpretation (and that's what RDF is all
>> about) is the same.. it's the local machine
>> [Wed 13:56] <markus> accessing it will lead to different results
>> [Wed 13:57] <markus> thus "interpretation is independent of access"
>> [Wed 13:58] -->| dlongley (~dlongley@public.cloak) has joined #rdf-wg
>> [Wed 13:59] <dlongley> markus: manu let me know about the graph naming
>> discussion going on in here
>> [Wed 13:59] <dlongley> and your suggestion to normalize using JSON-LD as
>> the serialization
>> [Wed 14:00] <dlongley> the problem with that approach is that you
>> couldn't transmit the data you signed via another RDF serialization
>> [Wed 14:00] <dlongley> because it's a data model problem
>> [Wed 14:00] <dlongley> you signed data that can't be appropriately
>> represented in RDF
>> [Wed 14:00] <markus> yes, that's the whole point
>> [Wed 14:00] <dlongley> that isn't a solution to the problem
>> [Wed 14:01] <dlongley> particularly for payswarm... where RDFa is used
>> heavily as a serialization
>> [Wed 14:01] <markus> well in RDF you can't do it because no
>> document-local identifiers are allowed as graph names
>> [Wed 14:01] <dlongley> for previously signed graphs
>> [Wed 14:01] <markus> rdf is not a dataset syntax
>> [Wed 14:01] <markus> sorry, I meant RDFa
>> [Wed 14:01] <dlongley> if you generated some data with unnamed graphs
>> and then signed it using JSON-LD ...
>> [Wed 14:02] <dlongley> how could you represent the signed data using RDFa?
>> [Wed 14:02] <markus> you can't represent graphs at all in RDFa
>> [Wed 14:02] <dlongley> at this time you can't put named graphs in RDFa
>> [Wed 14:02] <dlongley> but that will likely not always be the case
>> [Wed 14:02] <markus> it's a graph syntax, not a dataset syntax
>> [Wed 14:02] <dlongley> ok, red herring.
>> [Wed 14:02] -->| davidwood (~Adium@public.cloak) has joined #rdf-wg
>> [Wed 14:02] <dlongley> pick a dataset syntax.
>> [Wed 14:03] <dlongley> now you can't transmit the data using that syntax.
>> [Wed 14:03] <manu> markus: RDFa /will/ be a Dataset syntax eventually -
>> within a couple of years.
>> [Wed 14:03] <markus> manu: with bnodes as graph names?
>> [Wed 14:04] <manu> markus: With dataset-local identifiers, hopefully, yes.
>> [Wed 14:04] <markus> the point is, in any RDF dataset syntax there won't
>> exist any named graphs without an absolute IRI
>> [Wed 14:04] <markus> at least not before the RDF data model is changed
>> [Wed 14:04] <manu> markus: Why do you think that graph:1 isn't an
>> absolute IRI?
>> [Wed 14:05] <markus> it is an absolute IRI
>> [Wed 14:05] <manu> Just because it's a dataset-local identifier doesn't
>> mean it isn't also an absolute IRI (stretching definitions here, I know)
>> [Wed 14:05] <manu> Okay, then the RDF data model doesn't need to change?
>> [Wed 14:05] <markus> we had that discussion before.. absolute IRIs have
>> global scope, see RFC3986
>> [Wed 14:06] <manu> you can have global scope and interpret the
>> identifier based on a local context (the document context, in this example)
>> [Wed 14:06] <dlongley> "An identifier embodies the information required
>> to distinguish what is being identified from all other things within its
>> scope of identification."
>> [Wed 14:06] <markus> what I'm saying is that if you stay within RDF you
>> won't have a problem normalizing/signing since no unlabeled named graphs
>> exist
>> [Wed 14:07] <dlongley> the scope of identification for "graph:" is the
>> local document
>> [Wed 14:07] <manu> and in this case, the scope is the document itself.
>> [Wed 14:07] <markus> the problem arises since you wanna create named
>> graphs but don't want to name them
>> [Wed 14:07] |<-- davidwood has left irc.w3.org:6665 (Client closed
>> connection)
>> [Wed 14:07] <manu> markus: No, we never said we don't want to name them
>> /when they're serialized to NQuads/
>> [Wed 14:07] <manu> we just don't want to name them before that.
>> [Wed 14:08] <markus> dlongley: manu and I discussed this before. RFC
>> 3986: "URIs have a global scope and are interpreted consistently
>> regardless of context, though the result of that interpretation may be
>> in relation to the end-user's context"
>> [Wed 14:08] <manu> naming them is a part of the RDF Dataset
>> Normalization Algorithm.
>> [Wed 14:08] <markus> .. "For example, "http://localhost/" has the same
>> interpretation for every user of that reference, even though the network
>> interface corresponding to "localhost" may be different for each
>> end-user: interpretation is independent of access"
>> [Wed 14:08] <markus> the interpretation (and that's what RDF is all
>> about) is the same.. it's the local machine
>> [Wed 14:08] <markus> accessing it will lead to different results
>> [Wed 14:08] <markus> thus "interpretation is independent of access"
>> [Wed 14:08] <manu> markus: Yes, exactly
>> [Wed 14:09] <manu> you are interpreting it via the JSON-LD processor,
>> not "The Web"
>> [Wed 14:09] <markus> manu: I disagree.. naming them is part of JSON-LD
>> to RDF transformation
>> [Wed 14:10] <markus> that's also the reason why we currently can't
>> roundtrip that kind of data
>> [Wed 14:10] <markus> because you can't represent it in RDF
>> [Wed 14:10] <manu> What can't you represent in RDF?
>> [Wed 14:11] <markus> I know I asked that already some time ago.. but are
>> you really dealing with datasets in payswarm or with graphs?
>> [Wed 14:11] <markus> I still haven't looked at the specs
>> [Wed 14:11] <markus> but it seems that the graph name isn't important
>> [Wed 14:12] <manu> We are deciding to throw an error if somebody tries
>> to use something other than the default graph for now, because I can't
>> imagine this problem will be solved soon.
>> [Wed 14:12] <markus> are there multiple graphs that you need to sign?
>> [Wed 14:12] <manu> however, if you are to do digital signatures
>> correctly (and represent them in RDF correctly), you should use named
>> graphs and sign the named graph.
>> [Wed 14:12] <manu> and yes, there may be multiple graphs that we need to
>> sign.
>> [Wed 14:12] <markus> in one document
>> [Wed 14:12] <markus> ?
>> [Wed 14:12] <manu> yep
>> [Wed 14:13] <dlongley> we need to sign arbitrary JSON-LD.
>> [Wed 14:13] <manu> for example: a multi-party digital contract that is
>> counter-signed by the PaySwarm Authority.
>> [Wed 14:13] <dlongley> if JSON-LD supports it, we need to be able to
>> sign it.
>> [Wed 14:13] <markus> ok.. what if we would drop support for bnode IDs as
>> graph names in JSON-LD?
>> [Wed 14:14] <dlongley> graphs have to be given names in order to
>> normalize them
>> [Wed 14:14] <markus> have you a flowchart or something where I could
>> quickly get an idea of the data-flows between the participants?
>> [Wed 14:14] <manu> markus: We can't use BNode IDs as graph names in
>> JSON-LD, right?
>> [Wed 14:14] <markus> we can, currently
>> [Wed 14:14] <manu> markus: Ha - no, unfortunately not right now.
>> [Wed 14:15] <markus> but it doesn't round-trip to RDF
>> [Wed 14:15] <manu> markus: Well, the RDF WG isn't going to let that fly
>> because it doesn't match the definition of a blank node identifier.
>> [Wed 14:15] <manu> at least, that's what I think a LC comment is going to be
>> [Wed 14:15] <dlongley> here's what matters: in payswarm, we must be able
>> to sign arbitrary JSON-LD documents.
>> [Wed 14:15] <manu> you can't name graphs using bnode identifiers.
>> [Wed 14:15] <dlongley> if someone can put an unnamed graph into a
>> JSON-LD document, then that's a problem.
>> [Wed 14:15] <markus> well.. when I presented the data model some time
>> ago and enumerated the differences no one seemed to object
>> [Wed 14:15] <markus> they accepted that JSON-LD will be a superset of RDF
>> [Wed 14:15] <manu> (and digital signatures has almost nothing to do with
>> blank node identifiers for graph names, btw)
>> [Wed 14:16] <dlongley> there are 2 solutions: disallow unnamed graphs in
>> JSON-LD, come up with a way to name the unnamed graphs using
>> document-local identifiers that works for RDF.
>> [Wed 14:16] <markus> that's what I proposed to manu earlier.. disallow
>> unnamed graphs in JSON-LD
>> [Wed 14:16] <dlongley> yes, and that's not the preferred solution
>> [Wed 14:16] <dlongley> it's the fallback.
>> [Wed 14:16] <manu> markus: I didn't object because I thought blank node
>> identifiers could be used to name graphs for RDF (and that they were
>> updating the spec to reflect that).
>> [Wed 14:17] <dlongley> it would be much nicer if we didn't force people
>> to name their unnamed graphs.
>> [Wed 14:17] <markus> manu: I'm talking about half an hour ago :-P
>> [Wed 14:17] <manu> I think it's ridiculous to tell people to include
>> syntax that is completely unnecessary. :)
>> [Wed 14:17] <manu> Why force people to name graphs when they don't need to?
>> [Wed 14:17] <markus> dlongley: completely agree.. but that's apparently
>> not something the RDF WG is going to accept
>> [Wed 14:17] <dlongley> there may be a case where it also generates an
>> issue for comparing two datasets
>> [Wed 14:17] <manu> Right now, the answer is: Because the RDF data model
>> says so - which is a really bad argument.
>> [Wed 14:18] <manu> in fact, I outright reject that argument.
>> [Wed 14:18] <markus> yes, but you can't have both.. either you change
>> the RDF data model (which won't happen).. or you accept that the data
>> won't round-trip
>> [Wed 14:18] <dlongley> i'm not convinced that graph:1 won't work.
>> [Wed 14:19] <manu> I think the real reason is that nobody in the RDF WG
>> believes that we'll come to a consensus on this and that the group is
>> exhausted after discussing the topic. There is no desire to address the
>> problem.
>> [Wed 14:19] <dlongley> i'm still trying to wrap my mind around it.
>> [Wed 14:19] <manu> markus: Yes, I don't see why graph:1 can't work, and
>> be compatible with the RDF 1.1 Concepts/Data Model
>> [Wed 14:19] <markus> it works.. but you are automatically creating
>> *global* identifiers.. nothing is there to prevent collisions..
>> [Wed 14:19] <dlongley> i'm not convinced of that.
>> [Wed 14:20] <manu> I do see why #_graph:1 is problematic (it's not valid
>> for transient messages)
>> [Wed 14:20] <dlongley> that's what i'm trying to wrap my mind around.
>> [Wed 14:20] <dlongley> the analogy of "localhost" having a "global
>> meaning" doesn't necessarily preclude the use case here
>> [Wed 14:20] <markus> graph:1 is the same as minting
>> http://payswarm.org/graph/1
>> [Wed 14:20] <dlongley> "graph:1" has a global meaning ...
>> [Wed 14:21] <markus> yes.. just as http://payswarm.org/graph/1
>> [Wed 14:21] <dlongley> it's an identifier for the first graph in the
>> document you're looking at.
>> [Wed 14:21] <dlongley> it always means that.
>> [Wed 14:21] * manu nods.
>> [Wed 14:21] <dlongley> now... if you go and actually look at its data...
>> [Wed 14:21] <markus> :-)
>> [Wed 14:21] <dlongley> then you're talking about the result of the
>> end-user's interpretation.
>> [Wed 14:21] <dlongley> and that can change.
>> [Wed 14:21] <dlongley> so, to me, that seems to work for RFC 3986
>> [Wed 14:22] <markus> and if someone else makes statements about
>> http://payswarm.org/graph/1 which conflict with your statements?
>> [Wed 14:22] <markus> say, you put it in a quad store?
>> [Wed 14:22] <markus> sorry.. same applies to graph:1
>> [Wed 14:22] <dlongley> you mean like if someone says: "localhost/foo"
>> and i don't have that on my machine?
>> [Wed 14:23] <dlongley> seems like the same situation to me.
>> [Wed 14:23] <markus> no.. that's accessing it.. not interpreting it
>> [Wed 14:23] <markus> those are two different things
>> [Wed 14:23] <dlongley> you're saying that someone can make a statement
>> about localhost/foo ...
>> [Wed 14:23] <markus> and have been debated to death (httpRange-14)
>> [Wed 14:23] <dlongley> and it won't conflict with my own statements?
>> [Wed 14:23] <dlongley> ever?
>> [Wed 14:23] <markus> yes, whatever statement he likes
>> [Wed 14:24] <markus> it will conflict with yours
>> [Wed 14:24] <dlongley> right...
>> [Wed 14:24] <markus> because URIs are global
>> [Wed 14:24] <dlongley> and it's not a problem
>> [Wed 14:24] <dlongley> you know what "localhost" means.
>> [Wed 14:24] <dlongley> you know that "localhost", when accessed, means
>> your local machine, nothing else
>> [Wed 14:24] <dlongley> how is that any different for "graph:1"?
>> [Wed 14:24] <markus> simplest thing.. import two datasets using those
>> graph names into an RDF quad store
>> [Wed 14:25] <markus> then do a SPARQL query for that graph name
>> [Wed 14:25] <dlongley> localhost/1 and localhost/2
>> [Wed 14:25] <markus> what will you get back?
>> [Wed 14:25] <dlongley> everything that matches those graph names
>> [Wed 14:25] <dlongley> just like you would with localhost
>> [Wed 14:25] <markus> all statements made about every "first graph in a
>> document" ever imported
>> [Wed 14:26] <markus> exactly
>> [Wed 14:26] <dlongley> what happens when someone uploads a dataset to a
>> quad store that has a bunch of localhost URIs in it?
>> [Wed 14:26] <markus> exactly the same thing
>> [Wed 14:26] <dlongley> right
>> [Wed 14:27] <dlongley> you are losing the "dataset"
>> [Wed 14:27] <dlongley> when you do that.
>> [Wed 14:27] <markus> you are losing the local scope you need
>> [Wed 14:27] <dlongley> yeah, you can't use a quad store to solve that
>> problem.
>> [Wed 14:28] <markus> but to illustrate it
>> [Wed 14:28] <dlongley> the problem here is that a graph isn't a node.
>> [Wed 14:28] <dlongley> which, IMO, is the wrong way to go.
>> [Wed 14:28] <markus> if bnodeIds would be allowed, they would be changed
>> during the import.. so that clashes would never occur
>> [Wed 14:29] <dlongley> right
>> [Wed 14:29] <markus> yes, I completely agree with that.. and I'm not
>> happy with the RDF WG's decision about that
>> [Wed 14:29] <dlongley> there's already a requirement that you can do
>> that with "graph:1"
>> [Wed 14:29] <dlongley> otherwise it doesn't work anyway
>> [Wed 14:29] <dlongley> so a quad store that understood "graph:1" would do so
>> [Wed 14:30] <dlongley> but, i understand how that is no longer analogous
>> to localhost.
>> [Wed 14:30] <markus> IRIs are opaque.. they are global identifiers
>> [Wed 14:30] <dlongley> right
>> [Wed 14:31] <dlongley> for the same reasons (w/quad store storage)
>> <#_graph:1> won't work.
>> [Wed 14:31] <markus> exactly
>> [Wed 14:31] <dlongley> which means there is no solution other than
>> forcing people to name their graphs
>> [Wed 14:31] <markus> so without introducing bnodes as graph names I
>> can't see a solution
>> [Wed 14:31] <markus> yes.. at least I can't see any
>> [Wed 14:32] <markus> or you live with the fact that it won't round-trip
>> to RDF
>> [Wed 14:32] <dlongley> well, we can't do that
>> [Wed 14:32] <dlongley> we will have to start rejecting data
>> [Wed 14:32] <dlongley> which may be unexpected
>> [Wed 14:32] <dlongley> (will be unexpected)
>> [Wed 14:33] <markus> no, you can accept all data from RDF.. but you
>> can't output it in RDF
>> [Wed 14:33] <dlongley> well, we have to be able to normalize
>> [Wed 14:33] <markus> RDF -> JSON-LD works without problems.. the other
>> direction doesn't.. same as for bnodes in properties
>> [Wed 14:33] <dlongley> and the data we normalize must be compatible with RDF
>> [Wed 14:33] <dlongley> right
>> [Wed 14:34] <markus> then there's no way I see without requiring named
>> graphs to be named (with an absolute IRI)
>> [Wed 14:34] <dlongley> yeah, i can't think of another solution
>> [Wed 14:36] <dlongley> ugh, it seems so easily solved by allowing graph
>> names to be bnode IDs.
>> [Wed 14:36] <markus> nevertheless, I think we should keep supporting
>> bnode IDs as graph names in JSON-LD (but mark the feature as at-risk)
>> [Wed 14:36] <dlongley> i would like to know the drawbacks to that approach
>> [Wed 14:36] <markus> yes.. everything is already there
>> [Wed 14:36] <dlongley> the practical ones ...
>> [Wed 14:36] <markus> ask the RDF WG :-P
>> [Wed 14:37] <markus> or the SPARQL guys
>> [Wed 14:37] <dlongley> well, i've been passed along the information that
>> there isn't a practical drawback, it is a definition issue
>> [Wed 14:37] <dlongley> seems like it would work fine for quad stores and
>> sparql
>> [Wed 14:38] <dlongley> anyway, i've got to get back to doing other
>> stuff, thanks for the discussion.
>> [Wed 14:38] <markus> I think so, yes.. I'm not sure about the
>> implications on the semantics but AFAIK no semantics have been defined
>> for named graphs
>> 
>> -- manu
>> 
>> -- 
>> Manu Sporny (skype: msporny, twitter: manusporny, G+: +Manu Sporny)
>> President/CEO - Digital Bazaar, Inc.
>> blog: Aaron Swartz, PaySwarm, and Academic Journals
>> http://manu.sporny.org/2013/payswarm-journals/
>> 
> 
> 
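
To illustrate the quad store point made in the log above (import two datasets that both use "graph:1", then query for that name), here is a minimal sketch. It assumes a hypothetical in-memory store: ToyQuadStore, import_dataset, query_graph and demo are made-up names for illustration, not any real library's API, and the ex: and foaf: triples are placeholder data. It also only relabels blank node graph names, not blank nodes inside triples, which a real store would have to handle too.

```python
from collections import defaultdict
from itertools import count


class ToyQuadStore:
    """Hypothetical in-memory quad store, for illustration only."""

    def __init__(self):
        self.graphs = defaultdict(set)   # graph name -> set of triples
        self._fresh = count(1)           # source of store-local bnode labels

    def import_dataset(self, dataset):
        """Import a dataset given as {graph name: [triples]}."""
        relabel = {}  # bnode labels are scoped to this one import
        for name, triples in dataset.items():
            if name.startswith("_:"):
                # Assumption: bnode graph names are allowed and get a fresh
                # store-local label, so they can never clash across imports.
                if name not in relabel:
                    relabel[name] = f"_:g{next(self._fresh)}"
                name = relabel[name]
            # IRIs are opaque global identifiers: same name, same graph.
            self.graphs[name].update(triples)

    def query_graph(self, name):
        """Return every triple stored under the given graph name."""
        return set(self.graphs.get(name, ()))


def demo(graph_name):
    """Import two unrelated documents that both use graph_name.

    Returns the sorted sizes of the graphs the store ends up with.
    """
    store = ToyQuadStore()
    store.import_dataset({graph_name: [("ex:alice", "foaf:knows", "ex:bob")]})
    store.import_dataset({graph_name: [("ex:carol", "foaf:knows", "ex:dave")]})
    return sorted(len(triples) for triples in store.graphs.values())


print(demo("graph:1"))  # IRI name: both documents land in one graph
print(demo("_:b0"))     # bnode name: relabelled per import, graphs stay apart
```

With "graph:1" the store ends up with a single graph holding both documents' triples, which is the loss of local scope discussed in the log. With a blank node label each import gets its own graph, which is why bnode graph names would avoid the clash.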

-- 
Steve Harris
Experian
+44 20 3042 4132
Registered in England and Wales 653331 VAT # 887 1335 93
80 Victoria Street, London, SW1E 5JL

Received on Thursday, 14 February 2013 11:49:34 UTC