Problem with auto-generated fragment IDs for graph names from Manu Sporny on 2013-02-13 (public-linked-json@w3.org from February 2013)

From: Manu Sporny <msporny@digitalbazaar.com>
Date: Wed, 13 Feb 2013 14:50:01 -0500
To: RDF WG <public-rdf-wg@w3.org>, Linked JSON <public-linked-json@w3.org>
Message-ID: <511BEE69.40806@digitalbazaar.com>
We had a conversation about using auto-generated fragment identifiers
for graph names during the call today. We have found a problem with
that solution - it's incompatible with RDF when the document doesn't
have a base IRI. In the case of the Web Payments work, the document
MUST NOT have a base IRI because the message is transient.

[Wed 12:34] <manu> cygri: Can you express a relative IRI in an RDF
serialization w/o a base? Is this valid: <#foo> foaf:knows _:bar . (if
there is no base set for the document?)
[Wed 12:34] <manu> cygri, gkellogg, markus: This is what we're thinking
right now - for RDF Dataset Normalization: Fragment identifiers may be
used to name graphs that do not have a name associated with them in the
model. If a name is generated for a graph, the prefix '#_graph:' MUST be
used and that document-local identifier MAY be changed by processing
algorithms such as the RDF Dataset Normalization Algorithm.
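A minimal Python sketch of the proposal above, assuming graphs arrive as (name, triples) pairs with None marking an unnamed graph. The helper names and the content-sorted relabeling are illustrative assumptions, not the RDF Dataset Normalization Algorithm; the point is only that names carrying the reserved '#_graph:' prefix can be minted freely and later renamed deterministically, while real IRIs are left alone.

```python
import itertools

GRAPH_PREFIX = "#_graph:"  # reserved prefix from the proposal


def name_unnamed_graphs(graphs):
    """Give every unnamed graph (name is None) a '#_graph:N' name."""
    counter = itertools.count(1)
    return [
        (name if name is not None else f"{GRAPH_PREFIX}{next(counter)}", triples)
        for name, triples in graphs
    ]


def is_renamable(name):
    """Only names with the reserved prefix MAY be changed by normalization."""
    return name.startswith(GRAPH_PREFIX)


def rename_for_normalization(named):
    """Relabel generated names in an order that depends only on graph content.

    Simplification (assumption): triples are tuples of strings with no blank
    nodes and no references to the generated graph names themselves.
    """
    generated = sorted(
        (sorted(triples), name) for name, triples in named if is_renamable(name)
    )
    mapping = {old: f"{GRAPH_PREFIX}c14n{i}" for i, (_, old) in enumerate(generated, start=1)}
    return [(mapping.get(name, name), triples) for name, triples in named]


# Two inputs that differ only in the order of their unnamed graphs.
a = name_unnamed_graphs([(None, [("ex:a", "ex:p", "ex:b")]), (None, [("ex:c", "ex:p", "ex:d")])])
b = name_unnamed_graphs([(None, [("ex:c", "ex:p", "ex:d")]), (None, [("ex:a", "ex:p", "ex:b")])])
print(sorted(rename_for_normalization(a)) == sorted(rename_for_normalization(b)))  # True
```

Before renaming, the same content sits under different '#_graph:N' names in the two inputs; after renaming, both inputs agree, which is what a signature over normalized output needs.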
[Wed 12:34] <cygri> manu, it is valid, however the document has an
implicit base in this case
[Wed 12:35] <manu> cygri: What's the implicit base?
[Wed 12:35] <cygri> manu, short answer: the document location
[Wed 12:35] <manu> cygri: What happens if the document doesn't have a
location? It's a transient message. :)
[Wed 12:35] <cygri> the IRI RFC has a long section about it
[Wed 12:36] <cygri> in that case different implementations do different
things
[Wed 12:36] <cygri> (actually it's in the URI RFC - 3986)
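cygri's point can be seen with Python's standard urllib.parse.urljoin, which follows RFC 3986 reference resolution: a fragment-only reference like #_graph:1 only becomes absolute once a base is supplied. The base URL below is made up, and resolve (which simply keeps the reference relative when there is no base) and is_absolute are hypothetical helpers for illustration.

```python
import re
from urllib.parse import urljoin

# RFC 3986: an absolute reference starts with a scheme followed by ':'.
SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.\-]*:")


def resolve(reference, base=None):
    """Resolve against a base if there is one; otherwise keep it relative."""
    return urljoin(base, reference) if base else reference


def is_absolute(iri):
    return SCHEME.match(iri) is not None


with_base = resolve("#_graph:1", "http://example.com/messages/42")
without_base = resolve("#_graph:1")
print(with_base, is_absolute(with_base))        # http://example.com/messages/42#_graph:1 True
print(without_base, is_absolute(without_base))  # #_graph:1 False
```

For a transient message there is no value to pass as base, so the name stays relative.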
[Wed 12:37] |<-- davidwood has left irc.w3.org:6665 (Ping timeout: 60
seconds)
[Wed 12:37] <cygri> manu, you say: "If a name is generated for a graph,
the prefix 'irc://irc.w3.org/#_graph:' MUST be used"
[Wed 12:37] <cygri> who is doing the generation in that case?
[Wed 12:37] <manu> cygri: Well, what I really want to know is if this is
okay to output in NQuads during RDF Dataset Normalization: _:foo
foaf:knows _:bar <#_graph:1> . ?
[Wed 12:37] <manu> cygri: Well, we have to ensure that those sorts of
graph names MUST be able to be renamed by the RDF Dataset Normalization
algorithm.
[Wed 12:38] <manu> also, we have use cases where it doesn't make sense
to have any sort of document base - everything is transient (as in a
financial message sent from point A to point B).
[Wed 12:38] <cygri> i would say that in JSON-LD, fragments of the shape
#_graph: are reserved for the algorithm
[Wed 12:39] <manu> cygri: Yes, but that wouldn't apply to just JSON-LD,
it would apply to RDF Dataset Normalization as well.
[Wed 12:39] <manu> If we choose that, all of RDF will have to use it.
[Wed 12:40] <gkellogg> I commonly output things such as <#_graph:1> in
Turtle, and count on the parser having a document base to make it absolute.
[Wed 12:40] <manu> and, we would generate the following NQuads (and it
would have to be viewed as valid - no base document): _:foo foaf:knows
_:bar <#_graph:1> .
[Wed 12:40] <gkellogg> I don't tend to do this in NTriples, though, or
in N-Quads, but I don't see why not.
[Wed 12:40] <manu> gkellogg: Yes, but in our case, we specifically don't
want a document base because there isn't one.
[Wed 12:41] <manu> gkellogg: My concern is more theoretical - is
<#_graph:1> valid RDF if there is no base document?
[Wed 12:41] <gkellogg> My processors, if they don't have a base, just
continue to use relative IRIs.
[Wed 12:41] <cygri> classic n-triples doesn't have relative IRIs, so you
need to write out the full ones. we talked about changing that but i'm
not sure where that went, so am not sure about n-quads
[Wed 12:41] <manu> gkellogg: Right, we can do that too - but is it valid?
[Wed 12:41] <cygri> however in turtle and rdf/xml you can simply write
relative IRIs in your doc, and not specify a base, and it will work
[Wed 12:41] <gkellogg> In a document it's fine, it's just when parsed
without a base that there is an issue.
[Wed 12:41] <gkellogg> Right.
[Wed 12:42] <cygri> i'm not sure what it means when you say, "if we do
that, all of RDF has to use that"
[Wed 12:42] <gkellogg> Most processors should be able to parse documents
without an explicit base. It is certainly done in testing all the time.
[Wed 12:42] <cygri> the issue seems to be JSON-LD specific because no
other syntax wants to have named graphs without explicit names
[Wed 12:42] <manu> cygri: We're defining the "RDF Dataset Normalization
Algorithm", not the "JSON-LD Normalization Algorithm"
[Wed 12:42] <gkellogg> I don't think RDF requires that IRIs be absolute,
does it?
[Wed 12:43] <cygri> gkellogg, the RDF data model requires IRIs to be
absolute
[Wed 12:43] <cygri> but that doesn't mean they have to be absolute in
surface syntaxes
[Wed 12:43] <cygri> it means if you want to know what RDF graph exactly
it is, you need a base
[Wed 12:43] <gkellogg> Well, then there is an issue for normalization?
[Wed 12:44] <cygri> you can define normalization in terms of relative IRIs
[Wed 12:44] <manu> our CTO: This is not inspiring confidence in the
decision to use fragment identifiers as auto-generated graph names.
[Wed 12:45] <gkellogg> Actually, I think it's fine; just use N-Quads as
the serialization format. The normalization can determine how to handle
base-less documents.
[Wed 12:46] <cygri> sorry, i have to run
[Wed 12:46] <manu> gkellogg: What does it do for base-less documents?
[Wed 12:46] <manu> cygri, no problem - thanks for the input.
[Wed 12:46] -->| davidwood (~Adium@public.cloak) has joined #rdf-wg
[Wed 12:46] <manu> gkellogg: base-less documents are invalid from an RDF
data model perspective.
[Wed 12:47] <gkellogg> N-Quads has no problem, you just need to figure
out what to do in the normalization algorithm. You probably don't
require that N-Quads express absolute IRIs in this case.
[Wed 12:47] <cygri> there's a note on relative IRIs in this section:
http://www.w3.org/TR/rdf11-concepts/#section-IRIs
[Wed 12:47] <gkellogg> Only as an abstract model, not a concrete model,
AFAIU
[Wed 12:47] <cygri> not sure if that helps
[Wed 12:47] <cygri> anyway, ttyl
[Wed 12:47] <manu> gkellogg: Yeah, we can do that - but it's invalid
RDF, isn't it? To flip the argument - you can express everything as
document-local using blank nodes.
[Wed 12:47] |<-- cygri has left irc.w3.org:6665 (cygri)
[Wed 12:49] <gkellogg> It's not syntactically invalid. It's just a
matter of how you go from concrete (N-Quads) to abstract (RDF-Concepts).
I either provide a base IRI to the parser, or I just continue to use
relative IRIs, which isn't a problem in practice.
[Wed 12:49] <gkellogg> What do you do with an RDFa document if you don't
have a document base?
[Wed 12:49] <manu> but there is no way to express everything as
document-local if you have two graphs... that is, RDF has no way to
express transient messages if there is more than one graph in the dataset.
[Wed 12:50] <manu> that's the crux of the problem, here. I think.
[Wed 12:50] -->| cygri (~cygri@public.cloak) has joined #rdf-wg
[Wed 12:50] <manu> gkellogg: Well, the problem here is that all the
normalizers need to agree on what to do in this case... but the "right
thing" to do is to not associate it with a base document, because there
isn't one for the transient message.
[Wed 12:50] <gkellogg> What I heard is that FragIDs are considered to be
document-local.
[Wed 12:50] <manu> yes, that's true, but that's not the issue. :)
[Wed 12:50] <manu> Here's the problem:
[Wed 12:51] <manu> In PaySwarm, we have many digitally signed messages
that are completely transient.
[Wed 12:51] <gkellogg> Why can't the normalization algorithm be written
to work on relative IRIs?
[Wed 12:51] <manu> That is, there is absolutely no base - there could
never be a base.
[Wed 12:51] <manu> because the message is transient.
[Wed 12:52] <gkellogg> Understood. Express it as N-Quads using relative
IRIs, and allow Normalization to work with relative IRIs.
[Wed 12:52] <manu> We then express that transient message in RDF in
order to digitally sign it... something like (and this is exactly what
would be signed): _:foo foaf:knows _:bar <#_graph:1> .
[Wed 12:52] <gkellogg> From concepts, that just means that the graph is
not "well-defined"
[Wed 12:53] <manu> (well, except foaf:knows would be an absolute IRI)
[Wed 12:53] <gkellogg> I'm not saying there are no absolute IRIs, I'm
just saying that you tolerate relative IRIs, and that normalization is
defined to work in the absence of an implicit base IRI.
[Wed 12:53] <manu> and technically, the NQuad output would be this:
_:c14n1 <http://xmlns.com/foaf/0.1/knows> _:c14n2 <#_graph:1> .
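manu's N-Quads line, with foaf:knows written out in full as http://xmlns.com/foaf/0.1/knows, can be produced by a tiny serializer. This Python sketch uses hypothetical helper names, writes blank nodes as '_:' strings and does not handle literals. It also reports which IRI terms are still relative, which is what a consumer expecting only absolute IRIs in N-Quads would reject.

```python
import re

SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.\-]*:")  # RFC 3986 scheme + ':'


def term(t):
    """Blank nodes ('_:...') are written as-is; everything else as <IRI>."""
    return t if t.startswith("_:") else f"<{t}>"


def nquad(s, p, o, g=None):
    terms = [s, p, o] + ([g] if g is not None else [])
    return " ".join(term(t) for t in terms) + " ."


def relative_terms(*terms):
    """IRI terms that lack a scheme, i.e. are relative references."""
    return [t for t in terms if t is not None and not t.startswith("_:") and not SCHEME.match(t)]


quad = ("_:c14n1", "http://xmlns.com/foaf/0.1/knows", "_:c14n2", "#_graph:1")
print(nquad(*quad))           # _:c14n1 <http://xmlns.com/foaf/0.1/knows> _:c14n2 <#_graph:1> .
print(relative_terms(*quad))  # ['#_graph:1']
```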
[Wed 12:53] <gkellogg> Right
[Wed 12:54] <manu> right, but I'm concerned that RDF WG members are
going to argue that the above is an invalid document if base isn't defined.
[Wed 12:55] <manu> (because relative IRIs are not allowed in the RDF
data model) ... or they're "not well-defined".
[Wed 12:55] <gkellogg> It's explicitly not an invalid document,
according to concepts. It just isn't a "well-defined" graph without it.
[Wed 12:55] <gkellogg> Doesn't mean it's invalid.
[Wed 12:55] <manu> Okay, but if we did this:
[Wed 12:55] |<-- cygri has left irc.w3.org:6665 (cygri)
[Wed 12:55] <manu> _:foo foaf:knows _:bar <urn:graph:1> .
[Wed 12:55] <manu> everything would be just fine.
[Wed 12:55] <manu> (from an RDF perspective)
[Wed 12:55] <gkellogg> Yes
[Wed 12:56] <manu> right, so, are fragment identifiers the correct
solution in this case?
[Wed 12:56] <gkellogg> Requires minting an IRI (or URN) scheme, which
there was some resistance to.
[Wed 12:56] <manu> because they lead to graphs that are not well defined.
[Wed 12:56] <gkellogg> I think using fragids is the direction of the
group, and IMO the right way to go.
[Wed 12:56] <manu> well, at least we do end up with well-defined
graphs in the case where we mint a new IRI/URN scheme...
[Wed 12:57] <gkellogg> They're only not well-defined if you don't
provide a document base. You can define normalization to not require a
document base.
[Wed 12:57] |<-- davidwood has left irc.w3.org:6665 (Client closed
connection)
[Wed 12:57] <manu> seems very hacky to me, the solution isn't very
clean... too much uncertainty about what it means in RDF.
[Wed 12:57] <gkellogg> If you think about it, normalization is just a
concrete normalization step. The resulting graph will be well-defined if
it is used with a base IRI
[Wed 12:58] <manu> gkellogg: yes, but if you use a base IRI, none of the
digital signatures will work anymore - the data will be wrong.
[Wed 12:58] <manu> gkellogg: In order for the digital signature to work
out, you must specifically NOT use a base IRI.
[Wed 12:59] <gkellogg> The point is, it's fine if the dataset is not
well-defined. It can always be well-defined at a theoretical later date.
[Wed 12:59] <manu> or it could be defined in a way that breaks all the
digital signatures at a later date.
[Wed 12:59] <gkellogg> Just specify that in the algorithm.
[Wed 12:59] <gkellogg> Normalization MUST NOT use a base IRI to ground
the input document.
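A rough Python illustration of why gkellogg's rule matters for manu's signatures. It uses a SHA-256 digest of the normalized N-Quads as a stand-in for the real signature input (an assumption; the PaySwarm signing scheme itself is not described here) and a hypothetical base URL. If a receiver grounds #_graph:1 against any base, the bytes and therefore the digest change.

```python
import hashlib
from urllib.parse import urljoin

# Normalized output from the discussion, with the graph name left as a slot.
TEMPLATE = "_:c14n1 <http://xmlns.com/foaf/0.1/knows> _:c14n2 <{graph}> .\n"


def digest(graph_name, base=None):
    """SHA-256 over the N-Quads bytes; grounds the graph name only if a base is given."""
    graph = urljoin(base, graph_name) if base else graph_name
    return hashlib.sha256(TEMPLATE.format(graph=graph).encode("utf-8")).hexdigest()


signed = digest("#_graph:1")                                     # sender: no base
verified = digest("#_graph:1")                                   # receiver obeys "MUST NOT use a base IRI"
grounded = digest("#_graph:1", base="http://example.com/inbox")  # receiver grounds the document
print(signed == verified, signed == grounded)  # True False
```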
[Wed 13:01] <gkellogg> I'd say, write it up, send it to the RDF WG
mailing list, and see if someone raises objections. Given that it was
the direction given today, I think it's a reasonable way to go.
[Wed 13:02] <manu> gkellogg: yeah, will do that - thanks.
[Wed 13:08] <markus> manu: who creates the named graphs? i.e., where do
they come from?
[Wed 13:09] -->| davidwood (~Adium@public.cloak) has joined #rdf-wg
[Wed 13:10] <markus> I don't feel comfortable with minting frag IDs for
unlabeled named graphs at all
[Wed 13:12] <Zakim> SW_RDFWG()11:00AM has ended
[Wed 13:12] <Zakim> Attendees were
[Wed 13:12] <Zakim> Zakim-bot will be restarted in 3 minutes to recover
caller state; please save your agenda status. Apologies for the
inconvenience
[Wed 13:12] <markus> manu, gkellogg: actually there's another issue. if
you mint fragIds for unlabeled named graphs, you mint a fragId for the
blank node at the same time.. { "property": "this is a blank node that
is also a graph", "@graph": [ ... ] }
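markus' example can be traced with a deliberately simplified, hypothetical JSON-LD-to-quads conversion in Python. It handles string values only, uses plain property names instead of IRIs, and is not the JSON-LD API's algorithm. Whatever name is minted for the unlabeled graph is also the subject of the node's own properties, so the fragment ID names the blank node as well.

```python
import itertools


def node_to_quads(node, counter, graph=None):
    """Flatten one node object into (subject, property, value, graph) tuples."""
    if "@id" in node:
        name = node["@id"]
    elif "@graph" in node:
        name = f"#_graph:{next(counter)}"  # minted name for the unlabeled graph
    else:
        name = f"_:b{next(counter)}"  # ordinary blank node
    quads = [(name, k, v, graph) for k, v in node.items() if k not in ("@id", "@graph")]
    for child in node.get("@graph", []):
        quads.extend(node_to_quads(child, counter, graph=name))  # same name as graph label
    return quads


doc = {
    "property": "this is a blank node that is also a graph",
    "@graph": [{"@id": "http://example.com/a", "name": "A"}],
}
quads = node_to_quads(doc, itertools.count(1))
for q in quads:
    print(q)
```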
[Wed 13:13] <gkellogg> Yes, that's true no matter what we do.
[Wed 13:14] <markus> so it won't work AFAICS.. the only clean solution
is to require named graphs to be labeled with an IRI (since we are not
allowed to use bnodeIds)
[Wed 13:14] <markus> It's still not clear to me where the unlabeled
named graphs come from in the first place.. who is the creator of the
dataset containing them?
[Wed 13:18] |<-- davidwood has left irc.w3.org:6665 (Client closed
connection)
[Wed 13:18] * Zakim is departing
[Wed 13:18] |<-- Zakim has left irc.w3.org:6665 ("Leaving")
[Wed 13:22] -->| cygri (~cygri@public.cloak) has joined #rdf-wg
[Wed 13:23] |<-- SteveH has left irc.w3.org:6665 (SteveH)
[Wed 13:24] -->| Zakim (zakim@public.cloak) has joined #rdf-wg
[Wed 13:28] <manu> markus: The PaySwarm software creates the "unnamed"
named graphs when it needs to communicate with another peer on the
network. The message is purely transient, there is no base document.
[Wed 13:29] <manu> markus: requiring that all named graphs be labeled
with an IRI doesn't make sense at all to us - every message sent across
PaySwarm now needs to have a name associated with it? Why do that when
we can automatically generate a name?
[Wed 13:29] <markus> can't the software create an IRI that is guaranteed
to not collide?
[Wed 13:29] -->| davidwood (~Adium@public.cloak) has joined #rdf-wg
[Wed 13:30] <manu> Markus: Be more specific about the IRI that is being
created - is it something like this: http://example.com/data#graph1 or
is it something like this: graph:1 ?
[Wed 13:30] |<-- cygri has left irc.w3.org:6665 (Ping timeout: 60 seconds)
[Wed 13:31] <markus> I agree, it should be possible to have multiple
*un*named graphs.. but that's apparently not happening in this version
of RDF
[Wed 13:32] <manu> markus: The software can create an IRI, but that IRI
must have two very special properties to work for the digital signature
case: 1) It MUST be a document-local identifier that when expressed in
NQuads is valid in the RDF data model, 2) It must be able to be re-named
by the RDF Dataset Normalization Algorithm.
[Wed 13:32] <markus> well.. couldn't the payswarm software mint some
IRIs in a dedicated space.. e.g., http://payswarm.org/.../names/....
[Wed 13:32] <markus> what requires it to be document-local?
[Wed 13:32] <manu> markus: This isn't about PaySwarm - it's about RDF
Dataset Normalization - what does /that/ algorithm do... PaySwarm can do
anything it wants to, but we want to do something that is going to
eventually be standardized.
[Wed 13:32] <manu> markus: The message is transient, it has no document.
[Wed 13:32] <markus> if it isn't, there is no reason to re-name it
[Wed 13:33] <manu> markus: You /have/ to be able to rename it for the
RDF Dataset Normalization algorithm to work...
[Wed 13:33] <markus> In RDF datasets all named graphs are required to
have names, so the problem is not there but in JSON-LD (and in payswarm)
[Wed 13:33] <manu> the whole purpose of the algorithm is to re-name
anything that is document local in a very specific way.
[Wed 13:34] <markus> yes, but in RDF graph names are *not* document local
[Wed 13:34] <manu> that's exactly the problem!
[Wed 13:34] <markus> so you'll never ever have to rename them
[Wed 13:34] <markus> which brings us back to my previous question. what
requires them to be document-local?
[Wed 13:34] <manu> Markus, this is what you're suggesting: Transient
messages that are transmitted from point A to point B MUST be given names.
[Wed 13:35] <manu> Is that what you're asserting?
[Wed 13:35] <markus> no.. the graphs in those messages must be given names
[Wed 13:35] <manu> Why?
[Wed 13:35] <manu> I don't have to do that with any other transient
protocol I use.
[Wed 13:36] <manu> I definitely don't have to do that w/ JSON - so why
do I have to do that with RDF?
[Wed 13:36] <markus> well.. that's the underlying data model.. other
data models don't require IRIs at all e.g. We drop properties which are
not mapped to IRIs e.g.
[Wed 13:36] <markus> I see only one solution to that.. change the data model
[Wed 13:37] <manu> Yes, but the underlying data model is completely
flawed if I can't express messages transiently! :)
[Wed 13:37] <markus> JSON-LD's data model supports it but then obviously
you can't round-trip to RDF
[Wed 13:37] <manu> No, there are multiple solutions to this problem...
[Wed 13:38] <manu> The only thing we need to make sure is that the
auto-generated graph identifier 1) MUST be a document-local identifier
that when expressed in NQuads is valid in the RDF data model, 2) MUST be
able to be re-named by the RDF Dataset Normalization Algorithm.
[Wed 13:38] <markus> can you enumerate them?
[Wed 13:38] <manu> There are two possibilities here: <#_graph:1> and graph:1
[Wed 13:39] <manu> The first fails requirement #1
[Wed 13:39] <manu> The second passes both requirement #1 and #2
[Wed 13:39] <markus> 1) is impossible, because graph names MUST be
absolute IRIs in RDF (unless you change RDF's data model)
[Wed 13:39] <manu> graph:1 is an absolute IRI :)
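manu's two requirements can be checked mechanically for both candidates, plus markus' http://payswarm.org/graph/1 comparison. This Python sketch assumes, hypothetically, that normalization recognizes renamable names by a reserved prefix; the prefix list is illustrative only.

```python
import re

SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.\-]*:")  # RFC 3986 scheme + ':'
RESERVED_PREFIXES = ("#_graph:", "graph:")  # hypothetical: names normalization may rename


def check(candidate):
    return {
        "absolute": SCHEME.match(candidate) is not None,       # requirement 1
        "renamable": candidate.startswith(RESERVED_PREFIXES),  # requirement 2
    }


for c in ("#_graph:1", "graph:1", "http://payswarm.org/graph/1"):
    print(c, check(c))
```

In this sketch the only thing separating graph:1 from http://payswarm.org/graph/1 is the reserved-prefix convention, which is markus' objection: to any other RDF tool both are simply absolute IRIs.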
[Wed 13:40] <markus> I've never heard of document local IRIs.. the whole
point of IRIs is that they are global
[Wed 13:40] <markus> s/document local IRIs/document-local absolute IRIs/
[Wed 13:41] <markus> there might be a third option.. keep everything as
is but perform the RDF Dataset serialization algorithm on flattened JSON-LD
[Wed 13:42] <markus> data coming from RDF will never have bnodes as
graph names
[Wed 13:42] <markus> data coming from JSON-LD might, but that's all
handled within JSON-LD
[Wed 13:43] <markus> the byte-stream you sign would look slightly
different.. but who cares?
[Wed 13:43] <markus> since JSON-LD is a superset of RDF that would work
in all situations I can think of
[Wed 13:45] <markus> the only thing that wouldn't.. is to represent data
using bnodes in graph names in plain-RDF.. but that's due to a
limitation of RDF
[Wed 13:47] * manu is thinking about markus' suggestion.
[Wed 13:50] <manu> markus: I think you're wrong re: "the whole point of
IRIs is that they are global" - search RFC 3987 for the word "global" or
"universal" and you won't find it used in the way that you use it, IIRC.
[Wed 13:52] |<-- davidwood has left irc.w3.org:6665 (Client closed
connection)
[Wed 13:53] <markus> for these kind of things you should always look at
the URI RFC
[Wed 13:54] <markus> RFC 3986: "URIs have a global scope and are
interpreted consistently regardless of context, though the result of
that interpretation may be in relation to the end-user's context"
[Wed 13:54] <markus> I think that's quite clear
[Wed 13:55] <manu> global scope !== global identifier
[Wed 13:55] <manu> while the scope may be global (it is)
[Wed 13:55] <markus> ... "For example, "http://localhost/" has the same
interpretation for every user of that reference, even though the network
interface corresponding to "localhost" may be different for each
end-user: interpretation is independent of access"
[Wed 13:55] <manu> the end-users' context in this case is the document.
[Wed 13:55] <manu> and graph:1 is interpreted via that context.
[Wed 13:55] <markus> no.. it's an identifier that has a global scope
[Wed 13:56] <manu> markus: Yes, exactly the same case as localhost.
[Wed 13:56] <manu> localhost is always interpreted via your network.
[Wed 13:56] <manu> graph:1 is always interpreted via your JSON-LD processor.
[Wed 13:56] <manu> (and the JSON-LD processor chooses to interpret it as
document-local)
[Wed 13:56] <markus> no.. the interpretation (and that's what RDF is all
about) is the same.. it's the local machine
[Wed 13:56] <markus> accessing it will lead to different results
[Wed 13:57] <markus> thus "interpretation is independent of access"
[Wed 13:58] -->| dlongley (~dlongley@public.cloak) has joined #rdf-wg
[Wed 13:59] <dlongley> markus: manu let me know about the graph naming
discussion going on in here
[Wed 13:59] <dlongley> and your suggestion to normalize using JSON-LD as
the serialization
[Wed 14:00] <dlongley> the problem with that approach is that you
couldn't transmit the data you signed via another RDF serialization
[Wed 14:00] <dlongley> because it's a data model problem
[Wed 14:00] <dlongley> you signed data that can't be appropriately
represented in RDF
[Wed 14:00] <markus> yes, that's the whole point
[Wed 14:00] <dlongley> that isn't a solution to the problem
[Wed 14:01] <dlongley> particularly for payswarm... where RDFa is used
heavily as a serialization
[Wed 14:01] <markus> well in RDF you can't do it because no
document-local identifiers are allowed as graph names
[Wed 14:01] <dlongley> for previously signed graphs
[Wed 14:01] <markus> rdf is not a dataset syntax
[Wed 14:01] <markus> sorry, I meant RDFa
[Wed 14:01] <dlongley> if you generated some data with unnamed graphs
and then signed it using JSON-LD ...
[Wed 14:02] <dlongley> how could you represent the signed data using RDFa?
[Wed 14:02] <markus> you can't represent graphs at all in RDFa
[Wed 14:02] <dlongley> at this time you can't put named graphs in RDFa
[Wed 14:02] <dlongley> but that will likely not always be the case
[Wed 14:02] <markus> it's a graph syntax, not a dataset syntax
[Wed 14:02] <dlongley> ok, red herring.
[Wed 14:02] -->| davidwood (~Adium@public.cloak) has joined #rdf-wg
[Wed 14:02] <dlongley> pick a dataset syntax.
[Wed 14:03] <dlongley> now you can't transmit the data using that syntax.
[Wed 14:03] <manu> markus: RDFa /will/ be a Dataset syntax eventually -
within a couple of years.
[Wed 14:03] <markus> manu: with bnodes as graph names?
[Wed 14:04] <manu> markus: With dataset-local identifiers, hopefully, yes.
[Wed 14:04] <markus> the point is, in any RDF dataset syntax there won't
exist any named graphs without an absolute IRI
[Wed 14:04] <markus> at least not before the RDF data model is changed
[Wed 14:04] <manu> markus: Why do you think that graph:1 isn't an
absolute IRI?
[Wed 14:05] <markus> it is an absolute IRI
[Wed 14:05] <manu> Just because it's a dataset-local identifier doesn't
mean it isn't also an absolute IRI (stretching definitions here, I know)
[Wed 14:05] <manu> Okay, then the RDF data model doesn't need to change?
[Wed 14:05] <markus> we had that discussion before.. absolute IRIs have
global scope, see RFC3986
[Wed 14:06] <manu> you can have global scope and interpret the
identifier based on a local context (the document context, in this example)
[Wed 14:06] <dlongley> "An identifier embodies the information required
to distinguish what is being identified from all other things within its
scope of identification."
[Wed 14:06] <markus> what I'm saying is that if you stay within RDF you
won't have a problem normalizing/signing since no unlabeled named graphs
exist
[Wed 14:07] <dlongley> the scope of identification for "graph:" is the
local document
[Wed 14:07] <manu> and in this case, the scope is the document itself.
[Wed 14:07] <markus> the problem arises since you wanna create named
graphs but don't want to name them
[Wed 14:07] |<-- davidwood has left irc.w3.org:6665 (Client closed
connection)
[Wed 14:07] <manu> markus: No, we never said we don't want to name them
/when they're serialized to NQuads/
[Wed 14:07] <manu> we just don't want to name them before that.
[Wed 14:08] <markus> dlongley: manu and I discussed this before. RFC
3986: "URIs have a global scope and are interpreted consistently
regardless of context, though the result of that interpretation may be
in relation to the end-user's context"
[Wed 14:08] <manu> naming them is a part of the RDF Dataset
Normalization Algorithm.
[Wed 14:08] <markus> .. "For example, "http://localhost/" has the same
interpretation for every user of that reference, even though the network
interface corresponding to "localhost" may be different for each
end-user: interpretation is independent of access"
[Wed 14:08] <markus> the interpretation (and that's what RDF is all
about) is the same.. it's the local machine
[Wed 14:08] <markus> accessing it will lead to different results
[Wed 14:08] <markus> thus "interpretation is independent of access"
[Wed 14:08] <manu> markus: Yes, exactly
[Wed 14:09] <manu> you are interpreting it via the JSON-LD processor,
not "The Web"
[Wed 14:09] <markus> manu: I disagree.. naming them is part of JSON-LD
to RDF transformation
[Wed 14:10] <markus> that's also the reason why we currently can't
roundtrip that kind of data
[Wed 14:10] <markus> because you can't represent it in RDF
[Wed 14:10] <manu> What can't you represent in RDF?
[Wed 14:11] <markus> I know I asked that already some time ago.. but are
you really dealing with datasets in payswarm or with graphs?
[Wed 14:11] <markus> I still haven't looked at the specs
[Wed 14:11] <markus> but it seems that the graph name isn't important
[Wed 14:12] <manu> We are deciding to throw an error if somebody tries
to use something other than the default graph for now, because I can't
imagine this problem will be solved soon.
[Wed 14:12] <markus> are there multiple graphs that you need to sign?
[Wed 14:12] <manu> however, if you are to do digital signatures
correctly (and represent them in RDF correctly), you should use named
graphs and sign the named graph.
[Wed 14:12] <manu> and yes, there may be multiple graphs that we need to
sign.
[Wed 14:12] <markus> in one document
[Wed 14:12] <markus> ?
[Wed 14:12] <manu> yep
[Wed 14:13] <dlongley> we need to sign arbitrary JSON-LD.
[Wed 14:13] <manu> for example: a multi-party digital contract that is
counter-signed by the PaySwarm Authority.
[Wed 14:13] <dlongley> if JSON-LD supports it, we need to be able to
sign it.
[Wed 14:13] <markus> ok.. what if we would drop support for bnode IDs as
graph names in JSON-LD?
[Wed 14:14] <dlongley> graphs have to be given names in order to
normalize them
[Wed 14:14] <markus> do you have a flowchart or something where I could
quickly get an idea of the data-flows between the participants?
[Wed 14:14] <manu> markus: We can't use BNode IDs as graph names in
JSON-LD, right?
[Wed 14:14] <markus> we can, currently
[Wed 14:14] <manu> markus: Ha - no, unfortunately not right now.
[Wed 14:15] <markus> but it doesn't round-trip to RDF
[Wed 14:15] <manu> markus: Well, the RDF WG isn't going to let that fly
because it doesn't match the definition of a blank node identifier.
[Wed 14:15] <manu> at least, that's what I think a LC comment is going to be
[Wed 14:15] <dlongley> here's what matters: in payswarm, we must be able
to sign arbitrary JSON-LD documents.
[Wed 14:15] <manu> you can't name graphs using bnode identifiers.
[Wed 14:15] <dlongley> if someone can put an unnamed graph into a
JSON-LD document, then that's a problem.
[Wed 14:15] <markus> well.. when I presented the data model some time
ago and enumerated the differences no one seemed to object
[Wed 14:15] <markus> they accepted that JSON-LD will be a superset of RDF
[Wed 14:15] <manu> (and digital signatures has almost nothing to do with
blank node identifiers for graph names, btw)
[Wed 14:16] <dlongley> there are 2 solutions: disallow unnamed graphs in
JSON-LD, come up with a way to name the unnamed graphs using
document-local identifiers that works for RDF.
[Wed 14:16] <markus> that's what I proposed to manu earlier.. disallow
unnamed graphs in JSON-LD
[Wed 14:16] <dlongley> yes, and that's not the preferred solution
[Wed 14:16] <dlongley> it's the fallback.
[Wed 14:16] <manu> markus: I didn't object because I thought blank node
identifiers could be used to name graphs for RDF (and that they were
updating the spec to reflect that).
[Wed 14:17] <dlongley> it would be much nicer if we didn't force people
to name their unnamed graphs.
[Wed 14:17] <markus> manu: I'm talking about half an hour ago :-P
[Wed 14:17] <manu> I think it's ridiculous to tell people to include
syntax that is completely unnecessary. :)
[Wed 14:17] <manu> Why force people to name graphs when they don't need to?
[Wed 14:17] <markus> dlongley: completely agree.. but that's apparently
not something the RDF WG is going to accept
[Wed 14:17] <dlongley> there may be a case where it also generates an
issue for comparing two datasets
[Wed 14:17] <manu> Right now, the answer is: Because the RDF data model
says so - which is a really bad argument.
[Wed 14:18] <manu> in fact, I outright reject that argument.
[Wed 14:18] <markus> yes, but you can't have both.. either you change
the RDF data model (which won't happen).. or you accept that the data
won't round-trip
[Wed 14:18] <dlongley> i'm not convinced that graph:1 won't work.
[Wed 14:19] <manu> I think the real reason is that nobody in the RDF WG
believes that we'll come to a consensus on this and that the group is
exhausted after discussing the topic. There is no desire to address the
problem.
[Wed 14:19] <dlongley> i'm still trying to wrap my mind around it.
[Wed 14:19] <manu> markus: Yes, I don't see why graph:1 can't work, and
be compatible with the RDF 1.1 Concepts/Data Model
[Wed 14:19] <markus> it works.. but you are automatically creating
*global* identifiers.. nothing is there to prevent collisions..
[Wed 14:19] <dlongley> i'm not convinced of that.
[Wed 14:20] <manu> I do see why #_graph:1 is problematic (it's not valid
for transient messages)
[Wed 14:20] <dlongley> that's what i'm trying to wrap my mind around.
[Wed 14:20] <dlongley> the analogy of "localhost" having a "global
meaning" doesn't necessarily preclude the use case here
[Wed 14:20] <markus> graph:1 is the same as minting
http://payswarm.org/graph/1
[Wed 14:20] <dlongley> "graph:1" has a global meaning ...
[Wed 14:21] <markus> yes.. just as http://payswarm.org/graph/1
[Wed 14:21] <dlongley> it's an identifier for the first graph in the
document you're looking at.
[Wed 14:21] <dlongley> it always means that.
[Wed 14:21] * manu nods.
[Wed 14:21] <dlongley> now... if you go and actually look at its data...
[Wed 14:21] <markus> :-)
[Wed 14:21] <dlongley> then you're talking about the result of the
end-user's interpretation.
[Wed 14:21] <dlongley> and that can change.
[Wed 14:21] <dlongley> so, to me, that seems to work for RFC 3986
[Wed 14:22] <markus> and if someone else makes statements about
http://payswarm.org/graph/1 which conflict with your statements?
[Wed 14:22] <markus> say, you put it in a quad store?
[Wed 14:22] <markus> sorry.. same applies to graph:1
[Wed 14:22] <dlongley> you mean like if someone says: "localhost/foo"
and i don't have that on my machine?
[Wed 14:23] <dlongley> seems like the same situation to me.
[Wed 14:23] <markus> no.. that's accessing it.. not interpreting it
[Wed 14:23] <markus> those are two different things
[Wed 14:23] <dlongley> you're saying that someone can make a statement
about localhost/foo ...
[Wed 14:23] <markus> and they have been debated to death (httpRange-14)
[Wed 14:23] <dlongley> and it won't conflict with my own statements?
[Wed 14:23] <dlongley> ever?
[Wed 14:23] <markus> yes, whatever statement he likes
[Wed 14:24] <markus> it will conflict with yours
[Wed 14:24] <dlongley> right...
[Wed 14:24] <markus> because URIs are global
[Wed 14:24] <dlongley> and it's not a problem
[Wed 14:24] <dlongley> you know what "localhost" means.
[Wed 14:24] <dlongley> you know that "localhost", when accessed, means
your local machine, nothing else
[Wed 14:24] <dlongley> how is that any different for "graph:1"?
[Wed 14:24] <markus> simplest thing.. import two datasets using those
graph names into a RDF quad store
[Wed 14:25] <markus> then do a SPARQL query for that graph name
[Wed 14:25] <dlongley> localhost/1 and localhost/2
[Wed 14:25] <markus> what will you get back?
[Wed 14:25] <dlongley> everything that matches those graph names
[Wed 14:25] <dlongley> just like you would with localhost
[Wed 14:25] <markus> all statements made about every "first graph in a
document" ever imported
[Wed 14:26] <markus> exactly
[Wed 14:26] <dlongley> what happens when someone uploads a dataset to a
quad store that has a bunch of localhost URIs in it?
[Wed 14:26] <markus> exactly the same thing
[Wed 14:26] <dlongley> right
[Wed 14:27] <dlongley> you are losing the "dataset"
[Wed 14:27] <dlongley> when you do that.
[Wed 14:27] <markus> you are losing the local scope you need
[Wed 14:27] <dlongley> yeah, you can't use a quad store to solve that
problem.
[Wed 14:28] <markus> but to illustrate it
[Wed 14:28] <dlongley> the problem here is that a graph isn't a node.
[Wed 14:28] <dlongley> which, IMO, is the wrong way to go.
[Wed 14:28] <markus> if bnodeIds were allowed, they would be changed
during the import.. so that clashes would never occur
[Wed 14:29] <dlongley> right
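markus' quad-store scenario, sketched in Python with a set of 4-tuples standing in for the store (an assumption; no real quad store or SPARQL engine is involved). IRIs such as graph:1 are copied verbatim, so two imported messages land in the same graph. Blank node labels are renamed per import, as markus describes, so they cannot clash.

```python
import itertools


def import_dataset(store, quads, fresh):
    """Add quads to the store, giving '_:' labels fresh names for this import."""
    mapping = {}

    def rename(t):
        if t is not None and t.startswith("_:"):
            if t not in mapping:
                mapping[t] = f"_:import{next(fresh)}"
            return mapping[t]
        return t

    for quad in quads:
        store.add(tuple(rename(t) for t in quad))


def graph(store, name):
    """Everything stored in one named graph, roughly what a GRAPH <name> pattern matches."""
    return {q[:3] for q in store if q[3] == name}


def message(who, graph_name):
    # Objects are plain strings here (literals are not modelled separately).
    return [(f"http://example.com/{who}", "http://xmlns.com/foaf/0.1/name", who, graph_name)]


fresh = itertools.count(1)
iri_store, bnode_store = set(), set()
for who in ("alice", "bob"):
    import_dataset(iri_store, message(who, "graph:1"), fresh)
    import_dataset(bnode_store, message(who, "_:g1"), fresh)

print(len(graph(iri_store, "graph:1")))     # 2: both messages merged into one graph
print(sorted({q[3] for q in bnode_store}))  # two distinct, freshly minted labels
```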
[Wed 14:29] <markus> yes, I completely agree with that.. and I'm not
happy with the RDF WGs decision about that
[Wed 14:29] <dlongley> there's already a requirement that you can do
that with "graph:1"
[Wed 14:29] <dlongley> otherwise it doesn't work anyway
[Wed 14:29] <dlongley> so a quad store that understood "graph:1" would do so
[Wed 14:30] <dlongley> but, i understand how that is no longer analogous
to localhost.
[Wed 14:30] <markus> IRIs are opaque.. they are global identifiers
[Wed 14:30] <dlongley> right
[Wed 14:31] <dlongley> for the same reasons (w/quad store storage)
<#_graph:1> won't work.
[Wed 14:31] <markus> exactly
[Wed 14:31] <dlongley> which means there is no solution other than
forcing people to name their graphs
[Wed 14:31] <markus> so without introducing bnodes as graph names I
can't see a solution
[Wed 14:31] <markus> yes.. at least I can't see any
[Wed 14:32] <markus> or you live with the fact that it won't round-trip
to RDF
[Wed 14:32] <dlongley> well, we can't do that
[Wed 14:32] <dlongley> we will have to start rejecting data
[Wed 14:32] <dlongley> which may be unexpected
[Wed 14:32] <dlongley> (will be unexpected)
[Wed 14:33] <markus> no, you can accept all data from RDF.. but you
can't output it in RDF
[Wed 14:33] <dlongley> well, we have to be able to normalize
[Wed 14:33] <markus> RDF -> JSON-LD works without problems.. the other
direction doesn't.. same as for bnodes in properties
[Wed 14:33] <dlongley> and the data we normalize must be compatible with RDF
[Wed 14:33] <dlongley> right
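The one-way conversion markus describes can be enforced with a guard before RDF output, roughly what "rejecting data" would mean in practice. This Python sketch assumes the constraint debated in this thread (graph names must be IRIs, so '_:'-labelled graph names cannot be written as RDF); the function name and error message are hypothetical.

```python
def to_rdf_or_reject(quads):
    """Return quads unchanged, or raise if any graph name is a blank node label."""
    bad = [q for q in quads if q[3] is not None and q[3].startswith("_:")]
    if bad:
        raise ValueError(f"{len(bad)} quad(s) use a blank node as a graph name")
    return quads


ok = [("_:a", "http://xmlns.com/foaf/0.1/knows", "_:b", "http://example.com/g"),
      ("_:a", "http://xmlns.com/foaf/0.1/knows", "_:b", None)]  # None = default graph
print(to_rdf_or_reject(ok) is ok)  # True

try:
    to_rdf_or_reject([("_:a", "http://xmlns.com/foaf/0.1/knows", "_:b", "_:g1")])
except ValueError as err:
    print("rejected:", err)
```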
[Wed 14:34] <markus> then there's no way I see without requiring named
graphs to be named (with an absolute IRI)
[Wed 14:34] <dlongley> yeah, i can't think of another solution
[Wed 14:36] <dlongley> ugh, it seems so easily solved by allowing graph
names to be bnode IDs.
[Wed 14:36] <markus> nevertheless, I think we should keep supporting
bnode IDs as graph names in JSON-LD (but mark the feature as at-risk)
[Wed 14:36] <dlongley> i would like to know the drawbacks to that approach
[Wed 14:36] <markus> yes.. everything is already there
[Wed 14:36] <dlongley> the practical ones ...
[Wed 14:36] <markus> ask the RDF WG :-P
[Wed 14:37] <markus> or the SPARQL guys
[Wed 14:37] <dlongley> well, i've been passed along the information that
there isn't a practical drawback, it is a definition issue
[Wed 14:37] <dlongley> seems like it would work fine for quad stores and
sparql
[Wed 14:38] <dlongley> anyway, i've got to get back to doing other
stuff, thanks for the discussion.
[Wed 14:38] <markus> I think so, yes.. I'm not sure about the
implications on the semantics but AFAIK no semantics have been defined
for named graphs

-- manu

-- 
Manu Sporny (skype: msporny, twitter: manusporny, G+: +Manu Sporny)
President/CEO - Digital Bazaar, Inc.
blog: Aaron Swartz, PaySwarm, and Academic Journals
http://manu.sporny.org/2013/payswarm-journals/
Received on Wednesday, 13 February 2013 19:50:43 UTC