Re: [GRAPHS] g-box - abstraction or concrete? from Nathan on 2011-02-28 (public-rdf-wg@w3.org from February 2011)

From: Nathan <nathan@webr3.org>
Date: Mon, 28 Feb 2011 12:25:58 +0000
To: Ivan Herman <ivan@w3.org>
CC: public-rdf-wg WG <public-rdf-wg@w3.org>, Sandro Hawke <sandro@w3.org>, Pat Hayes <phayes@ihmc.us>, Manu Sporny <msporny@digitalbazaar.com>
Message-ID: <4D6B9456.8040207@webr3.org>
Ivan Herman wrote:
>> box - an abstract box which can contain statements, and whose contents can vary over time
>>
>> box-realization - a realization of a box, some process coupled to some memory which can manage realizations of the box's state/contents and change the state from one to another, change the contents of the box.
>>
>> snapshot - an abstract snapshot of the state/contents of a box at time t, a mathematical set of statements, a g-snap
>>
>> snapshot-realization - a realization of a snapshot, a distinct immutable collection of triples in memory, or some lexical representation of them, a g-text
> 
> So, coming back to Sandro's terminology, you do have g-snaps and g-texts, and you separate the g-box into g-box-abstract and g-box-concrete. I am not sure I understand and agree with that line of needs (I am getting afraid that this thread will overcomplicate things) but let us go with that for now.

I too believe it will potentially overcomplicate things; however, to
explain (hopefully):

If you give a g-box a name, to what does it refer? Processes are spawned
and killed; a g-box in the web sense can move between different servers,
or can be purely abstract such that only a snapshot-realization ever
exists, and so on. These are all the same properties that http resources
/ information resources have. Likewise for named g-boxes in the sparql
world: you can dump out dbpedia in trig and load it somewhere else. In
fact there exist several (understatement) realizations of the same named
g-box, one for every person who has the dbpedia dataset loaded into a
quad store, which is why I suggest that there must be an abstract box
and a set of 1+ box-realizations.
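
A rough sketch of the distinction, in Python. All names here
(BoxRealization, store_a, the example.org URI) are made up for
illustration and come from no spec; the only assumption is that a
snapshot can be modelled as an immutable set of triples.

```python
from dataclasses import dataclass, field

Triple = tuple[str, str, str]  # (subject, predicate, object), illustrative only


@dataclass
class BoxRealization:
    """One concrete, mutable holder of a box's contents (e.g. one quad store)."""
    box_name: str  # name of the abstract box this realization claims to realize
    triples: set[Triple] = field(default_factory=set)

    def snapshot(self) -> frozenset[Triple]:
        """Return an immutable copy of the current contents (a g-snap)."""
        return frozenset(self.triples)


# Two independent realizations of the same named box, e.g. two people who
# loaded the same dump into their own stores.
name = "http://example.org/box"
store_a = BoxRealization(name, {("ex:a", "ex:b", "ex:c")})
store_b = BoxRealization(name, set(store_a.triples))

# One of them is updated locally; the name stays the same.
store_b.triples.add(("ex:d", "ex:e", "ex:f"))

same_name = store_a.box_name == store_b.box_name
same_contents = store_a.snapshot() == store_b.snapshot()
```

Here same_name holds while same_contents does not, which is the
situation the name alone cannot describe.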

>>  Issue 1:
>>  Snapshot-realizations are anonymous
> 
> I am not sure what this means

well, if I send you a chunk of RDF in some format, what's the identifier 
for it?

>> and there is no way to tell that
>>  two snapshot-realizations realize snapshots of the state of the same
>>  box, or to tell which state (Sn-1, Sn-5) they are snapshots of.
>>
> 
> I presume because you consider a g-snap representing a g-box at a time 't'. And because the name of a box (if any) does not transpire to a g-snap.

Yes. Let's say you have a bunch of triples held in some mutable runtime
object structure, and you give somebody a copy of it. Then you "add" a
few triples, take some away, and give the same person the new copy.
Just by looking at the two copies, how can that person tell where they
came from, whether they are snapshot-realizations of the same g-box,
and which one is the latest/current one?
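
To make that concrete, a small Python sketch (the triples and the
helper name diff are hypothetical). The recipient can compute how the
two copies differ, but the result is symmetric: nothing in the data
says which copy is the later one, or that both came from the same box.

```python
def diff(x, y):
    """Return (triples only in x, triples only in y)."""
    return (x - y, y - x)


# Two copies handed over at different times; neither carries a box name,
# a version, or a timestamp.
copy_1 = frozenset({("ex:s", "ex:p", "ex:o1"), ("ex:s", "ex:p", "ex:o2")})
copy_2 = frozenset({("ex:s", "ex:p", "ex:o2"), ("ex:s", "ex:p", "ex:o3")})

only_in_1, only_in_2 = diff(copy_1, copy_2)

# Swapping the arguments just swaps the two halves of the result, so
# "added" and "removed" cannot be told apart without extra information.
swapped = diff(copy_2, copy_1)
```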

>>  Thus, in order to incorporate the concepts of box or box-realization
>>  in to RDF, some form of box identification, and some form of state
>>  identification would need to be added.
>>
> 
> Box identification - this is the vague notion of a named graph, right? And state identification is, well, the other thingy (quoted graphs and friends). Right?

box identification = vague notion of named graph yes
state identification = box identifier + the ability to say "version-3" 
or "valid/retrieved at x time".
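
As a sketch only (Python; StateId and its fields are names I am
inventing here, not anything proposed in a spec), state identification
could be pictured as a box identifier plus either a version or a
retrieval time:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StateId:
    box: str                            # box identifier (the "named graph" name)
    version: Optional[int] = None       # e.g. 3 for "version-3"
    retrieved_at: Optional[str] = None  # e.g. a timestamp string


BOX = "http://example.org/box"  # hypothetical box name

# Snapshots keyed by state identifier rather than by box name alone.
snapshots = {
    StateId(BOX, version=2): frozenset({("ex:a", "ex:b", "ex:c")}),
    StateId(BOX, version=3): frozenset({("ex:a", "ex:b", "ex:c"),
                                        ("ex:d", "ex:e", "ex:f")}),
}

# With a state identifier attached, "which one is the latest" has an answer.
latest = max(snapshots, key=lambda s: s.version)
```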

> [snip]
>> Okay, so you've given us a rule (R)
>>
>>  { ?a <b> <c> } => { <e> <f> ?a }
>>
>> Now, (thanks Ivan) you've given us a rule which has variable identifiers in it, so we'd better clear up what variable identifiers are too, and blank node identifiers whilst we're here (so as not to confuse the two).
>>
>> A Blank Node Identifier is a temporary reference, bound to a blank node at a particular time. Since it's an identifier it belongs in the realization space, and since it's a temporary identifier it belongs in the snapshot space; thus blank node identifiers are scoped to snapshot-realizations. Blank Nodes are therefore only existentially quantified within snapshots.
> 
> I do not follow that. By the same token any URI reference within a box should be a temporary reference, too; what is the difference?

The difference is that, given two snapshot-realizations, a uri <u>
refers to the same thing in both of them, and a blank node identifier
_:b1 does not. To change the scoping of blank node identifiers to box
level, a box-identifier would need to be strapped to each
snapshot-realization, such that if two snapshot-realizations had the
same box-identifier associated then _:b1 would refer to the same thing
in both of them. That is not the way it works today: if I get your foaf
file today and it has _:b1 in it, and tomorrow it has _:b1, I cannot
tell that the blank node being referred to is the same; I can with <u>.
Am I missing something here? We have text in all the specs to say just this.
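
A minimal Python sketch of that scoping, assuming triples are plain
string tuples and blank node identifiers are the terms starting with
"_:" (the helper scope_bnodes and the relabelling scheme are made up
for illustration). When two snapshot-realizations are combined, blank
node identifiers are renamed apart per document, while URIs pass
through unchanged:

```python
def scope_bnodes(triples, scope):
    """Rename blank node identifiers so they are unique to one document."""
    def rename(term):
        # Only terms written as _:label are blank node identifiers here.
        return f"_:{scope}_{term[2:]}" if term.startswith("_:") else term
    return {tuple(rename(term) for term in triple) for triple in triples}


# The "same" file fetched on two different days; both use the label _:b1.
today = {("_:b1", "foaf:name", '"Ivan"')}
tomorrow = {("_:b1", "foaf:name", '"Ivan"'),
            ("<u>", "foaf:knows", "_:b1")}

merged = scope_bnodes(today, "doc1") | scope_bnodes(tomorrow, "doc2")

# <u> is untouched, but _:b1 from each day has become a distinct node.
subjects = {s for (s, _, _) in merged}
```

The merged set keeps two separate name triples, one per day, because
nothing licenses treating the two _:b1 labels as the same node.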

>>  Issue 2:
>>  Blank Nodes are only existentially quantified within snapshots, which
>>  means they aren't quantified at box level, which means they can't
>>  exist at box level.
>>
> 
> And I still do not understand that conclusion.

if they do exist at box level, how can you tell?

>>  Thus, in order to incorporate the concepts of box or box-realizations
>>  in to RDF, the semantics of blank node identifiers and their scope
>>  of existential quantification would need to be changed. B.C. break.
> 
> And we should not go there. On a more general level we have an obligation not to break the current RDF model and make only the absolutely necessary and minimal changes to it. We cannot and should not harm the deployment of RDF out there. I know this is not a technical argument, so it may be misplaced at this point of the argumentation, but I do believe we have to be conscious of the issue...

agree

>> Is it worth continuing this line of thought? boxes clearly do exist in semantic web land (sparql update of "named graphs" for example, and the need for "graph changes over time"), but they don't currently exist in RDF, and the two issues listed above are far from minor. Even if we incorporate boxes, both sparql and the web don't provide for any notion of time, and even if we did work out a way to have the concept of states over time in there, we'd need to change to a temporal logic.
> 
> And I am not absolutely sure that the introduction of time as a core notion is necessary at this point. As I said above, I am afraid that things are getting overcomplicated although I cannot point my finger at exactly where things go wrong.

Likewise, I agree. IMHO we'd be very unwise to add the concept of time
to RDF; perhaps to some other tech, but I figure it would be so
complicated that barely anybody would use it. Those who need RDF diff /
patch or changesets can layer them on by other means, simply by having
an identifier for the g-box.

> I went back to Pat's email that started this discussion (thanks Pat:-). And I went back to Sandro's mail for the terminology (thanks Sandro:-). It did look simple at the time:-); We have these notions
> 
> 1. Abstract RDF graph, ie, a mathematical abstraction as a set of triples (per Pat), a.k.a. a g-snap (per Sandro's terminology). It is an abstract thing like a number (as opposed to a numeral)
> 2. (Concrete) RDF graph a.k.a. graph token (per Pat), a.k.a. a g-box (per Sandro's terminology), that, when poked, return a textual representation (a.k.a. serialization a.k.a. g-text) of an abstract graph. g-boxes may have a URI. Blank nodes are bound to a specific g-box. (Are g-boxes non-informational resources if they have a URI? I guess...)
> 
> There is no time in all this. Well, no temporal logic, because, of course, you poke a box at a particular time but, say, if you poke my personal (non-informational resource) URI, the system will provide a foaf file today that is different than the one yesterday. But the community could live with that without temporal logic, why couldn't it tomorrow?

I agree, although I took Pat's graph-token to mean g-text and not g-box,
and take "concrete rdf graph" to mean g-text as well, different to the
concept of g-box. Further, I take the notion of g-box to be an
information resource (something for which you can get a realization
(copy/instance) of its current state, a representation, a g-text).
That is a major difference in our thinking, one I'd like to iron out,
perhaps off-list.

> [snip]
> 
>> My take away on this, is that if people want "named graphs" we can only accommodate "named snapshot realizations", which means that if you find at some point two different snapshot-realizations bearing the same name, well frankly you're up the creek without a paddle! We could provide for "quoted graphs" which would allow people to describe what a snapshot-realization is (retrieved from here at date x etc) but then we're moving more towards N3 (a good thing imho).
>>
>> Another choice is to formalize what's required for the presence of boxes, such that boxes exist and can be given names, but you can only ever "get" the current state of the box (thus negating the need for mentioning state, state changes or moving to temporal logic), this would make room for other specs to piece together layers of the cake such as some dataset synchronization method, or say adding versioning meta data to http responses in order to cater for this need. The only thing that would need to be addressed for this would be the scope of blank node quantification.
>>
>> Personally, I'd say let's go for adding quoted graphs, variables, add the concept of box but only ever account for the current state, and scope blank node identifiers to being at box level. This would allow for the community to cover all use cases either in or out of RDF and layer on other bits to the sem web stack where needed. Practically this could be quoted graphs added to turtle, and some trig like format which could refer to a named-box and show the snapshot realization of the current state of that box.
>>
> 
> Interestingly, after going a great round, we may end up by the same things... Because, at the end of the day, we may have two different notions that we want to formalize and/or give syntax to:
> 
> 1. A syntax for an abstract graph, much like we use the "1"^^xsd:integer for the mathematical abstraction of the number 1. This is what {<a> <b> <c>.} is in TriG or N3

by that definition, if "1"^^xsd:integer is a literal, then so must be 
{<a> <b> <c>.} !?

> 2. The notion of a URI given to an RDF graph (RDF graph token for Pat, g-box for Sandro). We may or may not provide a specific syntax for that

As earlier, graph-token = snapshot-realization = g-text from what I can
tell, which is fundamentally different to g-box. That is the critical
distinction I'm trying to get across here: one names something which
can have different values/representations over time, the other does not.

> As a side issue (but I think it is important): it is true that, in a very general sense, it is attractive to talk about Graph Literals. However, literals have a very specific sense and role in the RDF semantics, as well as in the OWL semantics: there is the separate notion of D-entailment in RDFS, even more datatype reasoning in OWL, the current RDF has restrictions on the usage of literals in subject position, etc. If we talk about graph literals, I am afraid we would open up Pandora's box. As a specific issue, if we allow for graph literals in subject position, then we would have to look at the issue of literals-as-subject in general, and that has already created a formidable turmoil on the mailing list. Let us not go there. I would therefore suggest _not_ to talk about graph literals but a very specific thing (that we may have to add to the RDF document separately). We can call it quoted graph, g-snap-realization, whatever...

So we could never write "{<a> <b> <c>.}"^^x:graph ? Although yes,
quoted-graph is fine by me and matches what I'm thinking. The literal
part is just whether the thing is a literal or not (which I'd say it is,
as per your definition 1. above).

Cheers,

Nathan
Received on Monday, 28 February 2011 12:53:26 UTC