Scoping bnodes (was: Re: Blank Nodes Re: Toward easier RDF: a proposal) from Pat Hayes on 2018-11-27 (semantic-web@w3.org from November 2018)

From: Pat Hayes <phayes@ihmc.us>
Date: Mon, 26 Nov 2018 23:43:44 -0600
To: thomas lörtsch <tl@rat.io>
Cc: Tim Berners-Lee <timbl@w3.org>, SW-forum Web <semantic-web@w3.org>
Message-ID: <25dab3f5-4528-209b-5655-c8b713a828c4@ihmc.us>
> On Nov 25, 2018, at 11:14 AM, thomas lörtsch <tl@rat.io> wrote:
> 
> 
> 
>> On 22. Nov 2018, at 13:02, Tim Berners-Lee <timbl@w3.org> wrote:
>> 
>> David
>> 
>> I agree with your resolution to make RDF easier to use for real  developers, whatever they are.  But I do not despair at the level that you do, I am more hopeful.
>> Let me pick just one of your points (with a new subject as suggested).
>> 
>> 
>>> On 2018-11-21, at 22:40, David Booth <david@dbooth.org> wrote:
>>> 
>>> 3. Blank nodes.  They are an important convenience for RDF
>>> authors,
>>> 
>> Yes, here I agree.  The default data language for developers at the moment
>> is JSON, and that is full of blank nodes.  Every {} in JSON is equivalent to a blank node [] in Turtle.
>> 
>> Where in JSON you write
>> 
>> { "name": "Fred Bloggs",
>> "address": {
>>  "number":  123,
>>  "street": "Acacia Avenue" }
>> }
>> 
>> in turtle you write
>> 
>> [ :name "Fred Bloggs";
>> :address [
>>    :number  123;
>>    :street  "Acacia Avenue" ]
>> ]
>> 
>> Which is just as simple as the JSON.  When you look at Turtle as a language
>> to write and to generate it is I think nice.
>> 
>> 
>> IMO this is a good example that bnodes actually are foremost: structure. 
> I used to think of them as plastic bags: you put things in them to transport them or keep them together but they carry no meaning in themselves (not counting the advertisements usually printed on them as "meaning", of course).
> 
> Bnodes allow graphs to encode nested lists (trees). That is useful because although graphs are very flexible, in real life we often prefer less flexible data structures like lists, nested lists, tables. At least I do when I write things down. Those structures are very useful. They add some, well, structure, to what we want to express. Do they carry "meaning"? I’d say yes but normally I don’t refer to the structure itself. On the contrary, it’s so useful because I don’t have to explicate it - it’s just there, as bullet points, indentation, columns and rows.
> 
> Sometimes I do want to address a specific location in that structure. Then it’s useful to be able to give that bnode an identifier (and the ability to do so is a plus for RDF). However a triple with a bnode separated from the other triples containing that same bnode can always only be so useful. It’s like taking two cells out of a bigger table, without headings or the full row. How far can that possibly get you? I think that some of the complaints voiced in this thread are based on unreasonable expectations and on a lack of understanding of what bnodes are and can be.
> 
> Maybe unreasonable expectations at a deeper level are the core of the problem: the usefulness of graphs as data structures is limited, maybe more limited than RDF likes to admit. They are not always the most appropriate solution. We often use much more structured approaches to information modelling like trees and tables, and for good reasons. 
> RDF might be much more useful if it had a way to integrate those structures instead of trying to mimic them - and integrate itself better into other data structures. Then maybe we would need fewer blank nodes.
> Nested lists as first class citizens in RDF would be a good thing. Also tables. There were discussions about "dark triples" pre the 2004 spec but I couldn’t find much in the mailing list archives on the thinking behind it.
> But putting more emphasis on linking into existing data structures - like into certain cells in a RDBMS table or subtrees in a JSON document - might be helpful as well.
> 
> My main problem with bnodes is that it’s so hard to see where one structure ends and the next one begins, and what that structure actually is: a list? nested? how deep? a table even? an n-ary relation? where does that end? which node represents its main role?
> A relational table or a nested list make that much easier. In a graph it takes extra effort to mark and characterize boundaries and substructures. RDF tries to do all that with just the bnodes and they are overloaded. That’s why it can be much harder to figure out what’s going on in an RDF based system than in an RDBMS based application - despite all the self describing properties etc.

I think this is a very basic and important point. It is what I 
meant, expressed differently, by saying that RDF has no way to 
indicate scope. Bnodes in RDF are, logically, existentially 
quantified variables, but RDF has no way to indicate, and 
therefore no way for anyone to know, where the quantifiers are 
which bind those variables. So, for example, if we assume they 
are just outside each RDF document, then we should standardize 
bnodeIDs apart when merging; but if we assume they have larger 
scope, then maybe we shouldn’t. Bnodes introduced to encode 
structures like n-ary relational assertions, or lists, or some 
complicated piece of OWL syntax, should have a very narrow scope 
corresponding to the exact boundaries of those structures, and 
hence should be ‘invisible’ from outside (which is why it is fine 
to make them vanish in a higher-level syntax using [ ] or ( ).)
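
The merging choice can be made concrete with a minimal sketch. It assumes, purely for illustration, that a graph is a collection of (subject, predicate, object) string triples and that blank nodes are the terms starting with "_:". The function names are hypothetical and do not come from any RDF library.

```python
from itertools import count


def standardize_apart(graphs):
    """Merge graphs, treating every bnode ID as local to its own graph.

    Each graph gets its own renaming table, so the same ID appearing in
    two different graphs ends up as two different blank nodes.
    """
    fresh = count()
    merged = set()
    for graph in graphs:
        renaming = {}  # per-graph: old bnode ID -> fresh bnode ID
        for triple in graph:
            renamed = []
            for term in triple:
                if term.startswith("_:"):
                    if term not in renaming:
                        renaming[term] = f"_:g{next(fresh)}"
                    term = renaming[term]
                renamed.append(term)
            merged.add(tuple(renamed))
    return merged


def naive_union(graphs):
    """Merge graphs, treating bnode IDs as having a wider, shared scope."""
    return set().union(*graphs)
```

With two graphs that each say something about _:x, standardize_apart yields two distinct blank nodes, while naive_union yields a single shared one. Which of the two is correct is exactly what RDF gives no way to state.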

Ideally, RDF2 should provide for these structures directly, but 
maybe we can get the benefit with a relatively tiny step, just by 
having a syntax for RDF which has explicit scoping brackets. Off 
the cuff, imagine a variant of NTriples in which a subset of 
triples can be enclosed in brackets, say [  ] (or something else 
if these are already taken) to indicate that any bnode ID in a 
triple inside the bracket is local to those triples, i.e. is 
‘bound’. Current RDF engines which do not make use of this 
information can simply ignore the brackets, since they do not change the 
RDF meaning of the graph, but they may provide useful information 
to newer engines. For example, they might make it a lot easier to 
parse OWL syntax (‘Manchester’ syntax) from OWL/RDF.

Putting brackets around an entire graph says, in effect, that all 
bnodeIDs in this graph are local to the graph: omitting them 
allows the possibility of sharing a bnode with some other graph 
(as in RDF datasets).
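
A reader for such a bracketed variant might look like the following sketch. Several things here are assumptions made only for illustration: each bracket sits alone on its own line, a bnode ID is attributed to the innermost open bracket, and bnode IDs are found with a naive regular expression that would also match inside literals or IRIs. The function name is hypothetical.

```python
import re

# Naive bnode matcher: an assumption for this sketch, not the N-Triples grammar.
BNODE = re.compile(r"_:[A-Za-z0-9]+")


def read_scoped_ntriples(text):
    """Return (triples, scopes) for a bracketed N-Triples-like text.

    triples: every triple line, with the brackets dropped. This is all
             that an engine which ignores the scoping would see.
    scopes:  one set per closed bracket pair, holding the bnode IDs that
             occurred directly inside that pair.
    """
    triples = []
    scopes = []
    open_scopes = []  # stack of bnode-ID sets, innermost last
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "[":
            open_scopes.append(set())
        elif line == "]":
            if not open_scopes:
                raise ValueError("closing bracket without an opening one")
            scopes.append(open_scopes.pop())
        else:
            triples.append(line)
            if open_scopes:
                open_scopes[-1].update(BNODE.findall(line))
    if open_scopes:
        raise ValueError("unclosed scope bracket")
    return triples, scopes
```

An engine that does not understand the brackets uses only the first return value. A newer engine can also consult the second to learn which bnode IDs were declared local.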

A better system, which would allow for more elaborate structures, 
would be to have a convention of labelled scope brackets of the 
form [ID ], where ID is any alphanumeric string, which is 
understood to ‘bind’ only bnodes with ids of the form _:string 
where ID is an initial substring of string. So for example [A  ] 
binds _:A1 and _:A17 but not _:B1. This would allow the full 
expressiveness of nested quantification without very much extra 
work at all, and again it could be simply ignored by current RDF 
engines without harm, although they might be missing out on some 
of the meaning being expressed by this more elaborate notation. 
And if you leave out the ID, then this defaults to the simpler 
notation in the previous paragraph, so backward compatibility is 
automatic.
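
The prefix rule is simple enough to state as code. This is only a sketch: the function names are hypothetical, and the choice that the innermost matching scope wins when several enclosing scopes could bind the same ID is an assumption, not something the proposal above spells out.

```python
def binds(scope_id, bnode_id):
    """True if a scope labelled scope_id binds bnode_id.

    A bnode ID has the form "_:string"; the scope binds it when scope_id
    is an initial substring of string. An empty scope_id (an unlabelled
    bracket) therefore binds every bnode ID.
    """
    if not bnode_id.startswith("_:"):
        return False
    return bnode_id[2:].startswith(scope_id)


def binding_scope(enclosing, bnode_id):
    """Find which enclosing scope binds bnode_id.

    enclosing: labels of the enclosing scopes, outermost first.
    Returns the index of the innermost scope that binds the ID (an
    assumed tie-break), or None if the ID is free in all of them.
    """
    for depth in range(len(enclosing) - 1, -1, -1):
        if binds(enclosing[depth], bnode_id):
            return depth
    return None
```

For instance, binds("A", "_:A1") and binds("A", "_:A17") hold while binds("A", "_:B1") does not, and binds("", "_:B1") holds because the empty label is a prefix of everything.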

The scope identifier should only be attached to one bracket, to 
make this kind of silliness

[A ,,,,[B,,,,,]A….]B

impossible.

This could be used to hide the internal structure of RDF lists:

[L
_:a rdf:first x:A .
_:a rdf:rest _:Lb .
_:Lb rdf:first x:B .
_:Lb rdf:rest rdf:nil .
]
could be abbreviated as something like
{x:A,x:B}
and this treated like a new kind of RDF name, which of course 
becomes the first bnodeID (_:a) when compiled into RDF triples 
(which is why that bnodeID is not included in the scope, so it 
can act as the ‘name’ of the list elsewhere in the graph.)
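
The compilation step can be sketched as follows. The function name, the default head ID and the numbering of the inner nodes (_:L1, _:L2, ... rather than _:Lb) are all hypothetical choices for this illustration. The one property taken from the text above is that the head bnode stays outside the scope label while the inner nodes fall inside it.

```python
def expand_list(items, head="_:a", scope="L"):
    """Expand an abbreviated list into rdf:first / rdf:rest triples.

    Returns (name, triples). name is the term that stands for the list
    elsewhere in the graph: the head bnode, or rdf:nil for an empty list.
    Inner bnodes are named _:<scope><n>, so a scope bracket labelled
    `scope` binds them; the head must not carry that prefix.
    """
    if head[2:].startswith(scope):
        raise ValueError("head bnode would be bound by the scope")
    if not items:
        return "rdf:nil", []
    nodes = [head] + [f"_:{scope}{i}" for i in range(1, len(items))]
    triples = []
    for i, item in enumerate(items):
        rest = nodes[i + 1] if i + 1 < len(nodes) else "rdf:nil"
        triples.append((nodes[i], "rdf:first", item))
        triples.append((nodes[i], "rdf:rest", rest))
    return head, triples
```

Calling expand_list(["x:A", "x:B"]) produces the same four-triple shape as the bracketed example, with _:L1 playing the role of _:Lb.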

Anyway, if someone ever convenes an RDF2 WG, I offer this idea 
for consideration.

Pat Hayes

-----------------------------------
call or text to 850 291 0667
www.ihmc.us/groups/phayes/
www.facebook.com/the.pat.hayes
Received on Tuesday, 27 November 2018 05:44:22 UTC