Re: Blank Nodes Re: Toward easier RDF: a proposal from Patrick J Hayes on 2018-12-03 (semantic-web@w3.org from December 2018)

From: Patrick J Hayes <phayes@ihmc.us>
Date: Mon, 3 Dec 2018 15:38:52 -0600
To: Nathan Rixham <nathan@webr3.org>
CC: thomas lörtsch <tl@rat.io>, Tim Berners-Lee <timbl@w3.org>, W3C Semantic Web IG <semantic-web@w3.org>
Message-ID: <B444CA0A-F7C2-4907-8E3B-61A1B3AC2BEA@ihmc.us>
> On Dec 3, 2018, at 3:13 PM, Nathan Rixham <nathan@webr3.org> wrote:
> 
> 
> On Mon, Dec 3, 2018 at 8:47 PM PatHayes <phayes@ihmc.us <mailto:phayes@ihmc.us>> wrote:
>> On Nov 25, 2018, at 11:14 AM, thomas lörtsch <tl@rat.io <mailto:tl@rat.io>> wrote:
>>> On 22. Nov 2018, at 13:02, Tim Berners-Lee <timbl@w3.org <mailto:timbl@w3.org>> wrote:
>>> 
>>> David
>>> 
>>> I agree with your resolution to make RDF easier to use for real developers, whatever they are.  But I do not despair at the level that you do, I am more hopeful.
>>> Let me pick just one of your points (with a new subject as suggested).
>>> 
>>> 
>>>> On 2018-11-21, at 22:40, David Booth <david@dbooth.org <mailto:david@dbooth.org>> wrote:
>>>> 
>>>> 3. Blank nodes.  They are an important convenience for RDF
>>>> authors,
>>> 
>>> Yes, here I agree.  The default data language for developers at the moment
>>> is JSON, and that is full of blank nodes.  Every {} in JSON is equivalent to a blank node [] in Turtle.
>>> 
>>> Where in JSON you write
>>> 
>>> { "name": "Fred Bloggs",
>>> "address": {
>>>  "number":  123,
>>>  "street": "Acacia Avenue" }
>>> }
>>> 
>>> in turtle you write
>>> 
>>> [ :name "Fred Bloggs";
>>> :address [
>>>    :number  123;
>>>    :street  "Acacia Avenue" ]
>>> ]
>>> 
>>> Which is just as simple as the JSON.  When you look at Turtle as a language
>>> to write and to generate it is I think nice.
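
A minimal sketch of the correspondence described above, in Python. It assumes the JSON contains only objects, strings and numbers, that every key is usable as a local name, and that the empty prefix ":" is declared elsewhere; the function name to_turtle is hypothetical and not from this thread.

```python
import json


def to_turtle(value, indent=0):
    """Render a JSON value as Turtle, mapping each JSON object to a blank node [ ... ]."""
    pad = "  " * indent
    if isinstance(value, dict):
        inner = "  " * (indent + 1)
        # One predicate-object pair per JSON member, joined with ";" as in Turtle.
        parts = [f"{inner}:{key} {to_turtle(val, indent + 1)}" for key, val in value.items()]
        return "[\n" + " ;\n".join(parts) + "\n" + pad + "]"
    # Assumption: remaining values are strings or numbers, whose JSON spelling
    # (double-quoted string, bare number) is also a valid Turtle literal.
    return json.dumps(value)


doc = {"name": "Fred Bloggs", "address": {"number": 123, "street": "Acacia Avenue"}}
print(to_turtle(doc))
```

JSON arrays and null are deliberately left out: they have no equally direct Turtle counterpart, so this sketch does not claim to handle them.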
>> 
>> 
>> IMO this is a good example that bnodes actually are foremost: structure. 
>> 
>> I used to think of them as plastic bags: you put things in them to transport them or keep them together but they carry no meaning in themselves (not counting the advertisements usually printed on them as "meaning", of course).
>> 
>> Bnodes allow graphs to encode nested lists (trees). That is useful because although graphs are very flexible, in real life we often prefer less flexible data structures like lists, nested lists, tables. At least I do when I write things down. Those structures are very useful. They add some, well, structure, to what we want to express. Do they carry "meaning"? I’d say yes but normally I don’t refer to the structure itself. On the contrary, it’s so useful because I don’t have to explicate it - it’s just there, as bullet points, indentation, columns and rows.
>> 
>> Sometimes I do want to address a specific location in that structure. Then it’s useful to be able to give that bnode an identifier (and the ability to do so is a plus for RDF). However a triple with a bnode separated from the other triples containing that same bnode can always only be so useful. It’s like taking two cells out of a bigger table, without headings or the full row. How far can that possibly get you? I think that some of the complaints voiced in this thread are based on unreasonable expectations and on a lack of understanding of what bnodes are and can be.
>> 
>> Maybe unreasonable expectations at a deeper level are the core of the problem: the usefulness of graphs as data structures is limited, maybe more limited than RDF likes to admit. They are not always the most appropriate solution. We often use much more structured approaches to information modelling like trees and tables, and for good reasons. 
>> RDF might be much more useful if it had a way to integrate those structures instead of trying to mimic them - and integrate itself better into other data structures. Then maybe we would need fewer blank nodes.
>> Nested lists as first class citizens in RDF would be a good thing. Also tables. There were discussions about "dark triples" pre the 2004 spec but I couldn’t find much in the mailing list archives on the thinking behind it. 
>> But putting more emphasis on linking into existing data structures - like into certain cells in an RDBMS table or subtrees in a JSON document - might be helpful as well.
>> 
>> My main problem with bnodes is that it’s so hard to see where one structure ends and the next one begins, and what that structure actually is: a list? nested? how deep? a table even? an n-ary relation? where does that end? which node represents its main role?
>> A relational table or a nested list make that much easier. In a graph it takes extra effort to mark and characterize boundaries and substructures. RDF tries to do all that with just the bnodes and they are overloaded. That’s why it can be much harder to figure out what’s going on in an RDF based system than in an RDBMS based application - despite all the self-describing properties etc. 
> 
> I think this is a very basic and important point. It is what I meant, expressed differently, by saying that RDF has no way to indicate scope. Bnodes in RDF are, logically, existentially quantified variables, but RDF has no way to indicate, and therefore no way for anyone to know, where the quantifiers are which bind those variables. So, for example, if we assume they are just outside each RDF document, then we should standardize bnodeIDs apart when merging; but if we assume they have larger scope, then maybe we shouldn’t. Bnodes introduced to encode structures like n-ary relational assertions, or lists, or some complicated piece of OWL syntax, should have a very narrow scope corresponding to the exact boundaries of those structures, and hence should be ‘invisible’ from outside (which is why it is fine to make them vanish in a higher-level syntax using [ ] or ( ).) 
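
To make the two assumptions concrete, here is a rough Python sketch. It models a graph as a set of (subject, predicate, object) string tuples and assumes that only blank node IDs start with "_:"; the function names and the "g1"/"g2" tags are hypothetical.

```python
def standardize_apart(triples, tag):
    """Rename every blank node ID in `triples` by adding `tag`, e.g. _:x -> _:g1_x."""
    def rename(term):
        # Assumption: a term is a blank node ID exactly when it starts with "_:".
        return f"_:{tag}_{term[2:]}" if term.startswith("_:") else term
    return {tuple(rename(term) for term in triple) for triple in triples}


def merge(graph_a, graph_b):
    """Merge under the assumption that the quantifiers sit just outside each graph."""
    return standardize_apart(graph_a, "g1") | standardize_apart(graph_b, "g2")


def union(graph_a, graph_b):
    """Plain union: equal bnode IDs are taken to be the same node (larger scope)."""
    return set(graph_a) | set(graph_b)


graph_a = {("_:x", ":name", '"Fred Bloggs"')}
graph_b = {("_:x", ":street", '"Acacia Avenue"')}

print(sorted(merge(graph_a, graph_b)))
print(sorted(union(graph_a, graph_b)))
```

With merge the two _:x nodes stay distinct, while with union they become one node carrying both properties. Nothing in the triples themselves says which of the two operations is the right one.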
> 
> Ideally, RDF2 should provide for these structures directly, but maybe we can get the benefit with a relatively tiny step, just by having a syntax for RDF which has explicit scoping brackets. Off the cuff, imagine a variant of NTriples in which a subset of triples can be enclosed in brackets, say [  ] (or something else if these are already taken) to indicate that any bnode ID in a triple inside the bracket is local to those triples, ie is ‘bound’. Current RDF engines which do not make use of this information can simply ignore them, since they do not change the RDF meaning of the graph, but they may provide useful information to newer engines. For example, they might make it a lot easier to parse OWL syntax (‘Manchester’ syntax) from OWL/RDF. 
> 
> Putting brackets around an entire graph says, in effect, that all bnodeIDs in this graph are local to the graph: omitting them allows the possibility of sharing a bnode with some other graph (as in RDF datasets).
> 
> A better system, which would allow for more elaborate structures, would be to have a convention of labelled scope brackets of the form [ID ], where ID is any alphanumeric string, which is understood to ‘bind’ only bnodes with ids of the form _:string where ID is an initial substring of string. So for example [A  ] binds _:A1 and _:A17 but not _:B1. This would allow the full expressiveness of nested quantification without very much extra work at all, and again it could be simply ignored by current RDF engines without harm, although they might be missing out on some of the meaning being expressed by this more elaborate notation. And if you leave out the ID, then this defaults to the simpler notation in the previous paragraph, so backward compatibility is automatic. 
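
The binding rule for labelled brackets can be sketched in a few lines of Python. This only illustrates the prefix test described above, not a parser for the bracket syntax; the function name binds is hypothetical.

```python
def binds(scope_id, bnode_id):
    """Return True if a scope bracket labelled `scope_id` binds `bnode_id`.

    A bracket [ID ] binds _:string when ID is an initial substring of string.
    An empty ID binds every bnode ID, matching the unlabelled [ ] notation.
    """
    if not bnode_id.startswith("_:"):
        return False  # not a blank node ID at all
    return bnode_id[2:].startswith(scope_id)


for bnode in ("_:A1", "_:A17", "_:B1"):
    print(bnode, binds("A", bnode))
```

Because every string starts with the empty string, the unlabelled case needs no special handling.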
> 
> The scope identifier should only be attached to one bracket, to make this kind of silliness
> 
> [A ,,,,[B,,,,,]A….]B
> 
> impossible.
> 
> This could be used to hide the internal structure of RDF lists:
> 
> [L 
> _:a rdf:first x:A .
> _:a rdf:rest _:Lb .
> _:Lb rdf:first x:B .
> _:Lb rdf:rest rdf:nil .
> ]
> could be abbreviated as something like 
> {x:A,x:B}
> and this treated like a new kind of RDF name, which of course becomes the first bnodeID (_:a) when compiled into RDF triples (which is why that bnodeID is not included in the scope, so it can act as the ‘name’ of the list elsewhere in the graph.)
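
A possible way to compile the abbreviated form into triples, sketched in Python. The function name compile_list and the generated IDs (_:L1, _:L2, ...) are assumptions made for illustration; the example above uses _:Lb for the internal node.

```python
RDF_FIRST, RDF_REST, RDF_NIL = "rdf:first", "rdf:rest", "rdf:nil"


def compile_list(items, head="_:a", scope="L"):
    """Expand a list such as {x:A,x:B} into rdf:first / rdf:rest triples.

    `head` is the unscoped bnode ID that names the list elsewhere in the graph.
    Every later cell gets an ID starting with `scope`, so a bracket labelled
    with `scope` would bind it and hide it from the rest of the graph.
    """
    if not items:
        return RDF_NIL, []
    cells = [head] + [f"_:{scope}{i}" for i in range(1, len(items))]
    triples = []
    for i, item in enumerate(items):
        rest = cells[i + 1] if i + 1 < len(cells) else RDF_NIL
        triples.append((cells[i], RDF_FIRST, item))
        triples.append((cells[i], RDF_REST, rest))
    return head, triples


name, triples = compile_list(["x:A", "x:B"])
for triple in triples:
    print(" ".join(triple), ".")
```

Only the head ID escapes the scope, which is what lets the rest of the graph refer to the list as a whole without seeing its cells.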
> 
> Pat, to me it looks like you're describing an RDF Dataset where Blank Nodes CANNOT be shared between the RDF Graphs; it would achieve the same, no?

Close, but one cannot include a dataset as a component of an RDF graph. I had in mind that this would simply be a ‘scoped subgraph’ of some larger graph. 
> 
> Open question: why can the scope of quantification not be the edge of the RDF Graph

Where is that edge? The RDF specs say that an RDF graph is a /set/ of triples. What determines the ‘edge’ of a set? If you mean the document describing the graph, then yes, that is a natural default assumption, I agree, but as soon as you start taking bits of RDF from many sources and combining them, those boundaries get lost. And that was the intended purpose of RDF, to allow information from many sources to be combined and used together. 
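
A small Python illustration of the point, modelling a graph as a set of string tuples (the function name combine and the sample triples are hypothetical):

```python
def combine(*graphs):
    """Union several graphs; the result is just another set of triples."""
    combined = set()
    for graph in graphs:
        combined |= graph
    return combined


doc_one = {("_:b", ":name", '"Fred Bloggs"')}
doc_two = {("_:b", ":street", '"Acacia Avenue"')}

combined = combine(doc_one, doc_two)
# `combined` holds both triples but records nothing about which document
# each one came from, so the original boundaries cannot be read back out.
print(sorted(combined))
```

Keeping the boundaries would need extra information carried alongside the set, which the set itself does not hold.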

> , what is the use case / requirement for blank nodes to be shared between graphs?

The issue is not so much a use case for sharing, as how to even know when bnodes are NOT being shared.  For one example, many users expect to be able to use bnodeIDs in the result of a query ‘outside’ the graph being queried, eg in subsequent queries. 

Pat
Received on Monday, 3 December 2018 21:39:32 UTC