Re: Blank Nodes Re: Toward easier RDF: a proposal from Henry Story on 2018-12-03 (semantic-web@w3.org from December 2018)

From: Henry Story <henry.story@bblfish.net>
Date: Mon, 3 Dec 2018 22:53:01 +0100
To: Anthony Moretti <anthony.moretti@gmail.com>
Cc: nathan <nathan@webr3.org>, Patrick Hayes <phayes@ihmc.us>, tl@rat.io, Sir Tim Berners-Lee <timbl@w3.org>, Semantic Web <semantic-web@w3.org>
Message-Id: <BEA30DA2-0DE0-45EA-B03B-0711C2BD683E@bblfish.net>
> On 3 Dec 2018, at 22:38, Anthony Moretti <anthony.moretti@gmail.com> wrote:
> 
> Cheers for the replies, Henry. To do with Hugh's example:
> 
>     [ a "PostalAddress";
>         :streetAddress "1 High St";
>         :addressLocality geo:london;
>     ];
> 
>     Seems typical of our discussion.
>     But what if I also later get a triple
>     geo:london :inRegion geo:ontario .
> 
> A PostalAddress would have minimum criteria to make it valid, for example having a Region might be a requirement. To test whether the above blank node is valid you would dereference geo:london and see if it has Region information, which means that in this example it isn't actually a valid PostalAddress until that triple with geo:ontario is added.
> 
> It's the same as a fraction missing a denominator: the following blank node wouldn't be a valid Fraction until that information was added.
> 
>     {
>         type: Fraction,
>         numerator: 2
>     }
> 
> Maybe just like IEEE 754 defines floating point numbers and also their relevant operations, standards like schema:PostalAddress should possibly define relevant operations like equality checking too. It would be like having a standard library, and everybody could be sure that when they use a standard type it will be compared in the same way by everybody.
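
A minimal sketch in Python of what such a "standard library" for a type could look like, assuming a node is represented as a plain dict. The table REQUIRED_FIELDS and the functions is_valid and same_value are hypothetical names chosen for illustration; they are not taken from schema.org or any other specification.

```python
from fractions import Fraction

# Hypothetical minimum criteria per type (an assumption, not a published schema).
REQUIRED_FIELDS = {
    "Fraction": {"numerator", "denominator"},
}

def is_valid(node: dict) -> bool:
    """A node is valid once every field required by its type is present."""
    required = REQUIRED_FIELDS.get(node.get("type"), set())
    return required.issubset(node.keys())

def same_value(a: dict, b: dict) -> bool:
    """Type-defined equality: two valid Fractions are equal if they denote the same number."""
    if not (is_valid(a) and is_valid(b)):
        raise ValueError("cannot compare incomplete nodes")
    return (Fraction(a["numerator"], a["denominator"])
            == Fraction(b["numerator"], b["denominator"]))

# The node from the example above is missing its denominator.
incomplete = {"type": "Fraction", "numerator": 2}
half = {"type": "Fraction", "numerator": 1, "denominator": 2}
two_quarters = {"type": "Fraction", "numerator": 2, "denominator": 4}
```

Here the incomplete node fails the validity check until a denominator is added, while the two complete nodes compare equal under the shared definition even though their fields differ.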

I think in OWL you can do that. You can say that every person has two parents, for example. The good thing
is that you can work with incomplete information, so you don't need an identifier for each of the parents.
Similarly, a detective might find a burnt envelope at the scene of a crime that shows a street address but not the
country or town. An OWL rule (not sure which one right now) would then allow an OWL reasoner to conclude that there
should be such fields, but that they remain existentially quantified until a later piece of information allows them to be filled in.
Indeed, just knowing that the letter was meant to be sent to a certain street address reduces the search space quite massively.

> 
> Anthony
> 
> 
> 
> On Mon, Dec 3, 2018 at 1:18 PM Nathan Rixham <nathan@webr3.org> wrote:
> 
> On Mon, Dec 3, 2018 at 8:47 PM PatHayes <phayes@ihmc.us> wrote:
>> On Nov 25, 2018, at 11:14 AM, thomas lörtsch <tl@rat.io> wrote:
>>> On 22. Nov 2018, at 13:02, Tim Berners-Lee <timbl@w3.org> wrote:
>>> 
>>> David
>>> 
>>> I agree with your resolution to make RDF easier to use for real developers, whatever they are.  But I do not despair at the level that you do, I am more hopeful.
>>> Let me pick just one of your points (with a new subject as suggested).
>>> 
>>> 
>>>> On 2018-11-21, at 22:40, David Booth <david@dbooth.org> wrote:
>>>> 
>>>> 3. Blank nodes.  They are an important convenience for RDF
>>>> authors,
>>> 
>>> Yes, here I agree.  The default data language for developers at the moment
>>> is JSON, and that is full of blank nodes.  Every {} in JSON is equivalent to a blank node [] in Turtle.
>>> 
>>> Where in JSON you write
>>> 
>>> { "name": "Fred Bloggs",
>>> "address": {
>>>  "number":  123,
>>>  "street": "Acacia Avenue" }
>>> }
>>> 
>>> in Turtle you write
>>> 
>>> [ :name "Fred Bloggs";
>>> :address [
>>>    :number  123;
>>>    :street  "Acacia Avenue" ]
>>> ]
>>> 
>>> Which is just as simple as the JSON.  When you look at Turtle as a language
>>> to write and to generate it is I think nice.
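
To make the {} to [] correspondence concrete, here is a rough Python sketch that walks a JSON object and emits one triple per key, minting a fresh blank node for every nested object. The function name json_to_triples and the _:bN labelling scheme are illustrative assumptions; JSON arrays are deliberately not handled.

```python
import itertools
import json

def json_to_triples(obj, counter=None, triples=None):
    """Return (node, triples); every JSON object becomes a fresh blank node."""
    if counter is None:
        counter = itertools.count(1)
    if triples is None:
        triples = []
    node = f"_:b{next(counter)}"  # hypothetical labelling scheme
    for key, value in obj.items():
        if isinstance(value, dict):
            # A nested {} is another blank node, linked from its parent.
            child, _ = json_to_triples(value, counter, triples)
            triples.append((node, f":{key}", child))
        else:
            # Scalars are kept as their JSON serialization (arrays not handled).
            triples.append((node, f":{key}", json.dumps(value)))
    return node, triples

doc = json.loads(
    '{"name": "Fred Bloggs", "address": {"number": 123, "street": "Acacia Avenue"}}'
)
root, triples = json_to_triples(doc)
```

For the Fred Bloggs document this yields two blank nodes: one for the outer object and one for the address, connected by a single :address triple.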
>> 
>> 
>> IMO this is a good example that bnodes actually are foremost: structure. 
>> 
>> I used to think of them as plastic bags: you put things in them to transport them or keep them together but they carry no meaning in themselves (not counting the advertisements usually printed on them as "meaning", of course).
>> 
>> Bnodes allow graphs to encode nested lists (trees). That is useful because although graphs are very flexible, in real life we often prefer less flexible data structures like lists, nested lists, tables. At least I do when I write things down. Those structures are very useful. They add some, well, structure, to what we want to express. Do they carry "meaning"? I’d say yes but normally I don’t refer to the structure itself. On the contrary, it’s so useful because I don’t have to explicate it - it’s just there, as bullet points, indentation, columns and rows.
>> 
>> Sometimes I do want to address a specific location in that structure. Then it’s useful to be able to give that bnode an identifier (and the ability to do so is a plus for RDF). However a triple with a bnode separated from the other triples containing that same bnode can always only be so useful. It’s like taking two cells out of a bigger table, without headings or the full row. How far can that possibly get you? I think that some of the complaints voiced in this thread are based on unreasonable expectations and on a lack of understanding of what bnodes are and can be.
>> 
>> Maybe unreasonable expectations at a deeper level are the core of the problem: the usefulness of graphs as data structures is limited, maybe more limited than RDF likes to admit. They are not always the most appropriate solution. We often use much more structured approaches to information modelling like trees and tables, and for good reasons. 
>> RDF might be much more useful if it had a way to integrate those structures instead of trying to mimic them - and integrate itself better into other data structures. Then maybe we would need fewer blank nodes.
>> Nested lists as first class citizens in RDF would be a good thing. Also tables. There were discussions about "dark triples" pre the 2004 spec but I couldn’t find much in the mailing list archives on the thinking behind it. 
>> But putting more emphasis on linking into existing data structures - like into certain cells in a RDBMS table or subtrees in a JSON document - might be helpful as well.
>> 
>> My main problem with bnodes is that it’s so hard to see where one structure ends and the next one begins, and what that structure actually is: a list? nested? how deep? a table even? an n-ary relation? where does that end? which node represents its main role?
>> A relational table or a nested list make that much easier. In a graph it takes extra effort to mark and characterize boundaries and substructures. RDF tries to do all that with just the bnodes and they are overloaded. That’s why it can be much harder to figure out what’s going on in an RDF based system than in a RDBMS based application - despite all the self describing properties etc. 
> 
> I think this is a very basic and important point. It is what I meant, expressed differently, by saying that RDF has no way to indicate scope. Bnodes in RDF are, logically, existentially quantified variables, but RDF has no way to indicate, and therefore no way for anyone to know, where the quantifiers are which bind those variables. So, for example, if we assume they are just outside each RDF document, then we should standardize bnodeIDs apart when merging; but if we assume they have larger scope, then maybe we shouldn’t. Bnodes introduced to encode structures like n-ary relational assertions, or lists, or some complicated piece of OWL syntax, should have a very narrow scope corresponding to the exact boundaries of those structures, and hence should be ‘invisible’ from outside (which is why it is fine to make them vanish in a higher-level syntax using [ ] or ( ).) 
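
The first option described above, scoping quantifiers at the document boundary, can be sketched in Python as follows. This is only an illustration under stated assumptions: triples are plain tuples of strings, blank nodes are strings starting with "_:", and the renaming scheme in standardize_apart is a naive hypothetical one.

```python
def standardize_apart(triples, doc_id):
    """Rename every blank node ID so it is unique to the document it came from."""
    def rename(term):
        # Naive scheme (an assumption): prefix the label with the document ID.
        return f"_:{doc_id}_{term[2:]}" if term.startswith("_:") else term
    return [tuple(rename(term) for term in triple) for triple in triples]

def merge(documents):
    """Merge a mapping of doc_id -> triples, assuming bnode scope is the document."""
    merged = set()
    for doc_id, triples in documents.items():
        merged.update(standardize_apart(triples, doc_id))
    return merged

# Both documents happen to use the label _:x for different things.
g1 = [("_:x", ":name", '"Fred"')]
g2 = [("_:x", ":name", '"Wilma"')]
merged = merge({"g1": g1, "g2": g2})
```

Under the other assumption, where the scope is larger than a document, the renaming step would be skipped and the two _:x labels would denote the same node after merging.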
> 
> Ideally, RDF2 should provide for these structures directly, but maybe we can get the benefit with a relatively tiny step, just by having a syntax for RDF which has explicit scoping brackets. Off the cuff, imagine a variant of N-Triples in which a subset of triples can be enclosed in brackets, say [  ] (or something else if these are already taken) to indicate that any bnode ID in a triple inside the bracket is local to those triples, i.e. is ‘bound’. Current RDF engines which do not make use of this information can simply ignore them, since they do not change the RDF meaning of the graph, but they may provide useful information to newer engines. For example, they might make it a lot easier to parse OWL syntax (‘Manchester’ syntax) from OWL/RDF. 
> 
> Putting brackets around an entire graph says, in effect, that all bnodeIDs in this graph are local to the graph: omitting them allows the possibility of sharing a bnode with some other graph (as in RDF datasets).
> 
> A better system, which would allow for more elaborate structures, would be to have a convention of labelled scope brackets of the form [ID ], where ID is any alphanumeric string, which is understood to ‘bind’ only bnodes with ids of the form _:string where ID is an initial substring of string. So for example [A  ] binds _:A1 and _:A17 but not _:B1. This would allow the full expressiveness of nested quantification without very much extra work at all, and again it could be simply ignored by current RDF engines without harm, although they might be missing out on some of the meaning being expressed by this more elaborate notation. And if you leave out the ID, then this defaults to the simpler notation in the previous paragraph, so backward compatibility is automatic. 
> 
> The scope identifier should only be attached to one bracket, to make this kind of silliness
> 
> [A ,,,,[B,,,,,]A….]B
> 
> impossible.
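
The binding rule for labelled scope brackets is simple enough to state as a one-line check. A small Python sketch, where the function name binds is a hypothetical choice and bnode IDs are assumed to be strings of the form "_:label":

```python
def binds(scope_id: str, bnode_id: str) -> bool:
    """True if a bracket labelled scope_id binds bnode_id.

    A bnode is bound when scope_id is an initial substring of its label.
    An empty scope_id (an unlabelled bracket) therefore binds every bnode.
    """
    if not bnode_id.startswith("_:"):
        return False  # not a blank node at all
    return bnode_id[2:].startswith(scope_id)
```

With this rule [A ] binds _:A1 and _:A17 but not _:B1, and the unlabelled bracket falls out as the special case of the empty prefix.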
> 
> This could be used to hide the internal structure of RDF lists:
> 
> [L 
> _:a rdf:first x:A .
> _:a rdf:rest _:Lb .
> _:Lb rdf:first x:B.
> _:Lb rdf:rest rdf:nil .
> ]
> could be abbreviated as something like 
> {x:A,x:B}
> and this treated like a new kind of RDF name, which of course becomes the first bnodeID (_:a) when compiled into RDF triples (which is why that bnodeID is not included in the scope, so it can act as the ‘name’ of the list elsewhere in the graph.)
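
A rough Python sketch of that compilation step, expanding an abbreviated list into rdf:first / rdf:rest triples. The function name compile_list and the numbering of the internal nodes (_:L1, _:L2, ...) are assumptions made for illustration; the example above uses _:Lb for the same purpose. Only the head node is left outside the L scope, so it can serve as the name of the list.

```python
def compile_list(items, head="_:a", scope="L"):
    """Expand items into rdf:first/rdf:rest triples; returns (head, triples)."""
    if not items:
        return "rdf:nil", []
    # The head keeps its own label; internal nodes get labels inside the scope.
    nodes = [head] + [f"_:{scope}{i}" for i in range(1, len(items))]
    triples = []
    for i, item in enumerate(items):
        rest = nodes[i + 1] if i + 1 < len(nodes) else "rdf:nil"
        triples.append((nodes[i], "rdf:first", item))
        triples.append((nodes[i], "rdf:rest", rest))
    return head, triples

name, list_triples = compile_list(["x:A", "x:B"])
```

The returned name is what other triples in the graph would use to refer to the list, while every other node in the expansion stays hidden inside the bracket.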
> 
> Pat, to me it looks like you're describing an RDF Dataset where Blank Nodes CANNOT be shared between the RDF Graphs; it would achieve the same, no?
> 
> Open question: why can the scope of quantification not be the edge of the RDF Graph, what is the use case / requirement for blank nodes to be shared between graphs?
Received on Monday, 3 December 2018 21:53:29 UTC