Re: Blank Nodes Re: Toward easier RDF: a proposal from Nathan Rixham on 2018-12-03 (semantic-web@w3.org from December 2018)

From: Nathan Rixham <nathan@webr3.org>
Date: Mon, 3 Dec 2018 21:13:20 +0000
To: phayes@ihmc.us
Cc: tl@rat.io, Tim Berners-Lee <timbl@w3.org>, W3C Semantic Web IG <semantic-web@w3.org>
Message-ID: <CANiy74zSxLMmMeW=BYxuQOugNN9cDyQ7gPt7abY9s9hLes2aHw@mail.gmail.com>
On Mon, Dec 3, 2018 at 8:47 PM Pat Hayes <phayes@ihmc.us> wrote:

> On Nov 25, 2018, at 11:14 AM, thomas lörtsch <tl@rat.io> wrote:
>
> On 22. Nov 2018, at 13:02, Tim Berners-Lee <timbl@w3.org> wrote:
>
> David
>
> I agree with your resolution to make RDF easier to use for real
>  developers, whatever they are.  But I do not despair at the level that you
> do, I am more hopeful.
> Let me pick just one of your points (with a new subject as suggested).
>
>
> On 2018-11-21, at 22:40, David Booth <david@dbooth.org> wrote:
>
> 3. Blank nodes.  They are an important convenience for RDF
> authors,
>
>
> Yes, here I agree.  The default data language for developers at the moment
> is JSON, and that is full of blank nodes.  Every {} in JSON is equivalent
> to a blank node [] in Turtle.
>
> Where in JSON you write
>
> { "name": "Fred Bloggs",
> "address": {
>  "number":  123,
>  "street": "Acacia Avenue" }
> }
>
> in Turtle you write
>
> [ :name "Fred Bloggs";
> :address [
>    :number  123;
>    :street  "Acacia Avenue" ]
> ]
>
> Which is just as simple as the JSON.  When you look at Turtle as a language
> to write and to generate, it is, I think, nice.
>
>
>
> IMO this is a good example that bnodes actually are foremost: structure.
>
> I used to think of them as plastic bags: you put things in them to
> transport them or keep them together but they carry no meaning in
> themselves (not counting the advertisements usually printed on them as
> "meaning", of course).
>
> Bnodes allow graphs to encode nested lists (trees). That is useful because
> although graphs are very flexible, in real life we often prefer less
> flexible data structures like lists, nested lists, tables. At least I do
> when I write things down. Those structures are very useful. They add some,
> well, structure, to what we want to express. Do they carry "meaning"? I’d
> say yes, but normally I don’t refer to the structure itself. On the
> contrary, it’s so useful because I don’t have to explicate it - it’s just
> there, as bullet points, indentation, columns and rows.
>
> Sometimes I do want to address a specific location in that structure. Then
> it’s useful to be able to give that bnode an identifier (and the ability to
> do so is a plus for RDF). However a triple with a bnode separated from the
> other triples containing that same bnode can always only be so useful. It’s
> like taking two cells out of a bigger table, without headings or the full
> row. How far can that possibly get you? I think that some of the complaints
> voiced in this thread are based on unreasonable expectations and on a lack
> of understanding what bnodes are and can be.
>
> Maybe unreasonable expectations at a deeper level are the core of the
> problem: the usefulness of graphs as data structures is limited, maybe more
> limited than RDF likes to admit. They are not always the most appropriate
> solution. We often use much more structured approaches to information
> modelling like trees and tables, and for good reasons.
> RDF might be much more useful if it had a way to integrate those
> structures instead of trying to mimic them - and integrate itself better
> into other data structures. Then maybe we would need fewer blank nodes.
> Nested lists as first class citizens in RDF would be a good thing. Also
> tables. There were discussions about "dark triples" pre the 2004 spec but I
> couldn’t find much in the mailing list archives on the thinking behind it.
> But putting more emphasis on linking into existing data structures - like
> into certain cells in an RDBMS table or subtrees in a JSON document - might
> be helpful as well.
>
> My main problem with bnodes is that it’s so hard to see where one
> structure ends and the next one begins, and what that structure actually
> is: a list? nested? how deep? a table even? an n-ary relation? where does
> that end? which node represents its main role?
> A relational table or a nested list make that much easier. In a graph it
> takes extra effort to mark and characterize boundaries and substructures.
> RDF tries to do all that with just the bnodes and they are overloaded.
> That’s why it can be much harder to figure out what’s going on in an
> RDF-based system than in an RDBMS-based application - despite all the
> self-describing properties etc.
>
>
> I think this is a very basic and important point. It is what I meant,
> expressed differently, by saying that RDF has no way to indicate scope.
> Bnodes in RDF are, logically, existentially quantified variables, but RDF
> has no way to indicate, and therefore no way for anyone to know, where the
> quantifiers are which bind those variables. So, for example, if we assume
> they are just outside each RDF document, then we should standardize
> bnodeIDs apart when merging; but if we assume they have larger scope, then
> maybe we shouldn’t. Bnodes introduced to encode structures like n-ary
> relational assertions, or lists, or some complicated piece of OWL syntax,
> should have a very narrow scope corresponding to the exact boundaries of
> those structures, and hence should be ‘invisible’ from outside (which is
> why it is fine to make them vanish in a higher-level syntax using [ ] or (
> ).)
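
The first alternative above (quantifiers sitting just outside each document, so bnode IDs are standardized apart on merge) can be sketched as follows. The representation is an assumption made only for illustration: a graph is a set of (subject, predicate, object) string tuples, a blank node is any term starting with "_:", and literals are written with their surrounding quotes. The names standardize_apart and merge are hypothetical.

```python
def standardize_apart(triples, suffix):
    """Return a copy of the triples with every blank node label made unique
    by appending the given suffix."""
    def rename(term):
        return f"{term}_{suffix}" if term.startswith("_:") else term

    return {(rename(s), p, rename(o)) for s, p, o in triples}


def merge(graphs):
    """Merge graphs, treating each graph as the scope of its own bnode IDs."""
    merged = set()
    for index, graph in enumerate(graphs):
        merged |= standardize_apart(graph, f"g{index}")
    return merged


g1 = {("_:b", "ex:name", '"Fred"')}
g2 = {("_:b", "ex:name", '"Wilma"')}

merged = merge([g1, g2])          # the two _:b nodes stay distinct
naive_union = g1 | g2             # larger scope: _:b is one shared node
```

In the merged result there are two distinct blank node subjects; in the plain union the same label denotes a single node with two names, which is the behaviour one would want only if the scope were larger than a single document.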
>
> Ideally, RDF2 should provide for these structures directly, but maybe we
> can get the benefit with a relatively tiny step, just by having a syntax
> for RDF which has explicit scoping brackets. Off the cuff, imagine a
> variant of NTriples in which a subset of triples can be enclosed in
> brackets, say [  ] (or something else if these are already taken) to
> indicate that any bnode ID in a triple inside the bracket is local to those
> triples, i.e. is ‘bound’. Current RDF engines which do not make use of this
> information can simply ignore them, since they do not change the RDF
> meaning of the graph, but they may provide useful information to newer
> engines. For example, they might make it a lot easier to parse OWL syntax
> (‘Manchester’ syntax) from OWL/RDF.
>
> Putting brackets around an entire graph says, in effect, that all bnodeIDs
> in this graph are local to the graph: omitting them allows the possibility
> of sharing a bnode with some other graph (as in RDF datasets).
>
> A better system, which would allow for more elaborate structures, would be
> to have a convention of labelled scope brackets of the form [ID ], where ID
> is any alphanumeric string, which is understood to ‘bind’ only bnodes with
> ids of the form _:string where ID is an initial substring of string. So for
> example [A  ] binds _:A1 and _:A17 but not _:B1. This would allow the full
> expressiveness of nested quantification without very much extra work at
> all, and again it could be simply ignored by current RDF engines without
> harm, although they might be missing out on some of the meaning being
> expressed by this more elaborate notation. And if you leave out the ID,
> then this defaults to the simpler notation in the previous paragraph, so
> backward compatibility is automatic.
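
The binding rule for labelled scope brackets is a plain prefix test, which can be sketched as below. The function name binds is hypothetical and the sketch only covers the rule as worded above, not any parser for the bracket syntax.

```python
def binds(scope_id, bnode_id):
    """True if a scope bracket labelled scope_id binds the given bnode ID.

    A bracket [ID ...] binds _:string when ID is an initial substring of
    string. An empty scope_id models the unlabelled bracket, which binds
    every bnode ID inside it.
    """
    if not bnode_id.startswith("_:"):
        return False  # IRIs and literals are never bound by a scope
    return bnode_id[2:].startswith(scope_id)


examples = {
    "_:A1": binds("A", "_:A1"),
    "_:A17": binds("A", "_:A17"),
    "_:B1": binds("A", "_:B1"),
}
```

Because every string starts with the empty string, leaving out the ID gives the simpler unlabelled behaviour without a special case.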
>
> The scope identifier should only be attached to one bracket, to make this
> kind of silliness
>
> [A ,,,,[B,,,,,]A….]B
>
> impossible.
>
> This could be used to hide the internal structure of RDF lists:
>
> [L
> _:a rdf:first x:A .
> _:a rdf:rest _:Lb .
> _:Lb rdf:first x:B .
> _:Lb rdf:rest rdf:nil .
> ]
> could be abbreviated as something like
> {x:A,x:B}
> and this treated like a new kind of RDF name, which of course becomes the
> first bnodeID (_:a) when compiled into RDF triples (which is why that
> bnodeID is not included in the scope, so it can act as the ‘name’ of the
> list elsewhere in the graph.)
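
Compiling the abbreviated form into triples might look like the sketch below. It is an assumption-laden illustration, not the proposed syntax: the function name expand_list is hypothetical, the head label defaults to _:a as in the example, and the hidden cells are numbered _:L1, _:L2, ... rather than _:Lb, so that they all carry the scope prefix L while the head does not.

```python
RDF_FIRST = "rdf:first"
RDF_REST = "rdf:rest"
RDF_NIL = "rdf:nil"


def expand_list(items, head="_:a", scope_id="L"):
    """Expand a non-empty list of terms into rdf:first / rdf:rest triples.

    The head cell keeps a label outside the scope so the rest of the graph
    can refer to the list; every other cell gets a label starting with
    scope_id, so a bracket labelled scope_id would hide it.
    """
    if not items:
        raise ValueError("an empty list is just rdf:nil")
    cells = [head] + [f"_:{scope_id}{i}" for i in range(1, len(items))]
    triples = []
    for i, (cell, item) in enumerate(zip(cells, items)):
        rest = cells[i + 1] if i + 1 < len(cells) else RDF_NIL
        triples.append((cell, RDF_FIRST, item))
        triples.append((cell, RDF_REST, rest))
    return triples


list_triples = expand_list(["x:A", "x:B"])
for triple in list_triples:
    print(" ".join(triple), ".")
```

For the two-element list this produces four triples, with _:a as the only blank node label that is meant to be visible outside the bracket.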
>

Pat, to me it looks like you're describing an RDF Dataset where Blank Nodes
CANNOT be shared between the RDF Graphs. Wouldn't that achieve the same?

Open question: why can the scope of quantification not be the edge of the
RDF Graph? What is the use case / requirement for blank nodes to be shared
between graphs?
Received on Monday, 3 December 2018 21:13:54 UTC