- From: Anthony Moretti <anthony.moretti@gmail.com>
- Date: Mon, 3 Dec 2018 13:38:51 -0800
- To: nathan@webr3.org
- Cc: phayes@ihmc.us, tl@rat.io, Tim Berners-Lee <timbl@w3.org>, Semantic Web <semantic-web@w3.org>
- Message-ID: <CACusdfSjpKXFd5WoLcp4PSkYfcn8ttK=BhJNBmoowA11mXmkHg@mail.gmail.com>
Cheers for the replies, Henry. Regarding Hugh's example:

    [ a :PostalAddress ;
      :streetAddress "1 High St" ;
      :addressLocality geo:london ] .

Seems typical of our discussion. But what if I also later get a triple

    geo:london :inRegion geo:ontario .

A PostalAddress would have minimum criteria to make it valid; for example, having a region might be a requirement. To test whether the above blank node is valid you would dereference geo:london and see if it has region information, which means that in this example it isn't actually a valid PostalAddress until that triple with geo:ontario is added.

It's the same as a fraction missing a denominator: the following blank node wouldn't be a valid Fraction until that information was added.

    { type: Fraction, numerator: 2 }

Just as IEEE 754 defines floating-point numbers and also their relevant operations, standards like schema:PostalAddress should perhaps define relevant operations like equality checking too. It would be like having a standard library, and everybody could be sure that when they use a standard type it will be compared in the same way by everybody else.

Anthony
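A rough sketch of that kind of validity check, written against rdflib. The geo: namespace IRI and the inRegion/addressRegion criterion are placeholders for illustration, not anything the schema.org vocabulary actually mandates:

```python
# Illustrative only: "valid" here means a region is reachable, either directly
# on the address node or via the locality it points at.
from rdflib import Graph, Namespace, RDF

SCHEMA = Namespace("http://schema.org/")
GEO = Namespace("http://example.org/geo/")   # placeholder for the geo: prefix above

data = """
@prefix schema: <http://schema.org/> .
@prefix geo: <http://example.org/geo/> .

[] a schema:PostalAddress ;
   schema:streetAddress "1 High St" ;
   schema:addressLocality geo:london .

geo:london geo:inRegion geo:ontario .
"""

g = Graph().parse(data=data, format="turtle")

def is_valid_postal_address(g, node):
    """Valid only once region information is present, directly or via the locality."""
    if g.value(node, SCHEMA.addressRegion) is not None:
        return True
    locality = g.value(node, SCHEMA.addressLocality)
    # In a live system you would dereference the locality IRI and merge what it says.
    return locality is not None and g.value(locality, GEO.inRegion) is not None

for addr in g.subjects(RDF.type, SCHEMA.PostalAddress):
    # Prints True only because the geo:inRegion triple is present; remove it and
    # the same blank node is no longer a valid PostalAddress.
    print(is_valid_postal_address(g, addr))
```

An agreed equality operation could be layered on in the same way, comparing the normalised parts of two addresses rather than their node identities.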
On Mon, Dec 3, 2018 at 1:18 PM Nathan Rixham <nathan@webr3.org> wrote:

>
> On Mon, Dec 3, 2018 at 8:47 PM Pat Hayes <phayes@ihmc.us> wrote:
>
>> On Nov 25, 2018, at 11:14 AM, thomas lörtsch <tl@rat.io> wrote:
>>
>> On 22. Nov 2018, at 13:02, Tim Berners-Lee <timbl@w3.org> wrote:
>>
>> David
>>
>> I agree with your resolution to make RDF easier to use for real
>> developers, whatever they are. But I do not despair at the level that
>> you do; I am more hopeful.
>> Let me pick just one of your points (with a new subject as suggested).
>>
>> On 2018-11-21, at 22:40, David Booth <david@dbooth.org> wrote:
>>
>> 3. Blank nodes. They are an important convenience for RDF authors,
>>
>> Yes, here I agree. The default data language for developers at the
>> moment is JSON, and that is full of blank nodes. Every {} in JSON is
>> equivalent to a blank node [] in Turtle.
>>
>> Where in JSON you write
>>
>> { "name": "Fred Bloggs",
>>   "address": {
>>     "number": 123,
>>     "street": "Acacia Avenue" } }
>>
>> in Turtle you write
>>
>> [ :name "Fred Bloggs";
>>   :address [
>>     :number 123;
>>     :street "Acacia Avenue" ] ]
>>
>> which is just as simple as the JSON. When you look at Turtle as a
>> language to write and to generate, it is, I think, nice.
>>
>>
>> IMO this is a good example that bnodes actually are, foremost, structure.
>>
>> I used to think of them as plastic bags: you put things in them to
>> transport them or keep them together, but they carry no meaning in
>> themselves (not counting the advertisements usually printed on them as
>> "meaning", of course).
>>
>> Bnodes allow graphs to encode nested lists (trees). That is useful
>> because although graphs are very flexible, in real life we often prefer
>> less flexible data structures like lists, nested lists, tables. At least
>> I do when I write things down. Those structures are very useful. They
>> add some, well, structure to what we want to express. Do they carry
>> "meaning"? I'd say yes, but normally I don't refer to the structure
>> itself. On the contrary, it's so useful because I don't have to
>> explicate it - it's just there, as bullet points, indentation, columns
>> and rows.
>>
>> Sometimes I do want to address a specific location in that structure.
>> Then it's useful to be able to give that bnode an identifier (and the
>> ability to do so is a plus for RDF). However a triple with a bnode
>> separated from the other triples containing that same bnode can only
>> ever be so useful. It's like taking two cells out of a bigger table,
>> without headings or the full row. How far can that possibly get you? I
>> think that some of the complaints voiced in this thread are based on
>> unreasonable expectations and on a lack of understanding of what bnodes
>> are and can be.
>>
>> Maybe unreasonable expectations at a deeper level are the core of the
>> problem: the usefulness of graphs as data structures is limited, maybe
>> more limited than RDF likes to admit. They are not always the most
>> appropriate solution. We often use much more structured approaches to
>> information modelling, like trees and tables, and for good reasons.
>> RDF might be much more useful if it had a way to integrate those
>> structures instead of trying to mimic them - and if it integrated itself
>> better into other data structures. Then maybe we would need fewer blank
>> nodes. Nested lists as first-class citizens in RDF would be a good
>> thing. Also tables. There were discussions about "dark triples" before
>> the 2004 spec, but I couldn't find much in the mailing list archives on
>> the thinking behind them. Putting more emphasis on linking into existing
>> data structures - like into certain cells in an RDBMS table or subtrees
>> in a JSON document - might be helpful as well.
>>
>> My main problem with bnodes is that it's so hard to see where one
>> structure ends and the next one begins, and what that structure actually
>> is: a list? nested? how deep? a table even? an n-ary relation? where
>> does that end? which node represents its main role?
>> A relational table or a nested list makes that much easier. In a graph
>> it takes extra effort to mark and characterize boundaries and
>> substructures. RDF tries to do all that with just the bnodes, and they
>> are overloaded. That's why it can be much harder to figure out what's
>> going on in an RDF-based system than in an RDBMS-based application -
>> despite all the self-describing properties etc.
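That boundary problem is easy to see by dumping the Fred Bloggs example above as raw triples: the [ ] nesting in the surface syntax disappears into unlabelled blank nodes, with nothing to mark where either structure starts or ends. A small rdflib sketch (the : prefix is bound to an example namespace only so it parses):

```python
# Sketch: the nested Turtle flattens into triples whose only trace of structure
# is a pair of opaque blank-node identifiers.
from rdflib import Graph

turtle = """
@prefix : <http://example.org/> .
[ :name "Fred Bloggs" ;
  :address [ :number 123 ;
             :street "Acacia Avenue" ] ] .
"""

g = Graph().parse(data=turtle, format="turtle")
for s, p, o in g:
    print(s, p, o)   # two distinct blank nodes, no marker for where either "ends"
```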
>>
>> I think this is a very basic and important point. It is what I meant,
>> expressed differently, by saying that RDF has no way to indicate scope.
>> Bnodes in RDF are, logically, existentially quantified variables, but
>> RDF has no way to indicate, and therefore no way for anyone to know,
>> where the quantifiers are which bind those variables. So, for example,
>> if we assume they are just outside each RDF document, then we should
>> standardize bnodeIDs apart when merging; but if we assume they have
>> larger scope, then maybe we shouldn't. Bnodes introduced to encode
>> structures like n-ary relational assertions, or lists, or some
>> complicated piece of OWL syntax, should have a very narrow scope
>> corresponding to the exact boundaries of those structures, and hence
>> should be 'invisible' from outside (which is why it is fine to make them
>> vanish in a higher-level syntax using [ ] or ( ).)
>>
>> Ideally, RDF2 should provide for these structures directly, but maybe we
>> can get the benefit with a relatively tiny step, just by having a syntax
>> for RDF which has explicit scoping brackets. Off the cuff, imagine a
>> variant of NTriples in which a subset of triples can be enclosed in
>> brackets, say [ ] (or something else if these are already taken), to
>> indicate that any bnode ID in a triple inside the brackets is local to
>> those triples, i.e. is 'bound'. Current RDF engines which do not make
>> use of this information can simply ignore the brackets, since they do
>> not change the RDF meaning of the graph, but they may provide useful
>> information to newer engines. For example, they might make it a lot
>> easier to parse OWL ('Manchester') syntax from OWL/RDF.
>>
>> Putting brackets around an entire graph says, in effect, that all
>> bnodeIDs in this graph are local to the graph; omitting them allows the
>> possibility of sharing a bnode with some other graph (as in RDF
>> datasets).
>>
>> A better system, which would allow for more elaborate structures, would
>> be to have a convention of labelled scope brackets of the form [ID ],
>> where ID is any alphanumeric string, which is understood to 'bind' only
>> bnodes with ids of the form _:string where ID is an initial substring of
>> string. So for example [A ] binds _:A1 and _:A17 but not _:B1. This
>> would allow the full expressiveness of nested quantification without
>> very much extra work at all, and again it could simply be ignored by
>> current RDF engines without harm, although they might be missing out on
>> some of the meaning being expressed by this more elaborate notation. And
>> if you leave out the ID, then this defaults to the simpler notation in
>> the previous paragraph, so backwards compatibility is automatic.
>>
>> The scope identifier should only be attached to one bracket, to make
>> this kind of silliness
>>
>> [A ....[B ....]A ....]B
>>
>> impossible.
>>
>> This could be used to hide the internal structure of RDF lists:
>>
>> [L
>> _:a rdf:first x:A .
>> _:a rdf:rest _:Lb .
>> _:Lb rdf:first x:B .
>> _:Lb rdf:rest rdf:nil .
>> ]
>>
>> could be abbreviated as something like
>>
>> {x:A, x:B}
>>
>> and this treated like a new kind of RDF name, which of course becomes
>> the first bnodeID (_:a) when compiled into RDF triples (which is why
>> that bnodeID is not included in the scope, so it can act as the 'name'
>> of the list elsewhere in the graph.)
>>
>
> Pat, to me it looks like you're describing an RDF Dataset where blank
> nodes CANNOT be shared between the RDF Graphs; it would achieve the same,
> no?
>
> Open question: why can the scope of quantification not be the edge of the
> RDF Graph? What is the use case / requirement for blank nodes to be
> shared between graphs?
>
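To make the "standardize bnodeIDs apart when merging" step concrete, a minimal sketch, assuming blank node labels are scoped to their source document and representing triples as plain tuples (the docA/docB labels are purely illustrative):

```python
# Sketch of standardizing apart: labels from different scopes are renamed so a
# merge can never accidentally co-identify two unrelated blank nodes.

def standardize_apart(triples, scope):
    """Prefix every blank node label with its scope before merging."""
    def rename(term):
        return f"_:{scope}.{term[2:]}" if term.startswith("_:") else term
    return [tuple(rename(t) for t in triple) for triple in triples]

doc_a = [("_:a", ":name", '"Fred Bloggs"')]
doc_b = [("_:a", ":name", '"Someone else entirely"')]

merged = standardize_apart(doc_a, "docA") + standardize_apart(doc_b, "docB")
for triple in merged:
    print(triple)   # _:docA.a and _:docB.a stay distinct after the merge
```

If the scope were fixed at the edge of each graph, this renaming would simply be what "merge" means; sharing a bnode across graphs is exactly the case such a rule would rule out.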
Received on Monday, 3 December 2018 21:39:27 UTC