Re: [External] Re: The Joy of NULLs (not) from Hugo Mills on 2019-08-25 (semantic-web@w3.org from August 2019)

From: Hugo Mills <hugo@carfax.org.uk>
Date: Sun, 25 Aug 2019 13:25:58 +0100
To: Hugh Glaser <hugh@glasers.org>
Cc: Henry Story <henry.story@bblfish.net>, Daniel Hernandez <daniel@degu.cl>, "semantic-web@w3.org" <semantic-web@w3.org>
Message-ID: <20190825122558.GF19687@savella.carfax.org.uk>
On Sun, Aug 25, 2019 at 11:02:11AM +0100, Hugh Glaser wrote:
> The blank node comments here got me pondering.
> Sorry, I'm a bit of a newbie with blank nodes (I never use them, and perhaps this posting indicates why), so this may all be well-known and in the docs.
> But can I check my understanding here, please.
> 
> As soon as you start to query a graph with blank nodes in practice, I think things gets pretty weird, and even unpredictable.
> Yes, in the RDF model, I'm sure it is all clear.
> But serialising changes everything.
> 
> I guess the simplest thing I am thinking of is something like:
> id:id1 ns:p1  [ ns:p2 id:id2 ] .
> 
> With these sorts of queries:
> Q1) ?s ns:p1 ?o .
> Q2) id:id1 ?p ?o .
> Q3) ?s ?p id:id2 .
> 
> Depending on doing SELECT v. CONSTRUCT (or with a Linked Data wrapper around the store, accessing id:id1 and id:id2), and what sort of content I might be negotiating, I will get various sorts of results.
> They will usually have something like _:genid1 or <ns:p1 rdf:nodeID="genid1"/> or whatever, and it may be in multiple documents.
> 
> But I *think* that what happens is that the connection between id:id1 and id:id2 is kept (Q1) or lost (Q2 + Q3), entirely dependent on how you choose to get the RDF out of your store.
> I hope I haven't misunderstood things - I have tried it on some stores, but that isn't a reliable way of working out what is meant to happen by definition of SPARQL etc..

   If I've understood your question, yes, you're mostly right
(although your Q1 example isn't quite right).

   Q1 will give you an identity of the bnode (_:b1, say). In the
serialisation of the result, that has an identity which is unique
*within that serialisation*, and cannot be assumed to match the
identities in any other serialisation of data from the graph.

   So, if your query is: Q4 "?s ns:p1 ?b. ?b ns:p2 ?o", you'll get the
relationship between id:id1 and id:id2. If you try making two separate
queries, Q5a "?s ns:p1 ?b1.", Q5b "?b2 ns:p2 ?o.", then the identities
returned for ?b1 and ?b2 may or may not match up -- there's no
requirement for them to do so.

   The issues over processing of bnodes also turns up in other places,
such as finding graph isomorphisms. Without bnodes, it's a simple
matter of checking that the two graphs have the same set of triples in
them. With bnodes present, you have to do a whole load of extra work
to find a pairwise mapping (relabelling) between the bnodes in each
graph which makes the set of triples the same. If no such mapping
exists, then the graphs are not isomorphic; if one does exist, then
they are.

   Hugo.

> Have I got it wrong?
> 
> And is that also really what SQL does, as you say?
> 
> [Exercise for the reader: create a Category Theory description of what blank nodes actually do in the whole Semantic Web ecosystem :-) ]
> 
> Best
> Hugh
> 
> 
> > On 25 Aug 2019, at 06:19, Henry Story <henry.story@bblfish.net> wrote:
> > 
> > Continuing this thread that started with the funny story on the NULL 
> > vanity licence plate reported here:
> >    https://mashable.com/article/dmv-vanity-license-plate-def-con-backfire/
> > 
> > I just came across work by Ryan Wisnesky on Algebraic Databases, where
> > the authors formalizes DBs in terms of Category Theory, in order to build provably 
> > correct ways to transform data.  
> > 
> > In that formalization, for which they have software tools, they give an clear 
> > explanation of NULLs in SQL databases that make each 
> > NULL different.  In the talk linked to below Ryan Wisnesky actually gives them  
> > different  subscripts. 
> > 
> > In that way nulls  in DBs are very different from nulls in 
> > Java - which can be compared for equality  and for which there exists only one 
> > instance -  and very similar to blank nodes on the semantic web.
> > 
> > See the presentation ”Algebraic Databases” on his web site
> >      https://www.wisnesky.net/
> > Or other content I found on this work
> >      https://twitter.com/bblfish/status/1165195822625153024
> > 
> > Henry Story
> > 
> > 
> >> On 13 Aug 2019, at 15:53, Daniel Hernandez <daniel@degu.cl> wrote:
> >> 
> >> SQL nulls are similar in some aspects to Codd nulls. A difference is that SQL nulls do no provide guaranty that the value exists. Blank nodes, on the other hand, are similar to marked nulls. We study the application to SPARQL of SQL techniques to approximate certain answers in: "Certain Answers for SPARQL with Blank Nodes." However, we founded a unique dataset using blank nodes as unknown values (Wikidata). I am curious if you know another.
> >> 
> >> On Tue, Aug 13, 2019 at 3:53 AM, Franconi Enrico <franconi@inf.unibz.it> wrote:
> >>> The situation is slightly more complex than that. 
> >>> NULL values in standard SQL are exactly defined as letting any equality involving a NULL value fail.
> >>> Note that the string 'NULL' represents a NULL value, namely if you type the string NULL into a cell of type STRING then it is understood to be a NULL value. 
> >>> This is where the implementors failed: a NULL value is never equal to itself.
> >>> This can be understood with the following standard SQL example (try it!).
> >>> 
> >>> With the database:
> >>> 
> >>> TABLE: col1 | col2
> >>>        -----+-----
> >>>          a  |  b
> >>>          b  | NULL
> >>> 
> >>> the query (meant to be the identity query, namely returning the input table itself):
> >>> 
> >>> SELECT * FROM TABLE 
> >>> WHERE TABLE.col1 = TABLE.col1 AND TABLE.col2 = TABLE.col2 ;
> >>> 
> >>> gives the result:
> >>> 
> >>> col1 | col2
> >>> -----+-----
> >>>   a  |  b
> >>> 
> >>> In SQL, the query above returns the table TABLE if and only if the table TABLE does not have any NULL value, otherwise it returns just the tuples not containing a NULL value, i.e., in this case only the ﬁrst tuple <a,b>. Informally this is due to the fact that a SQL NULL value is never equal (or not equal) to anything, including itself. This is because a SQL NULL value represents the absence of a value.
> >>> 
> >>> Note that this is where SQL NULL values are radically different from RDF bnodes. Indeed a bnode is EQUAL to itself but different from any other bnode. This is because a RDF bnode represents the existence of an unknown value.
> >>> 
> >>> --e.
> >>> 
> >>>> Il giorno 12 ago 2019, alle ore 16:41, Diogo FC Patrao <djogopatrao@gmail.com> ha scritto:
> >>>> 
> >>>> 
> >>>> Vanity license plates in USA are strings, right? Then this problem would only happen if NULL='NULL', which is not.
> >>>> 
> >>>> It could be that the private company stored 'NULL' instead of NULL to the unassigned tickets, but that's really bad coding/design (and easy to fix, I guess).
> >>>> 
> >>>> Or maybe the DAO wrongly translate NULL to 'NULL' at some point.
> >>>> 
> >>>> Cheers
> >>>> 
> >>>> dfcp
> >>>> 

-- 
Hugo Mills             | Alert status chocolate viridian: Authorised
hugo@... carfax.org.uk | personnel only. Dogs must be carried on escalator.
http://carfax.org.uk/  |
PGP: E2AB1DE4          |
Received on Sunday, 25 August 2019 12:26:28 UTC