Re: Category Theory, OWL and RDF - Was: The Joy of NULLs from Ryan Wisnesky on 2019-09-08 (semantic-web@w3.org from September 2019)

From: Ryan Wisnesky <ryan@conexus.ai>
Date: Sun, 8 Sep 2019 05:14:23 -0400
To: Henry Story <henry.story@bblfish.net>
Cc: "semantic-web@w3.org" <semantic-web@w3.org>, Steffen Staab <staab@uni-koblenz.de>, Daniel Hernandez <daniel@degu.cl>, Evan Patterson <epatters@stanford.edu>
Message-Id: <9ACBF605-CCFE-48A9-A06E-8BCDAF2BC55E@conexus.ai>
To elaborate, a function is a special kind of relation, and a relation can be expressed as two functions (a construction often called a span).  When you are dealing with functors C -> Rel a la Evan, it is natural to present C using equations between expressions in the binary relation algebra (i.e., the internal language of an allegory).   When you are dealing with functors C -> Set a la CQL, it is natural to present C using equations between expressions in binary functional algebra (i.e., the internal language of a category).  The reason these choices are natural is that the functors out of C will preserve those constraints, be they relational or functional, respectively.  

At the end of the day, equations between binary relational expressions are simply more expressive than their functional counterparts.  So to capture them in a setting where relations are encoded as spans of functions, as in CQL, you need an additional language of constraints.  When you add these constraints (so-called 'regular logic', also known as existential-functional horn clauses) to the equations between binary functional expressions, you obtain an equivalent formalism.  So you don't have to choose between CQL and Evan's relational ologs; you can translate back and forth, as shown in that FOAF example in CQL.

There is an innate trade-off between how expressive a constraint language is and how easy it is to migrate data into and out of schemas constrained by it.  When your constraint language is limited to functions and equations a la CQL without the above mentioned constraints, you get three data migration functors, delta, sigma, and pi, with delta being the model reduct functor from institutional model theory and sigma its left adjoint and pi its right adjoint.  When you increase expressive power to the relational olog case, the pi (and possibly sigma) data migration functor no longer exists in general - it's the price to be paid for additional expressivity.  Finding special cases where pi does exist is part of CQL's special sauce, because the technology that does that can also be put to work statically verifying that collections of SQL queries landing onto tables with interlocking foreign keys from tables with interlocking foreign keys will never violate referential integrity. 

> On Sep 8, 2019, at 4:35 AM, Henry Story <henry.story@bblfish.net> wrote:
> 
> 
> 
>> On 7 Sep 2019, at 22:21, Henry Story <henry.story@bblfish.net> wrote:
>> 
>> I have been following through some of the literature that lead 
>> to Wisnesky’s research in the last few weeks.
>> 
>> From a Category Theoretical/Database (+RDF) perspective a very clear
>> and readable paper is ”Functorial Data Migration” from 2012 by David Spivak.  https://arxiv.org/abs/1009.1166
>> There he argues that one can look at a database schema as a small category,
>> and that a functor from that to Set (or other relevant categories), is a
>> database instance. It is shown that RDF graphs can be generated from any
>> Database via a Grothendieck construction, which is quite easy to follow.
>> 
>> Some problems with that model:
>> 
>> a) Because the schema is mapped to Set (the category whose objects are Sets
>> and morphisms functions), it seems that these types of schemas don’t quite
>> fit our RDF ontologies though, as we can define relations on types that
>> are not complete. So I could create a relation :wife between :Man and :Woman
>> and yet this not require every Man to have a wife.
>> b) The Grothendieck construction types each node of RDF (more below)
>> 
>> On the other hand for normal databases and for learning category theory
>> this article is very useful. It gives a model on how to do schema mappings
>> which are understood as functors from one schema to the next. It also 
>> gives one a very intuitive way of understanding Kan extensions, and 
>> advanced topic in CT.
>> 
>> The problems above lead me to the 2017 paper ”Knowledge Representation
>> in Bicategories of Relations” by Evan Patterson at Stanford,
>> https://arxiv.org/pdf/1706.00526.pdf
>> 
>> By moving to Rel the category where objects are sets and arrows relations
>> the problem a) above is resolved, and Evan Patterson can give a much better
>> explanation of Description Logics and OWL, which he considers head on,
>> with examples from FOAF. This article requires little CT to read, though
>> it introduces string diagrams, which need a little playing with to 
>> understand. (It is a good introduction to them). 
>> 
>> A very interesting point that is made there is that where the internal 
>> language of closed cartesian categories (e.g. Set) is lambda calculus, the
>> internal calculus of the bicategory of relations is ”regular logic”.
>> Here we start seeing why this is relevant to thinking about 
>> the integration between Description Logics and functional 
>> programming languages. 
> 
> It turns out that there is a way to move from the functional and relational
> olog (Ontology Log(ic?)) that is shown in the FOAF example
> from the help of CQL. It 
> 
> https://www.categoricaldata.net/help/index.html
> https://www.categoricaldata.net/help/FOAF.html
> 
> with explanations in § 1.3.3 of the paper ”Algebraic Data Integration” by 
> Wisnesky. Each side has its advantages apparently. The
> relational model apparently not having the functorial schema migration
> functor.
> 
> I’ll start playing with the software.
> 
>> 
>> One thing that comes out of this is that this gives a model to typed
>> Description Logics. He explains the difference between these and OWL,
>> which also seem to be typed. It would be interesting to see if Steffen
>> Staab’s mapping of OWL to Scala ends up thinking of OWL in a typed
>> way as described by that formalism.
>> 
>> All of these models deal with RDF1.0 type graphs, but not quads.
>> The remaining question I have is if there is a way to model modal
>> logics that falls out of these. In Set we have (Indexed Strong) Monads 
>> that give a model for the "S says P” relation
>> https://www.sciencedirect.com/science/article/pii/S1571066107000746
>> Are there monads in the Bicategories of Relations? 
>> (Also see ”What Logic corresponds via Curry-Howard to a Monad?
>> https://cstheory.stackexchange.com/questions/42689/what-logic-correponds-via-curry-howard-to-a-monad )
>> 
>> This shows that there are very interesting insights to be gained
>> by thinking with Category Theorists more closely about the 
>> Semantic Web.
>> 
>> Having done that reading I should try next to get back to Ryan’s work
>> which he mentions below. 
>> 
>>> On 29 Aug 2019, at 18:47, Ryan Wisnesky <ryan@conexus.ai> wrote:
>>> 
>>> Hi all,
>>> 
>>> QINL is simply the name we gave to a common phenomenon in dependent type theories, which is that you can usefully represent sets as dependent types, instead of as terms.  That has both positive and negative implications.  It's possible that Dotty's (upcoming Scala's) type system may support QINL, and similarly for Dependent Haskell.  Once you represent sets as dependent types, you can e.g. manipulate them using Coq tactics: https://www.wisnesky.net/dbpl15.pdf ("Using Dependent Types and Tactics to Enable Semantic Optimization of Language-Integrated Queries") and https://homes.cs.washington.edu/~chushumo/files/cosette_pldi17.pdf ("Homotopy Type Theory SQL: Proving Query Rewrites with Univalent SQL Semantics").  Although both QINL and LINQ are most naturally described using category theory, conceptually they are about how collections are represented in type theory.  
>>> 
>>> In our work on the categorical query language CQL (http://categoricaldata.net), our notion of schema includes equationally-defined constraints, sufficient to encode arbitrary behavior as functional programs (e.g., there is a CQL schema for SK combinatory logic).  This is enabled by CQL's underlying categorical semantics and can be implemented in QINL style, although there's no need to do so.
>>> 
>>> Henry alluded to the status of blank nodes in RDF, a question answered using the language of category theory in the Ph.D. thesis of Braatz : https://pdfs.semanticscholar.org/b8c8/5a3e7a04020259ec9a58c7e5563033f52844.pdf , presumably in a way equivalent to their intended set-theoretic semantics.  That thesis also contains a variety of constructions on RDF graphs such as "pushouts" that may or may not be known or useful to the RDF community, but whose analogs in other data models are known to be useful for data integration.  So I wanted to take this opportunity to ask around to see if anyone was interested in further investigating categorical constructions for RDF.
>>> 
>>> Ryan
>>> 
>>>> On Aug 29, 2019, at 6:37 AM, Henry Story <henry.story@bblfish.net> wrote:
>>>> 
>>>> 
>>>> 
>>>>> On 26 Aug 2019, at 16:26, Steffen Staab <staab@uni-koblenz.de> wrote:
>>>>> 
>>>>> Dear Henry,
>>>>> 
>>>>> the pointers below seem to be really useful to us.
>>>>> The work on CQL and QINL seems to be very related to our papers
>>>>> 
>>>>> ISWC2019: https://arxiv.org/abs/1907.00855
>>>>> Programming 2019: https://arxiv.org/abs/1902.00545
>>>>> 
>>>>> where we use ontology concepts as well as queries as types in 
>>>>> programming languages.
>>>> 
>>>> That last one is a very interesting article linking Scala
>>>> and SPARQL. I completely agree with the described 
>>>> limitations of banana-rdf. 
>>>> 
>>>> This problem of how RDF and Scala fit together has been
>>>> something that has bugged me for a while. Because of the
>>>> strong presence of Functional Programmers in the Scala
>>>> community I have been lead to look at Category Theory
>>>> to look for an answer.  
>>>> 
>>>> Your work is also very enlightening. I feel we are at the
>>>> cusp of an interesting answer here.
>>>> 
>>>>> 
>>>>> QINL seems to go one step in this direction
>>>>> taking schemata (not so different from ontology concepts / ER Entities) 
>>>>> and extending them with behavior.
>>>>> 
>>>>> Still, I do not quite understand where the two approaches should meet.
>>>>> Any idea?
>>>> 
>>>> That is a very good question. I do get the feeling that by answering
>>>> this question we can make some very good progress. Perhaps
>>>> Ryan Wisnesky, can point to an answer here. I will try, but
>>>> it may take me some time to integrate both sides :-)
>>>> 
>>>> Ryan pointed me to an article from 2001 ”A Model Theory for Generic
>>>> Schema Management” [1] that is actually an application of Institution Theory (IT)
>>>> to Schema management with some very simple Java examples, that make
>>>> IT accessible. The advantage of looking towards IT - the logic of the structure
>>>> of all logics [2] - is that it can help one integrate many different points of views.
>>>> The advantage of moving up the abstraction layer, is that some questions
>>>> that within a domain seem arbitrary - eg the status of blank nodes in 
>>>> RDF - can be answered at the higher level by showing how it ties in
>>>> to many other areas of mathematics and engineering in a structured
>>>> way - eg. blank nodes appear as NULLs in a coherent formalization of 
>>>> database theory. In mathematics one can ground a problem by 
>>>> moving up the abstraction layers, it seems.
>>>> 
>>>> Henry 
>>>> 
>>>> [1] http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.481.7519&rep=rep1&type=pdf
>>>> [2] https://www.iep.utm.edu/insti-th/
>>>> [3] a page with many links to Categorical Query Language
>>>> https://www.categoricaldata.net/papers 
>>>> 
>>>> PS. Sorry for taking so long to answer. 1. It is taking time to integrate
>>>> all these papers, and 2. I keep having to do Scala programming tests for
>>>> job interviews to prove I can code! 
>>>> If anyone has a need for a Scala dev who understands RDF and
>>>> some CT, please let me know :-)
>>>> 
>>>>> 
>>>>> Cheers
>>>>> Steffen
>>>>> 
>>>>> 
>>>>>> Am 25.08.2019 um 07:19 schrieb Henry Story <henry.story@bblfish.net>:
>>>>>> 
>>>>>> Continuing this thread that started with the funny story on the NULL 
>>>>>> vanity licence plate reported here:
>>>>>> https://mashable.com/article/dmv-vanity-license-plate-def-con-backfire/
>>>>>> 
>>>>>> I just came across work by Ryan Wisnesky on Algebraic Databases, where
>>>>>> the authors formalizes DBs in terms of Category Theory, in order to build provably 
>>>>>> correct ways to transform data.  
>>>>>> 
>>>>>> In that formalization, for which they have software tools, they give an clear 
>>>>>> explanation of NULLs in SQL databases that make each 
>>>>>> NULL different.  In the talk linked to below Ryan Wisnesky actually gives them  
>>>>>> different  subscripts. 
>>>>>> 
>>>>>> In that way nulls  in DBs are very different from nulls in 
>>>>>> Java - which can be compared for equality  and for which there exists only one 
>>>>>> instance -  and very similar to blank nodes on the semantic web.
>>>>>> 
>>>>>> See the presentation ”Algebraic Databases” on his web site
>>>>>>  https://www.wisnesky.net/
>>>>>> Or other content I found on this work
>>>>>>  https://twitter.com/bblfish/status/1165195822625153024
>>>>>> 
>>>>>> Henry Story
>>>>>> 
>>>>>> 
>>>>>>> On 13 Aug 2019, at 15:53, Daniel Hernandez <daniel@degu.cl> wrote:
>>>>>>> 
>>>>>>> SQL nulls are similar in some aspects to Codd nulls. A difference is that SQL nulls do no provide guaranty that the value exists. Blank nodes, on the other hand, are similar to marked nulls. We study the application to SPARQL of SQL techniques to approximate certain answers in: "Certain Answers for SPARQL with Blank Nodes." However, we founded a unique dataset using blank nodes as unknown values (Wikidata). I am curious if you know another.
>>>>>>> 
>>>>>>> On Tue, Aug 13, 2019 at 3:53 AM, Franconi Enrico <franconi@inf.unibz.it> wrote:
>>>>>>>> The situation is slightly more complex than that. 
>>>>>>>> NULL values in standard SQL are exactly defined as letting any equality involving a NULL value fail.
>>>>>>>> Note that the string 'NULL' represents a NULL value, namely if you type the string NULL into a cell of type STRING then it is understood to be a NULL value. 
>>>>>>>> This is where the implementors failed: a NULL value is never equal to itself.
>>>>>>>> This can be understood with the following standard SQL example (try it!).
>>>>>>>> 
>>>>>>>> With the database:
>>>>>>>> 
>>>>>>>> TABLE: col1 | col2
>>>>>>>>    -----+-----
>>>>>>>>      a  |  b
>>>>>>>>      b  | NULL
>>>>>>>> 
>>>>>>>> the query (meant to be the identity query, namely returning the input table itself):
>>>>>>>> 
>>>>>>>> SELECT * FROM TABLE 
>>>>>>>> WHERE TABLE.col1 = TABLE.col1 AND TABLE.col2 = TABLE.col2 ;
>>>>>>>> 
>>>>>>>> gives the result:
>>>>>>>> 
>>>>>>>> col1 | col2
>>>>>>>> -----+-----
>>>>>>>> a  |  b
>>>>>>>> 
>>>>>>>> In SQL, the query above returns the table TABLE if and only if the table TABLE does not have any NULL value, otherwise it returns just the tuples not containing a NULL value, i.e., in this case only the ﬁrst tuple <a,b>. Informally this is due to the fact that a SQL NULL value is never equal (or not equal) to anything, including itself. This is because a SQL NULL value represents the absence of a value.
>>>>>>>> 
>>>>>>>> Note that this is where SQL NULL values are radically different from RDF bnodes. Indeed a bnode is EQUAL to itself but different from any other bnode. This is because a RDF bnode represents the existence of an unknown value.
>>>>>>>> 
>>>>>>>> --e.
>>>>>>>> 
>>>>>>>>> Il giorno 12 ago 2019, alle ore 16:41, Diogo FC Patrao <djogopatrao@gmail.com> ha scritto:
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Vanity license plates in USA are strings, right? Then this problem would only happen if NULL='NULL', which is not.
>>>>>>>>> 
>>>>>>>>> It could be that the private company stored 'NULL' instead of NULL to the unassigned tickets, but that's really bad coding/design (and easy to fix, I guess).
>>>>>>>>> 
>>>>>>>>> Or maybe the DAO wrongly translate NULL to 'NULL' at some point.
>>>>>>>>> 
>>>>>>>>> Cheers
>>>>>>>>> 
>>>>>>>>> dfcp
>>>>>>>>> 
>>>>>>>>> --
>>>>>>>>> diogo patrão
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Mon, Aug 12, 2019 at 11:11 AM Young,Jeff (OR) <jyoung@oclc.org> wrote:
>>>>>>>>> Here’s an example showing blank nodes being used to declare the place of birth is unknown in Wikidata:
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> https://w.wiki/6$y
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> In the UI, it is rendered like this:
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> <image001.png>
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Jeff
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> From: Daniel Hernandez <daniel@degu.cl>
>>>>>>>>> Date: Monday, August 12, 2019 at 9:42 AM
>>>>>>>>> To: "semantic-web@w3.org" <semantic-web@w3.org>
>>>>>>>>> Subject: [External] Re: The Joy of NULLs (not)
>>>>>>>>> Resent-From: <semantic-web@w3.org>
>>>>>>>>> Resent-Date: Monday, August 12, 2019 at 9:37 AM
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> As Enrico pointed, blank nodes can be used to represent unknown values.
>>>>>>>>> An example of this use is Wikidata. I don't know another example.
>>>>>>>>> 
>>>>>>>>> --
>>>>>>>>> Daniel
>>>>>>>>> 
>>>>>>>>> On Mon, 12 Aug 2019 07:36:41 +0000
>>>>>>>>> Franconi Enrico <franconi@inf.unibz.it> wrote:
>>>>>>>>> 
>>>>>>>>>> Mike, this could easily happen in an RDF world if you register a
>>>>>>>>>> vanity licence plate with anything starting with "_". Indeed, bnodes
>>>>>>>>>> would be the right way to represent unknown but existing plates. --e.
>>>>>>>>>> 
>>>>>>>>>> Il giorno 11 ago 2019, alle ore 23:10, Michael F Uschold
>>>>>>>>>> <uschold@gmail.com<mailto:uschold@gmail.com>> ha scritto:
>>>>>>>>>> 
>>>>>>>>>>> This is hilarious. It could never happen in an RDF world! No value,
>>>>>>>>>>> no triple.
>>>>>>>>>>> 
>>>>>>>>>>> He tried to prank the DMV. Then his vanity license plate backfired
>>>>>>>>>>> big time.
>>>>>>>>>>> https://mashable.com/article/dmv-vanity-license-plate-def-con-backfire/<http://flip.it/NIk7FD>
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
>
Received on Sunday, 8 September 2019 09:14:49 UTC