Re: Putting Government Data online from Frank Manola on 2009-06-29 (semantic-web@w3.org from June 2009)

From: Frank Manola <fmanola@acm.org>
Date: Mon, 29 Jun 2009 16:19:05 -0400
To: John F. Sowa <sowa@bestweb.net>
Cc: SW-forum <semantic-web@w3.org>
Message-id: <BD797538-3331-4D6F-BAE0-7F79BAE725BC@acm.org>
John--

What you're proposing here is not at all unreasonable, but I do think  
there are some things that need to be qualified/clarified a bit.

You talk about the Semantic Web "ignoring the fact that every major  
web site is built around a relational database".  I may be wrong, but  
your further comments suggest that what you mean by this is mainly  
that RDF uses triples rather than being based directly on n-tuples.  I  
don't think these are quite the same thing.  It might help if we were  
to distinguish better between the notation used for the *logic*, and  
the notation used to refer to the *data* (instances).

Trying to cram FOL expressions into triples is certainly a mess.  On  
the other hand, in dealing with data instances there's a need to  
support what is sometimes called Codd's "guaranteed access principle",  
which is that every atomic value in a relational database is  
guaranteed to be logically (in the database sense of that word)  
accessible by a combination of table name, primary-key value, and  
column name.  I.e., you need the combination (table name, primary-key  
value) to select the row, and (table name, column name) to select the  
column (note that the table name is needed for disambiguation *within  
a given database* in each case;  on the Web you need to identify the  
database too).  URIs provide various ways of disambiguating these  
names on the Web (e.g., you can have a URI for the table, which  
disambiguates the other components *within that table*), and you may  
prefer using compound names, but RDF simply boils the (table name,
primary-key value) combination down to the subject URI, and the (table
name, column name) combination down to the predicate URI;  i.e., it's a
very direct way of providing for the guaranteed access principle, and  
this *does not* ignore relational databases.
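
As a rough illustration of the mapping described above, here is a minimal
Python sketch. Everything in it is an assumption made for the example and
not part of any RDF specification: the base URI http://example.org/hr,
the "employee" table, the URI layout, and the function names are all
hypothetical. It only shows that (table name, primary-key value) can name
the subject, (table name, column name) can name the predicate, and that
together they reach exactly one atomic value.

```python
from urllib.parse import quote

BASE = "http://example.org/hr"  # hypothetical URI identifying the database


def subject_uri(table, pk_value):
    """(table name, primary-key value) -> URI naming the row."""
    return f"{BASE}/{quote(table)}/{quote(str(pk_value))}"


def predicate_uri(table, column):
    """(table name, column name) -> URI naming the column."""
    return f"{BASE}/{quote(table)}#{quote(column)}"


def row_to_triples(table, pk_column, row):
    """Emit one (subject, predicate, value) triple per atomic value in the row."""
    subject = subject_uri(table, row[pk_column])
    return [(subject, predicate_uri(table, col), val) for col, val in row.items()]


def access(triples, table, pk_value, column):
    """Guaranteed access: fetch one value by table name, key value, column name."""
    wanted = (subject_uri(table, pk_value), predicate_uri(table, column))
    for s, p, o in triples:
        if (s, p) == wanted:
            return o
    return None


# Hypothetical sample row from an "employee" table keyed on "id".
triples = row_to_triples("employee", "id", {"id": 7, "name": "Ann", "dept": "R&D"})
```

With these assumptions, access(triples, "employee", 7, "name") looks up
the same value that the table name, key value, and column name would
select in the relational database.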

RDF also reflects an aspect of relational databases that the use of
n-tuples for logic expressions tends to ignore, namely normalization.
It's one thing to think of logical expressions having n-tuples of  
arbitrary arity, and another to think of storing and then managing  
billions of instances of those same tuples (e.g., in determining which  
stored values need to be changed when an update occurs).  The same  
normalization principles that (ideally) govern the design of  
relational databases ought to be considered in the Semantic Web.  As  
in relational databases (and as you have suggested) there's no reason  
for forcing the stored representation of the data to be the same as  
the notation used to refer to it (and lots of reasons why making them  
the same is often a *very bad idea*) so there's lots of room for  
maneuver between what is stored and what the user sees.  However, RDF  
at least directly reflects this issue in providing a way of referring  
to an exact value within the Web (although even RDF doesn't, and  
can't, disambiguate references to values when different users use  
different URIs to refer to them), once again *not* ignoring relational  
databases (in fact, reflecting a prime concern in relational database  
design).
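
To make the update concern concrete, here is a small Python sketch under
assumed data. The employee/department rows, the "worksIn" and "locatedIn"
names, and both functions are hypothetical and chosen only for
illustration. It compares how many stored values must change when a
department moves, first with wide, unnormalized tuples and then with one
fact stored per (subject, predicate) pair.

```python
# Unnormalized n-tuples: (employee, department, department_city).
# The department's city is repeated in every employee tuple.
wide_rows = [
    ("ann", "research", "Boston"),
    ("bob", "research", "Boston"),
    ("cy", "sales", "Austin"),
]


def move_department_wide(rows, dept, new_city):
    """Return the updated rows and the number of stored tuples that changed."""
    updated = [(e, d, new_city if d == dept else c) for e, d, c in rows]
    changed = sum(1 for old, new in zip(rows, updated) if old != new)
    return updated, changed


# Triple-style store: one value per (subject, predicate) pair.
# The department's city is stored exactly once.
triple_store = {
    ("ann", "worksIn"): "research",
    ("bob", "worksIn"): "research",
    ("cy", "worksIn"): "sales",
    ("research", "locatedIn"): "Boston",
    ("sales", "locatedIn"): "Austin",
}


def move_department_triples(store, dept, new_city):
    """Return the updated store and the number of stored values that changed."""
    updated = dict(store)
    key = (dept, "locatedIn")
    changed = 0 if updated.get(key) == new_city else 1
    updated[key] = new_city
    return updated, changed


_, wide_changes = move_department_wide(wide_rows, "research", "Cambridge")
_, triple_changes = move_department_triples(triple_store, "research", "Cambridge")
```

In this toy data the wide form touches one tuple per employee in the
department, while the one-fact-per-pair form touches a single value.
That is the kind of difference normalization is meant to address. It
says nothing about how any particular store lays data out physically.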

Finally, I want to repeat the general theme of my original reply (some  
of which you quoted below):  progress toward an alternative Semantic  
Web isn't going to be made by sniping at people trying to get
linked data on the Web, or telling people what they did or didn't  
ignore in developing the specs, but rather by working out the details  
of the alternative ideas, *showing people specifically how those ideas  
make it easier to develop a Semantic Web*, and implementing associated  
software.

--Frank

On Jun 25, 2009, at 11:26 PM, John F. Sowa wrote:

> Frank and Azamat,
>
> I have been the most enthusiastic proponent of a truly Semantic Web.
> But along the way, the semantics got lost in an ungodly mess of
> syntax.
>
> FM> The Semantic Web still has a chance given the number of dedicated
> > and smart people working on it.
> >
> > The S*m*ntic W*b has *no* chance as long as those who believe in it
> > don't develop their own specs and software that demonstrate all the
> > purported advantage of doing it that way (whatever it is)
>
> I very strongly agree.  And I wrote a note to ontolog forum that
> explains how to restore the focus on semantics.  (Copy below)
>
> In the process, my proposal cures the incredibly stupid blunder
> that is killing the Semantic Web:  ignoring the fact that every
> major web site is built around a relational database.
>
> I used to call SQL the worst notation for logic ever conceived.
> But I changed my mind after seeing RDF and OWL.
>
> My proposal below solves that problem by integrating SQL, RDF,
> and OWL on a truly equal footing.
>
> I honestly believe that this is the only way to rescue the
> original goals and hopes for the Semantic Web.
>
> John Sowa
> ___________________________________________________________________
>
> The real problem of "bringing semantics" into anything, whether a
> database or the WWW or anything else, is to keep your focus on the
> main goal:  representing meaning.  Everything else is a distraction.
>
> > Is "semantic foreign key" possible to facilitate current relational
> > database step into semantic database? In other words, if we can
> > build RDF or OWL based semantic foreign keys across different tables
> > and databases while providing those innovative foreign keys inference
> > and reasoning ability, it may help to bring the semantics into the
> > current DB.
>
> That is not the problem.  People have been talking about integrating
> semantics with relational databases for over 30 years.  The solution
> was always very clear:  represent the meaning of the data in logic.
>
> The major obstacle was also very clear:  people ignored meaning,
> and devoted most of their efforts to adding more and more special
> "features" to SQL to address one or another low-level syntactic
> notation to support somebody's pet implementation.
>
> The major issues in creating the Semantic Web were also very clear:
> express meaning in logic.  But instead of focusing on the logic,
> they started to address all kinds of special cases, such as using
> triples instead of n-tuples or forcing everything into some kind
> of XML syntax.
>
> If you step back and look at the logic, all the problems disappear:
>
> 1. First order logic hasn't changed in the past 130 years, and
>    the syntax can be defined in half a page.
>
> 2. The mapping of relational databases to and from FOL is obvious.
>
> 3. The mapping of Description Logics to FOL is obvious.
>
> 4. You can develop very clean, very simple mappings of the above
>    three to one another.
>
> 5. The details of XML-based notations or table-based SQL notations
>    are of minor importance.  Those should *never* be allowed to
>    have the slightest influence on #1, #2, and #3 above.
>
> That is all very clean and very simple.  But we still have to deal
> with the problem of current systems such as SQL, RDF, and OWL.
>
> The answer is also simple:  SQL, RDF, and OWL will be declared
> "legacy systems".  In the terminology that IBM used, they will be
> called "functionally stabilized".  That means no new features or
> additions or further changes will be made to them.  They will be
> supported forever, but not as the basis for future development.
>
> All future development will focus on the very simple principles of
> #1, #2, and #3 above and with further purely *logical* extensions,
> not rinky-dink syntactic features of the kind that burden SQL,
> RDF, OWL, and all other horrible syntaxes that have outlived
> their usefulness.
>
> That is the answer.  It's extremely simple, and it provides
> *equal* support for both the current relational DBs and
> the current Semantic Web.  It is a solid and secure foundation
> for the future.
>
Received on Monday, 29 June 2009 20:20:15 UTC