Re: "whole reason quads were implemented" from Dan Brickley on 2012-05-10 (public-rdf-wg@w3.org from May 2012)

From: Dan Brickley <danbri@danbri.org>
Date: Thu, 10 May 2012 10:12:59 +0200
To: Sandro Hawke <sandro@w3.org>, Andy Seaborne <andy.seaborne@epimorphics.com>, public-rdf-wg@w3.org
Message-ID: <CAFfrAFqUv-iftFOhBkmY2hVLD1bu-_EmeEzwsVh=oqunYxsUJw@mail.gmail.com>
On 10 May 2012 03:04, Sandro Hawke <sandro@w3.org> wrote:
> On Wed, 2012-05-09 at 21:03 +0100, Andy Seaborne wrote:
>>
>> On 09/05/12 20:02, Sandro Hawke wrote:
>> > On Wed, 2012-05-09 at 11:26 -0700, Steve Harris wrote:
>> >>
>> >> Right. The whole reason quads were implemented was to be able to track
>> >> what *triples* appear in what documents (typically found on the web,
>> >> but file: is good too).
>> >
>> > Speak for yourself, please, Steve.   I've seen several implementations
>> > of quads that were used for other purposes and it's quite possible they
>> > predated yours.
>>
>> Which systems?  I'd like to understand the motivations and approaches.
>
> I don't actually think the history is that useful.  I was just clumsily
> trying to get Steve to stop generalizing and consider some of the other
> perspectives or help me understand why he wouldn't/couldn't.
>
> But since you ask, and now I'm thinking about it....
>
> An i-search for "quad" in my work notes turns up a conversation with
> danbri and bwm from March 2001 talking about prolog handling of RDF
> using triple/3 vs triple/4 and implementations by Jan Grant and Stefan
> Decker.   I have vague memories of Stefan's F-Logic variant at the time
> called "Triple", but which used quads.  I think he called the fourth
> position the "model"; in this conversation, danbri calls it
> "context/layer/etc".

FWIW the F-Logic implementation from Stefan that I remember wasn't
quad-based - we had a paper
http://www.w3.org/TandS/QL/QL98/pp/queryservice.html that did some
SKOS-like stuff and some RIF-like stuff, but since all the data was
trusted and reasonably accurate, provenance wasn't a big issue.

Here's his old code,
http://web.archive.org/web/20000815055950/http://www.aifb.uni-karlsruhe.de/~sde/rdf/
 ... I found a .zip at
http://wayback.archive.org/web/20001115000000*/http://www.aifb.uni-karlsruhe.de/~sde/rdf/SiLRI1_1_1.zip
and have taken an extra archival copy, now stashed in FOAF SVN too:
http://svn.foaf-project.org/foaftown/2012/history/SiLRI-1.1.1/Readme.txt
(and no sign of quads there fwiw).

The earliest RDFWeb crawl I have a copy of is
http://svn.foaf-project.org/foaftown/2010/allfactoids/allfactoids.P
from, I think, 2001, as (annoyingly!) triple/3.

Initially the first RDFWeb/FOAF crawler stuffed everything into a
triple store, after parsing with EricP's Perl parser. I remember
exactly when it became clear to me that this wouldn't work: we had
Aaron Swartz's http://www.aaronsw.com/about.xrdf plus a joke RSS feed
from a spoof/fake adult video site. Somehow the graphs got mixed up (a
common URI or IFP smushing), and fictitious, dada-engine-generated
video rental names like "Chunky Truck Drivers Tickled for Cash!" got
mixed up with the profile of this ~13 year old guy, and showed up in
the page we auto-generated about him. Oops. Which made it plenty clear
that we needed a more robust model for remembering who-said-what.
There was also some kind of find-a-path-thru-the-graph thing,
(again) from Jan Grant, ~July 2000. I can't find the code but
http://lists.foaf-project.org/pipermail/foaf-dev/2000-July/004200.html
shows how it worked (along with genid:foo fake people URIs).

> I think cwm switched from formula objects to quads about that time, but
> it's not immediately obvious to me in the cvs log.  "RDF quotation" is
> first mentioned in Feb 2001 and "quad" is never mentioned.
>
> There were lots of cwm use cases; many of them show up in the Semantic
> Web tutorial TimBL, DanC, and I gave at WWW 2003 [1].   I don't think I
> started using quads (instead of just multiple in-memory triple stores)
> in my own programming until 2004 or so, when they showed up in swipl.
>
> I'm not claiming any of this is particularly important or useful
> evidence about what works or doesn't work, and most of it doesn't argue
> for standards in this field - for that we have to turn to the use cases
> we've been elucidating in this group, I think.

It's worth writing down the backstory sometimes, anyway. Quads were
hanging in the air pretty much from the start, due to the obvious
annoyingness of RDF'97 subject/predicate/object reification.

The 1997 RDF spec used a prolog-esque notation
http://www.w3.org/TR/WD-rdf-syntax-971002/

"According to the formal definition, the property "author", i.e. the
arc labeled "author" plus its source and target nodes is the triple
(3-tuple):

{author, [http://www.w3.org/People/Lassila], "Ora Lassila"}

where "author" denotes a node used for labeling this arc. This
formulation of the data model lends itself to reification, meaning
that the relation expressed by the arc can be converted into a
concrete node to which we can refer, as follows:

X ---PropName----> author
X ---PropObj-----> [http://www.w3.org/People/Lassila]
X ---PropValue---> "Ora Lassila"

which in fact means that a node X and three new triples are added:

{PropName,  X, author}
{PropObj,   X, [http://www.w3.org/People/Lassila]}
{PropValue, X, "Ora Lassila"}

It is later shown that reification allows us to express modalities
(e.g. beliefs about statements) or simply attach any properties to
other properties."

(btw PropName became rdf:predicate, PropObj became rdf:subject,
PropValue begat rdf:object and also rdf:value)

A lot of what people have been trying to do with quads over the years
was really driven by their liking that original RDF use case while
disliking the original technical approach (reification within
3-tuples).
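
To make the contrast concrete, here is a rough sketch in Python (not
code from any of the systems mentioned) of the two ways of recording
where a statement came from. The property names PropName, PropObj and
PropValue come from the 1997 draft quoted above; the function names,
the "source" property and the example.org URL are assumptions made up
purely for illustration.

```python
# Illustrative sketch only. Tuples follow the 1997 draft's ordering:
# {predicate, subject, object}. All helper names are hypothetical.

def reify(triple, node):
    """Describe one triple with three further triples about `node`."""
    predicate, subject, obj = triple
    return [
        ("PropName", node, predicate),
        ("PropObj", node, subject),
        ("PropValue", node, obj),
    ]


def as_quad(triple, source):
    """Record the same information by adding a fourth position."""
    predicate, subject, obj = triple
    return (predicate, subject, obj, source)


statement = ("author", "http://www.w3.org/People/Lassila", "Ora Lassila")

# Reification within 3-tuples: three triples for node X, plus one more
# (with an assumed "source" property) to say where the statement was found.
reified = reify(statement, "X") + [("source", "X", "http://example.org/doc1")]

# Quad: one 4-tuple carries the statement and its source together.
quad = as_quad(statement, "http://example.org/doc1")
```

The reified form needs four triples and an extra node to say what the
quad says in a single tuple, which is roughly the annoyance that pushed
people towards a fourth position.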


Bijan Parsia wrote a nice little xml.com article back then too,
http://www.xml.com/lpt/a/821
"RDF Applications with Prolog", By Bijan Parsia July 25, 2001

This walks through the early RDF support in Jan Wielemaker's SWI-Prolog, including:

"In the previous section, when I asserted the terms returned by
load_rdf/2, those terms were of the form
rdf(Subject,Predicate,Object), and that was how they were entered into
the Prolog database. With rdf_db, the rdf/3 predicate is, in fact, a
rule! Each ground level rdf statement is asserted as rdf/4 so that
each statement's "provenance" can be determined (by the fourth
argument). This is, of course, merely an implementation detail. One
could replace rdf_db's use of the builtin Prolog knowledge base with
something else."

See also http://www.swi-prolog.org/pldoc/man?predicate=rdf/4
http://www.swi-prolog.org/pldoc/man?predicate=rdf%2f3
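
The idea in that passage (store every statement with a fourth
"provenance" argument, and derive the three-argument view from it) can
be sketched outside Prolog too. The Python below is only an
illustration under stated assumptions: the class and method names are
hypothetical, it is not how rdf_db is implemented, and the choice to
drop duplicate triples in the three-argument view is mine.

```python
# Illustrative sketch only; QuadStore, rdf4 and rdf3 are hypothetical names.

class QuadStore:
    def __init__(self):
        self._quads = set()

    def add(self, subject, predicate, obj, source):
        """Store a statement together with where it came from."""
        self._quads.add((subject, predicate, obj, source))

    def rdf4(self, subject=None, predicate=None, obj=None, source=None):
        """Yield stored quads matching the pattern; None matches anything."""
        pattern = (subject, predicate, obj, source)
        for quad in sorted(self._quads):
            if all(p is None or p == v for p, v in zip(pattern, quad)):
                yield quad

    def rdf3(self, subject=None, predicate=None, obj=None):
        """Triple view derived from the quads, ignoring the source."""
        seen = set()
        for s, p, o, _source in self.rdf4(subject, predicate, obj):
            if (s, p, o) not in seen:
                seen.add((s, p, o))
                yield (s, p, o)


store = QuadStore()
store.add("ex:alice", "foaf:name", "Alice", "http://example.org/a.rdf")
store.add("ex:alice", "foaf:name", "Alice", "http://example.org/b.rdf")
store.add("ex:video1", "dc:title", "A Title", "http://example.org/feed.rss")

triples = list(store.rdf3())
sources = [q[3] for q in store.rdf4("ex:alice", "foaf:name", "Alice")]
```

Code that only cares about triples asks rdf3 and never sees the fourth
position; code that needs who-said-what asks rdf4.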

Around this time, there was also Geoff Chapel's work embedding
SWI-Prolog behind the Mozilla APIs,  see
http://lists.w3.org/Archives/Public/www-rdf-interest/2000Apr/0025.html
and http://www-archive.mozilla.org/rdf/doc/inference.html ... this
integrated with quads via living inside the Mozilla layered
datasources API, which puts it in the Mozilla/MCF/Guha contexts family
tree too; see also
http://www.ninebynine.org/RDFNotes/UsingContextsWithRDF.html

Another rdf/3 in Prolog was
http://www.w3.org/1999/11/11-WWWProposal/rdfqdemo.html ... Jan Grant
again; we made this to celebrate the Web's 10th birthday. Both his and
Geoff's work came from my asking in comp.lang.prolog,
https://groups.google.com/group/comp.lang.prolog/browse_thread/thread/bbe965abf762fb5d/6c9ca6edc20ace47?lnk=gst&q=brickley+prolog#6c9ca6edc20ace47
trying to get something prologesque working inside the Mozilla
datasource APIs.

From https://developer.mozilla.org/en/RDF_in_Mozilla_FAQ ... still a
useful view,

"What is a datasource?

RDF can generally be viewed in two ways: either as a graph with nodes
and arcs, or as a "soup" of logical statements. A datasource is a
subgraph (or collection of statements, depending on your viewpoint)
that are for some reason collected together. Some examples of
datasources that exist today are "browser bookmarks", "browser global
history", "IMAP mail accounts", "NNTP news servers", and "RDF/XML
files".

In Mozilla, datasources can be composed together using the composite
data source. This is like overlaying graphs, or merging collections of
statements ("microtheories") together. Statements about the same RDF
resource can then be intermingled: for example, the "last visit date"
of a particular website comes from the "browser global history"
datasource, and the "shortcut keyword" that you can type to reach that
website comes from the "browser bookmarks" datasource. Both
datasources refer to "website" by URL: this is the "key" that allows
the datasources to be "merged" effectively."
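
As a rough illustration of the overlay described in that FAQ text, the
Python sketch below merges two named datasources keyed by URL while
remembering which datasource each value came from. This is not the
Mozilla datasource API; the compose function, the dictionary layout and
the sample values are all assumptions made for the example.

```python
# Illustrative sketch only; not the Mozilla composite datasource API.
from collections import defaultdict


def compose(datasources):
    """Overlay named datasources, each mapping a URL to {property: value}.

    Every merged value keeps the name of the datasource it came from,
    so the overlay can be pulled apart again afterwards. If two
    datasources set the same property on the same URL, the later one
    wins in this simple version.
    """
    merged = defaultdict(dict)
    for name, statements in datasources.items():
        for url, properties in statements.items():
            for prop, value in properties.items():
                merged[url][prop] = (value, name)
    return dict(merged)


datasources = {
    "browser global history": {
        "http://example.org/": {"last visit date": "2012-05-10"},
    },
    "browser bookmarks": {
        "http://example.org/": {"shortcut keyword": "ex"},
    },
}

view = compose(datasources)
site = view["http://example.org/"]
```

Here the URL plays the role of the "key" from the FAQ: both datasources
describe the same resource, and the merged view shows both properties
side by side without losing track of their origin.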


One reason for doing this kind of archaeology is that it shows that
it's pointless trying to rhetorically divide the community into
practically minded, "RDF as data" people versus ivory tower
rule-language-using academics.  We're all just a bunch of hackers
cobbling hopefully-useful things together with the tools we have to
hand. Right from the start anyone doing anything with RDF has grappled
with the 'quads' problem. It's inherent to RDF: when you have a system
that makes it peculiarly easy to mix diverse data together, it becomes
uncommonly important to also have a way of separating that information
again afterwards. The original RDF M+S spec acknowledged and
highlighted this, even while not offering a particularly useful
solution (reification).

I like Bijan's conclusion btw,

"Since the inferences in this transformation were trivial, it doesn't
involve any particularly sophisticated Prolog, and yet this is
precisely the kind of everyday task many of us find ourselves doing
all the time. The Semantic Web, if it's to work out, will be made up
as much of the ordinary and familiar as of the exotic."

cheers,

Dan

> [1] http://www.w3.org/2000/10/swap/doc/
Received on Thursday, 10 May 2012 08:13:29 UTC