Re: RDF vs RDBMS from Danny Ayers on 2006-08-29 (semantic-web@w3.org from August 2006)

From: Danny Ayers <danny.ayers@gmail.com>
Date: Tue, 29 Aug 2006 12:50:27 +0200
To: semantic_web@googlegroups.com, "Semantic web list" <semantic-web@w3.org>
Message-ID: <1f2ed5cd0608290350m6a39daa3kd1eecc7594251c85@mail.gmail.com>
On 8/29/06, reviswami78@yahoo.com <reviswami78@yahoo.com> wrote:
>
> I'm a newbie to RDF and have been facing a fundamental question as I read
> more about RDF. RDF positions itself away from plain XML
> representations of data, saying XML is suited for representing data with
> containment hierarchies, and where "order" is important, whereas RDF
> has a flatter structure and represents only references among different
> entities. That sounds just like what a relational database is supposed
> to do, and those are criteria when deciding whether to use an XML DB
> or a relational DB to store your data.

I'll dodge the XML question, save to say that RDF and XML can be seen
as fulfilling different roles, RDF providing a data model and XML
providing a syntax for serialising data. I'll have a crack at this
though:

> Where does RDF fit in, and how does it compare to relational databases.
> I keep hearing that databases are not good for "semi-structured" data,
> but am not yet able to understand how RDF addresses that. Mozilla for
> example uses RDF for very structured (table of content) data.

I think it can be useful to think of RDF as *a* relational model, just
not the same one that SQL DBs are based on (Codd's). Individual
statements in RDF are expressed as subject, predicate, object triples.
Sets of these with a common predicate can be mapped to binary
relations in the relational model or, in common parlance,
2-column tables, e.g.

==== foaf:name ====
---subject------object--
_:personA   |   "John"
_:personB   |   "Jane"
_:personC   |   "Fred"
...

Here the subjects are bnodes, which can be viewed as ID fields/keys in
the local store.
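
As a rough sketch of that mapping (plain Python rather than any
particular RDF library; the sample triples and the helper name
binary_relations are made up for illustration), grouping triples by
predicate gives one 2-column table per predicate:

```python
from collections import defaultdict

# Hypothetical sample data: (subject, predicate, object) triples.
triples = [
    ("_:personA", "foaf:name", "John"),
    ("_:personB", "foaf:name", "Jane"),
    ("_:personC", "foaf:name", "Fred"),
    ("_:personA", "foaf:knows", "_:personB"),
]


def binary_relations(triples):
    """Group triples by predicate into 2-column (subject, object) tables."""
    tables = defaultdict(list)
    for s, p, o in triples:
        tables[p].append((s, o))
    return dict(tables)


tables = binary_relations(triples)

# Print the foaf:name "table" row by row.
for subject, obj in tables["foaf:name"]:
    print(f"{subject}   |   {obj!r}")
```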

Going back to your suggestion that RDF is flatter than XML, well yes,
when viewed as a set of triples it is. But the subject of one triple
can be (and often is) the object of another, and vice versa:

==== foaf:knows ====
---subject------object--
_:personA   |   _:personB
_:personB   |   _:personC
_:personC   |   _:personA
...

So another view of a set of statements is the node (subject/object) &
directed arc (predicate) graph. There's a loop in this example. In
this sense RDF is actually less flat than XML, which (without
assistance) just has a hierarchical tree structure.
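
To make the loop concrete, here is a small sketch (again plain Python,
with a hypothetical has_cycle helper, not anything from an RDF toolkit)
that treats the foaf:knows rows as directed arcs and checks for a cycle
with a depth-first search:

```python
# Hypothetical foaf:knows statements as (subject, object) pairs.
knows = [
    ("_:personA", "_:personB"),
    ("_:personB", "_:personC"),
    ("_:personC", "_:personA"),
]


def has_cycle(edges):
    """Return True if the directed graph given by (subject, object) arcs has a loop."""
    graph = {}
    for s, o in edges:
        graph.setdefault(s, []).append(o)
        graph.setdefault(o, [])

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour = {node: WHITE for node in graph}

    def visit(node):
        colour[node] = GREY
        for nxt in graph[node]:
            if colour[nxt] == GREY:  # arc back onto the current path: a loop
                return True
            if colour[nxt] == WHITE and visit(nxt):
                return True
        colour[node] = BLACK
        return False

    return any(colour[node] == WHITE and visit(node) for node in graph)


print(has_cycle(knows))      # True
print(has_cycle(knows[:2]))  # False
```

A tree, which is all plain XML gives you, would never make this return
True.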

In the directed graph structure there's an obvious analogy to
the interlinked structure of the Web. But almost certainly the most
important point of RDF with regard to the Web is that the subject,
predicate and object can be resources in the Web sense, things
identified with URIs. This means that they can act as ID fields/keys
not just in the local store but anywhere they appear. In other words,
through relational glasses the (Semantic) Web as a whole can be
considered a single database. In this view an individual RDF store or
file is just a cache of a little bit of the data in the Semantic Web.
The graph view of RDF is more than just an analogy to the interlinked
structure of the Web, it's an extension of it.
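
A minimal sketch of the "URIs as global keys" idea, assuming two
made-up stores and invented example.org/example.com URIs: because both
stores use the same URI for the same thing, merging them is just a set
union, with no key remapping. (The bnodes in the tables above would not
line up this way, since they are only local IDs.)

```python
# Two hypothetical stores, e.g. fetched from different sites.
# All URIs here are invented for illustration.
JANE = "http://example.org/people#jane"

store_1 = {
    (JANE, "foaf:name", "Jane"),
}
store_2 = {
    (JANE, "foaf:mbox", "mailto:jane@example.org"),
    ("http://example.com/staff#fred", "foaf:knows", JANE),
}

# The shared URI acts as a key across both stores, so a plain union merges them.
merged = store_1 | store_2


def describe(graph, uri):
    """Collect what the graph says about `uri` as subject: predicate -> set of objects."""
    out = {}
    for s, p, o in graph:
        if s == uri:
            out.setdefault(p, set()).add(o)
    return out


print(describe(merged, JANE))
```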

In the relational model, a row in a table is actually an assertion
that the relation is true for the values in the row. A SELECT query is
a filter on the assertions that are true for the given conditions. An
RDBMS will maintain logical consistency across all the data it
contains. In these (and other) ways a relational DB is a reasoning
engine. But another significant difference between relational DBs and
RDF is that in the former, for a certain set of values a relation is
considered either true (there is a corresponding row in the
table) or false (there isn't). In the RDF model in the general case,
if a set of values isn't in the "row" (i.e. you don't have a
particular statement), then it's not false, just unknown. (This is the
open world assumption, check "Missing isn't broken",
http://rdfweb.org/mt/foaflog/archives/000047.html). In practice, when
querying either programmatically or with SPARQL, you will only be
looking at a certain set of data, so this is treated as the universe
(the whole graph) and hence closed.
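
The difference can be sketched like this (hypothetical helper names,
with Python's None standing in for "unknown"; this is only an
illustration of the two assumptions, not how any real store or SPARQL
engine is implemented):

```python
# A tiny hypothetical set of statements.
statements = {
    ("_:personA", "foaf:knows", "_:personB"),
}


def holds_closed(statements, triple):
    """Closed world: a missing statement is taken to be false."""
    return triple in statements


def holds_open(statements, triple):
    """Open world: a missing statement is unknown (None), not false."""
    return True if triple in statements else None


missing = ("_:personB", "foaf:knows", "_:personA")

print(holds_closed(statements, missing))  # False
print(holds_open(statements, missing))    # None
```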

Where things start to get really interesting is that the predicates
can appear in "tables" too:

====  rdf:type ====
---subject------object--
 foaf:name   |   rdf:Property
...
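
In the same plain-Python terms (sample triples made up for
illustration), "predicates appearing in tables" just means that some
term used in the predicate position also turns up as a subject:

```python
# Hypothetical sample data: foaf:name is used as a predicate in the first
# triple and described as a subject in the second.
triples = [
    ("_:personA", "foaf:name", "John"),
    ("foaf:name", "rdf:type", "rdf:Property"),
]

predicates = {p for _, p, _ in triples}
subjects = {s for s, _, _ in triples}

# Predicates that are themselves described by statements in the same data.
described_predicates = predicates & subjects
print(described_predicates)  # {'foaf:name'}
```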

At this point it may be easier to stop thinking in terms of the
relational model: the object-oriented model, or at least its
inheritance bits, is probably closer conceptually (though still very
different).

> What would be points of comparison where RDF is better suited to store
> and query my data?

I'll leave that part for someone else ;-)

Spot-on questions; hopefully some day the answers will find their way
into the FAQs here:
http://esw.w3.org/topic

(Somewhere around there you should also find material on mapping
between RDBMSs and RDF, the stuff above is just one way).

Cheers,
Danny.

-- 

http://dannyayers.com
Received on Tuesday, 29 August 2006 10:50:40 UTC