RE: Myths of the Semantic Web - Popular Misconceptions for Why it Won't Work from Jeff Pollock on 2006-11-08 (public-sweo-ig@w3.org from November 2006)

From: Jeff Pollock <jeff.pollock@oracle.com>
Date: Wed, 8 Nov 2006 10:05:40 -0800
To: "'Steve Harris'" <steve.harris@garlik.com>
Cc: <public-sweo-ig@w3.org>
Message-ID: <002001c70360$8109df00$0e42908d@JEFFX60S>
This was hugely helpful, and thanks for taking my prodding in good spirit!

Your reply, I believe, captures a core attribute of one family of use cases
that can benefit from the SemWeb languages: cases where the data structure
and relations churn frequently. Further, I think your explanation is dead-on
for this class of use case. It's not that you couldn't do it in XML or an
SQL DB (you can), but if you did, your queries would be much more brittle
(i.e., expensive to maintain), and over time the schema itself would become
nearly useless as a "knowledge representation", because it would fall victim
to endless unplanned bolt-on concepts, with artificial "keys" or "trees" in
the structure that were only there because the SQL DB or XML DB made you do
it.

I also agree with you that your approach leaves you the option of later
applying a more expressive ontology to your already established triples.
This manner of "selectively imposing order on chaos" is something for which
traditional schema languages have no corresponding capability.

...one item for follow-up (but maybe not now) is your suspicion that an RDF
DB will scale as well as or better than a relational one. I would love to
hear your impressions as to why this is the case, especially in light of the
struggles that many RDF DBs are having moving beyond the 1-2 billion triple
mark. I, for one, believe that federating RDF DBs will be easier (eliminating
messy cross-DB relational algebra for joining clustered tables), but that in
cases where we want to do inference (even at a simple RDF/S level), the index
and compute costs of materializing edges are still a barrier to MEGA
warehouse-scale (multi-terabyte) graph DBs. But now that I am thinking about
it, your system would only operate on the "told" or "asserted" triples (and
not generate new ones based on subsumption inference), right? In that case,
yes, it should scale roughly linearly, much like a relational system.
(Sorry, thinking out loud there.)
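The cost of materializing inferred edges can be made concrete with a minimal Python sketch. This is not how any particular RDF store works: it naively forward-chains just two RDFS entailment rules (rdfs11, subClassOf transitivity, and rdfs9, type inheritance) over a set of (subject, predicate, object) tuples. The `ex:` class and instance names are hypothetical. It shows the stored triple count growing beyond the asserted triples once inferred edges are written out.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def materialize_rdfs(asserted: Set[Triple]) -> Set[Triple]:
    """Naively forward-chain two RDFS rules to a fixpoint and return
    the asserted triples plus every inferred one."""
    closure = set(asserted)
    while True:
        sub = [(s, o) for s, p, o in closure if p == SUBCLASS]
        new = set()
        # rdfs11: A subClassOf B, B subClassOf C  =>  A subClassOf C
        for a, b in sub:
            for b2, c in sub:
                if b == b2:
                    new.add((a, SUBCLASS, c))
        # rdfs9: x type A, A subClassOf B  =>  x type B
        for x, p, a in closure:
            if p == TYPE:
                for a2, b in sub:
                    if a == a2:
                        new.add((x, TYPE, b))
        if new <= closure:  # fixpoint: nothing new was derived
            return closure
        closure |= new


# Hypothetical data: two instances and a three-level class hierarchy.
asserted = {
    ("ex:Student", SUBCLASS, "ex:Person"),
    ("ex:Person", SUBCLASS, "ex:Agent"),
    ("ex:alice", TYPE, "ex:Student"),
    ("ex:bob", TYPE, "ex:Student"),
}
closure = materialize_rdfs(asserted)
print(len(asserted), len(closure))  # 4 9
```

Each instance picks up one extra rdf:type triple per ancestor class, so stored triples grow with instances times hierarchy depth. Real stores use indexed, semi-naive evaluation rather than these nested loops, but the inferred triples still have to be stored and indexed.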

Thanks again for your comments. I definitely think there are some gems in
what you've said that we can repurpose in some form for SWEO deliverables.
(BTW: I like Danny's suggestions for SWEO content areas.)

Best,

-Jeff-
 

-----Original Message-----
From: Steve Harris [mailto:steve.harris@garlik.com] 
Sent: Wednesday, November 08, 2006 9:30 AM
To: jeff.pollock@oracle.com
Cc: public-sweo-ig@w3.org
Subject: Re: Myths of the Semantic Web - Popular Misconceptions for Why it Won't Work

On 8 Nov 2006, at 15:35, Jeff Pollock wrote:

> <Devil's advocate>

Sure. There was a fair bit of that on my part too.

> a) depends on what we mean by "ontology" (personally, I am fairly liberal
> about it, to mean a model defined in OWL, RDF, RDFS, or a derivative eg:
> SKOS) ...sometimes in casual conversation I will allow for "any formal
> model is an ontology" sort of thinking, which is how many treat the term.
> What do you mean by ontology?

So, in our company we do have an OWL ontology (I forget which  
flavour), but we just treat it as RDF data. However, I believe there  
are domains where even that is unnecessary, and maybe even undesirable.

> b) if you're doing *no* reasoning whatsoever, why not just put your model
> in XML? There are more tools, and more widespread knowledge of how to use
> XSDs... or better yet, if you have SQL developers already, why not just
> put it in relational tables and use an abstract denormalized schema?

We require a certain amount of flexibility in our data storage: we
acquire new data on a regular basis, and it often contains things we
hadn't even considered, affluence metrics for neighbourhoods being one
example. Our ontology does not contain the concept of a neighbourhood,
or an affluence rating, or any way in which the two might be connected.
However, we could still express that stuff in RDF and query it without
perturbing the existing data, just by adding triples of the form
?person :isIn ?neighbourhood
plus some appropriate statements about the neighbourhood.
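A minimal sketch of that point, assuming a toy in-memory store of (subject, predicate, object) tuples rather than a real RDF library or SPARQL engine. All names (`:alice`, `:livesAt`, `:isIn`, `:affluence`, and so on) are hypothetical. A query written before the neighbourhood data arrived returns the same answers afterwards, and a new query can join old and new data with no schema change.

```python
from typing import Dict, Iterator, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def _unify(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend binding so pattern matches triple; None if impossible.
    Terms starting with '?' are variables, everything else is a constant."""
    b = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if b.setdefault(term, value) != value:
                return None  # variable already bound to something else
        elif term != value:
            return None  # constant mismatch
    return b


def match(pattern: List[Triple], store: Set[Triple],
          binding: Optional[Binding] = None) -> Iterator[Binding]:
    """Yield every binding satisfying all triple patterns (a basic graph pattern)."""
    binding = binding or {}
    if not pattern:
        yield binding
        return
    for triple in store:
        b = _unify(pattern[0], triple, binding)
        if b is not None:
            yield from match(pattern[1:], store, b)


# Existing data (hypothetical) and a query written against it.
store: Set[Triple] = {
    (":alice", ":livesAt", ":addr1"),
    (":bob", ":livesAt", ":addr2"),
}
old_query = [("?person", ":livesAt", "?addr")]
before = sorted(b["?person"] for b in match(old_query, store))

# New, unanticipated data: just more triples, no schema change.
store |= {
    (":alice", ":isIn", ":riverside"),
    (":riverside", ":affluence", "high"),
}
after = sorted(b["?person"] for b in match(old_query, store))
new_query = [("?person", ":isIn", "?neighbourhood"),
             ("?neighbourhood", ":affluence", "?rating")]
found = [(b["?person"], b["?rating"]) for b in match(new_query, store)]

print(before == after)  # True
print(found)            # [(':alice', 'high')]
```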

My (limited) experience of XML is that it's hard to add stuff while  
maintaining the behaviour of old queries, and the schema becomes  
baroque over time with extensive additions.

My (more extensive) experience of SQL is that you certainly can  
design a schema that is extensible to add unexpected data, but again,  
over time the schema becomes complex, and the queries get  
increasingly impenetrable.
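For comparison, one common way to make a relational schema open-ended is a single generic (subject, predicate, object) table, in the spirit of the "abstract denormalized schema" Jeff mentions. This sketch uses Python's standard sqlite3 module; the table name, columns and data are hypothetical. The same neighbourhood-affluence question becomes a self-join, and each extra hop in the question adds another alias and join condition, which is one way such queries drift towards impenetrable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("alice", "isIn", "riverside"),
        ("bob", "isIn", "hilltop"),
        ("riverside", "affluence", "high"),
        ("hilltop", "affluence", "low"),
    ],
)

# "Which people live in a high-affluence neighbourhood?"
# Each triple pattern becomes one alias of the same table,
# joined on the value the patterns share (the neighbourhood).
rows = conn.execute(
    """
    SELECT t1.s
    FROM triples AS t1
    JOIN triples AS t2 ON t2.s = t1.o
    WHERE t1.p = 'isIn'
      AND t2.p = 'affluence'
      AND t2.o = 'high'
    """
).fetchall()
conn.close()
print(rows)  # [('alice',)]
```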

Once you've added a few dozen valuable yet unexpected data sources like
that, you really come to appreciate RDF's monotonicity (adding triples
never invalidates what was already stated) and not having to mess with a
schema every time.

> c) you say that "advantages we get from representing our data in RDF are
> sufficient to justify the effort without any reasoning" -- but what are
> those advantages?  ...are they really technical, or business, advantages
> that couldn't be had with the proper Relational or XML schema?  Why not?

I don't believe there's anything you can do with RDF and co. that you
can't do with SQL; it's more a question of whether you would, or could
be bothered, or whether you'd go mad trying.

We think that using RDF for the representation layer gives us an edge  
in an industry that has very dynamic data needs, but it's not as if I  
can prove that.

More speculatively, I suspect it's easier to scale large triple stores
than large relational stores, but I don't have any proof for that; it's
just a hunch. We have two small clusters that each hold two knowledge
bases (KBs) of around 1.25 billion triples, with a high churn rate. It
wasn't especially difficult to develop the storage and SPARQL query
engine: it took less than one man-year of effort, and the performance is
good, though with room for improvement.

I don't believe that there are a large number of domains that can
benefit from ontology-less RDF, but I do think there are some. In the
end, nothing prevents you from retrofitting an ontology to your  
existing data, should you want one. I could see us going down that  
route in the future, once we have sufficient experience of the data  
domain, and if inference would be helpful.
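Retrofitting can be as light as adding a few rdfs:subClassOf triples to the same store. The Python sketch below is an illustration, not Garlik's approach: it assumes a toy set of tuples and hypothetical names, and expands the class hierarchy at query time instead of materializing inferred triples, so existing data is left untouched.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def instances_of(cls: str, store: Set[Triple]) -> Set[str]:
    """Return resources typed as cls or any transitive subclass of it,
    expanding rdfs:subClassOf at query time (nothing is materialized)."""
    classes = {cls}
    frontier = [cls]
    while frontier:
        parent = frontier.pop()
        for s, p, o in store:
            if p == "rdfs:subClassOf" and o == parent and s not in classes:
                classes.add(s)
                frontier.append(s)
    return {s for s, p, o in store if p == "rdf:type" and o in classes}


# Existing data, loaded with no ontology in mind (hypothetical names).
store: Set[Triple] = {
    (":riverside", "rdf:type", ":Neighbourhood"),
    (":london", "rdf:type", ":City"),
}
before = instances_of(":Place", store)
print(before)  # set(): nothing says these are places yet

# Retrofit a small ontology later: it is just more triples in the same store.
store |= {
    (":Neighbourhood", "rdfs:subClassOf", ":Place"),
    (":City", "rdfs:subClassOf", ":Place"),
}
after = sorted(instances_of(":Place", store))
print(after)  # [':london', ':riverside']
```

Query-time expansion avoids storing inferred triples, at the price of extra work on every query; materializing trades the other way.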

I have always worked with ontologies (OWL or RDFS) in the past, but
at Garlik we couldn't really see the need.

> </Devil's advocate>
>
> I believe that the area of data security and identity management is a
> space that will greatly benefit from the SW family of languages - so, in
> all seriousness, if you have the time to reply to the above prodding, I'd
> love to hear your thoughts.

Hmm... I seem to have subjected you to a brain-dump, hopefully it's  
of interest.

- Steve

> From: public-sweo-ig-request@w3.org [mailto:public-sweo-ig-request@w3.org]
> On Behalf Of Steve Harris
> Sent: Wednesday, November 08, 2006 7:15 AM
> To: jeff.pollock@oracle.com
> Cc: public-sweo-ig@w3.org
> Subject: Re: Myths of the Semantic Web - Popular Misconceptions for Why it Won't Work
>
>
>
> On 8 Nov 2006, at 13:06, Jeff Pollock wrote:
>>
>> There are plenty of "Myths" out there, such as:
>>
>> -	Semantic Web makes you tag everything again
>> -	Semantic Web requires a single global ontology
>
> Perhaps controversial, but I don't believe that all applications on
> the semantic web require ontologies at all. The application my
> company is deploying now has an ontology, but it's only used
> informatively, and we do no reasoning over it. The advantages we get
> from representing our data in RDF are sufficient to justify the
> effort without any reasoning, and it's easier for developers with an
> SQL background to grok. I expect there is data in the store which is
> not described by any ontological structures.
>
> - Steve
>
>
Received on Wednesday, 8 November 2006 18:08:03 UTC