- From: Jeff Pollock <jeff.pollock@oracle.com>
- Date: Wed, 8 Nov 2006 10:05:40 -0800
- To: "'Steve Harris'" <steve.harris@garlik.com>
- Cc: <public-sweo-ig@w3.org>
This was hugely helpful, and thanks for taking my prodding in good spirit! Your reply, I believe, captures a core attribute of one family of use cases that can benefit from the SemWeb languages - cases where the data structure and relations churn frequently. Further, your explanation is dead-on for this class of use case: it's not that you couldn't do it in XML or SQL-DB (you can), but that if you did, your queries would be much more brittle (i.e., expensive to maintain), and over time the schema itself would be nearly useless as a "knowledge representation" because it would be a victim of endless unplanned bolt-on concepts, with artificial "keys" or "trees" in the structure that were only there because SQL-DB or XML-DB made you do it.

I also agree with you that your approach gives you the option, in the future, of using a more expressive ontology with your already established triples -- this manner of "selectively imposing order on chaos" is something for which traditional schema languages have no corresponding capability.

...one item for follow-up (but maybe not now) is your suspicion that RDF DBs will scale as well as or better than relational. I would love to hear your impressions as to why this is the case - especially in light of the struggles that many RDF DBs are having with moving beyond the 1-2B triple mark. I, for one, believe that federating RDF DBs will be easier (elimination of messy cross-DB relational algebras for joining clustered tables), but that in cases where we want to inference (even at a simple RDF/S level), the index and compute costs of materializing edges are still a barrier to MEGA warehouse scale (multi-terabyte) graph DBs. But now that I am thinking about it, your system would only operate on the "told" or "asserted" triples (and not generate new ones based on subsumption inference) - right? In that case, yes, scale should come at roughly the same linear rate as a relational system.
(sorry, thinking out loud there)

Thanks again for your comments. I definitely think there are some gems in what you've said that we can repurpose for SWEO deliverables in some form (BTW: I like Danny's suggestions for SWEO content areas).

Best,
-Jeff-

-----Original Message-----
From: Steve Harris [mailto:steve.harris@garlik.com]
Sent: Wednesday, November 08, 2006 9:30 AM
To: jeff.pollock@oracle.com
Cc: public-sweo-ig@w3.org
Subject: Re: Myths of the Semantic Web - Popular Misconceptions for Why it Won't Work

On 8 Nov 2006, at 15:35, Jeff Pollock wrote:
> <Devil's advocate>

Sure. There was a fair bit of that on my part too.

> a) depends on what we mean by "ontology" (personally, I am fairly liberal
> about it, to mean a model defined in OWL, RDF, RDFS, or a derivative eg:
> SKOS) ...sometimes in casual conversation I will allow for "any formal model
> is an ontology" sort of thinking, which is how many treat the term. What do
> you mean by ontology?

So, in our company we do have an OWL ontology (I forget which flavour), but we just treat it as RDF data. However, I believe there are domains where even that is unnecessary, and maybe even undesirable.

> b) if you're doing *no* reasoning whatsoever, why not just put your model in
> XML? There are more tools, and more widespread knowledge of how to use
> XSD's...or better yet, if you have SQL developers already, why not just put
> in relational tables and use an abstract denormalized schema?

We require a certain amount of flexibility in our data storage. We acquire new data on a regular basis, and the information in it often contains things we hadn't even considered; an example would be affluence metrics for neighbourhoods. Our ontology does not contain the concept of a neighbourhood, or an affluence rating, or any way in which the two might be connected.
However, we could still express that stuff in RDF, and query it without perturbing the existing data, just by adding ?person :isIn ?neighbourhood type triples, plus some appropriate statements about the neighbourhood.

My (limited) experience of XML is that it's hard to add stuff while maintaining the behaviour of old queries, and the schema becomes baroque over time with extensive additions.

My (more extensive) experience of SQL is that you certainly can design a schema that is extensible to add unexpected data, but again, over time the schema becomes complex, and the queries get increasingly impenetrable.

Once you've added a few dozen valuable, yet unexpected, data sources like that, you really get to appreciate RDF's monotonicity, and not having to mess with a schema every time.

> c) you say that "advantages we get from representing our data in RDF are
> sufficient to justify the effort without any reasoning" -- but what are
> those advantages? ...are they really technical, or business, advantages
> that couldn't be had with the proper Relational or XML schema? Why not?

I don't believe there's anything you can do with RDF and co. that you can't do with SQL; it's more a question of whether you would, or could be bothered, or whether you'd go mad trying. We think that using RDF for the representation layer gives us an edge in an industry that has very dynamic data needs, but it's not as if I can prove that.

More speculatively, I have a suspicion that it's easier to scale large triple stores than large relational stores, but I don't have any proof for that; it's just a hunch. We have two small clusters that each hold 2 KBs of around 1.25 billion triples, with a high churn rate. It wasn't especially difficult to develop the storage and SPARQL query engine: less than 1 man year of effort, and the performance is good, but with room for improvement.
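The point above about adding ?person :isIn ?neighbourhood triples without perturbing existing queries can be illustrated with a minimal sketch. This is not Garlik's storage or query engine; it is a toy in-memory triple set in plain Python, and every name in it (the ex: identifiers, match, names, affluence_of) is a hypothetical chosen for illustration.

```python
from typing import Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def match(store: Set[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> Iterator[Triple]:
    """Yield every triple that agrees with the given terms; None is a wildcard."""
    for t in store:
        if ((s is None or t[0] == s)
                and (p is None or t[1] == p)
                and (o is None or t[2] == o)):
            yield t


def names(store: Set[Triple]) -> list:
    """An 'old' query, written before neighbourhoods existed in the data."""
    return sorted(obj for _, _, obj in match(store, p="ex:name"))


def affluence_of(store: Set[Triple], person: str) -> list:
    """A 'new' query: join person -> neighbourhood -> affluence rating."""
    return sorted(
        rating
        for _, _, hood in match(store, s=person, p="ex:isIn")
        for _, _, rating in match(store, s=hood, p="ex:affluence")
    )


# Existing data (hypothetical identifiers).
store: Set[Triple] = {
    ("ex:alice", "ex:name", "Alice"),
    ("ex:bob", "ex:name", "Bob"),
}
before = names(store)

# Unexpected new data arrives: only assertions are added, nothing is altered.
store |= {
    ("ex:alice", "ex:isIn", "ex:hood1"),
    ("ex:hood1", "ex:affluence", "high"),
}
after = names(store)
```

In this sketch the old query returns the same answer before and after the new triples are added, and the new query only sees data for the people who have it. A real store would express the same thing as SPARQL graph patterns.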
I don't believe that there are a large number of domains that can benefit from ontology-less RDF, but I do think there are some. In the end, nothing prevents you from retrofitting an ontology to your existing data, should you want one. I could see us going down that route in the future, once we have sufficient experience of the data domain, and if inference would be helpful.

I have always worked with ontologies (OWL or RDFS) in the past, but in Garlik we couldn't really see the need.

> </Devil's advocate>
>
> I believe that the area of data security and identity management is a space
> that will greatly benefit from the SW family of languages - so, in all
> seriousness, if you have the time to reply to the above prodding, I'd love
> to hear your thoughts.

Hmm... I seem to have subjected you to a brain-dump; hopefully it's of interest.

- Steve

> From: public-sweo-ig-request@w3.org [mailto:public-sweo-ig-request@w3.org]
> On Behalf Of Steve Harris
> Sent: Wednesday, November 08, 2006 7:15 AM
> To: jeff.pollock@oracle.com
> Cc: public-sweo-ig@w3.org
> Subject: Re: Myths of the Semantic Web - Popular Misconceptions for Why it
> Won't Work
>
> On 8 Nov 2006, at 13:06, Jeff Pollock wrote:
>>
>> There are plenty of "Myths" out there, such as:
>>
>> - Semantic Web makes you tag everything again
>> - Semantic Web requires a single global ontology
>
> Perhaps controversial, but I don't believe that all applications on
> the semantic web require ontologies at all. The application my
> company is deploying now has an ontology, but it's only used
> informatively, and we do no reasoning over it. The advantages we get
> from representing our data in RDF are sufficient to justify the
> effort without any reasoning, and it's easier for developers with an
> SQL background to grok. I expect there is data in the store which is
> not described by any ontological structures.
>
> - Steve
Received on Wednesday, 8 November 2006 18:08:03 UTC