Re: Extending RDFS, property-classes from Alan Ruttenberg on 2009-02-10 (semantic-web@w3.org from February 2009)

From: Alan Ruttenberg <alanruttenberg@gmail.com>
Date: Tue, 10 Feb 2009 02:13:51 -0500
To: Richard Newman <rnewman@twinql.com>
Cc: Jiri Prochazka <ojirio@gmail.com>, semantic-web@w3.org
Message-ID: <29af5e2d0902092313i29669973k71a6e1367af03ae7@mail.gmail.com>
On Tue, Feb 10, 2009 at 12:32 AM, Richard Newman <rnewman@twinql.com> wrote:
>> I find your remark very odd, and more demonstrative of a lack of
>> experience than an accurate perception of the state or vision of the
>> Semantic Web. Certainly the lack of customers using OWL becomes a
>> self-fulfilling prophecy when such a point of view is held.
>
> I was merely stating my experience: in my (... 6? Blimey.) years in the SW
> community, I'd say the ratio of organizations I've worked with using RDF for
> storage versus simple reasoning versus OWL reasoning is approximately
> 10:5:1. Granted, my recent work has been on a system that doesn't currently
> offer OWL reasoning (partly because of a lack of demand from customers:
> RDFS++ has been adequate), but we do stay in touch with a wide variety of
> people, including the RACER folks.
>
> That's not to say that OWL *vocabulary* isn't used; after all, why bother
> making up your own sameAs property?

Good, because what I was trying to draw attention to is that OWL brings
new expressivity (i.e. new vocabulary and patterns) to the annotation
space, and your response could easily be read as dismissive of this.

> I'm simply saying that folks trying to
> use OWL-DL (and up) reasoners on *real datasets and systems* (as opposed to
> things like my occasional playing around with Pellet) are significantly
> outnumbered by those dumping big datasets into RDF, and tooling for
> large-scale RDF systems is more widely available than tooling for OWL
> systems on the same scale.
>
> The implication of that is that a solution in OWL 2 is not a solution for
> the majority of people: their tooling doesn't support it, or reasoning won't
> scale to their datasets, or they have to interoperate with others who aren't
> using it.

OWL 2 isn't *currently* a solution to all problems, nor would I
recommend it as one. However, your previous note implied that it wasn't
worth taking OWL into account, or even considering using its
vocabulary, and I felt this needed to be corrected. As for your 10:5:1
ratio, it doesn't sound far off, but given how young OWL is I think
it's significant, and I don't think it can be extrapolated into the
future as if it were static. I think it's fair to say that Science
Commons, where I work, is building for the future, and we've given
some thought to our choice of OWL.

>> OWL is widely deployed in the area I work in, the Semantic Web for
>> science, with our own Neurocommons being a 400M-triple store expressed
>> in OWL, and many other projects using OWL.
>
> Can I ask what level of reasoning you apply to Neurocommons?

It's a mixture, and it is evolving. Some of the reasoning is done at
the level of smaller chunks, as validation of the conversion to OWL;
inconsistencies at that level are detected and fixed in the conversion
script. In other cases inferences are computed by Pellet, saved to a
file, and loaded into the store. In the store itself, which uses
Virtuoso, we focus on propagating subclass and part_of relations
(expressed as restrictions in OWL).
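To make the store-level step a little more concrete, here is a rough
Python sketch of what propagating subclass and part_of can mean when
part_of is asserted between classes as an OWL restriction (C subClassOf
part_of some D). It is not the Neurocommons or Virtuoso code: the
function name, the pair-based input format and the assumption that
part_of is transitive are illustrative choices only.

```python
from itertools import product


def propagate(subclass, part_of):
    """Close subclass and class-level part_of pairs to a fixpoint.

    Hypothetical input format: (a, b) in subclass means a subClassOf b;
    (a, b) in part_of means a subClassOf (part_of some b).
    part_of is ASSUMED to be transitive.
    """
    sub, part = set(subclass), set(part_of)
    while True:
        before = (len(sub), len(part))
        # subClassOf is transitive.
        sub |= {(a, d) for (a, b), (c, d) in product(sub, sub) if b == c}
        new_part = set()
        for (a, b), (c, d) in product(sub, part):
            if b == c:  # a subClassOf b, b part_of d  =>  a part_of d
                new_part.add((a, d))
        for (a, b), (c, d) in product(part, sub):
            if b == c:  # a part_of b, b subClassOf d  =>  a part_of d
                new_part.add((a, d))
        for (a, b), (c, d) in product(part, part):
            if b == c:  # a part_of b, b part_of d  =>  a part_of d
                new_part.add((a, d))
        part |= new_part
        if (len(sub), len(part)) == before:  # nothing new: fixpoint reached
            return sub, part


if __name__ == "__main__":
    sub, part = propagate(
        {("purkinje_cell", "neuron"), ("cerebellum", "brain_region")},
        {("purkinje_cell", "cerebellar_cortex"),
         ("cerebellar_cortex", "cerebellum")},
    )
    for pair in sorted(part):
        print(pair)
```

In the example, purkinje_cell comes out as part_of cerebellum and of
brain_region without either being asserted. The nested loops are naive
and quadratic per pass; a real store would use indexes.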

While we don't claim that full OWL reasoning is done at the whole-store
level yet, there are nonetheless benefits in using OWL. First, the
expressivity is greater and we can say, within spec, more clearly what
we mean (part_of at the class level is an example). Second, as I
mentioned, portions can be reasoned over exactly, and this is used to
improve quality. Finally, we don't modify our knowledge representation
to suit our technology. I've seen many cases where people optimize
their RDF for query performance. I think this is a loss in the long
run, as technology changes over time. I'd rather work with the OWL and
store developers so that their tools run better on a representation
that I don't expect to have to change radically at any point. Because
this representation has a stronger chance of being stable, we feel we
are more likely to convince more and more of the scientific community
that an investment in this direction won't be squandered.

>> It would make no sense for any of these projects to use RDF or even RDFS.
>
> I wasn't saying anything of the sort.
>
> If you scroll back and read what I wrote, I said:
>
> * OWL 2 has annotations of assertions (yay!)

Pardon me, I missed the "yay" in your first note :)

> * I haven't heard of a single customer who is considering using OWL 2

OWL 2 is currently only at Last Call, and there are only early
implementations. I wouldn't expect demand to be high yet, and we're
only beginning to work on education and outreach. But if you do
anything in, e.g., the biomedical space, I would expect demand to be
coming.

> * I don't know of any widespread deployments of OWL (the implication being
> "OWL reasoning", not "OWL vocabulary", which I would hope is obvious).
>
> All of those things are true, and I'm not impugning OWL.

Again, glad to hear that, though I have to say that's not how I read
your initial message.

> I would very much like to know about high-scale, high-traffic services being
> backed by OWL reasoning; knowledge of the industry is very interesting to
> me. Terascale reasoning would make some of my areas of interest much more
> straightforward!

The OWL 2 specification, which I encourage you to read and comment on,
includes a number of profiles designed to scale in different
directions. You might want to review the OWL 2 QL and RL profiles. QL,
in particular, supports implementation on top of relational databases
by translating queries into SQL. Clark and Parsia have an open
implementation called OWLGRES. RL is being implemented by Oracle, and I
expect it will be applied to rather large data sets. SHER, from IBM,
applies a different strategy for large ABoxes.
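For a feel of how QL-style rewriting onto SQL works, here is a
deliberately tiny Python sketch: a query for the instances of one class
is rewritten, using the TBox, into a UNION of SELECTs over the stored
assertions, so the relational engine does the work and no inferences
are materialized. The table layout (class_assertion,
property_assertion), the function names and the restriction to
subclass and domain axioms are assumptions for illustration; this is
not OWLGRES or any other real system, which handle far more.

```python
import sqlite3


def rewrite(target, subclass_axioms, domain_axioms):
    """Rewrite "instances of target" into a parameterized SQL UNION.

    Hypothetical TBox format: (sub, sup) means sub subClassOf sup;
    (prop, cls) means prop rdfs:domain cls.
    """
    # Walk subclass axioms downward to collect every class below target.
    classes, frontier = {target}, [target]
    while frontier:
        current = frontier.pop()
        for sub, sup in subclass_axioms:
            if sup == current and sub not in classes:
                classes.add(sub)
                frontier.append(sub)
    # Subjects of a property whose domain is one of those classes also qualify.
    props = sorted({p for p, dom in domain_axioms if dom in classes})
    selects, params = [], []
    for c in sorted(classes):
        selects.append("SELECT ind FROM class_assertion WHERE cls = ?")
        params.append(c)
    for p in props:
        selects.append("SELECT subj FROM property_assertion WHERE prop = ?")
        params.append(p)
    return " UNION ".join(selects), params


def demo():
    # Hypothetical ABox stored in two plain tables.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE class_assertion (cls TEXT, ind TEXT)")
    db.execute("CREATE TABLE property_assertion (prop TEXT, subj TEXT, obj TEXT)")
    db.executemany("INSERT INTO class_assertion VALUES (?, ?)",
                   [("Neuron", "n1"), ("PurkinjeCell", "p1")])
    db.executemany("INSERT INTO property_assertion VALUES (?, ?, ?)",
                   [("hasAxon", "x1", "a1")])
    sql, params = rewrite("Neuron", [("PurkinjeCell", "Neuron")],
                          [("hasAxon", "Neuron")])
    return sorted(row[0] for row in db.execute(sql, params))


if __name__ == "__main__":
    print(demo())
```

Running it returns n1, p1 and x1, although only n1 was explicitly
asserted to be a Neuron; the other two come from the rewritten query.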

And the technology is relatively young. Even in the last year there
has been significant progress in reasoning algorithms, and I expect
such work to continue. So while "terascale reasoning" may not be here
yet, I wouldn't rule it out. And there is certainly more to the
Semantic Web than this sort of application. There are plenty of
applications that need correctness and consistency checking more than
they need scale: think medicine (not billing for it), engineering, and
law.

>> For one thing, the Semantic Web languages are intended to be a set that
>> work together and build on each other. OWL will offer the first specified
>> way of doing expressive annotations, and it would make no sense to do
>> other than use the facilities it offers, as owl:sameAs and
>> owl:InverseFunctionalProperty are used now.
>
> I will certainly investigate it. The reason I said this was something of a
> chicken/egg situation is that I can't see customers porting their *data* to
> OWL 2 without having tools to push it around. A language is useless without
> speakers.

As a provider, I would think that part of your role is to be aware of
trends in need and to provide solutions. I don't expect the
scientists I work with to be experts in data curation or knowledge
representation, and I regularly push back when they offer solutions
they think will work but which have undesirable properties. I
consider it my role to help solve their problems, not to hope that they
figure it out for me. I would be surprised if you didn't have
customers with needs for annotations, and I expect that some use of
OWL 2 (most likely vocabulary in this case) would be well advised. I'm
glad that you will have a closer look at what we're working on in OWL,
and I hope to hear back from you with constructive comments, which I'd
appreciate your sending to public-owl-comments@w3.org.

Regards,
Alan
Received on Tuesday, 10 February 2009 07:14:27 UTC