RE: The Standards Manifesto from Joshua Allen on 2002-05-23 (www-rdf-interest@w3.org from May 2002)

From: Joshua Allen <joshuaa@microsoft.com>
Date: Wed, 22 May 2002 22:40:52 -0700
To: "Dan Brickley" <danbri@w3.org>
Cc: <www-talk@w3.org>, <www-rdf-interest@w3.org>, "Aaron Swartz" <me@aaronsw.com>
Message-ID: <4F4182C71C1FDD4BA0937A7EB7B8B4C105534FC8@red-msg-08.redmond.corp.microsoft.com>
> others... there are tools out there. And publication isn't hard. HTTP
> servers are plentiful. There are a number of query languages
> implemented.

> Your specific scenarios aren't too far from the Annotea work, and
> similar efforts over on www-annotations@w3.org. What do you think stands
> in the way of that work going mom'n'dad mainstream?

The server-centric model is the thing that stops it going mainstream, I
think.  I have thought about the particular example of Annotea a lot,
since identical functionality *has* propagated to the mom'n'dad mainstream
in the form of Microsoft's Sharepoint Team Services "discussion server".
It is possible to annotate within any HTML page, Word document, etc., and
anyone who is pointed to the same "discussion server" as you can see
your annotations, annotate the annotations, and so on.  It is fairly old
technology, having shipped for about two years.  There are many places
that offer cheap discussion server hosting ($8/month is one I just saw).

I have used Annotea+Amaya as well, and the functionality is pretty
similar.  Sharepoint has much higher adoption, but probably this is
because mom'n'dad don't use Amaya, and they *do* use IE.  The Annotea
plugin for IE has a pretty poor UI compared to the Amaya one, IMO, and
is more "experimental" than anything.  So any "normal" user today wanting
to annotate documents and share them with others is faced with the
prospect of switching browsers, trying to install an Annotea server, and
so on.

But like I said, I think it is the server-centric model that limits
adoption of *both* STS and Annotea; I am just pointing out that Annotea
is intimidating to the average user, and that probably explains why
Sharepoint is outpacing Annotea within this limited range of adoption
potential.

Now, when I say "server-centric", what I mean is that you need to be
using the same discussion/Annotea server as me if you want to see my
annotations.  This would be like saying that you have to dial up to my
network if you want to read my web pages.  Every annotation server
becomes an island of metadata.  This model only scales so far.

And Annotea raises the question: what do I gain by using RDF and URIs
internally?  True, it makes it easy to import/export data between
servers, but no easier than doing the same with Sharepoint, and an
adapter that converts between Sharepoint (or Annotea) and any
intermediate format is not too hard to write (and most developers would
just use XML to do this, not RDF).  And in either case, interacting with
the annotation server uses some custom protocol that needs to be
implemented.  So as far as the typical user or sysadmin is concerned,
RDF is an implementation detail and doesn't make much real-world
difference to the solution.  Just to be clear, I am not saying that
RDF+URI is *bad*, just that a server-centric model where you have to
build bridges between every server manually *anyway* pretty much
neutralizes the value of RDF and URIs.  The server-centric model is
completely contrary to the semantic web, IMO.

Annotea raises another question.  Adoption of Annotea (just like
adoption of Sharepoint) fragments the world into isolated and
disconnected silos of annotation information.  We can *claim* that we
are doing something good for openness, because "it is theoretically
possible one day to make all of the Annotea servers share information".
But the astute observer will quickly ask, "But isn't that what NNTP has
been doing for years, without any XML whatsoever?"  So we are pushing
annotation servers as a *hypothetical* solution to a problem that has
already been solved.
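A toy sketch of why NNTP already "solves" cross-server sharing, assuming a simplified flood-fill model rather than the real NNTP wire protocol (no IHAVE command, no Path header, no batching). Each server keeps articles keyed by Message-ID and forwards anything new to its peers; duplicate suppression by Message-ID is what stops the flood from looping. The `NewsServer` class, its methods, and the example Message-IDs are all hypothetical.

```python
class NewsServer:
    """Toy model of a Usenet-style peer (not the NNTP wire protocol)."""

    def __init__(self, name):
        self.name = name
        self.articles = {}  # Message-ID -> article body
        self.peers = []

    def peer_with(self, other):
        # Peering is symmetric: each side forwards new articles to the other
        self.peers.append(other)
        other.peers.append(self)

    def receive(self, message_id, body):
        # Duplicate suppression by Message-ID keeps the flood from cycling forever
        if message_id in self.articles:
            return
        self.articles[message_id] = body
        for peer in self.peers:
            peer.receive(message_id, body)


# Three servers in a chain: a <-> b <-> c. Nobody has to use "the same" server.
a, b, c = NewsServer("a"), NewsServer("b"), NewsServer("c")
a.peer_with(b)
b.peer_with(c)

a.receive("<1@a.example>", "Annotation on http://example.org/page")
c.receive("<2@c.example>", "Reply to <1@a.example>")

for server in (a, b, c):
    print(server.name, sorted(server.articles))
```

The annotation posted at server a and the reply posted at server c both end up on all three servers, even though the two authors never agreed on a single server.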

So to sum it up:
1) RDF is not necessary (although admittedly it is desirable) for
annotations; Sharepoint proves that.
2) Annotation *Servers* are not necessary or even desirable for
annotations; NNTP proves that.

Finally, I'll address a comment that I suspect will come up, which is
"the reason the semantic web is having trouble is that jerks like
Sharepoint refuse to use RDF."  The only counter I have for that is that
there were plenty of non-HTML hypertext systems when Mosaic was first
created, and those non-cooperative "jerks" didn't have to cooperate for
the WWW to happen.  They competed with the WWW as hard as they could,
until they finally realized that they were dealing on an entirely
different level and cooperation was the only sane thing to do.  URIs did
not require convincing anyone in the end.  They were so self-evidently
superior to closed-silo systems that they swept right over the holdouts.

By itself, I do not think RDF provides that, any more than HTML alone
would have been able to create the WWW.  RDF is simply a serialization
format for graphs of assertions.  To my mind, using RDF in an Annotea
server is no different than using HTML in Hypercard.  It works, but most
people will prefer to use Hypercard's proprietary content language if
they can only talk to other Hypercards anyway.  Saying that RDF allows
me to interop with other Annotea servers is like saying that HTML lets
me cut-and-paste content from Hypercard to Outlook.  True, but again it
is not compelling enough to make it win out over other formats.

> Aggregation is imho the key problem. Most interesting, real world RDF
> data is full of blank (URI-less) nodes. Most RDF tools don't provide
> much by

Yes, and I maintain that is the *only* unique problem that the semantic
web is solving, and the key to it beating out one-off proprietary
solutions.

> way of tools to merge these nodes together, so aggregates of RDF data
> can be annoyingly fragmented. Fragmentation of data pulls against the
> network effect, by lessening the value of exposing and harvesting data
> into larger

I agree; that is why I am so hard-core about using URIs consistently :-)
It's also the main thing that concerns me about Annotea.  Unless we
always take the approach that *publishing* metadata is completely
independent of aggregating and querying, we have little incentive to
make the publishing part as universal and accessible as possible, and
we have little incentive to enable proper aggregation.  Saying that
*theoretically* we can publish the RDF at an http: address, and
*theoretically* we could aggregate that data with other data we yank
from an Annotea server, is not enough -- that is simply HTML.  In my
opinion, we make the data fragmentation problem *worse*, not better,
when we deploy systems in which the publishing and querying are
tightly coupled like that.
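A minimal sketch of why blank nodes fragment aggregates while consistent URIs do not, assuming triples are plain Python string tuples rather than a real RDF library: terms starting with `_:` are treated as blank nodes, everything else as a global URI or literal. The merge follows RDF's rule that blank nodes are local to the graph they come from, so each source's blank nodes are renamed apart before the union. The `ex:` predicates and the example.org URIs are hypothetical.

```python
from itertools import count

# Global counter for fresh blank-node labels across all merges
_fresh_ids = count()


def is_blank(term):
    # Assumption: blank nodes are written "_:label"; anything else is a URI or literal
    return term.startswith("_:")


def merge(*graphs):
    """RDF-style merge: set union of triples, with each graph's blank nodes renamed apart."""
    merged = set()
    for graph in graphs:
        renames = {}  # blank-node label in this graph -> fresh label

        def rename(term):
            if not is_blank(term):
                return term  # URIs are global: the same string names the same node everywhere
            if term not in renames:
                renames[term] = f"_:b{next(_fresh_ids)}"
            return renames[term]

        for s, p, o in graph:
            merged.add((rename(s), p, rename(o)))
    return merged


PAGE = "http://example.org/page"  # hypothetical annotated page

# Two independent publishers describe an annotation with a blank node...
source_a = {("_:n1", "ex:annotates", PAGE), ("_:n1", "ex:creator", '"Joshua"')}
source_b = {("_:n1", "ex:annotates", PAGE), ("_:n1", "ex:creator", '"Joshua"')}

# ...versus giving the annotation its own (hypothetical) URI.
ANN = "http://example.org/annotations/1"
source_c = {(ANN, "ex:annotates", PAGE), (ANN, "ex:creator", '"Joshua"')}
source_d = {(ANN, "ex:annotates", PAGE), (ANN, "ex:creator", '"Joshua"')}

print(len(merge(source_a, source_b)))  # blank nodes stay as two separate annotations
print(len(merge(source_c, source_d)))  # URI-named triples collapse into one description
```

The blank-node sources merge into four triples about two unconnected nodes, while the URI-named sources collapse to two triples with no extra work. Unifying the blank nodes would need an additional matching rule, which is the kind of tooling the quoted text says is mostly missing.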

Now, if we took Annotea, made it an NNTP-scraper/query engine,
and modified the browser plugins to just dump into NNTP, we would be
onto something :-)
Received on Thursday, 23 May 2002 02:10:31 UTC