Interoperability (Was: what do ontologists want?) from pat hayes on 2001-05-18 (www-rdf-logic@w3.org from May 2001)

From: pat hayes <phayes@ai.uwf.edu>
Date: Fri, 18 May 2001 15:46:04 -0500
To: jos.deroo.jd@belgium.agfa.com
Cc: www-rdf-logic@w3.org
Message-Id: <v04210131b72b34c55da6@[205.160.76.183]>
> > >There's indeed a point here. Yesterday I was doing a testcase
> > >with 200001 concepts used in 100000 statements (no real application,
> > >just stress testing some inference engines). In that particular
> > >testcase I found that the RDF/XML file could be zipped 20 times.
> > >Using RDF/N3 this was just 4 times. So the XML file is 10 MB, the
> > >N3 file is 2 MB and the binary compressed file is 0.5 MB. Needless
> > >to say that this is having an impact on communication, storage and
> > >processing. We found the best balance with N3 [1][2][3][4].
> >
> > Your figures speak for themselves, but I'm not sure of your implication -
> > that N3 should be used in preference to RDF/XML? Wouldn't this be throwing
> > the baby out with the bathwater? Performance and efficiency lie on a
> > continuum, interoperability comes in big discrete chunks - do we really want
> > an extra N converters? When the binary XML brigade on xml-dev have come up
> > with something workable, that perhaps will be worth considering.
>
>Honestly, I don't know the answers to your questions.
>We just gathered some facts (such as sizes, speeds, etc.)
>for an artificial testcase.
>Of course, you couldn't be more right in saying that
>  Performance and efficiency lie on a continuum,
>  interoperability comes in big discrete chunks.

Well, I am not sure this is true. Let me make a case against this 
widespread doctrine. I take it that the claim is based on the idea 
that if N people agree to use a standard interchange format - say, 
XML - then interoperability is done with; but if they do not, then 
N(N-1) converters need to be written. This however is the worst 
possible case. In practice, perhaps after some initial 
experimentation, about N converters will need to be written, because 
the participants will evolve a protocol of their own and write 
translators or converters into and out of it. Which is exactly what 
happens when they decide to use XML, in fact. So the big 
interoperability advantage of using a 'standard' format is that it 
avoids that initial period of negotiation and experimentation during 
which the interchange format is designed. But in fact it does not 
even do this, unless the needs of the participants have been exactly 
anticipated by the designers of the format. XML itself is just a 
notation for encoding labelled directed acyclic graph structures as 
sequences of character codes with a rather low information density. 
If I send you labelled graphs that you cannot interpret, the fact 
that they are encoded in XML is not much of an advantage over having 
them encoded in, say, reverse Polish in ASCII. So the community of 
users must somehow design the format to its own needs, as many 
communities are of course doing within XML. But now, take one of 
these: say, Rules-XML. Suppose they had chosen some other basic 
notational convention: suppose they were in fact doing Rules-ABC 
instead of Rules-XML; in what way would they be worse off? What does 
using XML buy one, apart from the reassuring sense that one is being 
up-to-date? (Of course it gets one XML-syntax-checkability and so on, 
but this is rather like putting legs on cars so that they can wear 
shoes.)
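
The converter counts in the argument above can be checked with a small
sketch. It assumes one-way converters, so the pairwise case needs N(N-1)
of them, while a shared interchange format needs 2N (one into it and one
out of it per participant, or "about N" if each in/out pair is counted as
a single translator). The function names, the "HUB" label and the format
labels A to E are hypothetical, chosen only for illustration.

```python
from itertools import permutations


def pairwise_converters(formats):
    """Worst case: one one-way converter per ordered (source, target) pair."""
    return list(permutations(formats, 2))


def hub_converters(formats, hub="HUB"):
    """Shared interchange format: each participant writes one converter
    into the hub format and one out of it. 'HUB' is a placeholder; it
    could be XML or any home-grown protocol the participants agree on."""
    into_hub = [(f, hub) for f in formats]
    out_of_hub = [(hub, f) for f in formats]
    return into_hub + out_of_hub


if __name__ == "__main__":
    formats = ["A", "B", "C", "D", "E"]  # hypothetical native formats
    n = len(formats)

    direct = pairwise_converters(formats)
    via_hub = hub_converters(formats)

    # The counts match the closed forms N(N-1) and 2N.
    assert len(direct) == n * (n - 1)
    assert len(via_hub) == 2 * n

    print(f"N = {n}")
    print(f"pairwise converters: {len(direct)}")
    print(f"hub converters:      {len(via_hub)}")
```

The count for the hub case is the same whatever the hub format is, which
is the point being made: the saving comes from agreeing on some common
format, not from that format being XML.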

As another amusing anecdote, I gather that Mike Genesereth, the 
original author of KIF, is trying to oversee the design of an 
XML-ised version of KIF. The trouble is, there are at least three 
different ways to render KIF into XML, and each has its own 
proponents, and each group has formed its own committee. My own 
advice to Mike would be, to hell with XML: stick to S-expressions, 
and let everyone write their own converters into an XML format, and 
then they can write N converters between them.

Pat Hayes

---------------------------------------------------------------------
IHMC					(850)434 8903   home
40 South Alcaniz St.			(850)202 4416   office
Pensacola,  FL 32501			(850)202 4440   fax
phayes@ai.uwf.edu 
http://www.coginst.uwf.edu/~phayes
Received on Friday, 18 May 2001 16:46:05 UTC