RE: Why RDF was a good choice

Hi Sam

The problem here is that there is a huge difference between what is available
now and what people have speculatively said will be available in the future.
In any kind of engineering, it is always desirable to minimize the number of
dependencies on untried and untested technology. Unfortunately the W3C does
not seem to be following that rule - it wants to use RDF for everything even
though RDF is not finished!

This is frustrating for companies that would like to deploy technology today.
It is not limited to CC/PP: for example, all W3C working groups manage issue
lists, but they all do it in different ways, with varying success, often
using different pieces of software hand-written by group chairs. Recently it
was suggested that the W3C should investigate supplying standardised tools to
all groups. The W3C decided that RDF would be an ideal basis for this! I
would have thought taking an existing piece of open source software - say
Bugzilla - would be a much quicker, more efficient way of solving the
problem. The fact that RDF still has no agreed mechanism for datatypes is
surely indicative that it is not a finished technology ready for
industrial-scale deployment?

The other problem is that RDF, like a number of computer technologies before
it, has recently attracted a great deal of hype. I think this has frustrated
people (it frustrates me on a daily basis) because instead of honest
discussion of what RDF can do, what it is good at, what it is not good at,
what bits are missing, what needs further work and so on, many discussions
over-emphasise the future possibilities and even take on a kind of
"religious" fervour (e.g. "you can't do that, it's not in the spirit of
RDF"). Furthermore, as RDF is complicated and keeps changing, arguments often
end up centering on minor issues to do with the serialisation rather than
major ones ("you need to qualify your attributes with namespaces", "why don't
you use typedNodes", "you haven't typed your component", "you should use IDs
not abouts", etc.).

Now I may be coming across as very negative about RDF, so I just want to say
I am not dismissing RDF or the Semantic Web: I think they are promising
technologies. As I see it, at about the same time that relational databases
were developed, a whole set of important technologies was developed by the AI
community. However, whereas we see relational databases widely used today, we
don't see these AI techniques everywhere. So if the Semantic Web just means
these technologies are more widely deployed and easily available to
developers, that will be a good thing. However, as anyone familiar with AI
knows, at the time people were tremendously disappointed because the
technologies did not deliver on the hype that surrounded them, even though
they did have uses. It would be a shame if the same thing happened to the
Semantic Web. The other thing GOFAI (good old-fashioned AI) taught us is that
the knowledge encoding problem is hard and there are no real shortcuts, a
fact which is equally relevant to RDF today.

So really what I am advocating is that more caution is exercised before
building dependencies on unfinished technology. People seem to use the
argument that we should be using RDF because in the future everything will be
using RDF, so it is easier to migrate now. Contrary to what other people
might think, my experience of software engineering is that the key point is
to get the design right: even if you are not using the most up-to-date
technology, if the design is right it is much easier to migrate. This is
because a key indicator of good design is simplicity, and if you have
simplicity then migration is always easier. If the design is poor, it does
not matter whether you are using the most up-to-date technology: with
computers there is a very high likelihood that at some point you will have to
migrate technology, and that is much harder with a poor design. Recent
interest in techniques like "Extreme Programming" has highlighted how much
achieving simplicity in design helps.

> The
> advantages come when you start using RDF Schema. Or better, 
> when different
> vocabularies are used, that refer to other vocabularies. These
> "inter-vocabulary relationships" are not known in XML Schema, 
> I believe. The
> previous version of the RDF Schema specification (see
> http://www.w3.org/TR/2000/CR-rdf-schema-20000327/) gives some 
> basic hints on
> how one could use this.

The problem is they were only hints, and as you note they are only in the
previous version of that spec, not the current one. So really RDFS is at
exactly the same stage as XSD: neither has a standard way of declaring
inter-vocabulary relationships, and any application that requires this at the
moment has to develop its own idiosyncratic way of doing it.

> I think RDF is all about machine-readability, rather than 
> human-readability.

That's true, but that's a good reason why CC/PP should not use RDF.
Experience has shown that CC/PP profiles tend to be generated by hand, so the
fact that they are written in the XML serialisation of RDF (which is more
complicated than vanilla XML) creates additional difficulty, along with the
fact that RDF has no support for validating data entered by hand, as XML
does.
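
To make this concrete, here is a rough sketch of one attribute written both
ways. The namespace URI, attribute and component names are illustrative
(loosely modelled on UAProf), not taken from any actual schema.

The RDF/XML serialisation:

  <rdf:RDF
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      xmlns:prf="http://example.org/profile-vocabulary#">
    <rdf:Description rdf:ID="Profile">
      <prf:component>
        <rdf:Description rdf:ID="HardwarePlatform">
          <rdf:type
            rdf:resource="http://example.org/profile-vocabulary#HardwarePlatform"/>
          <prf:ScreenSize>160x160</prf:ScreenSize>
        </rdf:Description>
      </prf:component>
    </rdf:Description>
  </rdf:RDF>

The same information in vanilla XML:

  <profile>
    <hardware>
      <screensize>160x160</screensize>
    </hardware>
  </profile>

Someone writing the second form by hand has much less to get wrong, and a DTD
or XML Schema could check it for them; there is no equivalent check for the
first form.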

> The interesting part of RDF is that a software agent that has a basic
> knowledge on some constructs (e.g. the CC/PP model and core 
> vocabulary) can
> learn to use other vocabularies when you feed him a new RDF 
> Schema that
> refers to the vocabularies he already knows.

People have said this (in fact I'm sure I've said it to Carl - sorry Carl!)
but I don't believe it any more. This stems from my experience with UAProf.
With UAProf they followed the RDF Schema guidelines, creating a brand new
namespace - and hence a brand new vocabulary - every time they wanted to add
new attributes. When they did this, they copied all the existing attributes
to the new namespace, but often changed the resolution rules, the components
or the data types of those attributes for good measure. This has created a
nightmare scenario where there are nearly as many different vocabularies as
devices, so processors have to handle each device in a different way. It also
means there are lots of potential problems when a chain of profile fragments
links UAProf devices that use slightly different versions of the same
vocabulary, because there is no agreed way of merging different vocabulary
versions, particularly when attributes have the same meaning but some of
their properties (e.g. data type or resolution rule) have changed. This
problem has never been resolved.
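
To illustrate the kind of clash I mean (the URIs, type names and rules below
are invented for the example, not quoted from any actual profile schema),
imagine the same attribute defined in two versions of a vocabulary:

  version 1, namespace http://example.org/schema-20000405#
    ScreenSize   data type: Literal     resolution rule: Locked

  version 2, namespace http://example.org/schema-20010430#
    ScreenSize   data type: Dimension   resolution rule: Override

A processor merging a chain of profile fragments, some written against each
version, has no agreed rule telling it which definition of ScreenSize wins.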

So I think there is no getting away from the fact that
i) we need to agree on vocabularies;
ii) once we've agreed on vocabularies, we need to think very carefully before
we change them, because changing them creates problems and breaks things;
iii) we definitely should not change vocabularies just to add new attributes;
instead we should create new vocabularies and use the new and old
vocabularies concurrently.

These are just sensible rules for using namespaces, and they are equally
applicable in RDF or XML.

In fact I think there is a general need to rethink how namespaces are used in
association with schemas. To give an example, the CC/PP WG has recently been
working on a new version of the CC/PP Structure and Vocabulary document in
order to move to candidate recommendation. As part of this, the group decided
it was necessary to change the namespace of the CC/PP schema. I was very
reluctant to do this, as I was conscious that it would break existing CC/PP
processors. However, eventually the rest of the group prevailed, as changing
the namespace was seen as good W3C practice. Personally I think that if
namespaces are to be used in this way they need two "axes" (separate data
fields): a namespace axis (which identifies what it is) and a version axis
(which identifies which version it is). However, such changes are clearly
beyond the remit of the CC/PP working group. As it is, I imagine there are
lots of processors that try to solve this with regular expressions (i.e. if
the namespace contains the string "CCPP", process it as CC/PP), but I don't
think that is a satisfactory long-term solution.
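
A very rough sketch of what I mean, with made-up URIs and a made-up attribute
name (this is not a proposal from any spec):

Current practice, with the version baked into the namespace URI:

  <rdf:Description xmlns:ex="http://example.org/ccpp-schema-20020610#" ... >

Two separate axes, one saying what it is and one saying which version it is:

  <rdf:Description xmlns:ex="http://example.org/ccpp-schema#"
                   ex:schemaVersion="1.1" ... >

With the first form, a processor written against last year's URI silently
fails to recognise this year's profiles; with the second, it can at least see
that it is dealing with a different version of a vocabulary it knows, and
decide what to do about it.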

> One strong point of RDF Schema is the ability to express of 
> relationships
> between different vocabularies. I am thinking of a useful 
> application: when
> a content provider knows that
> a) "requested_file --mime-type--> image/jpeg", and
> b) "user_agent --accepts--> [text/html, text/plain, 
> image/jpeg, image/gif]",
> then he should be able to deduce the client will be able to 
> process the
> data. In order to do so, he must know the relationship between the
> "mime-type" property, that belongs to a multimedia metadata 
> vocabulary, and
> the "accepts" property, that belongs to some CC/PP 
> vocabulary. 

Yes, but why not use the same term for both the content and the user agent
as HTTP/1.1 content negotiation does? Instead of trying to find a solution
to the problem, why not avoid the problem altogether?
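
That is exactly what an ordinary HTTP/1.1 exchange does: the client and the
origin server already share a single vocabulary of media types, so no mapping
between vocabularies is ever needed. For example:

  Request:
    GET /picture HTTP/1.1
    Host: example.org
    Accept: text/html, text/plain, image/jpeg, image/gif

  Response:
    HTTP/1.1 200 OK
    Content-Type: image/jpeg

The server only has to check that the type it is about to send appears in the
Accept list; it never has to be told that "mime-type" and "accepts" are
related properties.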

> Using RDF
> Schema (or one of the related technologies, such as DAML+OIL) 
> to express
> both vocabularies, and their relationships, would enable the content
> provider to learn new vocabularies and their use.

The thing is, I don't think content providers want to learn new vocabularies.
Content providers are busy. They want very simple vocabularies which tell
them exactly what they need to know. Also, they don't want different devices
to use different vocabularies; they want one common vocabulary they can use
to support device independence. So whereas we *could* do this, I think we
*should* be trying hard not to. However, again this verges on a
"religious"-style argument for some people.

Of course, if we are going to use RDF, then I think that using RDF Schema for
vocabularies is a very good idea. However, currently CC/PP does not require
this, and without validation (remember, RDF has none) there is no guarantee
that even where schemas do exist they will be correct. This means there are
plenty of barriers to interoperability here if people don't use the
technology correctly, regardless of whether it is RDF or XML.
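
For what it is worth, this is the sort of declaration I mean, with an
invented property name and assuming the usual rdf/rdfs namespace prefixes (it
is not taken from the CC/PP schema itself):

  <rdf:Property rdf:ID="ScreenSize">
    <rdfs:domain rdf:resource="#HardwarePlatform"/>
    <rdfs:label>ScreenSize</rdfs:label>
    <rdfs:comment>The size of the device screen in pixels.</rdfs:comment>
  </rdf:Property>

But nothing in CC/PP as it stands obliges a profile to point at such a
schema, and nothing checks a profile against the schema even when one exists.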
 
> Hope this wasn't too much information in one time.

Well, I wrote for much longer than you, so not at all - I hope this wasn't
too much. I guess we will see by the number of people who unsubscribe
themselves this time :-) Just one thing though - please don't take any of
these comments personally. Others have definitely expressed the views you
express before; it's just that the implementation experience has been rather
different.

best regards

Mark H. Butler, PhD
Research Scientist                HP Labs Bristol
mark-h_butler@hp.com
Internet: http://www-uk.hpl.hp.com/people/marbut/

Received on Monday, 10 June 2002 13:24:43 UTC