RE: Why RDF was a good choice

"This has created a nightmare scenario where you have nearly as many
different vocabularies as
devices so processors have to process each device in a different way."

Is this an exaggeration to make a point, or is it real, i.e. are there
really 300+ UAProf vocabularies? If so, where are they? The WAP Forum
certainly did not create them!

On the idea of "choosing vocabularies and not changing them": are
handset and browser manufacturers going to stay static just as a
convenience to CC/PP or similar technology? Languages evolve because of
needs and experience - this also applies to the evolution of devices
and their new abilities.

On 'simple designs' and 'extreme programming': migration can be easy
irrespective of whether the design is simple or not. One of the ideas in
extreme programming is not 'simple design' but *right for now* design.
What this means in practice is that you get something working soon,
rather than spending ages and ages designing with little to show for it.




Vidhya

-----Original Message-----
From: Butler, Mark [mailto:Mark_Butler@hplb.hpl.hp.com]
Sent: 10 June 2002 10:23
To: 'Sam Lerouge'; 'www-mobile@w3.org'
Subject: RE: Why RDF was a good choice



Hi Sam

The problem here is there is a huge difference between what is
available now and what people have speculatively said will be available
in the future. In any kind of engineering, it is always desirable to
minimize the number of dependencies on untried and untested technology.
Unfortunately the W3C does not seem to be following that rule - it
wants to use RDF for everything even though it is not finished!

This is frustrating for companies who would like to deploy technology
today. This is not just limited to CC/PP: for example, all W3C working
groups manage issue lists, but they all do it in different ways,
sometimes with varying success, often using different pieces of
software hand-written by group chairs. Recently it was suggested that
the W3C should investigate the possibility of supplying standardised
tools to all groups. The W3C decided that RDF would be an ideal basis
for this! I would have thought that taking an existing open source
piece of software - say Bugzilla - would be a much quicker, more
efficient way of solving the problem. The fact that RDF still has no
agreed mechanism for datatypes is surely indicative that it is not a
finished technology ready for industrial-scale deployment?

The other problem is that RDF, like a number of computer technologies
before it, has recently had a great deal of hype. I think this has
frustrated people (it frustrates me on a daily basis) because instead
of honest discussions on what RDF can do, what it is good at, what it
is not good at, what bits are missing, what needs further work, etc.,
many discussions over-emphasise the future possibilities and even have
a kind of "religious" fervour (e.g. "you can't do that, it's not in the
spirit of RDF"). Furthermore, as RDF is complicated, and keeps
changing, arguments often end up centering on minor issues to do with
the serialisation rather than major ones ("you need to qualify your
attributes with namespaces", "why don't you use typedNodes", "you
haven't typed your component", "you should use IDs not abouts", etc.).

Now I may be coming across as being very negative about RDF, so I just
want to say I am not dismissing RDF or the Semantic Web: I think they
are promising technologies. As I see it, at about the same time that
relational databases were developed, there was a whole set of important
technologies developed by the AI community. However, whereas we see
relational databases widely used today, we don't see these AI
techniques everywhere. So if the Semantic Web just means these
technologies are more widely deployed and easily available to
developers, that will be a good thing. However, as anyone familiar with
AI knows, at the time people were tremendously disappointed because the
technologies did not deliver on the hype that surrounded them, even
though they did have uses. It would be a shame if the same thing
happened to the Semantic Web. The other thing GOFAI (good old-fashioned
AI) taught us is that the knowledge encoding problem is hard and there
are no real shortcuts, a fact which is equally relevant to RDF today.

So really what I am advocating is that more caution is exercised before
building dependencies on unfinished technology. People seem to use the
argument that we should be using RDF because, in the future, everything
will be using RDF, so it's easier to migrate now. Contrary to what
other people might think, my experience of software engineering is that
the key point is to get the design right, because even if you are not
using the most up-to-date technology, if the design is right it's much
easier to migrate. This is because a key indicator of good design is
simplicity, and if you have simplicity then migration is always easier.
If the design is poor, it doesn't matter if you are using the most
up-to-date technology: with computers there is a very high likelihood
that at some point you will have to migrate technology, and this is
much harder with a poor design. Recent interest in techniques like
"Extreme Programming" has highlighted how achieving simplicity in
design helps.

> The
> advantages come when you start using RDF Schema. Or better, 
> when different
> vocabularies are used, that refer to other vocabularies. These
> "inter-vocabulary relationships" are not known in XML Schema, 
> I believe. The
> previous version of the RDF Schema specification (see
> http://www.w3.org/TR/2000/CR-rdf-schema-20000327/) gives some 
> basic hints on
> how one could use this.

The problem is they were only hints, and as you note they are only in
the previous version of this spec, not the current one. So really RDFS
is at exactly the same stage as XSD: neither has a standard way of
declaring inter-vocabulary relationships, and any application that
requires this at the moment has to develop its own idiosyncratic way of
doing it.

> I think RDF is all about machine-readability, rather than 
> human-readability.

That's true, but that's a good reason why CC/PP should not use RDF.
Experience has shown that CC/PP profiles tend to be generated by hand,
so the fact that they are written in the XML serialisation of RDF
(which is more complicated than vanilla XML) creates additional
difficulty, along with the fact that there is no support in RDF for
validating data entered by hand as there is in XML.
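
To make that concrete, here is a rough sketch (Python, standard library
only) of pulling one attribute out of a hand-written profile. The
namespace URIs, element names and attribute below are invented for
illustration - they are not the real CC/PP or UAProf vocabulary - but
they show the extra nesting the RDF/XML serialisation forces on an
author compared with a hypothetical vanilla-XML equivalent:

# Illustrative only: invented namespaces and element names.
import xml.etree.ElementTree as ET

EX = "http://example.org/profile-vocab#"

# RDF/XML serialisation: the value sits inside nested rdf:Description
# elements, and nothing validates a mistyped property name.
rdf_xml = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/profile-vocab#">
  <rdf:Description rdf:about="MyProfile">
    <ex:component>
      <rdf:Description rdf:about="HardwarePlatform">
        <ex:ScreenSize>160x160</ex:ScreenSize>
      </rdf:Description>
    </ex:component>
  </rdf:Description>
</rdf:RDF>
"""

# A hypothetical vanilla-XML equivalent: flatter, and straightforward
# to validate against a DTD or XML Schema if one exists.
plain_xml = """
<profile xmlns="http://example.org/profile-vocab#">
  <HardwarePlatform>
    <ScreenSize>160x160</ScreenSize>
  </HardwarePlatform>
</profile>
"""

for doc in (rdf_xml, plain_xml):
    value = ET.fromstring(doc).find(".//{%s}ScreenSize" % EX).text
    print(value)   # prints "160x160" for both serialisations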

> The interesting part of RDF is that a software agent that has a basic
> knowledge on some constructs (e.g. the CC/PP model and core 
> vocabulary) can
> learn to use other vocabularies when you feed him a new RDF 
> Schema that
> refers to the vocabularies he already knows.

People have said this (in fact I'm sure I've said it to Carl - sorry
Carl!) but I don't believe it any more. This stems from my experience
with UAProf. With UAProf they followed the RDF Schema guidelines,
adopting a brand new namespace for each vocabulary every time they
wanted to add new attributes. When they did this, they copied all the
existing attributes to the new namespace, but often changed the
resolution rules, the components or the data types of these attributes
for good measure. This has created a nightmare scenario where you have
nearly as many different vocabularies as devices, so processors have to
process each device in a different way. It also means that there are
lots of potential problems when you link UAProf devices that use
slightly different versions of the same vocabulary in a chain, as there
is no agreed way of merging different vocabulary versions, particularly
when attributes have the same meaning but some of their properties
(e.g. data type, resolution rule) have changed. This problem has never
been resolved.
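
As a rough illustration of what this means for a processor - the
namespace URIs and attribute definitions below are invented, standing
in for the many slightly different UAProf vocabulary versions - here is
a sketch of why the merge cannot be done generically when the
resolution rule travels with the vocabulary version:

# Illustrative only: invented namespace URIs and attribute definitions.
# Each vocabulary version re-declares "the same" attribute, but with a
# different resolution rule (it could equally be a different data type).
VOCABULARIES = {
    "http://example.org/uaprof-1#": {
        "ScreenSize": {"resolution": "Locked"},
    },
    "http://example.org/uaprof-2#": {
        "ScreenSize": {"resolution": "Override"},
    },
}

def resolve(namespace, attr, default_value, override_value):
    """Merge a default and an override for one attribute.

    Because the resolution rule depends on the vocabulary version, the
    processor has to dispatch on the namespace - there is no generic
    merge that works for every device."""
    rule = VOCABULARIES[namespace][attr]["resolution"]
    if rule == "Locked":
        return default_value      # defaults win, overrides are ignored
    if rule == "Override":
        return override_value     # overrides win
    raise ValueError("unknown resolution rule: " + rule)

# The same attribute name merges differently depending on which
# vocabulary version a given device happened to use:
print(resolve("http://example.org/uaprof-1#", "ScreenSize",
              "160x160", "320x240"))   # 160x160
print(resolve("http://example.org/uaprof-2#", "ScreenSize",
              "160x160", "320x240"))   # 320x240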

So I think there is no getting away from the fact that
i) we need to agree on vocabularies;
ii) once we've agreed on vocabularies, we need to think very carefully
before we change them, because changing them creates problems and
breaks things;
iii) we definitely should not change vocabularies just to add new
attributes; we should create new vocabularies and use the new and old
vocabularies concurrently.

These rules are just sensible rules for using namespaces and they are
equally applicable in RDF or XML.
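
For what it's worth, rule iii) looks something like this in practice -
again a sketch with made-up namespace URIs and attributes - where the
old vocabulary stays exactly as it is, new attributes live in a new
namespace, and a profile simply uses both at once:

# Sketch with made-up namespace URIs: extending a profile by adding a
# second vocabulary rather than re-issuing the first one.
CORE = "http://example.org/core-vocab#"        # agreed, never changed
EXTRA = "http://example.org/extension-vocab#"  # new attributes only

# A profile as (namespace, attribute) -> value pairs.
profile = {
    (CORE, "ScreenSize"): "160x160",
    (CORE, "ColorCapable"): "Yes",
    (EXTRA, "SupportsStylus"): "Yes",  # added later, in its own namespace
}

# An old processor that only knows the core vocabulary keeps working,
# because nothing in the core namespace has changed:
core_view = {attr: value for (ns, attr), value in profile.items()
             if ns == CORE}
print(core_view)   # {'ScreenSize': '160x160', 'ColorCapable': 'Yes'}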

In fact I think there is a general need to rethink how namespaces are
used in association with schemas. To give an example of this, the CC/PP
WG has recently been working on a new version of the CC/PP Structure
and Vocabulary document in order to move to candidate recommendation.
In order to do this, the group decided it was necessary to change the
namespace of the CC/PP schema. I was very reluctant to do this as I was
conscious that it would break existing CC/PP processors. However,
eventually the rest of the group prevailed, as it was seen as good W3C
practice to do this. Personally I think that if namespaces are to be
used in this way they need two "axes" (separate data fields): a
namespace axis (that identifies what it is) and a version axis (which
identifies which version it is). However, such changes are clearly
beyond the remit of the CC/PP working group. As it is, I imagine there
are lots of processors that try to solve this with regular expressions
(i.e. if this namespace contains the string "CCPP" in it, process it as
CC/PP) but I don't think this is a satisfactory long-term solution.
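
To illustrate both points - the regular-expression workaround in use
today and the hypothetical two-axis alternative - here is a sketch; the
URIs and the Vocabulary class are invented for the example:

import re

# Workaround (sketch): guess the vocabulary by looking for a magic
# substring anywhere in the namespace URI.
def looks_like_ccpp(namespace):
    return re.search("ccpp", namespace, re.IGNORECASE) is not None

print(looks_like_ccpp("http://example.org/2002/ccpp-schema#"))  # True

# Hypothetical alternative: keep "what it is" and "which version it is"
# as two separate fields, so a version bump does not change the
# identity that processors match on.
class Vocabulary:
    def __init__(self, identity, version):
        self.identity = identity   # namespace axis: what the schema is
        self.version = version     # version axis: which revision it is

old = Vocabulary("http://example.org/ccpp-schema", "2001-01")
new = Vocabulary("http://example.org/ccpp-schema", "2002-06")

# A processor can match on the identity alone and negotiate over
# versions, instead of breaking whenever a combined URI changes.
print(old.identity == new.identity)   # True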

> One strong point of RDF Schema is the ability to express of 
> relationships
> between different vocabularies. I am thinking of a useful 
> application: when
> a content provider knows that
> a) "requested_file --mime-type--> image/jpeg", and
> b) "user_agent --accepts--> [text/html, text/plain, 
> image/jpeg, image/gif]",
> then he should be able to deduce the client will be able to 
> process the
> data. In order to do so, he must know the relationship between the
> "mime-type" property, that belongs to a multimedia metadata 
> vocabulary, and
> the "accepts" property, that belongs to some CC/PP 
> vocabulary. 

Yes, but why not use the same term for both the content and the user
agent
as HTTP/1.1 content negotiation does? Instead of trying to find a
solution
to the problem, why not avoid the problem altogether?
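
As a sketch of what that buys you (the accept list below is invented):
when both sides describe media types with the same terms, checking
deliverability is a simple membership test, just as in HTTP/1.1 content
negotiation, with no cross-vocabulary mapping to learn:

# Sketch: both sides use the same media-type terms, so no vocabulary
# mapping is needed.  The accept list below is invented.
def client_can_handle(content_type, accepted_types):
    return content_type in accepted_types

accepts = ["text/html", "text/plain", "image/jpeg", "image/gif"]
print(client_can_handle("image/jpeg", accepts))   # True
print(client_can_handle("image/png", accepts))    # False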

> Using RDF
> Schema (or one of the related technologies, such as DAML+OIL) 
> to express
> both vocabularies, and their relationships, would enable the content
> provider to learn new vocabularies and their use.

The thing is, I don't think content providers want to learn new
vocabularies. Content providers are busy. They want very simple
vocabularies which tell them exactly what they need to know. Also, they
don't want different devices to use different vocabularies. They want
one common vocabulary so they can use it to support device
independence. So whereas we *could* do this, I think we *should* be
trying hard not to do this. However, again this is verging on a
"religious" style argument for some people.

Of course, if we are going to use RDF, then I think that using RDF
Schema for vocabularies is a very good idea. However, currently CC/PP
does not require this, and without validation (remember this!) there is
no guarantee that even if schemas do exist they will be correct. This
means there are plenty of barriers to interoperability here if people
don't use the technology correctly, regardless of whether it is RDF or
XML.
> Hope this wasn't too much information in one time.

Well, I wrote for much longer than you, so not at all - I hope this
wasn't too much. I guess we will see by the number of people who
unsubscribe themselves this time :-) Just one thing though - please
don't take any of these comments personally. Others have definitely
expressed the views you express before; it's just that the
implementation experience has been rather different.

best regards

Mark H. Butler, PhD
Research Scientist                HP Labs Bristol
mark-h_butler@hp.com
Internet: http://www-uk.hpl.hp.com/people/marbut/

Received on Tuesday, 11 June 2002 01:25:00 UTC