proposed TAG issues: uniform resource version info and access to resource metadata

Dear all,

During the AC meeting, before wrapping things up, Steve Bratt called for
discussion concerning the new things W3C should be doing.

My comments are general but are to some extent related to the document
"Architecture of the World Wide Web, W3C Working Draft 15 November 2002",
in particular its sections 2.2.4 "Consistent representations and
persistence" and 2.5 "Some generalities about URIs".

In outline, I'm suggesting (and asking for comments on) two rather
practical things:

   #1. Should there be a uniform, W3C-recommended way to declare the
   version history of Web resources? And, more importantly,

   #2. Should there be a "clean", uniform way to refer to (and thus
   access) the metadata of Web resources?

These ideas are elaborated in the following two sections:

1. Managing versions of resources
=================================
To me, one thing the Web architecture is missing is a recommended
(formal) notion of resources that change over time. Perhaps the
evolvability of content should also be recognised by the architecture?

The Web is a changing entity rather than a network of fixed resources
(that is, resources that stay essentially the same once created). W3C
might want to consider recommending some kind of minimal versioning
mechanism for Web resources as an optional primitive.

My point is that this mechanism should define a uniform way of declaring
changes to resources (i.e. things addressable by URIs). This seems to me
a rather fundamental building block of the Web, one that should be
introduced as a special case of a uniform way of declaring resource
metadata in general (i.e. a task for RDF).

In my opinion, the optional version declaration (of the master resource)
should at least state, for each version:
- the date (and time) when the resource was created,
- when a new version was created, AND
- whether the resource underwent "substantial changes" between the last
two versions (i.e. whether the contents should be "reconsidered"),
- whether the master resource (subject to versioning) is supposed to be
permanent or not, and whether the essential content is fixed or not (or
perhaps an even broader type or status classification [this is
something to be figured out]), and
- other metadata.
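To make the list above a bit more concrete, here is a minimal sketch of such a version declaration as a plain data structure. All class and field names are hypothetical (they are not a proposed vocabulary), the URIs come from the example in the next paragraphs, and the boolean flags in the usage lines are made up for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List


@dataclass
class VersionEntry:
    """One version in a resource's history (hypothetical field names)."""
    uri: str                  # URI of this instance-in-time
    created: datetime         # when this version was created
    substantial_change: bool  # changed "substantially" vs. the previous version?


@dataclass
class VersionDeclaration:
    """Optional version declaration of a master resource (a sketch, not a W3C format)."""
    master_uri: str
    created: datetime         # when the master resource was created
    permanent: bool           # is the master resource meant to be permanent?
    content_fixed: bool       # is the essential content fixed?
    versions: List[VersionEntry] = field(default_factory=list)
    other: Dict[str, str] = field(default_factory=dict)  # any other metadata

    def latest(self) -> VersionEntry:
        # The most recently created version (raises if there are none).
        return max(self.versions, key=lambda v: v.created)

    def should_reconsider(self) -> bool:
        # True only if the newest version changed substantially.
        return bool(self.versions) and self.latest().substantial_change


xml = VersionDeclaration(
    master_uri="http://www.w3.org/TR/1998/REC-xml",
    created=datetime(1998, 2, 10),
    permanent=True,
    content_fixed=False,
    versions=[
        VersionEntry("http://www.w3.org/TR/1998/REC-xml-19980210",
                     datetime(1998, 2, 10), False),
        VersionEntry("http://www.w3.org/TR/2000/REX-xml-20000814",
                     datetime(2000, 8, 14), False),
    ],
)
print(xml.latest().uri)         # http://www.w3.org/TR/2000/REX-xml-20000814
print(xml.should_reconsider())  # False
```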

In addition (for symmetry), I think there should be a recommended URI
naming convention for versioning (derived from an appropriate XML Schema
data type). For instance, when presented with a resource

	[1] http://www.w3.org/TR/1998/REC-xml-19980210

an agent could hypothesise that there is an "up-to-date" resource

	[2] http://www.w3.org/TR/1998/REC-xml

and mechanically check the metadata of that resource ([2]):

	[2m] http://www.w3.org/TR/1998/REC-xml/meta.rdf

to verify whether this hypothesis was correct ("[1] is indeed an
instance-in-time of [2], or at least that's what the author says in [2m]").
In addition, the user agent might again confirm from the metadata [2m]
that the resource

	[3] http://www.w3.org/TR/2000/REX-xml-20000814

is indeed newer than [1] but that no "substantial changes" took place
between the two versions. (I'm not proposing actual syntax, simply
illustrating the idea.)
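The mechanical part of the hypothesis above can be sketched in a few lines. This assumes a hypothetical convention in which a dated instance ends in "-YYYYMMDD" (a compact date form; the message only says the convention should derive from an XML Schema data type) and in which metadata lives at "<uri>/meta.rdf" as in [2m]. Function names are illustrative only.

```python
import re
from datetime import date, datetime
from typing import Optional, Tuple

# Hypothetical convention: a dated instance-in-time ends in "-YYYYMMDD",
# and dropping that suffix names the "up-to-date" resource.
_DATED = re.compile(r"^(?P<base>.+)-(?P<stamp>\d{8})$")


def split_dated_uri(uri: str) -> Optional[Tuple[str, date]]:
    """Return (up-to-date URI, version date), or None if there is no date suffix."""
    m = _DATED.match(uri)
    if m is None:
        return None
    try:
        stamp = datetime.strptime(m.group("stamp"), "%Y%m%d").date()
    except ValueError:  # eight digits, but not a real calendar date
        return None
    return m.group("base"), stamp


def meta_uri(uri: str) -> str:
    """Hypothetical rule: the metadata of <uri> is found at <uri>/meta.rdf."""
    return uri.rstrip("/") + "/meta.rdf"


base, when = split_dated_uri("http://www.w3.org/TR/1998/REC-xml-19980210")
print(base)            # http://www.w3.org/TR/1998/REC-xml
print(when)            # 1998-02-10
print(meta_uri(base))  # http://www.w3.org/TR/1998/REC-xml/meta.rdf
```

Note that stripping the suffix from [3] yields http://www.w3.org/TR/2000/REX-xml, not [2]; the name alone can tell you that [3] is dated later than [1], but the link between [3] and [2] (and the "no substantial changes" claim) has to come from the metadata in [2m].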

Clearly, the above ideas are already to a large extent present in common
practice. Since the most valuable thing on the Web is its content, an
official versioning mechanism would steer things in a good, common
direction (deprecated data, digital signatures of different versions and
diffs, intellectual property rights, etc.).

Not deleting anything from the Web, so as to preserve its digital
history, isn't a good idea unless there's a formal way of saying, e.g.,
"this content is outdated -- newer version is here". (As for terminology,
this series of resource versions would define the "webline" of a
resource, by analogy with the "worldline" of physics in spacetime ;)
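Given such formal "newer version is here" declarations, following a resource's webline is a simple walk. In this sketch the declarations are assumed to be available as an in-memory mapping from a version URI to its successor; the mapping and the function name are hypothetical stand-ins, not a proposed format.

```python
from typing import Dict, List


def webline(start: str, superseded_by: Dict[str, str]) -> List[str]:
    """Follow "this content is outdated -- newer version is here" links.

    `superseded_by` maps a version URI to the URI of its successor
    (a hypothetical stand-in for the formal declaration). Returns the
    chain of versions from `start` to the newest one known.
    """
    chain = [start]
    seen = {start}
    while chain[-1] in superseded_by:
        nxt = superseded_by[chain[-1]]
        if nxt in seen:
            raise ValueError(f"cycle in version history at {nxt}")
        chain.append(nxt)
        seen.add(nxt)
    return chain


links = {
    "http://www.w3.org/TR/1998/REC-xml-19980210":
        "http://www.w3.org/TR/2000/REX-xml-20000814",
}
print(webline("http://www.w3.org/TR/1998/REC-xml-19980210", links)[-1])
```

The cycle check matters because the declarations are written by independent authors; nothing in a decentralised setting guarantees that they form a proper chain.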

To me, resource version metadata hints at a need for a more general
architectural entity for declaring metadata (or a uniform naming
convention for finding resource-specific metadata).

The above works nicely with the current Web (provided an appropriate
formal naming convention is adopted [which one?]) if you know where to
look for the metadata; in general, however, you don't. I describe a
simple solution to this next:

2. Uniform mechanism to refer to the metadata associated with a resource
========================================================================
A natural(?) way of doing this might be some sort of standard index.rdf
for resources, analogous to index.html in the context of Web sites. I'm
using the analogy to make a point here, not proposing a fixed syntax.

The basic idea is simply easy access to the metadata of a resource. To
me, a new universal naming rule for "Meta URIs" seems like a good idea.
This "Meta URI" would return metadata descriptions related to the object
URI: for instance, resource version history (see above), DC/DAML+OIL
statements, type-sensitive pointers to external WSDL descriptions (or
similar), RSS, RDDL, etc.

A good way of doing this syntactically would be through RDF statements.
All this is obviously related to the ideas of SW/WS anyway (and would
seem to lessen the need for UDDI-style centralised mechanisms; perhaps
ideas from LDAP might be consulted here?).
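As an illustration of what a small meta.rdf might contain, the following sketch serialises one RDF/XML description using Python's standard ElementTree. The RDF and Dublin Core namespaces are the real ones; the "ver" namespace and its newerVersion predicate are hypothetical placeholders for the version vocabulary W3C might recommend.

```python
import xml.etree.ElementTree as ET
from typing import Optional

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
# Hypothetical vocabulary for version metadata; no such W3C namespace exists.
VER = "http://example.org/version#"

ET.register_namespace("rdf", RDF)
ET.register_namespace("dc", DC)
ET.register_namespace("ver", VER)


def meta_rdf(subject: str, dc_date: str, newer: Optional[str] = None) -> str:
    """Serialise a tiny RDF/XML description of `subject` as a string."""
    root = ET.Element(f"{{{RDF}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF}}}Description",
                         {f"{{{RDF}}}about": subject})
    ET.SubElement(desc, f"{{{DC}}}date").text = dc_date
    if newer is not None:
        # Hypothetical predicate: "a newer version of this resource is here".
        ET.SubElement(desc, f"{{{VER}}}newerVersion",
                      {f"{{{RDF}}}resource": newer})
    return ET.tostring(root, encoding="unicode")


print(meta_rdf("http://www.w3.org/TR/1998/REC-xml-19980210", "1998-02-10",
               "http://www.w3.org/TR/2000/REX-xml-20000814"))
```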

The actual recommendation might be as short as:

--------------------------------snip-
For a resource referred to by

	[4] http://domain/path/filename

the RDF description (declared by the resource) will be found at

	[5] http://domain/path/filename/meta.rdf

(or, equivalently, something like [5'] http://domain/path/meta/filename,
which allows metadata documents to be grouped together on current HTTP
servers)

where meta.rdf is the RDF-form metadata manifested by [4].

Note: This is a general-purpose mechanism; [5] might not actually contain
any statements whose subject is [4].
--------------------------------snip-
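The rule in the snippet is easy to apply mechanically. Here is a minimal sketch for http URLs covering both the [5] and [5'] forms; the function name is hypothetical, and dropping query strings and fragments is an assumption of this sketch (note #8 already admits that not all URLs are handled).

```python
from urllib.parse import urlsplit, urlunsplit


def meta_uri(uri: str, grouped: bool = False) -> str:
    """Map a resource URI to its metadata URI under the proposed convention.

    grouped=False -> [5]  http://domain/path/filename/meta.rdf
    grouped=True  -> [5'] http://domain/path/meta/filename
    Query strings and fragments are dropped (an assumption of this sketch).
    """
    parts = urlsplit(uri)
    path = parts.path.rstrip("/")
    if grouped:
        head, _, filename = path.rpartition("/")
        path = f"{head}/meta/{filename}"
    else:
        path = f"{path}/meta.rdf"
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


print(meta_uri("http://domain/path/filename"))                # [5]
print(meta_uri("http://domain/path/filename", grouped=True))  # [5']
```

On a plain static server that maps URL paths onto files, the [5] form would require `filename` to be both a file and a directory, which is presumably why [5'] is mentioned as the form that works with current HTTP servers.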

Notes:
#1 I'm actually arguing for retrieving a clean RDF/XML file of resource
descriptions (metadata).
#2 For URIs with other URI schemes it might be a bit more complicated
(perhaps URLs derived from the "domain names" etc.).
#3 The definition is recursive (to enable meta [meta...] data).
#4 I'm really proposing only a _naming_convention_ for simple access to
RDF data (an SW-aware HTTP server [or whatever] might "actually" dig this
data out of the documents by itself).
#5 With URLs this would work with existing HTTP servers serving plain RDF
files.
#6 By doing this, W3C would actually propose a practical mechanism, a
concrete foundation for a multitude of potential SW applications.
#7 This is something small developers could do with a plain text editor
(or, thanks to decentralisation, let their webmasters do for themselves).
#8 Yes; I'm aware that the above example recommendation is not sufficient
as such and doesn't even take all URLs properly into account (if this
idea gets support, I'll propose a more concrete one ;).
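Note #3 can be illustrated by applying rule [5] repeatedly: the metadata document is itself a resource, so it has its own metadata, and so on. A minimal sketch (the function name is hypothetical):

```python
from typing import List


def meta_chain(uri: str, depth: int) -> List[str]:
    """Apply rule [5] `depth` times: metadata, meta-metadata, ..."""
    chain = []
    for _ in range(depth):
        uri = uri.rstrip("/") + "/meta.rdf"
        chain.append(uri)
    return chain


for level in meta_chain("http://domain/path/filename", 2):
    print(level)
```

Because each level is just another URI, nothing special is needed to describe (or sign) the metadata itself.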

The actual syntax of all this is again, of course, beside my point.
Personally I'd prefer a simple RDF file (XML serialisation) containing
arbitrary RDF statements (and for W3C to recommend a few predicates for
version metadata etc.). In practice, however, I believe that semantics
for declaring "external subsets" (URIs) of meta.rdf are needed, so that
the metadata can easily be maintained and digitally signed in cascading
parts (and so that access points for, e.g., organisation-wide metadata
can be provided).
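The "external subsets" idea amounts to a meta.rdf that includes other metadata documents by URI. A minimal sketch, assuming a hypothetical in-memory form where each document has its own statements (as subject/predicate/object triples) plus a list of included URIs; each part stays separate, so it could be maintained and signed on its own.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
# Hypothetical stand-in for a meta.rdf document:
# {"statements": [Triple, ...], "includes": [URI, ...]}
MetaDoc = Dict[str, list]


def collect(uri: str, docs: Dict[str, MetaDoc],
            seen: Optional[Set[str]] = None) -> List[Triple]:
    """Merge a meta document with all external subsets it (transitively) includes."""
    if seen is None:
        seen = set()
    if uri in seen:  # guard against include cycles
        return []
    seen.add(uri)
    doc = docs[uri]
    triples = list(doc.get("statements", []))
    for sub in doc.get("includes", []):
        triples.extend(collect(sub, docs, seen))
    return triples


docs = {
    "http://domain/path/filename/meta.rdf": {
        "statements": [("http://domain/path/filename", "dc:title", "Example")],
        "includes": ["http://domain/meta/org.rdf"],  # organisation-wide part
    },
    "http://domain/meta/org.rdf": {
        "statements": [("http://domain/", "dc:publisher", "Example Org")],
        "includes": ["http://domain/path/filename/meta.rdf"],  # a cycle
    },
}
print(collect("http://domain/path/filename/meta.rdf", docs))
```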

Once again, there's little new here: RDF statements can in principle be
embedded in almost arbitrary XML data (and parsed likewise), but I
believe the above would simplify things considerably in practice (_the_
place to look for descriptions) and thus support the application of
metadata in general.
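For comparison, this is roughly what a client (or the SW-aware server of note #4) has to do today when RDF is embedded in other XML: parse the whole document and search it. A minimal sketch using the standard library; the sample document and its URIs are illustrative only.

```python
import xml.etree.ElementTree as ET
from typing import List

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def embedded_rdf(xml_text: str) -> List[ET.Element]:
    """Return every rdf:RDF element embedded anywhere in an XML document."""
    root = ET.fromstring(xml_text)
    return list(root.iter(f"{{{RDF_NS}}}RDF"))


sample = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      xmlns:dc="http://purl.org/dc/elements/1.1/">
  <head>
    <title>Example</title>
    <rdf:RDF>
      <rdf:Description rdf:about="http://domain/path/filename">
        <dc:title>Example</dc:title>
      </rdf:Description>
    </rdf:RDF>
  </head>
  <body/>
</html>"""

print(len(embedded_rdf(sample)))
```

A fixed meta URI avoids this per-format digging on the client side: the RDF/XML is fetched directly from _the_ place.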

I'd love to see W3C provide an RDF vocabulary (fixed predicate
semantics) repository for making statements of various kinds, thus
eliminating the need to define a number of markup languages, but that's
another story.

Thanks,

--Ossi



Ossi Nykänen                              Tel   +358 3 3115 3544
Tampere University of Technology          Fax   +358 3 3115 3549
DMI / W3C Finnish Office                  Email ossi@w3.org
P.O.Box 553, FIN-33101 Tampere, Finland   Web   www.w3c.tut.fi

Received on Monday, 25 November 2002 09:21:30 UTC