What would be cool... (CONNEG vs. discovery)

I have to admit that I haven't read all the CONNEG documents or lists.  I
started to, at one point, as part of my work on XML MIME types, but the
approach didn't look much like what I wanted to work with, and it felt
like it was carrying a tremendous load of baggage I didn't share.

Tim Bray's message on packaging and its inadequacies (packaging is an area
that has interested me for a long while), coupled with its 'blue-sky' call,
really has me thinking, even on a sleepy Sunday morning.

The infrastructure I've always envisioned for packaging has been pretty
simple, involving basic document retrieval from a single source, maybe
augmented by caching of some sort.  It makes the most of what exists,
doesn't require a whole lot of extra work, and gets at least part of the
job done.

It has sizable limitations, however, as will any (I think) point-to-point
mechanism, like SYSTEM identifiers.  Packages can only describe so much,
for particular contexts, and may not be a generalizable enough means of
describing information atomically, especially when people start mixing
vocabularies the way they want to, without concern for what some designer
thought long ago in a place far away.

What I think would be cool, building on some of the ideas Tim mentioned in
his earlier messages, is a distributed system for discovering information
about document structures, using namespaces to help the identification
process.  This doesn't need to make any claims that a namespace is a thing
in itself, or that there is 'one true' description of a namespace lurking
in a schema or an RDF document.  It just provides the refinement needed to
ask intelligent questions about particular document structures.

That information could be generic to a namespace - 'everything MathML can
be processed by X, and uses this schema' - or particular to an element -
'this bit of MathML could be represented this way, whatever the context'.
Applications processing documents only using one namespace might be able to
get away with a single request for information about the root element of a
document, while applications processing complex documents might make
multiple requests for information.
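
To make the one-request versus many-requests distinction concrete, here is
a minimal Python sketch.  It is only an illustration under assumptions: the
function names (split_tag, discovery_requests) are hypothetical, and the
rule of "one request per namespace, asked about the first element seen in
it" is just one plausible policy, not part of any existing system.

```python
import xml.etree.ElementTree as ET


def split_tag(tag):
    """Split ElementTree's '{namespace}local' form into (namespace, local)."""
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    return "", tag


def discovery_requests(xml_text):
    """Return the (namespace, element) pairs a processor might ask about.

    Hypothetical policy: one request for the root element, plus one more
    for the first element encountered in each additional namespace.
    """
    root = ET.fromstring(xml_text)
    requests = []
    seen = set()
    for elem in root.iter():  # document order, root first
        namespace, local = split_tag(elem.tag)
        if namespace not in seen:
            seen.add(namespace)
            requests.append((namespace, local))
    return requests


MATHML = "http://www.w3.org/1998/Math/MathML"
XHTML = "http://www.w3.org/1999/xhtml"

simple_doc = '<math xmlns="%s"><mi>x</mi></math>' % MATHML
mixed_doc = (
    '<html xmlns="%s"><body>'
    '<math xmlns="%s"><mi>x</mi></math>'
    '</body></html>' % (XHTML, MATHML)
)

print(discovery_requests(simple_doc))  # a single request, for the root
print(discovery_requests(mixed_doc))   # one request per namespace
```

A document in a single namespace yields one request (for its root), while
the mixed XHTML-plus-MathML document yields two.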

Providing that information will be a complex task, likely more difficult
than returning a canonical schema and/or documentation.  There may be
multiple descriptions, useful in different contexts, provided by different
authorities.  There may be different kinds of information, from stylesheets
(CSS, XSLT, more) to code (Java, ActiveX, plug-ins) to schemas to
documentation.  Different kinds of applications may need different parts
of the information, as editors typically want to know different things than
readers.
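
The shape of such answers can be sketched in Python.  Everything here is
an assumption made for illustration: the Description record, its field
names, the find function, and the example.* authorities and locations are
all hypothetical.  The point is only that a query may return several
descriptions from several authorities, filtered by kind and by the kind of
application asking, with element-specific answers listed ahead of
namespace-wide ones.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Description:
    namespace: str
    element: Optional[str]  # None means "applies to the whole namespace"
    kind: str               # e.g. "schema", "stylesheet", "documentation"
    audience: str           # e.g. "editor", "reader", or "any"
    authority: str          # who is offering this description
    location: str           # where the resource can be retrieved


def find(records, namespace, element=None, kind=None, audience=None):
    """Return matching descriptions, element-specific ones first."""
    matches = []
    for r in records:
        if r.namespace != namespace:
            continue
        if r.element is not None and r.element != element:
            continue
        if kind is not None and r.kind != kind:
            continue
        if audience is not None and r.audience not in (audience, "any"):
            continue
        matches.append(r)
    # sorted() is stable, so ties keep the order the records were offered in.
    return sorted(matches, key=lambda r: r.element is None)


MATHML = "http://www.w3.org/1998/Math/MathML"

# Hypothetical records from three unrelated authorities.
RECORDS = [
    Description(MATHML, None, "schema", "any",
                "authority-a.example", "http://authority-a.example/mathml.dtd"),
    Description(MATHML, None, "stylesheet", "reader",
                "authority-b.example", "http://authority-b.example/mathml.xsl"),
    Description(MATHML, "mfrac", "stylesheet", "reader",
                "authority-c.example", "http://authority-c.example/mfrac.css"),
    Description(MATHML, None, "documentation", "editor",
                "authority-a.example", "http://authority-a.example/authoring.html"),
]

# A reader rendering an mfrac element asks for stylesheets.
for d in find(RECORDS, MATHML, element="mfrac", kind="stylesheet",
              audience="reader"):
    print(d.authority, d.location)

# An editor asks for anything known about the namespace as a whole.
for d in find(RECORDS, MATHML, audience="editor"):
    print(d.kind, d.location)
```

Nothing in the sketch picks a winner among authorities; that choice is left
to the application making the request.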

Building a distributed system to support these kinds of requests will be
much more difficult than (and doesn't replace) content-negotiation on a
request-by-request basis.  It seems, to me at least, like a much more
powerful approach, one that could effectively distribute software as well
as information, and maybe finally get us out of this
one-type-of-information-requires-one-type-of-application mindset that's
kept us locked in the browser window.

It's sort of like the Semantic Web, I think, and can use a lot of the same
infrastructure, but it avoids the URIs-are-system-identifiers shortcut, as
well as any claims about single sources of authoritative information.  I'll
leave it up to the keepers of the Semantic Web to determine how compatible
or incompatible this proposal is.

Tim Bray asked for blue-sky, and I guess this is it.  It may just be too
early on a Sunday morning, and there may be those wishing I'd gone to
church instead.  If anyone's interested, I'd love to see where such a
system might lead, though it would probably be more appropriate to develop
such thoughts on a different list.

Simon St.Laurent
XML Elements of Style / XML: A Primer, 2nd Ed.
Building XML Applications
Inside XML DTDs: Scientific and Technical
Cookies / Sharing Bandwidth
http://www.simonstl.com

Received on Sunday, 11 June 2000 11:20:44 UTC