Re: Attributes: Warwick Framework from Ron Daniel Jr. on 1997-02-14 (w3c-dist-auth@w3.org from January to March 1997)

From: Ron Daniel Jr. <rdaniel@lanl.gov>
Date: Fri, 14 Feb 1997 10:45:29 -0700
To: Jim Whitehead <ejw@ics.uci.edu>, w3c-dist-auth@w3.org
Message-Id: <3.0.32.19970214104524.0095e100@acl.lanl.gov>
At 03:38 PM 2/13/97 -0800, Jim Whitehead wrote:
[and I said]
>>It doesn't talk about why people
>>might want to break metainformation into packages, but if someone
>>really wants me to say why that is (IMHO) vital, press my hot button.
>
>Press.  This issue is very relevant to WEBDAV right now.

OK.  Sorry for the length of the response, but I did say it was a hot
button issue. But let me answer your last concern first since I think
we may have a misunderstanding:

>Packaging also seems to have the drawback of
>requiring a more complex metadata data model than links and resources due
>to the complexity of the package data structure.

Packages *are* resources. A package has a media type, can have a URI,
etc. I think you are the one who cited Tim's page on metadata principles
where he said that "metadata is data". That is exactly what I am
arguing for.

Clients and servers will need
to exchange metadata for different purposes. Sometimes we need to
create or update a bibliographic description of a web page. Sometimes
we need to view the revision history or send commands to the version control
system. Specialized document types will have specialized descriptions.
Over time, user sites will impose new requirements (e.g. provenance
tracking). 
In my considered technical opinion, these different purposes
are better met by having purpose-specific media types defined, and
allowing sets of such purpose-specific "packages" to be sent back and
forth inside MIME multipart containers. (Later in the message I will
provide my reasoning for this statement).

The application/relations stuff is not a package container format. It
is one particular package - one that talks about the relations between
other packages. We can say things like:

  (is-bibliographic-description-of  uri1 uri2)
  (is-digital-signature-for         uri1 uri3)
  (is-revision-history-of           uri1 uri4)

where uri1 identifies the target resource and the other URIs identify
the packages that contain particular forms of metadata.

application/relations gets more complex because it does not restrict
relations to being binary. What the group might do is define a simple
format (perhaps only allowing binary relations) that all WEB-DAV
compliant systems must support, but allow other formats to be used
between consenting parties. Similarly, the group might define one common
set of instructions for checking resources in and out of a version control
system, but allow other formats for that information so that proprietary
capabilities can be exploited.

=============
OK, the preceding was to try and be more clear on what I mean by "packages"
and provide a bit of motivation for why I think they are important. The
rest of this message is to say more on why I think they are important,
and it may help clarify exactly what I mean by packages.


First, I assert that it is impossible in theory, let alone practice, to
define the one true all-encompasing metadata set. Anything that purported
to be comprehensive would be obsolete the instant it was declared
complete, and would be FAR too large to be useable. Then of course there
is the practical difficulty in getting all communities to coordinate
their naming practices.

What happens now is that particular communities who wish to share
their resources define a community specific metadata standard, and
they expend great effort to come up with standards that work well
for the needs of their community. Museums
have a standard, newspaper photo editors and archivists have a standard,
the Geospatial communities have a standard, libraries have a standard, etc.
(Actually, all these communities have several standards, but I digress).

When resources are
created in a community for the use of others in the community, they
are most naturally, AND USEFULLY, described using the community's
existing metadata standard. These standards will typically define a
single syntax, certain required elements, and the order of appearance
of particular elements. All of this is easily handled by allowing the
description to be conveyed as a legal instance of the exchange format
for whatever standard is being followed, and using a media type to let
receiving applications know what they are receiving. This is a "package".

Preventing communities from using their already-established metadata
standards is unacceptable. Trying to define mappings between such
standards and the notion of independent, textual, name:value pairs
is  ...(pause) difficult. :-) Furthemore, its not necessary. Send those
community-specific and purpose-specific "packages" around as typed
MIME entities and let helper applications deal with them.

Third, I think that the "ease of use" some people see in having
attributes as stand-alone elements is illusory. Consider the attribute
"resolution". From dealing with images in web pages, we might agree that
"resolution" means the width x height of an image in pixels.
Satellite observation folks think resolution is the size of the smallest
resolvable feature, measured in meters (or fractions thereof).
Legislative folks think that
a "resolution" attribute should carry an identifier for a particular
legislative motion. Mediators think that "resolution" is a big text
field containing a summary of the outcome of a conflict. etc. These
interpretations don't even share a data type, never mind a meaning.

The only way I know to disambiguate such uses of "resolution" is
to augment the attribute name with the name of the schema from which
it is drawn and which defines its meaning. You can either do that
in-line for every element, or you can do it once for all the elements
that are drawn from the same schema. The latter offers a considerable
savings in terms of redundant information - especially once you start
allowing attribute names and values to have language and characterset
specifications. But doing it once typically requires grouping the
elements into clumps that are 1/2 back to packages.

Fourth, if I don't know the schema for an attribute, all I can reasonably
do with it is throw
it out. If you put them into packages with media types, its easy to
tell if you know the format or not and you can drop it on the floor
if you dont. You can put different formats with the same general
purpose into multipart/alternative. (e.g. Dublin Core, SOIF, USMARC)


>As near as I can tell, by packaging you sacrifice some ability to perform
>content negotiation (e.g., the case where the natural language used for the
>value of attributes varies within a schema -- if a book has a title which
>is available in Russian/Cyrillic and Russian/Roman, and my browser can only
>display roman character sets, while the rest of the information on the book
>is in English).

First, content negotiation can happen at the package level. A library
catalog system could emit its records as MARC, or could use that
info to generate a Dublin Core or SOIF or GILS or ... description,
whatever the client has asked for. Or I could return them all to
the client in a multipart/alternative wrapper and let the client
figure out what to do.

Second, the particular example you cite can be handled if the package
format is defined in a capable enough manner. We might
have something like
 --#####
  Content-type: application/foo; charset=ISO10646

  <title lang="Russian/Cryillic">.....</>
  <title lang="Russian/Roman">...</>
  etc.
 --####

Not all packages will be defined in such a way that they can support this,
but I contend that they don't all need to. Some packages will carry very
simple information. Let them stay simple. Use multipart/alternative to
allow simple and complex metadata packages for the same general purpose
to be carried around.

>However, packaging does have some accounting advantages
>(it's easier to remove a package of metadata when the resource it describes
>has been removed rather than a slew of individual metadata resources) and
>packaged metadata may be easier to search by attribute value than
>individual resource metadata.


What I percieve as the major advantages for packages are:
  You can use existing metadata formats as-is. This is likely to
     turn up in a number of places in WEB-DAV. For example, I might
     want to show the revision history of a document. That can be
     sent as application/cvs-revison-report or some such thing.
  You can provide a simple, common metadata package to allow
     some level of interoperation between communities. (Something
     like the Dublin core or SOIF)
  Specialized communities can use their already-standardized metadata
     formats. This is very important. *very*
  If the user needs to edit the metadata package, media types and
     helper applications provide the means for doing it in a manner
     that is eminently extensible.
  

Ron Daniel Jr.              voice:+1 505 665 0597
Advanced Computing Lab        fax:+1 505 665 4939
MS B287                     email:rdaniel@lanl.gov
Los Alamos National Lab      http://www.acl.lanl.gov/~rdaniel
Los Alamos, NM, USA, 87545
Received on Friday, 14 February 1997 12:46:07 UTC