Re: Structured Resources

At 5:54 PM 3/17/97, Yaron Goland wrote:
>Documents have structure and it would seem a good thing for DAV to
>expose this structure and make it available for manipulation. As such I
>propose a new Method, STRUCTURE. When executed on a resource this method
>will return a description of the structure of the document.
>
>I recommend that the structure of a resource be expressed as a list of
>URIs, some relative, some not, along with associated meta-data. The
>STRUCTURE method returns this list.

I went back and re-read this proposal.  After reading it, I'm still in need
of clarification.

To start with, the initial justification for this feature is stated as, "it
would be a good thing for DAV to expose {document} structure."  Perhaps it
would be easier to understand this proposal if we knew which requirements
were addressed by it.  From a careful reading of this post and the follow-on
posts, it appears that this proposal is intended to address:
  - partial resource updates (partial writes)
  - partial resource locks
  - listing a container

Is this the complete list of requirements (or functionality) addressed by
this proposal?

It is not clear to me that this structure proposal completely addresses
these requirements, nor is it clear that this is the best possible means of
addressing the requirements (what are the tradeoffs we are making by
adopting this approach?).  It is also very unclear how this proposal
actually works (there are a lot of devils in the details).

For example, let's examine the case where a client wants to perform a
partial resource lock of section 1 of a document which contains five major
sections.  The client does a STRUCTURE call and receives back a list of 15
URLs (or maybe only 3 URLs -- the important thing isn't the exact number,
only that more or fewer URLs are returned than there are sections in the
document).  This might occur if the server doesn't perfectly understand the
structure of the document (this happened to me just the other day -- the
Acrobat Distiller mistook an author name as a major section of a FrameMaker
paper), or if there are ambiguities in how to expose the structure of a
document (e.g., if there are H1 tags in an HTML document, don't expose H3
or deeper headings, but if H3 is the top-level heading in use, should H3
headings be exposed?).
The general question is: how does the client map the output of a STRUCTURE
method to sections in a document?  Proprietary metadata tags?  This also
applies to partial puts.
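
As a rough illustration of this mismatch, the following Python sketch
assumes a hypothetical server that derives structure from HTML headings.
The structure() function, its max_level exposure policy, and the sample
document are all assumptions made for illustration; nothing in the proposal
specifies them.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collects (level, text) for every h1..h6 element it sees."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._level = None   # level of the heading currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self._level = int(tag[1])
            self._text = []

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._level is not None and tag == f"h{self._level}":
            self.headings.append((self._level, "".join(self._text).strip()))
            self._level = None


def structure(html, max_level):
    """Hypothetical server policy: expose headings down to max_level."""
    collector = HeadingCollector()
    collector.feed(html)
    return [text for level, text in collector.headings if level <= max_level]


# The author thinks of this document as having two sections, but the
# author name was marked up as a heading too.
DOC = """
<h1>A. N. Author</h1>
<h1>Introduction</h1>
<h2>Background</h2>
<h1>Requirements</h1>
"""

shallow = structure(DOC, max_level=1)
deep = structure(DOC, max_level=2)
```

Neither policy yields the two sections the author has in mind, and the
client cannot tell from the list alone which entry is "section 1".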

To make structure interoperable, you'll need to define descriptive tags for
each URI that is returned, which may vary by media type.  This appears to
lead to a discussion about a Dublin Core-like set of tags (here there be
dragons?) for describing the elements of a document's structure.

Judith Slein had an insightful reply to this proposal (as she usually
does), which has gone unanswered to date.  She stated that useful
operations on structured documents are:

>Insert new content into the document at a certain position in the structure
>(for example, insert 5 new pages after page 10)
>Delete content from the document
>Move content from one location to another in the structure
>Copy content from one location to another in the structure

Is this capability now facilitated because the structure is exposed?  The
structure proposal makes these questions relevant, whereas with other
proposals they were not.  To be truthful, our requirements document doesn't
have any requirements for providing operations to copy/move sections of a
document -- but does this mean we should now reexamine the requirements and
add extra functionality? (I say no).

As for listing a container, structure would work, but it does seem somewhat
convoluted.  When listing a directory-like object, there are requirements
on what metadata should be returned for each entry (media type, last
modified date, creation date, and potentially owner and entity tag).  While
STRUCTURE might return this metadata, the proposal places no requirement on
what metadata should be returned.

>One method for adding to the structure of a document is to PUT a new
>resource, where the request-URI has the same base as the structure
>resource. Thus if the structured resource is http://foo then
>http://foo/bar specifies a member of foo's structure.

This appears to have the drawback of occasional name space collisions:
http://foo/bar could already name an ordinary resource that is not part of
foo's structure.


In general, the structure proposal puts the burden of understanding
document structure on the server and then adds an extra burden to the
client to understand the structure the server returns (e.g., a client asks
for STRUCTURE of the resource at URL U, the server returns the structure in
the body of the STRUCTURE response, and the client then needs to interpret
it and correlate it with its own understanding of the document structure).
However,
wouldn't it just be easier for the client to give the server what it thinks
the name of the structural element should be?  So if a client wants to lock
section 1, it should simply tell the server to lock section 1, rather than
do a STRUCTURE, followed by an interpretation, followed by a lock of the
URI that the client thinks corresponds to section 1.  If the server
understands the structure, it will have enough information to process this
request.  No, I wasn't asleep during the partial resource lock discussion,
but I don't buy that the only solution to the partial resource naming
problem is something like structure.
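
A minimal sketch of the "just name the section" approach, in Python.  The
SectionLockServer class, its lock() method, and the section and owner names
are hypothetical and only illustrate the shape of the interaction; they are
not drawn from any DAV draft.

```python
class SectionLockServer:
    """Hypothetical server that already knows its document's section names."""

    def __init__(self, section_names):
        self.section_names = set(section_names)
        self.locks = {}  # section name -> lock owner

    def lock(self, section, owner):
        """Lock a section by name.

        Returns True if the owner holds the lock afterwards, False if
        another owner already holds it.  Unknown names are an error.
        """
        if section not in self.section_names:
            raise KeyError(f"unknown section: {section}")
        holder = self.locks.setdefault(section, owner)
        return holder == owner


server = SectionLockServer(["section-1", "section-2", "section-3"])
first = server.lock("section-1", "alice")   # one request, no STRUCTURE call
second = server.lock("section-1", "bob")    # refused: alice holds the lock
```

The client never has to interpret a server-generated list; if the server
does not recognize the name, the request simply fails.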

Similarly for partial writes, a client should simply submit a
media-type-specific description of the change, which the server will apply
to the resource.  An example of this is the VTML proposal, which would be
one (of many) media types which could usefully describe an update to an
HTML file.  The output of Unix diff is another.
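
To illustrate the diff case, here is a small Python sketch that uses the
standard difflib module to produce a unified diff a client could submit in
place of the whole resource.  The describe_change() helper, the sample HTML
lines, and the resource name are assumptions for illustration; how the
server would apply the diff is left out.

```python
import difflib


def describe_change(old_lines, new_lines, name):
    """Return a unified diff describing how old_lines became new_lines."""
    diff = difflib.unified_diff(old_lines, new_lines, fromfile=name, tofile=name)
    return "".join(diff)


old = [
    "<h1>Section 1</h1>\n",
    "<p>Old text.</p>\n",
    "<h1>Section 2</h1>\n",
    "<p>Unchanged.</p>\n",
]
new = [
    "<h1>Section 1</h1>\n",
    "<p>New text.</p>\n",
    "<h1>Section 2</h1>\n",
    "<p>Unchanged.</p>\n",
]

patch = describe_change(old, new, "paper.html")
print(patch)
```

The diff carries its own notion of where the change goes (line context), so
no separately exposed structure is needed to address the changed region.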

As for containers -- all we really need is MKDIR, RMDIR, and BROWSE, plus
container-specific semantics for other methods.  Structure just doesn't
seem like a good fit, since it stretches the notion of document "structure"
to cover the elements of a container, which is really just a flat list or
tree.

>        I look forward to the group's comments,

These are my comments.  :-)

- Jim

Received on Tuesday, 25 March 1997 21:39:12 UTC