RE: Structured Resources from Yaron Goland on 1997-03-26 (w3c-dist-auth@w3.org from January to March 1997)

From: Yaron Goland <yarong@microsoft.com>
Date: Wed, 26 Mar 1997 12:16:04 -0800
To: "'Jim Whitehead'" <ejw@ics.uci.edu>, w3c-dist-auth@w3.org
Message-ID: <11352BDEEB92CF119F3F00805F14F485026B72D8@RED-44-MSG.dns.microsoft.com>
The goal in proposing structure is very simple - Resources have
structure and I felt it would be a good thing to provide a standard
mechanism for gaining access to that structure. Especially as in so
doing we are able to meet many of the DAV requirements.

My goal for proposing structure within DAV is to provide the framework
that others can later come and build upon. For example, we define a LINK
and UNLINK mechanism but we define few links. Our goal has been to
provide the link framework so that others, such as Dublin core, can come
and build upon it. My goal with structure is the same.

You then point out that there are all sorts of little details in using
structure which you are not sure are adequately addressed. Unfortunately
there is nothing for me to directly respond to as you do not point out
any problems. The best I can do is continue to produce position papers
which demonstrate how structures behave. I will be coming up with an
additional paper describing structure's interaction with the HTTP 1.1
methods, including PUT.

The rest of your post basically addresses the problems inherent in
returning structure. Specifically - How does the server know what the
structure looks like? How does the client know what structure they are
seeing?

In my opinion the answer to both questions is - Standards.

As you point out, it is difficult to determine the structure of an HTML
file. There are often, fairly arbitrary, decisions to be made in how one
would expose such a structure. My own opinion is that the proper
solution to this problem is the introduction of a standard for how one
exposes the structure of an HTML or any other content type. The standard
would define the meta-data format to be used in the structure response
to describe what the structure is. The standard would also provide for
the "proper" way to expose structure, either defining what choices must
be made or providing meta-data to expose to the client what choice the
server made.

I am further proposing that DAV contribute two of these standards. One
standard for history and another standard for basic directories. But,
just as we are depending on groups like Dublin to fill out the meta-data
types, so we will be depending upon groups in the IETF to provide
standards for other content types.

As for the issue of placing burden on the server, that is the server's
choice. The server is not required to expose a structure. However, the
case has continually been made that it is often better to place a burden
on the server than on the wire. For example, a history is not required
for a version tree. One can always just walk the versioning links. Thus
one can argue that we are burdening the server because it has to provide
redundant information. However the bandwidth savings of not having to
climb the versioning tree are substantial enough that servers are
willing to shoulder the burden of providing a history. The same logic
applies to directories. Once can always do an exhaustive search of the
name space. However this is wildly inefficient so the server shoulders
the burden of providing a directory listing. The same logic applies to
large compound documents, such as OLE files. The client can always
download the entire file in order to find the particular istream or
istorage they are interested in. However this is extremely wasteful of
bandwidth so servers are willing to provide a manifest for the file,
even though this is redundant information. In the end, however, the
choice is the server's. It may decide that it does not wish to provide a
structure for an HTML file because it believes it is too much effort. In
that case, the client will need to download the entire file. As such, my
proposal requires nothing of the server that the server does not choose
to provide.

Also, to address the name space collision, that is what the overwrite
header is for. Either way, that sort of collision is inherent to any
structured namespace. Every time you copy a file into a directory you
face the same collision problem.

All in all I believe that STRUCTURE is the proper way to expose
functions such as MKDIR, RMDIR, and BROWSE within an object oriented and
extensible context. I believe that the power of HTTP has been the
abstract nature of its methods which allow themselves to be leveraged in
any number of ways. I believe that STRUCTURE provides us this base.
MKDIR can only be used to create a directory and even then the concept
gets confusing. For example, can I MKDIR on a compound file? STRUCTURE
does not suffer from these problems. By leveraging PUT we are able to
provide the functionality of stating "I am adding a resource to the
namespace." by leveraging DELETE, we are able to provide the
functionality of stating "I am removing a resource from the namespace."
As for BROWSE, I would argue that is the same as STRUCTURE. STRUCTURE
is, IMHO, a cooler name. =)

		Yaron

> -----Original Message-----
> From:	Jim Whitehead [SMTP:ejw@ics.uci.edu]
> Sent:	Tuesday, March 25, 1997 6:23 PM
> To:	Yaron Goland; w3c-dist-auth@w3.org
> Subject:	Re: Structured Resources
> 
> At 5:54 PM 3/17/97, Yaron Goland wrote:
> >Documents have structure and it would seem a good thing for DAV to
> >expose this structure and make it available for manipulation. As such
> I
> >propose a new Method, STRUCTURE. When executed on a resource this
> method
> >will return a description of the structure of the document.
> >
> >I recommend that the structure of a resource be expressed as a list
> of
> >URIs, some relative, some not, along with associated meta-data. The
> >STRUCTURE method returns this list.
> 
> I went back and re-read this proposal.  After reading it, I'm still in
> need
> of clarification.
> 
> To start with, the initial justification for this feature is stated
> as, "it
> would be a good thing for DAV to expose {document} structure."
> Perhaps it
> would be easier to understand this proposal if we knew which
> requirements
> were addressed by it.  By reading this post, and follow-on posts
> carefully,
> it appears that this proposal is intended to address:
>   - partial resource updates (partial writes)
>   - partial resource locks
>   - listing a container
> 
> Is this the complete list of requirements (or functionality) addressed
> by
> this proposal?
> 
> It is not clear to me that this structure proposal completely
> addresses
> these requirements, nor is it clear that this is the best possible
> means of
> addressing the requirements (what are the tradeoffs we are making by
> adopting this approach?).  It is also very unclear how this proposal
> actually works (there are a lot of devils in the details).
> 
> For example, lets examine the case where a client wants to perform a
> partial resource lock of section 1 of a document which contains five
> major
> sections.  The client does a STRUCTURE call and receives back a list
> of 15
> URLs (or maybe only 3 URLs -- the important thing isn't the exact
> number,
> only that there are either more or less URLs returned than sections in
> the
> document).  This might occur if the server doesn't perfectly
> understand the
> structure of the document (this happened to me just the other day --
> the
> Acrobat Distiller mistook an author name as a major section of a
> FrameMaker
> paper), or if there are ambiguities in how to expose the structure of
> a
> document (e.g., if there are H1 tags in an HTML document, don't expose
> H3
> or higher, but if H3 is the lowest heading, then should they be
> exposed?).
> The general question is: how does the client map the output of a
> STRUCTURE
> method to sections in a document?  Proprietary metadata tags?  This
> also
> applies to partial puts.
> 
> To make structure interoperable, you'll need to define descriptive
> tags for
> each URI that is returned, which may vary by media type.  This appears
> to
> lead to a discussion about a Dublin Core like set of tags (here there
> be
> dragons?) for describing the elements of a document's structure.
> 
> Judith Slein had an insightful reply to this proposal (as she usually
> does), stating that (without any reply to date) useful operations on
> structured documents are:
> 
> >Insert new content into the document at a certain position in the
> structure
> >(for example, insert 5 new pages after page 10)
> >Delete content from the document
> >Move content from one location to another in the structure
> >Copy content from one location to another in the structure
> 
> Is this capability now facilitated because the structure is exposed?
> The
> structure proposal now makes these questions relevant, whereas with
> other
> proposals these questions were not.  To be truthful, our requirements
> document doesn't have any requirements for providing operations to
> copy/move sections of a document -- but does this mean we shoduld now
> reexamine the requirements and add extra functionality? (I say no).
> 
> As for listing a container, structure would work, but it does seem
> somewhat
> convoluted.  When listing a directory-like-object, there are
> requirements
> on what metadata should be returned for each entry (media type, last
> modified date, creation date, and potentially owner and entity tag).
> While
> this metadata might be returned, there is no requirement for what
> metadata
> should be returned.
> 
> >One method for adding to the structure of a document is to PUT a new
> >resource, where the request-URI has the same base as the structure
> >resource. Thus if the structured resource is http://foo then
> >http://foo/bar specifies a member of foo's structure.
> 
> This appears to have the drawback of the occasional name space
> collision.
> 
> 
> In general, the structure proposal puts the burden of understanding
> document structure on the server and then adds an extra burden to the
> client to understand the structure the server returns (e.g., a client
> asks
> for STRUCTURE of resource at URL U, which the server returns in the
> body of
> the STRUCTURE response, which the client then needs to interpret and
> correlate with its understanding of the document structure.)  However,
> wouldn't it just be easier for the client to give the server what it
> thinks
> the name of the structural element should be?  So if a client wants to
> lock
> section 1, it should simply tell the server to lock section 1, rather
> than
> do a STRUCTURE, followed by an interpretation, followed by a lock of
> the
> URI that the client thinks corresponds to section 1.  If the server
> understands the structure, it will have enough information to process
> this
> request.  No, I wasn't asleep during the partial resource lock
> discussion,
> but I don't buy that the only solution to the partial resource naming
> problem is something like structure.
> 
> Similarly for partial writes, a client should simply submit a
> media-type-specific description of the change, which the server will
> apply
> to the resource.  An example of this is the VTML proposal, which would
> be
> one (of many) media types which could usefully describe an update to
> an
> HTML file.  The output of Unix diff is another.
> 
> As for containers -- all we really need is MKDIR, RMDIR, and BROWSE,
> plus
> container specific semantics for other methods.  Structure just
> doesn't
> seem like a good fit, since it's stretching the notion of the
> "structure"
> of a container, which is really just a flat list or tree, to represent
> the
> elements of a container.
> 
> >        I look forward to the group's comments,
> 
> These are my comments.  :-)
> 
> - Jim
>
Received on Wednesday, 26 March 1997 15:16:03 UTC