Re: Byte ranges -- formal spec proposal from Ronald E. Daniel on 1995-05-17 (uri@w3.org from May 1995)

From: Ronald E. Daniel <rdaniel@acl.lanl.gov>
Date: Wed, 17 May 1995 16:59:38 -0600
To: http-wg@cuckoo.hpl.hp.com, luotonen@netscape.com, www-talk@w3.org
Cc: uri@bunyip.com
Message-Id: <199505172259.QAA24514@idaknow.acl.lanl.gov>
Thus spoke Ari Luotonen <luotonen@netscape.com> (at least on Wed, 17 May 1995)

>                         BYTE RANGES WITH URLS AND HTTP


We have been putting off the problem of fragment identifiers, and this
is a good start on the problem. I have a few reflex objections about
details - such as preferring 0-based addressing to 1-based - but they
are very minor. My major objection is that I would like to see
byterange addressing as one component in a more general fragment
identification architecture. The "Miscellaneous" section, quoted below,
mentions the possibility of combining different addressing schemes, but
does not provide any specification. I would be a LOT happier if we
could have an overall scheme that byterange, paragraph, row/col, word,
stanza, and other addressing schemes could fit into.

For example, I might want queries such as:

 Get the value of the <title> element in an HTML file
 http://host/path;generic-id="title"

 Get bytes 1-5 of the second paragraph of a file
 http://host/path;para=2&byterange=1-5

 Get a portion of a JPEG
 http://host/path.jpg;rows=37-99&cols=53-200
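As a minimal sketch of how such URLs might be taken apart, assume (as the quoted proposal suggests) that parameters follow the first ";" in the path and are separated by "&". The helper name parse_fragment_params is hypothetical, and the only quoting it handles is stripping the double quotes in generic-id="title".

```python
from urllib.parse import urlsplit

def parse_fragment_params(url):
    """Split 'http://host/path;a=1&b=2' into ('/path', [('a', '1'), ('b', '2')]).

    Hypothetical helper: assumes parameters start at the first ';' in the
    path and are separated by '&'. urlsplit (unlike urlparse) leaves the
    ';' part inside .path, so we split it off ourselves. Order is kept
    because it may matter if parameters are applied one after another.
    """
    path = urlsplit(url).path
    base, sep, params = path.partition(";")
    pairs = []
    if sep:
        for item in params.split("&"):
            name, _, value = item.partition("=")
            # Strip optional double quotes, e.g. generic-id="title".
            pairs.append((name, value.strip('"')))
    return base, pairs

print(parse_fragment_params("http://host/path;para=2&byterange=1-5"))
```

Keeping the pairs as an ordered list rather than a dict leaves room for either reading of how multiple parameters combine.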


When we start looking at the addressing needs of a variety of
specifiers (rows/cols, paragraphs, ...) then we may find that
we would prefer different choices of index base, inclusion or
exclusion of the elements at the extremes of the range, etc.
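To show how easily these conventions get mixed up, here is a sketch that maps a 1-based range inclusive at both ends (my reading of the byterange=1-5 examples under the proposal) and a 0-based inclusive range onto Python's 0-based, end-exclusive slices. Both helper names are hypothetical.

```python
def one_based_inclusive_to_slice(first, last):
    """Map a 1-based range 'first-last', both ends included, to a Python slice.

    Hypothetical helper: bytes 1-5 under this convention are data[0:5].
    """
    if first < 1 or last < first:
        raise ValueError(f"invalid range {first}-{last}")
    return slice(first - 1, last)

def zero_based_inclusive_to_slice(first, last):
    """Map a 0-based range 'first-last', both ends included, to a Python slice.

    Hypothetical helper: bytes 0-4 under this convention are data[0:5].
    """
    if first < 0 or last < first:
        raise ValueError(f"invalid range {first}-{last}")
    return slice(first, last + 1)

data = b"Hello, world"
print(data[one_based_inclusive_to_slice(1, 5)])   # b'Hello'
print(data[zero_based_inclusive_to_slice(0, 4)])  # b'Hello'
```

The same five bytes are "1-5" in one scheme and "0-4" in the other, so whatever base and endpoint rule we pick should be stated once and shared by every addressing scheme.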



> Miscellaneous
> 
>    There are other kinds of ranges that can be addressed in a similar
>    fashion; this document does not define them, but both the URL
>    parameter and the Range: header are defined so that it is possible to
>    extend them. This byte range specification applies to any
>    content-type. There may be range schemes that are meaningful to only
>    certain types of documents.
>    
>    As an example, there might be a linerange URL parameter, with the same
>    kind of range specification, and the Range: header would then specify
>    the numbers in lines. Example:
> 
>         http://host/dir/foo;linerange=21-30
> 
>    The response from a 123-line file would be:
> 
>         Range: lines 21-30/123
> 
>    This could be useful for such things as structured text files like
>    address lists or digests of mail and news, but isn't meaningful to
>    such document types as GIF or PDF.
>    
>    Other examples might be document format specific ranges, such as
>    chapters:
> 
>         http://host/dir/foo;chapterrange=1-3
> 
>         Range: chapters 1-3/12
> 
>    Or just the first chapter:
> 
>         http://host/dir/foo;chapterrange=1
> 
>         Range: chapters 1/12
> 
>   MULTIPLE URL PARAMETERS
>   
>    If at some point there will be multiple simultaneous URL parameters,
>    they should be separated by the ampersand character (just like
>    multiple values are encoded in the FORM request).
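The linerange example in the quote can be sketched as follows, assuming lines are numbered from 1 and both endpoints are inclusive (the quoted numbers suggest this, but the text does not say so). select_lines is a hypothetical name; the returned value mirrors the quoted "Range: lines 21-30/123" header.

```python
def select_lines(text, first, last):
    """Select lines first..last (1-based, both ends inclusive) from 'text'.

    Returns (selected_text, range_value), where range_value follows the
    quoted 'lines 21-30/123' format. Hypothetical helper: ranges that do
    not fit inside the file are simply rejected, because the quoted
    proposal does not say what should happen instead.
    """
    lines = text.splitlines(keepends=True)
    total = len(lines)
    if not 1 <= first <= last <= total:
        raise ValueError(f"line range {first}-{last} does not fit in {total} lines")
    selected = "".join(lines[first - 1:last])
    return selected, f"lines {first}-{last}/{total}"

# A 123-line file, as in the quoted example.
doc = "".join(f"line {n}\n" for n in range(1, 124))
body, range_value = select_lines(doc, 21, 30)
print("Range:", range_value)  # Range: lines 21-30/123
```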


We need to define more than just the syntax of how multiple
parameters will be separated. We need to define the semantics
of foo=n1-n2&bar=n3-n4. Does the "bar" parameter apply to the
result of the "foo" parameter? Vice versa? Or do we return the
two selections separately, the way you specify with foo=n1-n2,n3-n4?
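The readings differ in practice, not just in principle. In the sketch below both selections are byte ranges for simplicity (with "para" followed by "byterange" the same two readings apply), using an assumed 1-based inclusive convention. All function names are hypothetical.

```python
def byterange(data, first, last):
    """1-based, inclusive byte range (assumed convention).

    Note: Python slicing silently truncates a range that runs past the end.
    """
    return data[first - 1:last]

def apply_sequentially(data, selections):
    """Reading A: each selection applies to the result of the previous one."""
    for first, last in selections:
        data = byterange(data, first, last)
    return data

def apply_independently(data, selections):
    """Reading B: each selection applies to the whole resource and the
    parts come back separately, as with foo=n1-n2,n3-n4."""
    return [byterange(data, first, last) for first, last in selections]

data = b"abcdefghijklmnopqrstuvwxyz"
print(apply_sequentially(data, [(3, 10), (2, 4)]))   # b'def'
print(apply_independently(data, [(3, 10), (2, 4)]))  # [b'cdefghij', b'bcd']
print(apply_sequentially(data, [(2, 4), (3, 10)]))   # b'd'
```

Under reading A the order of the "&"-separated parameters changes the answer, so a spec adopting it would also have to say that order is significant.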

How are errors to be handled when we specify a range that is
longer than the file? What about when the starting offset of
the range is greater than the length of the file? 
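Here is a sketch of two possible answers, assuming a 1-based inclusive convention. Neither policy comes from the proposal; resolve_byterange and the policy names "strict" and "clamp" are hypothetical.

```python
from typing import Optional, Tuple

def resolve_byterange(first: int, last: int, length: int,
                      policy: str = "clamp") -> Optional[Tuple[int, int]]:
    """Resolve a 1-based inclusive byte range against a resource of 'length' bytes.

    Returns the (first, last) actually served, or None if nothing can be served.
    Hypothetical policies:
      "strict": any range reaching past the end is an error.
      "clamp":  trim 'last' to the end; only a start past the end is an error.
    """
    if policy not in ("strict", "clamp"):
        raise ValueError(f"unknown policy {policy!r}")
    if first < 1 or last < first:
        raise ValueError(f"malformed range {first}-{last}")
    if first > length:
        return None  # starting offset beyond the end: nothing to send
    if last > length:
        if policy == "strict":
            return None
        last = length  # "clamp": serve what exists
    return first, last

# A 100-byte resource:
print(resolve_byterange(90, 120, 100))            # (90, 100)
print(resolve_byterange(90, 120, 100, "strict"))  # None
print(resolve_byterange(150, 200, 100))           # None
```

Under either policy a start beyond the end yields nothing. The policies differ only for ranges that begin inside the file and run past it, and the spec should pick one.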

Byteranges are pretty nice since they are broadly applicable,
but I am not sure what it means to ask for a byterange of
a database. This problem is even more acute when we get into
parameters such as "paragraph", "row/col", "stanza", etc.
How are we to indicate when a parameter is inappropriate for
a URL, such as paragraph for an image? Usually row/col
will be inappropriate for HTML files, but if we have previously
selected a table then it is the natural way to get a table
element. How do we do that?
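One way to make "this parameter is inappropriate for this URL" checkable is a per-scheme table of content types, plus a record of which schemes an earlier selection enables. Everything below is hypothetical: the table entries, the context argument, and the function name. It only shows that the paragraph-for-an-image and row/col-inside-a-table cases can be expressed. It says nothing about how the error should be reported to the client.

```python
# Hypothetical applicability table; the entries are illustrative guesses.
APPLICABLE = {
    "byterange": None,                  # None means any content type
    "linerange": {"text/plain"},
    "para": {"text/plain", "text/html"},
    "rows": {"image/jpeg", "image/gif"},
    "cols": {"image/jpeg", "image/gif"},
}

# Schemes that become applicable once an earlier parameter has selected a
# particular kind of element, e.g. rows/cols inside an HTML table.
CONTEXT_ENABLES = {("text/html", "table"): {"rows", "cols"}}

def check_applicable(content_type, scheme, context=None):
    """Return True if 'scheme' may be applied to 'content_type'.

    'context' names the kind of element selected by a previous parameter,
    if any (a hypothetical notion; the proposal has nothing like it).
    """
    if scheme not in APPLICABLE:
        return False  # unknown scheme: treat as inappropriate
    allowed = APPLICABLE[scheme]
    if allowed is None or content_type in allowed:
        return True
    return scheme in CONTEXT_ENABLES.get((content_type, context), set())

print(check_applicable("image/gif", "para"))                   # False
print(check_applicable("text/html", "rows", context="table"))  # True
```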

If we do not develop a uniform architecture for fragment
identification, we are going to have a slew of partial solutions before
we wise up and develop a uniform treatment. Then everyone will be
pissed because of differing addressing conventions, code bloat, etc.,
and a total inability to make the uniform scheme match the previous
partial solutions.

My understanding is that HyTime can handle this uniform
fragment identification. Can people knowledgeable about HyTime
talk about the good *and bad* points of using HyTime addressing
for URI fragment identification? Is there a way we can
start small, with just byterange selection, then grow our
capabilities?



Ron Daniel Jr.                email: rdaniel@acl.lanl.gov
Advanced Computing Lab        voice: (505) 665-0597
MS B-287  TA-3  Bldg. 2011      fax: (505) 665-4939
Los Alamos National Lab        http://www.acl.lanl.gov/~rdaniel/
Los Alamos, NM,  87545    tautology: "Conformity is very popular"
Received on Wednesday, 17 May 1995 19:02:14 UTC