Re: Request for feedback on HTTP Location header syntax + semantics, Re: Issues 43 and 185, was: Issue 43 (combining fragments) from Nathan on 2010-03-12 (ietf-http-wg@w3.org from January to March 2010)

From: Nathan <nathan@webr3.org>
Date: Fri, 12 Mar 2010 17:29:34 +0000
To: Jonathan Rees <jar@creativecommons.org>
CC: Julian Reschke <julian.reschke@gmx.de>, ietf-http-wg@w3.org
Message-ID: <4B9A79FE.3090103@webr3.org>

Jonathan Rees wrote:
> On Thu, Mar 11, 2010 at 12:55 PM, Julian Reschke <julian.reschke@gmx.de> wrote:
>> On 11.03.2010 13:31, Jonathan Rees wrote:
>>> (bcc www-tag)
>>>
>>> If you believe in the "identification" / "resource" / "representation"
>>> theory then figuring this out is pretty straightforward.
>> I think that's a theory I believe in :-).
>>
>>> Suppose http://example.com/a redirects to http://example.com/c#d, and
>>> we want to know what resource is identified by http://example.com/a#b.
>>>  In general resource x#y means y as locally defined in x. So
>>> http://example.com/a#b is b as locally defined in
>>> http://example.com/a.  To find what's in http://example.com/a, you
>>> look at the resource http://example.com/c#d.  How a fragid is defined
>>> locally in something depends on the media type registration, and the
>>> only media type of which I'm aware that allows one to define locally a
>>> fragid b to identify something that itself has representations with
>>> the potential for fragid definition is application/rdf+xml (and the
>> Interesting. How so? Pointer? Couldn't see that in
>> <http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/#section-fragID>.
> 
> The RDF media type registration supports the use of URIs with fragids
> as "identifying" anything. You could for example say
> <http://example.com/c#d> owl:sameAs <http://example.com/e>. where
> http://example.com/e has an HTML representation that defines its own
> local identifiers, and so on.  (The URI syntax really ought to support
> scheme:path#frag#frag#frag ... so it goes...)
> 
> This is all theoretical... I'm not going to get on anyone's case for
> not implementing this, only for ruling it out.
> 
>>> other RDF media types). For text/html, for example,
>>> http://example.com/c#d would identify an HTML element, and there's no
>>> fragid namespace defined locally inside an HTML element in which #a
>>> could be defined.
>>>
>>> Usually, when a browser finds that a resource doesn't define the
>>> desired fragid, it just shows you a representation of the resource. (I
>>> vaguely remember some discussion about how the user should be alerted
>>> when this happens, like a mild form of 404, but that's another story.)
>>> To be consistent that is what should happen in this case. That is, it
>>> would throw away the unresolvable #b and just show you
>>> http://example.com/c#d. (unless the representation of
>>> http://example.com/c is RDF that defines d to be a resource that has a
>>> locally defined b.)
>> The good thing is that the end result for text/html is the same: the
>> fragment id from the redirect URL is taken into account, the original fragid
>> being overridden.
>>
>> So, considering that whatever has to happen here depends on the media type
>> of the representation we get from the redirect target -- isn't this
>> something the spec for text/html needs to spell out?
> 
> Only if the recovery strategy is specific to text/html (see below). If
> it's generic across all media types then it's HTTP's job. I'll leave
> the determination up to you.
> 
>> We currently have (in "-latest"):
>>
>> "Note: This specification does not define precedence rules for the case
>> where the original URI, as navigated to by the user agent, and the Location
>> header field value both contain fragment identifiers."
>>
>> How about expanding this to something like:
>>
>> "Note: This specification does not define precedence rules for the case
>> where the original URI, as navigated to by the user agent, and the Location
>> header field value both contain fragment identifiers. In particular, the
>> semantics of fragment identifiers depend on the representation's media
>> type."
>>
>> ...and thus make it the HTML WG's problem (for text/html)? :-)
> 
> I could live with that. Maybe remove the word "precedence" as it is
> distracting. Maybe cite 3986.
> 
> The reason the spec "does not define" it (which I think is debatable)
> stems from HTTP's incomplete embrace of the
> identification/resource/representation theory in the case of
> redirects. For example:
> 
>    "For 3xx responses, the location SHOULD
>    indicate the server's preferred URI for automatic redirection to the
>    resource."
> 
> More I/R/R friendly might be
> 
>    "For 3xx responses, the location SHOULD
>    identify a resource for which a representation should
>    be obtained."
> 
> The language for 302 and 307 talks about the resource "residing at"
> another URI without defining what "residing" means (especially when
> the Location: URI doesn't use the http: scheme). You could exploit the
> I/R/R theory by observing that the Location: URI actually identifies
> another resource (maybe the same one for a 301, I don't know) and in
> the 302/307 case it's "lending" its "corresponding" representations
> (and redirects) to the first resource. Then you don't have to define
> "resides" and you've defined "redirect" quite clearly and abstractly.
> 
> I wouldn't mind someone arguing that in <div id="d"> foo <div id="b">
> bar </div> </div> it would make sense to "go" to the div with id="b"
> when http://example.com/a#b redirects to http://example.com/c#d,
> although in an ideal world this would be licensed by the html (xml,
> etc) media type registrations. Heck, I don't really care going to #b
> when the "b" element is outside the "d" element, when what we're
> talking about is classified as an error recovery case. There is some
> theoretical risk that a user could be confused or tricked somehow by
> this behavior, but that would be true for any error recovery behavior,
> including going to the top of the document. Whether this is in the
> purview of HTTP depends on whether one would want to agree on a
> general recovery principle that's independent of HTML. No advice on
> that matter.
> 

Am I correct in thinking that a URI in the Location header value is
scoped to anything that can be the target of a hyperlink (and thus may
include a fragment), whereas the request-target is scoped to primary
resources only? As I understand it, that is because locating a
media-type specific secondary resource can only be done correctly by an
application which supports the media-type(s) in question (preferably
because the media-type specification itself says how to handle this).
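
To make the distinction concrete, here is a rough sketch in Python of how
I picture a client splitting a hyperlink target before it sends a request.
The function name split_for_request is my own invention for illustration
and is not taken from any spec; it only uses urllib.parse from the
standard library.

```python
from urllib.parse import urldefrag, urlsplit


def split_for_request(uri):
    """Split a hyperlink target into (request-target, fragment).

    Hypothetical helper: only the primary resource part is sent to the
    server; the fragment is kept back by the client.
    """
    primary, fragment = urldefrag(uri)
    parts = urlsplit(primary)
    # An empty path is sent as "/" in the request line.
    request_target = parts.path or "/"
    if parts.query:
        request_target += "?" + parts.query
    return request_target, fragment
```

The fragment never goes on the wire; it stays with the client, to be
interpreted once the media type of the representation is known.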

And would this then indicate that the handling of conflicting fragments
between the originally requested hyperlink/URI and any subsequent
hyperlink/URI(s) is also in the domain of each media-type's
specification (where the media-type supports fragments, that is)?
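
For illustration only, one possible generic rule could be sketched in
Python as below. The name resolve_redirect is hypothetical. The first
branch is the behaviour Julian describes for text/html (the fragment from
the redirect target overrides the original one); carrying the original
fragment over when the Location value has none is purely my assumption,
since the current draft defines no rule either way.

```python
from urllib.parse import urljoin, urlsplit


def resolve_redirect(original, location):
    """Return the URI a client might end up at after a redirect.

    Sketch under stated assumptions, not a specified algorithm.
    """
    # Resolve a possibly relative Location value against the original URI.
    target = urljoin(original, location)
    if urlsplit(target).fragment:
        # The redirect target's fragment wins; the original one is dropped.
        return target
    # Assumption: with no fragment in Location, keep the original fragment.
    original_fragment = urlsplit(original).fragment
    if original_fragment:
        return f"{target}#{original_fragment}"
    return target
```

With the URIs from Jonathan's example, http://example.com/a#b redirected
to http://example.com/c#d ends up at http://example.com/c#d. Whether "d"
means anything is still up to the media type of what comes back.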

The above would then bring me back to my original point and question
(which I didn't communicate properly) as follows:

Is HTTP adding support for something media-type specific (fragments) to
an area of the specification (request/response semantics) which is
supposed to be generic across all media-types, and in doing so requiring
media-type specifications to be adapted or refined to cater for HTTP
functionality? Would this be acceptable if it were FTP or any other
protocol?

To be honest, I'm unsure how to word this without offending or coming
across wrong. I'm aware that support for fragments in the Location header
already exists and is used, but should it have been there in the first
place?

Finally, are there any notes to say how fragments should be handled when
the media-type doesn't support them (or is this out of scope)?

Many Regards,

Nathan
Received on Friday, 12 March 2010 17:30:21 UTC