Re: parsing URI (references) according to RFC 3986

On 2011-06-23 20:56, Boris Zbarsky wrote:
> On 6/23/11 2:46 PM, Julian Reschke wrote:
>>> There is no question about those aspects of this. The base URI for the
>>> image source in this case is
>>> "http://greenbytes.de/tech/tc/uris/imgother.html".
>>
>> Yes.
>>
>> So please remind me (sorry, it has been a long day for me): what is the
>> problem here?
>>
>> Sending an additional request even though the URI spec says "should not"?
>> Why is the request being sent anyway?
>
> The problem is that per 4.4 as I understand it this HTML:
>
> <!DOCTYPE html>
> <base href="http://greenbytes.de/tech/tc/uris/imgother.html">
> <img src="#foo">
>
> located at <http://greenbytes.de/tech/tc/uris/img.html> should treat the
> image load as a load of <http://greenbytes.de/tech/tc/uris/img.html>
> (because this is a same-document URI reference) whereas this is not what
> any browser does. Browsers treat it as a load of
> <http://greenbytes.de/tech/tc/uris/imgother.html>.
>
> Is my understanding of 4.4 incorrect? If so, why?

Yes, it's incorrect.

4.4 says:

"When a URI reference refers to a URI that is, aside from its fragment 
component (if any), identical to the base URI (Section 5.1), that 
reference is called a "same-document" reference."

Note that 4.4 refers to Section 5.1 for "base URI", and Section 5.1.1
<http://greenbytes.de/tech/webdav/rfc3986.html#rfc.section.5.1.1> says:

"Within certain media types, a base URI for relative references can be 
embedded within the content itself so that it can be readily obtained by 
a parser. This can be useful for descriptive documents, such as tables 
of contents, which may be transmitted to others through protocols other 
than their usual retrieval context (e.g., email or USENET news).

It is beyond the scope of this specification to specify how, for each 
media type, a base URI can be embedded. The appropriate syntax, when 
available, is described by the data format specification associated with 
each media type."

...which is about the HTML <base> case, unless I'm mistaken.
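
A rough sketch of that reading, using Python's urllib.parse. The helper
name is_same_document is made up for illustration, and the comparison is
a plain string comparison without any URI normalization, so treat it as
an approximation of 4.4 rather than a faithful implementation:

```python
from urllib.parse import urljoin, urldefrag


def is_same_document(reference: str, base_uri: str) -> bool:
    """Hypothetical helper: approximate the RFC 3986 section 4.4 check.

    Resolves the reference against the base URI, drops the fragment from
    both, and compares the results as plain strings (no normalization).
    """
    target = urljoin(base_uri, reference)
    return urldefrag(target).url == urldefrag(base_uri).url


# The document was retrieved from img.html, but <base href> embeds a
# different base URI, which is the one section 5.1.1 says to use.
retrieval_uri = "http://greenbytes.de/tech/tc/uris/img.html"
embedded_base = "http://greenbytes.de/tech/tc/uris/imgother.html"

# "#foo" resolves against the embedded base, not the retrieval URI.
resolved = urljoin(embedded_base, "#foo")
print(resolved)
print(is_same_document("#foo", embedded_base))
```

Under this reading, "#foo" resolves to imgother.html#foo, so the
comparison in 4.4 is made against imgother.html and never against
img.html.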


>>> If it's not supposed to mean anything and specs that use URIs are just
>>> supposed to define what happens (a stance I agree with), then why is 4.4
>>> even there?
>>
>> The text was changed from RFC 2396, which just said:
>
> Yes. I'm aware of what RFC 2396 said. The substantive changes from RFC
> 2396 to RFC 3986 are that the bit about empty URIs in contexts expected
> to result in a new request was removed and the definition of a
> same-document reference was changed to be defined in terms of comparison
> of a resolved URI to a base URI instead of examination of the original
> URI reference string.
>
> My example above does not involve either of those changes, so it's
> equally an issue using the RFC 2396 wording.
>
>> The history for this seems to be in
>> <http://labs.apache.org/webarch/uri/rev-2002/issues.html#017-rdf-fragment>.
>>
>
> To be clear, I'm not so much asking why the text is in RFC 3986 or RFC
> 2396. I understand the problem it was attempting to solve. I'm just

You're ahead of me :-)

> saying that:
>
> 1) It's the wrong way to solve the problem.
> 2) The problem is outside the remit of what I think a URI RFC should be
> concerned with.
> 3) And most importantly, it's a partial answer to the original "Which
> parts of RFC 3986 are interop issues?" question.

Yes!

> My primary interest in posting here at all was to bring up item 3 above,
> since you _specifically_ asked for a list of such interop issues.
>
> What you decide to do from this point on is up to you, but I feel like
> I'm seriously wasting my time here....

No, it's actually the first useful discussion in a long time.

I *believe* that you are reading 3986 wrong, and that there really is no 
conflict with HTML and with what browsers do.

Best regards, and thanks for actually spending the time to get to the 
bottom of this,

Julian

Received on Thursday, 23 June 2011 19:06:19 UTC