Re: Proposed Charter and Agenda for IRI BOF at IETF 76 from Maciej Stachowiak on 2009-09-27 (public-iri@w3.org from September 2009)

From: Maciej Stachowiak <mjs@apple.com>
Date: Sat, 26 Sep 2009 18:30:33 -0700
To: "Roy T. Fielding" <fielding@gbiv.com>
Cc: Larry Masinter <masinter@adobe.com>, "PUBLIC-IRI@W3.ORG" <PUBLIC-IRI@w3.org>
Message-id: <C8272D4D-99FF-4D32-BB16-907E9BD9C84C@apple.com>

On Sep 26, 2009, at 6:06 PM, Roy T. Fielding wrote:

> On Sep 26, 2009, at 5:13 PM, Maciej Stachowiak wrote:
>
>> The definition for how to perform forgiving processing of resource  
>> identifiers originally started out in the HTML5 spec, where you  
>> suggest it should go. However, it was moved to a separate document  
>> based on strong objections from many parties. I understand from the  
>> below that your objection was solely to the use of the term "URL",  
>> and not to these processing rules being in the HTML spec. But that  
>> was not the sole objection. Many thought it was architecturally  
>> wrong to define these rules in the HTML spec. Thus, while I'm sure  
>> Ian Hickson would be perfectly happy to put the processing  
>> requirements back in HTML5, I'm not sure that is an acceptable
>> long-term solution.
>
> I think it is hopeless to trace back all the screwed-up
> misunderstandings of Web architecture that led to anyURI, LEIRI,
> and now HTML5-URL.
> I think I explained how it is supposed to work, succinctly and to the
> point where actual text can be applied to the HTML5 draft that will
> resolve all objections and settle this matter once and for all.
> If not, then we can deal with those new objections when they arise.

I think removing the use of the term "URL" from HTML5 would remove  
some objections, but I don't think folding the text of Web Address  
into HTML5 would address any objections, except perhaps the concern  
about lack of timely progress in this area.

>
>> Furthermore, besides the general architectural objection, there may  
>> be applications and technologies that wish to use HTML-style loose  
>> processing rules. Having those rules in the HTML spec instead of in  
>> a standalone specification makes it more difficult to reuse the  
>> technology.
>
> Those rules already exist in RFC3986, Appendix B.  What does not
> exist there is the behavior after parsing into the components,
> since that behavior is entirely application-dependent.  If HTML5
> wants to define that behavior, it can do so only if the requirements
> are stated to be specific to browser-like applications.

As far as I can tell, RFC3986 *only* defines how to extract  
components. It does not define how to turn an arbitrary string into a  
URI, which is potentially needed for HTTPbis. It does not define how  
to perform a relative resolution on a possibly-invalid reference  
against a possibly-invalid base.

That being said, I think what RFC3986 Appendix B says is a good  
definition of how to extract components from possibly-invalid strings.  
It seems way easier to understand than what the Web Address draft  
says, and better matches what implementations actually do.
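
For readers who do not have the RFC at hand, here is a minimal Python sketch of the component extraction that Appendix B describes. The regular expression is the one given in RFC 3986, Appendix B. The function name `split_components` and the returned dictionary layout are assumptions made for this illustration and are not part of the RFC or of any draft discussed in this thread.

```python
import re

# Regular expression from RFC 3986, Appendix B, for breaking down a
# URI reference into its components.
URI_REFERENCE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def split_components(reference):
    """Split a URI reference into its five components.

    Hypothetical helper for illustration. Every part of the pattern is
    optional, so the match succeeds for any input string, including
    strings that are not valid URIs. A component that is absent is
    returned as None; the path is always a string (possibly empty).
    """
    match = URI_REFERENCE.match(reference)
    return {
        "scheme": match.group(2),
        "authority": match.group(4),
        "path": match.group(5),
        "query": match.group(7),
        "fragment": match.group(9),
    }


# The example reference used in Appendix B.
print(split_components("http://www.ics.uci.edu/pub/ietf/uri/#Related"))

# A string with a space and angle brackets, which the URI grammar does
# not allow unescaped, still splits into components.
print(split_components("http://example.com/a b?q=<x>#f"))
```

Note that this only extracts components. It does not escape, normalize, or resolve anything, which is the gap described above.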

>
>> On a more philosophical level: a lot more resource identifiers are  
>> extracted from attributes in HTML documents than from the sides of  
>> buses. It is not clear to me why the side-of-bus use case should  
>> be privileged. IRIs are a standard for the Internet, not for  
>> vehicular advertising. And indeed, many print ads these days drop  
>> the initial http: from the addresses they print.
>
> Also explained in 3986.  I don't remember if that was copied into
> 3987.

Either way, the upshot is that strings may appear in bus ads that are  
not allowed to appear in format or protocol elements that require an  
IRI.

>
>> For an Internet standard, there is nothing wrong with defining  
>> rules for lenient processing as well as the syntax of strictly  
>> conforming input. Doing so can convert "experiment[s] in  
>> forgiveness" into interoperability.
>
> There is nothing wrong with defining correct processing rules for
> whatever thing you are trying to process, whether those rules be
> strict or lenient.  The problem is saying that the rules are for
> processing X when in fact you are actually processing Y and then
> unilaterally declaring that Y is the new definition of X.

I don't think Larry proposed to do that. He just suggested that a  
certain form of reusable lenient processing rules should be in the  
same spec as the normative definition of an IRI. I don't think he  
suggested that these rules should redefine what an IRI is.

Regards,
Maciej
Received on Sunday, 27 September 2009 01:31:14 UTC