RE: Updating the IRI spec to include "web addresses"

I've found it convenient to use "HRef" as a shorthand
for "Hypertext Reference" in the document.

What I'm not sure of is whether I can get away with
just *replacing* the IRI -> URI algorithm, or if
I should leave both HRef -> URI and IRI -> URI.

Right now, the HTML5/"Web Address" draft is written as
"how to parse" and "how to resolve relative to absolute".

I'm not sure if it's possible to recast it as
HRef -> URI, but it's certainly worth a try.

Larry
--
http://larry.masinter.net


-----Original Message-----
From: Roy T. Fielding [mailto:fielding@gbiv.com] 
Sent: Sunday, May 31, 2009 2:32 PM
To: Larry Masinter
Cc: HTML WG; public-iri@w3.org
Subject: Re: Updating the IRI spec to include "web addresses"

On May 31, 2009, at 10:12 AM, Larry Masinter wrote:

> (Please reply on public-iri mailing list):
>
> I started working on trying to merge the "Web Address" concept into  
> the IRIbis document, using the text edited by Dan Connolly and M.  
> Sperberg-McQueen.
>
> The biggest question I see is that an IRI is defined as a sequence  
> of CHARACTERS which are independent of the ENCODING - whether  
> UTF-8, UTF-16, or shift-jis or something else.
>
> However, "Web Address" (or "Hypertext Reference", as has been  
> suggested) is defined as a sequence of BYTES which in turn have a  
> CHARACTER ENCODING which is taken from the DOCUMENT or SCRIPT  in  
> which it is embedded.
>
> I don't think this is a difficulty; it's just an observation about  
> the layering.
>
> My intent is to use "Hypertext Reference" rather than "Web Address"  
> as the name of the concept being introduced, and to introduce a  
> "href" BNF.  At this point, I'm planning on adding this as an  
> appendix, and I'm considering moving the LEIRI section to an  
> appendix as well.
>
> Any problems with this direction? Special things to be concerned  
> about?

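The layering observation quoted above (an IRI is a sequence of characters,
while the bytes depend on the encoding of the enclosing document) can be
illustrated with a minimal Python sketch. The helper name encodings_of and
the sample character are assumptions made for illustration; nothing here
comes from the IRIbis or HTML5 drafts.

```python
def encodings_of(text, encodings=("utf-8", "utf-16-be", "shift_jis")):
    """Return {encoding name: hex of the bytes} for one character string.

    Hypothetical helper: it only shows that a single sequence of
    characters maps to different byte sequences under different
    character encodings.
    """
    return {enc: text.encode(enc).hex() for enc in encodings}


sample = "\u65e5"  # one CJK character, representable in all three encodings
by_encoding = encodings_of(sample)

for name, hex_bytes in by_encoding.items():
    # Decoding each byte sequence with its own encoding recovers
    # the same character, so the character identity is unchanged.
    assert bytes.fromhex(hex_bytes).decode(name) == sample
    print(name, hex_bytes)
```

The character is identical in every case; only the byte representation
differs, which is why a definition in terms of bytes also has to say where
the character encoding comes from.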
Is it the same direction as the following?

The thing between the quotes in an HTML href/src/... attribute is called
a hypertext reference.  A hypertext reference is converted first into an
infoset string in the document encoding (replacing entity references)
and then into a URI reference (replacing the document encoding with
some form of URI encoding).  Both of those conversions are defined by HTML5.
The latter is either done according to the IRI proposed standard or by
some other character-replacement algorithm cooked up by HTML5.
Once the attribute is in URI reference form, RFC3986 applies.  The only
thing called a Web Address is what RFC3986 defines as a URI.
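
The two conversions described above can be sketched roughly as follows, in
Python. This is an illustration under stated assumptions, not the algorithm
of HTML5 or of the IRI specification: the function name
href_to_uri_reference, the SAFE character set, and the choice to
percent-encode the whole string in one configurable encoding are all
assumptions made for the example. Which encoding applies (UTF-8 or the
document encoding) is exactly the kind of detail the specifications would
have to settle.

```python
import html
from urllib.parse import quote

# Characters passed through unchanged: the RFC 3986 reserved and unreserved
# punctuation, plus "%" so that existing percent-escapes are not encoded a
# second time. (Assumption: a stray "%" is therefore left as-is.)
SAFE = "-._~:/?#[]@!$&'()*+,;=%"


def href_to_uri_reference(attr_value, encoding="utf-8"):
    """Hypothetical two-step conversion of an attribute value.

    Step 1: replace entity references to obtain the character string.
    Step 2: percent-encode every character outside SAFE (and outside
            ASCII letters and digits), using the bytes that the
            character has in the given encoding.
    """
    text = html.unescape(attr_value)
    return quote(text, safe=SAFE, encoding=encoding)


utf8_form = href_to_uri_reference("caf&eacute;?q=1&amp;r=2")
latin1_form = href_to_uri_reference("caf&eacute;?q=1&amp;r=2", "iso-8859-1")
print(utf8_form)    # caf%C3%A9?q=1&r=2
print(latin1_form)  # caf%E9?q=1&r=2
```

The same attribute value yields different URI references depending on the
encoding used in the second step, which is why the choice between the IRI
rule and a document-encoding rule matters.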

I suppose hypertext reference is easier to say than the more technically
accurate designation of document-encoded resource identifier reference.

....Roy

Received on Monday, 1 June 2009 02:07:51 UTC