Re: Intent to close ISSUE-205

* Manu Sporny <msporny@digitalbazaar.com> [2013-01-15 20:27-0500]
> On 01/15/2013 10:26 AM, Pat Hayes wrote:
> > Revolutionary though it may seem, I would suggest, when writing a Web
> > standard, to actually use the terminology defined by other normative
> > Web standards. That is, if you mean IRI, say "IRI", and if you mean
> > URL, say "URL". To do otherwise is at best confusing, and at worst so
> > bloody stupid that it is impossible to even discuss it politely.
> 
> There is a deeper story here, Pat (and those that feel that continuing
> to use the IRI terminology is perfectly okay).
> 
> Here are the numerous problems with the term 'IRI':
> 
>   * the vast majority of Web developers don't know what it is
>   * many of them will never learn the difference because the
>     difference doesn't matter to them in practice.
>   * The HTML5 spec uses URL everywhere for these reasons, that
>     will be the norm going forward.
>   * creating a new term to specify "an internationalized URL" was,
>     in hindsight, the wrong thing to do because it has caused a great
>     deal of confusion.
>   * being pedantic when attempting to communicate a general concept
>     to a general audience can do more harm than good.
>   * this is coming down the pipeline: http://url.spec.whatwg.org/

The HTTP spec tells me how to dereference http: *URLs*, e.g.
<http://xn--9oqp94l.example/?user=%D8%A3%D9%83%D8%B1%D9%85&channel=R%26D>.
Likewise dav:, ftp:, data:, mailto:, and about 150-odd others. Specifically,
the path component is %unescaped and interpreted as a UTF-8 string.

IRIs are a pretty different animal, which happen to have a
transformation to URLs involving punycode and %-encoded UTF-8. (The
fact that syntactically URIs are a sublanguage of IRIs is a borderline
damaging red herring.) The important thing is that if I want to
dereference an IRI like
<http://伝言.example/?user=أكرم&channel=R%26D>, I have a specific
procedure to follow to turn it into something 2616 can handle.
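
For the curious, that procedure can be sketched roughly as below. This is
a minimal illustration rather than a complete RFC 3987 mapping: the
function name iri_to_uri and the _SAFE character set are my own
assumptions, it leans on Python's built-in "idna" codec (IDNA 2003) for
the punycode step, it ignores any userinfo in the authority, and it
assumes the query parameters are separated by a plain "&".

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Characters left untouched by the %-encoding step: the RFC 3986 reserved
# and unreserved punctuation, plus "%" so that escapes already present
# (e.g. %26) are not encoded a second time. (Illustrative choice.)
_SAFE = "/?:@!$&'()*+,;=%-._~"


def iri_to_uri(iri: str) -> str:
    """Hypothetical helper: map an IRI to a URI that HTTP can handle."""
    parts = urlsplit(iri)

    # Host: each non-ASCII label becomes an "xn--" punycode label.
    host = parts.hostname.encode("idna").decode("ascii")
    if parts.port is not None:
        host = f"{host}:{parts.port}"

    # Path, query, fragment: non-ASCII characters become %-encoded UTF-8.
    return urlunsplit((
        parts.scheme,
        host,
        quote(parts.path, safe=_SAFE),
        quote(parts.query, safe=_SAFE),
        quote(parts.fragment, safe=_SAFE),
    ))


print(iri_to_uri("http://伝言.example/?user=أكرم&channel=R%26D"))
```

Note that the existing %26 survives untouched, while the host and the
user parameter each go through a different encoding. That asymmetry is
the part a developer has to know about.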

In retroactively declaring all URLs to behave like IRIs, we have to
visit all of the specs which use URL and URI very specifically. I have
no idea whether that has even been considered.


> A little more detail on the points above...
> 
> Web developers don't understand the IRI terminology
> ---------------------------------------------------
> 
> I've been very supportive of the use of IRI terminology in specs for a
> number of years. For example, in the RDFa spec, we ended up including
> this note because of the number of comments we got on the subject:
> 
> """
> RDFa is a way of expressing RDF-style relationships using simple
> attributes in existing markup languages such as HTML. RDF is fully
> internationalized, and permits the use of Internationalized Resource
> Identifiers, or IRIs. You will see the term 'IRI' used throughout this
> specification. Even if you are not familiar with the term IRI, you
> probably have seen the term 'URI' or 'URL'. IRIs are an extension of
> URIs that permits the use of characters outside those of plain ASCII.
> RDF allows the use of these characters, and so does RDFa. This
> specification has been careful to use the correct term, IRI, to make it
> clear that this is the case.
> """
> 
> Web Keys spec... same thing. PaySwarm base spec, same thing... and now
> we're getting the same comments "IRIs are confusing to Web developers"
> for the JSON-LD spec. These comments didn't come from the same people,
> or same group of people, they came from a myriad of web developers with
> different backgrounds. What was not clear to me two years ago is now
> very obvious. Web developers don't understand the difference between URL
> and IRI and more importantly, they should not have to.

The web developers who don't need to know the difference are exactly
those that can use IRIs exclusively. Anyone interacting with any of
the 150-odd URL-accessed protocols has to know whether to punify,
UTF-8-ify and %-ify.


> The Difference Doesn't Matter in Practice
> -----------------------------------------
> 
> If URLs had been designed correctly in the beginning (which is
> fantastically easy to say with hindsight), they would've included
> internationalized characters and we wouldn't be in this mess. Web
> developers call IRIs URLs in practice, it's everywhere, look at the
> documentation on building websites and you will find very little to no
> use of the term IRI.
> 
> Google search index count for
>    URL: 366M
>    URI:  27M
>    IRI:   4M
> 
> In fact, I had no idea what an IRI was until I hit RDF. Nobody I worked
> with knew what an IRI was before we started working with RDF. It didn't
> matter then and it still doesn't matter now (unless you want to be
> extremely pedantic, which is a mistake when trying to convince new Web
> developers to use this stuff). You are not penalized when you stick an
> IRI instead of a URL in your web page in any way (or vice-versa). The
> difference doesn't matter to 99.999% of the people building and using
> the Web.

This is, I believe, an argument that it's easier to invent new terms
for what we conventionally call URLs and URIs and re-train the
cognoscenti (all 4M of them) to use the new terms. This is a bizarre
consequence of the success of web browsers, which have introduced some
variant of "URL" into most current languages, and then backed that
term up with feature creep. Things get messy when users share a term
for Do The Right Thing with developers who use it for Do Exactly This.


> Future Work on Merging URL with IRI
> -----------------------------------
> 
> Anne is working on this http://url.spec.whatwg.org/. Two of the goals are:
> 
> * Align RFC 3986 and RFC 3987 with contemporary implementations and
>   obsolete them in the process. (E.g. spaces, other "illegal" code
>   points, query encoding, equality, canonicalization, are all concepts
>   not entirely shared, or defined.) URL parsing needs to become as
>   solid as HTML parsing. [URI] [IRI]
> 
> * Standardize on the term URL. URI and IRI are just confusing. In
>   practice a single algorithm is used for both so keeping them distinct
>   is not helping anyone. URL also easily wins the search result
>   popularity contest.
> 
> The writing is on the wall. I suggest that the RDF WG move toward the
> URL terminology. I was attempting to start the ball rolling with the
> JSON-LD spec, at least, attempt to future-proof the spec a bit. That
> failed. I hope there are others in both the JSON-LD CG and RDF WG that
> share this view. IRIs and URIs are dead, they just don't know it yet...
> long live the URL.

I think Anne's effectively working out a name for the thing in the
location bar, which is the same as what JavaScript libraries happily
dereference, and pretty close, modulo &-escaping, to what you can
stick in HTML href, src and action attributes. Yeah, there are a lot
of folks out there who think of those as URLs.


> -- manu
> 
> -- 
> Manu Sporny (skype: msporny, twitter: manusporny)
> Founder/CEO - Digital Bazaar, Inc.
> blog: The Problem with RDF and Nuclear Power
> http://manu.sporny.org/2012/nuclear-rdf/
> 

-- 
-ericP

Received on Wednesday, 16 January 2013 03:23:12 UTC