
Re: Intent to close ISSUE-205

From: Pat Hayes <phayes@ihmc.us>
Date: Wed, 16 Jan 2013 00:27:39 -0600
Cc: public-linked-json@w3.org, 'RDF WG' <public-rdf-wg@w3.org>
Message-Id: <850D065D-AA0F-42C6-B662-BF837EB3DA76@ihmc.us>
To: Manu Sporny <msporny@digitalbazaar.com>

On Jan 15, 2013, at 7:27 PM, Manu Sporny wrote:

> On 01/15/2013 10:26 AM, Pat Hayes wrote:
>> Revolutionary though it may seem, I would suggest, when writing a Web
>> standard, to actually use the terminology defined by other normative
>> Web standards. That is, if you mean IRI, say "IRI", and if you mean
>> URL, say "URL". To do otherwise is at best confusing, and at worst so
>> bloody stupid that it is impossible to even discuss it politely.
> 
> There is a deeper story here, Pat (and those that feel that continuing
> to use the IRI terminology is perfectly okay).
> 
> Here are the numerous problems with the term 'IRI':
> 
>  * the vast majority of Web developers don't know what it is

Any web developer - anyone over the age of ten - who is incapable of typing "IRI" into Google and finding http://www.ietf.org/rfc/rfc3987.txt as the first hit, doesn't deserve us to even consider them as fully human, IMO.

>  * many of them will never learn the difference because the
>    difference doesn't matter to them in practice.

Then by all means, if the difference does not matter to them, let them pretend that by "IRI" we mean "URL" and proceed on that basis. But when they write software that does not get used in the fastest-growing markets in China and Malaysia and the Indian subcontinent, let their managers blame their ignorance rather than our stupidity in using the wrong terminology in a normative specification.

>  * The HTML5 spec uses URL everywhere for these reasons, that
>    will be the norm going forward.

Perhaps. I don't think this will last more than a few years, but no doubt we will have to agree to disagree.

>  * creating a new term to specify "an internationalized URL" was,
>    in hindsight, the wrong thing to do because it has caused a great
>    deal of confusion.

I disagree. But in any case, this view of yours is a personal one, and if you believe this then you should be lobbying to have http://www.ietf.org/rfc/rfc3987.txt declared to be deprecated, or some such action. But as it is not deprecated, but is the current standard, we should at the least acknowledge that and conform to it. 

>  * being pedantic when attempting to communicate a general concept
>    to a general audience can do more harm than good.

To use technical words defined by normative standards normatively, when writing other normative standards, is not "pedantry". To do otherwise is irresponsible, harmful, and stupid. 

Look, there are two points here, which are getting muddled. One is the idea that IRIs were a mistake, that Real People only use URLs, and so on. If you are sure of this, then by all means write the spec so that it requires URLs and not IRIs. Use "URL" to refer to URLs, and say that is what you mean. You might draw a little attention to the restriction that you are making, because some people will raise their eyebrows. All your points above seem relevant to this idea. 

But the other idea, and the one to which I was objecting, is that we will IN FACT refer to IRIs, but we will PRETEND that they are called "URLs" because our readers like that word better. Note, this second proposal - to deliberately misuse a technical term to refer to something that it does not in fact refer to - has no bearing whatever on the facts of the case, so your points above about what Web developers want (URLs, not IRIs) and what is a mistake (IRIs, not URLs) are completely irrelevant. Your poor developers are not IN FACT going to be reading about URLs, they are just going to be given the surface impression that they are; but IN FACT the spec will still be talking about IRIs. Anyone who doesn't know what an IRI is, is not going to be helped by this. At best, they might remain unaware of the editorial bait-and-switch that you will have pulled on them, but no doubt they will discover this when the code they write fails some conformity test because they were under the naive illusion that "URL" actually meant "URL" instead of meaning "IRI".

It is the second idea that my scorn was, and continues to be, directed towards. None of your points in this email bear on this idea of deliberately mis-using technical terminology.

>  * this is coming down the pipeline: http://url.spec.whatwg.org/

Then either (1) wait until it is out of the pipeline, and write the spec citing it normatively, or (2) refer to it explicitly and explain what you are doing here in some detail, with a forward-reference which makes your change official when this gets to be a recommendation.

> 
> A little more detail on the points above...
> 
> Web developers don't understand the IRI terminology
> ---------------------------------------------------
> 
> I've been very supportive of the use of IRI terminology in specs for a
> number of years. For example, in the RDFa spec, we ended up including
> this note because of the number of comments we got on the subject:
> 
> """
> RDFa is a way of expressing RDF-style relationships using simple
> attributes in existing markup languages such as HTML. RDF is fully
> internationalized, and permits the use of Internationalized Resource
> Identifiers, or IRIs. You will see the term 'IRI' used throughout this
> specification. Even if you are not familiar with the term IRI, you
> probably have seen the term 'URI' or 'URL'. IRIs are an extension of
> URIs that permits the use of characters outside those of plain ASCII.
> RDF allows the use of these characters, and so does RDFa. This
> specification has been careful to use the correct term, IRI, to make it
> clear that this is the case.
> """
> 
> Web Keys spec... same thing. PaySwarm base spec, same thing... and now
> we're getting the same comments "IRIs are confusing to Web developers"
> for the JSON-LD spec. These comments didn't come from the same people,
> or same group of people, they came from a myriad of web developers with
> different backgrounds. What was not clear to me two years ago is now
> very obvious. Web developers don't understand the difference between URL
> and IRI

Then let them read the specs. I had to :-)

> and more importantly, they should not have to.

Ah, that is the other point. If this really is true, then write the spec so that it refers to URLs rather than IRIs. But *really* write it that way, don't just PRETEND to.

> 
> The Difference Doesn't Matter in Practice
> -----------------------------------------
> 
> If URLs had been designed correctly in the beginning (which is
> fantastically easy to say with hindsight), they would've included
> internationalized characters and we wouldn't be in this mess. Web
> developers call IRIs URLs in practice, it's everywhere, look at the
> documentation on building websites and you will find very little to no
> use of the term IRI.
> 
> Google search index count for
>   URL: 366M
>   URI:  27M
>   IRI:   4M
> 
> In fact, I had no idea what an IRI was until I hit RDF.

I had no idea what RDF was until I hit RDF. So what? I don't find this entire discussion persuasive or even important. We are writing a technical specification, not a guide for dummies. Anyone who can read (say) the XML Schema specs is surely not going to find RFC 3987 much of an obstacle. 

> Nobody I worked
> with knew what an IRI was before we started working with RDF. It didn't
> matter then and it still doesn't matter now (unless you want to be
> extremely pedantic

Or you want to write in Chinese or Tagalog or Cyrillic. Or if you actually read specs. I am sure that many Web developers actually do read them, in fact, but you only hear the complaints from the others. 

> , which is a mistake when trying to convince new Web
> developers to use this stuff). You are not penalized when you stick an
> IRI instead of a URL in your web page in any way (or vice-versa). The
> difference doesn't matter to 99.999% of the people building and using
> the Web.

So Web developers who have worried about how to deal with IDN homograph security (for example) are less than 0.001% of the total. I guess maybe, but I kind of doubt it. I see more and more communications in non-Western alphabets these days.

> 
> Future Work on Merging URL with IRI
> -----------------------------------
> 
> Anne is working on this http://url.spec.whatwg.org/. Two of the goals are:
> 
> * Align RFC 3986 and RFC 3987 with contemporary implementations and
>  obsolete them in the process. (E.g. spaces, other "illegal" code
>  points, query encoding, equality, canonicalization, are all concepts
>  not entirely shared, or defined.) URL parsing needs to become as
>  solid as HTML parsing. [URI] [IRI]
> 
> * Standardize on the term URL. URI and IRI are just confusing. In
>  practice a single algorithm is used for both so keeping them distinct
>  is not helping anyone. URL also easily wins the search result
>  popularity contest.
> 
> The writing is on the wall. I suggest that the RDF WG move toward the
> URL terminology. I was attempting to start the ball rolling with the
> JSON-LD spec, or at least to future-proof the spec a bit. That
> failed. I hope there are others in both the JSON-LD CG and RDF WG that
> share this view. IRIs and URIs are dead, they just don't know it yet...
> long live the URL.

You might be right, but making the text of a standard into nonsense is not the right way to start a revolution. The fact is, right now, "URL" and "IRI" have different meanings. By all means include an explanation in the spec saying that "IRI just means modern URLs that use Unicode and satisfy RFC 3987" or the like. But don't pretend that RFC 3986 and 3987 are the same when in fact they aren't.
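
To make the difference concrete, here is a minimal Python sketch of the IRI-to-URI mapping that RFC 3987 describes in its Section 3.1: characters outside ASCII are converted to UTF-8 and each resulting octet is percent-encoded. The function names and the example.org address are invented for illustration, and the sketch is simplified on purpose: it encodes every non-ASCII character, leaves ASCII untouched, and ignores the optional IDNA treatment of host names. Treat it as a rough illustration under those assumptions, not as a conforming implementation of either RFC.

```python
def is_ascii_only(text: str) -> bool:
    """True if every character is in the ASCII range that RFC 3986 URIs are limited to."""
    return all(ord(ch) < 128 for ch in text)


def iri_to_uri(iri: str) -> str:
    """Simplified IRI-to-URI mapping (hypothetical helper, not a full RFC 3987 implementation).

    Each non-ASCII character is encoded as UTF-8 and every octet is
    written as %HH. ASCII characters are copied through unchanged.
    """
    pieces = []
    for ch in iri:
        if ord(ch) < 128:
            pieces.append(ch)
        else:
            pieces.extend(f"%{octet:02X}" for octet in ch.encode("utf-8"))
    return "".join(pieces)


iri = "http://example.org/r\u00e9sum\u00e9"  # path contains U+00E9 twice
uri = iri_to_uri(iri)

print(is_ascii_only(iri))  # False: the IRI is not a syntactically valid RFC 3986 URI as written
print(uri)                 # http://example.org/r%C3%A9sum%C3%A9
print(is_ascii_only(uri))  # True: the mapped form uses only ASCII
```

A consumer that treats "URL" as the ASCII-only syntax of RFC 3986 would reject the first string and accept the second, which is exactly where the two terms stop being interchangeable.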

Pat


> 
> -- manu
> 
> -- 
> Manu Sporny (skype: msporny, twitter: manusporny)
> Founder/CEO - Digital Bazaar, Inc.
> blog: The Problem with RDF and Nuclear Power
> http://manu.sporny.org/2012/nuclear-rdf/
> 

------------------------------------------------------------
IHMC                                     (850)434 8903 or (650)494 3973   
40 South Alcaniz St.           (850)202 4416   office
Pensacola                            (850)202 4440   fax
FL 32502                              (850)291 0667   mobile
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Wednesday, 16 January 2013 06:58:37 GMT
