RE: [iri] #5: Separate IRI from "presentation of IRI" as concepts

> While it's true that this is a "transition", the question is whether 
> the presentation form is identical or is baked somehow. I tend to 
> favor the least amount of presentational sugar possible, with the 
> fundamental problem being that bidi URIs don't keep their elements 
> visually grouped "properly". And actually I thought that was Larry's 
> position (although he may have changed it).

Let me try to say what I meant in a different way, and see if this helps:

http://tools.ietf.org/html/rfc6365#page-24 
lists the terms "input method" and "rendering rules" and says

"A rendering rule is an algorithm that a system uses to decide how to display a string of text."

So, let's suppose that the browser community specified a rendering rule, "DISPLAY-IRI-TO-HUMAN-IN-ADDRESS-BAR", which gives a precise algorithm for showing a user an IRI in a browser "address bar" in a way that the user might write it down or input it later. (For example, DISPLAY-IRI-TO-HUMAN-IN-ADDRESS-BAR might suggest rendering http://tools.ietf.org/html/rfc6365#page-24 as tools.ietf.org/html/rfc6365#page-24, i.e., suggest that in an address bar the "http://" prefix need not be displayed, since it is the host name that the user should see. Perhaps there would be special rendering rules for Bidi IRIs.)
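
To make the idea concrete, here is a minimal sketch of what one step of such a rendering rule might look like. The function name is hypothetical and nothing here comes from any browser or specification; it only covers the "http://" example, and assumes the rule simply drops that prefix for display.

```python
def display_iri_in_address_bar(iri: str) -> str:
    """Hypothetical rendering rule: hide a leading "http://" for display.

    This is an illustration only, not a rule defined by any IRI spec.
    """
    prefix = "http://"
    # Compare case-insensitively, since the scheme is case-insensitive.
    if iri[:len(prefix)].lower() == prefix:
        return iri[len(prefix):]
    # Anything else (for example "https://") is shown unchanged.
    return iri


print(display_iri_in_address_bar("http://tools.ietf.org/html/rfc6365#page-24"))
```

Note that this operates only on what is shown to the user; the IRI itself is unchanged.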

URIs had a design goal of easy "transcription": http://tools.ietf.org/html/rfc3986#section-1.2.1 :

   The URI syntax has been designed with global transcription as one of
   its main considerations.  A URI is a sequence of characters from a
   very limited set: the letters of the basic Latin alphabet, digits,
   and a few special characters.  A URI may be represented in a variety
   of ways; e.g., ink on paper, pixels on a screen, or a sequence of
   character encoding octets.  The interpretation of a URI depends only
   on the characters used and not on how those characters are
   represented in a network protocol.

IRIs have gained other benefits at the cost of making transcription much more complex and difficult.

"global transcription": using only ASCII characters gave a common method of transcription for URI exchange; using more characters means there will be IRIs that are more meaningful, but transcription will be harder (especially for those not familiar with the language and characters displayed).

"A URI may be represented ...": for URIs, it was reasonable to treat "ink on paper" and "sequence of character encoding octets" as the same thing. For IRIs, to go from "ink on paper" to "sequence of character encoding octets" requires an "input method", and to go the other way requires "rendering rules". 

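One way to see why "ink on paper" no longer determines the octets is the sketch below. It is my own illustration, not something from the RFCs: the example.org IRI is made up, and Unicode normalization is used only as one example of the same visible text corresponding to different character sequences.

```python
import unicodedata

# A URI uses only ASCII, so characters and octets map one-to-one.
uri = "http://tools.ietf.org/html/rfc3986#section-1.2.1"
uri_octets = uri.encode("ascii")
assert uri_octets.decode("ascii") == uri

# Hypothetical IRI (example.org and the path are invented for illustration).
iri_composed = "http://example.org/caf\u00e9"  # precomposed e-acute, U+00E9
# The same visible text, written as "e" followed by combining acute U+0301.
iri_decomposed = unicodedata.normalize("NFD", iri_composed)

composed_octets = iri_composed.encode("utf-8")
decomposed_octets = iri_decomposed.encode("utf-8")

# Both forms would typically look the same on paper or on screen,
# yet they are different character sequences and different octets.
print(composed_octets[-2:])
print(decomposed_octets[-3:])
print(composed_octets == decomposed_octets)
```

Which of the two sequences a user produces when typing the printed form depends on the input method, which is the point of the paragraph above.
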
What I'm trying to do here in the IETF IRI working group is to be clear that the IRI specification does NOT specify rendering rules.

Received on Monday, 28 November 2011 17:05:11 UTC