
RE: BIDI IRI Display (was spoofing and IRIs)

From: Larry Masinter <LMM@acm.org>
Date: Thu, 4 Mar 2010 07:53:50 -0800
To: "'Shawn Steele'" <Shawn.Steele@microsoft.com>, "'Slim Amamou'" <slim@alixsys.com>
Cc: <public-iri@w3.org>, "'Peter Constable'" <petercon@microsoft.com>, <unicode@unicode.org>
Message-ID: <002001cabbb2$e56206e0$b02614a0$@org>
Shawn, I don't think I was clear enough.

You say:
"The problem isn't an IRI in different contexts (a list of IRIs or
not), the problem is that an IRI *IS* a list."

No, I'm sorry. An IRI is a sequence of Unicode characters.

Some people may think of an IRI as a list, others think of an IRI as a
magical and meaningless incantation. But those things you're talking
about are higher level semantic interpretations.  

There are two processes in place:


(1) transform IRI as sequence of Unicode characters to visual
presentation

(2) transform IRI as (sequence of Unicode characters, interpreted as a
list) to visual presentation


What is *optimum* and *best* and *most accessible* and *user friendly*
for (2) may well be different than what is best for (1).

HOWEVER: I think it is more important that the results of
(1) and (2) be the SAME than it is that (2) be optimum.


If you disagree with that premise, then we can talk about what
is optimal for (2) and how we will mitigate the damage from
the possibility that (1) is different than (2).

I will note, in passing, that it *does* seem that in some browsers,
when you copy a *URI* from the browser address bar and paste it into
some other window, the spaces are converted to %20, i.e.,
there's at least a character-level transformation, which kind
of makes sense in context.
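
As a rough illustration of that kind of character-level transformation,
here is a minimal Python sketch. The function name encode_for_paste and
the example URI are hypothetical, and the set of characters left
untouched is an assumption for illustration, not a description of what
any particular browser does.

```python
from urllib.parse import quote

# Characters left as-is: URI delimiters plus "%" so that existing
# escapes are not encoded a second time (an assumed choice).
SAFE = ":/?#[]@!$&'()*+,;=%"

def encode_for_paste(uri):
    """Percent-encode spaces and other unsafe characters in a URI string."""
    return quote(uri, safe=SAFE)

print(encode_for_paste("http://example.com/my file.txt"))
```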

That is, there might be a separate kind of user interface
element which is an "IRI explanation", which doesn't use 
the normal Unicode -> visual display but instead has some
graphical representation based on showing the individual
parsed components of the IRI (oh, put the HOST in a red
box and the PATH in a blue box and the scheme in a tiny
font off to the right.)
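
A sketch of the data such an "IRI explanation" element might start
from, assuming Python's urllib.parse.urlsplit is an acceptable stand-in
for a real IRI parser. The function name explain_iri and the component
labels are hypothetical; how each labeled piece is drawn (boxes,
colors, font size) is left entirely to the user interface.

```python
from urllib.parse import urlsplit

def explain_iri(iri):
    """Return (label, text) pairs for the non-empty parsed components."""
    parts = urlsplit(iri)
    labeled = [
        ("scheme", parts.scheme),
        ("host", parts.hostname or ""),
        ("path", parts.path),
        ("query", parts.query),
        ("fragment", parts.fragment),
    ]
    # Drop components that are absent so the display only shows real parts.
    return [(label, text) for label, text in labeled if text]

for label, text in explain_iri("http://example.org/a/b?x=1"):
    print(label, text)
```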

I will also note that I think this is an area of "best practice"
that is likely to, and should be allowed to, evolve more rapidly
than the base IRI protocol element which we are trying to
define quickly. Best practice can vary from browser to
browser without any need to standardize it, as it is a
user interface element along with tabs and flashing history
lists, and in any case, I think it belongs in a separate
document. If UTR 36 is not that document, then I would
suggest putting it in a separate one.

IMHO,

Larry
--
http://larry.masinter.net


________________________________________
From: Larry Masinter [masinter@gmail.com] on behalf of Larry Masinter
[LMM@acm.org]
Sent: Wednesday, March 03, 2010 6:00 PM
To: Shawn Steele; 'Slim Amamou'
Cc: public-iri@w3.org; Peter Constable; unicode@unicode.org
Subject: RE: BIDI IRI Display (was spoofing and IRIs)

If the same Unicode string is used for an IRI in running text and for
an IRI in a context where it is used as an "ordered list", then it would
seem like

* the presentation of the IRI in different contexts is the same

is more important than

* the presentation of the IRI in known IRI contexts is optimal

Do you agree? I don't see how you can have both.

Larry
--
http://larry.masinter.net


-----Original Message-----
From: Shawn Steele [mailto:Shawn.Steele@microsoft.com]
Sent: Wednesday, March 03, 2010 9:13 AM
To: Slim Amamou; Larry Masinter
Cc: public-iri@w3.org; Peter Constable; (unicode@unicode.org)
Subject: RE: BIDI IRI Display (was spoofing and IRIs)

> An IRI is a sequence of Unicode characters. Is there not
> already a well-defined way of converting a sequence of
> Unicode characters to a visual display?

The problem (from my perspective at least) is that the Unicode BIDI
rules are somewhat "generic".  Unicode expects things like / and . to
be used in a context of same-script stuff, like a date, time or
number.  IRIs use them as delimiters for a list of elements (labels in
the domain name or folders in the path), in a hierarchical form.  The
Unicode BIDI algorithm doesn't recognize that there's an underlying
hierarchy, so it can end up "swapping" pieces in that hierarchy in
some cases.
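
One way to see the "generic" treatment is to look at the bidirectional
class each character carries in the Unicode Character Database. The
sketch below uses Python's unicodedata module; the helper name
bidi_classes and the sample string are hypothetical. It only reports
the per-character classes and does not run the BIDI algorithm itself.

```python
import unicodedata

def bidi_classes(text):
    """Pair each character with its Unicode bidirectional class."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]

# Latin letter, slash, Hebrew alef, full stop.
for ch, cls in bidi_classes("a/\u05d0."):
    print(repr(ch), cls)
```

Both / and . come back as CS (common separator), a weak class with no
strong direction of its own, while the letters are L or R. Nothing in
the class marks either delimiter as a boundary in a hierarchy.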

I'm not sure UTR#36 is the proper place to clarify display of such
ordered lists.  Proper BIDI rendering of IRIs isn't just a security,
but also a usability, problem.  It does seem like perhaps this concept
should be mentioned in Unicode somewhere.  (IRIs aren't the only place
that similar ordered lists happen).

-Shawn
Received on Thursday, 4 March 2010 15:54:33 UTC
