agenda+ UTS55, UAX31 and URLPattern (FW: [whatwg/urlpattern] Consider fully supporting RTL and Bidi URLs (#43))

The reply below to our comments merits some discussion before we respond.

I note that URLPattern (the spec we’re commenting on) is not the same thing as URL.

UTS55 is relatively new, created in response to the “Trojan Source” discussion. UAX31 is roughly a (hard to read) superset of our Charmod-Norm.

I think my reply to “what did you have in mind” would be to point out that the “users” of URLPattern are the authors of URL patterns, not end users of the resulting URLs, and that these authors need protection from bidi spoofing and reordering problems. However, specific recommendations need to be approached carefully. It’s worth discussing, and maybe looping in Robin Leroy and Mark Davis to explain the Unicode material.

Thanks,

Addison Phillips

Chair (W3C Internationalization WG)

Internationalization is not a feature.

It is an architecture.

From: Jeremy Roman <notifications@github.com> 
Sent: Wednesday, March 20, 2024 2:51 PM
To: whatwg/urlpattern <urlpattern@noreply.github.com>
Cc: Addison Phillips <addisonI18N@gmail.com>; State change <state_change@noreply.github.com>
Subject: Re: [whatwg/urlpattern] Consider fully supporting RTL and Bidi URLs (#43)

I spent some time today reading through UTS 55, UAX 31, and CHARMOD. I'm still not entirely clear on what your request/recommendation here is, so would appreciate more insight on that point.

We have a few things we'd like to be consistent with, including WHATWG URL, ECMA-262 regular expressions, and ECMAScript (JavaScript) -- the web technologies most likely to be used in combination with URL patterns.

The major thing you've raised, if I understand correctly, is the potential for RTL characters to lead to a misleading visual rendering of a URL pattern, though of course if you were to read the code point sequence directly there is no ambiguity.

As far as I can tell, URLs have the same treatment when not URL-encoded: https://example.com/%D9%85%D9%86%D8%AA%D8%AC/%D9%85%D8%B9%D8%B1%D9%81 renders as https://example.com/منتج/معرف, even though the encoded and rendered forms place the two path components in opposite visual orders (under the hood, though, URLs operate entirely in ASCII). At least, this doesn't seem to be a new issue for the platform. More importantly, I'm worried that some resolutions might end up being equally confusing by mismatching how URLs work.
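To make the logical order of the example URL concrete, here is a minimal TypeScript sketch that decodes each path segment in code point order. The helper name `decodePathSegments` is hypothetical and not part of any spec; only the standard `decodeURIComponent` function is assumed.

```typescript
// Hypothetical helper: decode the percent-encoded segments of a path in
// logical (code point) order, independent of how a renderer displays them.
function decodePathSegments(path: string): string[] {
  return path
    .split("/")
    .filter((segment) => segment.length > 0)
    .map((segment) => decodeURIComponent(segment));
}

// The path from the example URL above.
const segments = decodePathSegments(
  "/%D9%85%D9%86%D8%AA%D8%AC/%D9%85%D8%B9%D8%B1%D9%81"
);

// segments[0] is the first path component in logical order and segments[1]
// the second, even if a bidi-aware display shows them visually swapped.
console.log(segments.length, segments);
```

The array order always follows the code point sequence, which is the unambiguous reading mentioned above; only the visual rendering differs.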

One thing UAX 31 § 4.1.1 discusses is ignoring directional marks next to identifiers. That would let a source file be written so that it flows in a consistent direction, containing the bidi flow within a particular identifier or similar without intruding on direction-neutral syntax characters. But I'm not sure whether that's an improvement: it doesn't prevent misleadingly rendered patterns from being typed, editors may not emit the marks, and if everything else is RTL it may depart further from expectations (though I don't know enough about RTL languages to know for sure what the expectations are). It's also not permitted in the similar case of named capture groups in ECMAScript regular expressions, as far as I can tell.
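The point about ECMAScript named capture groups can be checked empirically. The following is a rough sketch, not a statement about what URLPattern itself does: the helper `isValidGroupName` is hypothetical, and it simply relies on the `RegExp` constructor throwing when a group name is not a valid identifier.

```typescript
// Hypothetical helper: report whether an ECMAScript engine accepts `name`
// as a named capture group name. The RegExp constructor throws a
// SyntaxError for names that are not valid identifiers.
function isValidGroupName(name: string): boolean {
  try {
    new RegExp("(?<" + name + ">x)");
    return true;
  } catch (_err) {
    return false;
  }
}

const LRM = "\u200E"; // LEFT-TO-RIGHT MARK, a directional mark

console.log(isValidGroupName("id")); // plain ASCII name
console.log(isValidGroupName("\u0645\u0639\u0631\u0641")); // Arabic letters
console.log(isValidGroupName(LRM + "id")); // directional mark before the name
console.log(isValidGroupName("id" + LRM)); // directional mark after the name
```

In a conforming engine the two names containing a directional mark should be rejected, while the ASCII and Arabic names are accepted.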

CHARMOD advises somewhat against allowing non-ASCII characters in "application internal identifiers", which these identifiers (but not the literals that appear in the URL) are. But ECMAScript and ECMAScript regular expressions do permit them, and disallowing non-ASCII doesn't seem like a great experience for speakers of those languages anyway.

Otherwise, we could certainly write a "security considerations" section that warns that it is possible to write URL patterns using non-ASCII characters which may have a misleading or confusing visual rendering. There's currently nothing that displays these to the user, so this wouldn't really have any normative effect -- just a warning to developers and a possible opportunity for tool authors.
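As an illustration of the "opportunity for tool authors", a lint-style check might flag pattern strings that could render misleadingly. This is only a sketch under stated assumptions: `findBidiRisks` is a hypothetical name, and the right-to-left ranges are a coarse approximation based on Unicode blocks rather than a real Bidi_Class lookup.

```typescript
// Explicit bidi formatting controls: ALM, LRM, RLM, the embedding and
// override controls (U+202A..U+202E), and the isolates (U+2066..U+2069).
const BIDI_CONTROLS = /[\u061C\u200E\u200F\u202A-\u202E\u2066-\u2069]/;

// Assumption: a coarse block-based approximation of right-to-left text
// (Hebrew through Arabic Extended-A, plus presentation forms). A real tool
// would consult Bidi_Class data instead.
const RTL_BLOCKS = /[\u0590-\u08FF\uFB1D-\uFDFF\uFE70-\uFEFC]/;

interface BidiRisks {
  hasBidiControls: boolean;
  hasRtlText: boolean;
}

// Hypothetical lint helper for a URL pattern source string.
function findBidiRisks(pattern: string): BidiRisks {
  return {
    hasBidiControls: BIDI_CONTROLS.test(pattern),
    hasRtlText: RTL_BLOCKS.test(pattern),
  };
}

console.log(findBidiRisks("/products/:id"));
console.log(findBidiRisks("/\u0645\u0646\u062A\u062C/:id"));
console.log(findBidiRisks("/a\u202Eb/:id"));
```

Such a check has no normative effect; it would only surface a warning to the developer writing the pattern.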

What did you have in mind?

—
View it on GitHub: <https://github.com/whatwg/urlpattern/issues/43#issuecomment-2010702265>

Received on Thursday, 21 March 2024 00:08:59 UTC