- From: James M Snell <jasnell@gmail.com>
- Date: Sat, 01 Dec 2007 21:25:45 -0800
- To: URI <uri@w3.org>
To go along with Joe's URI Template work, I've been working on support for IRI Templates. The key differences between URI and IRI Templates are a) the characters allowed within the {...} tokens and b) the pct-encoding rules. Whereas a URI Template is used to produce URIs, an IRI Template is used to produce IRIs.

As one can expect, there are a number of issues that can make working with IRI Templates more difficult than URI Templates. The most difficult issue is the handling of bidi characters. I've been working on some rules that I'd like to get some feedback on.

First, here's my ABNF production for IRI Templates:

  ivalue              = *(iunreserved / pct-encoded) ; replacement value for token
  iunreservedsansdash = (alphanum / "." / "_" / "~" / ucschar)
  iarg                = *(reserved / iunreserved / pct-encoded)
  ivarname            = iunreservedsansdash *(iunreserved)
  ivardefault         = ivalue
  ivar                = ivarname [ "=" ivardefault ]
  ivars               = ivar [*(sep ivar)]
  ivarnodefault       = ivarname
  ivarsnodefault      = ivarname [*(sep ivarname)]
  ioperator           = ( append "|" iarg "|" ivar ) /
                        ( prefix "|" iarg "|" ivar ) /
                        ( join "|" iarg "|" ivars ) /
                        ( listjoin "|" iarg "|" ivarnodefault ) /
                        ( opt "|" iarg "|" ivarsnodefault ) /
                        ( neg "|" iarg "|" ivarsnodefault ) /
                        ( extop "|" (iarg / range) "|"
                          (ivar / ivars / ivarnodefault / ivarsnodefault) )
  itoken              = "{" ( ivar / ioperator ) "}"
  itemplate           = *(reserved / ipchar / iprivate / itoken )
  itemplate-expansion = IRI / IRI-reference

Within this production, the ivar, ivalue and iarg productions can contain bidi characters. The rules for handling bidi chars in an IRI Template are:

1. IRI Templates MUST be stored and transmitted in logical order.

2. IRI Templates MUST be rendered using the Unicode bidi algorithm.

3. The entire IRI Template MUST be rendered as if it were in an LTR embedding (preceded by U+202A and followed by U+202C). This is the same as for IRIs as defined by RFC 3987. As with IRIs, there is no need to explicitly use this embedding if the template can be displayed properly without it.
4. Each pipe-delimited segment in the {...} token is treated as a separate component.

5. The first component (the op component) is always rendered LTR.

6. The second component (the arg component) is always rendered LTR, as if it were in an LTR override (preceded by U+202D and followed by U+202C). This ensures that the arg is always rendered in logical order (LTR), to avoid any possible confusion.

7. The third component (the var component) is segmented depending on the number of vars and specified default values. The following illustrates the segmentation:

  <LRM>var</LRM>=<LRO>default</LRO>,<LRM>var</LRM>=<LRO>default</LRO>

   Note that, like the arg component, the default is always rendered using an LTR override. This ensures that the default is always presented in logical order.

8. The IRI Template itself MUST NOT contain bidi formatting characters. An implementation may wish to provide a modified "for display" version of the IRI Template with bidi formatting characters inserted at appropriate locations in the template to ensure proper rendering, but those control characters MUST be removed prior to processing the template.

9. A component SHOULD NOT use both LTR and RTL characters.

10. A component using RTL characters SHOULD start and end with RTL characters.

To illustrate the effect this has on the template, imagine the following scenario. Assume that capital letters are RTL. I have a template whose logical ordering is:

  http://example.org?{-join|ABCD|EFGH=IJKL,MNOP=qrst}

(Yes, I know it's unlikely that the join separator will be a string of RTL characters, but I'm doing this to illustrate a point.)

Since the |, = and , characters are directionally neutral, without any bidi formatting the rendered template will end up looking something like:

  http://example.org?{-join|PONM,LKJI=HGFE|DCBA=qrst}

This is obviously incorrect and confusing. It can get even uglier if the arg and default have a mix of LTR and RTL characters.
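As a rough illustration of rules 6 through 8, here is a minimal Python sketch of how an implementation might build the "for display" version of a template and strip the formatting characters again before processing. The function names are hypothetical, and the sketch makes several assumptions that the rules above do not spell out: that <LRM>var</LRM> means a U+200E mark on each side of the variable name, that the var separator is ",", and that tokens are not nested.

```python
import re

LRM = "\u200e"  # LEFT-TO-RIGHT MARK
LRO = "\u202d"  # LEFT-TO-RIGHT OVERRIDE
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING

# Bidi formatting characters handled here: LRM, RLM, LRE, RLE, PDF, LRO, RLO.
BIDI_CONTROLS = re.compile("[\u200e\u200f\u202a-\u202e]")


def strip_bidi_controls(template):
    """Rule 8: remove bidi formatting characters prior to processing."""
    return BIDI_CONTROLS.sub("", template)


def _display_var(var):
    """Rule 7: bracket the name with LRM, wrap any default in an LTR override."""
    name, eq, default = var.partition("=")
    shown = LRM + name + LRM
    if eq:
        shown += "=" + LRO + default + PDF
    return shown


def _display_token(match):
    body = match.group(1)
    parts = body.split("|")
    if len(parts) != 3:
        # Assumption: anything without exactly two pipes is a plain {var} token.
        return "{" + _display_var(body) + "}"
    op, arg, variables = parts
    shown_vars = ",".join(_display_var(v) for v in variables.split(","))
    # Rule 6: the arg component is wrapped in an LTR override.
    return "{" + op + "|" + LRO + arg + PDF + "|" + shown_vars + "}"


def for_display(template):
    """Return a display-only copy of the template with bidi controls inserted."""
    clean = strip_bidi_controls(template)
    return re.sub(r"\{([^{}]*)\}", _display_token, clean)
```

Calling strip_bidi_controls() on the output of for_display() gives back the stored template, so the display form never needs to be stored or transmitted.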
By contrast, with the bidi rules applied, the template is rendered as:

  http://example.org?{-join|ABCD|HGFE=IJKL,PONM=qrst}

Notice that the only characters displayed RTL are the varnames. The arg and default components, both of which are treated as literal values to be inserted into the IRI, are displayed in the same logical order in which they are expected to be inserted into the IRI. Also note that each of the components appears in the proper order in the rendered template. There is no confusion or ambiguity in the template.

Have I missed anything?

- James
Received on Sunday, 2 December 2007 05:25:55 UTC