Re: [whatwg] several messages about the HTML syntax

From: fantasai <fantasai.lists@inkedblade.net>
Date: Tue, 22 Jul 2014 08:47:47 -0700
Message-ID: <53CE87A3.4030006@inkedblade.net>
To: Ian Hickson <ian@hixie.ch>
CC: whatwg@whatwg.org, "public-html@w3.org WG" <public-html@w3.org>

On 03/02/2008 03:02 PM, Ian Hickson wrote:
>
> On Tue, 31 Jul 2007, Philip Taylor wrote:
>>
>> IE undocumentedly recognises some which nobody else does:
>>
>> aafs    U+206D  ACTIVATE ARABIC FORM SHAPING
>> ass     U+206B  ACTIVATE SYMMETRIC SWAPPING
>> iafs    U+206C  INHIBIT ARABIC FORM SHAPING
>> iss     U+206A  INHIBIT SYMMETRIC SWAPPING
>> lre     U+202A  LEFT-TO-RIGHT EMBEDDING
>> lro     U+202D  LEFT-TO-RIGHT OVERRIDE
>> nads    U+206E  NATIONAL DIGIT SHAPES
>> nods    U+206F  NOMINAL DIGIT SHAPES
>> pdf     U+202C  POP DIRECTIONAL FORMATTING
>> rle     U+202B  RIGHT-TO-LEFT EMBEDDING
>> rlo     U+202E  RIGHT-TO-LEFT OVERRIDE
>> zwsp    U+200B  ZERO WIDTH SPACE
>>
>> (I believe that list is complete.)
>>
>> The first eleven were suggested on
>> https://listserv.heanet.ie/cgi-bin/wa?A2=ind9605&L=html-wg&P=4579 some
>> time ago but don't seem to have gone very far (except into IE).
>>
>> I can see some legitimate users at
>> <http://www.tasb.com/services/field/staff/index.aspx?print=true> and
>> <http://www.pelesoft.co.il/> and maybe there's a few dozen or hundred
>> more elsewhere (but I can't measure it easily). There's some in text-art
>> at <http://yy28.60.kg/test/read.cgi/maido3/1096370177/l50> and quite a
>> lot in weird places like
>> <http://cheese.2ch.net/life/kako/1010/10103/1010391447.html> or
>> <http://zerosen52.gozaru.jp/log/1093422333.html> that I don't understand
>> but that seem to all be on 2channel (or copied from it). I've no idea
>> how common they are in general.
>>
>> Are these used significantly on the web, or would they be considered
>> highly useful if anyone knew they existed, or should HTML5 just ignore
>> them?
>
> I'm very skeptical about introducing entities for the codes that are
> redundant with dir="" and <bdo> (namely, lre, lro, pdf, rle, rlo).

I agree 100% with this rationale.

> I don't know enough about the others to have an educated opinion. I can
> set up a search to examine the data in more detail.

I don't know much about the others, but I can provide some info on ZWSP.
It is (as specced) equivalent to <wbr>. Specifically, it
   * defines a word break (line-breaking opportunity)
   * thereby breaks Arabic joining

For contrast:
   ZWSP - Breaks a word (and therefore also Arabic joining) with no visible space.
   ZWJ  - Not a word break. Forces joining behavior.
   ZWNJ - Not a word break, but breaks joining.
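
To make the three characters above concrete, here is a small Python sketch
that looks up their code points and formal Unicode names. The dictionary
name ZERO_WIDTH and the helper describe() are made up for illustration;
only the standard-library unicodedata module is assumed.

```python
import unicodedata

# The three zero-width characters contrasted above, keyed by abbreviation.
ZERO_WIDTH = {
    "ZWSP": "\u200b",  # word break, no visible space
    "ZWNJ": "\u200c",  # no word break, breaks joining
    "ZWJ": "\u200d",   # no word break, forces joining
}


def describe(abbrev):
    """Return 'U+XXXX NAME' for one of the abbreviations in ZERO_WIDTH."""
    ch = ZERO_WIDTH[abbrev]
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


if __name__ == "__main__":
    for abbrev in ZERO_WIDTH:
        print(abbrev, describe(abbrev))
```

Note that this only shows identity, not behavior: line breaking and joining
are decided by the layout engine, which unicodedata does not model.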

ZWJ and ZWNJ are primarily useful for Arabic and other shaped scripts. I'm
not sure of the common uses for ZWJ, but ZWNJ is frequently used in Persian
to visually separate grammatical prefixes from the rest of the word (without
breaking the word or introducing extra space).

ZWSP is more likely to be used in Thai and related scripts, to define word
boundaries. Thai does not use spaces between words, so break opportunities
need to either be marked with ZWSP or found with a dictionary. Even in the
presence of automatic dictionary-breaking, however, there are ambiguous
cases which will need ZWSP to show the correct break-point.
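
As a rough illustration of the ambiguity point, the Python sketch below
treats each ZWSP as an explicit break opportunity. The function name
break_segments() is hypothetical, and the sample string is the commonly
cited Thai example that can be read either as "round eyes" or as "exposed
to wind"; treat the glosses as my assumption rather than an authoritative
reading.

```python
ZWSP = "\u200b"  # ZERO WIDTH SPACE


def break_segments(text):
    """Split text at explicit ZWSP break opportunities.

    This only honours author-supplied ZWSP marks; it does no
    dictionary-based segmentation at all.
    """
    return text.split(ZWSP)


# The same visible string, marked up two different ways by the author.
round_eyes = "ตา" + ZWSP + "กลม"     # assumed gloss: "eye" + "round"
exposed_wind = "ตาก" + ZWSP + "ลม"   # assumed gloss: "expose" + "wind"

if __name__ == "__main__":
    print(break_segments(round_eyes))
    print(break_segments(exposed_wind))
    # Without the ZWSP the two strings are identical.
    print(round_eyes.replace(ZWSP, "") == exposed_wind.replace(ZWSP, ""))
```

Both marked-up strings render identically once the ZWSP is removed, so only
the author's ZWSP tells a line breaker which segmentation was intended.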

There's further discussion about this in
   https://www.w3.org/Bugs/Public/show_bug.cgi?id=13108
I have no comment on the concerns about compatibility with XML, but I can
say that I've typed &zwsp; multiple times expecting it to work, and I find
it surprising that &zwj; and &zwnj; work but &zwsp; does not...
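
The asymmetry can be reproduced outside a browser. The sketch below uses
Python's standard html.unescape(), whose table follows the HTML5 named
character references; it reflects that library's table, not any particular
browser. The helper name expand() is made up for illustration.

```python
import html


def expand(reference):
    """Expand a character reference using the HTML5 named-reference table."""
    return html.unescape(reference)


if __name__ == "__main__":
    for ref in ("&zwj;", "&zwnj;", "&zwsp;", "&ZeroWidthSpace;", "&#x200B;"):
        result = expand(ref)
        if result == ref:
            # Unknown names are left untouched by html.unescape().
            print(f"{ref:18} not recognised")
        else:
            print(f"{ref:18} U+{ord(result):04X}")
```

In that table U+200B is reachable through the longer name &ZeroWidthSpace;
or a numeric reference, but not through &zwsp;.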

~fantasai
Received on Tuesday, 22 July 2014 15:48:21 UTC