Re: several messages about New Vocabularies in text/html

On Sat, 5 Apr 2008, David Carlisle wrote:
> > 
> > Is there some permanent URI at which the absolute latest unicode.xml 
> > file (the one from which that document is created) can always be 
> > found? (I don't mind if it's not in w3.org space, in case you edit the 
> > document elsewhere where it would be more up to date; it's just a 
> > reliably up-to-date URI that I'm looking for.)
> 
> The one linked to from the document at
>
>    http://www.w3.org/2003/entities/2007xml/
>
> is (always) the latest version of the file (and of the stylesheets used 
> to extract information from other sources into that file, and from that 
> file into the document's tables and DTD entity declarations).

Awesome.

What is the intent of the STIX set? It has some <entity> entries with 
empty id="" attributes, and has clashes with the other sets:

 entity lowast is associated both with U0204E and U02217
 entity blank is associated both with U02422 and U02423
 entity cudarrr is associated both with U02935 and U02939
 entity olarr is associated both with U021BA and U02940
 entity orarr is associated both with U021BB and U02941
 entity larrpl is associated both with U02939 and U02946
 entity veebar is associated both with U022BB and U02A61
 entity Lt is associated both with U0226A and U02AA1
 entity Gt is associated both with U0226B and U02AA2
 entity fnof is associated both with U00192 and U1D453
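
For reference, clashes like the ones above can be found with a short script. This is only a sketch: the element nesting (<entity> as a child of <character>), the root element name, and the "set" values in the sample are assumptions made for illustration, not the actual layout of unicode.xml, and find_clashes is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample data. The nesting and the "set" names are assumptions;
# only the entity names and character ids come from the list above.
SAMPLE = """
<unicode>
  <character id="U0204E"><entity id="lowast" set="example-set-a"/></character>
  <character id="U02217"><entity id="lowast" set="example-set-b"/></character>
  <character id="U02422"><entity id="blank" set="example-set-a"/></character>
  <character id="U000E6"><entity id="" set="example-set-a"/></character>
</unicode>
"""


def find_clashes(xml_text):
    """Return (clashes, empty_ids).

    clashes maps an entity name to the sorted list of character ids it is
    attached to, for names attached to more than one character.
    empty_ids lists the character ids that carry an <entity id="">.
    """
    by_name = defaultdict(set)
    empty_ids = []
    root = ET.fromstring(xml_text)
    for character in root.iter("character"):
        code = character.get("id")
        for entity in character.iter("entity"):
            name = entity.get("id", "")
            if not name:
                empty_ids.append(code)
                continue
            by_name[name].add(code)
    clashes = {n: sorted(c) for n, c in by_name.items() if len(c) > 1}
    return clashes, empty_ids


clashes, empty_ids = find_clashes(SAMPLE)
for name, codes in sorted(clashes.items()):
    print(f"entity {name} is associated both with {' and '.join(codes)}")
```

On the sample data this reports only lowast as a clash, since blank appears once, and records U000E6 as having an entity with an empty id.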

Apart from the STIX entities, is the idea that any modern specification, 
for maximum compatibility, would just support all the entities defined, or 
are there other sets that should be avoided?


HTML5 has the following entities (supported for legacy reasons) which are 
not in the unicode.xml file:

 AMP   U+00026
 COPY  U+000A9 
 GT    U+0003E
 LT    U+0003C
 QUOT  U+00022 
 REG   U+000AE
 TRADE U+02122 

Would it be possible to add these?

In HTML5, we changed the mappings for &lang; and &rang;. The legacy 
mappings of these two entities are to characters that are defined as 
canonically equivalent to CJK wide characters. The new mappings are:

 lang  U+027E8
 rang  U+027E9

Is it possible to change these?
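
To illustrate the problem with the legacy mappings, here is a small check using Python's unicodedata module. The legacy code points (U+2329 and U+232A) are taken from the HTML 4 entity list rather than from this message, and survives_normalization is a hypothetical helper name.

```python
import unicodedata

# Legacy code points as defined in HTML 4 (not stated in the message itself).
LEGACY = {"lang": "\u2329", "rang": "\u232A"}
# New mappings as given in the message.
HTML5 = {"lang": "\u27E8", "rang": "\u27E9"}


def survives_normalization(ch):
    """True if the character is unchanged by both NFC and NFD."""
    return all(unicodedata.normalize(form, ch) == ch for form in ("NFC", "NFD"))


for name in ("lang", "rang"):
    old, new = LEGACY[name], HTML5[name]
    normalized = unicodedata.normalize("NFC", old)
    print(
        f"{name}: legacy U+{ord(old):05X} normalizes to U+{ord(normalized):05X}; "
        f"new U+{ord(new):05X} stable: {survives_normalization(new)}"
    )
```

Any normalizing step in a processing pipeline would silently turn the legacy characters into the CJK angle brackets, while the new mappings pass through unchanged.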

HTML5 also has to support a number of entities without the normally 
required trailing semicolon. For example, &AElig; can also be written 
&AElig in text/html, and it still works (though it isn't valid). I propose 
to support these independently of unicode.xml, as part of the HTML5 spec.
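
As a quick illustration of the semicolon-less behaviour, Python's html.unescape can be used. Note the assumption here: that function follows the entity table of the later, finished HTML5 spec, which may not match the state of the draft described in this message.

```python
import html

# A legacy entity resolves with or without the trailing semicolon.
with_semicolon = html.unescape("&AElig;")
without_semicolon = html.unescape("&AElig")

# An entity outside the legacy set is only recognised with its semicolon,
# so the semicolon-less form is left untouched.
alpha_with = html.unescape("&alpha;")
alpha_without = html.unescape("&alpha")

print(repr(with_semicolon), repr(without_semicolon))
print(repr(alpha_with), repr(alpha_without))
```

The point of the sketch is that the semicolon-less forms are a fixed, separate list rather than something derived from the full entity table, which is why handling them outside unicode.xml seems reasonable.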

I've now added the ~2000 entities in the unicode.xml file to the HTML5 
spec. The list in the spec will automatically stay in sync with 
unicode.xml as the file changes. The real question will be whether we 
actually want to support all these new entities in text/html.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Sunday, 6 April 2008 01:10:45 UTC