Re: HTML 5 removed "numeric character reference" term - why?

On Fri, 22 Jun 2007, Mike Brown wrote:
> Ian Hickson wrote:
> >
> > I agree that the spec is somewhat walking all over the SGML spec's 
> > terms for this stuff.
> > 
> > The problem is I need a term that means both "character entity 
> > references" and "numeric character references" and isn't unwieldy. 
> > Any suggestions? Right now the spec uses the term "character entity 
> > references" in the "writing html" section and just "entities" in the 
> > "parsing html" section.
> 
> "Character references" seems sufficiently general to me.
> 
> When introducing the term, just mention that it subsumes the concepts of 
> both "character entity references" and "numeric character references" 
> from SGML, HTML and XML.

On Fri, 22 Jun 2007, David Håsäther wrote:
> 
> I've never seen the term "character entity reference" referring to a 
> character reference with a single entity name. HTML 4 uses the term to 
> refer to character references[1]. [...]
>
> The thing is that the spec now uses "character entity reference" to refer
> to both character references and entity references (which should be clear
> by now). So naming them just "character references" would not include entity
> references at all.
 
Ok...

Note that what the HTML5 spec has are not what SGML and XML have. In 
HTML5, there are three things:

   &foo; - a way to include a character by name
   &#99; - a way to include a codepoint by decimal number
   &#x9; - a way to include a codepoint by hexadecimal number

There are no DTDs, so these aren't entities. All three are merely 
equivalent ways of doing character escapes. They're the equivalent of the 
CSS construct starting with a backslash: "\99".
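
To illustrate the equivalence (this is not spec text, just a sketch): 
Python's standard html.unescape() decodes all three spellings, so it can 
show that a name, a decimal number and a hexadecimal number all produce 
the same character. The choice of U+00A9 is only an example.

```python
import html

# Three spellings of the same character, U+00A9 COPYRIGHT SIGN.
named = html.unescape("&copy;")         # by name
decimal = html.unescape("&#169;")       # by decimal number
hexadecimal = html.unescape("&#xA9;")   # by hexadecimal number

# All three escapes yield the identical one-character string.
assert named == decimal == hexadecimal == "\u00a9"
```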


> > The problem is I need a term that means both "character entity 
> > references" and "numeric character references" [...]
> 
> ... and hex character references I presume.
> 
> Why do they have to fall under the same category?

Well, in the parser they're all handled by the same state in the state 
machine, and in the syntax they are always allowed together. So if they 
had separate names, I'd always be saying "foo and bar" which is just a 
pain.
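
A rough sketch of what "handled by the same state" means in practice, 
assuming a much simplified model: one routine recognises all three forms 
and only the final lookup differs. This is not the HTML5 tokenizer 
algorithm. The names NAMED, ESCAPE and unescape are hypothetical, the 
name table is a tiny stand-in, and a terminating semicolon is required 
here to keep the example short.

```python
import re

# Hypothetical, tiny name table; a real table is far larger.
NAMED = {"amp": "&", "lt": "<", "gt": ">", "copy": "\u00a9"}

# One pattern covers hexadecimal, decimal and named escapes.
ESCAPE = re.compile(
    r"&(?:#[xX]([0-9A-Fa-f]+)|#([0-9]+)|([A-Za-z][A-Za-z0-9]*));"
)

def _decode(match):
    """Turn one matched escape into its character, or leave it as-is."""
    hex_digits, dec_digits, name = match.groups()
    if name is not None:
        # Unknown names are left untouched.
        return NAMED.get(name, match.group(0))
    code = int(hex_digits, 16) if hex_digits is not None else int(dec_digits)
    # Numbers outside the Unicode range are left untouched.
    return chr(code) if code <= 0x10FFFF else match.group(0)

def unescape(text):
    """Replace every recognised escape in text with its character."""
    return ESCAPE.sub(_decode, text)
```

For example, unescape("&copy; &#169; &#xA9;") gives the copyright sign 
three times, while unescape("&bogus;") comes back unchanged.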

I'd be happy to use the term "character escapes" or some such. But then 
I'm also happy to use the word "entity". So those of you who care about 
what this is called should decide on some term and let me know what to put 
in the spec.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Friday, 22 June 2007 09:30:29 UTC