use of character entities (was: Re: Joint meeting at TPAC from HTML and i18n core WG minutes 2007-11-09) from Martin Duerst on 2007-11-20 (public-html@w3.org from November 2007)

From: Martin Duerst <duerst@it.aoyama.ac.jp>
Date: Tue, 20 Nov 2007 15:12:32 +0900
To: public-html@w3.org, public-i18n-core@w3.org
Message-Id: <6.0.0.20.2.20071120145813.066d04d0@localhost>

Dear I18N WG, HTML WG,

This mail contains some comments about the minutes of the joint
HTML WG and I18N Core WG meeting at TPAC on 2007-11-09.

>Validator checking entity reqs
>
>   Henri: I don't check that character entities are only used for
>   characters that are unclear.
>   ... because I can't tell mechanically whether the character is
>   unclear

I think you could tell mechanically if you had a list of these.
Obviously, nobody ever cared to come up with a suitable list,
but it wouldn't be too difficult to come up with something
reasonable based on some Unicode character properties,
starting with things such as "all white space except SP",
and so on.

So my point here is not that it's not testable, it's that it
isn't fully specified.
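
As a rough illustration of how such a list could be derived from
Unicode properties, here is a minimal Python sketch. The only rule
taken from this mail is "all white space except SP"; the extra general
categories (controls, format characters, combining marks) and the name
is_unclear are assumptions made for the example, not anything Charmod
defines.

```python
import unicodedata

# General categories treated as "unclear" when written literally.
# Assumption for illustration only: Cc = control, Cf = format,
# Mn/Me = non-spacing and enclosing combining marks.
UNCLEAR_CATEGORIES = {"Cc", "Cf", "Mn", "Me"}


def is_unclear(ch: str) -> bool:
    """Return True if ch is hard to identify when typed as a literal."""
    if ch == " ":
        # Plain SP (U+0020) is explicitly excluded from the white space rule.
        return False
    # str.isspace() covers NBSP, the various width spaces, line
    # separators and so on.
    return ch.isspace() or unicodedata.category(ch) in UNCLEAR_CATEGORIES
```

With these assumptions, NO-BREAK SPACE (U+00A0) and ZERO WIDTH SPACE
(U+200B) count as unclear, while "a" and SP do not. A real list would
need review by the WG rather than a handful of categories.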


>   Ishida explain that this part of charmod is about best practices
>
>   it's not should in the normative sense

Richard, where did you get this from? The character model is very
clear about what SHOULD means. It's used in the IETF sense, and
it means: do it unless you have a good reason not to do it.

What is true is that the Character Model tends to err on the side
of strictness rather than laziness in some cases. The world may not
collapse if you happen to occasionally ignore a SHOULD. But then,
that's why it's a SHOULD, not a MUST.


I think that on this issue, Bjoern Hoermann once threatened to create
something like a validator that would produce an error message for
each and every 'clear' character encoded as an entity.

This would of course be very bad usability design. For users, it would
first be much better if this produced a warning, not an error (after
all, it's just a SHOULD), and second, if the message was aggregated
("Warning: 200 unnecessary character entities detected, you may want
to change them to actual characters (e.g. &#xABCD; -> @@).").
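
A minimal Python sketch of such an aggregated message might look as
follows. Everything in it is hypothetical: the function names, the
regular expression, and the rule for what counts as 'unclear' (white
space other than SP plus a few general categories) are assumptions for
illustration, not the behaviour of any existing validator. References
to markup-significant characters such as &amp; and &lt; are skipped
because they cannot be replaced by the literal character.

```python
import html
import re
import unicodedata

# Numeric (decimal or hex) and named character references.
REFERENCE = re.compile(r"&(?:#[0-9]+|#[xX][0-9A-Fa-f]+|[A-Za-z][A-Za-z0-9]*);")
# Characters that must stay escaped in markup.
SYNTAX_CHARS = set("<>&\"'")


def is_unclear(ch):
    # Assumed rule: white space other than SP, controls, format
    # characters and combining marks are legitimately written as references.
    if ch == " ":
        return False
    return ch.isspace() or unicodedata.category(ch) in {"Cc", "Cf", "Mn", "Me"}


def unnecessary_references(text):
    """Collect (reference, character) pairs for 'clear' characters."""
    found = []
    for match in REFERENCE.finditer(text):
        ref = match.group(0)
        decoded = html.unescape(ref)
        if decoded == ref or len(decoded) != 1:
            continue  # unknown or multi-character reference: not our job
        if decoded in SYNTAX_CHARS or is_unclear(decoded):
            continue  # the reference is justified
        found.append((ref, decoded))
    return found


def aggregated_warning(text):
    """Return one warning for the whole document, or None."""
    found = unnecessary_references(text)
    if not found:
        return None
    ref, ch = found[0]
    return (f"Warning: {len(found)} unnecessary character entities detected, "
            f"you may want to change them to actual characters "
            f"(e.g. {ref} -> {ch}).")
```

The point of the design is that the user sees one message with a count
and a single example, instead of one message per reference.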


>   Elika: Maybe you should go through the document and change the
>   wording of should sentences that don't match RFC2119 to something
>   else
>
>   Ishida: Well, we mean it that way for authors. Maybe we need to
>   create different classes and explain which recommendations apply to
>   which

We already have these classes, don't we? Aren't those the [S], [I], [C]
indicators? Of course, if we really got any of these wrong
in Charmod fundamentals, we should fix it, but first, please check
seriously whether there actually is a problem or not.


>   <fsasaki> [13]http://hsivonen.iki.fi/charmod-norm-checking/
>
>     [13] http://hsivonen.iki.fi/charmod-norm-checking/
>
>   Henri: I documented which constructs in HTML5 result in a continuous
>   string
>   ... I don't have any other comment there except that I wrote this
>   and it is available :)

Interesting work!

Regards,   Martin.



#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst@it.aoyama.ac.jp

Received on Tuesday, 20 November 2007 06:14:03 UTC