Re: use of character entities (was: Re: Joint meeting at TPAC from HTML and i18n core WG minutes 2007-11-09)

On Nov 20, 2007, at 08:12, Martin Duerst wrote:

>> Validator checking entity reqs
>>
>>  Henri: I don't check that character entities are only used for
>>  characters that are unclear.
>>  ... because I can't tell mechanically whether the character is
>>  unclear
>
> I think you could tell mechanically if you had a list of these.

Yes, but there is no objective list at this time in the spec or  
normatively referenced by the spec. (And even if there were, I'm not  
convinced that checking would be a good idea.)
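For illustration, here is a minimal sketch of the list-based check suggested above. It assumes a hypothetical list of "unclear" characters, because no such list is normative. The names UNCLEAR, SYNTAX and unnecessary_refs are made up for this example and do not come from any spec or validator. The sketch uses Python's standard html module to decode each character reference. It then flags any reference whose decoded text is neither markup-significant nor on that hypothetical list.

```python
import html
import re
from html.entities import html5

# HYPOTHETICAL list of characters treated as "unclear" in source form.
# No such list is normative in the spec; this set is only an example.
UNCLEAR = {
    "\u00a0",  # NO-BREAK SPACE
    "\u00ad",  # SOFT HYPHEN
    "\u200b",  # ZERO WIDTH SPACE
    "\u200c",  # ZERO WIDTH NON-JOINER
    "\u200d",  # ZERO WIDTH JOINER
    "\u200e",  # LEFT-TO-RIGHT MARK
    "\u200f",  # RIGHT-TO-LEFT MARK
}

# Characters that are markup-significant and may legitimately be escaped.
SYNTAX = {"<", ">", "&", '"', "'"}

# Named (&amp;), decimal (&#64;) and hexadecimal (&#x40;) references.
CHAR_REF = re.compile(r"&(?:#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")


def unnecessary_refs(source: str) -> list[str]:
    """Return references whose decoded text is neither markup-significant
    nor in the hypothetical UNCLEAR list."""
    flagged = []
    for match in CHAR_REF.finditer(source):
        ref = match.group(0)
        if not ref.startswith("&#") and ref[1:] not in html5:
            continue  # unknown named reference: not this check's concern
        decoded = html.unescape(ref)
        if decoded in SYNTAX or decoded in UNCLEAR:
            continue
        flagged.append(ref)
    return flagged
```

For example, unnecessary_refs("caf&#xE9; &amp; &nbsp; &#x40;") returns ["&#xE9;", "&#x40;"]. The result is only as meaningful as the list the references are checked against.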

> The world may not collapse if you happen to occasionally ignore a  
> SHOULD. But then, that's why it's a SHOULD, not a MUST.

I think the bar for making conformance requirements against  
technically unnecessary but technically harmless things should be very  
high. Escaped characters produce exactly as good a DOM as unescaped
characters, so they are technically harmless except for the extra  
bytes transferred over the network. Making something like this a  
SHOULD devalues SHOULDs.
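A minimal sketch of the DOM-equivalence point follows. It uses Python's html.parser as a stand-in for a browser's HTML parser. That substitution is an assumption, since html.parser is not a full HTML5 tree builder. The escaped and literal spellings yield the same text, and the only difference is the number of bytes.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects character data; convert_charrefs=True decodes references."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.chunks: list[str] = []

    def handle_data(self, data: str) -> None:
        self.chunks.append(data)


def parsed_text(markup: str) -> str:
    """Return the text content the parser reports for a fragment."""
    parser = TextCollector()
    parser.feed(markup)
    parser.close()
    return "".join(parser.chunks)


escaped = "<p>caf&#xE9;</p>"  # character reference for U+00E9
literal = "<p>café</p>"        # the character itself

print(parsed_text(escaped) == parsed_text(literal))  # True
print(len(escaped.encode("utf-8")), len(literal.encode("utf-8")))  # 16 12
```

In UTF-8 the escaped form costs four extra bytes for this one character.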

> I think that on this issue, Bjoern Hoermann once threatened to create
> something like a validator that would produce an error message for
> each and every 'clear' character encoded as an entity.
>
> This would of course be very bad usability design.

Indeed.

> For users, it would first be much better if this produced a warning,

Suggesting warnings instead of errors is a typical way to cop out of  
considering which spec requirements really need to be requirements.  
Emitting a warning here would still devalue validator *messages* in  
general and would produce as much output for the user to read.
Changing errors to warnings doesn't improve usability. It potentially  
makes it worse since it means the user needs to think more.

> not an error (after all, it's just a SHOULD),

I disagree that SHOULD equals warning. SHOULDs are technical  
requirements and violations of technical requirements are errors. If a  
spec author wanted merely to document an aesthetic convention, SHOULD
is inappropriate.

I think it is OK to use warnings when:
  1) The author is doing something that might actually cause technical
harm and the validator developer would have wanted to emit an error  
but couldn't find spec text to back it up.
OR
  2) The situation genuinely requires human inspection to determine  
whether there is actual technical harm.

> and second, if the message was aggregated
> ("Warning: 200 unnecessary character entities detected, you may want
> to change them to actual characters (e.g. &#xABCD; -> @@).").

If you are the author, perhaps you had a reason to use escapes, such
as a limited input method or a wish to keep CMS source code all
ASCII in order to avoid issues with non-ASCII program code in
version control.

>>  Elika: Maybe you should go through the document and change the
>>  wording of should sentences that don't match RFC2119 to something
>>  else
>>
>>  Ishida: Well, we mean it that way for authors. Maybe we need to
>>  create different classes and explain which recommendations apply to
>>  which
>
> We already have these classes, don't we? That's the [S], [I], [C]
> indicators, or not? Of course, if we really got any of these wrong
> in Charmod fundamentals, we should fix it, but first, please check
> seriously whether there actually is a problem or not.

The way I see it is that a validator should check that a document  
meets requirements placed on content [C]. However, doing so for the
C047 and C048 requirements would devalue validation messages.

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/

Received on Tuesday, 20 November 2007 08:35:50 UTC