RE: [Serial] I18N WG last call comments

Hello Michael,

Many thanks for your list. This is very helpful for further
analysis below.

At 20:53 04/05/24 +0100, Michael Kay wrote:

> >
> > >There
> > >are many ways that we allow XSLT stylesheets to generate
> > non-conformant
> > >HTML, and I don't see that this one is particularly
> > different from the
> > >others.
> >
> > Could you point to a list of these, or list (some of) them here?
> >
>
>For example:
>
>* you can produce elements and attributes that aren't defined in HTML
>
>* you can nest elements in ways that aren't allowed in HTML

Those two would be very difficult to check because they would make the
whole thing dependent on HTML versions,... They would require XSLT
processors to do actual validation, which would be a serious
additional requirement.


>* you can give attributes values that aren't allowed in HTML

These are difficult to check because in many cases, the
allowed values are only described in prose in the HTML spec.
Not even validation can express these.


>* you can use any system ID and public ID that you like in the doctype
>declaration

To quite some extent, this is a feature, to allow for new HTML versions.


>* you can use disable-output-escaping (or now character maps) to produce any
>kind of garbage that takes your fancy

Well, yes, but then you know you are off on your own. We have looked
at character maps, and although they can be misused, the description
and the intent is okay, and sometimes, they can be helpful. If somebody
*really* wants to output windows-1252 as iso-8859-1, they could actually
use charmaps.


>* you can suppress the escaping of URIs in URI-valued attributes

That may make sense because the user already escaped them.
And it's optional, not automatic.


>* you can suppress the generation of the META element defining the character
>encoding or generate your own that contains a value unrelated to the true
>character encoding

Again, optional; the XSLT programmer has to explicitly select it.


>All these features are occasionally useful either to exploit non-standard
>features in browsers, or to generate output designed for processing by
>software other than HTML browsers.

As we have charmaps, I think we are well-enough covered for the case
where somebody *really* wants to read in windows-1252 declared as
iso-8859-1, and output it in the same way. But if this is really the
strange wish of a user, they should have to explicitly do some work
to get there. The largest majority of users doesn't want to do
GIGO (garbage-in-garbage-out), and with a very simple and cheap
change to the spec and the implementations, we can help these
users become aware of integrity problems in their data.


Regards,    Martin.

Received on Tuesday, 1 June 2004 04:05:57 UTC