Re: Validating XHTML5 with XML entities

Hi Jeff,

On Aug 28, 2008, at 12:12 AM, Jeff Schiller wrote:

>
> Robert,
>
> I get it now - and I agree it's a shame that non-numeric entities are
> not treated opaquely in XML (but even namespaces require a prefix
> declaration to avoid yellow screens of death).

They shouldn't for XML well-formedness, but without the declaration  
(either in the document or in the schema or prose defining the  
document), the document would not be processed as an XML namespaces  
document: simply as an XML document with many element and attribute  
names with colons. Again, though, those default declarations could  
be made by HTML5 (for the XML serialization or even for the text/html  
serialization if we supported namespaces). So that means authors could  
simply write <math:cn> and its mapping to the MathML vocabulary would  
be available by default. Authors would still be able to set  
xmlns:math='<someotheruri>' on the root element to override those  
default declarations, but HTML5 could provide a collection of default  
declarations for easy authoring (MathML, SVG, RDF, XMP, XForms, XLink,  
XInclude, etc.). Every HTML5 XML serialized (or even text/html  
serialized) document would automatically be processed as an XML  
namespaces document by HTML5 UAs. Moreover, they would do it with no  
need for authors to declare xmlns prefixes.
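
To make the well-formedness point concrete, here is a rough sketch using
Python's standard library parsers as stand-ins (my choice for illustration,
not anything HTML5 specifies). The helper names are hypothetical. It shows
that an undeclared prefix is fine for a plain XML parser, which just sees a
name containing a colon, but is an error once namespace processing is on:

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat

# Same element, without and with the prefix declaration.
UNDECLARED = "<math:cn>42</math:cn>"
DECLARED = '<math:cn xmlns:math="http://www.w3.org/1998/Math/MathML">42</math:cn>'


def plain_xml_names(doc):
    """Parse with namespace processing off; return the raw element names."""
    names = []
    parser = expat.ParserCreate()  # no namespace_separator, so namespaces are off
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.Parse(doc, True)
    return names


def namespace_aware_tag(doc):
    """Parse with namespace processing on; return the root tag, or None on error."""
    try:
        return ET.fromstring(doc).tag
    except ET.ParseError:  # e.g. "unbound prefix"
        return None


print(plain_xml_names(UNDECLARED))      # ['math:cn']
print(namespace_aware_tag(UNDECLARED))  # None
print(namespace_aware_tag(DECLARED))    # {http://www.w3.org/1998/Math/MathML}cn
```

A UA with built-in default declarations would behave as if the third case
applied to the first document.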

> So in your proposal HTML-aware UAs will inherently know about the HTML
> character entities while standalone parsers will need the help of a
> DTD that someone will have to write?  In essence, browsers will have
> no need of a DTD.

Exactly. The major browsers that process XHTML (all but IE) already do  
this. It would be contrary to our stated process and design principles  
to overlook that and not make it part of HTML5.
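
For the standalone-parser side of this, a minimal sketch (again with
Python's standard library as a stand-in, and hypothetical helper names):
a generic XML parser with no DTD rejects a named HTML entity, while the
equivalent numeric character reference parses fine. The name table in
html.entities plays the role of the knowledge a browser has built in:

```python
import xml.etree.ElementTree as ET
from html.entities import name2codepoint


def parses(doc):
    """Return True if a generic XML parser accepts the document."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:  # e.g. "undefined entity"
        return False


def to_numeric(name):
    """Turn an HTML entity name into a numeric character reference."""
    return "&#%d;" % name2codepoint[name]


named = "<p>a&nbsp;b</p>"
numeric = "<p>a%sb</p>" % to_numeric("nbsp")

print(parses(named))    # False: no DTD declares nbsp
print(numeric)          # <p>a&#160;b</p>
print(parses(numeric))  # True
```

This is only meant to show the gap a DTD (or built-in table) has to fill,
not how any particular browser implements it.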

> Personally, I hate the concept of named XML entities in the first
> place and would prefer numeric character references throughout (and
> yes, ideally I'd like to use the Unicode characters directly in the
> markup - but what I'm dealing with is 'WordPress chrome', not content
> that I write).

Well, I can understand that. It's not a very robust method of  
transclusion. My view is that WordPress should do the right thing and  
stick with a UTF encoding and use literal characters, while HTML5 should do the  
right thing and accommodate authors and authoring tools who don't  
follow that advice.

Take care,
Rob

Received on Wednesday, 27 August 2008 21:41:00 UTC