Re: XHTML attributes vs whitespace

Julian Reschke wrote:
> Henri,
> 
> in
> 
> <http://www.w3.org/Bugs/Public/show_bug.cgi?id=9965#c12>
> 
> you say:
> 
> "With <!DOCTYPE html>, all attributes are CDATA attributes."
> 
> Could you elaborate how you came to that conclusion?

From http://www.w3.org/TR/REC-xml/#AVNormalize:
"All attributes for which no declaration has been read SHOULD be treated by a non-validating processor as if declared CDATA."
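A small sketch of what this rule means in practice, using Python's standard xml.etree.ElementTree (which sits on top of the non-validating expat parser). The same attribute value is parsed once under a bare <!DOCTYPE html> and once under a hypothetical internal subset that declares class as NMTOKENS. Assuming expat applies the attribute-value normalization rules linked above, only the declared case should trim and collapse spaces.

```python
import xml.etree.ElementTree as ET

# Leading/trailing spaces, a run of spaces and a tab inside the value.
VALUE = "  a   b\tc  "


def class_attr(doctype: str) -> str:
    """Parse a tiny document with the given DOCTYPE and return p/@class."""
    doc = f'{doctype}<html><p class="{VALUE}"/></html>'
    return ET.fromstring(doc).find("p").get("class")


# Bare <!DOCTYPE html>: no declaration for "class" is read, so it is
# treated as CDATA. Each whitespace character becomes one space; nothing
# is trimmed or collapsed.
undeclared = class_attr("<!DOCTYPE html>")

# Hypothetical internal subset declaring class as NMTOKENS: the value is
# now also trimmed and runs of spaces are collapsed.
declared = class_attr("<!DOCTYPE html [<!ATTLIST p class NMTOKENS #IMPLIED>]>")

print(repr(undeclared))  # '  a   b c  '
print(repr(declared))    # 'a b c'
```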

The type of an attribute can only be declared by the bit of syntax called AttType in the BNF. Follow the BNF upwards from AttType: AttDef, AttlistDecl, markupdecl. From markupdecl upwards, the possibilities are extSubsetDecl and intSubset. From intSubset going upwards we get to doctypedecl. From extSubsetDecl upwards, we get to extSubset or includeSect. From includeSect we get to conditionalSect, and from conditionalSect back to extSubsetDecl, which closes a loop and adds no new starting point.

Thus, we have two root cases to consider: doctypedecl and extSubset.
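The upward walk can be checked mechanically. The sketch below is a hand-transcribed excerpt of the XML 1.0 grammar, inverted so that each production maps to the productions that use it; only the edges followed in the argument are included, and doctypedecl is treated as a root as in the text (its real parent is prolog). A visited set keeps the extSubsetDecl/includeSect/conditionalSect cycle from looping forever.

```python
from collections import deque

# Production -> productions whose right-hand side references it
# (excerpt of XML 1.0, transcribed by hand for this illustration).
USED_IN = {
    "AttType": ["AttDef"],
    "AttDef": ["AttlistDecl"],
    "AttlistDecl": ["markupdecl"],
    "markupdecl": ["extSubsetDecl", "intSubset"],
    "intSubset": ["doctypedecl"],
    "extSubsetDecl": ["extSubset", "includeSect"],
    "includeSect": ["conditionalSect"],
    "conditionalSect": ["extSubsetDecl"],  # cycle back to extSubsetDecl
    "doctypedecl": [],                     # treated as a root here
    "extSubset": [],
}


def root_cases(start: str) -> set[str]:
    """Walk upwards from `start` and return the productions with no parent."""
    seen, queue, roots = {start}, deque([start]), set()
    while queue:
        node = queue.popleft()
        parents = USED_IN.get(node, [])
        if not parents:
            roots.add(node)
        for parent in parents:
            if parent not in seen:  # the visited set breaks the cycle
                seen.add(parent)
                queue.append(parent)
    return roots


print(sorted(root_cases("AttType")))  # ['doctypedecl', 'extSubset']
```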

<!DOCTYPE html> matches doctypedecl but without the optional part that would match intSubset. Thus, we can't get from <!DOCTYPE html> to AttType via intSubset, so a non-CDATA declaration for an attribute can't have been read that way.
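One way to observe this with a real non-validating parser is Python's xml.parsers.expat, whose StartDoctypeDeclHandler receives the doctype name, system ID, public ID and a has_internal_subset flag. The helper name doctype_parts below is only illustrative. For <!DOCTYPE html> the sketch expects neither an ExternalID nor an internal subset to be reported.

```python
import xml.parsers.expat


def doctype_parts(doc: str) -> tuple:
    """Return (name, systemId, publicId, has_internal_subset) as expat reports them."""
    seen = []
    parser = xml.parsers.expat.ParserCreate()
    # Called once per DOCTYPE; record the raw arguments.
    parser.StartDoctypeDeclHandler = lambda *args: seen.append(args)
    parser.Parse(doc, True)
    return seen[0]


# Bare <!DOCTYPE html>: no ExternalID, no intSubset.
print(doctype_parts("<!DOCTYPE html><html/>"))
# For contrast, a doctype with an internal subset sets the last flag.
print(doctype_parts("<!DOCTYPE html [<!ATTLIST p class NMTOKENS #IMPLIED>]><html/>"))
```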

From extSubset, follow backwards to this sentence: "The external subset, if any, MUST match the production for extSubset." Note that nothing other than the "external subset" is defined to match extSubset. At this point the XML spec isn't entirely rigorous in its definitions, so one has to infer from what it does say that the only way to get an "external subset" is via a reference that matches ExternalID in the BNF. Going upwards from ExternalID, we can get to doctypedecl, PEDef, EntityDef or NotationDecl. We can discard the doctypedecl case, since <!DOCTYPE html> doesn't match the optional ExternalID part of that production. From EntityDef we get to GEDecl, and from PEDef to PEDecl. From GEDecl and PEDecl we get only to EntityDecl. From EntityDecl, as from NotationDecl, we get only to markupdecl, which was already covered above.
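The ExternalID step can be sketched the same way: walk upwards and stop as soon as a path reaches either doctypedecl (ruled out because <!DOCTYPE html> has no ExternalID) or markupdecl (already handled by the AttType walk). The edge list is a hand-transcribed excerpt of XML 1.0; it also includes NotationDecl, which in XML 1.0 uses ExternalID and likewise leads only to markupdecl. If every path ends at one of those two productions, no new route to an external subset exists.

```python
# Production -> productions that reference it (hand-transcribed XML 1.0 excerpt).
USED_IN = {
    "ExternalID": ["doctypedecl", "EntityDef", "PEDef", "NotationDecl"],
    "EntityDef": ["GEDecl"],
    "PEDef": ["PEDecl"],
    "GEDecl": ["EntityDecl"],
    "PEDecl": ["EntityDecl"],
    "EntityDecl": ["markupdecl"],
    "NotationDecl": ["markupdecl"],
}
# Productions where the walk stops: doctypedecl (no ExternalID in
# <!DOCTYPE html>) and markupdecl (already covered by the AttType walk).
STOP = {"doctypedecl", "markupdecl"}


def endpoints(start: str) -> set[str]:
    """Return the STOP productions reachable upwards from `start`."""
    stack, seen, ends = [start], {start}, set()
    while stack:
        node = stack.pop()
        if node in STOP:
            ends.add(node)
            continue
        for parent in USED_IN[node]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return ends


print(sorted(endpoints("ExternalID")))  # ['doctypedecl', 'markupdecl']
```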

QED.

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/

Received on Thursday, 14 October 2010 08:15:43 UTC