- From: James Clark <jjc@jclark.com>
- Date: Tue, 24 Jul 2012 14:41:39 +0700
- To: John Cowan <cowan@mercury.ccil.org>
- Cc: public-microxml <public-microxml@w3.org>
Taking out the things that John questions (all reasonably questionable in my view) leaves us with:

# Documents
document ::= s element s

# Elements
element ::= startTag content endTag
content ::= (element | dataChar | charRef)*
startTag ::= '<' name (s+ attribute)* s* '>'
endTag ::= '</' name s* '>'

# Attributes
attribute ::= name s* '=' s* attributeValue
attributeValue ::= '"' ((dataChar - '"') | charRef)* '"'
                 | "'" ((dataChar - "'") | charRef)* "'"

# Data characters
dataChar ::= char - ('<' | '&' | '>')

# Character references
charRef ::= hexCharRef | namedCharRef
hexCharRef ::= '&#x' [0-9a-fA-F]+ ';'
namedCharRef ::= '&' charName ';'
charName ::= 'amp' | 'lt' | 'gt' | 'quot' | 'apos'

# Names
name ::= nameStartChar nameChar*
nameStartChar ::= [A-Z] | [a-z] | "_"
nameChar ::= nameStartChar | [0-9] | "-" | "." | #xB7
           | [#x0300-#x036F] | [#x203F-#x2040]

# White space
s ::= #x9 | #xA | #xD | #x20

# Characters
char ::= s | ([#x21-#x10FFFF] - forbiddenChar)
forbiddenChar ::= surrogateChar | #xFFFE | #xFFFF
surrogateChar ::= [#xD800-#xDFFF]

> In addition, UTF-8 as the only character encoding.

Yes, although I think I would like to have both the concept of

- a well-formed MicroXML byte sequence, which would be encoded UTF-8 only, and
- a well-formed MicroXML character sequence, for which encoding is irrelevant.

The list of issues to consider then becomes:

- empty element tags, e.g. <foo/>
- comments
- bare DOCTYPE declaration, e.g. <!DOCTYPE html>
- namespaces/prefixes on elements/attributes
- processing instructions
- Unicode names for elements/attributes
- allow > in attribute values for Canonical XML compatibility?
- decimal character references

James
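
For illustration, here is a minimal sketch of how the lexical productions above (name, charRef, char) and the byte-sequence versus character-sequence distinction might be checked in Python. The function names (is_name, is_char_ref, is_char, chars_ok, bytes_ok) are hypothetical and not part of the proposal, and the sketch covers only character-level rules, not element structure.

```python
import re

# name ::= nameStartChar nameChar*
_NAME_START = r"A-Za-z_"
_NAME_CHAR = _NAME_START + r"0-9\-.\u00B7\u0300-\u036F\u203F-\u2040"
_NAME_RE = re.compile(f"[{_NAME_START}][{_NAME_CHAR}]*")

# charRef ::= hexCharRef | namedCharRef (no decimal references in this grammar)
_CHAR_REF_RE = re.compile(r"&#x[0-9a-fA-F]+;|&(?:amp|lt|gt|quot|apos);")


def is_name(text: str) -> bool:
    """True if text matches the 'name' production in full."""
    return _NAME_RE.fullmatch(text) is not None


def is_char_ref(text: str) -> bool:
    """True if text matches the 'charRef' production in full."""
    return _CHAR_REF_RE.fullmatch(text) is not None


def is_char(c: str) -> bool:
    """True if the single code point c matches the 'char' production."""
    cp = ord(c)
    if cp in (0x9, 0xA, 0xD, 0x20):      # s
        return True
    if cp < 0x21:                         # other control characters
        return False
    if 0xD800 <= cp <= 0xDFFF:            # surrogateChar
        return False
    if cp in (0xFFFE, 0xFFFF):            # remaining forbiddenChar
        return False
    return True


def chars_ok(text: str) -> bool:
    """Character-sequence check: every code point is a 'char'."""
    return all(is_char(c) for c in text)


def bytes_ok(data: bytes) -> bool:
    """Byte-sequence check: must decode as strict UTF-8, then pass chars_ok."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return chars_ok(text)
```

The split between chars_ok and bytes_ok mirrors the two notions of well-formedness mentioned above: the character-level check never looks at the encoding, while the byte-level check adds only the UTF-8 requirement on top of it.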
Received on Tuesday, 24 July 2012 07:42:32 UTC