- From: John Cowan <cowan@mercury.ccil.org>
- Date: Tue, 24 Jul 2012 02:11:11 -0400
- To: James Clark <jjc@jclark.com>
- Cc: public-microxml <public-microxml@w3.org>
James Clark scripsit:

> # Documents
> document ::= s element s
> # Elements
> element ::= startTag content endTag
> content ::= (element | dataChar | charRef)*
> startTag ::= '<' name (s+ attribute)* s* '>'
> endTag ::= '</' name s* '>'
> # Attributes
> attribute ::= name s* '=' s* attributeValue
> attributeValue ::= '"' ((attributeValueChar - '"') | charRef)* '"'
>                  | "'" ((attributeValueChar - "'") | charRef)* "'"
> attributeValueChar ::= char - ('<' | '&')

IIRC, the only reason to allow > in attribute values was for
compatibility with Canonical XML, but that was not listed in the goals.
Either it should be (and I'm far from sure of that) or > should be
excluded for simplicity and uniformity.

> # Data characters
> dataChar ::= char - ('<' | '&' | '>')
> # Character references
> charRef ::= decCharRef | hexCharRef | namedCharRef
> decCharRef ::= '&#' [0-9]+ ';'

Do we really need these? A lot of HTML has them, but they don't seem
useful or necessary any more. I'd leave them out for the sake of
minimalism.

> hexCharRef ::= '&#x' [0-9a-fA-F]+ ';'
> namedCharRef ::= '&' charName ';'
> charName ::= 'amp' | 'lt' | 'gt' | 'quot' | 'apos'
> # Names
> name ::= nameStartChar nameChar*
> nameStartChar ::= [A-Z] | [a-z] | "_" | [#xC0-#xD6] | [#xD8-#xF6] |
> [#xF8-#x2FF] | [#x370-#x37D]
> | [#x37F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] |
> [#x2C00-#x2FEF]
> | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD]
> | [#x10000-#xEFFFF]
> nameChar ::= nameStartChar | [0-9] | "-" | "." | #xB7 |
> [#x0300-#x036F] | [#x203F-#x2040]

Let's start with ASCII-only names, and add the Unicode names back if
there's a compelling reason for them.

nameStartChar ::= [A-Z] | [a-z] | "_"
nameChar ::= nameStartChar | [0-9] | "-" | "."

> # White space
> s ::= #x9 | #xA | #xD | #x20
> # Characters
> char ::= s | ([#x21-#x10FFFF] - forbiddenChar)
> forbiddenChar ::= surrogateChar | #FFFE | #FFFF
> surrogateChar ::= [#xD800-#xDFFF]

In addition, UTF-8 should be the only character encoding.
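
For illustration only, here is a minimal Python sketch of the ASCII-only name rule proposed above and of the quoted char production. The names NAME_RE, is_name and is_char are hypothetical and not part of any proposal. The sketch also assumes that `#FFFE` and `#FFFF` in the quoted grammar mean the code points #xFFFE and #xFFFF.

```python
import re

# ASCII-only names as proposed in this message:
#   nameStartChar ::= [A-Z] | [a-z] | "_"
#   nameChar      ::= nameStartChar | [0-9] | "-" | "."
NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")

# s ::= #x9 | #xA | #xD | #x20
WHITESPACE = {0x9, 0xA, 0xD, 0x20}


def is_name(text: str) -> bool:
    """Return True if the whole string matches the ASCII-only name rule."""
    return NAME_RE.fullmatch(text) is not None


def is_char(cp: int) -> bool:
    """Return True if code point cp matches the quoted char production.

    char ::= s | ([#x21-#x10FFFF] - forbiddenChar)
    Assumption: forbiddenChar covers surrogates plus #xFFFE and #xFFFF.
    """
    if cp in WHITESPACE:
        return True
    if not 0x21 <= cp <= 0x10FFFF:
        return False
    if 0xD800 <= cp <= 0xDFFF:  # surrogateChar
        return False
    return cp not in (0xFFFE, 0xFFFF)
```

Under this rule `is_name("foo-bar.1")` holds, while `is_name("1abc")` and `is_name("\u00e9t\u00e9")` do not; the latter would be accepted by the longer Unicode production quoted above.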

> There are lots of different ways to describe the data model. Here's
> one way of doing it, which is designed to be very close to JsonML.

Looks good to me as a minimal model.

> With this starting point, the list of features to consider adding
> would be:

Looks good.

> I would suggest we discuss further the goals and the starting point,
> and then consider each of these features.

Okay.

-- 
Do I contradict myself?                 John Cowan
Very well then, I contradict myself.    cowan@ccil.org
I am large, I contain multitudes.       http://www.ccil.org/~cowan
        --Walt Whitman, Leaves of Grass

Received on Tuesday, 24 July 2012 06:11:34 UTC