- From: James Clark <jjc@jclark.com>
- Date: Tue, 24 Jul 2012 10:46:23 +0700
- To: public-microxml <public-microxml@w3.org>
We currently have 26 participants on this CG, which is rather more than I expected. For this CG to be successful, I think we are going to have to be careful to avoid the natural tendency for large groups to produce large specs. For this reason, I would suggest that we start with something ultra-small and then add to it only if we get consensus.

At the moment, three grammars have been proposed:

- my first blog post: http://blog.jclark.com/2010/12/microxml.html
- my second blog post: http://blog.jclark.com/2010/12/more-on-microxml.html
- John Cowan's Editor's Draft: http://home.ccil.org/~cowan/MicroXML.html

However, I think these all include features that a reasonable person might want to leave out. So here's my suggested starting point, which is a subset of the intersection of these three grammars. The goal is that this shouldn't have anything in it that anybody on the CG thinks they might want to leave out. I expect everybody (including me) will have stuff that they want to add.

# Documents
document ::= s element s

# Elements
element ::= startTag content endTag
content ::= (element | dataChar | charRef)*
startTag ::= '<' name (s+ attribute)* s* '>'
endTag ::= '</' name s* '>'

# Attributes
attribute ::= name s* '=' s* attributeValue
attributeValue ::= '"' ((attributeValueChar - '"') | charRef)* '"'
                 | "'" ((attributeValueChar - "'") | charRef)* "'"
attributeValueChar ::= char - ('<' | '&')

# Data characters
dataChar ::= char - ('<' | '&' | '>')

# Character references
charRef ::= decCharRef | hexCharRef | namedCharRef
decCharRef ::= '&#' [0-9]+ ';'
hexCharRef ::= '&#x' [0-9a-fA-F]+ ';'
namedCharRef ::= '&' charName ';'
charName ::= 'amp' | 'lt' | 'gt' | 'quot' | 'apos'

# Names
name ::= nameStartChar nameChar*
nameStartChar ::= [A-Z] | [a-z] | "_" | [#xC0-#xD6] | [#xD8-#xF6] | [#xF8-#x2FF]
                | [#x370-#x37D] | [#x37F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F]
                | [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD]
                | [#x10000-#xEFFFF]
nameChar ::= nameStartChar | [0-9] | "-" | "." | #xB7 | [#x0300-#x036F] | [#x203F-#x2040]

# White space
s ::= #x9 | #xA | #xD | #x20

# Characters
char ::= s | ([#x21-#x10FFFF] - forbiddenChar)
forbiddenChar ::= surrogateChar | #xFFFE | #xFFFF
surrogateChar ::= [#xD800-#xDFFF]

There are lots of different ways to describe the data model. Here's one way of doing it, which is designed to be very close to JsonML. This defines the data model as a grammar over a particular kind of tree. These trees have one atomic type, a character (equivalent to a Unicode code-point), and two composite types, arrays and maps. In the following, [...] denotes arrays, and {...} denotes maps:

document ::= element
element ::= [name, attributes, content]
attributes ::= { (name => attributeValue)* }
attributeValue ::= [ char* ]
content ::= [ (char | element)* ]
name ::= [ nameStartChar, nameChar* ]
char, nameStartChar, nameChar ::= <single character as in grammar for concrete syntax>

With this starting point, the list of features to consider adding would be:

- empty element tags, e.g. <foo/>
- comments
- bare DOCTYPE declaration, e.g. <!DOCTYPE html>
- namespaces/prefixes on elements/attributes
- processing instructions

Note that all of these features have implications for the data model and/or HTML5-friendliness. I would suggest we discuss further the goals and the starting point, and then consider each of these features.

James
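
To make the proposal above concrete, here is a minimal Python sketch of a recursive-descent parser for the grammar that returns the JsonML-like tree. It is an illustration only, not part of the proposal. The names `Parser` and `parse` are hypothetical. Several choices are assumptions that the message does not state: leading and trailing whitespace around the root element is treated as optional, an end tag must repeat the start tag's name, a repeated attribute name silently overwrites the earlier value, character references are not checked against `char`, and names and attribute values are held as Python strings rather than arrays of characters.

```python
import re

# Name classes copied from the nameStartChar / nameChar productions.
NAME_START = (
    "A-Za-z_\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D"
    "\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF"
    "\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD\U00010000-\U000EFFFF"
)
NAME_RE = re.compile(
    "[" + NAME_START + "][" + NAME_START + "0-9.\u00B7\u0300-\u036F\u203F-\u2040-]*"
)
CHAR_REF_RE = re.compile(r"&(?:#([0-9]+)|#x([0-9a-fA-F]+)|(amp|lt|gt|quot|apos));")
NAMED = {"amp": "&", "lt": "<", "gt": ">", "quot": '"', "apos": "'"}
WS = "\t\n\r "  # the 's' production


def is_char(c):
    """The 'char' production: whitespace, or >= #x21 and not forbidden."""
    o = ord(c)
    return c in WS or (o >= 0x21 and not 0xD800 <= o <= 0xDFFF
                       and o not in (0xFFFE, 0xFFFF))


class Parser:
    def __init__(self, text):
        self.text, self.pos = text, 0

    def fail(self, msg):
        raise ValueError("%s at offset %d" % (msg, self.pos))

    def peek(self, literal):
        return self.text.startswith(literal, self.pos)

    def expect(self, literal):
        if not self.peek(literal):
            self.fail("expected %r" % literal)
        self.pos += len(literal)

    def skip_space(self):
        start = self.pos
        while self.pos < len(self.text) and self.text[self.pos] in WS:
            self.pos += 1
        return self.pos > start  # True if at least one 's' was consumed

    def match(self, regex, what):
        m = regex.match(self.text, self.pos)
        if not m:
            self.fail("expected " + what)
        self.pos = m.end()
        return m

    def char_ref(self):
        dec, hexa, named = self.match(CHAR_REF_RE, "character reference").groups()
        if named:
            return NAMED[named]
        return chr(int(dec) if dec else int(hexa, 16))

    def chars_until(self, stop, banned):
        """Collect characters and charRefs until `stop`; nested elements allowed
        only when '<' is not in `banned`."""
        out = []
        while not self.peek(stop):
            if self.pos >= len(self.text):
                self.fail("unexpected end of input")
            c = self.text[self.pos]
            if c == "&":
                out.append(self.char_ref())
            elif c == "<" and "<" not in banned:
                out.append(self.element())
            elif c in banned or not is_char(c):
                self.fail("illegal character %r" % c)
            else:
                out.append(c)
                self.pos += 1
        return out

    def element(self):
        self.expect("<")
        name = self.match(NAME_RE, "name").group()
        attributes = {}
        while True:
            had_space = self.skip_space()
            if self.peek(">"):
                break
            if not had_space:
                self.fail("expected whitespace or '>'")
            attr = self.match(NAME_RE, "attribute name").group()
            self.skip_space(); self.expect("="); self.skip_space()
            quote = self.text[self.pos:self.pos + 1]
            if quote not in ('"', "'"):
                self.fail("expected quoted attribute value")
            self.pos += 1
            attributes[attr] = "".join(self.chars_until(quote, "<"))
            self.pos += 1  # closing quote
        self.expect(">")
        content = self.chars_until("</", ">")
        self.expect("</")
        if self.match(NAME_RE, "name").group() != name:
            self.fail("mismatched end tag")  # assumption, see lead-in
        self.skip_space()
        self.expect(">")
        return [name, attributes, content]


def parse(text):
    p = Parser(text)
    p.skip_space()
    root = p.element()
    p.skip_space()
    if p.pos != len(text):
        p.fail("unexpected content after root element")
    return root
```

Under these assumptions, `parse("<doc lang='en'>a&amp;b<br></br></doc>")` yields `["doc", {"lang": "en"}, ["a", "&", "b", ["br", {}, []]]]`, while inputs that use any of the features left out of the starting point, such as `<a/>` or a comment, raise `ValueError`.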
Received on Tuesday, 24 July 2012 03:47:12 UTC