- From: John Cowan <cowan@mercury.ccil.org>
- Date: Tue, 24 Jul 2012 02:11:11 -0400
- To: James Clark <jjc@jclark.com>
- Cc: public-microxml <public-microxml@w3.org>
James Clark scripsit:
> # Documents
> document ::= s element s
> # Elements
> element ::= startTag content endTag
> content ::= (element | dataChar | charRef)*
> startTag ::= '<' name (s+ attribute)* s* '>'
> endTag ::= '</' name s* '>'
> # Attributes
> attribute ::= name s* '=' s* attributeValue
> attributeValue ::= '"' ((attributeValueChar - '"') | charRef)* '"'
> | "'" ((attributeValueChar - "'") | charRef)* "'"
> attributeValueChar ::= char - ('<' | '&')
IIRC, the only reason to allow > in attribute values was for
compatibility with Canonical XML, but that was not listed in the goals.
Either it should be (and I'm far from sure of that) or > should be
excluded for simplicity and uniformity.
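For illustration, here is a minimal Python sketch of the two options. It checks only the literal text between the quotes of an attribute value. The function `attribute_value_ok` and its `allow_gt` flag are hypothetical names. Character references and the underlying `char` production are deliberately not handled, so any `&` is simply rejected.

```python
def attribute_value_ok(value: str, quote: str, allow_gt: bool = True) -> bool:
    """Check the raw text between the quotes of an attribute value.

    allow_gt=True follows the quoted grammar:
        attributeValueChar ::= char - ('<' | '&')
    allow_gt=False is the stricter variant, excluding '>' as dataChar does.
    Character references are not handled here: any '&' is rejected.
    """
    if quote not in ('"', "'"):
        raise ValueError("quote must be ' or \"")
    excluded = {"<", "&", quote}  # the delimiting quote may not appear literally
    if not allow_gt:
        excluded.add(">")
    return not any(ch in excluded for ch in value)
```

Under the quoted grammar, `attribute_value_ok("a > b", '"')` is true. With `allow_gt=False` it is false, so attribute text and content would exclude the same three characters.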
> # Data characters
> dataChar ::= char - ('<' | '&' | '>')
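Because `dataChar` excludes `<`, `&` and `>`, a writer has to emit those three characters in content as references. Below is a minimal Python sketch, assuming the named references from the `charName` production. `escape_content` is a hypothetical helper, and it does not check that the remaining characters match `char`.

```python
# Characters that dataChar excludes, mapped to named character references.
CONTENT_ESCAPES = {"<": "&lt;", "&": "&amp;", ">": "&gt;"}


def escape_content(text: str) -> str:
    """Rewrite text so every character is a dataChar or part of a charRef."""
    return "".join(CONTENT_ESCAPES.get(ch, ch) for ch in text)
```

For example, `escape_content("a < b && c > d")` yields `"a &lt; b &amp;&amp; c &gt; d"`.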
> # Character references
> charRef ::= decCharRef | hexCharRef | namedCharRef
> decCharRef ::= '&#' [0-9]+ ';'
Do we really need decimal character references? A lot of HTML has them,
but they don't seem useful or necessary any more, since hex references
cover the same code points. I'd leave them out for the sake of minimalism.
> hexCharRef ::= '&#x' [0-9a-fA-F]+ ';'
> namedCharRef ::= '&' charName ';'
> charName ::= 'amp' | 'lt' | 'gt' | 'quot' | 'apos'
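Here is a minimal Python sketch of resolving the three kinds of `charRef` in a piece of text. The names `expand_char_refs` and `is_char` are hypothetical. So is the `allow_decimal` flag, which models dropping `decCharRef` as suggested above. The rule that a numeric reference must resolve to a `char` is also an assumption, because the quoted grammar does not say either way.

```python
import re

# decCharRef | hexCharRef | namedCharRef, as in the quoted grammar.
CHAR_REF = re.compile(r"&(?:#([0-9]+)|#x([0-9a-fA-F]+)|(amp|lt|gt|quot|apos));")
NAMED = {"amp": "&", "lt": "<", "gt": ">", "quot": '"', "apos": "'"}


def is_char(cp: int) -> bool:
    """char ::= s | ([#x21-#x10FFFF] - forbiddenChar)"""
    if cp in (0x9, 0xA, 0xD, 0x20):
        return True
    if 0xD800 <= cp <= 0xDFFF or cp in (0xFFFE, 0xFFFF):
        return False
    return 0x21 <= cp <= 0x10FFFF


def expand_char_refs(text: str, allow_decimal: bool = True) -> str:
    """Replace every charRef in text; any '&' that does not start one is an error.

    allow_decimal=False models the suggestion to drop decCharRef.
    Assumption: a numeric reference must resolve to a `char`.
    """
    out, pos = [], 0
    while (amp := text.find("&", pos)) != -1:
        m = CHAR_REF.match(text, amp)
        if m is None:
            raise ValueError(f"malformed character reference at offset {amp}")
        dec, hexa, name = m.groups()
        out.append(text[pos:amp])
        if name is not None:
            out.append(NAMED[name])
        else:
            if dec is not None and not allow_decimal:
                raise ValueError(f"decimal reference at offset {amp}")
            cp = int(dec) if dec is not None else int(hexa, 16)
            if not is_char(cp):
                raise ValueError(f"reference to non-char U+{cp:04X} at offset {amp}")
            out.append(chr(cp))
        pos = m.end()
    out.append(text[pos:])
    return "".join(out)
```

With `allow_decimal=False`, `&#x41;` still expands to `A` but `&#65;` is rejected. This is roughly what leaving decimal references out would mean for a reader.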
> # Names
> name ::= nameStartChar nameChar*
> nameStartChar ::= [A-Z] | [a-z] | "_" | [#xC0-#xD6] | [#xD8-#xF6]
>                   | [#xF8-#x2FF] | [#x370-#x37D] | [#x37F-#x1FFF]
>                   | [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF]
>                   | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD]
>                   | [#x10000-#xEFFFF]
> nameChar ::= nameStartChar | [0-9] | "-" | "." | #xB7
>              | [#x0300-#x036F] | [#x203F-#x2040]
Let's start with ASCII-only names, and add the Unicode names back if there's
a compelling reason for them.
nameStartChar ::= [A-Z] | [a-z] | "_"
nameChar ::= nameStartChar | [0-9] | "-" | "."
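The sketch below contrasts the two proposals as Python regular expressions. It covers the ASCII-only rule and the quoted Unicode ranges, transcribed one for one. The names `ASCII_NAME`, `UNICODE_NAME`, `is_ascii_name` and `is_unicode_name` are hypothetical.

```python
import re

# ASCII-only proposal:
#   nameStartChar ::= [A-Z] | [a-z] | "_"
#   nameChar      ::= nameStartChar | [0-9] | "-" | "."
ASCII_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")

# The quoted Unicode productions, transcribed range by range.
_START = ("A-Za-z_\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D"
          "\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF"
          "\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD\U00010000-\U000EFFFF")
_REST = _START + "0-9.\\-\u00B7\u0300-\u036F\u203F-\u2040"
UNICODE_NAME = re.compile(f"[{_START}][{_REST}]*")


def is_ascii_name(s: str) -> bool:
    return ASCII_NAME.fullmatch(s) is not None


def is_unicode_name(s: str) -> bool:
    return UNICODE_NAME.fullmatch(s) is not None
```

`is_unicode_name("café")` is true but `is_ascii_name("café")` is false. Both reject `"1abc"` and `"a:b"`.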
> # White space
> s ::= #x9 | #xA | #xD | #x20
> # Characters
> char ::= s | ([#x21-#x10FFFF] - forbiddenChar)
> forbiddenChar ::= surrogateChar | #xFFFE | #xFFFF
> surrogateChar ::= [#xD800-#xDFFF]
In addition, I'd make UTF-8 the only character encoding.
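Here is a minimal Python sketch of what UTF-8-only input might mean for a reader. It decodes strictly as UTF-8 with no encoding detection, then checks each character against the quoted `char` production. `decode_microxml` and `is_char` are hypothetical names. A byte-order mark is not discussed here, so a leading U+FEFF is left in the text.

```python
WHITESPACE = frozenset("\t\n\r ")  # s ::= #x9 | #xA | #xD | #x20


def is_char(ch: str) -> bool:
    """char ::= s | ([#x21-#x10FFFF] - forbiddenChar)"""
    cp = ord(ch)
    if ch in WHITESPACE:
        return True
    if 0xD800 <= cp <= 0xDFFF or cp in (0xFFFE, 0xFFFF):  # forbiddenChar
        return False
    return 0x21 <= cp <= 0x10FFFF


def decode_microxml(data: bytes) -> str:
    """Decode bytes as UTF-8 only, then require every character to be a `char`.

    Python's strict "utf-8" codec rejects malformed byte sequences and
    UTF-8-encoded surrogates by raising UnicodeDecodeError (a ValueError).
    """
    text = data.decode("utf-8")  # no other encoding is ever tried
    for i, ch in enumerate(text):
        if not is_char(ch):
            raise ValueError(f"U+{ord(ch):04X} at index {i} is not a MicroXML char")
    return text
```

Latin-1 input such as `b"<a>caf\xe9</a>"` fails at the decoding step. Well-formed UTF-8 that contains a control character such as U+0001 fails at the `char` check.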
> There are lots of different ways to describe the data model. Here's
> one way of doing it, which is designed to be very close to JsonML.
Looks good to me as a minimal model.
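The Python sketch below puts the grammar and a JsonML-like model together. It is a recursive-descent parser that returns nested lists of the form `[name, {attributes}, child, ...]`, omitting the attribute object when it is empty, as JsonML allows. Clark's actual data model is elided from this reply, so that output shape is an assumption, as is the class name `MicroXMLParser`. The sketch uses the ASCII-only names proposed above and does not check text or referenced characters against `char`. A repeated attribute name overwrites the earlier value. It also requires end-tag names to match start-tag names, which the productions alone do not state.

```python
import re

NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")  # assumption: ASCII-only names
REF = re.compile(r"&(?:#([0-9]+)|#x([0-9a-fA-F]+)|(amp|lt|gt|quot|apos));")
NAMED = {"amp": "&", "lt": "<", "gt": ">", "quot": '"', "apos": "'"}
DATA = re.compile(r"[^<&>]+")  # a run of dataChar (not checked against `char`)
ATTR_TEXT = {'"': re.compile(r'[^<&"]+'), "'": re.compile(r"[^<&']+")}
EQ = re.compile(r"[\t\n\r ]*=[\t\n\r ]*")  # s* '=' s*
S = frozenset("\t\n\r ")  # s ::= #x9 | #xA | #xD | #x20


class MicroXMLParser:  # recursive descent over the quoted productions
    def __init__(self, text: str):
        self.text, self.pos = text, 0

    def fail(self, msg: str):
        raise ValueError(f"{msg} at offset {self.pos}")

    def peek(self) -> str:
        return self.text[self.pos:self.pos + 1]  # "" at end of input

    def skip_s(self):
        while self.peek() in S:
            self.pos += 1

    def expect(self, literal: str):
        if not self.text.startswith(literal, self.pos):
            self.fail(f"expected {literal!r}")
        self.pos += len(literal)

    def match(self, pattern: re.Pattern, what: str) -> re.Match:
        m = pattern.match(self.text, self.pos)
        if m is None:
            self.fail(f"expected {what}")
        self.pos = m.end()
        return m

    def char_ref(self) -> str:  # referenced code point is not checked against `char`
        dec, hexa, named = self.match(REF, "a character reference").groups()
        return NAMED[named] if named else chr(int(dec) if dec else int(hexa, 16))

    def document(self) -> list:  # document ::= s element s
        self.skip_s()
        root = self.element()
        self.skip_s()
        if self.pos != len(self.text):
            self.fail("content after the root element")
        return root

    def element(self) -> list:  # element ::= startTag content endTag
        self.expect("<")
        tag, attrs = self.match(NAME, "a name").group(), {}
        while True:  # (s+ attribute)* s* '>'
            spaced = self.peek() in S
            self.skip_s()
            if self.peek() == ">":
                self.pos += 1
                break
            if not spaced:
                self.fail("expected white space before an attribute")
            key = self.match(NAME, "an attribute name").group()
            self.match(EQ, "'='")
            attrs[key] = self.attribute_value()  # a repeated name keeps the last value
        children = self.content()
        self.expect("</")
        if self.match(NAME, "a name").group() != tag:  # assumption: names must match
            self.fail(f"end tag does not match <{tag}>")
        self.skip_s()
        self.expect(">")
        return [tag, attrs, *children] if attrs else [tag, *children]

    def attribute_value(self) -> str:
        quote = self.peek()
        if quote not in ATTR_TEXT:
            self.fail("expected a quoted attribute value")
        self.pos += 1
        value = ""
        while self.peek() != quote:
            if self.peek() == "&":
                value += self.char_ref()
            else:
                value += self.match(ATTR_TEXT[quote], "attribute text").group()
        self.pos += 1  # closing quote
        return value

    def content(self) -> list:  # content ::= (element | dataChar | charRef)*
        children, text = [], ""
        while not self.text.startswith("</", self.pos):
            if self.peek() == "<":
                if text:
                    children.append(text)
                text = ""
                children.append(self.element())
            elif self.peek() == "&":
                text += self.char_ref()
            else:
                text += self.match(DATA, "text, a tag or a reference").group()
        return children + [text] if text else children
```

`MicroXMLParser('<a x="1">hi &amp; <b>there</b>!</a>').document()` returns `["a", {"x": "1"}, "hi & ", ["b", "there"], "!"]`. By contrast, `'<a>1 > 0</a>'` raises `ValueError` because `>` is not a `dataChar`.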
> With this starting point, the list of features to consider adding
> would be:
Looks good.
> I would suggest we discuss further the goals and the starting point,
> and then consider each of these features.
Okay.
--
Do I contradict myself? John Cowan
Very well then, I contradict myself. cowan@ccil.org
I am large, I contain multitudes. http://www.ccil.org/~cowan
--Walt Whitman, Leaves of Grass
Received on Tuesday, 24 July 2012 06:11:34 UTC