Re: Starting point from John Cowan on 2012-07-24 (public-microxml@w3.org from July 2012)

From: John Cowan <cowan@mercury.ccil.org>
Date: Tue, 24 Jul 2012 02:11:11 -0400
To: James Clark <jjc@jclark.com>
Cc: public-microxml <public-microxml@w3.org>
Message-ID: <20120724061111.GN31596@mercury.ccil.org>

James Clark scripsit:

> # Documents
> document ::= s element s
> # Elements
> element ::= startTag content endTag
> content ::= (element | dataChar | charRef)*
> startTag ::= '<' name (s+ attribute)* s* '>'
> endTag ::= '</' name s* '>'
> # Attributes
> attribute ::= name s* '=' s* attributeValue
> attributeValue ::= '"' ((attributeValueChar - '"') | charRef)* '"'
>                  | "'" ((attributeValueChar - "'") | charRef)* "'"
> attributeValueChar ::= char - ('<' | '&')

IIRC, the only reason to allow > in attribute values was for
compatibility with Canonical XML, but that was not listed in the goals.
Either it should be (and I'm far from sure of that) or > should be
excluded for simplicity and uniformity.
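
To illustrate the second option, here is a minimal Python sketch of an
attribute value check with > excluded alongside < and &. The names
ATTR_VALUE_DQ, ATTR_VALUE_SQ, and is_strict_attribute_value are hypothetical,
and the sketch assumes character references and the forbidden-character
checks are handled elsewhere.

```python
import re

# Hypothetical patterns: the quoted attributeValue rule with '>' also
# excluded. Character references (charRef) are deliberately left out.
ATTR_VALUE_DQ = re.compile(r'"[^<&>"]*"')
ATTR_VALUE_SQ = re.compile(r"'[^<&>']*'")


def is_strict_attribute_value(text: str) -> bool:
    """Return True if text is a quoted value containing no '<', '&', or '>'."""
    return bool(ATTR_VALUE_DQ.fullmatch(text) or ATTR_VALUE_SQ.fullmatch(text))
```

With this rule the same three characters are excluded from attribute values
and from dataChar, which is the uniformity argument.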

> # Data characters
> dataChar ::= char - ('<' | '&' | '>')
> # Character references
> charRef ::= decCharRef | hexCharRef | namedCharRef
> decCharRef ::= '&#' [0-9]+ ';'

Do we really need decimal character references?  A lot of HTML has them,
but they don't seem useful or necessary any more.  I'd leave them out for
the sake of minimalism.
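
If decimal references were dropped as suggested, only the hexadecimal and
named forms from the quoted grammar would remain. A rough Python sketch of
resolving such a reference follows; CHAR_REF, NAMED, and resolve_char_ref are
hypothetical names, and the sketch does not check the result against
forbiddenChar.

```python
import re

# The five named references from the quoted charName rule.
NAMED = {"amp": "&", "lt": "<", "gt": ">", "quot": '"', "apos": "'"}

# Hexadecimal or named references only; the decimal form '&#' [0-9]+ ';'
# is intentionally not accepted (assumption: it has been removed).
CHAR_REF = re.compile(r"&(?:#x([0-9a-fA-F]+)|(amp|lt|gt|quot|apos));")


def resolve_char_ref(ref: str) -> str:
    """Return the character denoted by a hex or named character reference."""
    match = CHAR_REF.fullmatch(ref)
    if match is None:
        raise ValueError(f"not a supported character reference: {ref!r}")
    if match.group(1) is not None:
        return chr(int(match.group(1), 16))
    return NAMED[match.group(2)]
```

Under this sketch "&#x41;" and "&lt;" resolve, while "&#65;" is rejected.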

> hexCharRef ::= '&#x' [0-9a-fA-F]+ ';'
> namedCharRef ::= '&' charName ';'
> charName ::= 'amp' | 'lt' | 'gt' | 'quot' | 'apos'
> # Names
> name ::= nameStartChar nameChar*
> nameStartChar ::= [A-Z] | [a-z] | "_" | [#xC0-#xD6] | [#xD8-#xF6]
>                 | [#xF8-#x2FF] | [#x370-#x37D] | [#x37F-#x1FFF]
>                 | [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF]
>                 | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD]
>                 | [#x10000-#xEFFFF]
> nameChar ::= nameStartChar | [0-9] | "-" | "." | #xB7
>            | [#x0300-#x036F] | [#x203F-#x2040]

Let's start with ASCII-only names, and add the Unicode names back if there's
a compelling reason for them.

nameStartChar ::= [A-Z] | [a-z] | "_"

nameChar ::= nameStartChar | [0-9] | "-" | "."
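
As a quick illustration, the two ASCII-only rules above collapse into a
single regular expression. This is only a sketch in Python; NAME and
is_ascii_name are hypothetical names.

```python
import re

# nameStartChar: [A-Z] | [a-z] | "_"
# nameChar:      nameStartChar | [0-9] | "-" | "."
NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*")


def is_ascii_name(text: str) -> bool:
    """Return True if text matches the ASCII-only name production."""
    return NAME.fullmatch(text) is not None
```

Names such as "foo-bar.1" and "_x" pass; "1abc", the empty string, and any
name containing a non-ASCII letter do not.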

> # White space
> s ::= #x9 | #xA | #xD | #x20
> # Characters
> char ::= s | ([#x21-#x10FFFF] - forbiddenChar)
> forbiddenChar ::= surrogateChar | #xFFFE | #xFFFF
> surrogateChar ::= [#xD800-#xDFFF]

In addition, I'd specify UTF-8 as the only character encoding.
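
A UTF-8-only rule combines naturally with the quoted char production: decode
strictly, then check each code point. The Python sketch below shows one way
this might look; is_allowed_char and decode_document are hypothetical names,
and rejecting bad input with exceptions is an assumption of the sketch, not
something the grammar specifies.

```python
def is_allowed_char(ch: str) -> bool:
    """Return True if ch matches: char ::= s | ([#x21-#x10FFFF] - forbiddenChar)."""
    cp = ord(ch)
    if cp in (0x9, 0xA, 0xD, 0x20):   # s (white space)
        return True
    if cp < 0x21:                     # remaining control characters
        return False
    if 0xD800 <= cp <= 0xDFFF:        # surrogateChar
        return False
    if cp in (0xFFFE, 0xFFFF):        # other forbidden characters
        return False
    return True


def decode_document(data: bytes) -> str:
    """Decode bytes as strict UTF-8 and reject characters outside char."""
    text = data.decode("utf-8")  # raises UnicodeDecodeError on invalid UTF-8
    for ch in text:
        if not is_allowed_char(ch):
            raise ValueError(f"forbidden character U+{ord(ch):04X}")
    return text
```

With a single encoding there is no need for an encoding declaration or for
byte-order-mark sniffing to choose between encodings.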

> There are lots of different ways to describe the data model. Here's
> one way of doing it, which is designed to be very close to JsonML.

Looks good to me as a minimal model.

> With this starting point, the list of features to consider adding
> would be:

Looks good.

> I would suggest we discuss further the goals and the starting point,
> and then consider each of these features.

Okay.

-- 
Do I contradict myself?                         John Cowan
Very well then, I contradict myself.            cowan@ccil.org
I am large, I contain multitudes.               http://www.ccil.org/~cowan
        --Walt Whitman, Leaves of Grass

Received on Tuesday, 24 July 2012 06:11:34 UTC