- From: Eric Prud'hommeaux <eric@w3.org>
- Date: Fri, 30 Mar 2007 13:50:00 +0000
- To: public-wai-ert@w3.org
- Cc: Jose Kahan <jose.kahan@w3.org>
- Message-ID: <20070330134929.GC4104@w3.org>
I've been geeking with Shadi and Jose a bit and have come up with some
thoughts which need to be verified before they are taken as reason to
launch missiles.
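There are two versions of XML, 1.0 and 1.1. 1.1's |CharData| and
|Name|s are supersets of 1.0's. Therefore, any 1.0 document can be
encapsulated in 1.1. Also, almost any 1.1 character may be represented
in 1.0, though it may require expression as a |CharRef|; the
exceptions are the C0 control characters (#x1-#x1F other than tab, LF
and CR), which 1.0 does not allow even as a |CharRef|. Some 1.1
|Name|s are not expressible in 1.0.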
http://www.w3.org/TR/2006/REC-xml11-20060816/#NT-document
[[
[1] document ::= ( prolog element Misc* ) - ( Char* RestrictedChar Char* )
[5] Name ::= NameStartChar (NameChar)*
[22] prolog ::= XMLDecl Misc* (doctypedecl Misc*)?
[23] XMLDecl ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
[28] doctypedecl ::= '<!DOCTYPE' S Name (S ExternalID)? S? ('[' intSubset ']' S?)? '>'
[39] element ::= EmptyElemTag
                 | STag content ETag
[43] content ::= CharData? ((element | Reference | CDSect | PI | Comment) CharData?)*
[66] CharRef ::= '&#' [0-9]+ ';'
                 | '&#x' [0-9a-fA-F]+ ';'
]]
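The superset relationship between the two character repertoires can be checked mechanically. Below is a minimal Python sketch, assuming the |Char| and |RestrictedChar| ranges as given in the XML 1.0 and 1.1 Recommendations (those two productions are not part of the excerpt above, and the function names are made up for illustration):

```python
def is_xml10_char(cp: int) -> bool:
    """Code point matches the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def is_xml11_char(cp: int) -> bool:
    """Code point matches the XML 1.1 Char production."""
    return (
        0x1 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def is_xml11_restricted(cp: int) -> bool:
    """Code point matches the XML 1.1 RestrictedChar production
    (allowed in a 1.1 document only when written as a CharRef)."""
    return (
        0x1 <= cp <= 0x8
        or cp in (0xB, 0xC)
        or 0xE <= cp <= 0x1F
        or 0x7F <= cp <= 0x84
        or 0x86 <= cp <= 0x9F
    )


def chars_only_in_xml11() -> list:
    """Code points that are Char in 1.1 but not in 1.0."""
    return [
        cp for cp in range(0x110000)
        if is_xml11_char(cp) and not is_xml10_char(cp)
    ]
```

Under these assumptions, every 1.0 |Char| is also a 1.1 |Char|, and `chars_only_in_xml11()` returns only C0 control characters. Those are worth a second look, since a 1.0 |CharRef| must itself refer to a 1.0 |Char|.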
With a rule like

  Well-formed XML documents may be encapsulated within XMLLiterals in
  RDF/XML documents with an equal or greater XML version.

we rule out expressing 1.1 |document|s in RDF/XML written in XML 1.0 ;
probably an acceptable choice.
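As a sketch of how a tool might apply that rule, assuming version strings of the form "1.0" or "1.1" as they appear in |VersionInfo| (the function names here are hypothetical):

```python
def parse_xml_version(version: str) -> tuple:
    """Turn a version string such as '1.1' into a comparable (major, minor) tuple."""
    major, minor = version.split(".")
    return int(major), int(minor)


def may_encapsulate(inner_version: str, outer_version: str) -> bool:
    """True if a document of inner_version may be embedded as an XMLLiteral
    in an RDF/XML document of outer_version under the proposed rule."""
    return parse_xml_version(outer_version) >= parse_xml_version(inner_version)
```

So `may_encapsulate("1.0", "1.1")` holds while `may_encapsulate("1.1", "1.0")` does not, which is exactly the case the rule excludes.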
An XML |document| may have up to 1 |prolog| (and, in fact, 1.1
documents do need a |prolog| with an |XMLDecl|, weird). This, of
course, requires special "encapsulation". Shadi explained that you
were considering splitting the body up into the XML |prolog| and the
rest of the document.
http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/#section-XMLLiteral
says that the XMLLiteral object is essentially a language tag and a
canonical XML document¹ with no XML or DTD declarations. The XML
declaration is superseded by that of the encapsulating document, so I
wouldn't write it down (the version and encoding are authoritative
only in the encapsulating document).
You are left with
Misc* (doctypedecl Misc*)?
but you only need to record this extra attribute if there is a
|doctypedecl|. Further, the trailing |Misc|* can go into the body,
leaving only the leading |Misc|* and the |doctypedecl|. The
|doctypedecl| can even be broken down to extract the rarely-seen
internal subset into another element, making it easy to look for
XHTML documents:
<Body>
  <leadingMisc parseType="literal"> <?leading-PI?> </leadingMisc>
  <dtd publicId="..." systemId="..." />
  <intSubset>greeting [
<!ELEMENT greeting (#PCDATA)>
]</intSubset>
  <rest parseType="literal">...</rest>
</Body>
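A rough Python sketch of that split follows. It is regex-based and heuristic rather than a conforming XML parser: it assumes the internal subset contains no "]" followed by ">" inside a literal, it drops the |XMLDecl|, and the function and key names are only illustrative (loosely following the element names above).

```python
import re

# XMLDecl: '<?xml' followed by whitespace, so PIs such as <?xml-stylesheet?> are not caught.
_XMLDECL = re.compile(r"<\?xml\s.*?\?>", re.S)
# Misc*: any run of whitespace, comments and processing instructions.
_MISC = re.compile(r"(?:\s+|<!--.*?-->|<\?.*?\?>)*", re.S)
# doctypedecl with optional PUBLIC/SYSTEM external ID and optional internal subset.
_DOCTYPE = re.compile(
    r"<!DOCTYPE\s+(?P<name>[^\s\[>]+)"
    r"(?:\s+PUBLIC\s+(?P<pq>[\"'])(?P<public>.*?)(?P=pq)"
    r"\s+(?P<sq>[\"'])(?P<system>.*?)(?P=sq)"
    r"|\s+SYSTEM\s+(?P<sq2>[\"'])(?P<system2>.*?)(?P=sq2))?"
    r"\s*(?:\[(?P<subset>.*?)\]\s*)?>",
    re.S,
)


def split_prolog(text: str) -> dict:
    """Split a serialized document into leading Misc, doctype parts and the rest."""
    pos = 0
    decl = _XMLDECL.match(text)
    if decl:
        pos = decl.end()  # the XML declaration is not recorded
    misc = _MISC.match(text, pos)  # always matches, possibly the empty string
    doctype = _DOCTYPE.match(text, misc.end())
    if doctype is None:
        # No doctypedecl: nothing extra to record, everything goes into the body.
        return {"leadingMisc": "", "dtd": None, "intSubset": None, "rest": text[pos:]}
    return {
        "leadingMisc": text[pos:misc.end()],
        "dtd": {
            "name": doctype.group("name"),
            "publicId": doctype.group("public"),
            "systemId": doctype.group("system") or doctype.group("system2"),
        },
        "intSubset": doctype.group("subset"),
        # Trailing Misc* after the doctypedecl stays with the body.
        "rest": text[doctype.end():],
    }
```

For an XHTML document the `dtd` entry carries the public and system identifiers, so checking `publicId` is enough to spot it; a real implementation would want a proper parser for the internal subset.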
Just some ideas -- have fun.
¹c14n must be expressed in utf-8 and XMLLiterals may be expressed in
whatever encoding the parent document is in -- noting expressivity
limitations above.
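
To illustrate the encoding point in the footnote: when the parent document's encoding cannot carry a character directly, character data can fall back to |CharRef|s. A small Python sketch using the standard "xmlcharrefreplace" error handler, assuming an ASCII-encoded parent document (the function name is made up, and this covers character data only, not |Name|s or markup escaping):

```python
def to_ascii_with_charrefs(text: str) -> bytes:
    """Encode character data as ASCII, writing every non-ASCII
    character as a decimal CharRef such as &#233;."""
    return text.encode("ascii", errors="xmlcharrefreplace")


print(to_ascii_with_charrefs("na\u00efve \u2603"))  # b'na&#239;ve &#9731;'
```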
--
-eric
office: +1.617.258.5741 NE43-344, MIT, Cambridge, MA 02144 USA
(eric@w3.org)
Feel free to forward this message to any list for any purpose other than
email address distribution.
Received on Friday, 30 March 2007 13:51:26 UTC