- From: Eric Prud'hommeaux <eric@w3.org>
- Date: Fri, 30 Mar 2007 13:50:00 +0000
- To: public-wai-ert@w3.org
- Cc: Jose Kahan <jose.kahan@w3.org>
- Message-ID: <20070330134929.GC4104@w3.org>
I've been geeking with Shadi and Jose a bit and have come up with some thoughts which need to be verified before they are taken as reason to launch missiles.

There are two versions of XML, 1.0 and 1.1. 1.1's |CharData| and |Name|s are supersets of 1.0's. Therefore, any 1.0 document can be encapsulated in 1.1. Also, any 1.1 character may be represented in 1.0, but may require expression as a |CharRef|. Some 1.1 |Name|s are not expressible in 1.0.

http://www.w3.org/TR/2006/REC-xml11-20060816/#NT-document
[[
[1]  document    ::= ( prolog element Misc* ) - ( Char* RestrictedChar Char* )
[5]  Name        ::= NameStartChar (NameChar)*
[22] prolog      ::= XMLDecl Misc* (doctypedecl Misc*)?
[23] XMLDecl     ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
[28] doctypedecl ::= '<!DOCTYPE' S Name (S ExternalID)? S? ('[' intSubset ']' S?)? '>'
[39] element     ::= EmptyElemTag | STag content ETag
[43] content     ::= CharData? ((element | Reference | CDSect | PI | Comment) CharData?)*
[66] CharRef     ::= '&#' [0-9]+ ';' | '&#x' [0-9a-fA-F]+ ';'
]]

With a rule like

  Well-formed XML documents may be encapsulated within XMLLiterals in RDF/XML documents with an equal or greater xml version.

we rule out expressing 1.1 |document|s in RDF/XML written in XML 1.0; probably an acceptable choice.

An xml |document| may have up to 1 |prolog| (and, in fact, 1.1 documents do need a |prolog| with an |XMLDecl|, weird). This, of course, requires special "encapsulation". Shadi explained that you were considering splitting the body up into the xml |prolog| and the rest of the document.

http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/#section-XMLLiteral says that the XMLLiteral object is essentially a language tag and a canonical XML document¹ with no XML or DTD declarations. The XML declaration is superseded in the context of an encapsulating document, so I wouldn't write it down (the version and encoding are authoritative only in the encapsulating document). You are left with

  Misc* (doctypedecl Misc*)?
but you only need to record this extra attribute if there is a |doctypedecl|. Further, the trailing |Misc|* can go into the body, leaving only the leading |Misc|* and the |doctypedecl|. The |doctypedecl| can even be broken down to extract the rarely-seen internal subset into another element, making it easy to look for XHTML documents:

<Body>
  <leadingMisc parseType="literal">
    <?leading-PI?>
  </leadingMisc>
  <dtd publicId="..." systemId="..." />
  <intSubset>greeting [ <!ELEMENT greeting (#PCDATA)> ]</intSubset>
  <rest parseType="literal">...</rest>
</Body>

Just some ideas -- have fun.

¹ c14n must be expressed in utf-8 and XMLLiterals may be expressed in whatever encoding the parent document is in -- noting expressivity limitations above.

-- 
-eric

office: +1.617.258.5741 NE43-344, MIT, Cambridge, MA 02144 USA
(eric@w3.org)
Feel free to forward this message to any list for any purpose other than
email address distribution.
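
The split described in the message (drop the XML declaration, keep the leading Misc* and the doctypedecl only when a doctypedecl is present, leave everything else in the body) could be sketched in Python roughly as below. This is an illustration only, not part of the original proposal: the function name `split_prolog`, the dictionary keys, and the regular expressions are all assumptions made for the example. The patterns are deliberate simplifications of productions [22], [23] and [28] and would mis-handle, for instance, a `[` or `>` inside a quoted ExternalID literal.

```python
import re

# Simplified stand-ins for the prolog productions quoted in the message.
# Hypothetical helpers; a real implementation would use a proper XML parser.
XML_DECL = re.compile(r"<\?xml\s.*?\?>", re.DOTALL)
MISC = re.compile(r"(?:\s+|<!--.*?-->|<\?.*?\?>)*", re.DOTALL)
DOCTYPE = re.compile(
    r"<!DOCTYPE\s+(?P<name>[^\s\[>]+)"    # Name
    r"(?P<external>[^\[>]*)"              # optional ExternalID, kept as raw text
    r"(?:\[(?P<internal>.*?)\]\s*)?>",    # optional internal subset
    re.DOTALL,
)


def split_prolog(doc):
    """Split a document into XMLDecl, leading Misc, doctypedecl parts, and rest."""
    parts = {
        "xml_decl": None,
        "leading_misc": "",
        "doctype_name": None,
        "external_id": None,
        "int_subset": None,
    }
    pos = 0

    # The XML declaration is recognised but would not be written down.
    decl = XML_DECL.match(doc, pos)
    if decl:
        parts["xml_decl"] = decl.group(0)
        pos = decl.end()

    # Leading Misc* is only recorded separately when a doctypedecl follows;
    # otherwise it simply stays in the body.
    misc = MISC.match(doc, pos)
    doctype = DOCTYPE.match(doc, misc.end())
    if doctype:
        parts["leading_misc"] = doc[pos:misc.end()].strip()
        parts["doctype_name"] = doctype.group("name")
        parts["external_id"] = doctype.group("external").strip() or None
        parts["int_subset"] = doctype.group("internal")
        pos = doctype.end()

    # Trailing Misc* after the doctypedecl goes into the body with the element.
    parts["rest"] = doc[pos:].lstrip()
    return parts


example = (
    '<?xml version="1.1"?>\n'
    "<?leading-PI?>\n"
    "<!DOCTYPE greeting [ <!ELEMENT greeting (#PCDATA)> ]>\n"
    "<greeting>Hello</greeting>\n"
    "<!-- trailing -->"
)
parts = split_prolog(example)
```

With the sample input, `parts["leading_misc"]` holds the processing instruction, `parts["int_subset"]` holds the element declaration, and `parts["rest"]` starts at `<greeting>` and still includes the trailing comment.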
Received on Friday, 30 March 2007 13:51:26 UTC