some HTTP encapsulation thoughts

From: Eric Prud'hommeaux <eric@w3.org>
Date: Fri, 30 Mar 2007 13:50:00 +0000
To: public-wai-ert@w3.org
Cc: Jose Kahan <jose.kahan@w3.org>
Message-ID: <20070330134929.GC4104@w3.org>

I've been geeking with Shadi and Jose a bit and have come up with some
thoughts which need to be verified before they are taken as reason to
launch missiles.

There are two versions of XML, 1.0 and 1.1. 1.1's |CharData| and
|Name|s are supersets of 1.0's. Therefore, any 1.0 document can be
encapsulated in 1.1. Also, any 1.1 character may be represented in
1.0, but may require expression as a |CharRef|. Some 1.1 |Name|s are
not expressible in 1.0.

[1]  document    ::= ( prolog element Misc* ) - ( Char* RestrictedChar Char* )
[5]  Name        ::= NameStartChar (NameChar)*
[22] prolog      ::= XMLDecl Misc* (doctypedecl Misc*)?
[23] XMLDecl     ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
[28] doctypedecl ::= '<!DOCTYPE' S Name (S ExternalID)? S? ('[' intSubset ']' S?)? '>'
[39] element     ::= EmptyElemTag
                   | STag content ETag
[43] content     ::= CharData? ((element | Reference | CDSect | PI | Comment) CharData?)*
[66] CharRef     ::= '&#' [0-9]+ ';'
                   | '&#x' [0-9a-fA-F]+ ';'
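
To make the |CharRef| production [66] concrete, here is a minimal Python
sketch of falling back to character references when the enclosing
document's encoding cannot express a character directly. The function
name to_char_refs is made up for illustration, and the sketch only covers
the encoding side of the problem: it does not escape markup characters
such as '&' or '<', and it does not check whether a character is legal in
a given XML version.

```python
def to_char_refs(text, encoding="ascii"):
    """Return text with characters the encoding cannot express
    rewritten as decimal CharRefs (production [66])."""
    # "xmlcharrefreplace" is a standard Python codec error handler that
    # substitutes &#NNN; for every character the codec cannot encode.
    return text.encode(encoding, errors="xmlcharrefreplace").decode(encoding)


# U+00EF has no ASCII representation, so it becomes a CharRef.
print(to_char_refs("na\u00efve"))  # na&#239;ve
```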

With a rule like
  Well-formed XML documents may be encapsulated within XMLLiterals in
  RDF/XML documents with an equal or greater XML version.
we rule out expressing 1.1 |document|s in RDF/XML written in XML 1.0;
probably an acceptable choice.
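
The proposed rule amounts to a version comparison. A minimal Python
sketch, assuming versions arrive as strings such as "1.0" and "1.1"
(the function name can_encapsulate is hypothetical, not taken from any
spec):

```python
def can_encapsulate(inner_version, outer_version):
    """True if a document of inner_version may be embedded as an
    XMLLiteral in an RDF/XML document of outer_version."""
    def parse(version):
        # "1.1" -> (1, 1), so versions compare numerically, not as text.
        return tuple(int(part) for part in version.split("."))

    return parse(outer_version) >= parse(inner_version)


print(can_encapsulate("1.0", "1.1"))  # True
print(can_encapsulate("1.1", "1.0"))  # False
```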

An XML |document| may have up to 1 |prolog| (and, in fact, 1.1
documents do need a |prolog| with an |XMLDecl|, weird). This, of
course, requires special "encapsulation". Shadi explained that you
were considering splitting the body up into the XML |prolog| and the
rest of the document.

says that the XMLLiteral object is essentially a language tag and a
canonical XML document¹ with no XML or DTD declarations. The XML
declaration is superseded in the context of an encapsulating
document, so I wouldn't write it down (the version and encoding are
authoritative only in the encapsulating document).

You are left with 
  Misc* (doctypedecl  Misc*)?
but you only need to record this extra attribute if there is a
|doctypedecl|. Further, the trailing Misc* can go into the body,
leaving only the leading |Misc|* and the |doctypedecl|. The
|doctypedecl| can even be broken down to extract the rarely-seen
internal subset into another element, making it easy to look for
XHTML documents:

    <leadingMisc parseType="literal"> <?leading-PI?>   </leadingMisc>
    <dtd publicId="..." systemId="..." />
    <intSubset>greeting [
&lt;!ELEMENT greeting (#PCDATA)&gt;
]</intSubset>
    <rest parseType="literal">...</rest>
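
As a rough illustration of the split sketched above, the following Python
uses the standard library's xml.dom.minidom to pull a document apart into
the leading Misc, the doctype identifiers, the internal subset, and the
rest. The function name split_document, the dictionary keys, and the
sample document are all made up for this example. Note the assumptions:
the XML declaration is dropped as suggested, anything after the
|doctypedecl| goes into the rest, and minidom does not keep top-level
whitespace, so the pieces are re-serialized rather than byte-for-byte
copies.

```python
from xml.dom import Node, minidom


def split_document(xml_text):
    """Split a document into leading Misc, doctype details, and the rest."""
    doc = minidom.parseString(xml_text)
    nodes = list(doc.childNodes)
    # Position of the doctypedecl among the top-level nodes, if there is one.
    dt_index = next(
        (i for i, n in enumerate(nodes) if n.nodeType == Node.DOCUMENT_TYPE_NODE),
        None,
    )
    if dt_index is None:
        # No doctypedecl: nothing extra to record, everything is "rest".
        return {"leadingMisc": "", "dtd": None, "intSubset": None,
                "rest": "".join(n.toxml() for n in nodes)}
    doctype = nodes[dt_index]
    return {
        # Comments and PIs that precede the doctypedecl.
        "leadingMisc": "".join(n.toxml() for n in nodes[:dt_index]),
        "dtd": {"name": doctype.name,
                "publicId": doctype.publicId,
                "systemId": doctype.systemId},
        # Text between '[' and ']', or None when there is no internal subset.
        "intSubset": doctype.internalSubset,
        # Trailing Misc of the prolog plus the document element and beyond.
        "rest": "".join(n.toxml() for n in nodes[dt_index + 1:]),
    }


# Hypothetical sample document, for illustration only.
sample = (
    '<?xml version="1.0"?>'
    '<?leading-PI?>'
    '<!DOCTYPE greeting PUBLIC "-//EXAMPLE//DTD Greeting//EN" "greeting.dtd" ['
    '<!ELEMENT greeting (#PCDATA)>'
    ']>'
    '<!-- after doctype -->'
    '<greeting>Hello</greeting>'
)
parts = split_document(sample)
print(parts["dtd"])
```

With the identifiers pulled out like this, spotting an XHTML document
becomes a string test on the public identifier instead of a parse of the
whole literal.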

Just some ideas -- have fun.

¹ c14n must be expressed in utf-8 and XMLLiterals may be expressed in
  whatever encoding the parent document is in -- noting expressivity
  limitations above.

office: +1.617.258.5741 NE43-344, MIT, Cambridge, MA 02144 USA

Feel free to forward this message to any list for any purpose other than
email address distribution.

Received on Friday, 30 March 2007 13:51:26 UTC