some HTTP encapsulation thoughts from Eric Prud'hommeaux on 2007-03-30 (public-wai-ert@w3.org from March 2007)

From: Eric Prud'hommeaux <eric@w3.org>
Date: Fri, 30 Mar 2007 13:50:00 +0000
To: public-wai-ert@w3.org
Cc: Jose Kahan <jose.kahan@w3.org>
Message-ID: <20070330134929.GC4104@w3.org>
I've been geeking with Shadi and Jose a bit and have come up with some
thoughts which need to be verified before they are taken as reason to
launch missiles.


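There are two versions of XML, 1.0 and 1.1. 1.1's |Char| set and
|Name|s are supersets of 1.0's. Therefore, any 1.0 document can be
encapsulated in 1.1, though the control characters #x7F-#x84 and
#x86-#x9F, which 1.0 allows literally, must then be written as
|CharRef|s (they are 1.1 |RestrictedChar|s, excluded by [1] below).
Most 1.1 characters may be represented in 1.0, some only as |CharRef|s,
but the C0 controls that 1.1 adds (#x1-#x8, #xB-#xC, #xE-#x1F) are
illegal in 1.0 even as references. Some 1.1 |Name|s are not
expressible in 1.0.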
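To make the character-level asymmetry concrete, here is a minimal Python sketch (the function names are hypothetical, not from any library) that checks code points against the |Char| productions of XML 1.0 and 1.1 and rewrites text for a target version, emitting [66]-style hex |CharRef|s where 1.1 requires them. It returns None when a character cannot be carried at all, e.g. U+0001 in 1.0. It deliberately ignores escaping of & and < and line-end normalization.

```python
from typing import Optional


def is_xml10_char(cp: int) -> bool:
    """XML 1.0 Char: the only code points allowed at all, literally or via CharRef."""
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def is_xml11_char(cp: int) -> bool:
    """XML 1.1 Char: everything except NUL, surrogates, #xFFFE and #xFFFF."""
    return (0x1 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def is_xml11_restricted(cp: int) -> bool:
    """XML 1.1 RestrictedChar: legal in 1.1 only when written as a CharRef."""
    return (0x1 <= cp <= 0x8 or 0xB <= cp <= 0xC or 0xE <= cp <= 0x1F
            or 0x7F <= cp <= 0x84 or 0x86 <= cp <= 0x9F)


def escape_chars(text: str, version: str) -> Optional[str]:
    """Rewrite text for the given XML version, using hex CharRefs where required.

    Returns None if some character has no representation in that version.
    Only the Char/RestrictedChar rules are handled in this sketch.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if version == "1.0":
            if not is_xml10_char(cp):
                return None  # e.g. U+0001: no 1.0 form, not even &#x1;
            out.append(ch)
        elif version == "1.1":
            if not is_xml11_char(cp):
                return None
            # RestrictedChars must appear as CharRefs in 1.1.
            out.append(f"&#x{cp:X};" if is_xml11_restricted(cp) else ch)
        else:
            raise ValueError(f"unknown XML version {version!r}")
    return "".join(out)
```

For example, `escape_chars("a\x01b", "1.1")` gives `"a&#x1;b"`, while the same text targeted at 1.0 gives `None`.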

http://www.w3.org/TR/2006/REC-xml11-20060816/#NT-document
[[
[1]  document    ::= ( prolog element Misc* ) - ( Char* RestrictedChar Char* )
[5]  Name        ::= NameStartChar (NameChar)*
[22] prolog      ::= XMLDecl Misc* (doctypedecl Misc*)?
[23] XMLDecl     ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
[28] doctypedecl ::= '<!DOCTYPE' S Name (S ExternalID)? S? ('[' intSubset ']' S?)? '>'
[39] element     ::= EmptyElemTag
                   | STag content ETag
[43] content     ::= CharData? ((element | Reference | CDSect | PI | Comment) CharData?)*
[66] CharRef     ::= '&#' [0-9]+ ';'
                   | '&#x' [0-9a-fA-F]+ ';'
]]

With a rule like
  Well-formed XML documents may be encapsulated within XMLLiterals in
  RDF/XML documents with an equal or greater XML version.
we rule out expressing 1.1 |document|s in RDF/XML written in XML 1.0;
probably an acceptable choice.
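A sketch of that rule, assuming the inner document is available as a decoded string: read the VersionNum out of its |XMLDecl| (falling back to 1.0 when the declaration is absent, since a document without one is treated as 1.0) and compare it numerically with the version of the RDF/XML document. `xml_version` and `may_encapsulate` are hypothetical helper names, and the regex is a loose approximation of [23], not a conforming parser.

```python
import re

# XMLDecl [23] begins '<?xml' VersionInfo; capture the VersionNum.
_VERSION = re.compile(r"""<\?xml\s+version\s*=\s*(["'])(1\.[0-9]+)\1""")


def xml_version(doc: str) -> str:
    """Version from doc's XMLDecl; a 1.0 document may omit the XMLDecl entirely."""
    m = _VERSION.match(doc)
    return m.group(2) if m else "1.0"


def may_encapsulate(inner_doc: str, rdfxml_version: str) -> bool:
    """Proposed rule: the inner version must not exceed the RDF/XML document's."""
    def as_tuple(v: str) -> tuple:
        return tuple(int(p) for p in v.split("."))

    return as_tuple(xml_version(inner_doc)) <= as_tuple(rdfxml_version)
```

Comparing tuples of integers rather than strings keeps the check correct should a hypothetical future "1.10" ever appear.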

An XML |document| has exactly one |prolog|, which in 1.0 may be empty
(and, in fact, 1.1 documents do need a |prolog| with an |XMLDecl|,
weird). This, of course, requires special "encapsulation". Shadi
explained that you were considering splitting the body up into the
XML |prolog| and the rest of the document.

http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/#section-XMLLiteral
says that the XMLLiteral object is essentially a language tag and a
canonical XML document¹ with no XML or DTD declarations. The XML
declaration is superseded by the context of the encapsulating
document, so I wouldn't write it down (the version and encoding are
authoritative only in the encapsulating document).
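If the XML declaration is dropped rather than recorded, the first step might look like this sketch. `strip_xml_decl` is a hypothetical helper; it only removes an |XMLDecl| at the very start of the string and returns it separately, in case the caller still wants to inspect the version or encoding before discarding it.

```python
import re
from typing import Optional, Tuple

# XMLDecl [23]: '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'. None of its
# parts can contain '?', so matching up to the first '?>' is safe here.
# Requiring whitespace after 'xml' avoids matching PIs like <?xml-stylesheet ...?>.
_XML_DECL = re.compile(r"<\?xml\s[^?]*\?>")


def strip_xml_decl(doc: str) -> Tuple[Optional[str], str]:
    """Split doc into (XMLDecl or None, remainder); the XMLDecl must come first."""
    m = _XML_DECL.match(doc)
    if m is None:
        return None, doc
    return m.group(0), doc[m.end():]
```

The remainder keeps any whitespace that followed the declaration, since that whitespace is part of the leading |Misc|*.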

You are left with
  Misc* (doctypedecl Misc*)?
but you only need to record this extra attribute if there is a
|doctypedecl|. Further, the trailing Misc* can go into the body,
leaving only the leading |Misc|* and the |doctypedecl|. The
|doctypedecl| can even be broken down to extract the rarely-seen
internal subset into another element, making it easy to look for
XHTML documents:

  <Body>
    <leadingMisc rdf:parseType="Literal"> <?leading-PI?>   </leadingMisc>
    <dtd name="greeting" publicId="..." systemId="..." />
    <intSubset>
&lt;!ELEMENT greeting (#PCDATA)&gt;
</intSubset>
    <rest rdf:parseType="Literal">...</rest>
  </Body>
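The split can be sketched in Python as two hypothetical functions: `split_prolog` peels the leading |Misc|* and the |doctypedecl| off a document whose |XMLDecl| has already been removed, and `to_body` serializes the parts into the `<Body>` shape. The `name` attribute carrying the doctype |Name|, the `rdf:` namespace declaration, and the regex-based parsing are assumptions of this sketch. It also assumes well-formed input, an internal subset that never contains `]` followed by `>`, and a body that uses no entities declared in that internal subset.

```python
import re
from dataclasses import dataclass
from typing import Optional
from xml.sax.saxutils import escape, quoteattr

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Misc ::= Comment | PI | S  (loose regexes; assumes well-formed input)
_MISC = re.compile(r"\s+|<!--.*?-->|<\?.*?\?>", re.DOTALL)
_LIT = r"""(?:"[^"]*"|'[^']*')"""
# doctypedecl [28]; assumes the internal subset has no "]" followed by ">".
_DOCTYPE = re.compile(
    r"<!DOCTYPE\s+(?P<name>[^\s\[>]+)"
    r"(?:\s+(?:PUBLIC\s+(?P<pub>" + _LIT + r")\s+(?P<sys1>" + _LIT + r")"
    r"|SYSTEM\s+(?P<sys2>" + _LIT + r")))?"
    r"\s*(?:\[(?P<subset>.*?)\]\s*)?>",
    re.DOTALL,
)


@dataclass
class Split:
    leading_misc: str          # Misc* before the doctypedecl
    name: Optional[str]        # doctypedecl Name
    public_id: Optional[str]
    system_id: Optional[str]
    int_subset: Optional[str]  # text between "[" and "]"
    rest: str                  # element plus trailing Misc*


def split_prolog(doc: str) -> Split:
    """Split a document whose XMLDecl has already been removed."""
    pos = 0
    m = _MISC.match(doc, pos)
    while m:  # consume leading comments, PIs and whitespace
        pos = m.end()
        m = _MISC.match(doc, pos)
    d = _DOCTYPE.match(doc, pos)
    if d is None:  # no doctypedecl: nothing extra to record
        return Split("", None, None, None, None, doc)

    def unquote(lit: Optional[str]) -> Optional[str]:
        return lit[1:-1] if lit is not None else None

    return Split(doc[:pos], d.group("name"), unquote(d.group("pub")),
                 unquote(d.group("sys1") or d.group("sys2")),
                 d.group("subset"), doc[d.end():])


def to_body(s: Split) -> str:
    """Serialize a Split as the hypothetical <Body> record."""
    out = [f"<Body xmlns:rdf={quoteattr(RDF_NS)}>"]
    if s.name is not None:
        if s.leading_misc:
            # Comments and PIs stay as literal markup, so they are not escaped.
            out.append(f'<leadingMisc rdf:parseType="Literal">{s.leading_misc}</leadingMisc>')
        attrs = f" name={quoteattr(s.name)}"
        if s.public_id is not None:
            attrs += f" publicId={quoteattr(s.public_id)}"
        if s.system_id is not None:
            attrs += f" systemId={quoteattr(s.system_id)}"
        out.append(f"<dtd{attrs}/>")
        if s.int_subset is not None:
            # Declarations are stored as escaped character data.
            out.append(f"<intSubset>{escape(s.int_subset)}</intSubset>")
    out.append(f'<rest rdf:parseType="Literal">{s.rest}</rest>')
    out.append("</Body>")
    return "".join(out)
```

With the identifiers pulled out into their own fields, spotting XHTML reduces to comparing `public_id` against the XHTML public identifiers instead of re-parsing a DOCTYPE string.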

Just some ideas -- have fun.


¹Canonical XML (c14n) must be expressed in UTF-8, while XMLLiterals
 may be expressed in whatever encoding the parent document uses,
 subject to the expressivity limitations noted above.
-- 
-eric

office: +1.617.258.5741 NE43-344, MIT, Cambridge, MA 02144 USA

(eric@w3.org)
Feel free to forward this message to any list for any purpose other than
email address distribution.
Received on Friday, 30 March 2007 13:51:26 UTC