RE: Quick Comments on Types/Encoding of XML

Hi Joseph, Don and others on the teleconference,

I was not completely with it yesterday because my wife had been in hospital
the day before (she had a miscarriage, which really hurts, with some
complications).

Anyway, I wanted to further clarify the 'input/output type' problem we were
discussing on the telecon.

It seems that there was some confusion because there were two issues at
hand. First was the charset encoding and second was the MIME type.

For charset encoding in the XML transformations, I solved the problem
(interestingly in the way that Peter Norman suggested).  Some on the call
indicated that this was not a solution they preferred.  OK, so we can tweak
it a bit, but the charset issue is solved.

By the way, the charset transformations require knowledge of how the
document type deals with changes in charset (unlike base64 which couldn't
care less).  I classified them as c14n algorithms to be defined in section
7.5 because XML c14n seems to have the precedent on charset changes, and we
should stick to the standards as much as possible.

For other kinds of type information, such as the MIME type, it seemed
exceedingly pedantic to me to define an input type for the algorithm when the
input type is evident from the algorithm itself.  I do not believe that the same
transform can process multiple types of documents unless the transform
doesn't actually parse the document (like base64) in which case it does not
need to know the input type.  A JPEG decompressor just isn't going to work
on an XFDL form.

It is the responsibility of the entity that creates the transformation
sequence to match up the transformations based on the expected input and
output of the types of algorithms that *the entity* strings together.  The
signature system simply will not have need of this information.  If the
transformation code receives invalid data, it will generate errors (e.g. if
you push non-XML data into an XPath transform, the XML processor will blow
chunks).
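
To illustrate the point, here is a minimal Python sketch (not the proposed
syntax or any real implementation). The names base64_decode, xml_transform
and apply_transforms are hypothetical, and xml_transform is only a stand-in
for an XML-aware transform such as XPath. No input type is recorded anywhere;
a transform that parses its input simply fails on data of the wrong kind.

```python
import base64
import xml.etree.ElementTree as ET


def base64_decode(data: bytes) -> bytes:
    # Does not parse the document, so it has no need to know the input type.
    return base64.b64decode(data, validate=True)


def xml_transform(data: bytes) -> bytes:
    # Hypothetical stand-in for an XML-aware transform (e.g. XPath).
    # Parsing raises ET.ParseError if the input is not well-formed XML.
    ET.fromstring(data)
    return data


def apply_transforms(data: bytes, transforms) -> bytes:
    # The signature system just runs the sequence; whoever built the
    # sequence is responsible for matching outputs to inputs.
    for transform in transforms:
        data = transform(data)
    return data
```

Feeding base64-encoded XML through [base64_decode, xml_transform] succeeds,
while feeding base64-encoded JPEG bytes through the same sequence fails inside
the XML parser, with no type bookkeeping in the core.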

Basically, the transformations I defined did not need this information at
all, and the application specific transformations are 1) quite unlikely to
need the type information, and 2) quite capable of placing the type
information in the Transformation element content even if they do need it.

It's really a question of who does the error checking.  When I say the type
information is not needed, I'm not saying that data of the wrong type
shouldn't generate errors, I'm just saying that the core signature syntax
and processing rules should not be burdened with trying to figure out the
input document type of a Transformation (which would be necessary in order
to check whether it matches the recorded input type).

It has been said repeatedly that we need to be quite specific about the
processing rules of these transformations.  I was quite specific, but I
think some mistook my not restating the rules of XML or other specifications
for ambiguity.  I would encourage a reread of the section before further
discussion, and please address concerns about the actual proposed material
to me.  It doesn't make sense to have a telecon where we spend 40 minutes
discussing things that are solved either by the spec itself or implicitly
by the specs to which it refers.

Thanks,
John Boyer
Software Development Manager
UWI.Com -- The Internet Forms Company


Note, if the XML is in anything other than UTF-8 or UTF-16, the encoding
declaration should be present:

http://www.w3.org/TR/REC-xml#sec-guessing

        Because each XML entity not in UTF-8 or UTF-16 format must begin
        with an XML encoding declaration, in which the first characters
        must be '<?xml', any conforming processor can detect, after two to
        four octets of input, which of the following cases apply.
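
As a rough illustration of that detection step, here is a simplified Python
sketch. The function name guess_xml_encoding_family and the returned labels
are hypothetical, and the table covers only the common byte patterns, so
treat it as an approximation of the appendix and not a complete rendering.

```python
# (prefix, label) pairs; 4-octet patterns are listed before shorter ones.
_PATTERNS = [
    (b"\x00\x00\x00\x3c", "UCS-4, big-endian"),
    (b"\x3c\x00\x00\x00", "UCS-4, little-endian"),
    (b"\x00\x3c\x00\x3f", "UTF-16, big-endian, no byte order mark"),
    (b"\x3c\x00\x3f\x00", "UTF-16, little-endian, no byte order mark"),
    (b"\x3c\x3f\x78\x6d", "ASCII-compatible; read the encoding declaration"),
    (b"\x4c\x6f\xa7\x94", "EBCDIC; read the encoding declaration"),
    (b"\xef\xbb\xbf", "UTF-8 with byte order mark"),
    (b"\xfe\xff", "UTF-16, big-endian"),
    (b"\xff\xfe", "UTF-16, little-endian"),
]


def guess_xml_encoding_family(head: bytes) -> str:
    # Only the first two to four octets of the entity are inspected.
    for prefix, label in _PATTERNS:
        if head.startswith(prefix):
            return label
    # No pattern matched: no declaration is present, so assume UTF-8.
    return "UTF-8 without an encoding declaration"
```

For the ASCII-compatible and EBCDIC cases the result only says how to read
the declaration; the actual encoding name still comes from the declaration
itself.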


_________________________________________________________
Joseph Reagle Jr.
Policy Analyst           mailto:reagle@w3.org
XML-Signature Co-Chair   http://w3.org/People/Reagle/

Received on Friday, 15 October 1999 13:09:17 UTC