Re: Options for dealing with IDs

On Thursday, January 9, 2003, 2:40:27 AM, noah wrote:


nuic> I'm somewhat nervous about option 6, and perhaps some of the others, as 
nuic> they relate to XML Schema.

I would certainly want any specification of these (as opposed to the
brief thumbnail sketches I gave) to say exactly how they relate to XML
Schema. Ideally, I would like XML Schema to treat such declarations as
contributing directly to the PSVI.

nuic> I think we have roughly the following situation.  XML Schema takes an 
nuic> Infoset as its input.  That Infoset may be produced by a non-validating 
nuic> XML processor, a DTD-validating XML processor, or might be a synthetic 
nuic> Infoset.

Okay. So, if xml:idAttr (option 6) were added to XML, the input
Infoset would already contain decorations on those attributes saying
that they were of type xsd:ID (assuming a suitable declaration of the
xsd prefix).
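To make that decoration step concrete, here is a minimal Python sketch of what an option 6 aware parser might record. It assumes (my assumption, not something the thumbnail sketch of option 6 specifies) that xml:idAttr applies to the element carrying it and to its descendants, the way xml:lang does, with a nearer declaration overriding an outer one. The function name decorate_ids is hypothetical.

```python
import xml.etree.ElementTree as ET

# Expanded name of the hypothetical xml:idAttr attribute (option 6); the
# xml: prefix is always bound to this namespace.
ID_ATTR = "{http://www.w3.org/XML/1998/namespace}idAttr"

def decorate_ids(root):
    """Return (tag, attribute name, value) for every attribute that the
    post-parsing Infoset would mark as type ID.

    Assumption: xml:idAttr applies to the element carrying it and to its
    descendants, and a nearer xml:idAttr overrides an outer one.
    """
    found = []

    def walk(elem, id_name):
        id_name = elem.get(ID_ATTR, id_name)  # nearer declaration wins
        if id_name is not None and id_name in elem.attrib:
            found.append((elem.tag, id_name, elem.attrib[id_name]))
        for child in elem:
            walk(child, id_name)

    walk(root, None)
    return found

doc = ET.fromstring('<doc xml:idAttr="name"><item name="a"/><item name="b"/></doc>')
for tag, attr, value in decorate_ids(doc):
    print(f"{tag}/@{attr} = {value!r}  ->  xsd:ID")
```

A schema processor reading this Infoset would then see both name attributes already typed as IDs, without any schema declaration for them.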

nuic>  XML schema provides a built in datatype xsd:ID [1].  When used
nuic> in an XML schema as the type of an attribute, XML Schema Structures 
nuic> provides constraint checking analogous to what would happen with DTDs [2].

Yes. So, as the input Infoset would be well formed but not validated,
the Schema validation stage might well find that validity constraints
had been violated. For example:

<?xml version="1.0" encoding="UTF-8"?>
<foo xml:idAttr="pling">
        <toto pling="x1"/>
        <tata pling="x1"/>
        <titi pling="x1"/>
</foo>

is well formed, has three attributes of type ID (the pling attributes),
and will fail validation because all three carry the same value, x1,
so the ID values are not unique.
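The uniqueness check that the validation stage would apply to this example can be sketched in a few lines of Python. This is a minimal sketch, again assuming that xml:idAttr scopes over descendants; the names id_values and find_duplicate_ids are hypothetical, and it checks only ID uniqueness, not the rest of Schema validation.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Expanded name of the hypothetical xml:idAttr attribute (option 6).
ID_ATTR = "{http://www.w3.org/XML/1998/namespace}idAttr"

def id_values(elem, id_name=None):
    """Yield the value of every attribute that an in-scope xml:idAttr
    marks as an ID (scope assumed to be inherited by descendants)."""
    id_name = elem.get(ID_ATTR, id_name)
    if id_name is not None and id_name in elem.attrib:
        yield elem.attrib[id_name]
    for child in elem:
        yield from id_values(child, id_name)

def find_duplicate_ids(root):
    """Return the ID values that occur more than once, i.e. the values
    that violate ID uniqueness."""
    counts = Counter(id_values(root))
    return sorted(v for v, n in counts.items() if n > 1)

SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<foo xml:idAttr="pling">
        <toto pling="x1"/>
        <tata pling="x1"/>
        <titi pling="x1"/>
</foo>"""

print(find_duplicate_ids(ET.fromstring(SAMPLE)))  # ['x1']
```

The document parses without complaint (it is well formed); the error only surfaces when the ID constraint is checked afterwards.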

nuic> As far as I know, this schema-level checking is independent of
nuic> any that might have been done per a DTD.

I believe so.

nuic> As a result of such a validation episode, the processor can
nuic> report in the PSVI that a given attribute has been determined to
nuic> be of type xsd:ID or xsd:IDREF. Furthermore, Schema introduces
nuic> the so-called identity constraint mechanism [3] (key/keyref)
nuic> which is more general than ID/IDREF; I think it's fair to say
nuic> that xsd:ID/xsd:IDREF is provided primarily for backwards
nuic> compatibility, in the sense of allowing reasonably
nuic> straightforward conversion of DTDs to schemas.

Yes; Schema allows an element to have multiple keys, and identity is
checked on each of them separately.
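As a rough illustration of checking several keys independently, here is a simplified Python sketch. It models each key as a tuple of field names on plain records rather than the XPath selector/field pairs that xs:key actually uses; the check_keys function and the sample data are hypothetical.

```python
from collections import Counter

def check_keys(records, keys):
    """For each key (a tuple of field names), return the key values that
    are duplicated. Each key is checked on its own, so the same records
    can satisfy one key and violate another.

    Simplification: real xs:key evaluates XPath selector/field
    expressions; here fields are dict entries, and a missing field raises
    KeyError (xs:key likewise requires every field to be present).
    """
    report = {}
    for key in keys:
        values = Counter(tuple(r[f] for f in key) for r in records)
        report[key] = sorted(v for v, n in values.items() if n > 1)
    return report

people = [
    {"id": "p1", "email": "a@example.org", "name": "Ann"},
    {"id": "p2", "email": "a@example.org", "name": "Bob"},
]
print(check_keys(people, [("id",), ("email",)]))
# {('id',): [], ('email',): [('a@example.org',)]}
```

Here the id key is satisfied while the email key is violated, which a single ID attribute per element could not express.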

nuic> So, schema largely reproduces XML 1.0 ID/IDREF, but it does so using an 
nuic> Infoset (which may have already been the result of DTD-validation) as 
nuic> input.  Thus the notions of ID in XML 1.0 and ID in Schema are 
nuic> intentionally similar, but in some sense duplicate each other.

I read this as meaning that, provided a spec for xml:id or xml:idAttr
were defined in terms of its effect on the Infoset, W3C XML Schema
would be able to handle such ID declarations without change.

nuic> Option 6 introduces: 

nuic>         xml:idAttr="name"

nuic> (where name is a sample value.)  Question: how should this interact with 
nuic> the mechanisms of XML Schema?  What if a name attribute is declared as 
nuic> being of type xsd:integer?

What happens today if a schema declares two different types for the
same attribute?

nuic>  Keep in mind that schema takes Infoset as 
nuic> input.  Does the assigned ID type now show up in that input infoset?

That would seem to be the right approach.

nuic> What are the right rules for schema processing? Should the type
nuic> reported in the PSVI be xsd:ID (or is there a separate
nuic> xml:IDtype?) for the name attribute?

The former, I would imagine. I don't see any benefit in a separate
type. What happens now if an instance has already been DTD validated
and some of its attributes are IDs? Do they show up as dtd:ID or
xsd:ID?

nuic> This all strikes me as a mess.

I don't see it as a mess; specifically, I see it as less of a mess than
the "live with this mess" option.

nuic>  Whatever the other pros and cons of option
nuic> 6, I think these anomalies result in part from the fact that XML does not 
nuic> really offer the pluggability that would allow schema to participate as a 
nuic> first class replacement for DTDs.

I agree - it's an optional, post-parsing, post-processing step.

nuic> Accordingly, schema does the best it can running at a separate
nuic> layer, but when we try to re-introduce typing at the XML level
nuic> as well we run the risk of complexity creeping in.

How is this different to the typing that can already occur during
parsing as part of DTD validation?

nuic> I could be
nuic> wrong, but if we go with option 6 I suspect we would probably
nuic> want to rev XML Schema to take account of it (and there are all
nuic> sorts of deployment issues in revving XML Schema, I would
nuic> think.)

As far as I read your arguments, if correctly specified in terms of
its effect on the PPI (Post Parsing Infoset), option 6 would require
zero change to W3C XML Schema.



-- 
 Chris                            mailto:chris@w3.org

Received on Friday, 10 January 2003 10:13:17 UTC