RE: Comments on XML Schemas: Structures (long) from Murray Maloney on 1999-06-09 (www-xml-schema-comments@w3.org from April to June 1999)

From: Murray Maloney <murray@muzmo.com>
Date: Wed, 09 Jun 1999 12:50:57 -0400
To: Ronald Bourret <rbourret@ito.tu-darmstadt.de>
Cc: "'Murray Maloney'" <murray@muzmo.com>, "'www-xml-schema-comments@w3.org'" <www-xml-schema-comments@w3.org>
Message-Id: <3.0.1.32.19990609125057.0071fb70@mail.muzmo.com>
At 11:53 AM 6/9/99 +0200, Ronald Bourret wrote:
>Murray Maloney wrote:
>> Nothing in XML Schema prevents definition of entities and notations
>> in an instance. The ability to define them in a schema and an instance
>> is preserved from XML. Allowing this mixture is the status quo in XML.
>
>The status quo (which is not sacred, as things such as nearly well-formed 
>XML show) is broken here and should be fixed.  The physical layout of an 
>XML document has no more to do with the logical schema to which that 
>document conforms than the physical layout of database data on a disk has 
>to do with the logical tables and columns into which that data is arranged. 
>DTDs mix these concepts together and schemas are our best chance for 
>separating them.

OK, so you are arguing against the status quo on the basis that
XML Schema should not propagate what you consider to have been
a poor design choice in SGML/XML.

I don't agree, but I do understand. Others share your sentiment.
>
>> Actually, both XML and SGML provide for a combined collection of names
>> for all instance entities. That is, unparsed entities and both forms of
>> parsed entities share a single set of names.
>
>This is not true for parameter and general entities. To quote the fifth 
>paragraph of section 4 of the XML spec, "Furthermore, they occupy different 
>namespaces; a parameter entity and a general entity with the same name are 
>two distinct entities." But you are correct that unparsed entities share 
>the same namespace with parsed general entities.  I finally noticed that 
>the second sentence in 4.2.2 refers to them as "general unparsed entity," 
>which means they share the same namespace.

Right. But XML Schema says nothing about parameter entities.

>I disagree that there is no particular need to specify the expected root 
>beforehand. In fact, many applications expect a particular root.  For 
>example, if I write a module that reads a schema and validates an instance 
>document against that schema, that module clearly expects the root element 
>of the schema to be <schema> and will throw an error if it is not.

And, if it does not get <schema> as the root, it will reject the document,
won't it? What difference would it make if the schema for schema said
that schema was the root element? And, is there only one root?

If there is only one root for a given schema, then it is detectable
by inspection of the schema.

If any/many of the elements in the schema can serve as the root, then
what advantage is there in declaring that?
>
>Thus, the root element type is part of the expected structure of the 
>document and part of its logical schema. Put another way, we can add an 
>optional root element type declaration to schemas and move that part of 
>validation to a generic processor, or we can leave it out and force 
>applications to validate the root element type themselves, just like they 
>do with data types today.

Sorry, I still don't quite see this. The root element of any wf-document
is self-evident. Starting with that element, that wf-fragment is either
valid according to the schema or it is not.

I am assuming that any wf-fragment can be validated.
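A minimal sketch of what Ron describes: an application that checks the root element type itself because the schema does not declare one. It uses Python's standard xml.etree.ElementTree. The function name check_root is hypothetical and is not part of any schema processor. ElementTree reports namespaced names as '{URI}local', so when a URI is supplied the sketch compares against that expanded form.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def check_root(document: str, expected: str, uri: Optional[str] = None) -> bool:
    """Return True if the document's root element type is the expected one.

    ElementTree spells a namespaced name as '{URI}local', so when a
    namespace URI is supplied the comparison uses that expanded form.
    """
    root = ET.fromstring(document)
    wanted = f"{{{uri}}}{expected}" if uri else expected
    return root.tag == wanted


# A schema-reading module that only accepts <schema> as the root.
print(check_root("<schema><elementType name='a'/></schema>", "schema"))  # True
print(check_root("<elementType name='a'/>", "schema"))                   # False
```

Murray's counterpoint also shows up here. Once the document is parsed, the root is self-evident, so the check itself is a single comparison. The disagreement is only about whether that comparison belongs in the schema or in the application.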
>The social reason is precisely my point. This is entirely a user interface 
>issue. The abbreviation-plus-colon-plus-name is easier to read, easier to 
>write, and provides a familiar entry point for people who know namespaces. 
>Of the following three choices, I find the third by far the easiest to read 
>and write:
>
>   i) <elementTypeRef name="bar" schemaAbbrev="foo">
>   ii) <elementTypeRef name="bar" schemaName="http://foo">
>   iii) <elementTypeRef name="foo:bar">

I find ii) to be the most precise expression of intent, and it has
the virtue of representing what I expect to find in the Info Set.
See more...
>
>>
>> 	<elementTypeRef name="HTML:BLOCKQUOTE"/>
>> 	<archetypeRef   name="HTML:BLOCKQUOTE"/>
>> 	<attrGroupRef   name="HTML:BLOCKQUOTE"/>
>> 	<modelGroupRef  name="HTML:BLOCKQUOTE"/>
>>
>> According to 'Namespaces in XML', these are all the same name.
>
>These are definitely the same according to 'Namespaces in XML', as 
>namespaces do not apply to element or attribute values, but that is beside 
>the point. 

I don't think that it is beside the point. 'Namespaces in XML'
was developed simply because there was no way *in an XML instance
or XML DTD* to specify a full URI as an element or attribute name.
The 'name' rules prevented it. 

XML Schema does not suffer from the limitations of XML DTDs and is
friendly with other specs in the XML family. DOM/XSL/InfoSet all
consider an XML element/attribute name to consist of the 'local 
part' and 'URI part'. The 'prefix part' is used for as long as it
is needed, but the 'URI part' survives. So, it seems appropriate that 
an XML Schema should be allowed to support direct URI specification, 
rather than having to always declare a prefix and use it.

So, if we allow 'URI part' naming, then we also need to add 'prefix part'
naming. Next thing you know, there are three ways to name something.

Not only that, but a QName is an aggregate datatype -- which most pure 
dataheads abhor. Moreover, it has both name and link semantics 
associated with it.
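The three spellings in Ron's example all end up as the same (URI part, local part) pair that DOM and the Info Set keep. The minimal Python sketch below makes that concrete. The function expand is hypothetical, and so is the abbreviation table, which stands in for the schemaAbbrev-to-schemaName mapping that import declarations provide. Unqualified names are simply given an empty URI part, which is a simplifying assumption.

```python
from typing import Dict, Optional, Tuple

ExpandedName = Tuple[str, str]  # (URI part, local part)


def expand(name: str, abbrevs: Dict[str, str],
           schema_abbrev: Optional[str] = None,
           schema_name: Optional[str] = None) -> ExpandedName:
    """Resolve a reference written in any of the three styles to (URI, local)."""
    if schema_name is not None:        # ii) URI given directly
        return (schema_name, name)
    if schema_abbrev is not None:      # i) abbreviation in a separate attribute
        return (abbrevs[schema_abbrev], name)
    prefix, sep, local = name.partition(":")
    if sep:                            # iii) QName-style prefix:local
        return (abbrevs[prefix], local)
    return ("", name)                  # unqualified: empty URI part (assumption)


abbrevs = {"foo": "http://foo"}  # as if declared by an import with schemaAbbrev="foo"
print(expand("bar", abbrevs, schema_abbrev="foo"))       # ('http://foo', 'bar')
print(expand("bar", abbrevs, schema_name="http://foo"))  # ('http://foo', 'bar')
print(expand("foo:bar", abbrevs))                        # ('http://foo', 'bar')
```

For comparison, ElementTree already performs this expansion for instance markup. Parsing <foo:bar xmlns:foo="http://foo"/> yields the tag {http://foo}bar, and the prefix is gone.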

>What I am suggesting is that schemas borrow a familiar, 
>easy-to-use mechanism and apply it to a similar situation -- that of naming 
>values that will be used as markup in instance documents.

I understand what you are suggesting, and the current spec allows
one to use the 'prefix part' and the 'local part'. But it does not
use an aggregate QName datatype.
>
>> >Section 3.4.2 -- Archetype Definition
>> Providing for default and fixed values for element content is
>> useful in designing applications that 'fill themselves in'.
>> A document creation tool, such as an editor or invoice generator,
>> can automatically insert content to satisfy local constraints
>> on a public schema.
>>
>> Furthermore, this eliminates another of the differences between
>> elements and attributes. When deciding between using an element
>> or an attribute, the primary consideration is whether or not the
>> value is allowed to contain element content, and whether or not
>> there may be more than one instance on/in an element.
>
>Good point.  So when is the element default applied?  When the element 
>isn't there or when it is empty?

That depends on application conventions. An editing application might
automatically fill in all required elements and attributes and defaults.
A server might similarly expand an incoming or outgoing document with
metadata. If an element or attribute is empty, then the default would
apply if the datatype did not allow an empty value, such as a REQUIRED
NMTOKEN.
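The minimal Python sketch below shows one such convention, for an invoice generator of the kind mentioned above. Missing children are created with their defaults. A child that is present but empty receives its default only when its datatype does not allow an empty value. The names fill_defaults, the defaults table, and the empty_ok set are hypothetical, and the draft itself does not prescribe when a default is applied.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Set


def fill_defaults(parent: ET.Element, defaults: Dict[str, str],
                  empty_ok: Set[str]) -> None:
    """Apply element defaults in place, following one possible convention.

    Missing children are created with their default (as an editor might do);
    a child that is present but empty gets the default only if its datatype
    does not permit an empty value (names in `empty_ok` do permit it).
    """
    for name, value in defaults.items():
        child = parent.find(name)
        if child is None:
            ET.SubElement(parent, name).text = value
        elif not (child.text or "").strip() and len(child) == 0 and name not in empty_ok:
            child.text = value


invoice = ET.fromstring("<invoice><currency/><note/></invoice>")
fill_defaults(invoice, {"currency": "CAD", "note": "n/a", "terms": "net 30"}, {"note"})
print(ET.tostring(invoice, encoding="unicode"))
```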
>
>> >Section 3.4.9 -- Element Type Declarations
>> >Locally-scoped element type names break XML 1.0 validity and are probably
>> >not worth the confusion they will cause -- remember that most document
>> >authors are not programmers and are not likely to understand scoping. I
>> >suggest you delete them.
>>
>> Preserving XML 1.0 validity is not a requirement.
>>
>> The WG seems to split on this question. I tend to agree with Ron,
>> but I would not lie down in the road.
>
>Fair enough about 1.0 validity -- that was really a bogus argument on my 
>part.  My main objection is that I don't think this is worth the confusion 
>it will cause.

I agree with the confusion argument. Another consideration is whether
getting all of the details right will make adoption of a schema happen
sooner or later.
>
>> >Section 4.1 -- Associating Instance Document Constructs with Corresponding
>> >Schemas
>> >What is the relationship between schemaIdentity, schemaName, and the
>> >namespace URI? My guess is that all should be the same, but this is never
>> >stated.
>>
>> The schemaIdentity, spelled <schema name="..." ... >, is the URI
>> of the current schema. The schemaName="..." property is a URI
>> that is a reference to a schema. The schemaAbbrev property is
>> an 'NCName' that is a reference to an imported schemaName.
>
>Unfortunately, the spec does not state how schemaName refers to a schema. 
>The reasonable assumption is that the schemaName in an import statement 
>must be identical to the schemaIdentity in the imported schema. However, 
>there is nothing in the spec to prevent the vastly-less-useful 
>interpretation that schemaName is simply a URI that locally identifies the 
>imported schema and that the same schema could be imported into different 
>schemas using different schemaNames.  For example:
>
>myLocalCopyOfFoo.xsd:
>   <schema name="http://foo">...</schema>
>
>yourLocalCopyOfFoo.xsd:
>   <schema name="http://foo">...</schema>
>
>Schema A:
>   <import schemaAbbrev="foo" schemaName="myLocalCopyOfFoo.xsd"/>
>
>Schema B:
>   <import schemaAbbrev="foo" schemaName="yourLocalCopyOfFoo.xsd"/>
>
>The same problem exists in the relationship between schemaIdentity and the 
>URI used in namespace declarations.

I'll have to think on that one.
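A processor could rule out the 'local copy' reading that Ron describes by insisting that the schemaName in an import equal the name attribute (the schemaIdentity) of the schema it loads. The minimal Python sketch below works under that assumption. The function check_import is hypothetical, and so is the load callback that turns a schemaName into schema text.

```python
import xml.etree.ElementTree as ET
from typing import Callable


def check_import(import_elem: ET.Element, load: Callable[[str], str]) -> bool:
    """True if the import's schemaName equals the loaded schema's identity.

    The schemaIdentity is spelled <schema name="...">. How a schemaName is
    turned into schema text is left to the hypothetical `load` callback.
    """
    schema_name = import_elem.get("schemaName", "")
    imported = ET.fromstring(load(schema_name))
    return imported.get("name") == schema_name


files = {
    "myLocalCopyOfFoo.xsd": '<schema name="http://foo"/>',
    "http://foo": '<schema name="http://foo"/>',
}
schema_a = ET.fromstring('<import schemaAbbrev="foo" schemaName="myLocalCopyOfFoo.xsd"/>')
print(check_import(schema_a, files.__getitem__))  # False
```

With this check, Schema A's import of myLocalCopyOfFoo.xsd is rejected because the loaded schema identifies itself as http://foo. An import with schemaName="http://foo" passes.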
>
>> >Section 4.7 -- Schema Inclusion
>> >a) Does schema inclusion solve any problems that can't be solved with
>> >external entities? If not, delete it.
>>
>> First, we use XML instance syntax for all constructs, thereby ensuring
>> that we can build tools with standard XML components such as the DOM,
>> the Information Set, and even SAX.
>>
>> 	<include schemaName='myOtherSchema'/>
>>
>> Second, it reduces the number of steps from two (declare/reference):
>>
>> 	<!ENTITY myOtherSchema 'myOtherSchema.xsd'>
>> 	&myOtherSchema;
>
>I'm still not convinced.  This forces schema processors to do work that 
>would otherwise be done for them by the parser.  Can you give me an example 
>showing how this benefits the user in a way that external entities cannot? 
> (Saving one line is not enough, nor is preserving physical document 
>structure with SAX, as that was never a goal of SAX and will probably be 
>solved with SAX2 anyway.)

-- There has been plenty of agitation for XML processors that simply 
   do not deal with entities in any way, shape, or form. 
-- Using elements/attributes gives me a manageable, typed link.
-- Using entities generalizes a specific schema function.
-- I see no reason not to use element/attribute syntax.
-- My processor can keep track of includes as a class.

That's not to say that you can't use entities if you want.
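Because <include schemaName='...'/> is an ordinary element, a processor can find and expand includes with the same tree API it uses for every other construct. It can also keep a record of what it has already included. The minimal Python sketch below illustrates this. The function expand_includes, the load callback, and the tag names are assumptions for illustration, not the draft's processing model.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List, Optional, Set


def expand_includes(name: str, load: Callable[[str], str],
                    seen: Optional[Set[str]] = None) -> List[ET.Element]:
    """Return a schema's top-level definitions with <include> elements expanded.

    `load` is a hypothetical resolver from schemaName to schema text;
    `seen` records every schema already visited so each is expanded once.
    """
    if seen is None:
        seen = set()
    seen.add(name)
    result: List[ET.Element] = []
    for child in ET.fromstring(load(name)):
        if child.tag == "include":
            target = child.get("schemaName", "")
            if target not in seen:
                result.extend(expand_includes(target, load, seen))
        else:
            result.append(child)
    return result


schemas = {
    "main": '<schema><elementType name="a"/><include schemaName="myOtherSchema"/></schema>',
    "myOtherSchema": '<schema><elementType name="b"/><include schemaName="main"/></schema>',
}
defs = expand_includes("main", schemas.__getitem__)
print([d.get("name") for d in defs])  # ['a', 'b']
```

Sharing one seen set has two effects. A schema reached by two paths is expanded only once, and a cycle of includes terminates. A processor could equally choose to report such cycles as errors.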

>
>> >Section 4.8 -- Access to Schemata
>> The separation of schemaName from schemaLocation is application- and
>> processing environment-specific. Whether through URN lookup mechanisms,
>> or OASIS catalogs, or URL-redirection and content negotiation, the
>> schemaLocation must be handled outside of the schema -- in my opinion.
>
>I half-way agree with you. It is much better theoretical design to just use 
>schema names to refer to schemas. Unfortunately, this means that I can't 
>ever write a schema that can be reliably used by all schema processors. 
>This is a major hole and could at least partially be solved by introducing 
>separate location information.

We are only trying to write a schema that will work with processors
that understand this schema language. It is not up to the schema
language to decide 'a priori' how to resolve the name of a schema.
>
>I'm very open to other suggestions, but leaving it to the marketplace means 
>that people will simply name their schemas with the URL of the schema 
>location, which is both non-scaleable and requires schema processors to be 
>connected to the Web.

But it will work for those that choose to use a URL, and it will also
work for those who choose to use URNs, proxies, and registries.
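Murray's position is that name resolution lives outside the schema language, in a resolver that the processing environment supplies. The minimal Python sketch below models that. The function resolve_schema_location is hypothetical, and a plain dictionary stands in for an OASIS catalog, URN resolver, or registry. A name that is already an http(s) URL is used as its own location only when the catalog has no entry for it.

```python
from typing import Dict, Optional
from urllib.parse import urlparse


def resolve_schema_location(schema_name: str, catalog: Dict[str, str]) -> Optional[str]:
    """Map a schema name to a retrievable location, outside the schema itself.

    The catalog is consulted first; otherwise an http(s) URL serves as its
    own location. Returns None when nothing applies (e.g. an unknown URN).
    """
    if schema_name in catalog:
        return catalog[schema_name]
    if urlparse(schema_name).scheme in ("http", "https"):
        return schema_name
    return None


catalog = {"urn:example:foo": "/schemas/foo.xsd", "http://foo": "/mirror/foo.xsd"}
print(resolve_schema_location("urn:example:foo", catalog))   # /schemas/foo.xsd
print(resolve_schema_location("http://bar/s.xsd", catalog))  # http://bar/s.xsd
```

Consulting the catalog first also speaks to Ron's concern. A local mirror can stand in for a URL-named schema, so the processor need not be connected to the Web.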
>
>> >Section 3.4.7 -- Element-Only Content
>> >The example states that the default of maxOccur is 1. In fact, maxOccur has
>> >no default.

Right. My mistake. I misread you.

>> >Section 3.6 -- Entities and Notations
>> >Why are notations included with entities?  Entities are a physical
>> >construct and notations are a logical construct.
>>
>> Notations are (mostly) used with external unparsed entities.
>> While notations may also be used to specify the content of
>> an element (qua XML and SGML), they are hardly ever used
>> that way.
>
>Perhaps not, but they nevertheless remain a logical construct, not a 
>physical construct.

Do you think that they should be included with datatypes, perhaps?



----------------------------------------------------------
Murray Maloney, Esq.          Phone: (905) 509-9120
Muzmo Communication Inc.      Fax:   (905) 509-8637
671 Cowan Circle              Email: murray@muzmo.com
Pickering, Ontario 		Web:   http://www.muzmo.com
Canada, L1W 3K6
Received on Wednesday, 9 June 1999 12:59:52 UTC