Re: Disagreement with Component Designators basic assumption from Jacek Kopecky on 2003-01-20 (www-xml-schema-comments@w3.org from January to March 2003)

From: Jacek Kopecky <jacek@systinet.com>
Date: 20 Jan 2003 20:11:20 +0100
To: holstege@mathling.com
Message-Id: <1043085103.9560.194.camel@krava.in.idoox.com>
Dear Mary,

It may be that I'm missing some use case of yours. Can you please point
me to the pertinent use case descriptions? I see only two basic use
cases: referring to a component in order to use it, or referring to it
in order to speak about it.

I also read two W3C mailing list threads [1, 2] on the TAG issue of
converting QNames to URIs [3]. I see there are different positions on
this issue but I still agree with Dan Connolly et al. as far as I
understand their messages in these threads. When I read the SCD
document, nothing I saw made me rethink my stance.

Mainly, I'm thinking: I understand why we want to turn QNames into URIs
(e.g. for RDF purposes) - because QNames are used so often for
identification, which is the main purpose of URIs. I don't understand
why we want to be able to use URIs to identify things which even in
their native environments are *not* identified with a single
context-less identifier. I don't understand *at all* why anonymous
things should be referenceable directly. Why not just give them names?

Simple things should be simple, complex things should be doable. The SCD
draft takes the other approach - all should be done in the same way,
however complex it may be.

Further, I believe that the attribute {...}foo and the element {...}foo
and the complex type {...}foo are only aspects of the same thing. No XML
language I've seen so far gives me a reason to think otherwise. If, for
example, an attribute and an element shared properties that were
commonly defined differently for an attribute and an element of the same
name, I would consider a compromise. (See the P.S. below my signature.)

Please see my further replies inside your email below.

Because it seems that the TAG issue now awaits input from XML Schema
(which should be satisfied by the SCD document), I'm trying to affect
that in what I believe is the better direction.

I'd like to post these thoughts to the widest interested audience. Which
mailing list do you think would be appropriate?

Best regards,

                   Jacek Kopecky

                   Senior Architect, Systinet Corporation
                   http://www.systinet.com/


[1] http://lists.w3.org/Archives/Public/www-webont-wg/2002Feb/0028
[2] http://lists.w3.org/Archives/Public/www-tag/2002May/0048.html
[3] http://www.w3.org/2001/tag/ilist#rdfmsQnameUriMapping-6

P.S: To be able to say that two global components with the same QName
only define two aspects of the same thing, we'd have to solve property
clashes. Type definitions and element definitions have no clashing
properties apart from the annotations, which I don't believe to be a
problem. 

Element definitions and attribute definitions clash in type definition,
value constraints and annotation. To solve this, I'd speak of attribute
type defns, attribute value constraints, element type defns and element
value constraints. There are, however, people who would remove the
distinction between attributes and elements altogether; in that case
the element and attribute qnames wouldn't be allowed to conflict,
removing all clashes.



On Fri, 2003-01-17 at 18:19, Mary Holstege wrote:
> One option on the table, which we have considered, is to define some kind of
> "canonicalized" XML representation for XML Schemas, that removes the syntactic
> conveniences that take the transfer syntax away from the component model. We
> could then define XPaths wrt _that_. OTOH, this may just be too confusing, and
> doesn't actually solve all the use cases.

Which ones?

> > In XML Schema itself, components are designated with their names
> > (expanded name - with the namespace) and the symbol space where the name
> > should be looked for is known from the context. Why is it that other
> > uses need more insight into a schema than XML Schema itself? In XML
> > Schema, if an element declaration needs to be pointed to, it is defined
> > on the top level. Why is there a need to designate an anonymous type
> > definition, for example?
> 
> Type references for anonymous types are fairly crucial for a number of use
> cases, including schema formalization and schema processor comparison. The use
> cases and rationales are merely sketched in the draft, and obviously need some
> attention. 

In RDF, anonymous nodes cannot be pointed to directly except in the
document which actually defines them. I cannot point with a simple URI
to an anonymous node in your RDF graph. By not giving the node a name,
you forbade me to speak about that node directly.

How is that different in XML Schema? Why does an XML Schema anonymous
type need to be referenced? If it does, why not just name it? Legacy and
ownership problems? We'll never get rid of these, and do we want to make
component designation so unspeakably complex just because it will reduce
these problems *a bit*? It would make the usual case (where legacy and
ownership are not a problem) very confusing and complex.

> There are a number of use cases, many of which fall into these broad classes:
> * type references 
>   Being able to refer to any type in a composed XML Schema. XML Query has a
>   number of use cases here, and anonymous types are definitely in view.

See above: name the types you want to refer to, or refer to them
indirectly (the canonical schema form and XPath would do here).

> * out-of-band schema annotation
>   Associating semantics or other information with pieces of a schema where
>   you can't or don't want to touch the text of the schema document (or you
>   don't have a schema _document_ at all)

Same as above.

> * schema analysis
>   Example: Take a dump of the PSVI along the lines generated by XSV and use it
>   to compare processors or different versions of a processor. Example: take a
>   composed schema and dump all the SCDs of all the components. Use this as a
>   basis to compare whether two composed schemas are, in some sense, the "same".

You only need a common output format. Granted, the SCDs in the present
form solve the case, but they *misuse* URIs where a non-URI string value
format specification is sufficient.
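
To illustrate what I mean by a plain string format, here is a minimal
sketch. The format "kind {namespace}local" and the names component_key
and diff_dumps are made up for this example only; nothing here comes
from the SCD draft or from XSV.

```python
from typing import Iterable, List, Tuple


def component_key(kind: str, namespace: str, local: str) -> str:
    """Render one named component as a plain string (hypothetical format)."""
    return f"{kind} {{{namespace}}}{local}"


def diff_dumps(a: Iterable[str], b: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (keys only in a, keys only in b) for two component dumps."""
    set_a, set_b = set(a), set(b)
    return sorted(set_a - set_b), sorted(set_b - set_a)


# Two dumps of a composed schema, e.g. from two different processors.
dump_one = [
    component_key("element", "http://example.org/", "date"),
    component_key("type", "http://example.org/", "date"),
]
dump_two = [
    component_key("element", "http://example.org/", "date"),
]

only_in_one, only_in_two = diff_dumps(dump_one, dump_two)
```

Comparing two such dumps needs nothing more than string equality; the
strings never have to be URIs.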

> > I think the basic question is whether or not two symbols with the same
> > name and in the same namespace are related. 
> >
> > For example a complex type and an element {http://example.org/}date,
> > which I will from now on write as the qname ns:date. In my opinion the
> > two are related. In RDF/XML (one of the usecases for component
> > designators) the URI form of the qname above would be
> > http://example.org/date, a simple concatenation of the namespace name
> > and the local part.
> 
> The problem with this is that it doesn't meet the use cases for referring to
> components that have no name. It is the ability to refer to these components in
> a uniform and consistent way that is the key motivator for SCDs. 

It is promoting the good practice of giving identifiers to the things
that may need to be identified that is the key motivator for me here.
8-)

> It also doesn't allow one to distinguish the type named 'foo' in the namespace
> "http://example.org" from the element named 'foo' in the same namespace, so
> even if you focused on only named components, you'd still want a symbol space
> marker of some kind.

I say you either don't want to make the distinction or that the
distinction is clear from the context.

> > My position is that if something is being said about ns:date, it either
> > only pertains to one component type (e.g. the supertype name on ns:date
> > simple type or the substitution group of ns:date element), or it
> > shouldn't matter which it is (e.g. the publisher of the definitions).
> 
> For many use cases that is likely so. On the other hand, some of the schema
> analysis and out-of-band annotation use cases are not well-served by such a
> model.

I disagree. I think schema analysis (as described above) misuses URIs
and I think out-of-band annotation can refer directly to named global
things and indirectly to others.

> > If we accept the current way of thinking represented in the Component
> > Designators draft, we will end up with great number of possible URIs
> > representing a single expanded name: XML Schema Component Designators,
> > WSDL component designators and more for every language and symbol space
> > for symbols named with an expanded name.
> 
> Schema component designators designate components, not expanded names. To the extent that
> expanded names refer to different schema components, then, yes, there is more
> than one schema component designator. But each schema component has only one
> canonical schema component designator. 

I was trying to say that what started in part as an effort to unite the
two identifier worlds of URIs and QNames (both pretty simple concepts)
may destroy that simplicity, which would be a great loss. We don't want
another simplifying revolution in 10 years (like XML was a few years
ago) if we can avoid it.

> > To simplify all (at no cost to generality, IMO) we should accept and
> > promote the idea that one expanded name means one thing (with possibly
> > many aspects). If two things are being defined, they ought to be named
> > differently.
> 
> Doesn't this say that one should not be able to define elements and
> attributes (for example) with the same name if they are in the same namespace?
> That seems like quite a burden to impose on users of XML.

No, it does not - I'm not saying that an attribute and an element are
necessarily two distinct things; they are merely aspects of one or two
things, depending on their relation.

> > I propose we stick to the RDF/XML way of turning an expanded name into a
> > URI by concatenation.
> 
> Perhaps. The rationale for not doing this was to leverage the XPointer
> framework, and because that style reduces the length of the string when you get
> beyond simple references to named components. But it is worth considering, as
> it makes it easier to perform simple string comparisons to test equivalence.

Making something complex because we have to use a product in search of
more deployment is not a good thing. And current SCD style increases the
length of the string on simple references, which doesn't really matter
because IMO it's all wrong anyway. 8-)
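
For concreteness, the concatenation I have in mind can be sketched as
follows. This is only an illustration: the function name
expanded_name_to_uri is made up for this example, and the namespace is
the example one used earlier in the thread.

```python
def expanded_name_to_uri(namespace: str, local: str) -> str:
    """Turn an expanded name into a URI by plain concatenation,
    in the RDF/XML style: namespace name followed by local part."""
    return namespace + local


# The element ns:date and the complex type ns:date share one expanded
# name, so they map to the same URI; testing equivalence is a simple
# string comparison.
element_uri = expanded_name_to_uri("http://example.org/", "date")
type_uri = expanded_name_to_uri("http://example.org/", "date")
same_thing = element_uri == type_uri
```

Both calls yield http://example.org/date, which is exactly the point:
one expanded name, one URI, possibly many aspects.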

Jacek
Received on Monday, 20 January 2003 14:11:31 UTC