- From: Simon St.Laurent <simonstl@simonstl.com>
- Date: Wed, 07 Mar 2001 17:58:35 -0500
- To: www-xml-schema-comments@w3.org
I sent this to xml-dev, but I don't know how many schema folk follow that busy forum. I don't know how interested people will be in the specific proposal I offer here, but I hope it will at least be worthy of some thought.

----------------------------------------

Using namespace-qualified identifiers (QNames) for type identification seems to introduce some significant difficulties while only saving a few keystrokes. This proposal suggests using bare URIs rather than QNames to improve interoperability and extensibility.

[I've long been a critic of the (lack of) URI structure, notably on XML-URI last summer and on various IETF lists. While I still have plenty of reservations about URI structure and syntax, the basic idea is more and more intriguing, and I'm probably going to have to eat a few of my past words in making this proposal.]

At present, the typing mechanism in W3C XML Schema is both extremely extensible and deeply constrained. W3C XML Schema Datatypes [1] provides a family of primitive datatypes and mechanisms for extending them through facets for defining atomic types, while W3C XML Schema Structures [2] allows developers to create molecules from these sets of atoms.

Types, whether built-in or created by the designer, are assigned names which are referenced with namespace-qualified names (type="QName"). Types have a URI component, which applications must derive from the namespace declarations in the document. They also have a local name, separate from the URI component, which identifies the particular type in the list of types associated with that namespace URI. Prefixes are used as an abbreviation mechanism.

This creates a number of interesting problems for XML Schemas on several levels. The first problem is caused by the use of namespace prefixes within attribute values, which requires applications to maintain additional information about prefix-namespace mapping.
This is certainly allowed by the Namespaces in XML spec [3], but it is an extension of the capability provided there, and this support isn't entirely "natural" to some views of the namespace specification.

The second problem may not appear to be a problem when type structures are viewed entirely within the context of W3C XML Schema. Defining a type requires the use of W3C XML Schema syntax, and the inclusion of that declaration within the schema so that both its namespace URI and its local name can be assimilated into the larger schema. This creates a barrier to other schema approaches which choose to rely on W3C XML Schema Datatypes for convenience and interoperability reasons.

RELAX [4], for instance, uses W3C XML Schema Datatypes within RELAX descriptions, but restricts users to the built-in types defined within that specification. This allows RELAX developers to focus on RELAX, without having to harness RELAX implementations to W3C XML Schema implementations which can process W3C XML Schema type declarations. It also allows RELAX to avoid the URI+local name issues involved in W3C XML Schema processing, as it relies solely on the name portion of the datatypes.

Although RELAX has chosen the (human-friendly) approach of relying on the names of built-in datatypes, I'd like to suggest that a slightly different approach might be simpler, far more extensible, and still workable. Rather than rely on a combination of a namespace URI and a local name to identify types, the use of a bare URI would allow processors to include data typing information created in a number of different frameworks without mandating the use of a particular syntax for type definition.

For example, I might create a datatype defining a 'simonSKU' identified by the URI http://simonstl.com/dt/simonSKU.
At that location I'd have a RDDL [5] document, which would provide a human-readable description as well as links to a W3C XML Schema definition of the data type, perhaps a Perl regular expression which can be used to check my SKU, a Java class which can be used to check it, etc. There could also be some RDF around describing relationships between this type and other types, or additional properties of the type like creator, projects in which it's used, etc.

It would be my responsibility to make sure all of these things worked consistently, of course (and maybe a testing resource in RDDL would be cool), but applications could use my datatype processing as appropriate, and humans could have a full set of documentation as well.

I'm well aware that this approach would involve potentially substantial changes in both W3C XML Schema and RELAX to implement, so I'm not exactly expecting it to happen. (RDF Schema [6] already uses a similar URI-based approach.) It may well have been considered and rejected at a prior date. I suspect it isn't necessary to meet the requirements of W3C XML Schema within its own worldview, but it might simplify the implementation of certain aspects of W3C XML Schema and provide future extensibility in new directions.

Also, URIs could point quite easily to locations within a single W3C XML Schema document - this doesn't require schema fragmentation, so long as only a single processing context is needed. This approach might also simplify future projects which handle type information as metadata, not necessarily as part of a validation process.
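
To make the contrast concrete, here is a minimal Python sketch of the two lookup styles discussed above. It is an illustration under stated assumptions, not part of either specification: the function names, the prefix map, the checker registry, and the SKU pattern are all hypothetical; only the simonSKU URI comes from the example in this message.

```python
import re

def resolve_qname(qname, in_scope_namespaces):
    """Expand a QName found in an attribute value into (namespace URI, local name).

    The caller has to supply the prefix-to-URI map that was in scope at the
    point in the document where the attribute appeared.
    """
    prefix, sep, local = qname.partition(":")
    if not sep:
        # No colon: assume the value is a local name in the default namespace.
        prefix, local = "", qname
    if prefix not in in_scope_namespaces:
        raise KeyError("no namespace declaration in scope for prefix %r" % prefix)
    return in_scope_namespaces[prefix], local

# Hypothetical registry keyed directly by type URI. The SKU pattern is invented
# for illustration; a real one would be published alongside the type.
CHECKERS = {
    "http://simonstl.com/dt/simonSKU": re.compile(r"[A-Z]{3}-\d{4}").fullmatch,
}

def check_by_uri(type_uri, value):
    """Validate a value against the checker registered for a bare type URI."""
    checker = CHECKERS[type_uri]  # raises KeyError for an unknown type URI
    return checker(value) is not None

# QName style: the answer depends on document context (the prefix map).
uri, local = resolve_qname("s:simonSKU", {"s": "http://simonstl.com/dt/"})

# Bare-URI style: the identifier is self-contained.
ok = check_by_uri("http://simonstl.com/dt/simonSKU", "ABC-1234")
```

The first function cannot run without the prefix map, which is the extra state the "first problem" refers to. The second needs only the URI string, and the registry could just as well hold a regular expression, a class, or a pointer to a schema fragment.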

[1] - XML Schema Part 2: Datatypes (http://w3.org/TR/2000/CR-xmlschema-2-20001024/)
[2] - XML Schema Part 1: Structures (http://w3.org/TR/2000/CR-xmlschema-1-20001024/)
[3] - Namespaces in XML (http://w3.org/TR/1999/REC-xml-names-19990114)
[4] - Regular Language description for XML (http://www.xml.gr.jp/relax/)
[5] - Resource Directory Description Language (http://www.rddl.org)
[6] - Resource Description Framework Schema (http://w3.org/TR/2000/CR-rdf-schema-20000327)

Simon St.Laurent
Associate Editor
O'Reilly and Associates
Received on Wednesday, 7 March 2001 17:58:24 UTC