Re: schema arrangement or architecture

Hi Andrew,

Andrew Welch <andrew.j.welch@gmail.com> writes:

> Given the "salami slice" style of schema (all elements and types are
> global) are there any established practices for arranging a large
> schema? eg
> 
> - try to keep to one large file or break it down into several files?

I think either way is fine. I wouldn't arbitrarily split the schema
into multiple files (e.g., the first 10KB goes into one file, the
next 10KB into a second, etc.) just for the sake of making them
smaller. On the other hand, logically grouping related types/elements
and factoring them out into separate schema files can result in a
schema that is easier to understand and reuse. One advantage of
having a single large schema is the ease of passing it around and
embedding it into an application.

If you go with multiple schema files, try to avoid inclusion cycles
since they may cause trouble with tools (e.g., data binding) that
you or your users may run on the schema. Also, making each file
self-sufficient (that is, it xsd:include's or xsd:import's every
external type/element definition it references) makes it possible
to reuse parts of your schema.
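
As a minimal sketch of what I mean by self-sufficient (the file names
common.xsd and order.xsd, as well as the Currency and Price types, are
made up for illustration), a file with shared types might look like
this:

```xml
<!-- common.xsd (hypothetical): shared types, no external references -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <xsd:simpleType name="Currency">
    <xsd:restriction base="xsd:string">
      <xsd:enumeration value="USD"/>
      <xsd:enumeration value="EUR"/>
    </xsd:restriction>
  </xsd:simpleType>

</xsd:schema>
```

and a file that uses it pulls it in itself rather than relying on
some "main" schema to do so:

```xml
<!-- order.xsd (hypothetical): includes what it references -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- Without this include, Currency below would be unresolved
       whenever order.xsd is processed on its own. -->
  <xsd:include schemaLocation="common.xsd"/>

  <xsd:complexType name="Price">
    <xsd:simpleContent>
      <xsd:extension base="xsd:decimal">
        <xsd:attribute name="currency" type="Currency" use="required"/>
      </xsd:extension>
    </xsd:simpleContent>
  </xsd:complexType>

</xsd:schema>
```

Here common.xsd does not include order.xsd back, so there is no
cycle. If the shared types lived in a different target namespace,
xsd:import would take the place of xsd:include.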


> - ensure all element definitions are always in a single file, or where
> possible keep element definitions in the same file as their type
> definitions?

I would definitely prefer the latter. Keeping each element next to
its type makes it easier to study your schema.


> - should all attributes be defined globally?

I think that might be overly verbose. I also don't see much use
in making every element global (especially if your schema has a
target namespace). There isn't that much extra typing in a local
declaration such as

<element name="foo" type="bar"/>

compared to a reference to a global element:

<element ref="foo"/>

My ideal schema has all its types defined globally (e.g., no
anonymous types), and all its elements and attributes --
locally, except for the elements that are valid document roots.
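
A minimal sketch of that layout, assuming a made-up library
vocabulary (the http://example.com/library namespace and all type,
element, and attribute names are hypothetical):

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:lib="http://example.com/library"
            targetNamespace="http://example.com/library">

  <!-- Types are global and named (no anonymous types). -->
  <xsd:complexType name="Book">
    <xsd:sequence>
      <!-- Elements and attributes are declared locally,
           by name and type rather than by ref. -->
      <xsd:element name="title" type="xsd:string"/>
      <xsd:element name="author" type="xsd:string"
                   maxOccurs="unbounded"/>
    </xsd:sequence>
    <xsd:attribute name="isbn" type="xsd:string" use="required"/>
  </xsd:complexType>

  <xsd:complexType name="Library">
    <xsd:sequence>
      <xsd:element name="book" type="lib:Book"
                   minOccurs="0" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>

  <!-- The only global element: the valid document root. -->
  <xsd:element name="library" type="lib:Library"/>

</xsd:schema>
```

Since library is the only global element, it is also the only
element a validator will accept as the document root.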


> Is the secret of maintaining a large schema down to good tooling
> rather than an intimate knowledge of what's in each file?

My experience with schemas that are produced by tools (e.g., XML
editors) is that they are often of very poor quality. One common
trait of such schemas, if they consist of multiple files, is that
the individual files are not self-sufficient (e.g., they need to
be included into the "main" schema file in order to be valid).
This prevents reuse. They also often have inclusion cycles. I
call this style of schema authoring "Rat's nest" ;-).


Boris

-- 
Boris Kolpackov, Code Synthesis Tools   http://codesynthesis.com/~boris/blog
Open source XML data binding for C++:   http://codesynthesis.com/products/xsd
Mobile/embedded validating XML parsing: http://codesynthesis.com/products/xsde

Received on Tuesday, 3 June 2008 07:45:17 UTC