RE: SV: SV: SV: SV: Schema help

From: Michael Kay <mike@saxonica.com>
Date: Sat, 3 Dec 2005 10:28:37 -0000
To: <noah_mendelsohn@us.ibm.com>, "'Bryan Rasmussen'" <brs@itst.dk>
Cc: <petexmldev@tech-know-ware.com>, <xmlschema-dev@w3.org>
Message-ID: <E1EiUdb-0002iw-23@maggie.w3.org>
I want to express a constraint that my bill of materials contains no cycles.
I need a pretty powerful language to express that constraint. I don't want
to use several different languages to do different parts of the validation
job simply because they are capable of expressing different kinds of
constraint. Therefore, I want a single schema language with a lot of
expressive power.

XML Schema should support at least the full syntax of XPath 2.0, and
preferably the full syntax of XQuery, for describing constraints. People
will be doing that anyway; they'll just be doing it outside their schemas
rather than inside them.
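
To make the "no cycles" constraint concrete, here is a minimal sketch of the check as it tends to be written today, outside the schema. It is an illustration only: the `<bom>`/`<part>`/`<uses>` layout, the `id` and `ref` attributes, and the function name `find_cycle` are assumptions, not taken from any real vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical bill-of-materials layout: each <part> lists the ids of the
# parts it is built from. Element and attribute names are assumptions.
BOM_XML = """
<bom>
  <part id="bike"><uses ref="wheel"/><uses ref="frame"/></part>
  <part id="wheel"><uses ref="spoke"/></part>
  <part id="frame"/>
  <part id="spoke"/>
</bom>
"""


def find_cycle(bom_xml):
    """Return one cycle of part ids as a list, or None if the BOM is acyclic."""
    root = ET.fromstring(bom_xml)
    graph = {
        part.get("id"): [use.get("ref") for use in part.findall("uses")]
        for part in root.findall("part")
    }
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour = {node: WHITE for node in graph}
    path = []

    def visit(node):
        colour[node] = GREY
        path.append(node)
        for child in graph[node]:
            # Parts that are referenced but never declared have no outgoing
            # edges, so they are treated as already finished.
            state = colour.get(child, BLACK)
            if state == GREY:
                # The child is on the current path: report the loop.
                return path[path.index(child):] + [child]
            if state == WHITE:
                cycle = visit(child)
                if cycle:
                    return cycle
        path.pop()
        colour[node] = BLACK
        return None

    for node in graph:
        if colour[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None
```

Calling `find_cycle(BOM_XML)` on the sample above returns `None`; a document in which part `a` uses `b` and `b` uses `a` yields the list `["a", "b", "a"]`. The check needs a recursive traversal of the whole document, which is the kind of expressive power the constraint calls for.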
Michael Kay


From: noah_mendelsohn@us.ibm.com [mailto:noah_mendelsohn@us.ibm.com] 
Sent: 03 December 2005 00:02
To: Bryan Rasmussen
Cc: 'Michael Kay'; 'petexmldev@tech-know-ware.com'; 'xmlschema-dev@w3.org'
Subject: Re: SV: SV: SV: SV: Schema help

Should Schematron be seriously considered?  Absolutely.  It has many
attractive qualities and seems to be doing well for at least some users.  Is
it such an obvious choice that we should rush it?  I don't think so.
Picking up on just one point mentioned in this thread... 

Bryan Rasmussen writes: 

> Given that there is likely to be some argument in W3C as to how
> far such constraints should be implemented I doubt they will 
> come out as powerful as Schematron constraints 

Probably true, but that doesn't necessarily make the Schematron approach the
best.  I think we need to also consider Tim Berners-Lee's Principle of Least
Power [1].  Since what Tim has written on this is just a few paragraphs,
I'll quote them all here: 

"In choosing computer languages, there are classes of program which range
from the plainly descriptive (such as Dublin Core metadata, or the content
of most databases, or HTML) through logical languages of limited power (such
as access control lists, or conneg content negotiation) which include
limited propositional logic, through declarative languages which verge on the
Turing Complete (PDF) through those which are in fact Turing Complete though
one is led not to use them that way (XSLT, SQL) to those which are
unashamedly procedural (Java, C). 

The choice of language is a common design choice. The low power end of the
scale is typically simpler to design, implement and use, but the high power
end of the scale has all the attraction of being an open-ended hook into
which anything can be placed: a door to uses bounded only by the imagination
of the programmer. 

Computer Science in the 1960s to 80s spent a lot of effort making languages
which were as powerful as possible. Nowadays we have to appreciate the
reasons for picking not the most powerful solution but the least powerful.
The reason for this is that the less powerful the language, the more you can
do with the data stored in that language. If you write it in a simple
declarative form, anyone can write a program to analyze it in many ways. The
Semantic Web is an attempt, largely, to map large quantities of existing
data onto a common language so that the data can be analyzed in ways never
dreamed of by its creators. If, for example, a web page with weather data
has RDF describing that data, a user can retrieve it as a table, perhaps
average it, plot it, deduce things from it in combination with other
information. At the other end of the scale is the weather information
portrayed by the cunning Java applet. While this might allow a very cool
user interface, it cannot be analyzed at all. The search engine finding the
page will have no idea of what the data is or what it is about. Thus the
only way to find out what a Java applet means is to set it running in front
of a person. 

I hope that is a good enough explanation of this principle. There are
millions of examples of the choice. I chose HTML not to be a programming
language because I wanted different programs to do different things with it:
present it differently, extract tables of contents, index it, and so on." 

I think we need to consider the choice of constraint mechanisms from this
perspective too.  At least in principle, the ideal would be something just
powerful enough, but no more.  I know that Schematron is sometimes
implemented on top of XSLT, and full XSLT is much too powerful for me to be
comfortable using it as part of validation.  I would like to study whether
Schematron per se may be more appropriately limited, and I have not yet
looked into it in detail.  Perhaps someone who knows Schematron better than
I do can enlighten me? 


[1] http://www.w3.org/DesignIssues/Principles.html#PLP 

Noah Mendelsohn 
IBM Corporation
One Rogers Street
Cambridge, MA 02142

	Bryan Rasmussen <brs@itst.dk> 

11/18/2005 03:31 AM 

        To:        "'noah_mendelsohn@us.ibm.com'"
        cc:        "'petexmldev@tech-know-ware.com'"
<petexmldev@tech-know-ware.com>, "'xmlschema-dev@w3.org'"
<xmlschema-dev@w3.org>, "'Michael Kay'" <mike@saxonica.com> 
        Subject:        SV: SV: SV: SV: Schema help

Well, on the subject of co-occurrence constraints I would just like to
reiterate what I said earlier, with some extension:

Given that there is likely to be some argument in W3C as to how far such
constraints should be implemented I doubt they will come out as powerful as
Schematron constraints. Furthermore, I have a hard time seeing this as
producing a syntax as nice as Schematron's. Therefore I would really like to
see something like:

1. XML Schema adopts Schematron as an extension language of some sort.
2. XML Schema puts some thought into how Schematron can be combined with XML
Schema to the benefit of both, beyond the normal method of dropping
Schematron into appinfo.

I have some ideas on #2, but I'm somewhat conflicted about them (what model
makes sense, syntax, etc.), so I don't really want to just blurt them out.
I'd be more interested in hearing what kinds of things other people could
see connecting the two languages.
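
For readers unfamiliar with the term, a co-occurrence constraint makes the validity of one part of a document depend on another part. The sketch below mimics the rule-plus-assertion shape of a Schematron rule in plain Python. It is not Schematron, and the `<order>`/`<payment>`/`<cardNumber>` vocabulary, the `method` attribute, and the names `RULES` and `check` are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Each rule pairs a context (an ElementTree path) with a test and a message,
# loosely mirroring a rule containing one assertion. All names are hypothetical.
RULES = [
    (
        ".//payment",
        lambda e: e.get("method") != "card" or e.find("cardNumber") is not None,
        "a card payment must contain a cardNumber element",
    ),
    (
        ".//payment",
        lambda e: not (e.get("method") == "cash" and e.find("cardNumber") is not None),
        "a cash payment must not contain a cardNumber element",
    ),
]


def check(xml_text):
    """Return the messages of all failed assertions, grouped by rule."""
    root = ET.fromstring(xml_text)
    failures = []
    for context, test, message in RULES:
        # findall() returns the matching elements in document order.
        for element in root.findall(context):
            if not test(element):
                failures.append(message)
    return failures
```

A document such as `<order><payment method="card"/></order>` fails the first rule, while the same payment with a `cardNumber` child passes both. Neither condition can be stated with content models alone, which is why such checks are commonly carried alongside the schema rather than in it.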

Bryan Rasmussen

-----Original Message-----
From: noah_mendelsohn@us.ibm.com [mailto:noah_mendelsohn@us.ibm.com]
Sent: 17 November 2005 18:51
To: Bryan Rasmussen
Cc: 'Michael Kay'; 'petexmldev@tech-know-ware.com';
Subject: Re: SV: SV: SV: Schema help

Well, I think there are good reasons from time to time to revisit the 
effectiveness of the W3C process and the compromises embodied therein. I'm 
not convinced that a deep dive on that is the best use of this particular 
mailing list.   I happen to like the working groups I've been on that do 
their work in public (in my case, both the TAG and XMLP) and I'd be happy 
for schema to go the same way.  Then again, I really don't think that's a 
substitute for having people who have 30% of their time committed to 
working on a technology.  There's a lot of detail work and care required 
to revise a specification even if there's agreement on the general ideas. 
The discussions need to involve people who have the knowledge and the time 
commitment to work through interactions with existing features of the 
specification.  In the case of co-constraints, it would seem to me that 
there ought to be a careful look taken at the relationship between the 
existing key/keyref/unique constraint mechanisms and anything new that's 
proposed.  It would be nice to believe that we wouldn't just be sprouting 
new and uncoordinated ways of doing things every few years.

So, I personally welcome broader input, but what we're really short of are 
the people who can edit the specification text, draft prose, be 
responsible for the details, etc.  Of course, there are also lots of other 
messy issues to consider when you change the working mode of a group 
including anti-trust laws in various jurisdictions, IP issues, etc.  If 
people feel that they have ideas for how the W3C can do these things 
better, I think the right place to go would be to the W3C staff and/or the 
workgroup chairs.  I personally would not be against having the schema WG 
switch to using a public mailing list for its discussions.  I suspect that 
requires a recharter, but in principle I'm fine with it.  I don't think 
that will solve much of our resource problems.  We don't lack for people 
with good ideas, in email or in person.  We're missing the people to do 
the architecture and drafting work that goes into making all the details 
fit together.   It's hard to do that well without meeting F2F from time to 
time.

Noah Mendelsohn 
IBM Corporation
One Rogers Street
Cambridge, MA 02142

Bryan Rasmussen <brs@itst.dk>
Sent by: xmlschema-dev-request@w3.org
11/17/2005 04:59 AM

       To:     "'Michael Kay'" <mike@saxonica.com>
       cc:     "'xmlschema-dev@w3.org'" <xmlschema-dev@w3.org>, 
"'petexmldev@tech-know-ware.com'" <petexmldev@tech-know-ware.com>, (bcc: 
Noah Mendelsohn/Cambridge/IBM)
       Subject:        SV: SV: SV: Schema help

Damn, an earlier typo in the email address of Pete Cordell added in by me
was replicated in your email. Just on the off chance this thread goes any
further I thought I should correct it. I've cc'ed Pete on this mail. Sorry
for the problem.

Bryan Rasmussen

-----Original Message-----
From: Michael Kay [mailto:mike@saxonica.com]
Sent: 17 November 2005 10:47
To: noah_mendelsohn@us.ibm.com; Bryan Rasmussen
Cc: xmlschema-dev@w3.org; ',petexmldev@tech-know-ware.com'
Subject: RE: SV: SV: Schema help

> 1) Although most widely used schema validators are fairly slow, one can
> in fact implement the XML schema rules at quite high speed.  My team is
> hoping to publish some work in that area in coming months, and I suspect
> that others in the industry are working in the same direction.  I think
> it's important to the success of any technology we choose that it be able
> to meet the performance needs of our customers.

I would resist this kind of thinking. SQL was successful because it put
functionality first, and left implementors to devise optimisation
strategies. Users need a constraint language that is capable of expressing
arbitrary constraints on the content of a document, and it should be left to
the implementor to work out which of these constraints can be evaluated in
streaming mode and which can't.

SQL today allows the full power of the query language to be used to express
integrity constraints, and users learn when they need to restrict their
ambitions to meet performance requirements. 90% of applications aren't
performance critical anyway.

There's no point telling users to go and use some other technology to do
their validation; the other technology isn't going to be fast either.

Michael Kay
Received on Saturday, 3 December 2005 10:29:09 UTC
