- From: Robin Berjon <robin@berjon.com>
- Date: Fri, 29 May 2009 16:59:47 +0200
- To: noah_mendelsohn@us.ibm.com
- Cc: "Henry S. Thompson" <ht@inf.ed.ac.uk>, www-tag@w3.org
Hi Noah,
On May 29, 2009, at 16:14 , noah_mendelsohn@us.ibm.com wrote:
> Robin Berjon wrote:
>> Or more precisely, that a lot of languages are defined using
>> XML Schema — the distinction being that in a fair number of
>> cases that I've seen the schema may be in the specification
>> but then no one ever uses it as part of the production chain.
>
> Do you mean XSD is not used in production for validation or also
> that it's not used by tooling?
Sorry, I was indeed unclear and failed to state the bias in my
sampling. I actually mean not used *at all*. A lot of the usage I see
is rooted in the mobile or TV industry (the latter having even more
constrained devices), or around rather simple data formats (of the kind
that is used in the Widgets Packaging and Configuration specification,
or similar rather straightforward formats).
Things may be improving (slowly), but in a lot of the cases here there
is no data binding, and no validation at any step. People will
generally proceed with more general testing that will catch validity
errors in the XML at the processor level, but without relying on
schema-based validation. I don't have numbers to back this up, but I'm
not pulling it out of thin air either: I've seen a shockingly large
number of schemata extracted from specifications (by SDOs or
customers) that either weren't accepted by schema validators, or
didn't properly validate real content* (or in extreme cases weren't
even well-formed XML) — and no one had noticed months into deployment.
This is the community that I think we should address: people who use
schemata mostly for documentation, and could really use usage guidance.
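To illustrate how cheap the most basic check is, here is a minimal sketch using only Python's standard library. It only tests that a schema copied out of a specification is well-formed XML with xs:schema as its root; it does not tell you whether a schema validator would accept it, which needs a real schema processor. The function name check_extracted_schema and the two sample strings are made up for this example.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

def check_extracted_schema(text):
    """Return a list of problems found in a schema copied out of a spec."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError as err:
        # The extreme case: the extracted schema is not even well-formed.
        return ["not well-formed XML: %s" % err]
    if root.tag != "{%s}schema" % XSD_NS:
        return ["root element is %s, not xs:schema" % root.tag]
    return []

# Hypothetical samples: one complete, one with the root never closed.
good = '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"/>'
broken = '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">'

print(check_extracted_schema(good))
print(check_extracted_schema(broken))
```

Running something like this when the schema is pulled out of the specification would at least catch the cases that went unnoticed for months.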
I think that there are several factors at play here. One is that we
are talking about documents that are at least an order of magnitude
simpler (by whichever measure) than those used for instance in the
financial industry or in B2B, ERP, etc. This makes data binding less
valuable, and the far lesser degree of language composability means
that user agent validation tends to be less complex (yet more
complete) than schema-based approaches.
Another is that existing schema languages haven't really been designed
to operate on constrained devices. XML Schema is amenable to streaming
but its complexity gets in the way; RelaxNG tends to take up too much
memory (I haven't looked at other options). And in any case they slow
things down. I've heard the argument that if you can do mobile video
then surely you can do a bit of validation, but video has an impact on
the user experience that validation lacks — thereby justifying the
cost of research in faster software, special chips, etc. And not
validating at the receiving endpoint tends to reduce the value of
validating at all.
Then you have to take into account the fact that a lot of that data is
produced on systems that don't have very potent or reliable schema
implementations (at least not for XML Schema; RNG and Schematron tend
to be more readily available). The vast tracts of information produced
using PHP, Perl, Ruby, etc. are unlikely to use much validation or
data binding. And that is a huge part of the web, including the less
visible mobile bits.
I can't seem to dig it up now but I recall an article (I believe from
Tim Bray) from about a decade ago in which he explained that people
didn't use DTDs with XML: they mostly drew up a few example
documents and emailed them over. My experience is that there's still a
large community that keeps working in pretty much the same way —
except that they'll toss in a schema because it seems to be what's
done, because it makes it look real and professional.
Please note that I'm dissing neither XML Schema nor people who use
that approach. I just happen to think that we would probably provide
better value by looking at what the people who couldn't tell a
deterministic content model from the distinguished property value
denoting it and don't want to are doing, where they're shooting
themselves in the feet, what could be simplified, etc. than by
debating miscellaneous technicalities differentiating schema languages
(no matter how much fun that is).
* elementFormDefault probably accounting for a solid majority of such
cases.
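For those who haven't been bitten by it, here is a rough sketch of the elementFormDefault trap, again in plain Python. When the attribute is absent it defaults to "unqualified", so locally declared elements are expected in no namespace, while typical hand-written content uses a default namespace that puts every element in the target namespace. The helper name local_elements_unqualified, the example.org namespace and the widget/name vocabulary are all invented for illustration, and the check is only a heuristic since it ignores any form attribute on individual element declarations.

```python
import xml.etree.ElementTree as ET

def local_elements_unqualified(schema_text):
    """True if the schema has a targetNamespace but leaves local elements unqualified."""
    root = ET.fromstring(schema_text)
    has_target = root.get("targetNamespace") is not None
    # XML Schema defaults elementFormDefault to "unqualified" when absent.
    form = root.get("elementFormDefault", "unqualified")
    return has_target and form == "unqualified"

# Hypothetical schema: <name> is declared locally, inside <widget>.
schema = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://example.org/widget">
  <xs:element name="widget">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

# Typical real content: the default namespace also applies to <name>,
# but the schema above expects <name> in no namespace.
instance = '<widget xmlns="http://example.org/widget"><name>x</name></widget>'

print(local_elements_unqualified(schema))
print(ET.fromstring(instance)[0].tag)
```

The second line printed shows the child element sitting in the target namespace, which is exactly what the schema as written does not allow.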
--
Robin Berjon - http://berjon.com/
Feel like hiring me? Go to http://robineko.com/
Received on Friday, 29 May 2009 15:00:20 UTC