Re: What would count as an unbiased survey?

Ah, so you were looking primarily at mobile users, with simple documents 
and typically very small devices.  That's a useful datapoint, but we do 
seem to be spending a lot of energy reasoning from the negative.  Nobody 
doubts that there are lots of communities in which XSD is not used or 
misused.  As you pointed out, this same community often doesn't bother to 
check XML well-formedness, but that doesn't do much to make the case that 
XML itself isn't widely used, or that the well-formedness requirement 
doesn't have value and isn't enforced in many other communities. 


This all feels a bit like trying to say (and I know you're not saying this, 
Robin, but the thread has this feel):  "gee, do we really know whether C++ is 
widely used?  I don't see a lot of it on LISP machines, and by the way, 
can you please define 'widely used' and come up with objective measures?" 

You can always find important communities that don't use some technology 
or other.  I would posit that by most useful >subjective< measures C++ is 
widely used, and not occasionally misused.  Full stop.  Is it really worth 
debating the troubles in pinning down objective measures?  For a less 
widely used language like, say, Euler [1] I agree; you have to do some real 
digging to see where the use is.  C++ has, IMO, crossed the chasm.  You no 
longer need to debate whether it has a large user community.  We all know 
it does.  Any evidence to the contrary would be presumed suspect.  By 
quite similar measures I would argue that XML itself is widely used, full 
stop.

In a similar subjective spirit, I just don't see why we're having this 
debate about XSD.  It's not as widely used as C++ or XML itself, but I 
believe that in spirit it too has crossed the chasm.   It's used in every 
WSDL;  I strongly believe that it's used as input to lots of widely used 
databinding tools, which in turn generate code that's widely deployed in 
production.   There's lots of anecdotal evidence of widespread adoption, 
such as what Henry's provided.  It's obviously used as a documentation 
standard.   It's the type system for XSLT 2.0 and XQuery, use of which is I 
think starting to rise.  Furthermore, to avoid the appearance of hyping my 
own company's products, I'll point to my competitor's as evidence of 
significant continued investment [2] in XSD.  Does anyone really think 
that a company like Microsoft would make what appears to be a major 
2008/2009 investment in graphical tools for building and manipulating XSD 
schemas if they didn't know that there was, by some subjective measure, a 
very large user community for that technology? 

As Henry Thompson said:

> The _only_ reason for pursuing this question is to rebut the 
> proposition, often advanced but not, to my knowledge, ever 
> substantiated, that W3C XML Schema is not used very much, so
> e.g. delaying the next version is not a big deal.

Are other schema languages also widely used?  No doubt (subjective 
conclusion).  Much more widely than XSD?  For purposes of this discussion, 
I don't care (though I doubt it.)  Is there anyone in this thread really 
seriously saying that XSD, a W3C Recommendation, is so rarely used
that >such lack of use is the reason for not going ahead with an
improved version of the Recommendation<?

I understand that Rick has raised >other< reasons for not updating XSD, 
which I'll oversimplify as "it's a distraction from the important business 
of admitting that XSD is a bad language and that we should be working on 
either cleanup or replacement".   I happen to disagree, but it's 
appropriate to consider those points on the merits.  To say, however, that 
there is no significant XSD user community who might benefit from these 
enhancements flies in the face of all the evidence I've seen.  From what 
I've seen on the schema-dev mailing list, there is also good evidence that 
many of the XSD 1.1 enhancements will in fact be useful to current XSD 
users.

Let's please just move ahead with the review of the XSD 1.1 CR, and 
evaluation of implementation experience.   Rick's concern has, I believe, 
been registered as a comment on XSD 1.1 [3], and I expect that will cause 
it to get due consideration as part of the W3C process.

Noah

[1] http://en.wikipedia.org/wiki/Euler_(programming_language)
[2] http://blogs.msdn.com/xmlteam/archive/2007/08/27/announcing-ctp1-of-the-xml-schema-designer.aspx
[3] http://www.w3.org/Bugs/Public/show_bug.cgi?id=6940


--------------------------------------
Noah Mendelsohn 
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------








Robin Berjon <robin@berjon.com>
05/29/2009 10:59 AM
 
        To:     noah_mendelsohn@us.ibm.com
        cc:     "Henry S. Thompson" <ht@inf.ed.ac.uk>, www-tag@w3.org
        Subject:        Re: What would count as an unbiased survey?


Hi Noah,

On May 29, 2009, at 16:14 , noah_mendelsohn@us.ibm.com wrote:
> Robin Berjon wrote:
>> Or more precisely, that a lot of languages are defined using
>> XML Schema — the distinction being that in a fair number of
>> cases that I've seen the schema may be in the specification
>> but then no one ever uses it as part of the production chain.
>
> Do you mean XSD is not used in production for validation or also
> that it's not used by tooling.

Sorry, I was indeed unclear and failed to state the bias in my 
sampling. I actually mean not used *at all*. A lot of the usage I see 
is rooted in the mobile or TV industry (the latter having even more 
constrained devices), or around rather simple data formats (of the kind 
that is used in the Widgets Packaging and Configuration specification, 
or similar rather straightforward formats).

Things may be improving (slowly), but in a lot of the cases here there 
is no data binding, and no validation at any step. People will 
generally proceed with more general testing that will catch validity 
errors in the XML at the processor-level, but without relying on 
schema-based validation. I don't have numbers to back this up, but I'm 
not pulling it out of thin air either: I've seen a shockingly large 
number of schemata extracted from specifications (by SDOs or 
customers) that either weren't accepted by schema validators, or 
didn't properly validate real content* (or in extreme cases weren't 
even well-formed XML) — and no one had noticed months into deployment. 
This is the community that I think we should address: people who use 
schemata mostly for documentation, and could really use usage guidance.
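
As a rough illustration of how cheaply the problems described above could
be caught, here is a minimal sketch using only Python's standard library.
It is not schema validation (the standard library has no XSD validator);
it only checks that an extracted schema is well-formed XML, that its root
is xs:schema, and that a targetNamespace is not combined with the default
elementFormDefault of "unqualified". The function name
sanity_check_schema and the wording of the messages are hypothetical, and
the last check is only a heuristic for a common mistake, since that
combination is legal XSD.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"


def sanity_check_schema(schema_text):
    """Return a list of problems found in an XSD document (empty if none).

    Hypothetical helper: a cheap pre-flight check, not a validator.
    """
    try:
        root = ET.fromstring(schema_text)
    except ET.ParseError as err:
        # The extreme case: the schema is not even well-formed XML.
        return [f"not well-formed XML: {err}"]

    if root.tag != f"{{{XSD_NS}}}schema":
        return [f"root element is {root.tag!r}, not xs:schema"]

    problems = []
    # elementFormDefault defaults to "unqualified" when absent, which
    # means local elements are expected in no namespace. Heuristic only.
    form = root.get("elementFormDefault", "unqualified")
    if root.get("targetNamespace") and form != "qualified":
        problems.append(
            "targetNamespace is set but elementFormDefault is "
            "'unqualified'; local elements are expected in no namespace"
        )
    return problems
```

Running something like this over every schema lifted out of a
specification would flag the well-formedness and elementFormDefault cases;
schemas that a real validator rejects for other reasons would still need
an actual XSD processor.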

I think that there are several factors at play here. One is that we 
are talking about documents that are at least an order of magnitude 
simpler (by whichever measure) than those used for instance in the 
financial industry or in B2B, ERP, etc. This makes data binding less 
valuable, and the far lesser degree of language composability means 
that user agent validation tends to be less complex (yet more 
complete) than schema-based approaches.

Another is that existing schema languages haven't really been designed 
to operate on constrained devices. XML Schema is amenable to streaming 
but its complexity gets in the way; RelaxNG tends to take up too much 
memory (I haven't looked at other options). And in any case they slow 
things down. I've heard the argument that if you can do mobile video 
then surely you can do a bit of validation, but video has an impact on 
the user experience that validation lacks — thereby justifying the 
cost of research in faster software, special chips, etc. And not 
validating at the receiving endpoint tends to reduce the value of 
validating at all.

Then you have to take into account the fact that a lot of that data is 
produced on systems that don't have very potent or reliable schema 
implementations (at least not for XML Schema; RNG and Schematron tend 
to be more readily available). The vast tracts of information produced 
using PHP, Perl, Ruby, etc. are unlikely to use much validation or 
data binding. And that's a huge part of the web, including the less 
visible mobile bits.

I can't seem to dig it up now but I recall an article (I believe from 
Tim Bray) from about a decade ago in which he explained that people 
didn't use DTDs with XML; they mostly drew up a few example 
documents and emailed them over. My experience is that there's still a 
large community that keeps working in pretty much the same way — 
except that they'll toss in a schema because it seems to be what's 
done, because it makes it look real and professional.

Please note that I'm dissing neither XML Schema nor people who use 
that approach. I just happen to think that we would probably provide 
better value by looking at what the people who couldn't tell a 
deterministic content model from the distinguished property value 
denoting it and don't want to are doing, where they're shooting 
themselves in the feet, what could be simplified, etc. than by 
debating miscellaneous technicalities differentiating schema languages 
(no matter how much fun that is).

* elementFormDefault probably accounting for a solid majority of such 
cases.

-- 
Robin Berjon - http://berjon.com/

     Feel like hiring me? Go to http://robineko.com/

Received on Friday, 29 May 2009 16:42:25 UTC