Re: including a schema with "HTML: The Markup Language" Clarifying TAG Re: Courtesy notification

From: Dan Connolly <connolly@w3.org>
Date: Thu, 18 Mar 2010 09:35:42 -0500
To: "Michael(tm) Smith" <mike@w3.org>
Cc: noah_mendelsohn@us.ibm.com, Paul Cotton <paul.cotton@microsoft.com>, Philippe Le Hegaret <plh@w3.org>, Sam Ruby <rubys@intertwingly.net>, "www-tag@w3.org WG" <www-tag@w3.org>, Maciej Stachowiak <mjs@apple.com>
Message-ID: <1268922942.4118.1323.camel@pav.lan>
On Thu, 2010-03-18 at 19:11 +0900, Michael(tm) Smith wrote:
> Dan Connolly <connolly@w3.org>, 2010-03-16 09:38 -0500:
> 
> > On Mon, 2010-03-15 at 19:49 +0900, Michael(tm) Smith wrote:
> > > I moved the status of bug 8611 to resolved=wontfix.
> > > I think discussion of what kind of schema or set of schemas (e.g.,
> > > RelaxNG plus Schematron plus whatever else) to publish, and how to
> > > publish it, should be raised either as a new bug or as an HTML WG
> > > Tracker issue.
> > 
> > That seems kinda odd, Mike; the way the document is built uses
> > a schema for its backbone. The schema is there in the source:
> >   http://dev.w3.org/cvsweb/html5/markup/schema/
> 
> The schema is not in the source any longer (I've removed it).
> 
> As far as the relationship between that schema and the H:TML doc,
> I'm not sure "backbone" is an accurate word to describe that; the
> RelaxNG grammar itself is one among several sources of information
> the document is built from.
> 
> And that grammar on its own doesn't even completely express all
> the constraint checks that need to be done by an actual validation
> tool that uses it as an input.

Stipulated.

Can we put that whole line of reasoning aside please?
It's irrelevant to my request. I'm not asking for a formalism
adequate to build a validator. Just a (machine-readable) sketch
of what-goes-where.

Even though a schema is an incomplete description of a language,
it's often useful; e.g. it can show which elements are allowed
where so that authoring tools can use it to build
context-sensitive auto-complete lists.
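
As a rough illustration of that auto-complete use, here is a minimal
Python sketch. It assumes a toy grammar in RELAX NG XML syntax; the
grammar, the function names allowed_children and complete, and the
one-element-per-define layout are all hypothetical and are not taken
from the actual HTML5 schema. It only reads which element names are
referenced inside each element pattern, which is the "what-goes-where"
information, and makes no attempt at validation.

```python
import xml.etree.ElementTree as ET

RNG_NS = "http://relaxng.org/ns/structure/1.0"

# Toy grammar for illustration only; far smaller than any real HTML schema.
TOY_GRAMMAR = """
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start><ref name="ul"/></start>
  <define name="ul">
    <element name="ul">
      <zeroOrMore><ref name="li"/></zeroOrMore>
    </element>
  </define>
  <define name="li">
    <element name="li">
      <zeroOrMore>
        <choice>
          <text/>
          <ref name="ul"/>
          <ref name="em"/>
        </choice>
      </zeroOrMore>
    </element>
  </define>
  <define name="em">
    <element name="em"><text/></element>
  </define>
</grammar>
"""


def qname(tag):
    """Return the namespace-qualified tag name ElementTree expects."""
    return "{%s}%s" % (RNG_NS, tag)


def allowed_children(grammar_xml):
    """Map each element name to the sorted element names it may contain.

    Assumes every <define> wraps exactly one <element>, and ignores
    ordering, cardinality, attributes, and text.
    """
    root = ET.fromstring(grammar_xml)
    defines = {}
    for define in root.iter(qname("define")):
        element = define.find(qname("element"))
        if element is not None:
            defines[define.get("name")] = element

    table = {}
    for element in defines.values():
        names = set()
        for ref in element.iter(qname("ref")):
            target = defines.get(ref.get("name"))
            if target is not None:
                names.add(target.get("name"))
        table[element.get("name")] = sorted(names)
    return table


def complete(table, parent, prefix=""):
    """Return candidate child element names for an auto-complete list."""
    return [name for name in table.get(parent, []) if name.startswith(prefix)]


if __name__ == "__main__":
    table = allowed_children(TOY_GRAMMAR)
    print(complete(table, "li"))
    print(complete(table, "li", "u"))
```

An editor would call complete() with the element at the cursor and
whatever the author has typed so far. The sketch deliberately discards
everything a validator would need, which is the point: even that
reduced view is enough for this use.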

> Anyway, my point is, that RelaxNG grammar is far from being on its
> own a worthwhile means for doing many of the most useful
> constraint checks that a modern, useful HTML validation tool
> really needs to do in practice -- so far, IMHO, that it would be
> potentially very misleading to publish it and encourage its use
> outside the context of its function as part of a complete HTML
> validation tool (like validator.nu).

Misleading in what way? Do any examples come to mind?

> Another problem with that particular grammar is that it is not
> optimized for use as a part of a formal description of the HTML
> language. It is instead (to some degree at least) optimized for
> the particular use that it is being put to within the validator.nu
> service. And it may be in the future that it will end up being
> optimized for that use to an even greater degree. For example,
> there are a number of constraint checks that are currently done in
> the grammar but that -- in order to generate more useful error
> messages reported for them -- would instead much better be done
> using Schematron (or in Schematron-workalike code such as
> validator.nu uses). (That's maybe due in part to the poor quality
> of error messages that Jing (which validator.nu uses) currently
> emits for certain cases -- like the case of required-but-missing
> attributes.)

Maybe I can find them myself, but if you could be more specific
about those cases, I'd appreciate it. I'm interested to look at
how much these optimizations make the schema less useful.


> Anyway, I also think that in a validation system that relies on
> multiple means for doing constraint checks, just because something
> is expressible in a grammar-based schema that's part of that
> system doesn't mean it *should* be expressed in a grammar, nor
> that the grammar is necessarily the best place to express it. So
> in the case of this particular schema, there is no guarantee that
> some constraints that are currently in that grammar won't
> eventually be moved out to a different part of the system.
> 
> > So you're already publishing the information that's in
> > the schema;
> 
> The information in that schema was developed from the almost
> completely prose-based document-conformance constraint expressions
> in the HTML5 spec itself. So it can reasonably be viewed as just
> one possible implementation of just some particular document-
> conformance constraints that are expressed in the HTML5 spec.

Very well, but it's likely to be good enough for lots of other
uses, and I'd like to save others the work of re-doing
the development of a schema from the HTML 5 spec prose.


> > is there some reason not to just include the
> > source of the schema in an informative
> > appendix (with whatever disclaimers you like), so that
> > other people can make similar uses of it?
> 
> For one thing, the H:TML doc, as it stands now, by design does not
> actually use the same expression language as that schema. The
> parts that express the same constraints that are in that schema
> are instead expressed in the H:TML doc in natural language.
> That natural language is currently generated in part (through a
> very baroque, fragile, inelegant build process I hacked together)
> from that schema, but I do not guarantee that it will always
> remain so. I originally did it that way in part as an experiment
> to see how doable and useful it'd end up being, and I'm still not
> sure myself that it's been a successful experiment; it may be
> better in the long run to sever the tie and just maintain those
> parts manually rather than generating them.

Sure, things may change in the future. But until you do
sever the tie, it seems pretty useful to include the schema
in an appendix, as it saves others the work of
"putting the toothpaste back in the tube", i.e. reversing
the prose-generation process.


> On top of that, if the HTML WG were to publish a formalism or set
> of formalisms for the HTML language, I think there is a good
> argument against making that particular schema the basis for it
> (for the reasons I mentioned above -- that it's already optimized
> to some degree for particular use with validator.nu, and may in
> the future end up being optimized for that even more so).

I'd like to see the specifics of that argument; I'm not persuaded
by the general claims that the context in which it was developed
makes it worse than nothing.


-- 
Dan Connolly, W3C http://www.w3.org/People/Connolly/
gpg D3C2 887B 0F92 6005 C541  0875 0F91 96DE 6E52 C29E
Received on Thursday, 18 March 2010 14:35:45 GMT
