Re: including a schema with "HTML: The Markup Language" Clarifying TAG Re: Courtesy notification

Dan Connolly <connolly@w3.org>, 2010-03-18 09:35 -0500:

> On Thu, 2010-03-18 at 19:11 +0900, Michael(tm) Smith wrote:
> > And that grammar on its own doesn't even completely express all
> > the constraint checks that need to be done by an actual validation
> > tool that uses it as an input
[...]
> I'm not asking for a formalism
> adequate to build a validator. Just a (machine-readable) sketch
> of what-goes-where.
> 
> Even though a schema is an incomplete description of a language,
> it's often useful; e.g. it can show which elements are allowed
> where so that authoring tools can use it to build
> context-sensitive auto-complete lists.

Yes, agreed.
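
To make the auto-complete use case concrete, here is a minimal Java sketch of how an editing tool might turn what-goes-where information into a context-sensitive completion list. The class name, the method name, and the tiny hand-written table are hypothetical; a real tool would derive the table from a schema, and the two entries below are only an illustrative subset of the HTML5 content models.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch: not taken from validator.nu or any existing editor.
final class AutoComplete {
    // parent element -> child elements allowed inside it (illustrative subset only)
    static final Map<String, List<String>> ALLOWED_CHILDREN = Map.of(
            "head", List.of("base", "link", "meta", "script", "style", "title"),
            "ul", List.of("li"));

    // Return the allowed children of `parent` whose names start with `prefix`.
    static List<String> suggest(String parent, String prefix) {
        return ALLOWED_CHILDREN.getOrDefault(parent, List.of()).stream()
                .filter(name -> name.startsWith(prefix))
                .collect(Collectors.toList());
    }
}
```

For example, AutoComplete.suggest("head", "s") yields the list [script, style], while an unknown parent yields an empty list.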

But there is nothing today that prevents anybody from copying or
checking out the validator.nu schema source files and integrating
them into other applications however they see fit. It is not
necessary for the HTML WG to (re)publish the validator.nu schema
files in W3C TR space in order to facilitate those kinds of reuses
of that schema.

I think what might be more appropriate for the HTML WG to do is
to publish a Group Note, titled "Editing and Validating HTML5" or
something, that would provide information about existing tools
that have HTML5-specific editing features and validating features,
as well as tutorial information on how to integrate HTML5-specific
editing features and validation features into existing applications.

That document could provide readers with URLs for online and
offline HTML5 validation services, as well as information about
available schemas, and URLs for the locations where they are
maintained and from where they can be downloaded; e.g., it could
have a section saying, "The following HTML5-related schema
implementations are available", and then list the location of the
validator.nu HTML5 schema along with any other schema
implementations that might become available. It would be the
responsibility of the editor and the WG to make a qualitative
judgment about which resources to list in that document (with part
of the criteria of course being whether they actually conform to
the document-conformance criteria in the HTML5 spec).

I would be willing to volunteer to be the editor for such a
document, if others think it would be worth having.

A possible objection that I can anticipate to the existence of a
document like the one I just described is that some people will
likely say it's not a good idea to have multiple schema
implementations available -- that there should just be one good
one that is commonly used by everybody. I disagree with that. I
think we instead want to encourage people to develop alternative
schema implementations, and have those be evaluated equally on the
basis of their relative merits. I guess it could well be that that
might never happen, and the validator.nu schema might eventually
become the de facto common schema that other applications use. But
I don't think we should risk prematurely ensuring that outcome
without giving time and opportunity for others to develop
alternatives.

> > IMHO, that it would be
> > potentially very misleading to publish it and encourage its use
> > outside the context of its function as part of a complete HTML
> > validation tool (like validator.nu).
> 
> Misleading in what way? Do any examples come to mind?

Misleading in that if it is published in W3C TR space, even if it
is so published by the WG with an understanding that it's just one
of many possible ways of expressing a particular subset of the
document conformance criteria in the language, and with agreement
that it's intended for a particular limited class of use cases --
for example, for use in editing applications that provide
context-sensitive auto-complete lists, but not for use on its own
in building a validation service -- it nevertheless risks ending
up being seen by the community as *the* validation schema endorsed
by the HTML WG and the W3C, in the same way the HTML4 and XHTML1
DTDs that the W3C published are effectively the only normative
validation schemas for those versions of the language, and the W3C
Markup Validation Service built around those is seen as *the*
single, only officially endorsed validation service for HTML.

I think we should instead do what we can to encourage multiple
implementors to compete to provide the best validation services.

> > [I mentioned the validator.nu schema being optimized for use
> > within the validator.nu service]
> 
> Maybe I can find them myself, but if you could be more specific
> about those cases, I'd appreciate it. I'm interested to look at
> how much these optimizations make the schema less useful.

One specific case is the handling of required attributes. An
example is the link element, for which the HTML5 spec defines href
and rel as required attributes. So I make an RNC schema that
defines the content model and attribute model for the link element
like this:

	link.elem =
		element link { link.inner & link.attrs }
	link.attrs =
		(	common.attrs
		&	shared-hyperlink.attrs.href
		&	shared-hyperlink.attrs.rel
		&	shared-hyperlink.attrs.hreflang?
		&	shared-hyperlink.attrs.media?
		&	shared-hyperlink.attrs.type?
		&	link.attrs.sizes?
		#	link.attrs.title included in common.attrs
		)
	link.inner =
		( empty )

If I then take that schema along with a document that contains an
instance of a link element that lacks an href attribute and I feed
it to Jing, then the complete text of the error message that Jing
emits is this:

  Required attributes missing on element link

Note that it doesn't tell you which attributes are actually
missing. So to make up for that deficiency in Jing, what the
validator.nu backend currently actually uses is a schema that
defines the href and rel attributes on the link element as being
optional (not required). What that does is, it causes the above
not-particularly-useful error message to be suppressed. So what
validator.nu is currently using instead to check for the
href-and-rel-required constraint is Java code in this file:

  https://whattf.svn.cvsdude.com/syntax/trunk/non-schema/java/src/org/whattf/checker/schematronequiv/Assertions.java

(Search for "link required attrs" in that file.)

That causes the following much-more-useful error message to be
emitted instead:

  A link element must have an href attribute
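
As a rough illustration of that kind of non-schema check, here is a minimal Java sketch that names the missing attribute in its message. It is not the code in Assertions.java; the class name, the method name, and both lookup tables are hypothetical and cover only the link case discussed here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: not the actual validator.nu checker code.
final class RequiredAttrCheck {
    // element name -> required attribute names, in reporting order
    private static final Map<String, List<String>> REQUIRED =
            Map.of("link", List.of("href", "rel"));

    // attribute names that read better with "an" ("aitch-ref")
    private static final Set<String> TAKES_AN = Set.of("href");

    // Return one error message per required attribute that is absent.
    static List<String> check(String element, Set<String> presentAttrs) {
        List<String> errors = new ArrayList<>();
        for (String attr : REQUIRED.getOrDefault(element, List.of())) {
            if (!presentAttrs.contains(attr)) {
                String article = TAKES_AN.contains(attr) ? "an" : "a";
                errors.add("A " + element + " element must have "
                        + article + " " + attr + " attribute");
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        // A link element with rel but no href.
        // prints "A link element must have an href attribute"
        check("link", Set.of("rel")).forEach(System.out::println);
    }
}
```

The point of the design is that the check walks the required attributes one at a time, so each message can say exactly which one is missing, which a single "required attributes missing" error cannot.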

However, the RNC schema files that are in the validator.nu source
repository actually still define rel and href as required on link.
The change to make them optional (to suppress the Jing error) is
done through a hacky part of the validator.nu build. It was
done that way in the interest of helping to make the
schema more "portable" (for lack of a better word) outside of the
context of its specific use within Jing. Somebody less concerned
about trying to keep it portable might instead just optimize it by
forgoing the build hackery and just making the change directly in
the source repository and being done with it.
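
For readers who want a picture of what such a build-time rewrite could look like, here is a minimal Java sketch. It is only an assumption about the general technique, not the actual validator.nu build step; the class and method names are hypothetical. It appends "?" to selected pattern references in RNC source text, turning them from required into optional, and leaves every other line alone.

```java
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch: not the real validator.nu build hackery.
final class RelaxRequired {
    // Make the given pattern references optional wherever a line consists of
    // just that reference, optionally preceded by the "&" interleave operator.
    // Note: a trailing newline in the input is not preserved.
    static String relax(String rnc, Set<String> refs) {
        return rnc.lines()
                .map(line -> {
                    String ref = line.strip().replaceFirst("^&\\s*", "");
                    return refs.contains(ref) ? line.stripTrailing() + "?" : line;
                })
                .collect(Collectors.joining("\n"));
    }
}
```

Applied to the link.attrs definition shown earlier with the references shared-hyperlink.attrs.href and shared-hyperlink.attrs.rel, this would leave the source files in the repository untouched while the copy used at run time treats both attributes as optional.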

(Anyway, fwiw, the real solution to the particular issue that
motivated all that is to fix Jing. George Bina made a Jing patch
that does that -- that causes Jing itself to emit useful error
messages for required-but-missing attributes and other cases. I
have done a rough initial integration of that into a development
instance of the validator.nu backend, but I still need to make
some further changes to actually get it done properly. In the
meantime, I posted the following note to myself about it:

http://lists.w3.org/Archives/Public/www-archive/2010Jan/0077.html )

> > The information in that schema was developed from the almost
> > completely prose-based document-conformances constraint expressions
> > in the HTML5 spec itself. So it can reasonably be viewed as just
> > one possible implementation of just some particular document-
> > conformance constraints that are expressed in the HTML5 spec.
> 
> Very well, but it's likely to be good enough for lots of other
> uses, and I'd like to save others the work of re-doing
> the development of a schema from the HTML 5 spec prose.
> [...]
> Sure, things may change in the future. But until you do
> sever the tie, it seems pretty useful to include the schema
> in an appendix, as it saves others the work of
> "putting the toothpaste back in the tube", i.e. reversing
> the prose-generation process.

The availability of that schema in the validator.nu source
repository already does that. Anybody else can grab it from there
just as I did. If we want to publicize its availability further, I
think an HTML WG Note like the "Editing and Validating HTML5" one
I proposed above would be a very good way to do that.

> > On top of that, if the HTML WG were to publish a formalism or set
> > of formalisms for the HTML language, I think there is a good
> > argument against making that particular schema the basis for it
> > (for the reasons I mentioned above -- that it's already optimized
> > to some degree for particular use with validator.nu, and may in
> > the future end up being optimized for that even more so).
> 
> I'd like to see the specifics of that argument; I'm not persuaded
> by the general claims that the context in which it was developed
> makes it worse than nothing.

I hope some of what I wrote above helps to clarify that. I hope
it's also clear by now that I also don't think it's necessary for
the HTML WG to publish or endorse any single specific formalism or
set of formalisms for HTML5 to the exclusion of any others. I
think we should instead just provide the HTML5 spec itself as the
standard expression of the document-conformance constraints of the
language, and encourage multiple implementors to develop
schemas and other formalisms for specific use cases (e.g., for use
in providing context-sensitive auto-complete lists in editing apps).

  --Mike

-- 
Michael(tm) Smith
http://people.w3.org/mike

Received on Friday, 19 March 2010 10:23:22 UTC