Re: Pedagogic validation

Lars Gunther writes:

> I have written a blog post about what *might* be a validator feature, an
> extra setting for a "pedagogic" profile.
>
> http://itpastorn.blogspot.com/2009/09/pedagogic-validation-of-html.html

You mention both 'polyglot' and 'pedagogic' in that post, suggesting
they may be equivalent.  However there's a big difference: polyglot is a
matter of fact (whether a document will be treated equally if parsed as
each of HTML and XHTML), whereas pedagogic is a matter of opinion --
folks could reasonably disagree on which conventions are easier to
teach, and any consensus views on that may change over time.
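To illustrate the 'matter of fact' side, here is a minimal Python sketch
(an illustration only, not taken from the blog post or from any W3C
tool).  It checks just one necessary condition for polyglot markup,
namely that an XML parser accepts the text at all; the helper name
is_well_formed_xml is hypothetical, and a real polyglot check would need
to test a good deal more than this.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(markup: str) -> bool:
    """Return True if an XML parser accepts the markup (hypothetical helper)."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# An HTML parser accepts both fragments.  An XML parser rejects the first
# (the <br> is never closed), so only the second could possibly be polyglot.
html_only = "<p>one<br>two</p>"
candidate = "<p>one<br/>two</p>"

print(is_well_formed_xml(html_only))
print(is_well_formed_xml(candidate))
```

Whatever the answer is for a given document, it doesn't depend on
anybody's taste, which is the contrast with a 'pedagogic' ruleset.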

I think it's entirely reasonable for somebody to declare a labelled
group of restrictions on HTML 5 style as being useful for teaching
purposes.  It's also reasonable for somebody else to declare a
_different_ set of restrictions as also being useful for teaching
purposes (and to label them differently).

It's also reasonable for anybody to implement a validator which checks
whether HTML 5 documents conform to these styles.

But I'm not sure it's a good idea to have one particular subjective
style choice endorsed as being "the pedagogic style" by W3C (or
apparently endorsed, by being labelled as such as an option on a W3C
validator).

> It is quite long, so I will not paste all of its contents into this
> mail.

It's just as easy to read long e-mails as long webpages!  And when
replying, having the text already in an e-mail makes it easier to quote:

  ... everyday problems I encounter as a teacher of markup languages, in
  addition to what a normal validation would reveal:

    * Students forgetting to quote attribute values, even though they
      contain multiple words:
      <img alt=My dog>

'Normal validation' would catch the above, since dog isn't a valid
attribute of <img>.

    * Students messing up the balance of the quotation marks:
    <img src="foo.jpg alt=My dog">

'Normal validation' also catches the above, complaining about the space
in the path.

Leaving the quote marks out altogether would actually have been better
in this case (for getting a validation failure): the src=foo.jpg
attribute would then have the correct value, and the unquoted alt text
would give the unknown attribute error from the previous example.
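How those three variants get split into attributes can be sketched with
Python's standard html.parser module.  This is only an illustration: the
AttrCollector class and attrs_of function are hypothetical names, and
Python's tolerant tokenizer is assumed to behave like a validator's
parser for these simple cases, which it is not guaranteed to do in
general.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Record the attribute list of every start tag seen (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append((tag, attrs))


def attrs_of(markup):
    """Return the (name, value) pairs of the first start tag in markup."""
    collector = AttrCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags[0][1]


# Unquoted multiword value: 'dog' becomes a separate, valueless attribute.
print(attrs_of("<img alt=My dog>"))

# Unbalanced quotes: everything up to the closing quote is the src value.
print(attrs_of('<img src="foo.jpg alt=My dog">'))

# No quotes at all: src is right, and 'dog' is again a stray attribute.
print(attrs_of("<img src=foo.jpg alt=My dog>"))
```

In the first and third cases a stray 'dog' attribute turns up for the
validator to complain about; in the second the whole of
'foo.jpg alt=My dog' ends up as the value of src.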

    * Students messing up the DOM since they do not (yet) know all the
      rules for when an element is implicitly closed by another element's
      starting tag.

With some students (particularly those not from technical backgrounds) I
find the opposite: they can understand why you need to indicate where,
say, a link both begins and ends, but also see, say, <li> as meaning
'start the next bullet point in the list' -- and it's implicit in
starting that one that you're no longer in the previous one.

So it isn't a case of learning that </li> 'closes' a list item (whatever
that means) and then that </li> can be an implicit tag; it's simply that
to do what they want to achieve </li> doesn't crop up at all.
Mentioning it can even provoke comments of the form "Stupid computer
-- why can't it work out that if I've gone on to the next bullet point
then I've finished with the previous one?!".  Not having to explain tags
which don't do anything (and not having to type them -- those new to
creating webpages are often slow typists) is a pedagogical advantage with
such students.
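The 'starting the next bullet point finishes the previous one' reading
can be sketched in a few lines of Python.  This is a hand-rolled
illustration on top of the standard html.parser tokenizer, not the HTML5
tree-building algorithm: the ListItems class and list_items function are
hypothetical names, and nested lists are deliberately not handled.

```python
from html.parser import HTMLParser


class ListItems(HTMLParser):
    """Collect the text of each <li>, closing an open item implicitly."""

    def __init__(self):
        super().__init__()
        self.items = []
        self.current = None  # text pieces of the open item, or None

    def _finish(self):
        if self.current is not None:
            self.items.append("".join(self.current).strip())
            self.current = None

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._finish()  # a new <li> ends the previous item
            self.current = []

    def handle_endtag(self, tag):
        if tag in ("li", "ul", "ol"):
            self._finish()  # an explicit </li>, or the list itself ending

    def handle_data(self, data):
        if self.current is not None:
            self.current.append(data)


def list_items(markup):
    """Return the text of each list item found in markup."""
    parser = ListItems()
    parser.feed(markup)
    parser.close()
    return parser.items


implicit = "<ul><li>One<li>Two<li>Three</ul>"
explicit = "<ul><li>One</li><li>Two</li><li>Three</li></ul>"
print(list_items(implicit) == list_items(explicit))
```

Both spellings give the same three items, which is all such a student
cares about.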

  All elements should be explicit

  This would mean that:

    * Root-element (html), head and body tags should not be optional
      (grade 5).

With beginners I find that to be unnecessary boilerplate, which only
frustrates them because it gets in the way of the 'fun' part, by being
more typing (for which there isn't apparently any purpose, other than
computers being tedious) and a potential source of errors (typing it
incorrectly actually makes documents wrong).

    * Void elements must have a trailing slash. (grade 3)

Beginners without an XML background don't consider <bam/> to be
equivalent to <bam></bam> anyway, so even those with a penchant for
'matching' all their tags aren't helped by this.  Whereas those who are
happy to accept that 'the <img> command' puts an image in the document
at that point just find the slash one more thing they have to remember.
And if their documents work without it, teaching them to include it is
just wasting time (especially if one of them spots that it isn't needed
and asks what it's for).

  All non-shortened attributes must have their values quoted. (grade 5)

  As I've said above, this is a very common error. In the worst case
  scenario it might lead to very unexpected results. 

Beginners seem to be particularly bad at typing quote marks, possibly
because how to do so varies so much between countries (and between
PC/Apple).  Generally I find the more quote marks they try to type the
greater the chance they'll mess up (and it slows down their typing a
lot), so it's safer if they only use them where they're needed.

Since most attributes don't need them -- especially in HTML5, where the
rules have been considerably simplified -- it's relatively easy, the
first time somebody wants a multiword attribute value, to say that it
needs to be quoted.  Indeed anybody with a Unix shell background is
already very familiar with the idea that quotes are needed when spaces
or certain punctuation characters are involved.

  Look at this example, where the value attribute is supposed to contain
  the words "Login name":

  <input type=text value=Login name name=login>

That's something the student will spot straight away when they look at
the page and see the wrong text in the field.  A validator isn't going
to make that more obvious.

Also, the above already fails validation because of the duplicate name
attribute.  (With different text it would fail with an unknown
attribute.)
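The duplicate can be found mechanically.  The following Python sketch
uses the standard html.parser and collections modules; DuplicateAttrs
and duplicate_attrs are hypothetical names, and this is only a rough
stand-in for what a real validator does, not its actual implementation.

```python
from collections import Counter
from html.parser import HTMLParser


class DuplicateAttrs(HTMLParser):
    """Note every attribute name that occurs twice or more on one tag."""

    def __init__(self):
        super().__init__()
        self.duplicates = []  # (tag, attribute name) pairs

    def handle_starttag(self, tag, attrs):
        counts = Counter(name for name, _value in attrs)
        self.duplicates += [(tag, name) for name, n in counts.items() if n > 1]


def duplicate_attrs(markup):
    """Return (tag, attribute name) pairs for repeated attributes."""
    checker = DuplicateAttrs()
    checker.feed(markup)
    checker.close()
    return checker.duplicates


# The unquoted value leaves a bare 'name' next to the real name=login.
print(duplicate_attrs("<input type=text value=Login name name=login>"))

# With the value quoted there is only one 'name' attribute.
print(duplicate_attrs('<input type=text value="Login name" name=login>'))
```

The first call reports 'name' as repeated on the input tag; the second
reports nothing.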

> The idea is to have options in the validator that can be of use in a
> teaching situation.

I think that's a great idea.

And where I point out specific places in which I'd want different rules
from yours when teaching HTML, I'm not trying to persuade you that mine
are in any way 'better' or that you should change your mind on this
matter.

Merely that it's entirely possible to reasonably have different ideas on
subjective things like this, so no one ruleset should be able to claim
rights to being the best for pedagogical reasons.

Smylers

Received on Thursday, 10 September 2009 13:50:15 UTC