Re: Grammar enforcement while parsing in Amaya

Joseph J Panjikaran wrote:

> I was viewing a FRAMESET-based HTML page in Amaya 1.4.
> Inside the FRAMESET tag an H1 tag had crept in.
> 
> The file was not written using Amaya and I am just using it to view it.
> Unfortunately, the H1 tag is recognised inside FRAMESET.
> 
> I checked the grammar specified in the HTML.S file. H1 is not defined
> inside FRAMESET.

You could also have checked the SGML DTD for HTML 4.0:

    http://www.w3.org/TR/REC-html40/sgml/loosedtd.html

as both the DTD and the HTML.S file define the same structure.
You are right: a FRAMESET element cannot contain an H1 element as a child.
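
In the HTML 4.0 DTDs, FRAMESET is declared as
<!ELEMENT FRAMESET - - ((FRAMESET|FRAME)+ & NOFRAMES?)>, so only three
element types may appear directly inside it. The C sketch below encodes
that list by hand to show what the check amounts to. The function names are
hypothetical and are not taken from Amaya or from HTML.S.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Case-insensitive comparison, since HTML tag names are case-insensitive. */
static bool same_tag(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (toupper((unsigned char)*a) != toupper((unsigned char)*b))
            return false;
        a++;
        b++;
    }
    return *a == *b;   /* both strings must end at the same point */
}

/* Element types allowed directly inside FRAMESET by the HTML 4.0 content
   model ((FRAMESET|FRAME)+ & NOFRAMES?).  The occurrence indicators
   (+ and ?) are not checked here; only membership is. */
static const char *const frameset_children[] = { "FRAMESET", "FRAME", "NOFRAMES" };

/* Hypothetical helper: true if `tag` is a permitted direct child of FRAMESET. */
bool frameset_allows_child(const char *tag)
{
    size_t n = sizeof frameset_children / sizeof frameset_children[0];
    for (size_t i = 0; i < n; i++)
        if (same_tag(tag, frameset_children[i]))
            return true;
    return false;
}
```

With this check, frameset_allows_child("frame") is true and
frameset_allows_child("H1") is false.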

> I don't know about IE, but Netscape 4.04 does not allow this.

The only safe reference I know is the HTML 4.0 specification:

    http://www.w3.org/TR/REC-html40/

> The parser should enforce the grammar to some extent, should not
> allow such blatant aberrations, and should reject the tag.

In principle, you are right.  But Amaya has to cope with existing Web
pages, and, as you know, very few of them validate against the HTML DTD.
When designing Amaya, we were faced with a difficult choice:
(1) adopt a strict position and reject all invalid pages. Most users
    would be very disappointed not to be able to see many pages that
    other Web clients can display;
(2) accept invalid pages and let Amaya fix the most common bugs.

We took position (2) and decided that Amaya should try to fix bugs, but
without losing information.  If an element is not valid in a given context,
Amaya tries to change the structure locally to make that element valid,
but it doesn't delete the element or move it to a different place, which
could change its meaning. That's why the H1 element is kept in the FRAMESET.
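
The policy can be sketched in C: never delete or move an element. Try one
implied wrapper element under the same parent, and if that does not work,
leave the element where the source put it. This is a hypothetical
illustration and not Amaya's actual code. The grammar excerpt, the names
and the single-wrapper limit are all assumptions, and tag names are
assumed to be upper-case already.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Possible outcomes when a parsed element meets its parent. */
typedef enum {
    PLACE_AS_IS,        /* the child is valid where it is */
    PLACE_WITH_WRAPPER, /* valid once one implied wrapper is inserted */
    PLACE_KEPT_INVALID  /* no local fix: keep the child in place anyway */
} Placement;

/* Tiny hand-written grammar excerpt: "parent may directly contain child".
   Illustrative only, far from complete. */
struct rule { const char *parent; const char *child; };
static const struct rule rules[] = {
    { "TABLE", "TBODY" }, { "TBODY", "TR" }, { "TR", "TD" },
    { "FRAMESET", "FRAMESET" }, { "FRAMESET", "FRAME" },
    { "FRAMESET", "NOFRAMES" },
};
static const size_t n_rules = sizeof rules / sizeof rules[0];

static bool may_contain(const char *parent, const char *child)
{
    for (size_t i = 0; i < n_rules; i++)
        if (strcmp(rules[i].parent, parent) == 0 &&
            strcmp(rules[i].child, child) == 0)
            return true;
    return false;
}

/* Hypothetical repair policy: the element is never deleted or moved.
   The only fix attempted is a single implied wrapper under the same
   parent (e.g. TR around a TD found directly in TBODY). */
Placement place_element(const char *parent, const char *child,
                        const char **wrapper)
{
    *wrapper = NULL;
    if (may_contain(parent, child))
        return PLACE_AS_IS;
    for (size_t i = 0; i < n_rules; i++)
        if (strcmp(rules[i].parent, parent) == 0 &&
            may_contain(rules[i].child, child)) {
            *wrapper = rules[i].child;
            return PLACE_WITH_WRAPPER;
        }
    return PLACE_KEPT_INVALID;
}
```

In this sketch a TD under TBODY gets an implied TR. An H1 under FRAMESET
has no wrapper that fits, so it is kept in place, as described above.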

Another important design decision that has been made for Amaya is that, even
if it accepts invalid documents, the structure and markup that it produces
are always valid. Obviously, only elements created or changed by Amaya itself
are concerned here. Some invalid parts of the original document may remain
when the document is saved.
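
One way to picture this asymmetry is two insertion paths into the same
tree. The parser path accepts what the source contains and records whether
it is valid. The editor path refuses anything the grammar forbids. This C
sketch is hypothetical: Node, grammar_allows, parser_attach and
editor_insert are illustrative names, not Amaya's data structures, and the
grammar stand-in only knows about FRAMESET.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_CHILDREN 16

typedef struct Node {
    const char *tag;
    bool valid_here;                 /* false if kept from an invalid source */
    struct Node *children[MAX_CHILDREN];
    size_t n_children;
} Node;

/* Hypothetical grammar stand-in: only FRAMESET is restricted here;
   every other parent is treated as permissive. */
static bool grammar_allows(const char *parent, const char *child)
{
    if (strcmp(parent, "FRAMESET") == 0)
        return strcmp(child, "FRAMESET") == 0 ||
               strcmp(child, "FRAME") == 0 ||
               strcmp(child, "NOFRAMES") == 0;
    return true;
}

static bool append_child(Node *parent, Node *child)
{
    if (parent->n_children == MAX_CHILDREN)
        return false;
    parent->children[parent->n_children++] = child;
    return true;
}

/* Parser path: keep whatever the source contains, but remember validity. */
bool parser_attach(Node *parent, Node *child)
{
    child->valid_here = grammar_allows(parent->tag, child->tag);
    return append_child(parent, child);
}

/* Editor path: structure created by the editor must always be valid. */
bool editor_insert(Node *parent, Node *child)
{
    if (!grammar_allows(parent->tag, child->tag))
        return false;
    child->valid_here = true;
    return append_child(parent, child);
}
```

A parsed H1 inside FRAMESET is therefore kept but flagged, and saved as it
was. The editor cannot create a new one there.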

W3C has also developed a validator that allows you to check documents.
Have a look at:

    http://validator.w3.org/

You could also use HTML Tidy to fix erroneous documents:

    http://www.w3.org/People/Raggett/tidy/

> Another classic case is allowing an INPUT tag without an enclosing FORM tag.
> I checked the TABLE-related tags: some degree of checking has been hard
> coded into ContextOK() in the html2thot.c file.

This is an example of the errors that Amaya tries to recover from.  If the
structure of a table is wrong, Amaya cannot edit it. For that reason, it
applies some transformations that make the structure correct.

> I don't think this is the right way to enforce the grammar, since the whole
> purpose (as I see it) of having '.S' and '.STR' files is to make the code
> independent of the grammar to a large extent.

The issue is that a DTD or a .S file only specifies the structure of a
document class, not its semantics.  When you consider an invalid document,
there are often several ways to transform its structure to make it valid,
but each transformation may have a different impact on the document semantics.
Neither the DTD nor the .S file tells you which transformation is the right one.
A specific piece of code is then needed.
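
Suppose the parser meets a TD directly inside TABLE, after some complete
rows. At least two repairs give a structure the grammar accepts. The cell
can open a new implied row, or it can be appended to the last row. One
repair adds a row and the other widens a row, so the table means something
different in each case. The C sketch below models a table only as cell
counts per row. The heuristic in choose_repair (append when the last row
is shorter than the widest row) is an invented example, not the rule Amaya
applies.

```c
#include <stddef.h>

#define MAX_ROWS 8

/* Minimal model of a table: the number of cells in each row. */
typedef struct {
    int cells[MAX_ROWS];
    size_t n_rows;
} Table;

/* Two grammar-valid repairs for a cell found directly inside TABLE. */
typedef enum { REPAIR_NEW_ROW, REPAIR_APPEND_TO_LAST_ROW } Repair;

/* Apply a repair; returns 0 on success, -1 if the table is full.
   Appending to a table with no rows falls back to a new row. */
int apply_repair(Table *t, Repair r)
{
    if (r == REPAIR_APPEND_TO_LAST_ROW && t->n_rows > 0) {
        t->cells[t->n_rows - 1]++;      /* the cell becomes an extra column */
        return 0;
    }
    if (t->n_rows == MAX_ROWS)
        return -1;
    t->cells[t->n_rows++] = 1;          /* the cell starts a one-cell row */
    return 0;
}

/* Hypothetical heuristic: the grammar cannot choose, so specific code does.
   Fill a short last row; otherwise start a new row. */
Repair choose_repair(const Table *t)
{
    int widest = 0;
    for (size_t i = 0; i < t->n_rows; i++)
        if (t->cells[i] > widest)
            widest = t->cells[i];
    if (t->n_rows > 0 && t->cells[t->n_rows - 1] < widest)
        return REPAIR_APPEND_TO_LAST_ROW;
    return REPAIR_NEW_ROW;
}
```

Both repairs are equally valid against the DTD. The choice between them is
encoded in code, not in the grammar.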

> What is the reason for a very liberal grammar enforcement?

See above.

> BTW I am an ardent fan of Amaya. I really enjoyed using the Compiler
> application provided with Amaya 1.4.
> Good error messages are given when compiling "HTML.S". It helped a lot!
> Eagerly waiting for the goodies in Amaya 1.5.

Thanks for your support.

> regards
> Joseph

Vincent.

-------------------------------------------------------
Vincent Quint                       INRIA Rhone-Alpes
W3C/INRIA                           ZIRST
e-mail: Vincent.Quint@w3.org        655 avenue de l'Europe
Tel.: +33 4 76 61 53 62             38330 Montbonnot St Martin
Fax:  +33 4 76 61 52 07             France

Received on Monday, 15 February 1999 03:06:36 UTC