Re: Grammar enforcement while parsing in Amaya

Hello.
Thank you for the valuable insights into the design decisions.
I have been trying to modify the HTML.S file so that Amaya recognises only a
cut-down set of tags and attributes.
The new HTML.S is compiled to HTML.STR and HTML.H using compiler.exe. I have
also removed the corresponding tags and attributes from HTML.A. I then use the
new HTML.STR and HTML.h when building Amaya.
There seem to be some problems.
When parsing reaches this part of the code in tree.c
/*************************/
case CsIdentity:
    /* structure is the same as that defined by another rule of the */
    /* same scheme */
    create = FALSE;
    pSRule2 = &pSS->SsRule[pSRule->SrIdentRule - 1];
    if (pSRule2->SrParamElem || pSRule2->SrAssocElem ||
        pSRule2->SrConstruct == CsBasicElement ||
        pSRule2->SrNInclusions > 0 || pSRule2->SrNExclusions > 0 ||
        pSRule2->SrConstruct == CsConstant ||
        pSRule2->SrConstruct == CsChoice ||
        pSRule2->SrConstruct == CsPairedElement ||
        pSRule2->SrConstruct == CsReference ||
        pSRule2->SrConstruct == CsNatureSchema)
        create = TRUE;
    t1 = NewSubtree (pSRule->SrIdentRule, pSS, pDoc, assocNum, Desc,
                     create, withAttr, withLabel);
    if (pEl == NULL)
        pEl = t1;
    else
        InsertFirstChild (pEl, t1);
    break;
/***************************************/
it goes into infinite recursion, calling the NewSubtree() function with the
typenum parameter equal to 0.
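One check that might narrow this down is sketched below. It rests on an unconfirmed assumption: that cutting rules out of HTML.S left a CsIdentity rule whose SrIdentRule no longer refers to an existing rule, or that two identity rules now refer to each other. The type SRuleSketch and the function FirstBadIdentityRule are hypothetical stand-ins written for this illustration; they are not the real Thot declarations, which have many more fields.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the Thot structures. */
typedef enum { CsIdentity, CsBasicElement, CsChoice, CsOther } Construct;

typedef struct {
    Construct SrConstruct;
    int       SrIdentRule;   /* 1-based rule number, used by CsIdentity only */
} SRuleSketch;

/* Returns 0 if every CsIdentity rule refers to an existing rule and every
   chain of identity rules terminates; otherwise returns the 1-based number
   of the first rule from which a bad chain starts. */
int FirstBadIdentityRule(const SRuleSketch *rules, int nRules)
{
    int i, cur, steps;

    assert(rules != NULL && nRules >= 0);
    for (i = 1; i <= nRules; i++) {
        cur = i;
        steps = 0;
        while (rules[cur - 1].SrConstruct == CsIdentity) {
            int next = rules[cur - 1].SrIdentRule;

            if (next < 1 || next > nRules)
                return i;   /* dangling reference, e.g. rule number 0 */
            if (++steps > nRules)
                return i;   /* more steps than rules: the chain is a cycle */
            cur = next;
        }
    }
    return 0;
}
```

Running a check of this kind over the rule table right after the new HTML.STR is loaded would show whether the schema itself is inconsistent before tree.c is ever reached. A non-zero result would point at the rule to look up in HTML.S.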
Any insights will be appreciated.

-----Original Message-----
From: Vincent Quint <quint@w3.org>
To: Joseph J Panjikaran <josephj@wipinfo.soft.net>
Cc: Amaya mailing list <www-amaya@w3.org>
Date: Monday, February 15, 1999 1:41 PM
Subject: Re: Grammar enforcement while parsing in Amaya


>Joseph J Panjikaran wrote:
>
>> I was viewing a FRAMESET based HTML page in Amaya1.4.
>> Inside the FRAMESET tag an H1 tag had crept in.
>>
>> The file was not written using Amaya and I am just using it to view it.
>> Unfortunately, the H1 tag is recognised inside FRAMESET.
>>
>> I checked the grammar specified in the HTML.S file. H1 is not defined
>> inside FRAMESET.
>
>You could also have checked the SGML DTD for HTML 4.0:
>
>    http://www.w3.org/TR/REC-html40/sgml/loosedtd.html
>
>as both the DTD and the HTML.S file define the same structure.
>You are right: a FRAMESET element cannot contain an H1 element as a child.
>
>> I don't know about IE, but Netscape4.04 does not allow this.
>
>The only safe reference I know is the HTML 4.0 specification:
>
>    http://www.w3.org/TR/REC-html40/
>
>> The parser should enforce the grammar to some extent: it should not
>> allow such blatant aberrations and should reject the tag.
>
>In principle, you are right.  But Amaya has to cope with existing Web
>pages, and, as you know, very few of them validate against the HTML DTD.
>When designing Amaya, we were faced with a difficult choice:
>(1) adopt a strict position and reject all invalid pages. Most users
>    would be very disappointed not to be able to see many pages that
>    other Web clients can display;
>(2) accept invalid pages and let Amaya fix the most common bugs.
>
>We took position (2) and decided that Amaya should try to fix bugs, but
>without losing information.  If an element is not valid in a given context,
>Amaya tries to change the structure locally to make that element valid,
>but it doesn't delete the element or move it to a different place, which
>could change its meaning. That's why the H1 element is kept in the
>FRAMESET.
>
>Another important design decision that has been made for Amaya is that,
>even if it accepts invalid documents, the structure and markup that it
>produces is always valid. Obviously, only elements created or changed by
>Amaya itself are concerned here. Some invalid parts of the original
>document may remain when the document is saved.
>
>W3C has also developed a validator that allows you to check documents.
>Have a look at:
>
>    http://validator.w3.org/
>
>You could also use HTML Tidy to fix erroneous documents:
>
>    http://www.w3.org/People/Raggett/tidy/
>
>> Another classic case is allowing an INPUT tag without an enclosing FORM
>> tag. I checked the TABLE-related tags: some degree of checking has been
>> hard-coded in ContextOK() in the html2thot.c file.
>
>This is an example of the errors that Amaya tries to recover from.  If the
>structure of a table is wrong, Amaya cannot edit it. For that reason, it
>applies some transformations that make the structure correct.
>
>> I don't think this is the right way of enforcing the grammar, since the
>> whole purpose (as I see it) of having a '.S' and a '.STR' file is to make
>> the code independent of the grammar to a large extent.
>
>The issue is that a DTD or a .S file only specifies the structure of a
>document class, not its semantics.  When you consider an invalid document,
>there are often several ways to transform its structure to make it valid,
>but each transformation may have a different impact on the document
>semantics. The DTD or .S file does not allow you to choose the right
>transformation.
>A specific piece of code is then needed.
>
>> What is the reason for a very liberal grammar enforcement?
>
>See above.
>
>> BTW I am an ardent fan of Amaya. I really enjoyed using the Compiler
>> application provided with Amaya1.4.
>> It gives good error messages when compiling "HTML.S". It helped a lot!!
>> Eagerly waiting for the goodies in Amaya1.5.
>
>Thanks for your support.
>
>> regards
>> Joseph
>
>Vincent.
>
>-------------------------------------------------------
>Vincent Quint                       INRIA Rhone-Alpes
>W3C/INRIA                           ZIRST
>e-mail: Vincent.Quint@w3.org        655 avenue de l'Europe
>Tel.: +33 4 76 61 53 62             38330 Montbonnot St Martin
>Fax:  +33 4 76 61 52 07             France
>
>
>

Received on Tuesday, 16 February 1999 07:36:03 UTC