Re: including a schema with "HTML: The Markup Language" Clarifying TAG Re: Courtesy notification

On Mar 17, 2010, at 3:53 AM, Graham Klyne wrote:

> OK, now I understand better where you are coming from.
>
> All of which I guess underscores Larry's point: it's hard (if not  
> generally impossible) to use a grammar/schema/other-formal- 
> description to check *all* aspects of program/input correctness, but  
> that doesn't take away from the value of using one to validate those  
> aspects that are amenable to such validation.
>
> In my experience, it is often the process of expressing/reviewing a  
> language in some formalism that is of greatest value, for  
> understanding implications of and problems in its design.  I believe  
> Dan Connolly reported some similar experiences w.r.t. XQuery a few  
> years ago (Amsterdam WWW conference, developer day, IIRC).

I think that is probably true if one is truly inventing a syntax. But  
schemas for markup languages generally assume the surface syntax is  
all taken care of and describe how the resulting pieces are allowed to  
be assembled.


>
> <aside>
> (I'm not sure about the HTML lexer, but an XML lexer can't (easily)  
> be described in terms of a finite state machine because of context  
> sensitivity of the tokenization process - something I learned trying  
> to fix up an XML parser written in Haskell, which might in turn be  
> regarded in some ways as being pretty close to a general-purpose,  
> machine processable formal specification language.)
> </aside>

You can check it out yourself if you want:
<http://dev.w3.org/html5/spec/Overview.html#tokenization>

My hypothesis that it's expressible as an FSM is based on the fact
that the specification is written explicitly in terms of input
characters and the resulting state transitions, although I may have
missed instances of reading hidden unbounded state. There is also the
fact that side effects can modify the input stream in the middle of
parsing, but I think the tokenizer in isolation is still an FSM.
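
To illustrate the shape of that argument, here is a minimal sketch of a
tokenizer written as an explicit state machine. It is not the HTML5
tokenizer: it handles only text and bare start/end tags, and the names
(State, tokenize, the token tuples) are made up for this example. The
state names loosely echo the spec's. The point is only that each
transition is chosen from the current state and the current input
character; the buffers hold output and are never read to pick a
transition.

```python
from enum import Enum, auto


class State(Enum):
    DATA = auto()
    TAG_OPEN = auto()
    END_TAG_OPEN = auto()
    TAG_NAME = auto()


def tokenize(text):
    """Tokenize a tiny HTML-like subset (text, <name>, </name>).

    Hypothetical illustration only; attributes, comments, and character
    references are deliberately left out.
    """
    state = State.DATA
    tokens = []
    pending_text = []  # output buffer, never consulted to choose a transition
    name = []          # output buffer for the tag name being built
    kind = "start"     # output detail: which kind of tag token to emit

    i = 0
    while i < len(text):
        ch = text[i]
        i += 1
        if state is State.DATA:
            if ch == "<":
                state = State.TAG_OPEN
            else:
                pending_text.append(ch)
        elif state is State.TAG_OPEN:
            if ch == "/":
                kind = "end"
                state = State.END_TAG_OPEN
            elif ch.isalpha():
                kind = "start"
                name = [ch.lower()]
                state = State.TAG_NAME
            else:
                # Not a tag after all: keep "<" as text and reconsume ch.
                pending_text.append("<")
                i -= 1
                state = State.DATA
        elif state is State.END_TAG_OPEN:
            if ch.isalpha():
                name = [ch.lower()]
                state = State.TAG_NAME
            else:
                # Not an end tag: keep "</" as text and reconsume ch.
                pending_text.extend("</")
                i -= 1
                state = State.DATA
        elif state is State.TAG_NAME:
            if ch == ">":
                if pending_text:
                    tokens.append(("text", "".join(pending_text)))
                    pending_text = []
                tokens.append((kind, "".join(name)))
                state = State.DATA
            else:
                name.append(ch.lower())

    # End of input: recover text swallowed by an unfinished "<" or "</".
    # An unfinished tag name is simply dropped in this sketch.
    if state is State.TAG_OPEN:
        pending_text.append("<")
    elif state is State.END_TAG_OPEN:
        pending_text.extend("</")
    if pending_text:
        tokens.append(("text", "".join(pending_text)))
    return tokens
```

For example, tokenize("a<b>c</B>") yields a text token, a start tag "b",
another text token, and an end tag "b", while tokenize("1 < 2") comes
back as a single text token. Whether the real tokenizer stays within
this model is exactly the open question above.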

Regards,
Maciej

Received on Wednesday, 17 March 2010 11:16:07 UTC