- From: Maciej Stachowiak <mjs@apple.com>
- Date: Wed, 17 Mar 2010 04:15:33 -0700
- To: Graham Klyne <GK-lists@ninebynine.org>
- Cc: Larry Masinter <LMM@acm.org>, 'Dan Connolly' <connolly@w3.org>, "'Michael(tm) Smith'" <mike@w3.org>, noah_mendelsohn@us.ibm.com, 'Paul Cotton' <paul.cotton@microsoft.com>, 'Philippe Le Hegaret' <plh@w3.org>, 'Sam Ruby' <rubys@intertwingly.net>, www-tag@w3.org
On Mar 17, 2010, at 3:53 AM, Graham Klyne wrote:

> OK, now I understand better where you are coming from.
>
> All of which I guess underscores Larry's point: it's hard (if not
> generally impossible) to use a grammar/schema/other-formal-description
> to check *all* aspects of program/input correctness, but that doesn't
> take away from the value of using one to validate those aspects that
> are amenable to such validation.
>
> In my experience, it is often the process of expressing/reviewing a
> language in some formalism that is of greatest value, for
> understanding implications of and problems in its design. I believe
> Dan Connolly reported some similar experiences w.r.t. XQuery a few
> years ago (Amsterdam WWW conference, developer day, IIRC).

I think that is probably true if one is truly inventing a syntax. But schemas for markup languages generally assume the surface syntax is all taken care of and describe how the resulting pieces are allowed to be assembled.

> <aside>
> (I'm not sure about the HTML lexer, but an XML lexer can't (easily)
> be described in terms of a finite state machine because of context
> sensitivity of the tokenization process - something I learned trying
> to fix up an XML parser written in Haskell, which might in turn be
> regarded in some ways as being pretty close to a general-purpose,
> machine processable formal specification language.)
> </aside>

You can check it out yourself if you want: <http://dev.w3.org/html5/spec/Overview.html#tokenization>

My hypothesis that it's expressible as an FSM is based on the fact that the specification is explicitly in terms of input characters and resulting state transitions. Although I may have missed instances of reading hidden unbounded state. There is also the fact that side effects can modify the input stream in the middle of parsing, but I think the tokenizer in isolation is still an FSM.

Regards,
Maciej
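
For illustration only, here is a minimal Python sketch of what "specified in terms of input characters and resulting state transitions" can look like. It is a toy and not the HTML5 tokenizer: the three states, the token shapes, and the name `tokenize` are all assumptions invented for this example. The point is only that every branch is chosen from the current state plus one input character, with no unbounded lookahead or stack.

```python
from enum import Enum, auto


class State(Enum):
    # Hypothetical states for a toy tokenizer; not the states of the HTML5 spec.
    DATA = auto()
    TAG_OPEN = auto()
    TAG_NAME = auto()


def tokenize(source):
    """Split source into ("text", ...) and ("tag", ...) tokens.

    Each transition depends only on the current state and the current
    character. The buffers collect output; they never select a transition.
    """
    state = State.DATA
    text = []    # pending text characters
    name = []    # pending tag-name characters
    tokens = []

    def flush_text():
        if text:
            tokens.append(("text", "".join(text)))
            text.clear()

    for ch in source:
        if state is State.DATA:
            if ch == "<":
                state = State.TAG_OPEN
            else:
                text.append(ch)
        elif state is State.TAG_OPEN:
            if ch.isalpha():
                flush_text()
                name.append(ch)
                state = State.TAG_NAME
            elif ch == "<":
                # The earlier "<" was plain text; this one may still open a tag.
                text.append("<")
            else:
                text.extend(["<", ch])
                state = State.DATA
        else:  # State.TAG_NAME
            if ch == ">":
                tokens.append(("tag", "".join(name)))
                name.clear()
                state = State.DATA
            else:
                name.append(ch)

    # End of input: anything left unfinished is treated as text.
    if state is State.TAG_OPEN:
        text.append("<")
    elif state is State.TAG_NAME:
        text.extend(["<"] + name)
    flush_text()
    return tokens
```

With this sketch, `tokenize("a<b>c")` yields a text token, a tag token, and another text token, while `tokenize("1 < 2")` yields a single text token because the `<` is not followed by a letter.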
Received on Wednesday, 17 March 2010 11:16:07 UTC