- From: David G. Durand <dgd@cs.bu.edu>
- Date: Wed, 2 Oct 1996 13:44:08 -0400
- To: w3c-sgml-wg@w3.org
Main points of this note: 1. I was wrong earlier, SGML content models are strictly less-expressive than full regular expressions 2. Since the differences are a proven cause of confusion, we should just use regular expressions (the standard literature offers cookbook solutions for easier and slower as well as harder but faster faster implementations. Content-model verification is unlikely to be the bottleneck in a parser anyway. Because of 1, 2 implies that some XML content models will be stronger than their translations to SGML content models. As Michael reminded me in a private note, SGML content models are _not_ equivalent to DFAs. They are strictly less-powerful. He also gave a counterexample: ((a,b)*, a?) This has a deterministic parsing automaton, but no non-ambiguous RE (in the sense of SGML). As Michael reminded me, Anne Brueggemann-Klein's papers on SGML content models describe the relevant theory and prove some useful results. On re-reading rather shan skimming the papers, I felt better because I finally had a clear definition of ambiguity, and an algorithm to test it, but it reinforced my earlier comment: >If you haven't guessed, I'd rather eliminate the ambiguity rules and be >able to use a rich and well-understood theory instead of having to help >invent a new theory that only allows things we might rather not allow >anyway. The Brueggemann-Klein papers (I have tech reports, but they may have made it to a journal by now) and the key literature they cite are relatively obscure, compared to the stnadard RE algorithms. It's particularly bad that the results are not in the standard books like the Dragon book. If there's one book that everyone (even undergrads) already knows they have to have if they're going to write a parser, that is it. Michael added: >Doing the instance-to-model match would make perfect sense if you >expected application actions to be defined on 'the third X token of >the content model'. Maybe some members of WG8 expected that to >happen. But I've never seen software that allowed such specifications, >and I have a hard time imagining users who would want it, either. Since people _don't_ do it now, that's an argument that we needn't worry about what they might do, if they did have it. -- David RE delenda est. --------------------------------------------+-------------------------- David Durand dgd@cs.bu.edu | david@dynamicDiagrams.com Boston University Computer Science | Dynamic Diagrams http://www.cs.bu.edu/students/grads/dgd/ | http://dynamicDiagrams.com/
Received on Wednesday, 2 October 1996 13:40:17 UTC