- From: Julian Reschke <julian.reschke@gmx.de>
- Date: Fri, 13 Jul 2001 16:59:24 +0200
- To: "Elliotte Rusty Harold" <elharo@metalab.unc.edu>, <xml-dev@lists.xml.org>
- Cc: <www-xml-blueberry-comments@w3.org>
Although I like the idea of not producing "blueberry" when "1.0" would
have done: how would you *produce* these documents? For instance, how
would an XSLT processor know whether certain name characters will appear
later in the output or not?

> -----Original Message-----
> From: Elliotte Rusty Harold [mailto:elharo@metalab.unc.edu]
> Sent: Friday, July 13, 2001 4:49 PM
> To: xml-dev@lists.xml.org
> Cc: www-xml-blueberry-comments@w3.org
> Subject: Re: Well-formed Blueberry
>
>
> At 3:25 PM +0100 7/13/01, Rob Lugt wrote:
>
> >I can see a good reason for doing what you suggest, and I sympathise with
> >your comments but the fact is that your proposal would turn a trivial
> >implementation change into something much more difficult. It could also
> >have a performance impact, so is unlikely to be popular with Parser
> >developers.
> >
>
> Not necessarily. Most correct parsers and other APIs already have
> to check whether each character is legal in an XML name.
> Blueberry doesn't really change that. It changes the list, but it
> doesn't change the fact that parsers need to maintain and consult
> against very large tables of characters and code points.
>
> I can see a number of ways to efficiently implement my proposal
> without a great deal of effort. One is to maintain two tables,
> one for XML 1.0 legal characters and one for the extra characters
> in Blueberry. This is probably necessary anyway to allow parsers
> to handle both kinds of documents.
>
> A typical parser would first check if a character was legal
> according to XML 1.0. Only if that failed, would it then check to
> see if the character was a legal Blueberry character. This is
> quite natural. At least one API (JDOM) and probably others
> already carefully choose which characters are checked in which
> order to improve efficiency for the common characters vs. the
> uncommon characters.
>
> In fact, to ease the handling of both kinds of documents I'd
> expect there to be two separate method calls, one like
> isXMLNameCharacter() and one like isXMLBlueberryNameCharacter().
> The second method would only be called if the first returned
> false. (This is hardly the only way to do it, but it is one possibility.)
>
> Before parsing, the parser could set a boolean variable such as
> usesBlueberryCharacters to false. The
> isXMLBlueberryNameCharacter() could set this variable to true the
> first and every time it saw a Blueberry character. When the
> parsing was done, the parser would signal a well-formedness error
> if the variable was still false.
>
> Anyway, that's a very rough sketch, but you get the idea. The
> storage of the one extra boolean, and the setting of it each time
> a Blueberry character is seen is trivial compared to the table
> lookup overhead that parsers do at this stage anyway.
>
> If I were revising JDOM to handle Blueberry (I pick JDOM just
> because it's the only API whose internals I'm familiar with)
> setting up the tables for the new Blueberry characters would take
> as long or longer than implementing the scheme I just described.
> (JDOM isn't a parser but it does perform parser-like name checks.)
>
> >Wouldn't a better solution be one of education and market forces? Just like
> >most people write backwards-compatible HTML today, most people will continue
> >to write backwards-compatible XML tomorrow for the simple reason that they
> >want it to be interoperable.
>
>
> As somebody who spends most of my time educating people about XML
> and related technologies, I don't want to leave to education
> anything we can enforce in the code. I will most certainly warn
> people through my books and seminars not to mark their documents
> as Blueberry when they don't need to. But I still know I'll
> encounter masses of developers who've half-read the specs, and
> skimmed some half-accurate books or articles.
> Lord knows I've
> encouraged enough brain damage in my earlier books that I don't
> want to rely on books or any other form of education as being the
> sole solution to a potentially nasty problem.
> --
>
> +-----------------------+------------------------+-------------------+
> | Elliotte Rusty Harold | elharo@metalab.unc.edu | Writer/Programmer |
> +-----------------------+------------------------+-------------------+
> |          The XML Bible, 2nd Edition (Hungry Minds, 2001)           |
> |              http://www.ibiblio.org/xml/books/bible2/              |
> |   http://www.amazon.com/exec/obidos/ISBN=0764547607/cafeaulaitA/   |
> +----------------------------------+---------------------------------+
> |  Read Cafe au Lait for Java News:  http://www.cafeaulait.org/      |
> |  Read Cafe con Leche for XML News: http://www.ibiblio.org/xml/     |
> +----------------------------------+---------------------------------+
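
For readers who want to see the quoted two-table scheme spelled out, here
is a rough Java sketch. It is only an illustration, not anyone's actual
implementation: the class name BlueberryNameCheck and the helpers
checkName() and declaredBlueberryNeedlessly() are made up for this
example, the XML 1.0 table is reduced to a few ASCII ranges, and the
"Blueberry" table is a single placeholder range (the Ethiopic block,
U+1200 to U+137F) chosen purely for illustration. It also ignores the
distinction between name-start and name characters.

```java
// Hypothetical sketch of the two-table name check described above.
class BlueberryNameCheck {

    // Flipped to true whenever a character is accepted only by the
    // Blueberry table (the "one extra boolean" from the message).
    private boolean usesBlueberryCharacters = false;

    // Simplified stand-in for the XML 1.0 name-character table.
    // ASSUMPTION: ASCII letters, digits and a few punctuation marks only.
    static boolean isXMLNameCharacter(int c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
            || (c >= '0' && c <= '9')
            || c == '.' || c == '-' || c == '_' || c == ':';
    }

    // Placeholder for the table of extra Blueberry characters.
    // ASSUMPTION: one illustrative range, not the real list.
    boolean isXMLBlueberryNameCharacter(int c) {
        boolean legal = c >= 0x1200 && c <= 0x137F;
        if (legal) {
            usesBlueberryCharacters = true;
        }
        return legal;
    }

    // Common case first: the Blueberry table is consulted only when
    // the XML 1.0 check has already failed.
    boolean isLegalNameCharacter(int c) {
        return isXMLNameCharacter(c) || isXMLBlueberryNameCharacter(c);
    }

    // Checks every code point of a name against the two tables.
    boolean checkName(String name) {
        for (int i = 0; i < name.length(); ) {
            int c = name.codePointAt(i);
            if (!isLegalNameCharacter(c)) {
                return false;
            }
            i += Character.charCount(c);
        }
        return true;
    }

    // To be asked once parsing is done: true means the document was
    // marked as Blueberry without containing any Blueberry character,
    // which under the proposal would be a well-formedness error.
    boolean declaredBlueberryNeedlessly() {
        return !usesBlueberryCharacters;
    }
}
```

A parser for a document marked as Blueberry would call checkName() for
each name it reads and consult declaredBlueberryNeedlessly() at the end
of the document. Note that this only covers the reading side; it does
not answer the question of how a producer such as an XSLT processor
would know which label to emit before it has written the whole output.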
Received on Friday, 13 July 2001 10:59:56 UTC