RE: Well-formed Blueberry from Julian Reschke on 2001-07-13 (www-xml-blueberry-comments@w3.org from July 2001)

From: Julian Reschke <julian.reschke@gmx.de>
Date: Fri, 13 Jul 2001 16:59:24 +0200
To: "Elliotte Rusty Harold" <elharo@metalab.unc.edu>, <xml-dev@lists.xml.org>
Cc: <www-xml-blueberry-comments@w3.org>
Message-ID: <JIEGINCHMLABHJBIGKBCKEHKCKAA.julian.reschke@gmx.de>
Although I like the idea of not producing "blueberry" when "1.0" would have
done, how would you *produce* these documents?

For instance, how would an XSLT processor know whether certain name
characters will appear later in the output or not?
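
To make the difficulty concrete: the XML declaration comes first, but under the proposal the right version label depends on every name in the document. A writer that chooses the label from the names it actually emits therefore cannot stream its output; it has to buffer everything until the end. The sketch below is hypothetical and is not how any particular XSLT processor works. The one-range-per-line name table is a tiny stand-in rather than the real XML 1.0 list, "blueberry" is only the working label used in this thread, and any character outside the stand-in table is simply assumed to be a Blueberry character.

```python
# Illustrative stand-in table (assumption): a tiny subset of the XML 1.0
# name characters, not the real list.
XML10_NAME_RANGES = ((0x2D, 0x2E), (0x30, 0x3A), (0x41, 0x5A), (0x5F, 0x5F), (0x61, 0x7A))

# Hypothetical version label; "blueberry" is only the working name used here.
BLUEBERRY_VERSION = "blueberry"


def is_xml10_name_char(ch):
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in XML10_NAME_RANGES)


class BufferingSerializer:
    """Hypothetical serializer that picks the version label from the names it
    actually emits, so it cannot write the XML declaration until the end."""

    def __init__(self):
        self.parts = []               # the whole body is held in memory
        self.needs_blueberry = False  # any later name can still flip this

    def start_element(self, name):
        # Simplification: any non-1.0 name character is assumed to be a
        # Blueberry character.
        if not all(is_xml10_name_char(ch) for ch in name):
            self.needs_blueberry = True
        self.parts.append(f"<{name}>")

    def end_element(self, name):
        self.parts.append(f"</{name}>")

    def text(self, data):
        self.parts.append(data.replace("&", "&amp;").replace("<", "&lt;"))

    def finish(self):
        # Only now is the version known; this is why nothing could be streamed.
        version = BLUEBERRY_VERSION if self.needs_blueberry else "1.0"
        return f'<?xml version="{version}"?>' + "".join(self.parts)
```

The alternative is to guess the label up front and risk producing a document that is labeled wrongly.
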

> -----Original Message-----
> From: Elliotte Rusty Harold [mailto:elharo@metalab.unc.edu]
> Sent: Friday, July 13, 2001 4:49 PM
> To: xml-dev@lists.xml.org
> Cc: www-xml-blueberry-comments@w3.org
> Subject: Re: Well-formed Blueberry
>
>
> At 3:25 PM +0100 7/13/01, Rob Lugt wrote:
>
> >I can see a good reason for doing what you suggest, and I sympathise with
> >your comments but the fact is that your proposal would turn a trivial
> >implementation change into something much more difficult.  It could also
> >have a performance impact, so it is unlikely to be popular with parser
> >developers.
> >
>
> Not necessarily. Most correct parsers and other APIs already have
> to check whether each character is legal in an XML name.
> Blueberry doesn't really change that. It changes the list, but it
> doesn't change the fact that parsers need to maintain and consult
> very large tables of characters and code points.
>
> I can see a number of ways to efficiently implement my proposal
> without a great deal of effort. One is to maintain two tables,
> one for XML 1.0 legal characters and one for the extra characters
> in Blueberry. This is probably necessary anyway to allow parsers
> to handle both kinds of documents.
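
As an illustration of the two-table idea, here is a minimal sketch that keeps each table as a sorted list of inclusive code point ranges and searches it with Python's `bisect` module. The `RangeTable` class and the contents of both tables are hypothetical stand-ins: the real XML 1.0 and Blueberry name-character lists are far larger, and the Ethiopic range used for the "extra" table is only an example of characters missing from XML 1.0's Unicode 2.0 base.

```python
import bisect

# Illustrative stand-in tables (assumption): tiny subsets, not the real lists.
# Each is a sorted tuple of non-overlapping, inclusive code point ranges.
XML10_NAME_RANGES = ((0x2D, 0x2E), (0x30, 0x3A), (0x41, 0x5A), (0x5F, 0x5F), (0x61, 0x7A))
BLUEBERRY_EXTRA_RANGES = ((0x1200, 0x1206),)  # hypothetical extra characters


class RangeTable:
    """Sorted range table searched with bisect: O(log n) per lookup."""

    def __init__(self, ranges):
        self.ranges = ranges
        self.starts = [start for start, _ in ranges]

    def __contains__(self, ch):
        cp = ord(ch)
        i = bisect.bisect_right(self.starts, cp) - 1  # last range starting <= cp
        return i >= 0 and cp <= self.ranges[i][1]


XML10_TABLE = RangeTable(XML10_NAME_RANGES)
BLUEBERRY_EXTRA_TABLE = RangeTable(BLUEBERRY_EXTRA_RANGES)
```

Keeping the extra characters in their own table means a document labeled 1.0 can be checked against the first table alone, while a Blueberry document consults both.
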
>
> A typical parser would first check if a character was legal
> according to XML 1.0. Only if that check failed would it then
> check whether the character was a legal Blueberry character. This is
> quite natural. At least one API (JDOM) and probably others
> already carefully choose which characters are checked in which
> order to improve efficiency for the common characters vs. the
> uncommon characters.
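
The check order described above can be sketched in a few lines. This is a hypothetical illustration, not JDOM's code: `classify_name_char` and both tables are stand-ins, and a real table would be searched more efficiently than with a linear scan. The point is only that the common XML 1.0 test runs first and the Blueberry table is consulted only when that test fails.

```python
# Illustrative stand-in tables (assumption): tiny subsets, not the real lists.
XML10_NAME_RANGES = ((0x2D, 0x2E), (0x30, 0x3A), (0x41, 0x5A), (0x5F, 0x5F), (0x61, 0x7A))
BLUEBERRY_EXTRA_RANGES = ((0x1200, 0x1206),)


def _in_ranges(ranges, cp):
    return any(lo <= cp <= hi for lo, hi in ranges)


def classify_name_char(ch):
    """Return "1.0", "blueberry", or None, trying the common table first."""
    cp = ord(ch)
    if _in_ranges(XML10_NAME_RANGES, cp):       # common case: usually succeeds
        return "1.0"
    if _in_ranges(BLUEBERRY_EXTRA_RANGES, cp):  # only reached on failure
        return "blueberry"
    return None                                 # not a name character at all
```

Because most names in practice use only XML 1.0 characters, the second lookup is rarely reached.
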
>
> In fact, to ease the handling of both kinds of documents I'd
> expect there to be two separate method calls, one like
> isXMLNameCharacter() and one like isXMLBlueberryNameCharacter().
> The second method would only be called if the first returned
> false. (This is hardly the only way to do it, but it is one possibility.)
>
> Before parsing, the parser could set a boolean variable such as
> usesBlueberryCharacters to false. The isXMLBlueberryNameCharacter()
> method could set this variable to true the first (and every) time
> it saw a Blueberry character. When parsing was done, the parser
> would signal a well-formedness error if the document was labeled
> as Blueberry and the variable was still false.
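
The two-method scheme and the flag described above could look roughly like the following minimal sketch. Everything in it is hypothetical: `NameChecker`, `WellFormednessError`, and the `"1.0"`/`"blueberry"` labels are invented for illustration, the snake_case methods stand in for the isXMLNameCharacter() and isXMLBlueberryNameCharacter() calls named in the message, the tables are tiny stand-ins, and the sketch ignores the separate rules for a name's first character.

```python
# Illustrative stand-in tables (assumption): tiny subsets, not the real lists.
XML10_NAME_RANGES = ((0x2D, 0x2E), (0x30, 0x3A), (0x41, 0x5A), (0x5F, 0x5F), (0x61, 0x7A))
BLUEBERRY_EXTRA_RANGES = ((0x1200, 0x1206),)


class WellFormednessError(Exception):
    pass


class NameChecker:
    """Hypothetical parser fragment tracking whether Blueberry chars appear."""

    def __init__(self, declared_version):
        self.declared_version = declared_version  # "1.0" or "blueberry"
        self.uses_blueberry_characters = False    # reset before parsing

    @staticmethod
    def is_xml_name_character(ch):
        cp = ord(ch)
        return any(lo <= cp <= hi for lo, hi in XML10_NAME_RANGES)

    def is_xml_blueberry_name_character(self, ch):
        cp = ord(ch)
        if any(lo <= cp <= hi for lo, hi in BLUEBERRY_EXTRA_RANGES):
            self.uses_blueberry_characters = True  # one cheap store per hit
            return True
        return False

    def check_name(self, name):
        for ch in name:
            if self.is_xml_name_character(ch):
                continue  # common case: the second table is never consulted
            if (self.declared_version == "blueberry"
                    and self.is_xml_blueberry_name_character(ch)):
                continue
            raise WellFormednessError(f"illegal name character {ch!r} in {name!r}")

    def end_document(self):
        # A Blueberry label is only justified if a Blueberry character appeared.
        if self.declared_version == "blueberry" and not self.uses_blueberry_characters:
            raise WellFormednessError("labeled Blueberry but uses only XML 1.0 name characters")
```

For example, a checker created with `NameChecker("blueberry")` that only ever sees the name `item` raises `WellFormednessError` from `end_document()`. The extra cost over plain name checking is a single boolean store on the rare Blueberry path and one test at the end.
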
>
> Anyway, that's a very rough sketch, but you get the idea. The
> storage of the one extra boolean, and the setting of it each time
> a Blueberry character is seen is trivial compared to the table
> lookup overhead that parsers do at this stage anyway.
>
> If I were revising JDOM to handle Blueberry (I pick JDOM just
> because it's the only API whose internals I'm familiar with),
> setting up the tables for the new Blueberry characters would take
> as long as, or longer than, implementing the scheme I just described.
> (JDOM isn't a parser but it does perform parser-like name checks.)
>
> >Wouldn't a better solution be one of education and market forces?  Just like
> >most people write backwards-compatible HTML today, most people will continue
> >to write backwards-compatible XML tomorrow for the simple reason that they
> >want it to be interoperable.
>
>
> As somebody who spends most of my time educating people about XML
> and related technologies, I don't want to leave to education
> anything we can enforce in the code. I will most certainly warn
> people through my books and seminars not to mark their documents
> as Blueberry when they don't need to. But I still know I'll
> encounter masses of developers who've half-read the specs, and
> skimmed some half-accurate books or articles. Lord knows I've
> encouraged enough brain damage in my earlier books that I don't
> want to rely on books or any other form of education as being the
> sole solution to a potentially nasty problem.
> --
>
> +-----------------------+------------------------+-------------------+
> | Elliotte Rusty Harold | elharo@metalab.unc.edu | Writer/Programmer |
> +-----------------------+------------------------+-------------------+
> |          The XML Bible, 2nd Edition (Hungry Minds, 2001)           |
> |              http://www.ibiblio.org/xml/books/bible2/              |
> |   http://www.amazon.com/exec/obidos/ISBN=0764547607/cafeaulaitA/   |
> +----------------------------------+---------------------------------+
> |  Read Cafe au Lait for Java News:  http://www.cafeaulait.org/      |
> |  Read Cafe con Leche for XML News: http://www.ibiblio.org/xml/     |
> +----------------------------------+---------------------------------+
>
Received on Friday, 13 July 2001 10:59:56 UTC