Re: Delenda est summary

At 09:34 AM 12/13/96 -0500, Gavin Nicol wrote:
>>>We seem to be confusing parsing XML, and parsing the grammar defined
>>>by the DTD, if you ask me...
>>
>>But one of the important points about SGML (of which XML is a subset) is a
>>contract between the parser and the application: "I will not hand you data
>>which does not conform to the DTD." 
>
>Really? I've never seen it explicitly stated that this is true. Given
>that SGML is largely silent about error-handling, and grey on a lot of
>other fronts, it seems like this would be difficult to achieve at
>best.

If a validating parser can pass on data that doesn't conform to the DTD,
then what's the point of a validating parser? It always seemed to me that a
huge part of the reason for a validating parser was to allow application
writers to avoid special-case code (e.g. "If there is a title in the
footnote, ignore it."). Forcing applications to discard whitespace is just a
special case of special-case code.

>>Your solution would leave it up entirely to applications, which will (IMO)
>>almost inevitably lead to incompatibility.
>
>Depends. At least all the applications will know *exactly* what
>they'll be handed.

Sure. But when did application writers knowing exactly what the application
would process become more important than authors knowing? Moving all
whitespace elimination into the application domain leaves authors at the
mercy of application writers.

> Why not write a filter that will rewrite your instances such that they 
> produce exactly the same parse tree in XML and SGML parsers? 

If we are presuming a preprocessor, then we can make the grammar arbitrarily
user-hostile. But preprocessing is itself user-hostile.

At 08:51 AM 12/13/96 -0500, Gavin Nicol wrote:
>>Sure, as long as you are willing to a) do away with existing SGML parsers
>>and b) forgo reliable whitespace removal in element content, which will lead
>>to c) no whitespace in element content, or only whitespace in element
>>content according to the rules that Microsoft and Netscape dictate.
>
>This is not really true: the rules to eliminate unwanted whitespace
>would not be overly complex, and besides, if you are really so keen on
>doing exactly the same thing as current SGML tools, you can either:
>   a) Use them by using the declarations for XML
>   b) Add a validator/grove transformer that *requires* a DTD, and
>      which will give you what you want.

But this implies that as an author, I can control the parsing technologies
and conventions that my application uses. The way that authors typically
assert this form of control is through standardization. That's why we should
standardize one or two typical handlings for whitespace elimination, instead
of leaving it open to vendor interpretation. Otherwise, we are trading away
processing predictability for parsing simplicity.

As Lee says:

"In other words, white space should be retained by the XML reader, but should
be treated as whitespace and not PCDATA by a validating parser
checking a content model."
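
A minimal sketch of that rule in Python, to make it concrete. The
ELEMENT_CONTENT set and the children_for_validation function are hypothetical
stand-ins for real DTD information and a real content-model check; only the
xml.etree.ElementTree calls are standard library. The reader keeps every
whitespace character, and only the validation step ignores whitespace-only
text inside elements declared with element content.

```python
import xml.etree.ElementTree as ET

# Assumption: stands in for the DTD. Names of elements declared with
# element content (child elements only, no #PCDATA).
ELEMENT_CONTENT = {"list"}


def children_for_validation(elem):
    """Return the token sequence a content-model check would see for elem.

    Character data is reported as "#PCDATA". Inside element content,
    whitespace-only text is skipped; everywhere else it counts as data.
    """
    seen = []
    # Interleave the leading text, each child, and the text after each child.
    chunks = [elem.text] + [x for child in elem for x in (child, child.tail)]
    for chunk in chunks:
        if isinstance(chunk, str):
            if chunk == "":
                continue
            if elem.tag in ELEMENT_CONTENT and chunk.strip() == "":
                continue  # treated as whitespace, not PCDATA
            seen.append("#PCDATA")
        elif chunk is not None:
            seen.append(chunk.tag)
    return seen


doc = ET.fromstring("<list>\n  <item>one</item>\n  <item> two </item>\n</list>")

print(children_for_validation(doc))     # what the validator checks for <list>
print(repr(doc.text))                   # the whitespace is still in the tree
print(children_for_validation(doc[1]))  # <item> is not element content here
```

The application still receives the whitespace untouched; the author gets one
predictable rule for when it counts against the content model.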

 Paul Prescod

Received on Friday, 13 December 1996 20:08:43 UTC