Re: XMLP WG Response on "SOAP and the Internal Subset"

Scott Lawrence writes:

>> Supporting entity substitutions other than the required minimum would
>> have had a fairly large effect on code size and complexity.  The
>> largest and most troublesome effect was on the buffer management - the
>> minimum required entities are all larger than the text that they turn
>> into internally, so they just collapse the data within the existing
>> buffer(s), but that's not true in the general case.

Thanks Scott.  It turns out this was among the optimizations we at IBM had 
noticed, and one of those I had in mind when preparing input to the XMLP 
workgroup response.  So, that's at least two independent organizations 
doing implementations with similar insights and intuitions regarding the 
tradeoffs involved in supporting entities. 
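
For anyone who wants the buffer point made concrete, here is a minimal 
sketch of the in-place collapse Scott describes.  It is an illustration 
only, not code from either implementation: the function name 
collapse_predefined and the table PREDEFINED are hypothetical, and numeric 
character references are left out for brevity.  It relies on one fact from 
the XML spec: each of the five predefined references is longer than the 
single character it stands for, so the write position can never overtake 
the read position.

```python
# Hypothetical sketch: expand the five predefined XML entity references
# inside the same buffer, with no second buffer and no reallocation.
PREDEFINED = {
    b"&lt;": b"<",
    b"&gt;": b">",
    b"&amp;": b"&",
    b"&apos;": b"'",
    b"&quot;": b'"',
}


def collapse_predefined(buf: bytearray) -> int:
    """Expand predefined references in place and return the new length."""
    read = write = 0
    n = len(buf)
    while read < n:
        if buf[read] == 0x26:  # '&' starts a reference
            for ref, ch in PREDEFINED.items():
                if buf.startswith(ref, read):
                    # The replacement is one byte and the reference is at
                    # least four, so write stays behind read.
                    buf[write] = ch[0]
                    write += 1
                    read += len(ref)
                    break
            else:
                # Anything else (declared entities, character references)
                # is outside the scope of this sketch.
                raise ValueError("unsupported reference at offset %d" % read)
        else:
            buf[write] = buf[read]
            write += 1
            read += 1
    del buf[write:]  # drop the now-unused tail
    return write


buf = bytearray(b"a &lt; b &amp;&amp; c &gt; d")
collapse_predefined(buf)
print(buf.decode("ascii"))
```

Once an internal subset can declare an entity whose replacement text is 
longer than its reference, that invariant no longer holds, and the parser 
needs to grow the buffer or chain to additional ones.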

BTW:  several people have asked whether allowing entities would have 
carried a cost even when the instance did not in fact use any.  Well, as 
you say, there's often a cost in code footprint, unless you have a way of 
acquiring the code dynamically.  Unless you're very careful, there's also 
potentially a cost in levels of indirection to the various potentially 
discontiguous buffers.  You can avoid that by building two versions of 
your code and switching to the "no buffer management" version when you 
discover that there are indeed no entity definitions, but that involves 
more testing cost for the alternate paths, etc.

Regarding those who have asked for specific performance numbers:  I can't 
say that we in IBM have built controlled implementations, one with and one 
without just the internal subset optimizations.  As I said in my earlier 
note, one tends to make combinations of optimizations together.  Our 
experience is that, in combination, such techniques can yield very 
significant improvements over what would be typical of full-function 
parsers.

------------------------------------------------------------------
Noah Mendelsohn                              Voice: 1-617-693-4036
IBM Corporation                                Fax: 1-617-693-8676
One Rogers Street
Cambridge, MA 02142
------------------------------------------------------------------

Received on Wednesday, 11 December 2002 14:21:20 UTC