RE: XML within XML - includes, transcludes, whatever from Gavin Thomas Nicol on 2000-11-01 (xml-dist-app@w3.org from November 2000)

From: Gavin Thomas Nicol <gtn@ebt.com>
Date: Wed, 1 Nov 2000 17:34:42 -0500
To: <xml-dist-app@w3.org>
Message-ID: <NCBBJNEMNEOKNGLADMAHOECBBFAC.gtn@ebt.com>

> Ah!  I'm not familiar with the intimate details of the current 
> parsers (parse and validate in one API call) but it may very 
> well be that this breaks down into a pass through the instance 
> to get the parse and a pass through the post-parse form (say 
> a DOM) to do the validation

Some do it this way, some don't. The impact on performance is
only significant if you want early (fatal) failure... otherwise
it's pretty much a wash. I think the general design trend, 
especially with schemas vs. DTDs, is toward a two-step approach.
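
As a rough illustration of the two-step shape (not any particular
parser's API), the sketch below keeps the well-formedness parse and the
validation pass as separate functions. The REQUIRED_CHILDREN rule table
and the element names "routing", "to" and "from" are assumptions made up
for the example, and the hand-written check only stands in for a real
DTD or schema validator.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule table: element name -> child elements that must be present.
REQUIRED_CHILDREN = {"routing": ["to", "from"]}


def parse(text):
    """Step 1: well-formedness only. Raises ET.ParseError on malformed input."""
    return ET.fromstring(text)


def validate(root, rules=REQUIRED_CHILDREN):
    """Step 2: walk the already-parsed tree and collect rule violations."""
    errors = []
    for elem in root.iter():
        for child in rules.get(elem.tag, []):
            if elem.find(child) is None:
                errors.append(f"<{elem.tag}> is missing <{child}>")
    return errors
```

Because validate() only needs the parsed tree, it can be called right
away, later, or not at all, which is the property deferred validation
relies on.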

> If so, what diff if we defer the validation step until later? 

I think there is a good case to be made for deferred validation...
and with the internet, the general rule is "be strict in what you
generate, liberal in what you accept" anyway. 

> ...then does the approach require a parse of all sections of the 
> original instance at once? 

The question is how you reach the part you're interested in at
the time. SGML basically forced you to parse the document from
start to finish, but with XML, there is no reason why you couldn't
have a specialised "router" that used string matching to find
the start and end of the routing information, and only parse *that*
sequence of characters. 

For the size of the XML we're talking about here, I doubt the 
performance benefit would be worth it though.
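
A minimal sketch of the string-matching "router" idea described above,
assuming the routing information lives in an element named "routing"
(a hypothetical name) that is not nested inside itself, is not written
as an empty element, and uses no namespace prefixes declared outside
the block. It also ignores comments and CDATA sections that might
contain a lookalike tag.

```python
import re
import xml.etree.ElementTree as ET


def extract_routing(message, tag="routing"):
    """Find the routing block by string matching and parse only that span."""
    # Match "<routing>" or "<routing ...>", but not "<routingtable>".
    open_match = re.search(rf"<{re.escape(tag)}[\s>]", message)
    if open_match is None:
        return None
    close = f"</{tag}>"
    end = message.find(close, open_match.start())
    if end == -1:
        return None
    # Only this slice is handed to the XML parser; the rest is never parsed.
    return ET.fromstring(message[open_match.start():end + len(close)])
```

Since the rest of the message never reaches the parser, it does not
even have to be well-formed for the routing block to be read.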

If I were to do the routing application, I would have a SAX 
handler looking for the block I was interested in (probably in
a data-driven way so that I could change the schema easily).
If I were concerned about validation, I might build a DOM out of
it and then validate the fragment using that, otherwise I'd
just extract the routing information from SAX directly.
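
The SAX approach above could look something like the following sketch.
The block name and the field names are passed in as data rather than
hard-coded, so a schema change means changing arguments, not the
handler. RoutingHandler, read_routing and the element names "routing",
"to" and "from" are all hypothetical; the sketch covers only the
"extract directly from SAX" branch and does no validation.

```python
import xml.sax


class RoutingHandler(xml.sax.ContentHandler):
    """Collects the text of selected child elements of one named block."""

    def __init__(self, block, fields):
        super().__init__()
        self.block = block          # name of the enclosing routing element
        self.fields = set(fields)   # child elements to capture
        self.in_block = False
        self.current = None         # field currently being read, if any
        self.buffer = []
        self.routing = {}

    def startElement(self, name, attrs):
        if name == self.block:
            self.in_block = True
        elif self.in_block and name in self.fields:
            self.current = name
            self.buffer = []

    def characters(self, content):
        # SAX may deliver text in several chunks, so accumulate them.
        if self.current is not None:
            self.buffer.append(content)

    def endElement(self, name):
        if name == self.block:
            self.in_block = False
        elif name == self.current:
            self.routing[name] = "".join(self.buffer).strip()
            self.current = None


def read_routing(xml_bytes, block="routing", fields=("to", "from")):
    handler = RoutingHandler(block, fields)
    xml.sax.parseString(xml_bytes, handler)
    return handler.routing
```

Elements outside the named block are ignored even when they share a
field name, so a "to" element in the message body would not be picked
up as routing information.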

Received on Wednesday, 1 November 2000 17:27:57 UTC