Sean McGrath wrote:
>On the web, what constitutes a document is no longer a
>fixed object. It's only when you activate the links that the entities
>should be considered as included in the top-level node.
>I understand how this follows from documents having multiple possible
>layers of link semantics that are "late bound", i.e. at parse time.
>What is the default set or is there one?
I'm not sure there can be one. Hopefully there will be a way of identifying
an author-generated set of documents that must be accessible to make sense
of "a document" associated with each XML file. This set is then augmented,
as you traverse a particular link within that file, by the "required reading
list" of the file accessed, if that is itself an XML file; otherwise it
simply remains in memory as part of the current "essential reading history".
> Specifically, I am wondering what
>Web crawlers will need to do in order to harvest descriptive markup for
At present they are supposed to follow chains ad infinitum. (In practice
they don't.) With XML they can instead follow only the chains the author
has specified as required, looking in those documents for any further
"required reading" documents.
Martin Bryan, The SGML Centre, Churchdown, Glos. GL3 2PU, UK
Phone/Fax: +44 1452 714029 WWW home page: http://www.u-net.com/~sgml/