Fulltext queries on XML Documents

XML documents consist of two main (intermixed) parts: the structure of the
document (DOM) and its content, which is fulltext.


In the current "XML Query Requirements" I'm missing requirements for fulltext
search capabilities on the content of XML documents. The focus of the paper is
mainly on the structure of XML documents. To be accepted by a great number of
industry partners, any query language has to support queries on the
document's structure as well as queries on its text content.


With XQL, for example, you have a query language that offers some stemming
functions via '*'. But things like this are very simple and not sufficient for
a fulltext query. A lot more functionality and definitions are needed:

-  what defines a word, a sentence, a paragraph?
-  word distances
-  phrase searching
-  a technique for synonyms, abbreviations, ...
-  case (in)sensitivity
-  stopwords
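
To make some of these points concrete, here is a minimal sketch in Python. It
is not taken from XQL or from any standard. The stopword list, the definition
of a word (a run of letters, digits or underscores), the sample document and
all function names are assumptions chosen for illustration only. It selects
elements by structure first and then applies phrase and word-distance tests to
their text content. Synonyms and abbreviations are left out.

```python
import re
import xml.etree.ElementTree as ET

# Assumed stopword list; a real system would make this configurable.
STOPWORDS = {"a", "an", "and", "the", "of", "on", "is"}


def tokenize(text, case_sensitive=False, stopwords=STOPWORDS):
    """Split text into words and drop stopwords.

    Assumption: a word is a run of letters, digits or underscores.
    """
    words = re.findall(r"\w+", text)
    if not case_sensitive:
        words = [w.lower() for w in words]
    return [w for w in words if w.lower() not in stopwords]


def element_words(root, tag, **options):
    """Yield (element, words) for every element with the given tag."""
    for elem in root.iter(tag):
        yield elem, tokenize("".join(elem.itertext()), **options)


def contains_phrase(words, phrase):
    """True if the words of phrase occur consecutively in words."""
    n = len(phrase)
    return any(words[i:i + n] == phrase for i in range(len(words) - n + 1))


def within_distance(words, first, second, max_distance):
    """True if first and second occur at most max_distance positions apart.

    Positions are counted after stopword removal, which is itself a
    design decision a standard would have to pin down.
    """
    pos_first = [i for i, w in enumerate(words) if w == first]
    pos_second = [i for i, w in enumerate(words) if w == second]
    return any(abs(a - b) <= max_distance
               for a in pos_first for b in pos_second)


# Hypothetical sample document.
SAMPLE = """<doc>
  <para id="p1">Fulltext search on the content of XML documents.</para>
  <para id="p2">The structure of a document is only one part.</para>
</doc>"""

root = ET.fromstring(SAMPLE)

# Structure part: only <para> elements. Text part: phrase "xml documents".
phrase_hits = [elem.get("id")
               for elem, words in element_words(root, "para")
               if contains_phrase(words, ["xml", "documents"])]

# Word distance: "fulltext" and "content" at most two positions apart.
near_hits = [elem.get("id")
             for elem, words in element_words(root, "para")
             if within_distance(words, "fulltext", "content", 2)]
```

Even this small example has to decide what a word is, whether stopwords count
when measuring distances, and how case is handled. These are the kinds of
definitions I would expect the requirements to address.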


On the other hand, we have a standard that supports fulltext but only minimal
structures: "ISO 13249-2: Multimedia and Application Packages - FullText" (as
part of SQL-3). There you can find many of the fulltext requirements I'm
missing in "XML Query Requirements".



Jürgen Purtz

mail:  purtz@t-online.de

Received on Thursday, 30 March 2000 09:36:33 UTC