- From: Hugh Field-Richards <hsfr@hydra.dra.hmg.gb>
- Date: Fri, 3 Dec 1999 15:51:23 +0000
- To: www-xml-schema-comments@w3.org
Hi,

I recently posted an enquiry about DTDs and RDF on the RDF comment newsgroup. I thought I would expand on the problem that I see with XML DTDs and schemas and fire it at the XML schema newsgroup. I am sorry if

1. this posting is a little overlong,
2. what I am saying has already been solved,
3. in places I am stating the obvious, and
4. this is not the right newsgroup for the posting.

The original comment was a preliminary enquiry on how containers (and other similarly general-purpose constructs) work within XML. The problem surfaced originally within my investigation of RDF; in fact it is more fundamental, but the RDF container is a good focus.

I believe that it is impossible to produce DTDs that define unambiguous structural content when RDF container tags are used. Such tags make it hard to define the overall structure of a set of meta-data, because any structural element can be contained, syntactically correctly, within this type of element, regardless of whether it is semantically valid to do so. As a result, with the current XML specifications we can construct well-formed documents, and we can construct syntactically valid documents, but we are completely unable to construct semantically valid (i.e. meaningful) documents using these constructions.

I would like someone to show how the XML schema approach can solve this problem. Wherever I say DTD below, I suspect that the words XML schema can be used interchangeably. I will stick with DTD for the moment.

All XML (like other tagged systems based on SGML) is inherently tree-structured. Every tagged component is wholly enclosed by another tag. This is shown by the nature of the DTD, which details what each tag contains in its content description, and I believe schemas do not help us any more in this regard. Each level's content is wholly defined, and has a relevance in the structure within which it sits.
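To make the point about content descriptions concrete, here is a minimal Python sketch. It is not a real DTD parser: the CONTENT_MODEL table and the is_valid function are hypothetical names invented for illustration, the element names are borrowed from the example later in this message, and the check deliberately ignores ordering and repetition. It only shows that a rule is looked up by element name alone, wherever that element sits in the tree.

```python
# Hypothetical stand-in for a DTD: each element name maps to the set of
# child element names it may contain. Rules are keyed by name alone.
CONTENT_MODEL = {
    "person": {"tel", "email"},
    "tel": {"rdf:seq"},
    "email": {"rdf:seq"},
    "rdf:seq": {"rdf:li"},
    "rdf:li": {"number", "addr"},
    "number": set(),  # text only, no child elements
    "addr": set(),    # text only, no child elements
}


def is_valid(node):
    """Check a (name, children) tree against CONTENT_MODEL.

    Each element is checked only against the rule for its own name;
    nothing about its ancestors is consulted. Ordering and cardinality
    are ignored in this simplified sketch.
    """
    name, children = node
    allowed = CONTENT_MODEL.get(name)
    if allowed is None:
        return False  # undeclared element
    return all(child[0] in allowed and is_valid(child) for child in children)


# An address sitting inside a telephone list: accepted, because the rule
# for rdf:li is the same no matter which element encloses the list.
tel_with_addr = ("tel", [("rdf:seq", [("rdf:li", [("addr", [])])])])
print(is_valid(tel_with_addr))  # prints True
```

The table has no way to say that an rdf:li under tel should differ from an rdf:li under email, which is the difficulty discussed in the rest of this message.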
When a tag is used it has an explicit locality because of its existence within a single namespace. If we introduce a general-purpose structure, such as <rdf:li>, which is defined outside our local context, then we have the problem of how that new structure associates itself with its enclosing tag. Note that each list item within the list structure can contain any further structure; it is this that makes it general purpose. Any structure that appears within this list item loses the structural context that the list item itself sits within. In other words, there is no inheritance through the list structure. When we use this structure in several places it is impossible to impose any context through the list item on any enclosed structures. Thus it is impossible to have any explicit locality by means of the namespace alone.

It is worth putting this another way: consider an array in a common programming language such as Pascal. An array (an ordered list of items) is declared and used as

    type seq = array[0..100] of integer;
    var list : seq;
    list[ 0 ] := 23;
    list[ 1 ] := 56;

We can also say

    type seq1 = array[0..100] of char;
    var anotherList : seq1;
    anotherList[ 0 ] := 'a';
    anotherList[ 1 ] := 'b';

The key point here is that while we have a common syntactical construction (type) for making ordered lists, we also have another mechanism (var) for providing semantic constructions. The question is how we do this for our original problem.

What we need is an implicit locality by means of position within the structure. Thus the list structure inherits a context from the enclosing structure, and the structure below the list would inherit that context from the list structure.

Finally, here is a simple example that I believe illustrates all this.
Consider a simple DTD scrap

    <!ELEMENT rdf:seq ( rdf:li+ ) >
    <!ELEMENT rdf:li  ( number | addr ) >  <!-- I could write ANY here -->
    <!ELEMENT person  ( tel, email ) >
    <!ELEMENT tel     ( rdf:seq ) >
    <!ELEMENT number  ( #PCDATA ) >
    <!ELEMENT email   ( rdf:seq ) >
    <!ELEMENT addr    ( #PCDATA ) >

We would be able to write

    <tel>
      <rdf:seq>
        <rdf:li>
          <number>1234</number>
        </rdf:li>
        <rdf:li>
          <addr>hsfr@hydra.dra.hmg.gb</addr>
        </rdf:li>
      </rdf:seq>
    </tel>

Any validating parser using the above DTD would mark this as both well-formed and valid, even though an e-mail address appears in a list of telephone numbers. That is clearly not what we intended. The current content model for the RDF is effectively allowing any element to appear. As Goldfarb says: "An element type that has an ANY [or equivalent:hsfr] content specification is completely unstructured."

I would appreciate your comments on this, and on where I have gone wrong. For us, not being able to have a semantically valid document is a problem. Unless this problem is solved, I believe it will be difficult to have any form of context-sensitive editor, which would be a huge problem. This is another area that is very important to us when meta-data is entered by unskilled personnel.

TIA

Hugh F-R

-------------------------------------------------------------------
Dr Hugh S. Field-Richards
Defence Evaluation and Research Agency,
St Andrew's Road, Malvern, Worcs, WR14 3PS, UK
Tel: ++1684 895075
Fax: ++1684 896113
Email: hsfr@hydra.dra.hmg.gb

The views expressed above are entirely those of the writer and do not represent the views, policy or understanding of any other person or official body.
-------------------------------------------------------------------
Received on Monday, 6 December 1999 05:33:34 UTC