- From: Peter Murray-Rust <Peter@ursus.demon.co.uk>
- Date: Sun, 06 Apr 1997 17:37:06 GMT
- To: w3c-sgml-wg@w3.org
I discovered with pleasure that a new version of XML-LINK has been mounted - I hope it's OK to discuss it, even tho' not announced? And I assume that this is the version that's being printed up for the W3C. I think it looks good and any comments below are minor criticisms. [As a RL outsider, I'd like to congratulate the ERB on the work that they have put in and the speed and harmony with which the drafts have been created. By comparison the chemical community has spent 10 years discussing the name of element 104 without resolution.]

In message <2.2.32.19970402201845.00caf3ec@pop> "Steven J. DeRose" writes:
> At 10:47 AM 03/11/97 -0400, you wrote:
>
> >Request for information: what TEI xptr implementations are there? And
> >what do they implement? I'm going to be *very* reluctant to
> >vote for anything, no matter how cool & peachy-keen, that nobody
> >is actually using.

I have coded most of the XML-LINK draft on TEI pointers (I need another hour for -ve numbers in PRECEDING). I notice that the draft has substantially changed (no complaints, I've read the pre-amble:-). Thanks for changing the tables in 3.2, 3.3, etc. Much clearer now.

5.2 list item 3 'XML ID attribute'. This term is only meaningful for a document which is valid or at least has an ATTLIST for the given element. There is a (natural) assumption in this section that TEI searches will be applied primarily to valid documents, but I'd like to argue for WF documents as well. I think it's quite likely that TEI searches will be used on fragments because they are a very powerful method of rationalising partially structured data (indeed I am including the approach in my code). The point is that *people who don't know anything about SGML* will have no idea of the special significance of ID. They may well create documents with attributes named ID which do not fulfil the uniqueness criterion (or the naming conventions).
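
To make the WF case concrete, here is a minimal sketch in Python (not part of the draft, and not my TEI pointer code). The fragment, the element names and the helper duplicate_id_values are all hypothetical. It assumes a well-formed fragment with no DTD, so nothing declares the attribute named ID as being of type ID, and a non-validating parser accepts both a repeated value and a value that breaks the naming conventions.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical well-formed fragment with no DTD: nothing declares the
# attribute named "ID" as type ID, so the parser enforces no constraints.
FRAGMENT = """
<list>
  <item ID="a1">first</item>
  <item ID="a1">second</item>
  <item ID="3rd">third</item>
</list>
"""

def duplicate_id_values(xml_text, attr_name="ID"):
    """Return the values of attr_name that occur on more than one element."""
    root = ET.fromstring(xml_text)
    counts = Counter(
        elem.get(attr_name)
        for elem in root.iter()
        if elem.get(attr_name) is not None
    )
    return sorted(value for value, n in counts.items() if n > 1)

# "a1" appears twice; "3rd" is unique but does not start like a valid name.
duplicates = duplicate_id_values(FRAGMENT)
print(duplicates)
```

A search by ID over such a fragment has no single well-defined target for "a1", which is the case I would like the draft to address.
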
Since XML-LINK (but not XML-LANG) puts special emphasis on the attribute *name* ID as well as its type, this should be highlighted as a reserved word in XML-LANG. [This is not a trivial point - if a data object is referred to as an ID, then it's natural for a beginner to use that as an attribute name].

There are areas where the draft is unclear to someone who doesn't come from the extended TEI community, and since the idea is that the draft is self-contained, here are some:

5.3 Spans, etc. I suspect there are some well-known semantics to this word which are unknown to me and not in the draft. List item 2 refers to the TEI 'FROM' and 'TO' attributes, but I don't have a pointer to the TEI spec on this. This should be made more self-contained.

5.3.1 I am confused here. The result of evaluating a location term is always an element (i.e. it is either a single element or a properly nested tree of elements). However, for the ".." operator the result is "all of the text" from the first location (or the start of the element) to the end of the [text] or the element selected by the second series.

For *one* location term I think this is clear - either an element with GI or a (complete) chunk of *CDATA (but not including other elements). For *two* terms, there seems to be no requirement that they start and end at the same level of the hierarchy. Thus in the example in 5.3.3 ("a sentence (A) with no embedded child elements", etc.) any contiguous subset (?span) makes sense. But what if B has children and the second location term stops 'somewhere in the middle of B'? Is this allowed?

I appreciate that this makes most sense if the document is viewed as an event stream and that any span represents a 'chunk of marked up text'. But if the document has a complex structure then starting at one point in the tree and ending at a higher or lower level may be nonsense.

I am also not quite clear what the word 'text' means. Is it synonymous with *CDATA?
[I would prefer a term that made it clear that numbers, etc. were allowed here]. The example uses the phrase 'third span of character data', for example.

I notice that PATTERN and FOREIGN have disappeared. Presumably there is no bar to applications using them - the constraint is that publicly visible XML addresses don't contain them, I assume. Since I believe that almost everyone is going to want applications to carry out PATTERN-based searches, it would be useful to have a generally agreed convention for the syntax (even if the regexes were different).

P.

--
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/
Received on Sunday, 6 April 1997 13:17:44 UTC