- From: Rick Jelliffe <ricko@allette.com.au>
- Date: Wed, 3 Oct 2001 22:00:05 +1000
- To: "Xmldev (E-mail)" <xml-dev@lists.xml.org>
- Cc: <wai-tech-comments@w3.org>
From: "SHARPE, Ian" <Ian.SHARPE@cambridge.sema.slb.com>

> In short XML itself will not help us but by its nature and infrastructure
> can be presented in a more accessible way while people who have sight can
> still use a nasty GUI thing.

One of the biggest flaws in markup languages for prose is that we have no generic way to express the rhetorical connection between elements.

Let us compress all rhetorical relations into two interesting kinds:

  superior (e.g. a title, summary, gist, introduction)
  inferior (e.g. a digression, footnote, expansion, restatement)

For example, what can tell a machine that in the following XML document

  <movie><refnum>m26</refnum>
    <title>Momento</title>
    <reviewSummary>I cannot remember</reviewSummary>
    <review>After seeing this movie I was involved in an accident in which my memory was lost. So there is no review.</review>
    <comment from="Bob">Too long</comment>
    <comment from="Kai">Too short</comment>
    <comment from="Guy">What kind of accident?</comment>
    <comment from="Pede">You spelled Memento wrong!</comment>
  </movie>

the reviewSummary is a superior of the review, the third comment is an inferior of the review, the second comment is an inferior of the first, and the fourth comment is an inferior of the title?

Why would we want to? Because then we could have a "rhetoric-aware" browser that could take any prose and allow better navigation/indexing/collapsing: for example, presenting a user with superior information before inferior (e.g. an extracted list of titles).

How could this be done?

1) RDF, but this is not inline, we need to get definitions for the rhetoric, and it involves creating a special document and following RDF conventions.

2) Topic Maps, but again not inline.

3) A brute force way is to allow two IDREFS attributes on every element.

  <movie rhet:superiorIs="mem"><refnum>m26</refnum>
    <title id="mem">Momento</title>
    <reviewSummary id="rs">I cannot remember</reviewSummary>
    <review rhet:superiorIs="rs" id="rev">After seeing this movie I was involved in an accident in which my memory was lost. So there is no review.</review>
    <comment from="Bob" id="c1">Too long</comment>
    <comment from="Kai" id="c2" rhet:inferiorOf="c1">Too short</comment>
    <comment from="Guy" id="c3" rhet:inferiorOf="rev">What kind of accident?</comment>
    <comment from="Pede" id="c4" rhet:inferiorOf="mem">You spelled Memento wrong!</comment>
  </movie>

This rhetorical annotation gives enough information to, for example, automatically generate a page such as

  <html>
    <head><title>Momento</title></head>
    <body>
      <h1>Momento <a href="#r1">*</a></h1>
      <p>After seeing this movie I was involved in an accident in which my memory was lost. So there is no review. <a href="#r2">**</a></p>
      <hr />
      <p>Too long <a href="#r3">***</a></p>
      <hr />
      <p id="r3">*** Too short</p>
      <hr />
      <p id="r2">** What kind of accident?</p>
      <hr />
      <p id="r1">* You spelled Memento wrong!</p>
    </body>
  </html>

Obviously there are more rhetorical patterns (such as that one element follows another in narrative sequence), but it should be clear from the above that if you suddenly had to render your prose for people who cannot satisfactorily take in information in large chunks (e.g. people on PDAs, WAP or speech synthesizers), the annotated version would be much better.

Now it might be said: oh, that is just bad structuring in the original example; you should have nested the comments together into threads.

  <movie><refnum>m26</refnum>
    <title id="mem">Momento</title>
    <reviewSection>
      <reviewSummary id="rs">I cannot remember</reviewSummary>
      <review rhet:superiorIs="rs" id="rev">After seeing this movie I was involved in an accident in which my memory was lost. So there is no review.</review>
    </reviewSection>
    <commentSection>
      <thread>
        <comment from="Bob"><text>Too long</text>
          <comment from="Kai"><text>Too short</text></comment>
        </comment>
      </thread>
      <thread>
        <comment from="Guy"><text>What kind of accident?</text></comment>
        <comment from="Pede"><text>You spelled Memento wrong!</text></comment>
      </thread>
    </commentSection>
  </movie>

Sure, that gives us more structure in the comments, which a generic outlining system would work OK with. But it does not help in filtering out interspersed non-prose data (e.g. metadata) such as the refnum element. We cannot assume that the first element in a block of an XML document containing prose functions as the title.

It might also be said: why not just mark up the document with the annotations inline, with the convention that the first element is the superior of the parent?

  <movie>
    <title id="mem">Momento
      <comment from="Pede">You spelled Memento wrong!</comment>
    </title>
    <reviewSummary id="rs">I cannot remember
      <review>After seeing this movie I was involved in an accident in which my memory was lost. So there is no review.
        <comment from="Guy">What kind of accident?</comment>
      </review>
    </reviewSummary>
    <refnum>m26</refnum>
    <comment from="Bob">Too long
      <comment from="Kai">Too short</comment>
    </comment>
  </movie>

Well, that brings us closer to the rhetorical structure we are interested in, but unfortunately one simply cannot say that the first element is a superior. It might be a head/body pattern, where the first child of the second child would often be the title instead.

4) Another way would be to allow an XPath on each element, which defaults to the path from the current element to its superior or inferior. For example

  <!ATTLIST movie
    xlink:type CDATA "simple"
    xlink:href CDATA "./reviewSummary"
    xlink:role CDATA "rhet:inferior" >

(or whatever). That would shift much of the markup burden from the instances to the declarations.
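
To make option 4 a little more concrete, here is a minimal sketch in Python. It assumes the per-element defaults sit in a lookup table instead of DTD attribute defaults; the table name DEFAULT_LINKS, the function rhetorical_links and the particular paths and roles are all made up for illustration, and the paths are limited to the small XPath subset that the standard library's ElementTree understands.

```python
import xml.etree.ElementTree as ET

# Hypothetical defaults standing in for the ATTLIST declarations of option 4:
# element name -> list of (relative path, rhetorical role of the target).
DEFAULT_LINKS = {
    "movie": [("./title", "superior"), ("./reviewSummary", "inferior")],
}

def rhetorical_links(root):
    """Return (source tag, role, target tag, target text) for each default link."""
    links = []
    for elem in root.iter():  # the root itself, then every descendant
        for path, role in DEFAULT_LINKS.get(elem.tag, []):
            for target in elem.findall(path):
                text = (target.text or "").strip()
                links.append((elem.tag, role, target.tag, text))
    return links

# The unannotated instance: no rhetorical markup appears in the document itself.
DOC = """<movie><refnum>m26</refnum>
  <title>Momento</title>
  <reviewSummary>I cannot remember</reviewSummary>
  <review>After seeing this movie I was involved in an accident in which
  my memory was lost. So there is no review.</review>
  <comment from="Bob">Too long</comment>
</movie>"""

for link in rhetorical_links(ET.fromstring(DOC)):
    print(link)
```

The point of the sketch is only that the instance carries no annotation at all; everything a rhetoric-aware tool needs comes from the per-element-type defaults, and the refnum metadata is skipped simply because no path points at it.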

5) Another way to do it is through architectural forms: keep a fixed number of patterns (such as those at www.xmlpatterns.org) and use attributes to annotate our instances with that information, such as

  <movie form="titledContainer"><refnum>m26</refnum>
    <title form="titledContainerTitle">Momento</title>
    <reviewSummary>I cannot remember</reviewSummary>
    <review>After seeing this movie I was involved in an accident in which my memory was lost. So there is no review.</review>
    <comment from="Bob">Too long</comment>
    <comment from="Kai">Too short</comment>
    <comment from="Guy">What kind of accident?</comment>
    <comment from="Pede">You spelled Memento wrong!</comment>
  </movie>

But that just gives us position- and name-independence of hierarchical data; not bad, but it doesn't capture much. So 3 or 4 look better to me.

Anyway, the bottom line is that we need some markup conventions to say what the bottom line (or the title, etc.) is.

When I read through the various accessibility guidelines, a constant theme is that we need clear and concise navigation trails. In the case of XML, these need to be authored in. HTML pages tend to be small and hyperlinked, and the H* hierarchy is available (if abused), so perhaps this is more an issue for prose in XML than for prose in HTML.

Cheers
Rick Jelliffe
Received on Wednesday, 3 October 2001 07:53:10 UTC