- From: James Clark <jjc@jclark.com>
- Date: Fri, 11 Apr 1997 11:46:18 +0700
- To: w3c-sgml-wg@w3.org
At 20:04 10/04/97 -0700, Paul Grosso wrote:
>In discussions with others over that last couple days, I've come
>to the conclusion we should consider added to xml-link the capability
>to address into data character content (aka dataloc).
>
>The requirement I see is that users will expect an interface that
>allows them to highlight some text in one document, highlight some
>text in a second document, and make a link from one to the other.
>If the target is a three word phrase in the middle of a very long
>paragraph element, making the entire paragraph the target is unacceptable.

I don't think the issue here is so much whether this is a desirable capability but whether it can be done robustly and whether it can be implemented easily.

>Note that Char != byte, but if we can expect the XML processor to know what
>Char is when it's parsing an XML file, I figure we can expect it to know
>what a Char is when it's addressing into an XML file.

There are many things in addition to the char/byte distinction that can mess things up:

- line terminators: you move your document from a Unix to a DOS system and suddenly all your links break because your lines now end with CR/LF rather than LF.

- RS/RE ignoring rules: you parse with an SGML-based XML parser, which does its standard RS/RE ignoring thing

- white space collapsing: consider an application that by default does white-space collapsing a la HTML

I do not believe simple char counting is going to be robust. Counting just non-white space characters would be an improvement but still quite fragile. Counting words or tokens doesn't work for many Asian languages.

One possibility would be something like:

  STRING ("making the entire paragraph the target is unacceptable" 1) ("the" 2)

to find the second occurrence of the string "the" in the first occurrence of the string "making the...unacceptable" in the location source.

However, I still think this would be too hard for XML. In particular I think you are asking a lot of a style sheet mechanism to be able to attach styles to arbitrary spans of character data that are not marked up as elements.

James
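The two ideas in the message can be illustrated with a small Python sketch. It is not part of any XML or xml-link specification: the function names locate_by_offset and locate_by_string are hypothetical, and the nested (string, occurrence) list is only one assumed reading of the STRING example above. It shows a character offset recorded against LF line endings pointing at the wrong text once the lines end with CR/LF, while the nested string lookup still resolves to the same phrase.

```python
def locate_by_offset(text, start, length):
    """Naive locator: `length` characters beginning at offset `start`."""
    return text[start:start + length]


def locate_by_string(source, steps):
    """Resolve nested (string, occurrence) steps (hypothetical reading of STRING).

    Each step searches only inside the span matched by the previous step.
    Returns (start, end) offsets into `source`, or None if a step fails.
    """
    lo, hi = 0, len(source)
    for needle, occurrence in steps:
        if occurrence < 1:
            raise ValueError("occurrence counts from 1")
        search_from = lo
        for _ in range(occurrence):
            found = source.find(needle, search_from, hi)
            if found == -1:
                return None
            search_from = found + 1  # overlapping matches are counted
        lo, hi = found, found + len(needle)
    return lo, hi


unix_text = ("a very long\n"
             "paragraph element, making the entire paragraph "
             "the target is unacceptable.\n")
# The same document after a move from Unix to DOS line terminators.
dos_text = unix_text.replace("\n", "\r\n")

steps = [("making the entire paragraph the target is unacceptable", 1),
         ("the", 2)]

# Record an offset against the Unix copy, then reuse it on the DOS copy.
start, end = locate_by_string(unix_text, steps)
print(repr(locate_by_offset(unix_text, start, end - start)))
print(repr(locate_by_offset(dos_text, start, end - start)))

# Re-resolve the string-based locator against the DOS copy.
dos_start, dos_end = locate_by_string(dos_text, steps)
print(repr(dos_text[dos_start:dos_end]))
```

The sketch matches literal strings only, so a match string that itself spans a line break or a run of collapsed white space would still fail. It also says nothing about the style sheet problem raised above.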
Received on Friday, 11 April 1997 00:59:20 UTC