- From: Doug Schepers <schepers@w3.org>
- Date: Wed, 25 Mar 2015 18:01:44 -0400
- To: "Liam R. E. Quin" <liam@w3.org>, Bill Kasdorf <bkasdorf@apexcovantage.com>
- CC: W3C Public Annotation List <public-annotation@w3.org>
Hi, Liam–

On 3/25/15 4:14 PM, Liam R. E. Quin wrote:
> On Wed, 2015-03-25 at 18:57 +0000, Bill Kasdorf wrote:
>> Liam had cited the specific example of a profit/loss statement; in
>> those, red or parens for negative numbers are common, minus signs
>> not so much. All of these are choices available in Excel, for
>> example, which is what makes them so commonly used.
>
> To be fair, XPath does not have a built-in capacity for deciding
> (3,000) is -3000, although it *does* have a way for providing such
> smarts (including redness). But I didn't illustrate that. XPath
> extensions look like functions, and in the XML world have an
> associated namespace, browser:redness(), browser:hover() or whatever.
>
> Another issue with numbers is internationalization, with 1,300 meaning
> different things in different parts of the world and to different
> people. Another is representing scientific notation, 3.6×10¹² etc. But
> one could imagine some functions, av:financial(), av:scientific(),
> av:numeric() for this sort of use case (I used "av" for annotated
> value, arbitrarily). XPath 3 gives users mechanisms to write such
> functions themselves, but here, if annotations are important to the
> financial industry, to scientific journals, to places where people add
> and subtract :), there seems to be merit in such facilities, along
> with other hard-to-write but already-understood notations such as
> dates, times, sock sizes and so forth.

Interesting. We are planning to have an (as-yet-ill-defined) "customSelector" attribute that lets developers define search mechanisms specific to their use case, so this could be done as part of that. They won't be as performant as the native functionality, but it will allow for more complex searching behavior, including whatever datatyping is needed (so long as it can be applied via script). Since we aren't dealing with datatyping natively, we sidestep all these issues of finding "types of things" vs strings.
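
As a rough sketch of the kind of script-level datatyping such a custom search could apply, the Python below normalizes parenthesized negatives and either thousands separator before comparing values. The names `parse_financial` and `find_values` are hypothetical, and the code assumes separators never mark decimals; it does not reflect any actual customSelector design, which is still undefined.

```python
import re

# Hypothetical token pattern: matches 3,000  (3,000)  1.300  -42  7
NUMBER_TOKEN = re.compile(r"\(?-?\d[\d.,]*\d\)?|\(?-?\d\)?")

def parse_financial(token):
    """Interpret an accounting-style token as an integer.

    Assumption: "," and "." are thousands separators only, so "1,300",
    "1.300" and "1300" all mean 1300, and "(3,000)" means -3000.
    """
    negative = token.startswith("(") and token.endswith(")")
    digits = token.strip("()")
    if digits.startswith("-"):
        negative = True
        digits = digits[1:]
    digits = digits.replace(",", "").replace(".", "")
    value = int(digits)
    return -value if negative else value

def find_values(text, target):
    """Return (start, end) offsets of every token whose value equals target."""
    return [
        (m.start(), m.end())
        for m in NUMBER_TOKEN.finditer(text)
        if parse_financial(m.group()) == target
    ]
```

Searching by value rather than by string is what lets "(3,000)" match a query for -3000; the cost is that the interpretation rules (here, "no decimals") have to be chosen per use case.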
FWIW, the Levenshtein distance between "1,300" and "1.300" (another common thousands separator) or indeed "1300" is 1; that would be easily found using RangeFinder with an edit distance allowance.

>> To avoid going further down a rabbit hole, I think the point here is
>> that you can only query a given set of content for something that is
>> semantically distinct in some way in that content. Some _known_ way.
>
> Yes - although even searching for a paragraph containing "rabbit" is
> something that can be done in XPath and not CSS. Including textual
> content in a fragment identifier (e.g. XPointer with the XPath scheme)
> can be much more robust against changes in documents than using (e.g.)
> numeric tumblers, :nth-child and so forth.
>
> SoftQuad's Panorama SGML viewer used to look for the nearest ID-valued
> attribute and store a path from there to the highlight start and end,
> for annotations. This turned out to be very robust in practice because
> IDs (and HTML name attributes) tend to be stable over time with
> respect to the content they contain. But it needed things like XPath's
> full parent and sibling navigation to do that (it predated XPath and
> was based on HyTime and TEI Pointers, out of which background XPath
> arose).

Interesting background. I'm not convinced that the ID stability will be the same on Web documents on the whole; many don't use IDs at all, and those that do often change because they are generated by CMSes (e.g. MediaWiki, which automatically derives the ID from the heading text). But I am interested in other robustness mechanisms that might have been developed around the time of XPath.

> My goal really was to ask whether more complex functionality for
> finding ranges was considered as in-scope as a complex data model, and
> to probe that with some examples.

Some types of complex behavior may be in scope (again, such things as case-folding and diacritic-folding), but not this particular complex functionality you're asking about.
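
To make the edit-distance point above concrete, here is a minimal Python sketch of the standard Levenshtein computation, plus a toy token search with a distance allowance. `fuzzy_find` is a hypothetical stand-in for illustration only; it is not RangeFinder's interface.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]

def fuzzy_find(text, query, max_distance=1):
    """Hypothetical helper: whitespace-separated tokens within max_distance of query."""
    return [tok for tok in text.split() if levenshtein(tok, query) <= max_distance]

print(levenshtein("1,300", "1.300"))  # 1 (one substitution)
print(levenshtein("1,300", "1300"))   # 1 (one deletion)
```

With an allowance of 1, a query for "1,300" picks up "1.300" and "1300" while still rejecting "13000", which is two edits away.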
We're not trying to reinvent XPath. In fact, the current XPath selector (and querySelector) parts are probably going to be removed, in favor of a simple startRange selector. That way, an author can find the initial starting and scoping ranges themselves (using querySelector or an XPath selector or whatever they want), and simply feed that in generically. This will mean a less complex (and probably more performant) API to implement.

Regards–
–Doug
Received on Wednesday, 25 March 2015 22:01:51 UTC