- From: Cameron McCormack <cam@mcc.id.au>
- Date: Mon, 30 Jul 2012 09:03:18 +1000
- To: Chaals McCathieNevile <w3b@chaals.com>
- CC: SVG public list <www-svg@w3.org>
Chaals McCathieNevile:
> This is not a simple problem. Your proposal works so long as you put
> everything in the right place first time - which is probably going to
> happen in a majority of cases but not all. A similar approach is to have
> sub-headings at pretty detailed granularity, so even if you move them
> around, they exist. And if you remove something, changelogs help as a
> place to collect the orphan ids.

One technique that comes to mind is to assign each paragraph within a section a unique ID that is a hash of its text content. Whenever a change to the spec is made, we compute the edit distance between each paragraph in the section of the old revision of the spec and those in the new revision. (I think choosing the edit operations to work on a whole word rather than a character makes sense, and would help keep computation costs down.)

For each paragraph in the new revision of the spec, we choose the corresponding paragraph in the old revision that has the lowest edit distance. If the edit distance is below some threshold, for example < 50% of the number of words in the old revision's paragraph, then we treat the new paragraph as being the same one and re-use the existing unique ID. Otherwise, we generate a new one.

I'm not sure how well that particular threshold would work or if you'd want to add in additional heuristics, but it would be a start.
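To make the idea concrete, here is a rough Python sketch of the matching step. It is only an illustration under some assumptions of mine: the names `paragraph_id`, `word_edit_distance` and `assign_ids` are hypothetical, the ID is a truncated SHA-1 of the whitespace-normalized text, and an old ID is handed out at most once per revision so that IDs stay unique (the description above doesn't say what to do when two new paragraphs match the same old one).

```python
import hashlib


def paragraph_id(text):
    """Hypothetical ID scheme: first 8 hex digits of the SHA-1 of the text,
    after collapsing runs of whitespace."""
    normalized = " ".join(text.split())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()[:8]


def word_edit_distance(a, b):
    """Levenshtein distance over lists of words: each edit inserts,
    deletes, or replaces one whole word."""
    prev = list(range(len(b) + 1))
    for i, word_a in enumerate(a, start=1):
        cur = [i]
        for j, word_b in enumerate(b, start=1):
            cost = 0 if word_a == word_b else 1
            cur.append(min(prev[j] + 1,          # delete word_a
                           cur[j - 1] + 1,       # insert word_b
                           prev[j - 1] + cost))  # keep or replace
        prev = cur
    return prev[-1]


def assign_ids(old, new_paragraphs, threshold=0.5):
    """old maps ID -> paragraph text for the previous revision.
    Returns a list of (id, text) pairs for the new revision."""
    result = []
    claimed = set()  # assumption: each old ID is re-used at most once
    for text in new_paragraphs:
        words = text.split()
        best_id, best_dist = None, None
        for old_id, old_text in old.items():
            if old_id in claimed:
                continue
            dist = word_edit_distance(old_text.split(), words)
            if best_dist is None or dist < best_dist:
                best_id, best_dist = old_id, dist
        # Re-use the old ID only if the distance is below the given
        # fraction of the old paragraph's word count.
        if best_id is not None and best_dist < threshold * len(old[best_id].split()):
            claimed.add(best_id)
            result.append((best_id, text))
        else:
            result.append((paragraph_id(text), text))
    return result
```

Called with the previous revision's ID-to-text mapping and the list of paragraphs in the new revision, `assign_ids` keeps the old ID for a lightly edited paragraph and hashes a fresh one for anything that has drifted too far. The cost is quadratic in paragraph length for each pair compared, which is one reason to work on words rather than characters.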
Received on Sunday, 29 July 2012 23:03:45 UTC