- From: Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>
- Date: Wed, 17 Aug 2011 01:22:13 +0900
- To: Erik Wilde <dret@berkeley.edu>
- CC: uri@w3.org, Michael Hausenblas <michael.hausenblas@deri.org>
Hello Erik,

On 16.08.2011 20:38, Erik Wilde wrote:
> hello.
>
> On 2011-08-16 10:36 , Sebastian Hellmann wrote:
>> RFC5147 provides integrity checks, but there is no proposal that
>> produces robust fragment IDs. e.g. something that works on the context
>> and not on line or position. A change in the document on position 0
>> might render all fragment ids obsolete. E.g. "#range=(574,585)" would
>> not be valid any more, if one character was inserted at the beginning of
>> the document, changing the index.
>
> being one of the authors of this RFC, i'd like to point out that the
> initial ideas were quite a bit more complicated and included features
> similar to what you are looking for. however, during the process of
> getting community support, it became clear that the preference of most
> people was to have simpler and easier to implement fragment identifier
> features. this does make them more brittle, but things on the web can
> break, and even a more complicated feature set would only have made
> them less likely to break. in the end, i think it was good that the
> final RFC ended up being simple and easy to understand and implement,
> but it definitely may not be enough for your use cases.

Ease of implementation is only one aspect, and I can understand that it
was one of the major criteria for the community, as it seems to be an
easy common denominator. The format we are creating for LOD2 is for a
Natural Language Processing developer community. I doubt that they would
be scared by a more complex URI pattern; they would rather embrace the
advantages it offers, such as a tool annotating a web page whose
frag-IDs either stay robust or can be corrected automatically. The
different patterns will be implemented for several dozen NLP tools over
the project lifetime of LOD2.

What would you suggest we do, then? We are considering addressing
fragments of text documents in general, with CSV, XML and XHTML being
specialisations.

We might just add an additional "type=RFC5147" to the fragment and then
add several other types ourselves: a stable one, one for morpho-syntax,
etc.

I still have the following questions:

- Do you know of any systems that implement RFC5147?
- What was your original use case for designing the frag-ids?
- Can you point me to a site where the less brittle versions you
  mentioned are discussed? Or could you give an example? My proposal
  for this is here:
  http://aksw.org/Projects/NIF#context-hash-nif-uri-recipe
- Do you know of any benchmarking of the different URI approaches
  w.r.t. robustness, uniqueness, etc.? I'm currently doing an
  evaluation, so please tell me if I should include anything. I might
  include your CSV-Frag Ids, but I would need some data that is
  changing (although I could simulate it).
- What does "proposed standard" mean? Does it mean that the RFC is not
  a standard, but only "proposed"?

Thanks for your answers,
Sebastian

-- 
Dipl. Inf. Sebastian Hellmann
Department of Computer Science, University of Leipzig
Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
Research Group: http://aksw.org
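The brittleness of position-based fragment IDs discussed in this message can be illustrated with a small sketch. It is not taken from RFC5147 or from the NIF context-hash recipe: the function names, the sample sentence, and the plain substring search used as a stand-in for a "context-based" lookup are all assumptions made for illustration only.

```python
def resolve_by_offset(text, start, end):
    """Return the substring addressed by a pair of character offsets."""
    return text[start:end]


def resolve_by_context(text, before, target, after):
    """Locate `target` via its surrounding context (hypothetical helper).

    Returns (start, end) offsets of `target`, or None if the context
    is not found. Uses only the first match.
    """
    pos = text.find(before + target + after)
    if pos == -1:
        return None
    start = pos + len(before)
    return (start, start + len(target))


original = "Fragment identifiers for plain text can be brittle."
start = original.index("plain text")
end = start + len("plain text")

# Position-based addressing works on the unchanged document.
assert resolve_by_offset(original, start, end) == "plain text"

# Insert a single character at position 0, as in the example above.
edited = "X" + original

# The same offsets now address a shifted span.
shifted = resolve_by_offset(edited, start, end)

# A context-based lookup still finds the intended span.
span = resolve_by_context(edited, "for ", "plain text", " can")
recovered = edited[span[0]:span[1]]
```

The context lookup here would itself break if the surrounding words were edited or if the context occurred more than once, so it only shows the general idea of anchoring on content rather than on position.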
Received on Tuesday, 16 August 2011 16:23:11 UTC