Re: Fwd: Re: Document fragment vocabulary

hello sebastian.

On 2011-08-16 09:22 , Sebastian Hellmann wrote:
> What is your suggestion then, what we should be doing? We consider
> addressing fragments of text documents in general, with CSV and XML and
> XHTML being specialisations. We might just add an additional
> "type=RFC5147" to the fragment and then add several other types
> ourselves: a stable one, one for morpho-syntax, etc.

i am not quite sure how you could see CSV and XML and XHTML as 
specializations of plain text. they do have different metamodels (at 
least plain text and CSV and *ML) and thus need pretty different 
approaches when it comes to fragment identification. i think the problem 
you're having may be a well-known ugliness in web architecture: fragment 
identifiers are specific to the media type, but URIs are (often) not. 
this is just a design defect of the web, and there's no easy way around 
it. sometimes people try to engineer around it somehow, but as soon as 
you're starting to think about decentralization and redirections, things 
typically fall apart. all sorts of things have been proposed over the 
years to fix this defect, but it's a hard problem to solve in the 
general case and without breaking backwards compatibility.

> I still have the following questions:
> - Do you know of any systems, that implement RFC5147?

i've seen it being used for annotations locally, but i haven't seen 
support in any widely used pieces of software.

> - What was your original use case for designing the frag-ids?

the ability to create hyperlinks for plain text files. creating a link 
between a fragment of a plain text file and something else, for example 
an annotation system for log files (which conveniently grow in a very 
stable way, only by adding text at the end), saying "this line really 
looks like something suspicious may have happened".
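
to make the log file case concrete, here is a minimal python sketch of 
resolving a char/line fragment against a plain text string. the function 
name and the sample log are made up for illustration, integrity checks 
(length/md5) are simply ignored, and counting CRLF as one character is my 
reading of the line-ending rule, so treat this as a sketch and not as a 
reference implementation.

```python
import re

# text scheme: "char=" or "line=", followed by a position or a range
_SCHEME = re.compile(r'^(char|line)=(\d*)(?:(,)(\d*))?$')
# a line is text up to and including CR, LF, or CRLF (or an unterminated rest)
_LINE = re.compile(r'[^\r\n]*(?:\r\n|\r|\n)|[^\r\n]+')
# a character is any single code point, except that CRLF counts as one unit
_CHAR = re.compile(r'\r\n|[\s\S]')

def resolve_text_fragment(text, fragment):
    """return the part of text identified by a fragment such as 'line=10,20'."""
    scheme_part = fragment.split(';', 1)[0]  # integrity checks are ignored here
    m = _SCHEME.match(scheme_part)
    if m is None:
        raise ValueError('not a char/line fragment: ' + fragment)
    scheme, start, comma, end = m.groups()
    if not start and not end:
        raise ValueError('fragment needs at least one position: ' + fragment)
    units = (_LINE if scheme == 'line' else _CHAR).findall(text)
    lo = int(start) if start else 0          # ",5" starts at the beginning
    if comma is None:
        hi = lo                              # a bare position has zero length
    else:
        hi = int(end) if end else len(units) # "5," runs to the end
    return ''.join(units[lo:hi])             # slicing clamps positions past the end

# hypothetical log file content
log = "boot ok\nlogin failed\nlogin failed again\nshutdown\n"
suspicious = resolve_text_fragment(log, "line=1,3")  # second and third line
```

positions are counted between lines (or characters) starting at 0, so 
"line=1,3" selects the second and third line. appending to the log does 
not change what the fragment identifies, which is why log files are such 
a convenient case.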

> - Can you point me to a site where the less brittle version you
> suggested are discussed? Or could you give an example? My proposal for
> this is here: http://aksw.org/Projects/NIF#context-hash-nif-uri-recipe

i would have to go back to earlier versions of the draft which i have 
somewhere in my local archive, they may not be online anymore. it has 
been a while, and all i know is that we had some regex-based approach, 
which of course created the problem that *authoring* these identifiers 
can become quite a challenge with a lot of decisions to be made. the 
advantage of the regex approach is that most programming environments 
have regex implementations, so implementation would have been easier 
than with a completely proprietary method.
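
just to illustrate the general idea (this is not the syntax from the old 
draft, which i don't have at hand; the "match=" name and the function are 
invented for this sketch), a regex-based fragment could be resolved with 
a few lines of python:

```python
import re
from urllib.parse import unquote

def resolve_regex_fragment(text, fragment):
    """return the (start, end) span of the first match, or None if none."""
    name, _, value = fragment.partition('=')
    if name != 'match':                      # hypothetical scheme name
        raise ValueError('not a match= fragment: ' + fragment)
    pattern = re.compile(unquote(value))     # the regex travels percent-encoded
    m = pattern.search(text)
    return m.span() if m else None

# hypothetical log content, before and after a line was added at the top
old = "10:01 boot ok\n10:02 login failed for root\n10:03 shutdown\n"
new = "10:00 kernel start\n" + old

fragment = "match=login%20failed.*"
old_span = resolve_regex_fragment(old, fragment)
new_span = resolve_regex_fragment(new, fragment)
```

the offsets differ between the two versions, but both spans cover the 
same text, which is the "less brittle" part. the authoring problem shows 
up right away, though: the sketch silently picks the first match, and 
somebody has to decide how specific the pattern needs to be.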

> - Do you know of any benchmarking of the different URI approaches w.r.t.
> to robustness, uniqueness, etc? I'm currently doing an evaluation so
> please tell me, if I should include anything. I might include your
> CSV-Frag Ids, but I would need some data that is changing (although I
> could simulate it)

i don't think you can do benchmarking without being very specific 
about the scenario and use cases. which means you would need to have a 
sample dataset of resources changing over time that would reflect the 
scenario you are interested in, and then you could start comparing 
approaches. without that, benchmarking would be pointless.
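
as a rough sketch of what such a comparison could look like (the versions, 
the two resolvers, and the "survival rate" measure are all made up here 
and stand in for whatever your scenario really needs):

```python
import re

def survival_rate(versions, resolve, expected):
    """fraction of versions in which resolve(version) still yields expected."""
    hits = sum(1 for version in versions if resolve(version) == expected)
    return hits / len(versions)

# invented sample: one resource in three states over time
versions = [
    "boot ok\nlogin failed\nshutdown\n",                # original
    "boot ok\nlogin failed\nshutdown\nboot ok\n",       # text appended
    "kernel start\nboot ok\nlogin failed\nshutdown\n",  # text prepended
]
expected = "login failed\n"

def positional(text):
    # position-based identification: always the second line
    return ''.join(text.splitlines(keepends=True)[1:2])

def content_based(text):
    # content-based identification: the first line that says "login failed"
    m = re.search(r'login failed\n', text)
    return m.group(0) if m else None

positional_rate = survival_rate(versions, positional, expected)
content_rate = survival_rate(versions, content_based, expected)
```

with this sample the positional approach breaks as soon as text is added 
at the top, while the content-based one survives. a dataset with 
different kinds of changes could just as well reverse that ranking, which 
is why the scenario has to come first.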

> - What does "proposed standard" mean? This means, that the RFC is not a
> standard, but only "proposed" ?

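that's just IETF terminology: "proposed standard" is the entry level of 
the IETF standards track, so the RFC is a standards-track document. 
don't worry about it.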

cheers,

dret.

-- 
erik wilde | mailto:dret@berkeley.edu  -  tel:+1-510-6432253 |
            | UC Berkeley  -  School of Information (ISchool) |
            | http://dret.net/netdret http://twitter.com/dret |

Received on Tuesday, 23 August 2011 21:46:36 UTC