Re: Document fragment vocabulary from Michael Hausenblas on 2011-08-24 (uri@w3.org from August 2011)

From: Michael Hausenblas <michael.hausenblas@deri.org>
Date: Wed, 24 Aug 2011 07:52:34 +0100
To: Erik Wilde <dret@berkeley.edu>, Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>
Cc: URI list <uri@w3.org>
Message-Id: <3A8E6857-0D01-4F89-917F-41A87D23CA1B@deri.org>
> i am not quite sure how you could see CSV and XML and XHTML as
> specializations of plain text. they do have different metamodels (at
> least plain text and CSV and *ML) and thus need pretty different
> approaches when it comes to fragment identification. i think the
> problem you're having may be a well-known ugliness in web
> architecture: fragment identifiers are specific for the media type,
> but URIs are (often) not. this is just a design defect of the web,
> and there's no easy way around it.
+1

Cheers,
	Michael
--
Dr. Michael Hausenblas, Research Fellow
LiDRC - Linked Data Research Centre
DERI - Digital Enterprise Research Institute
NUIG - National University of Ireland, Galway
Ireland, Europe
Tel. +353 91 495730
http://linkeddata.deri.ie/
http://sw-app.org/about.html

On 23 Aug 2011, at 22:46, Erik Wilde wrote:

> hello sebastian.
>
> On 2011-08-16 09:22 , Sebastian Hellmann wrote:
>> What is your suggestion then, what we should be doing? We consider
>> addressing fragments of text documents in general, with CSV and XML and
>> XHTML being specialisations. We might just add an additional
>> "type=RFC5147" to the fragment and then add several other types
>> ourselves: a stable one, one for morpho-syntax, etc.
>
> i am not quite sure how you could see CSV and XML and XHTML as
> specializations of plain text. they do have different metamodels (at
> least plain text and CSV and *ML) and thus need pretty different
> approaches when it comes to fragment identification. i think the
> problem you're having may be a well-known ugliness in web
> architecture: fragment identifiers are specific for the media type,
> but URIs are (often) not. this is just a design defect of the web,
> and there's no easy way around it. sometimes people try to engineer
> around it somehow, but as soon as you're starting to think about
> decentralization and redirections, things typically fall apart. all
> sorts of things have been proposed over the years to fix this
> defect, but it's a hard problem to solve in the general case
> and without breaking backwards compatibility.
>
>> I still have the following questions:
>> - Do you know of any systems that implement RFC5147?
>
> i've seen it being used for annotations locally, but i haven't seen  
> support in any widely used pieces of software.
>
>> - What was your original use case for designing the frag-ids?
>
> the ability to create hyperlinks for plain text files. creating a  
> link between a fragment of a plain text file and something else, for  
> example an annotation system for log files (which conveniently grow
> in a very stable way, only by adding text at the end), saying "this
> line really looks like something suspicious may have happened".
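The log-file use case can be illustrated with a minimal sketch of resolving an RFC 5147 "line=" fragment against a plain text body. The function name select_lines and the sample log are assumptions made for illustration; the sketch handles only the line scheme, ignores any integrity check after ";", and is not a complete RFC 5147 implementation.

```python
import re

# Only the "line=" scheme is handled here: a single position or a range
# with an optional start or end. Positions count the gaps between lines,
# starting at 0 before the first line.
_LINE_SCHEME = re.compile(r"^line=(\d*)(?:(,)(\d*))?$")


def select_lines(text, fragment):
    """Return the lines of `text` selected by a fragment such as 'line=10,20'."""
    # Drop integrity checks (";length=..." or ";md5=..."); this sketch does not verify them.
    scheme = fragment.split(";", 1)[0]
    match = _LINE_SCHEME.match(scheme)
    if match is None:
        raise ValueError("not a line fragment: " + fragment)
    start_s, comma, end_s = match.groups()
    lines = text.splitlines(keepends=True)
    if comma is None:
        if not start_s:
            raise ValueError("missing position: " + fragment)
        return []  # a single position is a zero-length point between lines
    if not start_s and not end_s:
        raise ValueError("range needs a start or an end: " + fragment)
    start = int(start_s) if start_s else 0
    end = int(end_s) if end_s else len(lines)
    # Slicing clamps positions that lie beyond the end of the text.
    return lines[start:end]


log = "10:00 start\n10:01 login ok\n10:02 login failed\n10:03 stop\n"
print(select_lines(log, "line=2,3"))  # ['10:02 login failed\n']
```

Because the log only grows at the end, the same fragment keeps selecting the same line after more entries are appended, which is what makes line-based identifiers workable for this scenario.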
>
>> - Can you point me to a site where the less brittle versions you
>> suggested are discussed? Or could you give an example? My proposal for
>> this is here: http://aksw.org/Projects/NIF#context-hash-nif-uri-recipe
>
> i would have to go back to earlier versions of the draft which i
> have somewhere in my local archive, they may not be online anymore.
> it has been a while, and all i know is that we had some regex-based
> approach, which of course created the problem that *authoring* these
> identifiers can become quite a challenge with a lot of decisions
> to be made. the advantage of the regex approach is that most
> programming environments have regex implementations, so
> implementation would have been easier than with a completely
> proprietary method.
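The syntax of the earlier regex-based draft is not given in this thread, so the following is only a hypothetical sketch of the general idea: the identifier carries a pattern, and the fragment is wherever the pattern first matches. The function name resolve_regex_fragment, the sample logs, and the pattern are all assumptions made for illustration.

```python
import re


def resolve_regex_fragment(text, pattern):
    """Return the (start, end) character span of the first match, or None."""
    # MULTILINE lets ^ and $ anchor at line boundaries inside the text.
    match = re.search(pattern, text, flags=re.MULTILINE)
    return match.span() if match else None


log_v1 = "boot ok\nlogin failed for root\nshutdown\n"
log_v2 = "kernel loaded\n" + log_v1  # a later version with a line added in front

pattern = r"^login failed for \w+$"
for version in (log_v1, log_v2):
    start, end = resolve_regex_fragment(version, pattern)
    print(version[start:end])  # the same line is found in both versions
```

The character offsets differ between the two versions, yet the pattern still finds the same line, which is the sense in which such identifiers are less brittle than positional ones. The authoring difficulty mentioned above shows up in choosing the pattern: too loose and it matches the wrong line, too strict and it breaks on small edits.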
>
>> - Do you know of any benchmarking of the different URI approaches
>> w.r.t. robustness, uniqueness, etc? I'm currently doing an evaluation so
>> please tell me, if I should include anything. I might include your
>> CSV-Frag Ids, but I would need some data that is changing (although I
>> could simulate it)
>
> i don't think you can do benchmarking without being very specific
> about the scenario and use cases. which means you would need to have
> a sample dataset of resources changing over time that would reflect
> the scenario you are interested in, and then you could start
> comparing approaches. without that, benchmarking would be pointless.
>
>> - What does "proposed standard" mean? This means, that the RFC is not a
>> standard, but only "proposed" ?
>
> that's just IETF terminology, don't worry about it.
>
> cheers,
>
> dret.
>
> -- 
> erik wilde | mailto:dret@berkeley.edu  -  tel:+1-510-6432253 |
>           | UC Berkeley  -  School of Information (ISchool) |
>           | http://dret.net/netdret http://twitter.com/dret |
Received on Wednesday, 24 August 2011 06:53:19 UTC