Re: Document fragment vocabulary from Michael Hausenblas on 2011-08-16 (public-lod@w3.org from August 2011)

From: Michael Hausenblas <michael.hausenblas@deri.org>
Date: Tue, 16 Aug 2011 06:12:55 +0100
To: Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>
Cc: Michael Martin <martin@informatik.uni-leipzig.de>, public-lod <public-lod@w3.org>, Alexander Dutton <alexander.dutton@oucs.ox.ac.uk>
Message-Id: <43541A55-0D9B-4FE5-859B-C1AB12E4806A@deri.org>
> It is not really LinkedData friendly.



Why?


> @Michael: is there any standardisation of URIs for text going on?


As you've rightly identified, an RFC already exists. What would this  
new standardisation activity be chartered for?

As an aside, this reminds me a bit of http://xkcd.com/927/


> The approach by Wilde and Dürst[1] seems to lack stability.


I don't know what you mean by this. Lack of take-up, yes. Stability,  
what's that?



> Do you think we could do such standardisation for document fragments  
> and text fragments within the Media Fragments Group[3] ?



No. Disclaimer: I'm a MF WG member. Look at our charter [1] ...


Maybe this thread should slowly be moved over to uri@w3.org [2]?


Cheers,
	Michael

[1] http://www.w3.org/2008/01/media-fragments-wg.html
[2] http://lists.w3.org/Archives/Public/uri/
--
Dr. Michael Hausenblas, Research Fellow
LiDRC - Linked Data Research Centre
DERI - Digital Enterprise Research Institute
NUIG - National University of Ireland, Galway
Ireland, Europe
Tel. +353 91 495730
http://linkeddata.deri.ie/
http://sw-app.org/about.html

On 16 Aug 2011, at 05:40, Sebastian Hellmann wrote:

> Hi Michael and Alex,
> sorry to answer so late, I was on holiday in France.
> I looked at the three provided resources [1,2,3] and there are still  
> some comments and questions I have.
>
> 1. The part after the # is actually not sent to the server. Are  
> there any solutions for this? It is not really LinkedData friendly.
> Compare http://linkedgeodata.org/triplify/near/51.033333,13.733333/1000/class/Amenity
> (Currently not working, but it gives all points within a 1000m radius)
>
> The client would be required to calculate the subset of triples from  
> the resource that are addressed.
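A minimal sketch of the point above, in Python with only the standard library: a client splits the fragment off before dereferencing, so only the part before the # is requested and the fragment has to be resolved locally. The URI is the one from point 3 of this message; nothing here depends on a particular server.

```python
from urllib.parse import urldefrag

# URI taken from point 3 of this message.
uri = "http://www.w3.org/DesignIssues/LinkedData.html#offset__14406-14418"

# urldefrag splits the URI into the part a client requests
# and the fragment, which stays on the client side.
requested, fragment = urldefrag(uri)

print(requested)  # the URL that is dereferenced over HTTP
print(fragment)   # has to be interpreted by the client after retrieval
```

Because the server never sees the fragment, any subsetting (of text or of triples) has to happen after the full representation has been retrieved.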
>
> 2. [1] is quite basic and they are basically using position and  
> lines. I made a qualitative comparison of different fragment id  
> approaches for text in [4] slide 7.
> I was wondering if anybody has researched such properties of URI  
> fragments. Currently, I am benchmarking the stability of these URIs  
> using Wikipedia changes.
> Has such work been done before?
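For readers unfamiliar with the position- and line-based scheme of [1], here is a rough Python sketch of how a client might resolve a simplified subset of such fragments ("char=" and "line=" ranges). The function name is hypothetical, integrity checks such as ";length=" are ignored rather than verified, and the line splitting via str.splitlines is only an approximation of the RFC's line-ending rules. It assumes positions are counted from zero, which is my reading of the RFC.

```python
def select_text_fragment(text: str, fragment: str) -> str:
    """Resolve a simplified 'char=' or 'line=' fragment against text.

    Hypothetical helper, not a complete RFC 5147 implementation.
    """
    scheme, _, value = fragment.partition("=")
    # Drop optional integrity checks such as ';length=...' (ignored here).
    value = value.split(";", 1)[0]
    start_s, sep, end_s = value.partition(",")
    if not sep:
        # A single position addresses a point between characters or
        # lines, so it selects no text.
        return ""
    if scheme == "char":
        units = list(text)
    elif scheme == "line":
        units = text.splitlines(keepends=True)
    else:
        raise ValueError(f"unsupported scheme: {scheme!r}")
    # A missing bound defaults to the start or end of the document.
    start = int(start_s) if start_s else 0
    end = int(end_s) if end_s else len(units)
    return "".join(units[start:end])


print(select_text_fragment("hello world", "char=0,5"))
print(select_text_fragment("a\nb\nc\n", "line=1,2"))
```

The sketch also shows why such identifiers are fragile: inserting a single character or line earlier in the document shifts every later position.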
>
> 3. @Alex: In my opinion, your proposed fragment ontology can only  
> be used to provide documentation for different fragments.
> I would rather propose to just use one triple:
> <http://www.w3.org/DesignIssues/LinkedData.html#offset__14406-14418>  
> a <http://nlp2rdf.lod2.eu/schema/string/OffsetBasedString>
> The ontology I made for Strings might be generalized for formats  
> other than text-based ones [5].
> One triple is much shorter. As you can see, I also tried to encode  
> the type of fragment right into the fragment ("offset"), although a  
> notation like "type=offset" might be better.
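As an illustration of the fragment syntax in point 3, the following Python sketch parses a fragment such as "offset__14406-14418" into its type and its two offsets. The names OffsetFragment, parse_fragment and extract are hypothetical, and the slicing assumes zero-based, end-exclusive character offsets, which the message does not specify.

```python
from typing import NamedTuple


class OffsetFragment(NamedTuple):
    kind: str   # fragment type encoded before the "__", e.g. "offset"
    start: int  # first offset
    end: int    # second offset


def parse_fragment(fragment: str) -> OffsetFragment:
    """Split '<kind>__<start>-<end>' into its three parts."""
    kind, sep, span = fragment.partition("__")
    if not sep:
        raise ValueError(f"no '__' separator in {fragment!r}")
    start_s, _, end_s = span.partition("-")
    return OffsetFragment(kind, int(start_s), int(end_s))


def extract(text: str, frag: OffsetFragment) -> str:
    # Assumption: offsets are zero-based and the end is exclusive.
    return text[frag.start:frag.end]


frag = parse_fragment("offset__14406-14418")
print(frag.kind, frag.start, frag.end)
```

With the type in front, a client can dispatch on frag.kind before interpreting the rest of the fragment, which is the effect the "type=offset" notation would make more explicit.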
>
> 4. @Michael: is there any standardisation of URIs for text going on?
> I heard there would be a Language Technology W3C group. The approach  
> by Wilde and Dürst[1] seems to lack stability.
> Do you think we could do such standardisation for document fragments  
> and text fragments within the Media Fragments Group[3] ?
> I really thought the liveUrl project was quite good, but it seems  
> dead[6].
>
>
> In LOD2[7] and NIF[8] we will need some fragment identifiers to  
> standardize NLP tools for the LOD2 stack.
> It would be great to reuse stuff instead of starting from scratch. I  
> had to extend [1], for example, because it did not produce stable  
> URIs and did not contain the type of algorithm used to  
> produce the URI.
>
> All the best,
> Sebastian
>
>
> [1] http://tools.ietf.org/html/rfc5147
> [2] http://tools.ietf.org/html/draft-hausenblas-csv-fragment
> [3] http://www.w3.org/TR/media-frags/
> [4] http://www.slideshare.net/kurzum/nif-nlp-interchange-format
> [5] http://nlp2rdf.lod2.eu/schema/string/
> [6] http://liveurls.mozdev.org/index.html
> [7] http://lod2.eu
> [8] http://aksw.org/Projects/NIF
>
> On 04.08.2011 22:37, Michael Hausenblas wrote:
>>
>>
>> Alex,
>>
>>> Has something already done this? Is it even (mostly?) sane?
>>
>> Sane yes, IMO. Done, sort of, see:
>>
>> + URI Fragment Identifiers for the text/plain [1]
>> + URI Fragment Identifiers for the text/csv [2]
>>
>> Cheers,
>>     Michael
>>
>> [1] http://tools.ietf.org/html/rfc5147
>> [2] http://tools.ietf.org/html/draft-hausenblas-csv-fragment
>>
>> -- 
>> Dr. Michael Hausenblas, Research Fellow
>> LiDRC - Linked Data Research Centre
>> DERI - Digital Enterprise Research Institute
>> NUIG - National University of Ireland, Galway
>> Ireland, Europe
>> Tel. +353 91 495730
>> http://linkeddata.deri.ie/
>> http://sw-app.org/about.html
>>
>> On 4 Aug 2011, at 14:22, Alexander Dutton wrote:
>>
>>>
>>> -----BEGIN PGP SIGNED MESSAGE-----
>>> Hash: SHA1
>>>
>>> Hi all,
>>>
>>> Say I have an XML document, <http://example.org/something.xml>, and I
>>> want to talk about some part of it in RDF. As this is XML, being
>>> able to point into it using XPath sounds ideal, leading to something like:
>>>
>>> <#fragment> a fragment:Fragment ;
>>>  fragment:within <http://example.org/something.xml> ;
>>>  fragment:locator "/some/path[1]"^^fragment:xpath .
>>>
>>> (For now we can ignore whether we wanted a nodeset or a single node,
>>> and how to handle XML namespaces.)
>>>
>>> More generally, we might want other ways of locating fragments
>>> (probably with a datatype for each):
>>>
>>> * character offsets / ranges
>>> * byte offsets / ranges
>>> * line numbers / ranges
>>> * some sub-rectangle of an image
>>> * XML node IDs
>>> * page ranges of a paginated document
>>>
>>> Some of these will be IMT-specific and may need some more thinking
>>> about, but the idea is there.
>>>
>>>
>>> Has something already done this? Is it even (mostly?) sane?
>>>
>>>
>>> Yours,
>>>
>>> Alex
>>>
>>>
>>> NB. Our actual use-case is having pointers into an NLM XML file
>>> (embodying a journal article) so we can hook up our in-text reference
>>> pointer¹ URIs to the original XML elements (<xref/>s) they were
>>> generated from. This will allow us to work out the context of each
>>> citation for use in further analysis of the relationship between the
>>> citing and cited articles.
>>>
>>> ¹ See
>>> <http://opencitations.wordpress.com/2011/07/01/nomenclature-for-citations-and-references/>
>>> for an explanation of the terminology.
>>>
>>> - --
>>> Alexander Dutton
>>> Developer, data.ox.ac.uk, InfoDev, Oxford University Computing Services
>>>           Open Citations Project, Department of Zoology, University of Oxford
>>> -----BEGIN PGP SIGNATURE-----
>>> Version: GnuPG v1.4.11 (GNU/Linux)
>>> Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org/
>>>
>>> iEYEARECAAYFAk46nS4ACgkQS0pRIabRbjDVZQCdGblvoMgNqEietlE5EwAkPJY8
>>> pikAn2KApM0HjcXj6TZegA+Dek/DJIQX
>>> =UcCr
>>> -----END PGP SIGNATURE-----
>>>
>>>
>>
>>
>
>
> -- 
> Dipl. Inf. Sebastian Hellmann
> Department of Computer Science, University of Leipzig
> Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
> Research Group: http://aksw.org
Received on Tuesday, 16 August 2011 05:13:30 UTC