Re: Document fragment vocabulary from Sebastian Hellmann on 2011-08-16 (public-lod@w3.org from August 2011)

From: Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>
Date: Tue, 16 Aug 2011 13:40:11 +0900
To: Michael Martin <martin@informatik.uni-leipzig.de>, public-lod <public-lod@w3.org>, Alexander Dutton <alexander.dutton@oucs.ox.ac.uk>
Message-ID: <4E49F4AB.3090800@informatik.uni-leipzig.de>
Hi Michael and Alex,
Sorry for the late reply; I was on holiday in France.
I looked at the three resources provided [1,2,3] and still have 
some comments and questions.

1. The part after the # is not actually sent to the server. Are there 
any solutions for this? It is not really Linked Data friendly.
Compare
http://linkedgeodata.org/triplify/near/51.033333,13.733333/1000/class/Amenity
(currently not working, but it returns all points within a 1000m radius),
where the selection is part of the path and is therefore sent to the server.
With a fragment, the client would be required to calculate the addressed 
subset of triples from the resource itself.
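
To illustrate the point, here is a minimal sketch in Python, using only the 
standard library. The function name split_for_request is made up for this 
example, and the URI is the one from point 3 below. It shows which part of 
a URI goes into the HTTP request and which part stays on the client:

```python
from urllib.parse import urldefrag


def split_for_request(uri):
    """Split a URI into the part that is requested and the fragment.

    Only the first element is used to build the HTTP request; the
    fragment has to be interpreted by the client after retrieval.
    """
    url, fragment = urldefrag(uri)
    return url, fragment


request_url, fragment = split_for_request(
    "http://www.w3.org/DesignIssues/LinkedData.html#offset__14406-14418"
)
print(request_url)  # the part a server would receive
print(fragment)     # the part that never leaves the client
```

So whatever the fragment addresses has to be resolved after the whole 
resource has been retrieved.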

2. [1] is quite basic; it essentially uses only positions and lines. I 
made a qualitative comparison of different fragment identifier approaches for 
text in [4], slide 7.
I was wondering whether anybody has researched such properties of URI 
fragments. Currently, I am benchmarking the stability of these URIs using 
Wikipedia changes.
Has such work been done before?
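
As a rough idea of what I mean by stability, here is a hypothetical sketch 
in Python. It is not the actual benchmark; the function names and the 
example sentences are invented for illustration. It checks whether an 
offset-based fragment still addresses the same string after the document 
has been edited:

```python
def resolve_offset(text, begin, end):
    """Return the substring addressed by a character offset range."""
    return text[begin:end]


def is_stable(old_text, new_text, begin, end):
    """True if the offsets address the same string before and after an edit."""
    return resolve_offset(old_text, begin, end) == resolve_offset(new_text, begin, end)


def stability(old_text, new_text, spans):
    """Fraction of (begin, end) spans that survive the edit unchanged."""
    if not spans:
        return 1.0
    stable = sum(is_stable(old_text, new_text, b, e) for b, e in spans)
    return stable / len(spans)


old = "Linked Data is about using the Web."
new = "In short: Linked Data is about using the Web."  # text inserted at the start

begin = old.index("Web")
end = begin + len("Web")

print(is_stable(old, old, begin, end))  # unchanged document
print(is_stable(old, new, begin, end))  # offsets now point somewhere else
```

Any insertion before the addressed string shifts the offsets, which is the 
kind of breakage I would like to quantify.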

3. @Alex: In my opinion, your proposed fragment ontology can only be 
used to provide documentation for different fragments.
I would rather propose using just one triple:
<http://www.w3.org/DesignIssues/LinkedData.html#offset__14406-14418> a 
<http://nlp2rdf.lod2.eu/schema/string/OffsetBasedString>
The ontology I made for strings might be generalized to formats other 
than text [5].
One triple is much shorter. As you can see, I also tried to encode the 
type of fragment right into the fragment ("offset"), although a notation 
like "type=offset" might be better.
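
A small sketch of how a client could read the type and the offsets back 
out of such a fragment and produce the triple, again in Python. The 
"type__begin-end" pattern is inferred from the single example above, and 
the function names are hypothetical. Whether the end offset is inclusive 
is not decided here.

```python
import re

# Pattern inferred from the example "offset__14406-14418" (an assumption).
FRAGMENT_RE = re.compile(r"^(?P<type>[a-z]+)__(?P<begin>\d+)-(?P<end>\d+)$")

OFFSET_CLASS = "http://nlp2rdf.lod2.eu/schema/string/OffsetBasedString"


def parse_fragment(fragment):
    """Return (type, begin, end) for a fragment like 'offset__14406-14418'."""
    match = FRAGMENT_RE.match(fragment)
    if match is None:
        raise ValueError("unrecognised fragment: %r" % fragment)
    return match["type"], int(match["begin"]), int(match["end"])


def to_triple(document_uri, fragment):
    """Serialise the single typing triple for an offset-based fragment."""
    fragment_type, _begin, _end = parse_fragment(fragment)
    if fragment_type != "offset":
        raise ValueError("only offset fragments are handled in this sketch")
    return "<%s#%s> a <%s> ." % (document_uri, fragment, OFFSET_CLASS)


print(parse_fragment("offset__14406-14418"))
print(to_triple("http://www.w3.org/DesignIssues/LinkedData.html",
                "offset__14406-14418"))
```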

4. @Michael: is there any standardisation of URIs for text 
going on?
I heard there would be a Language Technology W3C group. The approach by 
Wilde and Dürst [1] seems to lack stability.
Do you think we could do such standardisation for document fragments and 
text fragments within the Media Fragments Group [3]?
I really thought the liveUrl project was quite good, but it seems dead [6].


In LOD2 [7] and NIF [8] we will need some fragment identifiers to 
standardize NLP tools for the LOD2 stack.
It would be great to reuse existing work instead of starting from scratch. I had 
to extend [1], for example, because it did not produce stable URIs and 
did not record the type of algorithm used to produce the URI.

All the best,
Sebastian


[1] http://tools.ietf.org/html/rfc5147
[2] http://tools.ietf.org/html/draft-hausenblas-csv-fragment
[3] http://www.w3.org/TR/media-frags/
[4] http://www.slideshare.net/kurzum/nif-nlp-interchange-format
[5] http://nlp2rdf.lod2.eu/schema/string/
[6] http://liveurls.mozdev.org/index.html
[7] http://lod2.eu
[8] http://aksw.org/Projects/NIF

On 04.08.2011 22:37, Michael Hausenblas wrote:
>
> Alex,
>
>> Has something already done this? Is it even (mostly?) sane?
>
> Sane yes, IMO. Done, sort of, see:
>
> + URI Fragment Identifiers for the text/plain [1]
> + URI Fragment Identifiers for the text/csv [2]
>
> Cheers,
>     Michael
>
> [1] http://tools.ietf.org/html/rfc5147
> [2] http://tools.ietf.org/html/draft-hausenblas-csv-fragment
>
> -- 
> Dr. Michael Hausenblas, Research Fellow
> LiDRC - Linked Data Research Centre
> DERI - Digital Enterprise Research Institute
> NUIG - National University of Ireland, Galway
> Ireland, Europe
> Tel. +353 91 495730
> http://linkeddata.deri.ie/
> http://sw-app.org/about.html
>
> On 4 Aug 2011, at 14:22, Alexander Dutton wrote:
>
>>
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> Hi all,
>>
>> Say I have an XML document, <http://example.org/something.xml>, and I
>> want to talk about about some part of it in RDF. As this is XML, being
>> able to point into it using XPath sounds ideal, leading to something 
>> like:
>>
>> <#fragment> a fragment:Fragment ;
>>  fragment:within <http://example.org/something.xml> ;
>>  fragment:locator "/some/path[1]"^^fragment:xpath .
>>
>> (For now we can ignore whether we wanted a nodeset or a single node,
>> and how to handle XML namespaces.)
>>
>> More generally, we might want other ways of locating fragments
>> (probably with a datatype for each):
>>
>> * character offsets / ranges
>> * byte offsets / ranges
>> * line numbers / ranges
>> * some sub-rectangle of an image
>> * XML node IDs
>> * page ranges of a paginated document
>>
>> Some of these will be IMT-specific and may need some more thinking
>> about, but the idea is there.
>>
>>
>> Has something already done this? Is it even (mostly?) sane?
>>
>>
>> Yours,
>>
>> Alex
>>
>>
>> NB. Our actual use-case is having pointers into an NLM XML file
>> (embodying a journal article) so we can hook up our in-text reference
>> pointer¹ URIs to the original XML elements (<xref/>s) they were
>> generated from. This will allow us to work out the context of each
>> citation for use in further analysis of the relationship between the
>> citing and cited articles.
>>
>> ¹ See
>> <http://opencitations.wordpress.com/2011/07/01/nomenclature-for-citations-and-references/> 
>>
>> for an explanation of the terminology.
>>
>> - -- 
>> Alexander Dutton
>> Developer, data.ox.ac.uk, InfoDev, Oxford University Computing Services
>>           Open Citations Project, Department of Zoology, University
>> of Oxford
>> -----BEGIN PGP SIGNATURE-----
>> Version: GnuPG v1.4.11 (GNU/Linux)
>> Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org/
>>
>> iEYEARECAAYFAk46nS4ACgkQS0pRIabRbjDVZQCdGblvoMgNqEietlE5EwAkPJY8
>> pikAn2KApM0HjcXj6TZegA+Dek/DJIQX
>> =UcCr
>> -----END PGP SIGNATURE-----
>>
>>
>
>


-- 
Dipl. Inf. Sebastian Hellmann
Department of Computer Science, University of Leipzig
Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
Research Group: http://aksw.org
Received on Tuesday, 16 August 2011 04:45:54 UTC