Re: Document fragment vocabulary from Alexander Dutton on 2011-08-04 (public-lod@w3.org from August 2011)

From: Alexander Dutton <alexander.dutton@oucs.ox.ac.uk>
Date: Thu, 04 Aug 2011 15:41:55 +0100
To: Michael Hausenblas <michael.hausenblas@deri.org>
CC: Linked Open Data <public-lod@w3.org>
Message-ID: <4E3AAFB3.7@oucs.ox.ac.uk>
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi Michael,

I'm not sure that the URI-style fragment identifiers are expressive or
generalisable enough.

There are a lot of cases where IMTs (Internet media types) don't have
defined fragment resolution schemes. I may also want to choose between
various ways to point inside a file of a given IMT. For example, I may
want to use XPath instead of an @id to pick out a node in an HTML file
(particularly if the original author didn't provide any @id attributes).

As a further generalisation one could imagine chaining:

<#fragment> a fragment:Fragment ;
    fragment:within [
        fragment:within <http://example.org/some/archive.zip> ;
        fragment:locator "foo/bar.html"^^fragment:path
    ] ;
    fragment:locator "some-div"^^fragment:html-id .

Were this notional vocab to exist, it'd be the datatype that would
determine the process by which the fragment is extracted from the
original document, not the document's media type.
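
To make that dispatch concrete, here is a rough Python sketch. It is
purely illustrative and rests on several assumptions: the resolver
functions, the RESOLVERS registry, and the bare "path" / "html-id" keys
standing in for the datatype URIs are all hypothetical, and "extracting"
an HTML fragment is simplified to collecting the text inside the element
with the given id.

```python
import io
import zipfile
from html.parser import HTMLParser

# Elements that never get an end tag; skipped so nesting depth stays right.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class _IdTextFinder(HTMLParser):
    """Collects the text inside the first element whose id matches."""

    def __init__(self, wanted_id):
        super().__init__()
        self.wanted_id = wanted_id
        self.found = False
        self.depth = 0      # > 0 while inside the wanted element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        if self.depth:
            self.depth += 1
        elif not self.found and dict(attrs).get("id") == self.wanted_id:
            self.found = True
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def resolve_path(container: bytes, locator: str) -> bytes:
    """Hypothetical resolver for "..."^^fragment:path inside a zip archive."""
    with zipfile.ZipFile(io.BytesIO(container)) as archive:
        return archive.read(locator)


def resolve_html_id(container: bytes, locator: str) -> bytes:
    """Hypothetical resolver for "..."^^fragment:html-id (text content only)."""
    finder = _IdTextFinder(locator)
    finder.feed(container.decode("utf-8"))
    finder.close()
    if not finder.found:
        raise KeyError(locator)
    return "".join(finder.chunks).encode("utf-8")


# The datatype, not the media type of the document, selects the resolver.
RESOLVERS = {"path": resolve_path, "html-id": resolve_html_id}


def extract(document: bytes, steps):
    """Apply (locator, datatype) steps from the outermost container inwards."""
    for locator, datatype in steps:
        document = RESOLVERS[datatype](document, locator)
    return document
```

For the chained example above, the steps would be
[("foo/bar.html", "path"), ("some-div", "html-id")], applied to the
bytes of archive.zip. Supporting a new kind of locator would then only
mean registering another resolver under its datatype.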


All the best,

Alex

On 04/08/11 14:37, Michael Hausenblas wrote:
> 
> Alex,
> 
>> Has something already done this? Is it even (mostly?) sane?
> 
> Sane yes, IMO. Done, sort of, see:
> 
> + URI Fragment Identifiers for the text/plain [1]
> + URI Fragment Identifiers for the text/csv [2]
> 
> Cheers,
>  Michael
> 
> [1] http://tools.ietf.org/html/rfc5147
> [2] http://tools.ietf.org/html/draft-hausenblas-csv-fragment
> 
> --
> Dr. Michael Hausenblas, Research Fellow
> LiDRC - Linked Data Research Centre
> DERI - Digital Enterprise Research Institute
> NUIG - National University of Ireland, Galway
> Ireland, Europe
> Tel. +353 91 495730
> http://linkeddata.deri.ie/
> http://sw-app.org/about.html
> 
> On 4 Aug 2011, at 14:22, Alexander Dutton wrote:
> 
> Hi all,
> 
> Say I have an XML document, <http://example.org/something.xml>, and I
> want to talk about some part of it in RDF. As this is XML, being
> able to point into it using XPath sounds ideal, leading to something like:
> 
> <#fragment> a fragment:Fragment ;
>  fragment:within <http://example.org/something.xml> ;
>  fragment:locator "/some/path[1]"^^fragment:xpath .
> 
> (For now we can ignore whether we want a nodeset or a single node,
> and how to handle XML namespaces.)
> 
> More generally, we might want other ways of locating fragments
> (probably with a datatype for each):
> 
> * character offsets / ranges
> * byte offsets / ranges
> * line numbers / ranges
> * some sub-rectangle of an image
> * XML node IDs
> * page ranges of a paginated document
> 
> Some of these will be IMT-specific and may need some more thinking
> about, but the idea is there.
> 
> 
> Has something already done this? Is it even (mostly?) sane?
> 
> 
> Yours,
> 
> Alex
> 
> 
> NB. Our actual use-case is having pointers into an NLM XML file
> (embodying a journal article) so we can hook up our in-text reference
> pointer¹ URIs to the original XML elements (<xref/>s) they were
> generated from. This will allow us to work out the context of each
> citation for use in further analysis of the relationship between the
> citing and cited articles.
> 
> ¹ See
> <http://opencitations.wordpress.com/2011/07/01/nomenclature-for-citations-and-references/>
> for an explanation of the terminology.
> 

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org/

iEYEARECAAYFAk46r7MACgkQS0pRIabRbjCogQCfXz+d18G0ChICLY8ubU+g6ngV
IIwAnA8kuLavXHYFIKKXvFzAGi3ONe/r
=k/jm
-----END PGP SIGNATURE-----
Received on Thursday, 4 August 2011 14:42:18 UTC