Document fragment vocabulary

Hi all,

Say I have an XML document, <http://example.org/something.xml>, and I
want to talk about some part of it in RDF. As this is XML, being
able to point into it using XPath sounds ideal, leading to something like:

<#fragment> a fragment:Fragment ;
  fragment:within <http://example.org/something.xml> ;
  fragment:locator "/some/path[1]"^^fragment:xpath .

(For now we can ignore whether we want a nodeset or a single node,
and how to handle XML namespaces.)

More generally, we might want other ways of locating fragments
(probably with a datatype for each):

* character offsets / ranges
* byte offsets / ranges
* line numbers / ranges
* some sub-rectangle of an image
* XML node IDs
* page ranges of a paginated document

Some of these will be specific to particular Internet media types
(IMTs) and may need more thought, but the idea is there.
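
As a rough sketch of what "a datatype for each" could look like, here
are a few of the locators above written in the same style. The
namespace URI, the datatype names (fragment:charRange,
fragment:lineRange, fragment:rectangle, fragment:xmlId), the lexical
forms and the example documents are all made up for illustration; none
of them is an existing vocabulary, and questions such as zero- versus
one-based offsets are left open.

```turtle
# Hypothetical namespace and datatype names throughout; for illustration only.
@prefix fragment: <http://example.org/ns/fragment#> .

# Characters 120 to 180 of a plain-text document.
<#chars> a fragment:Fragment ;
  fragment:within <http://example.org/something.txt> ;
  fragment:locator "120-180"^^fragment:charRange .

# Lines 10 to 12 of the same document.
<#lines> a fragment:Fragment ;
  fragment:within <http://example.org/something.txt> ;
  fragment:locator "10-12"^^fragment:lineRange .

# A sub-rectangle of an image, written here as x,y,width,height in pixels.
<#region> a fragment:Fragment ;
  fragment:within <http://example.org/something.png> ;
  fragment:locator "40,30,200,100"^^fragment:rectangle .

# An element picked out by its XML ID.
<#node> a fragment:Fragment ;
  fragment:within <http://example.org/something.xml> ;
  fragment:locator "sec-2"^^fragment:xmlId .
```

Keeping the locator as a single typed literal means a consumer can
dispatch on the datatype and ignore the kinds it does not understand.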


Has someone already done this? Is it even (mostly?) sane?


Yours,

Alex


NB. Our actual use-case is having pointers into an NLM XML file
(embodying a journal article) so we can hook up our in-text reference
pointer¹ URIs to the original XML elements (<xref/>s) they were
generated from. This will allow us to work out the context of each
citation for use in further analysis of the relationship between the
citing and cited articles.
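
A sketch of how that hook-up might be written, reusing the
fragment:xpath datatype from the example at the top. The ex: namespace,
the ex:generatedFrom property, the article URI and the pointer URI are
hypothetical placeholders, and the XPath assumes the citing <xref/>s
carry ref-type="bibr".

```turtle
# Hypothetical namespaces, property name and URIs; for illustration only.
@prefix fragment: <http://example.org/ns/fragment#> .
@prefix ex:       <http://example.org/ns/example#> .

# The third bibliographic <xref/> in the article XML.
<#xref-3> a fragment:Fragment ;
  fragment:within <http://example.org/article.xml> ;
  fragment:locator "(//xref[@ref-type='bibr'])[3]"^^fragment:xpath .

# The in-text reference pointer generated from that element.
<http://example.org/article/pointer/3> ex:generatedFrom <#xref-3> .
```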

¹ See
<http://opencitations.wordpress.com/2011/07/01/nomenclature-for-citations-and-references/>
for an explanation of the terminology.

-- 
Alexander Dutton
Developer, data.ox.ac.uk, InfoDev, Oxford University Computing Services
           Open Citations Project, Department of Zoology, University of Oxford

Received on Thursday, 4 August 2011 13:23:17 UTC