- From: Phillip Lord <phillip.lord@newcastle.ac.uk>
- Date: Thu, 09 Oct 2014 12:56:14 +0100
- To: Simon Spero <sesuncedu@gmail.com>
- Cc: "Gray\, Alasdair" <A.J.G.Gray@hw.ac.uk>, <semantic-web@w3.org>, "Peter F. Patel-Schneider" <pfpschneider@gmail.com>, Mark Diggory <mdiggory@gmail.com>, W3C LOD Mailing List <public-lod@w3.org>
Simon Spero <sesuncedu@gmail.com> writes:

> On Oct 8, 2014 10:15 AM, "Gray, Alasdair" <A.J.G.Gray@hw.ac.uk> wrote:
>
>> Or is that because they want to import it into their own reference
>> management system, e.g. Mendeley, which does not support the HTML version?
>
> 1. It is quite easy to embed metadata in HTML pages in forms designed
> for accurate importing into reference managers (Hellman 2009). Mendeley has
> been known to have problems with imports in cases where a proxy server is
> involved.

Lindsay Marshall and I have done a fair amount of work extracting metadata from HTML for purposes of citation. With a fair amount of heuristics, we can get enough metadata for a full citation from about 60% of what you might call serious websites (i.e. those with technical content). The general web is lower (about 1%), but most of the web appears to be Chinese pornography.

This is available as a tool at http://greycite.knowledgeblog.org/, and a fuller description is available at http://arxiv.org/abs/1304.7151.

Phil
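For readers unfamiliar with the kind of embedded metadata being discussed, here is a minimal sketch, not Greycite's actual code, of the simplest case: pulling Highwire Press ("citation_*") and Dublin Core ("DC.*") meta tags out of an HTML page. The tag names and the sample page below are illustrative assumptions; the heuristics described above go well beyond this.

    from html.parser import HTMLParser

    class CitationMetaParser(HTMLParser):
        """Collect citation-oriented <meta> tags from an HTML document."""

        def __init__(self):
            super().__init__()
            self.metadata = {}

        def handle_starttag(self, tag, attrs):
            if tag != "meta":
                return
            attrs = dict(attrs)
            name = attrs.get("name") or ""
            content = attrs.get("content")
            if content and (name.startswith("citation_") or name.startswith("DC.")):
                # Some tags (e.g. citation_author) may legitimately repeat.
                self.metadata.setdefault(name, []).append(content)

    # Hypothetical page fragment for illustration only.
    page = """
    <html><head>
      <meta name="citation_title" content="An Example Article">
      <meta name="citation_author" content="Doe, Jane">
      <meta name="citation_publication_date" content="2014">
    </head><body></body></html>
    """

    parser = CitationMetaParser()
    parser.feed(page)
    print(parser.metadata)

Pages that carry such tags are the easy 60%; the heuristic work is in recovering comparable metadata from pages that do not.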
Received on Thursday, 9 October 2014 11:56:45 UTC