- From: Phillip Lord <phillip.lord@newcastle.ac.uk>
- Date: Thu, 09 Oct 2014 12:56:14 +0100
- To: Simon Spero <sesuncedu@gmail.com>
- Cc: "Gray, Alasdair" <A.J.G.Gray@hw.ac.uk>, <semantic-web@w3.org>, "Peter F. Patel-Schneider" <pfpschneider@gmail.com>, Mark Diggory <mdiggory@gmail.com>, W3C LOD Mailing List <public-lod@w3.org>
Simon Spero <sesuncedu@gmail.com> writes:

> On Oct 8, 2014 10:15 AM, "Gray, Alasdair" <A.J.G.Gray@hw.ac.uk> wrote:
>
>> Or is that because they want to import it into their own reference
>> management system, e.g. Mendeley, which does not support the HTML version?
>
> 1. It is quite easy to embed metadata in HTML pages in forms designed
> for accurate importing into reference managers (Hellman 2009). Mendeley has
> been known to have problems with imports in cases where a proxy server is
> involved.

Lindsay Marshall and I have done a fair amount of work extracting metadata from HTML for purposes of citation. With a fair amount of heuristics, we can get enough metadata for a full citation from about 60% of what you might call serious websites (i.e. those with technical content). The figure for the general web is lower (about 1%), but most of the web appears to be Chinese pornography.

This is available as a tool at http://greycite.knowledgeblog.org/, and a fuller description is available at http://arxiv.org/abs/1304.7151.

Phil
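
P.S. For the curious, here is a minimal sketch of the kind of embedded-metadata harvesting involved. It is not Greycite's actual implementation, just an illustration using Python's standard library: it pulls Highwire Press "citation_*" and Dublin Core meta tags from a page, which is roughly the sort of embedded metadata reference managers can import reliably. Getting to the 60% figure mentioned above needs many more heuristics on top of this (page titles, OpenGraph, guessing from URL structure, and so on).

    # Illustrative sketch only (not Greycite): harvest common embedded
    # citation metadata (Highwire Press "citation_*" and Dublin Core
    # "DC.*" meta tags) from an HTML page with the standard library.
    from html.parser import HTMLParser
    from urllib.request import urlopen


    class CitationMetaParser(HTMLParser):
        """Collect <meta name="..." content="..."> pairs relevant to citation."""

        PREFIXES = ("citation_", "dc.", "dcterms.")

        def __init__(self):
            super().__init__()
            self.metadata = {}

        def handle_starttag(self, tag, attrs):
            if tag != "meta":
                return
            attrs = dict(attrs)
            name = (attrs.get("name") or "").lower()
            content = attrs.get("content")
            if content and name.startswith(self.PREFIXES):
                # Keep repeated values (e.g. several citation_author tags).
                self.metadata.setdefault(name, []).append(content)


    def extract_citation_metadata(url):
        """Fetch a page and return whatever embedded citation metadata it carries."""
        with urlopen(url) as response:
            html = response.read().decode("utf-8", errors="replace")
        parser = CitationMetaParser()
        parser.feed(html)
        return parser.metadata


    if __name__ == "__main__":
        # Example: arXiv abstract pages expose Highwire-style meta tags.
        for key, values in extract_citation_metadata("http://arxiv.org/abs/1304.7151").items():
            print(key, "=", "; ".join(values))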
Received on Thursday, 9 October 2014 11:56:47 UTC