
Re: SKOS and Freebase

From: Mo McRoberts <Mo.McRoberts@bbc.co.uk>
Date: Mon, 21 Oct 2013 15:17:19 +0000
To: Dan Brickley <danbri@google.com>
CC: Karen Coyle <kcoyle@kcoyle.net>, W3C Web Schemas Task Force <public-vocabs@w3.org>
Message-ID: <4D2E79A1-FFC3-4CC9-939E-37536884A77D@bbc.co.uk>

On 2013-Oct-21, at 15:30, Dan Brickley <danbri@google.com> wrote:

> ps. http://en.wikipedia.org/wiki/Lonclass has the example,
> "656.881:301.162.721:32.007THATCHER: 654.192.731TV-AM" supposedly
> composed from these parts,
>
> 656.881:301.162.721 “LETTERS OF APOLOGY”
> 656.881 “LETTERS (POSTAL SERVICES)”
> 656.881:06.022.6 “RESIGNATION LETTERS”
> 654.192.731TV-AM “TV AM (TELEVISION AM)”

This is probably veering way off topic, but for info the fuller form of that decomposition would be:

656.881:301.162.721:32.007THATCHER:654.192.731TV-AM
  656.881:301.162.721 → LETTERS OF APOLOGY
    656.881 → LETTERS (POSTAL SERVICES)
      656 → TRANSPORT SERVICES
    301.162.721 → APOLOGIES (CONCILIATION)
      301.162 → SOCIAL RELATIONS
        301 → SOCIOLOGY
  32.007THATCHER → MARGARET THATCHER (POLITICIAN) **
    32.007 → POLITICIANS
      32 → POLITICS
  654.192.731TV-AM → TV AM (TELEVISION AM)
    654.192.731 → ITV COMPANIES (TV NETWORKS)
      654.192 → TELEVISION
        654 → TELECOMMUNICATION SERVICES

** To complicate things slightly, 32.007THATCHER doesn’t actually exist as a standalone concept in the LONCLASS tree, but it is used consistently as part of composite terms to refer to Margaret Thatcher, including “32.007THATCHER.002.692:676.6”, which is defined as “CARDBOARD CUTOUTS OF MARGARET THATCHER”.
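Each indented level in the decomposition above is the nearest broader concept of the line it sits under. The walk can be sketched in Python as follows, assuming (as in UDC, from which LONCLASS derives) that truncating a notation yields a broader class, and taking the nearest broader concept to be the longest truncation that actually exists in the schedule. The `KNOWN` table contains only the labels quoted in this message, and `shorten` and `broader_chain` are hypothetical helpers for illustration, not part of any BBC tool. The sketch handles only simple notations (digits, dots, and an optional alphabetic extension such as THATCHER or TV-AM), not auxiliaries like the “.002.692” above.

```python
import re

# Illustrative schedule: only the labels quoted in this message,
# not the real LONCLASS data.
KNOWN = {
    "656": "TRANSPORT SERVICES",
    "656.881": "LETTERS (POSTAL SERVICES)",
    "301": "SOCIOLOGY",
    "301.162": "SOCIAL RELATIONS",
    "301.162.721": "APOLOGIES (CONCILIATION)",
    "32": "POLITICS",
    "32.007": "POLITICIANS",
    "654": "TELECOMMUNICATION SERVICES",
    "654.192": "TELEVISION",
    "654.192.731": "ITV COMPANIES (TV NETWORKS)",
}

# A simple (non-composite) notation: digits and dots, optionally followed
# by an alphabetic extension such as "THATCHER" or "TV-AM".
SIMPLE = re.compile(r"^(\d[\d.]*)([A-Z][A-Z-]*)?$")


def shorten(notation: str) -> str:
    """Drop the last digit, plus any separator dot left dangling."""
    return notation[:-1].rstrip(".")


def broader_chain(notation: str) -> list[str]:
    """Return the known broader concepts of a simple notation, nearest first."""
    match = SIMPLE.match(notation)
    if match is None:
        raise ValueError(f"not a simple notation: {notation!r}")
    base, extension = match.groups()
    # An extended notation (e.g. 32.007THATCHER) sits directly under its
    # numeric base; a plain notation starts one digit shorter.
    current = base if extension else shorten(base)
    chain = []
    while current:
        if current in KNOWN:
            chain.append(current)
        current = shorten(current)
    return chain


if __name__ == "__main__":
    for code in broader_chain("654.192.731TV-AM"):
        print(code, "->", KNOWN[code])
```

Truncating one digit at a time, rather than one dotted group at a time, matters because the dots in UDC-style notation are only for readability; the dictionary lookup is what skips over truncations with no concept of their own.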

It’s been a longstanding issue (or non-issue, depending upon whether you’re a human or a machine, traditionally) that LONCLASS’s colon operator provides no means of indicating the nature of the relationship; the usual line was that there was really only one sensible way to read a given term, but even in this example that isn’t strictly true.

You’ll note in the decomposition above that “Letters of apology” exists as a composite concept in its own right, which helps slightly with interpretation: generally, when decomposing these terms, you look for the longest existing concept that is shorter than the term you’re starting with. Orders of precedence thus become rather fluid and dependent upon historical use.
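That longest-match rule can be sketched in Python. This is a minimal illustration, assuming that matches are taken as leading runs of colon-separated facets, working left to right, and that a lone facet is kept even when it has no standalone entry (as with 32.007THATCHER). The `KNOWN` table holds only concepts quoted in this thread, and `decompose` and `outline` are hypothetical helpers, not an actual BBC implementation.

```python
# Illustrative schedule: only concepts quoted in this thread.
KNOWN = {
    "656.881": "LETTERS (POSTAL SERVICES)",
    "656.881:301.162.721": "LETTERS OF APOLOGY",
    "656.881:06.022.6": "RESIGNATION LETTERS",
    "301.162.721": "APOLOGIES (CONCILIATION)",
    "654.192.731TV-AM": "TV AM (TELEVISION AM)",
}


def decompose(term: str) -> list[str]:
    """Split a composite term into its longest known parts, left to right."""
    facets = term.split(":")
    parts = []
    start = 0
    while start < len(facets):
        # Shrink the run from the right until it forms a known concept
        # that is shorter than the whole term.
        end = len(facets)
        while end > start + 1:
            candidate = ":".join(facets[start:end])
            if candidate in KNOWN and candidate != term:
                break
            end -= 1
        # If nothing longer matched, end == start + 1 and the single facet
        # is kept whether or not it is in KNOWN.
        parts.append(":".join(facets[start:end]))
        start = end
    return parts


def outline(term: str, depth: int = 0) -> list[str]:
    """Indented lines showing the recursive decomposition of a term."""
    lines = ["  " * depth + term]
    if ":" in term:
        for part in decompose(term):
            lines.extend(outline(part, depth + 1))
    return lines


if __name__ == "__main__":
    full = "656.881:301.162.721:32.007THATCHER:654.192.731TV-AM"
    print("\n".join(outline(full)))
```

Because the match is greedy, the split depends on which composites happen to exist: had “32.007THATCHER:654.192.731TV-AM” been an entry, the top level would split into two parts instead of three. Note too that the output says nothing about how the parts relate, which is the colon-operator limitation described above.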

Meanwhile, would a human reading “Letter, Apology, Margaret Thatcher, TV-AM” interpret it as a letter from Margaret Thatcher to TV-AM, a letter from TV-AM to Margaret Thatcher, a letter of apology from Margaret Thatcher about TV-AM, or a letter of apology from TV-AM about Margaret Thatcher?

Fortunately, in practice, surrounding context and the long memories of archivists have meant that actual problems resulting from the ambiguity are rare, but it makes these terms much harder for machines to deal with intelligently.

(All of this is in part why the BBC doesn’t use LONCLASS for cataloguing new things any more!)

M.

>
> ... though it doesn't formally afaik indicate who was the apologist
>
> see also http://www.udcds.com/seminar/2011/media/slides/UDCSeminar2011_AndyHeather.pdf
>
>
>> kc
>> [1] http://experimental.worldcat.org/fast/
>>
>>
>> On 10/20/13 6:40 PM, Thad Guidry wrote:
>>>
>>> Tom is correct.
>>>
>>> Let's be clear, the data still has to be linked for LCSH concepts. There
>>> is much work to be done on that front.
>>>
>>> I have been continually applying most high level LCSH concepts to
>>> Freebase manually, but a better interface for human curation and
>>> aligning and linking the LCSH concepts to Freebase is what is needed
>>> (but a lot of that could be done with OpenRefine and other automated
>>> tools).  It would be even more awesome for other folks to bear and share
>>> that burden and help build or refine the existing tools to help with
>>> automation.
>>>
>>>
>>>
>>> On Sun, Oct 20, 2013 at 2:05 PM, Tom Morris <tfmorris@gmail.com
>>> <mailto:tfmorris@gmail.com>> wrote:
>>>
>>>    On Sun, Oct 20, 2013 at 10:29 AM, Antoine Isaac <aisaac@few.vu.nl
>>>    <mailto:aisaac@few.vu.nl>> wrote:
>>>
>>>        I got messed up with my mail splitting: but I really want to
>>>        flag that Thad's
>>>        http://lists.w3.org/Archives/Public/public-vocabs/2013Oct/0142.html
>>>        is really awesome. And it seems a good case in favour of SKOS data,
>>>        for all those who want to do something similar but can't handle
>>>        the proliferation of namespaces.
>>>
>>>
>>>    One caution - that example isn't representative.  Of the 389,668
>>>    Library of Congress Subject Heading (LCSH) concepts in Freebase,
>>>    only 7,842 have been linked to an equivalent Freebase topic.  Also
>>>    the LCSH was loaded in 2010 and, as far as I'm aware, hasn't been
>>>    updated since.  I suspect the hierarchy is relatively stable, but
>>>    the lack of currency is something else to be aware of.
>>>
>>>    It demonstrates interesting possibilities, but it isn't useful for
>>>    much in its current form.
>>>
>>>    Tom
>>>
>>>
>>>
>>>
>>> --
>>> -Thad
>>> Thad on Freebase.com <http://www.freebase.com/view/en/thad_guidry>
>>> Thad on LinkedIn <http://www.linkedin.com/in/thadguidry/>
>>
>>
>> --
>> Karen Coyle
>> kcoyle@kcoyle.net http://kcoyle.net
>> m: 1-510-435-8234
>> skype: kcoylenet
>>
>


--
Mo McRoberts - Analyst - BBC Archive Development,
Zone 1.08, BBC Scotland, 40 Pacific Quay, Glasgow G51 1DA,
MC3 D6, Media Centre, 201 Wood Lane, London W12 7TQ,
0141 422 6036 (Internal: 01-26036) - PGP key CEBCF03E



-----------------------------
http://www.bbc.co.uk

This e-mail (and any attachments) is confidential and may contain personal
views which are not the views of the BBC unless specifically stated. If you
have received it in error, please delete it from your system. Do not use,
copy or disclose the information in any way nor act in reliance on it and
notify the sender immediately. Please note that the BBC monitors e-mails
sent or received. Further communication will signify your consent to this.
-----------------------------
Received on Monday, 21 October 2013 15:17:56 UTC

This archive was generated by hypermail 2.3.1 : Tuesday, 6 January 2015 21:29:32 UTC