Re: New Calais proxy could grow Linked Data Cloud

LOD Group:
First, a philosophical point and then a few facts.

When your child first learns to read, you don't discard that achievement
because they haven't yet graduated from college. You know college is coming,
you're already thinking about college, you may even be actively working
toward college - but the first words are still important.

Calais is learning to read. We firmly believe in releasing building blocks
when they become available rather than waiting (and waiting and waiting) for
the entire solution to be ready.

A few specific facts to make it clearer where SemanticProxy fits in:

1) We will have de-referenceable URIs for every entity extracted by Calais
by the end of this year. The engineering is done and we're in active design
and build mode. We haven't finished the analysis yet - but this will be
millions of endpoints on the day we go live.

2) A *subset* of those entity types will absolutely have links to other
linked data sources when we go live. Right now we know there will be
substantive links for companies, geographies and a few of the easy ones like
music, books, etc. We'll expand that set over time, and our goal is to
set up a community-based mechanism for enhancing the links.

3) At the end of this month (September) as part of Release 3.1 we'll be
releasing company and geography disambiguation as a component of the
metadata generation process. The company disambiguation is based on a
lexicon of over 16M company aliases plus additional hinting, and we take a
similar approach with geography.

Questions? Ideas? Fire away.


On Tue, Sep 23, 2008 at 7:19 AM, Paul Miller wrote:

> Members of this list might be interested in my write-up of ThomsonReuters'
> latest beta service... which I think will prove pretty useful in growing the
> Linked Data cloud... especially for news content from the BBC et al...
> Paul
> --
> Paul Miller
> Technology Evangelist, Talis
> skype: napm1971
> mobile/cell: +44 7769 740083

Received on Wednesday, 24 September 2008 10:03:11 UTC