Re: word clouds in schema.org from Dan Brickley on 2013-12-01 (public-vocabs@w3.org from December 2013)

From: Dan Brickley <danbri@google.com>
Date: Sun, 1 Dec 2013 11:19:37 +0000
To: Charles McCathie Nevile <chaals@yandex-team.ru>
Cc: W3C Web Schemas Task Force <public-vocabs@w3.org>, Paola Di Maio <paola.dimaio@gmail.com>, Paola Di Maio <paoladimaio10@googlemail.com>
Message-ID: <CAK-qy=5AJx6xyxmOW=KLZNo093tj-N8C98K71=E-TJMRBATH1A@mail.gmail.com>

On 1 December 2013 08:51, Charles McCathie Nevile <chaals@yandex-team.ru> wrote:
> On Fri, 29 Nov 2013 17:35:36 +0100, Paola Di Maio <paola.dimaio@gmail.com>
> wrote:
>
>> just a note for future review: when convenient,
>>
>> kindly point out what mechanism exists if any
>>
>> to map semantic proximity of different terms in schema.org vocabulary
>> (is there a way to say that a word has some shared meaning
>> with other words, although they are not exact synonyms?)
>>
>> example: emergency / crisis / disaster management
>> are different flavours of a similar type of occurrence, so some
>> properties/associated processes may be shared but not all
>
>
> We have partial range/domain stuff (rangeIncludes…), which can do a bit of
> that but not all of it.
>
> I think this is really difficult actually. It's the motivation for
> versioning, but as the recent thread on changing namespaces for RDF basics
> shows, versioning is actually really hard to do in practice :(

In this case it sounds like analyzing a corpus of actual schema.org
data is closer to Paola's needs. There's only so much of this notion
of similarity that you can ever expect to find in the machine readable
schema.
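One rough proximity signal the machine-readable schema can offer is how many properties two types share through domainIncludes. Here is a minimal sketch of that idea. The type and property names in it (EmergencyEvent, CrisisEvent, DisasterEvent, responder, declaredBy, and so on) are hypothetical and not actual schema.org terms. Jaccard overlap is just one possible measure, picked for illustration.

```python
from itertools import combinations

# Hypothetical property -> domainIncludes mapping. These type and property
# names are illustrative only and are NOT real schema.org terms.
DOMAIN_INCLUDES = {
    "affectedArea": {"EmergencyEvent", "CrisisEvent", "DisasterEvent"},
    "responder": {"EmergencyEvent", "DisasterEvent"},
    "declaredBy": {"CrisisEvent"},
    "casualties": {"DisasterEvent", "EmergencyEvent"},
}


def properties_by_type(domain_includes):
    """Invert property -> types into type -> set of properties."""
    by_type = {}
    for prop, types in domain_includes.items():
        for t in types:
            by_type.setdefault(t, set()).add(prop)
    return by_type


def jaccard(a, b):
    """Shared properties divided by all properties used by either type."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def type_similarities(domain_includes):
    """Jaccard overlap of property sets for every pair of types."""
    by_type = properties_by_type(domain_includes)
    return {
        (t1, t2): jaccard(by_type[t1], by_type[t2])
        for t1, t2 in combinations(sorted(by_type), 2)
    }


for pair, score in type_similarities(DOMAIN_INCLUDES).items():
    print(pair, round(score, 2))
```

A score like this only captures what the schema's authors wrote down about shared properties. It says nothing about how the terms are actually used, which is why corpus data is the better fit here.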

Although the coverage isn't all you might hope for, it might be worth
taking a look at the Web Data Commons extractions from the Common
Crawl dataset http://webdatacommons.org/ & http://commoncrawl.org/

There should be more than enough in there to do some basic analysis of
which properties and types co-occur in the same pages, or when
describing the same entities.
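Here is a rough sketch of the page-level co-occurrence count. It assumes the extractions come as N-Quads in which the fourth term names the page the data was taken from. The parsing is deliberately simplified (one quad per line, no spaces inside IRIs, page IRI as the last term before " ."). The sample lines are made up, and the function names are hypothetical. A real run over the dumps should use a proper N-Quads parser.

```python
from collections import Counter, defaultdict
from itertools import combinations

RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
SCHEMA = "http://schema.org/"


def terms_per_page(lines):
    """Collect the schema.org types and properties used on each page.

    Simplified N-Quads handling (an assumption, not a full parser): one quad
    per line, IRIs contain no spaces, and the last term before the closing
    ' .' is the IRI of the page the quad was extracted from.
    """
    pages = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        tokens = line.rstrip(" .").split()
        predicate, page = tokens[1], tokens[-1]
        # For rdf:type quads the interesting term is the object (the type).
        term = tokens[2] if predicate == RDF_TYPE else predicate
        term = term.strip("<>")
        if term.startswith(SCHEMA):
            pages[page].add(term[len(SCHEMA):])
    return pages


def cooccurrence(pages):
    """Count how many pages use each pair of terms together."""
    counts = Counter()
    for terms in pages.values():
        for pair in combinations(sorted(terms), 2):
            counts[pair] += 1
    return counts


# Made-up sample quads, for illustration only.
SAMPLE = [
    '_:b0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Event> <http://example.org/page1> .',
    '_:b0 <http://schema.org/name> "Flood relief meeting" <http://example.org/page1> .',
    '_:b0 <http://schema.org/location> _:b1 <http://example.org/page1> .',
    '_:b2 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Event> <http://example.org/page2> .',
    '_:b2 <http://schema.org/name> "Storm briefing" <http://example.org/page2> .',
]

for pair, n in cooccurrence(terms_per_page(SAMPLE)).most_common():
    print(pair, n)
```

Grouping by page is the coarser of the two options mentioned. Grouping by subject instead (tokens[0], together with the page, since blank node labels are only local) would approximate "describing the same entities".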

The use case here, though, sounds less about RDF vocabulary
terms and more about natural language similarity measures in general
(e.g. the likes of http://arxiv.org/abs/1309.4168 ), which is a bit
beyond the scope of this list.
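On the natural-language side, similarity between two words is commonly scored as the cosine of the angle between their vectors. Here is a minimal sketch with made-up three-dimensional vectors. In practice the vectors would come from a trained model such as those discussed in the paper above, not from a hand-written table.

```python
import math

# Made-up 3-dimensional vectors for illustration only; real word vectors
# come from a trained model and typically have hundreds of dimensions.
VECTORS = {
    "emergency": [0.9, 0.1, 0.3],
    "crisis": [0.8, 0.2, 0.4],
    "disaster": [0.85, 0.15, 0.35],
    "invoice": [0.0, 0.9, 0.1],
}


def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(word, vectors):
    """Rank every other word by cosine similarity to `word`, highest first."""
    target = vectors[word]
    scores = [
        (other, cosine(target, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


for other, score in most_similar("emergency", VECTORS):
    print(f"{other}: {score:.3f}")
```

Graded scores like these express "shared meaning without being exact synonyms", which a vocabulary's explicit relations generally do not.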

cheers,

Dan

Received on Sunday, 1 December 2013 11:20:05 UTC