[Information Gathering] data augmentation, ratings (was Re: [Information Gathering] next steps: syndication, good weblocation)

On 31/03/07, Lee Feigenbaum <feigenbl@us.ibm.com> wrote:

[snip]

> What I care about and think is important for our education and outreach
> efforts is for us to do the work to identify what the cream of the crop
> SemWeb information resources are, and then organize them based on which
> ones are most useful for which types of people. To do this, I believe that
> we need to augment the existing information resources with:
>
> a/ some way to identify the best (this could be digg.com-style ratings,
> google-style rankings (don't think we need that level of complexity), or
> even just simple "best of breed" flags)

That could roughly be split into three different approaches according
to how the data's generated:

   manually ("best of breed" flags)
   algorithmically (linkrank etc. - there's probably an existing open service that could help)
   user feedback (digg etc.)

I suspect that's more-or-less in order of how hard and/or
time-consuming each would be. It'd be undesirable for work on the
fancier approaches to hold up publication of the simpler form, but I
guess it could be built up incrementally.

(If Tom Heath's http://revyu.com were rebranded a little it could
serve to provide the user feedback, though it might take a long time
to accumulate a useful quantity of scores.)

Hmm, "best of breed" would have to rely on someone's value judgements;
does that sound OK? Maybe there's also something fairly objective
nearby: maturity (age in years), activity (1 / time since last
release)..?
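Just to make that concrete, a rough sketch of how those two objective
signals might be combined into a single number (the weighting and the
function names here are entirely made up, of course):

```python
from datetime import date


def objective_score(first_release: date, last_release: date, today: date) -> float:
    """Crude ranking signal: maturity (age in years) times
    activity (reciprocal of years since the last release)."""
    def years(earlier: date, later: date) -> float:
        return (later - earlier).days / 365.25

    maturity = years(first_release, today)
    # Add 1 to the denominator so a release made today doesn't blow up.
    activity = 1.0 / (1.0 + years(last_release, today))
    return maturity * activity
```

So a project that's been around for years and released recently would
float to the top, without anyone having to make a value judgement.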

> b/ appropriate predicates and editorial work to associate information
> resources with the appropriate audience that each is aimed at (both on a
> technical capability level and on a industry/domain level)

Sounds very desirable & not unreasonable, as long as the effort needed
can be kept within sane limits. Again, adding sophistication
incrementally would probably be a good idea. I don't see any way of
avoiding the design and/or selection of suitable predicates.

But perhaps the editorial workload could be reduced by creating a
questionnaire and asking the tool developers to fill it in themselves
(which could even be rigged up to generate triples fed directly into
the store - say, by tweaking DOAP-a-matic a little).

Cheers,
Danny.

-- 

http://dannyayers.com

Received on Saturday, 31 March 2007 22:18:01 UTC