RE: data format for gathered information from Uldis Bojars on 2007-02-28 (public-sweo-ig@w3.org from February 2007)

From: Uldis Bojars <uldis.bojars@deri.org>
Date: Wed, 28 Feb 2007 19:27:02 +0000
To: 'Ivan Herman' <ivan@w3.org>, 'Leo Sauermann' <leo.sauermann@dfki.de>
Cc: 'Danny Ayers' <danny.ayers@gmail.com>, 'W3C SWEO IG' <public-sweo-ig@w3.org>, 'Kingsley Idehen' <kidehen@openlinksw.com>, 'Benjamin Nowack' <bnowack@appmosphere.com>, 'Ian Davis' <Ian.Davis@talis.com>
Message-id: <01MDOD4UBIKG004CKU@beacon.nuigalway.ie>
Ivan,

SIOC as a framework can act as the 'glue'.
I agree that if deciding to reuse an ontology we should use it for what it
is meant for. Let me clarify some details about SIOC.

1) It already uses FOAF and SKOS

SIOC re-uses FOAF to express information about persons and lets you use SKOS
to describe categories and tags. The largest part of data generated by a
community site is about posts (as there are more posts than there are people
and categories) expressed in SIOC and it already acts as a 'glue' between
FOAF and SKOS.

Figure by John Breslin illustrating these relations:
http://sioc-project.org/node/158
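
As a rough sketch of what this 'glue' can look like in data (the example.org
URIs are made up for illustration, and the choice of foaf:maker and dc:title
here is an assumption, not a recommendation):

```turtle
@prefix sioc: <http://rdfs.org/sioc/ns#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix dc:   <http://purl.org/dc/elements/1.1/> .

# A post described with SIOC, pointing at a FOAF person and a SKOS concept.
<http://example.org/blog/post/1>
    a sioc:Post ;
    dc:title "Notes on data vocabularies" ;
    foaf:maker <http://example.org/people/alice#me> ;
    sioc:topic <http://example.org/topics/semantic-web> .

<http://example.org/people/alice#me>
    a foaf:Person ;
    foaf:name "Alice" .

<http://example.org/topics/semantic-web>
    a skos:Concept ;
    skos:prefLabel "Semantic Web" .
```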

2) Describing everything in RDF

People want to provide information and comments about real-world objects
(Events, Videos, Books, Presentations, Wiki pages, CVs, ...) not just about
forum/blog posts. People also want to be able to say that their posts
contain or are about these real-world objects. This question was recently
discussed by the SIOC community and a decision on how to do this within the
SIOC framework will be made within the next 2 weeks.

SIOC was made to be generic and some of the objects (Blog posts, Mailing
lists, Wiki pages) can be naturally expressed as a sioc:Post.

For other objects a sioc:Post itself is not a natural choice and there's no
need to "stretch" it. That's why we are thinking about a generic class for
these objects that will act as an "umbrella" for all kinds of things. It does
not need to contain actual properties to describe these things - there are
already ontologies out there to describe Projects, Books, etc. What we need
is a way to talk about all these things [within sioc:Posts and in
general] and a "crystallisation point" from which to point to the different
ontologies to use. 

Some types of relations that we want to express:
 - a Post contains an Object (e.g., a review)
 - a Post is about an Object (e.g., a project)
 - a Post is categorised as category/tag/topic X  (currently expressed with
a sioc:topic and a URI which can [optionally] be a skos:Concept)
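
To make the three relations concrete, here is a sketch only. Since no
decision has been made yet, ex:Object, ex:contains and ex:about are
hypothetical placeholders in a made-up namespace; only sioc:Post, sioc:topic
and skos:Concept are existing terms.

```turtle
@prefix sioc: <http://rdfs.org/sioc/ns#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex:   <http://example.org/vocab#> .

# ex:Object, ex:contains and ex:about are hypothetical placeholders,
# not terms from SIOC or any published vocabulary.
<http://example.org/post/42>
    a sioc:Post ;
    ex:contains <http://example.org/reviews/7> ;
    ex:about <http://example.org/projects/some-tool> ;
    sioc:topic <http://example.org/tags/rdf> .

<http://example.org/reviews/7> a ex:Object .
<http://example.org/projects/some-tool> a ex:Object .
<http://example.org/tags/rdf> a skos:Concept .
```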

We have similar questions to solve, would probably come to similar
conclusions and can benefit from learning from each other. In fact, the
Semantic Web community is like any other community that wants to publish
information and discussions about things.

If you have suggestions on how to model this information, please send them
to the SIOC-Dev list [1]. Note that when talking about a generic "umbrella"
class it does not really matter what namespace it is in as long as there is
one. If there is an existing vocabulary we can reuse it.

3) Community aspects of SIOC

Besides expressing information about things in general there are some
community site related SIOC usage patterns that can be useful:

Discussions / comments about the information gathered can be expressed as a
sioc:Post plus its properties. The sioc:has_reply property is used to link a
post to its replies and comments. That's where SIOC fits in naturally.
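
A minimal sketch of such a discussion thread (the example.org URIs are
invented for illustration):

```turtle
@prefix sioc: <http://rdfs.org/sioc/ns#> .

# One post about a gathered item, linked to two comments on it.
<http://example.org/items/tutorial-1/discussion>
    a sioc:Post ;
    sioc:has_reply <http://example.org/items/tutorial-1/comment-1> ,
                   <http://example.org/items/tutorial-1/comment-2> .

<http://example.org/items/tutorial-1/comment-1> a sioc:Post .
<http://example.org/items/tutorial-1/comment-2> a sioc:Post .
```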

sioc:Community is a recent addition to the ontology, introduced to describe a
collection of different things belonging to a community. Basically, anything
(website, mailing list, people) can be a part of it. It may be used to describe
information about communities (a part of the gathered information) in cases
where a community means more than a group of people.
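
One possible sketch of such a community description. Linking the parts with
dcterms:hasPart is an assumption made for this example, and the example.org
URIs are made up:

```turtle
@prefix sioc:    <http://rdfs.org/sioc/ns#> .
@prefix foaf:    <http://xmlns.com/foaf/0.1/> .
@prefix dcterms: <http://purl.org/dc/terms/> .

# Linking parts with dcterms:hasPart is an assumption for this sketch.
<http://example.org/community/sweo>
    a sioc:Community ;
    dcterms:hasPart <http://example.org/site> ,
                    <http://example.org/lists/discuss> ,
                    <http://example.org/people/alice#me> .

<http://example.org/site> a sioc:Site .
<http://example.org/lists/discuss> a sioc:Forum .
<http://example.org/people/alice#me> a foaf:Person .
```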

This concludes the introduction; I hope it helps to clarify some questions.
SIOC is a live project and lessons learned from describing gathered
information can also feed back into its development. Please feel free to
send comments and ask any questions.

[1] http://groups.google.com/group/sioc-dev

Best,
Uldis

[ http://captsolo.net/info/ ]


-----Original Message-----
From: public-sweo-ig-request@w3.org [mailto:public-sweo-ig-request@w3.org]
On Behalf Of Ivan Herman
Sent: Wednesday, February 28, 2007 12:17 PM
To: Leo Sauermann
Cc: Danny Ayers; W3C SWEO IG; Kingsley Idehen; Benjamin Nowack; Ian Davis
Subject: Re: data format for gathered information

Leo,

it is a bit difficult to edit, because the page should reflect consensus...
so I prefer to comment and discuss here.

- Using the doap, skos, etc, is obviously the way to go. Actually, using
skos is a great idea of yours!

- I am not sure about the usage of RSS. I have the feeling that it is a
little bit of a misuse here. I wonder whether the full power of DC is not
enough here; not only the core dc terms like dc:title and such that
everybody knows but, also, the dcterm vocabulary[1]. I have the impression
that those, combined with maybe some extra properties of our own may replace
your choice of RSS. (to be checked)
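
A rough sketch of what a DC-only description of a gathered item might look
like; the resource URI and all literal values are invented, and the choice of
properties is only an assumption about what would be needed:

```turtle
@prefix dc:      <http://purl.org/dc/elements/1.1/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix dctype:  <http://purl.org/dc/dcmitype/> .

# Core dc: elements plus a few dcterms: refinements, no RSS terms.
<http://example.org/tutorials/rdf-intro>
    a dctype:Text ;
    dc:title "An introduction to RDF" ;
    dc:creator "A. N. Author" ;
    dcterms:issued "2007-02-28" ;
    dcterms:abstract "A short tutorial for newcomers." .
```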

- For books and articles, I think we need something more structured, like
BibTeX, in order to allow for, say, more scholarly usage. The problem is
that it is not 100% obvious how to represent bibtex in RDF; see my
recent blog and the comments[2]. We may have to bite the bullet and choose
one or modify one.

[As an aside, it was one of you guys, I think, who drew my attention to
BibSonomy[3], which uses nice features to store bibliographical data as well;
it is a pity that the bibtex they use is broken[2], otherwise we could have
used it.]

- I was looking at DOAP; its description on [4] refers to "DOAP is a project
to create an XML/RDF vocabulary to describe open source projects." I was
wondering whether it would also be suitable to describe non-commercial
projects, ie, where the 'open sourceness' is in DOAP.
Sure, there are references to repositories and copyrights, but I presume it
is all right to ignore those when we talk about commercial projects.
To be checked, nevertheless...
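
For illustration only, a DOAP description that simply omits the repository
and licence details could look like this (the project URI, name and
description are invented):

```turtle
@prefix doap: <http://usefulinc.com/ns/doap#> .

<http://example.org/projects/some-product>
    a doap:Project ;
    doap:name "Some Product" ;
    doap:homepage <http://example.org/some-product/> ;
    doap:shortdesc "A commercial RDF store." .
    # doap:repository and doap:license are simply left out here
```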

- Whether the core 'glue', binding all that together, should be SIOC, as
Kingsley proposes, or something else, I am not sure. I must admit I am not
familiar with all the details of SIOC in this sense. I am a little bit
afraid (just like for RSS) to reuse something just because some of the
properties and classes are around that are close to what we want, but it is
not *really* meant for that. I know there is a fuzzy line there, and may not
apply to SIOC (as I said, I am not sure about that one), but we should be
careful about that.

I am sure other issues will pop up...

Ivan


[1] http://dublincore.org/documents/dcmi-type-vocabulary/
[2] http://ivanherman.wordpress.com/2007/01/13/bibtex-in-rdf/
[3] http://www.bibsonomy.org
[4] http://usefulinc.com/doap/


Leo Sauermann wrote:
> Hi Guys,
> 
> perhaps read the wiki-page in parallel to this email thread.
> DOAP, FOAF, etc are all mentioned there already, 
> http://esw.w3.org/topic/SweoIG/TaskForces/InfoGathering/DataVocabulary
> 
> Benjamin, Ivan, you are free to edit the wiki page, just change/adapt 
> it so that it reflects your approach, please start editing.
> (no edits so far,
> this is a wiki, free speech, last change wins, anything goes, like
> wikipedia)
> 
> 
> And it came to pass that Benjamin Nowack, at the appointed time
> 26.02.2007 11:24, wrote the following:
> 
>>On 22.02.2007 19:55:52, Leo Sauermann wrote:
>>[...]
>>  
>>
>>>I see two things to face, first:
>>>Describing Information items as such, such as tools, websites, 
>>>presentations, tutorials. This should be done using RSS 1.0, and in 
>>>some cases when needed extended using DOAP, foaf, etc. This is pretty 
>>>straightforward, please review and update this site until you agree:
>>>http://esw.w3.org/topic/SweoIG/TaskForces/InfoGathering/DataVocabulary
>>>    
>>>
>>Not sure about the RSS design decision, it pretty much restricts the 
>>resource types to documents, so we can't really use it as an 
>>"umbrella" spec. My 2 highly redundant cents:
>>- I found DOAP to work fine for most things software, DCMI provides a
>>  number of handy resource type URIs[1] which could be used to augment
>>  doap:Version resources (e.g. dctype:Collection, dctype:Dataset,
>>  dctype:InteractiveResource, dctype:Service), or owl:Ontology for
>>  projects that produce vocabularies (e.g. the FOAF project)
>>  
>>
> That was partly already there,
> please edit the wiki page so that it reflects your exact ideas, but I 
> think the current version already is like you say here.
> 
> 
>>- tags (skos:subject, or dc:subject) for more specific stuff (personal
>>  preference: the more fine-grained skos options)
>>  
>>
> ok, one more for SKOS
> 
>>- Danny's review vocab[2] for ratings/reviews
>>  
>>
> please add this to the wiki page!
> 
>>- a combination of the two rdf/iCal specs[3][4] (with and without
>>  timezone-datatyped timestamps) for events
>>  
>>
> they are rather buggy and not clear which one to use, but I would go 
> for the simpler (not-timezone-as-datatype-one).
> 
> 
> 
> And it came to pass that Danny Ayers, at the appointed time
> 22.02.2007 20:25, wrote the following:
> 
>>
>> Quick thoughts: I see the motivation re. reuse, but rather than 
>> trying to use solely RSS 1.0 for the items, it might be better to use 
>> more precise terms where they exist, as_well_as the RSS terms, e.g.
>>
>> <http://example.org/doc> a rss:item; a foaf:Document .
> 
> I also thought about this, but if you require from all participants to 
> do that, it sucks.
> Why should anyone annotate two types if one is enough? This is the 
> format we expect external data to be in, inference should add the 
> additional triples.
> 
>>
>> For the taxo stuff, SKOS sounds a very good idea generally, though I 
>> wouldn't be surprised if there were existing vocabs that could be 
>> used for things like "tutorial" etc.
>> I'll cc Ian, he hangs around libraries...
>>
>> It might also be worth considering (perhaps redundantly again) the 
>> Tag Ontology at http://www.holygoat.co.uk/projects/tags/
> 
> SKOS covers this and more, so would rather use skos.
> 
>>
>> Cheers,
>> Danny.
>>
>>
>>
> 
> 
> --
> ____________________________________________________
> DI Leo Sauermann       http://www.dfki.de/~sauermann 
> 
> Deutsches Forschungszentrum fuer
> Kuenstliche Intelligenz DFKI GmbH
> Trippstadter Strasse 122
> P.O. Box 2080           Fon:   +49 631 20575-116
> D-67663 Kaiserslautern  Fax:   +49 631 20575-102
> Germany                 Mail:  leo.sauermann@dfki.de
> 
> Management:
> Prof.Dr.Dr.h.c.mult. Wolfgang Wahlster (Chairman), Dr. Walter Olthoff
> Chairman of the Supervisory Board:
> Prof. Dr. h.c. Hans A. Aukes
> Kaiserslautern District Court, HRB 2313
> ____________________________________________________
> 

-- 

Ivan Herman, W3C Semantic Web Activity Lead
URL: http://www.w3.org/People/Ivan/
PGP Key: http://www.cwi.nl/%7Eivan/AboutMe/pgpkey.html
FOAF: http://www.ivan-herman.net/foaf.rdf
Received on Wednesday, 28 February 2007 19:27:47 UTC