[INFO GATHERING] - next steps - please read (continued on list) from Leo Sauermann on 2007-03-23 (public-sweo-ig@w3.org from March 2007)

From: Leo Sauermann <leo.sauermann@dfki.de>
Date: Fri, 23 Mar 2007 10:04:08 +0100
To: Ivan Herman <ivan@w3.org>
CC: kidehen@openlinksw.com, Pasquale Popolizio <pasquale.popolizio@websemantico.org>, Lee Feigenbaum <feigenbl@us.ibm.com>, Uldis Bojars <uldis.bojars@deri.org>, Danny Ayers <danny.ayers@gmail.com>, Benjamin Nowack <bnowack@appmosphere.com>, "Paul Walsh, Segala" <paulwalsh@segala.com>, W3C SWEO IG <public-sweo-ig@w3.org>
Message-ID: <46039808.4040609@dfki.de>
Hi Info Gathering,

We use the key "[INFO GATHERING]" in our mails to identify mails for 
people interested in the information gathering task force. I started a 
mail thread addressed at individuals, but Ivan pointed out that these 
mails are not referenceable.

It was Ivan Herman who said at the right time 23.03.2007 08:28 the 
following words:
> I have a procedural issue, probably tainted by W3C's obsession with having
> everything archived on the web:-) These mails are sent to individuals,
> i.e., they are not referenceable later. I think we should
>
> - either use the usual mailing list, putting, for example, an [INFO
> GATHERING] note at the beginning of the subject line, so that people can
> identify this easily, or
> - set up a separate mailing list at W3C
>
> Either way is fine. The former is obviously simpler. But we should do
> one of the two
>   
Good point there,
the simpler approach is better.
> ivan
>
> Kingsley Idehen wrote:
>   
>> Leo Sauermann wrote:
>>     
>>> Hi Kingsley,
>>>
>>> ok, if you could do it, I would say it's a great thing if you start
>>> hosting the syndication service.
>>>
>>> Perhaps Ivan will later host it on W3C or not,
>>> that can change.
>>>
>>> can you guarantee to host this service for the next years?
>>>       
>> Leo,
>>
>> Of course.
>>
>> Remember, there will always be routes to an RDF Dump. In short, this
>> data should be exposed as is common practice re. the Linking Open Data
>> Project.
>>
>> I think the domain itself is what's more important such that should
>> anything happen (none of us have absolute control of the future) the
>> recovery is down to:
>> 1. Domain Server associated with Domain change
>> 2. RDF Archive reload
>>
>> I am also hoping that out of this effort we commence the construction of
>> a replication and synchronization protocol that enables others to
>> participate in the hosting. A kind of DNS for the Data Web that
>> facilitates traversal of Linked Data and dereferencing of associated URIs :-)
>>
>>
>> Kingsley
>>     
>>> best
>>> Leo
>>>
>>> It was Kingsley Idehen who said at the right time 21.03.2007 20:20 the
>>> following words:
>>>       
>>>> Leo Sauermann wrote:
>>>>         
>>>>> Hi Kingsley,
>>>>>
>>>>> ok, cool.
>>>>>
>>>>> Would you set up the information gatherer then?
>>>>>
>>>>> I would guess it should do the following:
>>>>> * a web interface with a simple form "enter your url & email
>>>>> address, confirm that the content is creative commons"
>>>>> * a database storing the web URLs to crawl & e-mail address of
>>>>> submitter
>>>>> * a sparql endpoint for the data
>>>>>
>>>>> all implemented in an "open source" way... is it open source?
>>>>>           
>>>> Virtuoso (the Database Engine for SQL, RDF, and XML) is Open Source.
>>>>
>>>> OpenLink Data Spaces (what I use to subscribe to all the Feeds) is
>>>> also Open Source.
>>>>
>>>> Here is my typical workflow:
>>>>
>>>> 1. Pre RSS1.0/RSS2.0/Atom ubiquity - bookmarked everything I came
>>>> across via my Browser
>>>> 2. Post the above - subscribed to everything using my Feed Aggregator
>>>> 3. With the advent of SPARQL and "on the fly transformation" of
>>>> RSS/Atom into RDF - simply bookmark data sources
>>>>
>>>> Steps 1-3 are interactions via Application Modules in ODS (each Tab
>>>> represents a Module and actual Container within a Data Space).
>>>>
>>>> When building ODS we took the following route (because of our DBMS and
>>>> middleware heritage):
>>>> M - Hybrid Data Store for SQL, XML, RDF  (Virtuoso)
>>>> C - Application Logic for Data Space applications (data CRUD); all of
>>>> which is Web Services based (SOAP or REST)
>>>> V - The UI (where we are weakest because of the make up of our
>>>> development resources ) ; Templating is via XSLT and CSS based
>>>> Template Engine
>>>>
>>>> ODS for SWEO can sit anywhere (as I've said from the get go). During
>>>> the embryonic stages, we can host it, but note that relocation is
>>>> trivial because all of the data is in a DBMS file (a single or
>>>> striped file).
>>>>
>>>> Links:
>>>>
>>>> 1. ODS - http://virtuoso.openlinksw.com/wiki/main/Main/Ods
>>>> 2. Virtuoso (Open Source Edition) -
>>>> http://virtuoso.openlinksw.com/wiki/main/
>>>> 3. Live ODS Instances - http://myopenlink.net:8890/ods (just register
>>>> and play with workflow items 1-3) or http://demo.openlinksw.com/ods 
>>>> (endpoints are /isparql and /sparql re. SPARQL)
>>>>
>>>> Kingsley
>>>>         
>>>>> what about updates - can the system delete crawled rdf and replace
>>>>> it with a new version?
>>>>>
>>>>> best
>>>>> Leo
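The gatherer outlined in the list above (a form taking a URL and an e-mail address with a Creative Commons confirmation, a store of the submitted URLs, and replacing previously gathered RDF with a new version) might be sketched roughly as follows. This is only an illustration under stated assumptions: SQLite stands in for the real store, and the table layout and the function names open_store, submit and replace_rdf are hypothetical, not anything provided by Virtuoso or ODS.

```python
import sqlite3


def open_store(path=":memory:"):
    """Create the (hypothetical) submissions table and return the connection."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS submissions ("
        " url TEXT PRIMARY KEY,"   # source URL entered in the form
        " email TEXT NOT NULL,"    # e-mail address of the submitter
        " rdf TEXT)"               # latest gathered RDF, NULL until fetched
    )
    return conn


def submit(conn, url, email, cc_confirmed):
    """Register a source; refuse it unless the CC licence was confirmed."""
    if not cc_confirmed:
        raise ValueError("submitter must confirm the content is Creative Commons")
    if not url.startswith(("http://", "https://")):
        raise ValueError("only http(s) URLs are accepted")
    # Submitting the same URL twice keeps the first registration.
    conn.execute(
        "INSERT OR IGNORE INTO submissions (url, email) VALUES (?, ?)",
        (url, email),
    )
    conn.commit()


def replace_rdf(conn, url, rdf_text):
    """Discard the old RDF for a registered source and keep only the new version."""
    cur = conn.execute(
        "UPDATE submissions SET rdf = ? WHERE url = ?", (rdf_text, url)
    )
    conn.commit()
    if cur.rowcount == 0:
        raise KeyError(url)  # the URL was never submitted
```

Keying everything on the submitted URL means an update simply overwrites the stored RDF for that source, which is one simple way to answer the "delete and replace" question; a real deployment would more likely replace a named graph in the RDF store.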
>>>>>
>>>>> It was Kingsley Idehen who said at the right time 21.03.2007 13:24
>>>>> the following words:
>>>>>           
>>>>>> Leo Sauermann wrote:
>>>>>>             
>>>>>>> Hello information gatherers,
>>>>>>>
>>>>>>> I looked at our progress and try to plan the next steps.
>>>>>>> There are a few things to do and based on your previous interests
>>>>>>> and suggestions, I tried to guess who may feel responsible for
>>>>>>> these next steps.
>>>>>>>
>>>>>>> Immediate Steps (=should start soon, like this week)
>>>>>>>
>>>>>>> * We rewrite the DataVocabulary page into a "How to prepare your
>>>>>>> data in a SWEO friendly format" guide for people who run websites
>>>>>>> publishing
>>>>>>> information (Leo, Ivan, Pasquale, Bengee?)
>>>>>>>
>>>>>>> * We implement an information gatherer that can read the data
>>>>>>> formats listed  on the
>>>>>>> DataVocabulary page as it is now (Kingsley Idehen - you offered to
>>>>>>> do this)
>>>>>>>
>>>>>>> Kingsley, what would you need to start writing this gatherer? I
>>>>>>> would say we do not "crawl" but let users submit URLs of sources,
>>>>>>> when submitting a URL they have to guarantee that all content at
>>>>>>> this URL is creative commons and can be republished by us in CC.
>>>>>>> (to be on the safe side...?)
>>>>>>>               
>>>>>> Leo,
>>>>>>
>>>>>> This is a non issue if the data is in RDF, RSS, or Atom.
>>>>>>
>>>>>> Quick Example:
>>>>>>
>>>>>> Use the following as an RDF Data Source URI:
>>>>>> http://myopenlink.net:8890/dataspace/kidehen
>>>>>>
>>>>>> SPARQL Queries:
>>>>>>
>>>>>> # List concepts in the data space
>>>>>> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
>>>>>>
>>>>>> SELECT DISTINCT ?Concept
>>>>>> FROM <http://myopenlink.net:8890/dataspace/kidehen>
>>>>>> WHERE {
>>>>>>  [] rdf:type ?Concept .
>>>>>> }
>>>>>>
>>>>>> # Get dump from a specific bookmark data space container
>>>>>>
>>>>>> SELECT ?s ?p ?o
>>>>>> FROM
>>>>>> <http://myopenlink.net/dataspace/kidehen/bookmark/KingsleyBookmarks/sioc.rdf>
>>>>>>
>>>>>> WHERE {
>>>>>>  ?s ?p ?o .
>>>>>> }
>>>>>>
>>>>>> You can use the following tools:
>>>>>>
>>>>>> 1. http://demo.openlinksw.com/isparql or
>>>>>> http://myopenlink.net:8890/isparql (or raw /sparql endpoint in
>>>>>> either case). Note CONSTRUCT, ASK, and DESCRIBE are supported
>>>>>>
>>>>>> 2. RDF Browsers such as DISCO [1], Tabulator [2], or the OpenLink
>>>>>> RDF Browser [3]
>>>>>>
>>>>>> Links:
>>>>>>
>>>>>> 1. http://sites.wiwiss.fu-berlin.de/suhl/bizer/ng4j/disco/
>>>>>> 2. http://www.w3.org/2005/ajar/tab
>>>>>> 3. http://demo.openlinksw.com/DAV/JS/tests/rdfbrowser/index.html
>>>>>>
>>>>>> Kingsley
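Queries like the two quoted above can also be sent to a /sparql endpoint from a script rather than through iSPARQL. The following is a minimal sketch, assuming the endpoint follows the standard SPARQL Protocol (query passed as a "query" parameter over HTTP GET) and can return SPARQL JSON results. The helper names build_request and run_select are made up for illustration and are not part of Virtuoso or ODS.

```python
import json
import urllib.parse
import urllib.request

# The "list concepts" query from the mail above, using the same graph URI.
LIST_CONCEPTS = """\
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT DISTINCT ?Concept
FROM <http://myopenlink.net:8890/dataspace/kidehen>
WHERE { [] rdf:type ?Concept . }
"""


def build_request(endpoint, query):
    """Build a SPARQL Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def run_select(endpoint, query, timeout=30):
    """Send a SELECT query and return the list of result bindings."""
    request = build_request(endpoint, query)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["results"]["bindings"]
```

Calling run_select("http://demo.openlinksw.com/sparql", LIST_CONCEPTS) would return one binding per concept, each readable as row["Concept"]["value"], provided that instance is reachable and still holds the graph.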
>>>>>>             
>>>>>>> * Ivan: what do you think, shouldn't the information
>>>>>>> gatherer/syndication service run on some W3C server? It's not that
>>>>>>> much processing power needed and it won't cause too much traffic,
>>>>>>> but it would be good to have an "official W3C"
>>>>>>> creative-commons-free-for-all syndication about the semantic
>>>>>>> web.... Kingsley/Ivan: can we do this?
>>>>>>>
>>>>>>> then:
>>>>>>> * We contact the people already managing data sources (listed on
>>>>>>> the DataSources page) and ask them to provide their data as RDF
>>>>>>> and submit their URL to the information gatherer. (I could do that)
>>>>>>>
>>>>>>> see
>>>>>>> http://esw.w3.org/topic/SweoIG/TaskForces/InfoGathering/DataSources
>>>>>>>
>>>>>>> Mid-Term (should start this month)
>>>>>>> * We think about how to make a portal website to show the
>>>>>>> gathered/syndicated data in a user friendly way... (we have to
>>>>>>> think who does it, what W3C's role is, what requirements we have for
>>>>>>> such a website, etc)
>>>>>>>
>>>>>>>
>>>>>>> best
>>>>>>> Leo
>>>>>>>
>>>>>>>               
>>>>>>             
>>>>>           
>>>>         
>>>       
>>     
>
> - --
>
> Ivan Herman, W3C Semantic Web Activity Lead
> URL: http://www.w3.org/People/Ivan/
> PGP Key: http://www.cwi.nl/%7Eivan/AboutMe/pgpkey.html
> FOAF: http://www.ivan-herman.net/foaf.rdf
>   


-- 
____________________________________________________
DI Leo Sauermann       http://www.dfki.de/~sauermann 

Deutsches Forschungszentrum fuer 
Kuenstliche Intelligenz DFKI GmbH
Trippstadter Strasse 122
P.O. Box 2080           Fon:   +49 631 20575-116
D-67663 Kaiserslautern  Fax:   +49 631 20575-102
Germany                 Mail:  leo.sauermann@dfki.de

Geschaeftsfuehrung:
Prof.Dr.Dr.h.c.mult. Wolfgang Wahlster (Vorsitzender)
Dr. Walter Olthoff
Vorsitzender des Aufsichtsrats:
Prof. Dr. h.c. Hans A. Aukes
Amtsgericht Kaiserslautern, HRB 2313
____________________________________________________
Received on Friday, 23 March 2007 09:05:08 UTC