- From: Leo Sauermann <leo.sauermann@dfki.de>
- Date: Fri, 23 Mar 2007 10:04:08 +0100
- To: Ivan Herman <ivan@w3.org>
- CC: kidehen@openlinksw.com, Pasquale Popolizio <pasquale.popolizio@websemantico.org>, Lee Feigenbaum <feigenbl@us.ibm.com>, Uldis Bojars <uldis.bojars@deri.org>, Danny Ayers <danny.ayers@gmail.com>, Benjamin Nowack <bnowack@appmosphere.com>, "Paul Walsh, Segala" <paulwalsh@segala.com>, W3C SWEO IG <public-sweo-ig@w3.org>
- Message-ID: <46039808.4040609@dfki.de>
Hi Info Gathering,

We use the tag "[INFO GATHERING]" in our mails to identify mails for
people interested in the information gathering task force. I started a
mail thread addressed to individuals, but Ivan pointed out that these
mails are not referenceable.

It was Ivan Herman who said at the right time 23.03.2007 08:28 the
following words:
> I have a procedural issue, probably tainted by W3C's obsession with
> having everything archived on the web :-) These mails are sent to
> individuals, i.e., they are not referenceable later. I think we should
>
> - either use the usual mailing list, putting, for example, an [INFO
>   GATHERING] note at the beginning of the subject line, so that people
>   can identify this easily, or
> - set up a separate mailing list at W3C
>
> Either way is fine. The former is obviously simpler, but we should do
> one of the two; on that point, the simpler approach is better.
>
> ivan
>
> Kingsley Idehen wrote:
>
>> Leo Sauermann wrote:
>>
>>> Hi Kingsley,
>>>
>>> ok, if you could do it, I would say it's a great thing if you start
>>> hosting the syndication service.
>>>
>>> Perhaps Ivan will later host it on W3C, or not; that can change.
>>>
>>> Can you guarantee to host this service for the next years?
>>>
>> Leo,
>>
>> Of course.
>>
>> Remember, there will always be routes to an RDF dump. In short, this
>> data should be exposed as is common practice re. the Linking Open
>> Data project.
>>
>> I think the domain itself is what's more important, such that should
>> anything happen (none of us has absolute control of the future) the
>> recovery comes down to:
>> 1. changing the domain server associated with the domain
>> 2. reloading the RDF archive
>>
>> I am also hoping that out of this effort we can commence the
>> construction of a replication and synchronization protocol that
>> enables others to participate in the hosting. A kind of DNS for the
>> Data Web that facilitates traversal of Linked Data and dereferencing
>> of associated URIs :-)
>>
>> Kingsley
>>
>>> best
>>> Leo
>>>
>>> It was Kingsley Idehen who said at the right time 21.03.2007 20:20
>>> the following words:
>>>
>>>> Leo Sauermann wrote:
>>>>
>>>>> Hi Kingsley,
>>>>>
>>>>> ok, cool.
>>>>>
>>>>> Could you set up the information gatherer then?
>>>>>
>>>>> I would guess it should do the following:
>>>>> * a web interface with a simple form "enter your URL & email
>>>>>   address, confirm that the content is Creative Commons"
>>>>> * a database storing the web URLs to crawl & the e-mail address
>>>>>   of the submitter
>>>>> * a SPARQL endpoint for the data
>>>>>
>>>>> all implemented in an "open source" way... is it open source?
>>>>>
>>>> Virtuoso (the database engine for SQL, RDF, and XML) is open source.
>>>>
>>>> OpenLink Data Spaces (what I use to subscribe to all the feeds) is
>>>> also open source.
>>>>
>>>> Here is my typical workflow:
>>>>
>>>> 1. Pre RSS 1.0/RSS 2.0/Atom ubiquity - bookmarked everything I came
>>>>    across via my browser
>>>> 2. Post the above - subscribed to everything using my feed aggregator
>>>> 3. With the advent of SPARQL and "on the fly" transformation of
>>>>    RSS/Atom into RDF - simply bookmark data sources
>>>>
>>>> Steps 1-3 are interactions via application modules in ODS (each tab
>>>> represents a module and an actual container within a data space).
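
The gatherer Leo sketches further up in this thread (a submission form
taking a URL, an e-mail address and a Creative Commons confirmation, a
database of URLs to crawl, and a SPARQL endpoint over the harvested
data) could be prototyped roughly along the lines of the minimal Python
sketch below. The table, function and file names are purely
illustrative assumptions, not part of ODS or of any existing service:

import sqlite3

def init_store(path="gatherer.db"):
    # One table holding the submitted source URL, the submitter's
    # e-mail address, and the Creative Commons confirmation flag.
    con = sqlite3.connect(path)
    con.execute("""CREATE TABLE IF NOT EXISTS submissions (
                       url TEXT PRIMARY KEY,
                       email TEXT NOT NULL,
                       cc_confirmed INTEGER NOT NULL)""")
    con.commit()
    return con

def submit(con, url, email, cc_confirmed):
    # Refuse sources whose submitter has not confirmed the CC licence.
    if not cc_confirmed:
        raise ValueError("content must be confirmed as Creative Commons")
    # INSERT OR REPLACE lets a submitter re-register a URL, so
    # previously crawled RDF can later be deleted and replaced with a
    # new version, as asked below in this thread.
    con.execute("INSERT OR REPLACE INTO submissions VALUES (?, ?, 1)",
                (url, email))
    con.commit()

def urls_to_crawl(con):
    # The list the actual gatherer/crawler would work through.
    return [row[0] for row in con.execute("SELECT url FROM submissions")]

if __name__ == "__main__":
    con = init_store(":memory:")
    submit(con, "http://myopenlink.net:8890/dataspace/kidehen",
           "kidehen@openlinksw.com", True)
    print(urls_to_crawl(con))

A real gatherer would of course sit behind the web form and hand the
crawled RDF to whatever store (Virtuoso or otherwise) exposes the
SPARQL endpoint.
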
>>>>
>>>> When building ODS we took the following route (because of our DBMS
>>>> and middleware heritage):
>>>> M - a hybrid data store for SQL, XML, and RDF (Virtuoso)
>>>> C - application logic for Data Space applications (data CRUD), all
>>>>     of which is Web Services based (SOAP or REST)
>>>> V - the UI (where we are weakest because of the make-up of our
>>>>     development resources); templating is via an XSLT- and
>>>>     CSS-based template engine
>>>>
>>>> ODS for SWEO can sit anywhere (as I've said from the get-go).
>>>> During the embryonic stages we can host it, but note that
>>>> relocation is trivial because all of the data is in a DBMS file
>>>> (a single or striped file).
>>>>
>>>> Links:
>>>>
>>>> 1. ODS - http://virtuoso.openlinksw.com/wiki/main/Main/Ods
>>>> 2. Virtuoso (Open Source Edition) -
>>>>    http://virtuoso.openlinksw.com/wiki/main/
>>>> 3. Live ODS instances - http://myopenlink.net:8890/ods (just
>>>>    register and play with workflow items 1-3) or
>>>>    http://demo.openlinksw.com/ods (the endpoints are /isparql and
>>>>    /sparql re. SPARQL)
>>>>
>>>> Kingsley
>>>>
>>>>> what about updates - can the system delete crawled RDF and replace
>>>>> it with a new version?
>>>>>
>>>>> best
>>>>> Leo
>>>>>
>>>>> It was Kingsley Idehen who said at the right time 21.03.2007 13:24
>>>>> the following words:
>>>>>
>>>>>> Leo Sauermann wrote:
>>>>>>
>>>>>>> Hello information gatherers,
>>>>>>>
>>>>>>> I looked at our progress and tried to plan the next steps.
>>>>>>> There are a few things to do, and based on your previous
>>>>>>> interests and suggestions, I tried to guess who may feel
>>>>>>> responsible for these next steps.
>>>>>>>
>>>>>>> Immediate steps (= should start soon, like this week)
>>>>>>>
>>>>>>> * We rewrite the DataVocabulary page into a "How to prepare your
>>>>>>>   data in a SWEO-friendly format" guide for people who run
>>>>>>>   websites publishing information (Leo, Ivan, Pasquale, Bengee?)
>>>>>>>
>>>>>>> * We implement an information gatherer that can read the data
>>>>>>>   formats listed on the DataVocabulary page as it is now
>>>>>>>   (Kingsley Idehen - you offered to do this)
>>>>>>>
>>>>>>> Kingsley, what would you need to start writing this gatherer? I
>>>>>>> would say we do not "crawl" but let users submit URLs of
>>>>>>> sources; when submitting a URL they have to guarantee that all
>>>>>>> content at this URL is Creative Commons and can be republished
>>>>>>> by us under CC (to be on the safe side...?)
>>>>>>>
>>>>>> Leo,
>>>>>>
>>>>>> This is a non-issue if the data is in RDF, RSS, or Atom.
>>>>>>
>>>>>> Quick example:
>>>>>>
>>>>>> Use the following as an RDF data source URI:
>>>>>> http://myopenlink.net:8890/dataspace/kidehen
>>>>>>
>>>>>> SPARQL queries:
>>>>>>
>>>>>> // List the concepts in the data space
>>>>>> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
>>>>>>
>>>>>> SELECT DISTINCT ?Concept
>>>>>> FROM <http://myopenlink.net:8890/dataspace/kidehen>
>>>>>> WHERE {
>>>>>>   [] rdf:type ?Concept .
>>>>>> }
>>>>>>
>>>>>> // Get a dump from a specific bookmark data space container
>>>>>> SELECT ?s ?p ?o
>>>>>> FROM <http://myopenlink.net/dataspace/kidehen/bookmark/KingsleyBookmarks/sioc.rdf>
>>>>>> WHERE {
>>>>>>   ?s ?p ?o .
>>>>>> }
>>>>>>
>>>>>> You can use the following tools:
>>>>>>
>>>>>> 1. http://demo.openlinksw.com/isparql or
>>>>>>    http://myopenlink.net:8890/isparql (or the raw /sparql
>>>>>>    endpoint in either case). Note that CONSTRUCT, ASK, and
>>>>>>    DESCRIBE are supported.
>>>>>>
>>>>>> 2. RDF browsers such as DISCO [1], Tabulator [2], or the OpenLink
>>>>>>    RDF Browser [3]
>>>>>>
>>>>>> Links:
>>>>>>
>>>>>> 1. http://sites.wiwiss.fu-berlin.de/suhl/bizer/ng4j/disco/
>>>>>> 2. http://www.w3.org/2005/ajar/tab
>>>>>> 3. http://demo.openlinksw.com/DAV/JS/tests/rdfbrowser/index.html
>>>>>>
>>>>>> Kingsley
>>>>>>
>>>>>>> * Ivan: what do you think, shouldn't the information
>>>>>>>   gatherer/syndication service run on some W3C server? It does
>>>>>>>   not need that much processing power and it won't cause too
>>>>>>>   much traffic, but it would be good to have an "official W3C"
>>>>>>>   creative-commons-free-for-all syndication about the Semantic
>>>>>>>   Web.... Kingsley/Ivan: can we do this?
>>>>>>>
>>>>>>> then:
>>>>>>> * We contact the people already managing data sources (listed
>>>>>>>   on the DataSources page) and ask them to provide their data
>>>>>>>   as RDF and submit their URL to the information gatherer.
>>>>>>>   (I could do that)
>>>>>>>
>>>>>>> see
>>>>>>> http://esw.w3.org/topic/SweoIG/TaskForces/InfoGathering/DataSources
>>>>>>>
>>>>>>> Mid-term (should start this month)
>>>>>>> * We think about how to make a portal website that shows the
>>>>>>>   gathered/syndicated data in a user-friendly way... (we have to
>>>>>>>   think about who does it, what W3C's role is, what requirements
>>>>>>>   we have for such a website, etc.)
>>>>>>>
>>>>>>> best
>>>>>>> Leo
>>>>>>>
>
> --
>
> Ivan Herman, W3C Semantic Web Activity Lead
> URL: http://www.w3.org/People/Ivan/
> PGP Key: http://www.cwi.nl/%7Eivan/AboutMe/pgpkey.html
> FOAF: http://www.ivan-herman.net/foaf.rdf

--
____________________________________________________
DI Leo Sauermann       http://www.dfki.de/~sauermann

Deutsches Forschungszentrum fuer
Kuenstliche Intelligenz DFKI GmbH
Trippstadter Strasse 122
P.O. Box 2080          Fon:  +49 631 20575-116
D-67663 Kaiserslautern Fax:  +49 631 20575-102
Germany                Mail: leo.sauermann@dfki.de

Geschaeftsfuehrung:
Prof. Dr. Dr. h.c. mult. Wolfgang Wahlster (Vorsitzender)
Dr. Walter Olthoff
Vorsitzender des Aufsichtsrats:
Prof. Dr. h.c. Hans A. Aukes
Amtsgericht Kaiserslautern, HRB 2313
____________________________________________________
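
For reference, the concept-listing query quoted above can be run
against one of the /sparql endpoints mentioned in the thread without
any special tooling. The minimal Python sketch below assumes the
endpoint accepts a "query" parameter plus a "format" parameter for JSON
results (typical of Virtuoso /sparql endpoints, but an assumption
here), and that the demo endpoint is still reachable:

import json
import urllib.parse
import urllib.request

# Endpoint URL taken from the tool list in the thread; its availability
# is an assumption.
ENDPOINT = "http://demo.openlinksw.com/sparql"

# The concept-listing query quoted by Kingsley above.
QUERY = """
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT DISTINCT ?Concept
FROM <http://myopenlink.net:8890/dataspace/kidehen>
WHERE { [] rdf:type ?Concept . }
"""

def run_select(endpoint, query):
    # Plain HTTP GET with the query string; results are requested as
    # SPARQL JSON (assumption: the endpoint honours "format").
    params = urllib.parse.urlencode({
        "query": query,
        "format": "application/sparql-results+json",
    })
    with urllib.request.urlopen(endpoint + "?" + params, timeout=30) as resp:
        return json.load(resp)

if __name__ == "__main__":
    for binding in run_select(ENDPOINT, QUERY)["results"]["bindings"]:
        print(binding["Concept"]["value"])

The same pattern works for the second (dump) query and for the
CONSTRUCT, ASK, and DESCRIBE forms mentioned above.
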
Received on Friday, 23 March 2007 09:05:08 UTC