Re: updated InfoGathering, proposing a portal as a solution - do you agree on a portal? from Ivan Herman on 2007-02-17 (public-sweo-ig@w3.org from February 2007)

From: Ivan Herman <ivan@w3.org>
Date: Sat, 17 Feb 2007 11:05:35 +0100
To: Leo Sauermann <leo.sauermann@dfki.de>
CC: W3C SWEO IG <public-sweo-ig@w3.org>
Message-ID: <45D6D36F.4030707@w3.org>
Leo Sauermann wrote:
> Hi Ivan,
> 
> Some questions I just don't understand; perhaps I am missing the context. I
> will answer the ones I understand and give an example of how the portal
> may look.
> 
[snip]
> 
>>"Being the central repository for [...] has been painful to keep up
>>with.  We have had to start blocking and educating about catalogues and
>>caching." [This may not be directly relevant for us, but may raise a
>>flag nevertheless]
>>  
>>
> We will provide a portal integrating data and providing user interfaces
> to edit the most important information resources - so the pain of keeping
> up to date should be passed on to people like Dave Beckett, who keeps
> his list of Tools anyway (he would then either use the portal to manage
> the list or publish his data as RDF/XML)
> 

I think the pain the question was referring to was (also) related to the
fact that lots of hits come in to the server, which may slow down response
times, etc. We have hit this issue at W3C before, so we just have to be
careful about this possible concern.
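
One common way to reduce this kind of load is HTTP validation caching: the server labels each response with an ETag, and clients that already hold the current copy get a short "304 Not Modified" instead of the full body. The following is only a minimal sketch of that idea in Python, not a description of how the W3C servers are set up; the function names `make_etag` and `respond` and the one-hour `max-age` are assumptions made for illustration.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Derive an ETag from the response body (quoted, as HTTP requires)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, payload) for a GET request.

    Simplification: a real server must also handle lists of ETags,
    the "*" wildcard and weak validators in If-None-Match.
    """
    etag = make_etag(body)
    # max-age=3600 is an assumed value: caches may reuse the copy for one hour.
    headers = {"ETag": etag, "Cache-Control": "max-age=3600"}
    if if_none_match is not None and if_none_match.strip() == etag:
        # The client already has the current version: send no body.
        return 304, headers, b""
    return 200, headers, body
```

A feed reader that sends back the ETag it received would keep getting the empty 304 response until the feed content actually changes.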


>>"Which brings Swoogle [1] to mind.  One test of SWEO's thinking, Ivan,
>>is to ask them what Swoogle is not doing well enough or should be doing
>>better -- i.e. how this new proposal-under-development would improve on
>>Swoogle." [1] http://swoogle.umbc.edu/ (thinking of the possibility of
>>people just 'publishing' their data in some vocabularies, and let
>>existing crawlers pick up that data)
>>  
>>
> Swoogle provides nothing, just this:
> search.
> 
> the portal will provide things like you would typically find on portals
> such as this:
> http://www.xmlhack.com/
> 

I think the reaction came from your use of the term 'crawler' on the page.
Actually, it may be an idea to use Swoogle's crawler...
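
Whichever crawler ends up being used, the narrower approach discussed further down in this thread (fetch only URLs that were explicitly registered, once a day) is simple enough to sketch. The Python below is an illustration under assumptions, not a proposed implementation: `harvest` and `http_fetch` are hypothetical names, the daily scheduling (for example via cron) is left out, and no RDF parsing is attempted.

```python
from typing import Callable, Dict, Iterable, Tuple
from urllib.request import urlopen


def http_fetch(url: str, timeout: float = 10.0) -> bytes:
    """Download one registered document over HTTP."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def harvest(
    registered_urls: Iterable[str],
    fetch: Callable[[str], bytes] = http_fetch,
) -> Tuple[Dict[str, bytes], Dict[str, str]]:
    """Fetch every registered URL once.

    Returns two dictionaries: the downloaded documents keyed by URL,
    and an error message for each URL that could not be fetched.
    """
    documents: Dict[str, bytes] = {}
    failures: Dict[str, str] = {}
    for url in registered_urls:
        try:
            documents[url] = fetch(url)
        except Exception as error:
            # One unreachable source must not stop the whole run.
            failures[url] = str(error)
    return documents, failures
```

Passing the fetch function as a parameter keeps the sketch testable without network access; a real run would simply call `harvest(list_of_registered_urls)`.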

> or here,  a mockup of what I have in mind
> http://www.dfki.uni-kl.de/~sauermann/2007/02/sweomockup/
> 
>>In general, once we have a somewhat clearer idea of the architecture,
>>and regardless of the details of the tools, we should certainly try to
>>make an assessment on long term on what this beast would lead to:
>>
>>- what type and size of traffic we would expect on such portal
>>  
>>
> ~ 5 people editing per day
> ~ 100 visitors per day; if it explodes, up to 10,000 visits per day
> (depends on whether the Semantic Web is a success).
> 
> About 5,000 people subscribing to the RSS feeds (everyone from our
> community)! This could be heavy, but is easy to cache.
> 

These are important figures. We should keep them in mind before we try to
convince any physical location (W3C or otherwise) to host the service.
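
To make these figures easier to reason about, here is a rough back-of-the-envelope calculation in Python. It is only a sketch: the 5,000 subscribers and 10,000 visits come from Leo's estimates above, while the hourly polling interval and the simplification of one request per visit are assumptions of this example, and the function names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def feed_requests_per_day(subscribers: int, polls_per_day: int) -> int:
    """Requests caused by feed readers polling the RSS feed."""
    return subscribers * polls_per_day


def average_requests_per_second(requests_per_day: int) -> float:
    """Average rate only; real traffic has peaks well above this."""
    return requests_per_day / SECONDS_PER_DAY


# Assumption: every subscriber's reader polls once per hour.
feed = feed_requests_per_day(subscribers=5000, polls_per_day=24)

# Assumption: one request per visit, using the optimistic visit estimate.
visits = 10_000

total = feed + visits
print("feed requests/day:", feed)
print("total requests/day:", total)
print("average requests/second:", round(average_requests_per_second(total), 2))
```

Under these assumptions the feeds account for 120,000 of roughly 130,000 requests per day, about 1.5 requests per second on average, so feed polling rather than human visitors would dominate the load.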


>>- what type of update frequency is needed
>>  
>>
> from us, not much. We should attract people to edit data themselves.
> One update per week may be enough.
> 
>>- what is the necessary manpower requirement to keep it up-to-date
>>  
>>
> probably 1 man; hopefully part-time is enough
> 

So, here comes the dirty question: who will be that person once SWEO is
over? Do we expect the W3C staff to maintain it?

[Leo, sorry if I sound pushy with these remarks, but I think we have to
be *very* clear with all these details before we move on. *Nothing*
personal or against the project!]

>>- what type of extra facility is necessary (eg: you say we would have
>>some sort of a login facility for people making comments and rating;
>>what type of extra facility would we need for that? OpenID, etc?)
>>  
>>
> normal signup: give me your e-mail address, enter a username/password,
> you are in.
> like any Web 2.0 app. Nobody uses OpenID there.
> 

Let us put OpenID aside for the time being (I would love to see OpenID
more widely used, but that is another matter). That means that the
infrastructure you choose for the site will have to provide these
sign-up and login features itself. Just pointing that out.

>>etc. Which means that we should have a clear idea about the architecture...
>>  
>>
> Before the architecture, I would define the user experience.
> Features first, then architecture.
> 
> I made a 15 minute mockup, see here:
> http://www.dfki.uni-kl.de/~sauermann/2007/02/sweomockup/
> 
> can we collaboratively edit something like this?
> (does ESW allow HTML?)
> 

Unfortunately, no...

Cheers

Ivan

> best
> Leo
> 
>>Ivan-with-his-sort-of-manager's-hat-on:-)
>>
>>
>>P.S. Earlier today, for a very different reason, I quoted Hofstadter's
>>Law in another mail:
>>
>>[[[
>>Hofstadter's Law: It always takes longer than you expect, even when you
>>take into account Hofstadter's Law
>>]]]
>>(Douglas R Hofstadter: Gödel, Escher, Bach: An eternal golden braid,
>>Penguin Books, 1980)
>>
>>We should realize that the same law also applies for manpower and
>>machine requirements:-)
>>
>>
>>
>>
>>Leo Sauermann wrote:
>>  
>>
>>>Hi Ivan, SWEO,
>>>
>>>I forgot to mention:
>>>the whole idea of making a PORTAL website is a bit daring, and I think I
>>>made a move forward here that may come too fast for other SWEO members.
>>>
>>>So before going into the details, I want to make clear:
>>>* this is going to be a Web 2.0-like portal website, which may take
>>>considerable effort to do, but may be really useful once done. Does SWEO
>>>agree that we should think further in this direction? *
>>>
>>>If yes, we should search for strong implementation partners, that invest
>>>more than the 1/2 day of work we do at the moment ... we would perhaps
>>>need some manpower from our institutions/w3c members or from other
>>>companies.
>>>
>>>answers below:
>>>
>>>And so it came to pass that, at the appointed time of 16.02.2007 11:01,
>>>Ivan Herman wrote the following:
>>>
>>>    
>>>
>>>>Hi Leo,
>>>>
>>>>I made some edits for items that are just facts. I prefer to discuss
>>>>others before I make edits:
>>>>
>>>>- We also started to collect references to events (conference,
>>>>workshops). What about general presentations on SW?
>>>> 
>>>>
>>>>      
>>>>
>>>YES, correct.
>>>added this to the ontology at the bottom:
>>> * conference/event - conferences or events where you can learn about
>>>the Semantic Web
>>>
>>>    
>>>
>>>>- I think the crawling should also include Turtle from the start.
>>>>Actually, by the time we get there, GRDDL will be pretty much done, I
>>>>think it should be considered in the first round!
>>>> 
>>>>
>>>>      
>>>>
>>>possible, but I think that's an easy detail to add.
>>>
>>>    
>>>
>>>>- Why RSS 0.9 and not 1.0?
>>>> 
>>>>
>>>>      
>>>>
>>>which is the RDF version?
>>>I didn't look so close, I meant the one with RDF in it
>>>
>>>    
>>>
>>>>For the technical aspect:
>>>>
>>>>The idea of using a crawler may lead to all kinds of technical problems,
>>>>though: efficiency, machine usage, etc. I would think that, at least in
>>>>the first round, we should restrict ourselves to the collection and
>>>>display of data that are 'registered' to us using RDF.
>>>>
>>>>      
>>>>
>>>I intended to crawl only registered URLs, as many services do today:
>>>you have a form where you post your file URL, and the file gets fetched
>>>(wget) daily.
>>>
>>>
>>>    
>>>
>>>>I think, in this
>>>>respect, being prepared for GRDDL may be crucial: people may then
>>>>continue using their HTML pages if they want, they could then annotate
>>>>their pages directly, and we could get access to the RDF data. Caveat:
>>>>the ontology we develop will have to have a microformat version and we
>>>>would have to have a corresponding XSLT script at our disposal, too. In
>>>>the same way, we should be prepared for RDFa in the first round, if people
>>>>prefer to use that (and RDFa becomes mature). We should not take sides
>>>>by using only one of those.
>>>> 
>>>>
>>>>      
>>>>
>>>I think this is far too complicated at the moment, XML is ok.
>>>people understand how RSS works, and cope with it.
>>>all this XML/XSLT for GRDDL is too complicated for now
>>>
>>>    
>>>
>>>>There is an issue whether our portal would regularly 'download' the
>>>>referenced RDF data into our own database (say, once a day), or whether
>>>>we would always go out and access it on the fly. Having a gathering
>>>>done once a day would mean that we could offer one big RDF dataset for the
>>>>whole collection right away, possibly with a SPARQL interface to it, too.
>>>> 
>>>>
>>>>      
>>>>
>>>yes, that's cool and exactly what I had in mind
>>>
>>>    
>>>
>>>>I will ask our systems team and other team members whether and how
>>>>we could host the final system on our site. It is not always obvious...
>>>> 
>>>>
>>>>      
>>>>
>>>I guess so, see the comments above.
>>>
>>>If not, I think we may opt for a "partner model" with a W3C member
>>>(like, say Oracle or DFKI) hosting this service as a donation.
>>>But I don't know what this implies politically
>>>
>>>best
>>>Leo
>>>
>>>    
>>>
>>>>Ivan
>>>>
>>>>
>>>>
>>>>Leo Sauermann wrote:
>>>> 
>>>>
>>>>      
>>>>
>>>>>Hi SWEO,
>>>>>
>>>>>I analysed the information gathering wiki page and have rewritten it
>>>>>completely, doing much of the long-needed editing.
>>>>>I dumped many todos and read all suggestions made. I summed up
>>>>>everything, and gave it some order.
>>>>>
>>>>>http://esw.w3.org/topic/SweoIG/TaskForces/InfoGathering
>>>>>
>>>>>As a result, I realized that we need a portal website to achieve our
>>>>>goals. The goals were to "do something useful that prolongs SWEO, where
>>>>>important information (popular, good ranked) can be found, and all
>>>>>information can be found".
>>>>>Also, several people suggested to have many people involved - and to
>>>>>reuse existing sources.
>>>>>
>>>>>I took all this and defined a "Semantic Web Information Portal" that
>>>>>gathers the Information Resources.
>>>>>
>>>>>Ivan, Pasquale, everyone in this task-force:
>>>>>!! today/tomorrow would be the perfect moment for you to look at this
>>>>>and edit freely !!
>>>>>
>>>>>SWEO: once the task force members are done, we present the result in the
>>>>>next telco.
>>>>>
>>>>>best
>>>>>Leo
>>>>>
>>>>>   
>>>>>
>>>>>        
>>>>>
>>>> 
>>>>
>>>>      
>>>>
>>>-- 
>>>____________________________________________________
>>>- DFKI bravely goes where no man has gone before -
>>>We will move to our new building by end of February 2007.
>>>
>>>The new address will be as follows:
>>>    Trippstadter Straße 122
>>>    D-67663 Kaiserslautern
>>>
>>>My phone/fax numbers will also change:
>>>Phone:    +49 (0)631 20575 - 116
>>>Secr.:    +49 (0)631 20575 - 101
>>>Fax:      +49 (0)631 20575 - 102
>>>Email remains the same
>>>____________________________________________________
>>>DI Leo Sauermann       http://www.dfki.de/~sauermann 
>>>Deutsches Forschungszentrum fuer Kuenstliche Intelligenz GmbH
>>>Trippstadter Strasse 122
>>>P.O. Box 2080          Fon:   +49 631 205-3503
>>>D-67663 Kaiserslautern Fax:   +49 631 205-3472
>>>Germany                Mail:  leo.sauermann@dfki.de
>>>____________________________________________________
>>>Management:
>>>Prof. Dr. Dr. h.c. mult. Wolfgang Wahlster (Chairman)
>>>Dr. Walter Olthoff
>>>
>>>Chairman of the Supervisory Board:
>>>Prof. Dr. h.c. Hans A. Aukes
>>>
>>>District Court (Amtsgericht) Kaiserslautern, HRB 2313
>>>____________________________________________________
>>>
>>>    
>>>
>>
>>  
>>
> 
> 

-- 

Ivan Herman, W3C Semantic Web Activity Lead
URL: http://www.w3.org/People/Ivan/
PGP Key: http://www.cwi.nl/%7Eivan/AboutMe/pgpkey.html
FOAF: http://www.ivan-herman.net/foaf.rdf
Received on Saturday, 17 February 2007 10:10:50 UTC