Re: web to semantic web : an automated approach

Dear all,

Before we can make any sensible comments on the extent to which we can structure information before it is put on the web, or convert it into SW formats, we need to know what is on the web in terms of (raw) information and data, which types of web services access this information, and who the users are.

For the Semantic Web to gain momentum and expand its user base and number of SW compliant web pages, we need to do a survey.

For all the hype the Semantic Web has been getting, we must accept the fact that some (quite a bit, actually) of the (raw) data and information will not lend itself to useful content creation.

The question that should be answered is what would be useful to convert into SW compliant format, in what form, and for whom.

On the latter issue more effort should be focused.

Some key words here are open archives (library exchange), open access (to digital repositories), open licenses and open source applications, open access publications, web portals.

Rainbow Warriors International, Ekolibrium Foundation and WiserEarth are just three examples of non-profits in a specific field, namely sustainable development, who believe a people-centered approach is necessary for expanding the Semantic Web.

I am glad to see Drupal embrace SW. Last year we asked Drupal to consider incorporating SW, in a long email detailing the convergence of (new) technologies: 3G and 4G GSM mobile network platforms on GSM phones, and search engine technologies. (We recently made a submission to Google's www.project10tothe100.com in which we suggested the GOLDEN TIP, i.e. expanding the search heuristics in the search algorithms to include links to tagged data, which means Google would be able to filter for pages containing SW content.)

We have bounced similar ideas off companies like Sybase and the Eclipse consortium (www.eclipse.org).

There are also barriers and obstacles to consider: specialty print publications that cater to professionals, libraries, academic institutions, research institutes, etc. potentially stand to lose, as do printed newspaper and magazine publishers.

As a sustainable development organization we are simply trying to rally as much support for the broadest possible platform utilizing the semantic web for a common good.

In our case we are trying to empower all stakeholders in sustainable development worldwide through the use of ICT (with mobile telephony and the internet as spearhead technology platforms).

Why sustainable development? Because it has the power of UN consensus behind it, which makes it easier to persuade corporate players to throw resources at it.

Companies like Microsoft, Sun Microsystems, Sybase, Oracle, and the developers of web browsers and internet applications widely used by internet users need to come onboard.

Then we will have the thrust to get things moving.

I am betting that Google, browser developers and open source networks will lead the way, with academia and libraries following close behind.

But before we get into anything, we need raw numbers and answers to the question at the beginning of this email.

Milton Ponson
GSM: +297 747 8280
Rainbow Warriors Core Foundation
PO Box 1154, Oranjestad
Aruba, Dutch Caribbean
www.rainbowwarriors.net (under revision)
Project Paradigm: A structured approach to bringing the tools for sustainable development to all stakeholders worldwide
www.projectparadigm.info (under construction)
NGO-Opensource: Creating ICT tools for NGOs worldwide for Project Paradigm
www.ngo-opensource.org (proposed project)
MetaPortal: providing online access to web sites and repositories of data and information for sustainable development
www.metaportal.info (proposed project)
SemanticWebSoftware, part of NGO-Opensource to enable SW technologies in the Metaportal project (proposed site: www.semanticwebsoftware.org)


--- On Mon, 10/20/08, Andreas Langegger <al@jku.at> wrote:
From: Andreas Langegger <al@jku.at>
Subject: Re: web to semantic web : an automated approach
To: "Semantic Web" <semantic-web@w3.org>, semantic_web@googlegroups.com
Date: Monday, October 20, 2008, 10:56 AM

I've always been a member of the pragmatics camp. Scepticism helps indeed, but it doesn't help us move forward.
I think the idea of a global "Semantic Web" was, and still is, tempting, and many bloggers, columnists, and smart and visionary people like to talk about web-scale reasoning. Some even said the SW will replace the traditional Web, or Web 2.0... This is so stupid. The most important thing to me is the SW standards, the layer cake: the possibility to share and interlink information where it's appropriate. It's just about an open standard for data.
For the last 10 years everybody has been talking about open protocols and Web services. But what's actually communicated between endpoints is data. XML/XML Schema won't be replaced either. But if you want to interlink data on the Web, that is not feasible with XML alone. This is where SW standards rule.
Because SW research is an open and democratic process, there are so many different viewpoints and interpretations of what it actually is. Many have stopped using ontologies and reasoners altogether and just use RDF and maybe RDFS, even Ora Lassila, co-author of the original Scientific American article in 2001 [1], as far as I know. Beyond subsumption based on class hierarchies, reasoning probably doesn't work for everyday information, but it works very, very well for many applications, mainly from the life sciences. This is what Web 2.0 people and all those sceptical about reasoning usually don't see! They see blogs, FOAF, vCards, etc. Here the possibility for RDF-only interlinking is great, and I'm sure it will be successful, and I'm sure that other CMSs besides Drupal have adopted or will soon adopt RDF features!
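To illustrate what RDF-only interlinking means in practice, here is a minimal sketch that uses plain Python tuples instead of an RDF library. The example.org URIs and the helper names merge and describe are made up for illustration; only the FOAF and Dublin Core property URIs are real vocabulary terms. Two independently published sets of triples become linked simply because they use the same URI for the same person, with no ontology or reasoner involved.

```python
# Triples as plain (subject, predicate, object) tuples.
# All example.org URIs are hypothetical; FOAF and DC Terms are real vocabularies.
FOAF = "http://xmlns.com/foaf/0.1/"
DCT = "http://purl.org/dc/terms/"

# Data as a blog might publish it.
blog_data = {
    ("http://example.org/blog/post1", DCT + "creator", "http://example.org/people/andy"),
}

# Data as a separate profile site might publish it.
profile_data = {
    ("http://example.org/people/andy", FOAF + "name", "Andy"),
    ("http://example.org/people/andy", FOAF + "knows", "http://example.org/people/ravi"),
}

def merge(*graphs):
    """Merging RDF graphs is just a set union of their triples."""
    return set().union(*graphs)

def describe(graph, subject):
    """Return all (predicate, object) pairs known about a subject."""
    return sorted((p, o) for s, p, o in graph if s == subject)

web_of_data = merge(blog_data, profile_data)

# Follow the link from the blog post to its creator, then look the creator up.
creator = next(
    o for s, p, o in web_of_data
    if s == "http://example.org/blog/post1" and p == DCT + "creator"
)
print(describe(web_of_data, creator))
```

The shared URI does all the work here; nothing beyond plain RDF triples is assumed.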
Nobody will ever demand that 100% of all information on the Web be RDFized... think pragmatically!
Regards,
AndyL
[1] http://www.sciam.com/article.cfm?id=the-semantic-web

On Oct 20, 2008, at 12:12 PM, रविंदर ठाकुर (ravinder thakur) wrote:
>>>>This is indeed an essential point in the development of the Semantic Web. I'm mostly in the "it'll happen" camp with regards to people creating semantic content. There are two main sources, one is that they say that 70% of the data on the web is already in some structured form, thus what's needed is to clarify what that structure means.

I was in the "it will happen" camp, but nothing far-reaching seems to be happening, so I am out. I would say that most (90%) of the data out there is unstructured. Also, most of the structured data is specific to companies, and they won't share it. There are people writing blogs, Wikipedia, news websites producing content continuously, people reviewing products and putting their opinions online; the list of unstructured data is endless and will continue to grow with increasing Internet penetration in third-world countries. To assume that all users will manually convert this data to structured form seems too far-fetched. To assume that the information put online by these end users is of less use than, say, Wikipedia/DBpedia would be a horrible mistake. Even if we have large amounts of data, someone needs to combine this vast amount of RDF/OWL data and create a global graph interlinking all of it. (BTW, I see some serious ontology issues anyone is likely to hit with this approach.)
 
>>>>Also, I think IBM's SUKI http://www.research.ibm.com/UIMA/SUKI/ might be of interest.

I have used UIMA, but it's not a one-man-army job. It's just a framework, and there is a hell of a lot still to be done on top of it, e.g. writing domain-specific components.



>>>>A3 is cumbersome and may produce wrong links and information - a nightmare without implicit support for provenance. In corporate environments A3 is already very popular, but in the broader Web-scale I'm a bit sceptical this will work well. What do you think?

I am pinning a lot of hope on the progress we have made in NLP, and no doubt NLP will continue to improve in the near future. Currently, to alleviate the wrong linking/information problem, I think redundancy of information will play an important role. If we have 10 sources of the same piece of information and 6 NLP parsers give one view and the remaining 4 give another, I am pretty sure the one on which 6 agree will be the right one. Also, we don't have to be 100% right (and certainly not in the beginning), since (other than your boss :) ) nobody is 100% right :)
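A minimal sketch of this redundancy idea, assuming each parser's output for a given fact has already been normalized to a comparable string. The function name majority_view and the sample values are hypothetical, and agreement among parsers only makes a view more likely to be right, not certain.

```python
from collections import Counter

def majority_view(extractions):
    """Return (value, share_of_votes) for the most common extraction.

    Returns None when there is no input or when the top two views tie,
    since redundancy gives no signal in those cases.
    """
    counts = Counter(extractions)
    if not counts:
        return None
    ranked = counts.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None
    value, votes = ranked[0]
    return value, votes / len(extractions)

# Hypothetical data: ten sources, six parsers extract one view, four the other.
views = ["born_in:Vienna"] * 6 + ["born_in:Linz"] * 4
print(majority_view(views))  # ('born_in:Vienna', 0.6)
```

The returned share can double as a rough confidence score, so weakly supported links can be held back instead of being published as facts.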
 

>>>>Some CMS like Drupal have already understood this and are rapidly moving towards exposing their content as RDF data

Here's the problem. Drupal is exposing _the data stored in Drupal_. Do we expect everyone on the web to use Drupal? No. What happens to information on times.com, blogspot.com, googlegroups.com or kashmirtimes.com? The Semantic Web is not about converting someone's data and exposing it with a semantic view. It's about the _whole_ data out there on the web, then building a web of semantic links on top of that, then doing reasoning on top of that, etc.
 

Thanks for initiating the discussion anyways. Keep it coming :)

 

Web of Data Practitioners Days / Oct 22-23 / Vienna
http://www.webofdata.info
----------------------------------------------------------------------
Dipl.-Ing.(FH) Andreas Langegger
Institute for Applied Knowledge Processing
Johannes Kepler University Linz
A-4040 Linz, Altenberger Straße 69
http://www.langegger.at


 

Received on Monday, 20 October 2008 14:18:30 UTC