
Re: Semantic Suggestions please ...

From: Dan Brickley <danbri@danbri.org>
Date: Tue, 30 Sep 2008 16:38:03 +0200
Message-ID: <48E239CB.5070103@danbri.org>
To: neil@oilit.com
Cc: Semantic Web <semantic-web@w3.org>

Neil McNaughton wrote:
> Dear Semantic Websters
> 
> Our website Oil IT Journal www.oilit.com has about a million words of
> reporting on oil and gas IT. It is moderately well organized stuff, but
> there are a lot of 'unstructured' items (such as company names, people and
> products) that I imagine could usefully be tagged somehow for discovery and
> reuse. I was wondering if this could be achieved by something semantic? 
> 
> In the run in to the first Semantic Technology for Energy - Oil and Gas
> (http://www.w3.org/2008/07/ogws-cfp) I would like your suggestions on the
> above in order to see if there is a small (but hopefully killer)
> contribution that we could make to advance this technology.

Hi Neil,

Interesting challenge :)

Can you say a bit more about what structures you have behind the 
scenes? Are there perhaps subsets of an SQL database that could be 
shared? How is the site built and maintained?

Looking, e.g., at http://www.oilit.com/2journal/2index/2peo.htm

I see first of all,

<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">

... which suggests you don't want machines to do anything with this data.
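
As a rough illustration of how a polite crawler might read that tag, here 
is a minimal Python sketch using only the standard library. The class and 
function names are made up for this example, and real crawlers differ in 
how strictly they honour these directives.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            for directive in content.split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())


def robots_directives(html):
    """Hypothetical helper: return the set of robots directives in a page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


page = '<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">'
directives = robots_directives(page)
may_index = "noindex" not in directives
may_follow = "nofollow" not in directives
print(sorted(directives), may_index, may_follow)
```

A crawler that respects the tag would neither index this page nor follow 
its links, which is why the question of permission matters here.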

Then each author/person (are persons topics too, or just authors?) gets 
a link,

<p><b>Select item</b>
<BR><A HREF = "2peo/21.htm">Aamodt, Finn</A>

<BR><A HREF = "2peo/22.htm">Aasheim, Hilda</A>
<BR><A HREF = "2peo/23.htm">Abbot, Dave</A>
<BR><A HREF = "2peo/24.htm">Abbott, David</A>
<BR><A HREF = "2peo/25.htm">Abdalla, Ab</A>
<BR><A HREF = "2peo/26.htm">Abel, Roger</A>
<BR><A HREF = "2peo/27.htm">Abernathy, Steve</A>
<BR><A HREF = "2peo/28.htm">Aberson, John</A>
<BR><A HREF = "2peo/29.htm">Abougoush, Mickey</A>
<BR><A HREF = "2peo/210.htm">Abou-Sayed, Ahmed</A>
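
To show how much structure is already recoverable from that list, here is 
a small Python sketch that turns the links into (name, absolute URL) 
pairs. It assumes the index page keeps the simple <A HREF> pattern shown 
above; the class and function names are invented for illustration, and 
it should only be run against the site with permission.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

INDEX_URL = "http://www.oilit.com/2journal/2index/2peo.htm"


class PersonLinkParser(HTMLParser):
    """Collects (link text, absolute URL) for every <a href=...> element."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.people = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            name = "".join(self._text).strip()
            # Resolve the relative link against the index page URL.
            self.people.append((name, urljoin(self.base_url, self._href)))
            self._href = None


def extract_people(html, base_url=INDEX_URL):
    """Hypothetical helper: list the people linked from an index page."""
    parser = PersonLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.people


sample = '''<p><b>Select item</b>
<BR><A HREF = "2peo/21.htm">Aamodt, Finn</A>
<BR><A HREF = "2peo/210.htm">Abou-Sayed, Ahmed</A>'''

for name, url in extract_people(sample):
    print(name, url)
```

Names come out in "Surname, Forename" order, exactly as on the page; 
splitting them further would be a separate (and less reliable) step.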


If I go to one of these, eg.
http://www.oilit.com/2journal/2index/2peo/210.htm

I see a page listing article(s) by that person, so for Abou-Sayed, Ahmed
we get <BR><A HREF = "../../2article/0603_11.htm" > Sixth Middle East IM 
Forum, Kuwait (March 2006)</A> ie.

http://www.oilit.com/2journal/2article/0603_11.htm

We get basic metadata here,

     <meta name="document-date" content="28 Mar 2006 00:00:00 GMT">
     <TITLE>Sixth Middle East IM Forum, Kuwait (March 2006)</TITLE>
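
That is already enough for a machine to pick up a title and a date per 
article. A minimal Python sketch, assuming every article page carries a 
"document-date" meta tag in the format shown above (the function and 
class names are hypothetical):

```python
from datetime import datetime, timezone
from html.parser import HTMLParser


class ArticleMetaParser(HTMLParser):
    """Pulls the <title> text and the document-date meta value from a page."""

    def __init__(self):
        super().__init__()
        self.title = None
        self.document_date = None
        self._in_title = False
        self._title_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attr = dict(attrs)
            if (attr.get("name") or "").lower() == "document-date":
                self.document_date = attr.get("content")

    def handle_data(self, data):
        if self._in_title:
            self._title_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
            self.title = "".join(self._title_parts).strip()


def article_metadata(html):
    """Hypothetical helper: return {'title': str, 'date': datetime or None}."""
    parser = ArticleMetaParser()
    parser.feed(html)
    parser.close()
    date = None
    if parser.document_date:
        # Assumes the "28 Mar 2006 00:00:00 GMT" layout seen on the site
        # and English month abbreviations (the default C locale).
        date = datetime.strptime(
            parser.document_date, "%d %b %Y %H:%M:%S GMT"
        ).replace(tzinfo=timezone.utc)
    return {"title": parser.title, "date": date}


page = '''<meta name="document-date" content="28 Mar 2006 00:00:00 GMT">
<TITLE>Sixth Middle East IM Forum, Kuwait (March 2006)</TITLE>'''

meta = article_metadata(page)
print(meta["title"], meta["date"].date())
```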

And what looks like an abstract/intro paragraph,

<Font Face="Arial" Size=2><b>
Data management and information management (DM/IM) in the Middle East 
countries  is different. First because it has much more of a production 
focus that in Europe or the USA. Second, because Middle East National 
Oil Companies have taken the long term view. If building a corporate 
data store for fields with hundreds of wells and decades of production 
history means a five year plan, with allocation of people, training and 
finance, then that is what happens. Kuwait Oil Co. (KOC) has over 1,000 
users of its Finder database with projects ongoing for data quality, 
SCADA integration, data mining, decision support and automated data 
capture. Finder database has cornered the data store market for Middle 
East NOCs. This is both a great achievement and a potential 
embarrassment for Schlumberger which is in the process of trying to wean 
its clients off Finder and onto Seabed. An animated debate at the close 
of the conference showed that this will not be easy.
</b></font>


The match against Ahmed Abou-Sayed seems to be based on his being 
quoted. Is the matching/indexing done by hand or by machine?

	"For Ahmed Abou-Sayed (Informateks), ‘data mining is set to become a 
tough competitor for simulation.’"


You're right, there's a lot here to work with. But the current structure 
of the site (markup, frames, etc.) is a little daunting for the 
uninitiated. Could you give some suggestions on how semweb folk might 
explore it? E.g., is it OK to crawl the entire site? Can you make some 
data dumps available, or suggest key URLs to start exploring from?

cheers,

Dan

--
http://danbri.org/
Received on Tuesday, 30 September 2008 14:38:42 GMT
