FW: Semantic Suggestions please ... from Neil McNaughton on 2008-10-02 (semantic-web@w3.org from October 2008)

From: Neil McNaughton <neil@oilit.com>
Date: Thu, 2 Oct 2008 11:07:23 +0200
To: <semantic-web@w3.org>, <public-rdf-in-xhtml-tf-request@w3.org>
Message-ID: <0A5F294DFF74455581B357CDE52B30D5@SONY>
Sorry - should have copied this to the group...

 

Dan,

 

Subject: Re: Semantic Suggestions please ...

 

Can you say a bit more about what structures you do have behind the 

scene? Are there perhaps subsets of an SQL database that could be 

shared? How is the site built / maintained?

 

Site is rebuilt monthly with a lot of clunky VB code generating PHP, HTML -
no database just index files - but you already seem to have discovered
that...

 

Looking eg at http://www.oilit.com/2journal/2index/2peo.htm

I see first of all,

<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">

... which suggests you don't want machines to do anything with this data.

 

No, not really. This is just so that things don't get indexed multiple times,
as the index files (by company, by person and by calendar month) all
point to the article files.

 

Then each author/person (are persons topics too, or just authors?) gets 

a link,

 

<p><b>Select item</b>

<BR><A HREF = "2peo/21.htm">Aamodt, Finn</A>

 

<BR><A HREF = "2peo/22.htm">Aasheim, Hilda</A>

<BR><A HREF = "2peo/23.htm">Abbot, Dave</A>

<BR><A HREF = "2peo/24.htm">Abbott, David</A>

<BR><A HREF = "2peo/25.htm">Abdalla, Ab</A>

<BR><A HREF = "2peo/26.htm">Abel, Roger</A>

<BR><A HREF = "2peo/27.htm">Abernathy, Steve</A>

<BR><A HREF = "2peo/28.htm">Aberson, John</A>

<BR><A HREF = "2peo/29.htm">Abougoush, Mickey</A>

<BR><A HREF = "2peo/210.htm">Abou-Sayed, Ahmed</A>

 

 

If I go to one of these, eg.

http://www.oilit.com/2journal/2index/2peo/210.htm

 

I see a page listing article(s) by that person, so for Abou-Sayed, Ahmed

we get <BR><A HREF = "../../2article/0603_11.htm" > Sixth Middle East IM 

Forum, Kuwait (March 2006)</A> ie.

 

http://www.oilit.com/2journal/2article/0603_11.htm

 

We get basic metadata here,

 

     <meta name="document-date" content="28 Mar 2006 00:00:00 GMT">

     <TITLE>Sixth Middle East IM Forum, Kuwait (March 2006)</TITLE>

 

And what looks like an abstract/intro paragraph,

 

<Font Face="Arial" Size=2><b>

Data management and information management (DM/IM) in the Middle East 

countries is different. First because it has much more of a production 

focus than in Europe or the USA. Second, because Middle East National 

Oil Companies have taken the long term view. If building a corporate 

data store for fields with hundreds of wells and decades of production 

history means a five year plan, with allocation of people, training and 

finance, then that is what happens. Kuwait Oil Co. (KOC) has over 1,000 

users of its Finder database with projects ongoing for data quality, 

SCADA integration, data mining, decision support and automated data 

capture. Finder database has cornered the data store market for Middle 

East NOCs. This is both a great achievement and a potential 

embarrassment for Schlumberger which is in the process of trying to wean 

its clients off Finder and onto Seabed. An animated debate at the close 

of the conference showed that this will not be easy.

</b></font>

 

 

The match against Ahmed Abou Sayed seems to be based on his being 

quoted.  Is the matching/indexing done by hand or machine?

 

It's based on his being referred to in the article and is done by hand
monthly - and kept in an Access database which generates the indexes.

 

     "For Ahmed Abou-Sayed (Informateks), 'data mining is set to become a 

tough competitor for simulation.'"

 

You're right, there's a lot here to work with. But the current structure 

of the site (markup, frames etc) is a little daunting for the 

uninitiated. Could you give some suggestions on how semweb folk might 

explore it? eg. is it OK to crawl the entire site? Can you make some 

data dumps available, or suggest key URLs to explore from?

 

The site has two versions of the same text. The 'monthly' edition, which you
see upfront on login, uses PHP and CSS and looks reasonably OK. But of more
interest for semantic purposes, perhaps, is the same information in individual
article files. These have the structure

<H1>An article title

<H2>A subtitle

followed by the text of the article. The files are all located at
www.oilit.com/2journal/2article/YYMM_NN.htm (year, month, article number).
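
As an illustration of the YYMM_NN naming scheme, here is a minimal Python sketch that splits an article URL into year, month and article number. The names `ArticleRef` and `parse_article_url` are hypothetical, and expanding the two-digit year to 20YY is an assumption based only on the 0603 (March 2006) example; older issues may need a different rule.

```python
import re
from typing import NamedTuple


class ArticleRef(NamedTuple):
    """Year, month and article number taken from an article file name."""
    year: int
    month: int
    number: int


# Matches the tail of .../2journal/2article/YYMM_NN.htm
_ARTICLE_RE = re.compile(r"/2journal/2article/(\d{2})(\d{2})_(\d+)\.htm$")


def parse_article_url(url: str) -> ArticleRef:
    """Split an article URL into (year, month, article number)."""
    match = _ARTICLE_RE.search(url)
    if match is None:
        raise ValueError(f"not an article URL: {url!r}")
    yy, mm, nn = (int(group) for group in match.groups())
    if not 1 <= mm <= 12:
        raise ValueError(f"month out of range in {url!r}")
    # Assumption: every two-digit year belongs to the 2000s.
    return ArticleRef(year=2000 + yy, month=mm, number=nn)


ref = parse_article_url("http://www.oilit.com/2journal/2article/0603_11.htm")
print(ref.year, ref.month, ref.number)
```

For the article mentioned earlier in this thread, this gives year 2006, month 3, article 11.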

The index structure you have basically figured out. For instance the
'people' index contains all the people we have ever mentioned in an article
and points to a list of such articles - which points to the articles
themselves.
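
The two-level index (people index, then per-person article list, then article) could be walked with something like the following Python sketch. It is tied to the `<A HREF = "...">` markup quoted in this thread and uses a regular expression rather than a full HTML parser, so it is an assumption that the rest of the site follows the same pattern. `extract_links` is a hypothetical helper, and fetching the pages is left out.

```python
import re
from urllib.parse import urljoin

# Matches anchors written like: <A HREF = "2peo/21.htm">Aamodt, Finn</A>
LINK_RE = re.compile(
    r'<A\s+HREF\s*=\s*"([^"]+)"\s*>\s*(.*?)\s*</A>',
    re.IGNORECASE | re.DOTALL,
)


def extract_links(html: str, base_url: str) -> list[tuple[str, str]]:
    """Return (label, absolute URL) pairs for each anchor found in html."""
    return [
        # Collapse line breaks inside a label into single spaces.
        (" ".join(label.split()), urljoin(base_url, href))
        for href, label in LINK_RE.findall(html)
    ]


people_html = '<BR><A HREF = "2peo/210.htm">Abou-Sayed, Ahmed</A>'
people = extract_links(people_html, "http://www.oilit.com/2journal/2index/2peo.htm")

person_html = ('<BR><A HREF = "../../2article/0603_11.htm" > Sixth Middle East IM \n'
               'Forum, Kuwait (March 2006)</A>')
articles = extract_links(person_html, people[0][1])
print(people)
print(articles)
```

Resolving each relative link against the page it came from is what turns `../../2article/0603_11.htm` into the full article URL.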

 

I have thought about moving this over to a MySQL database, and probably
will, but have not had the time to do so. What I would like to understand
from you folks is how this information - say the list of companies and
people - can be presented in a semantic way that would make it more
useful (discovery, reuse?) to other sites and robots.
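
One possible shape for such a presentation, offered only as a sketch: the Python below emits RDF in Turtle syntax for one person and the articles that mention them, using the FOAF vocabulary. Choosing FOAF, minting the person's URI by appending `#person` to the index page URL, and the function names are all assumptions for illustration, not something the site does today.

```python
def turtle_literal(text: str) -> str:
    """Quote a string as a Turtle literal, escaping backslashes and quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def person_to_turtle(index_url: str, name: str, article_urls: list[str]) -> str:
    """Describe one person and the articles that mention them as Turtle."""
    # Hypothetical URI scheme: the person is identified by index page + #person.
    person = f"<{index_url}#person>"
    lines = [
        "@prefix foaf: <http://xmlns.com/foaf/0.1/> .",
        "",
        f"{person} a foaf:Person ;",
        f"    foaf:name {turtle_literal(name)} ;",
        f"    foaf:isPrimaryTopicOf <{index_url}> .",
    ]
    for url in article_urls:
        # Each article is a document that has the person as one of its topics.
        lines.append(f"<{url}> a foaf:Document ; foaf:topic {person} .")
    return "\n".join(lines) + "\n"


turtle = person_to_turtle(
    "http://www.oilit.com/2journal/2index/2peo/210.htm",
    "Ahmed Abou-Sayed",
    ["http://www.oilit.com/2journal/2article/0603_11.htm"],
)
print(turtle)
```

The same pattern would apply to the company index, with a different class for organisations.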

 

Others on the list have suggested OpenCalais - which looks interesting for
marking up the text. But it would be good to add back in my own lists of
companies and people - maybe I can do this with Calais?

 

Regards - and thanks a lot for having spent time with my frames already ;-)

 

Neil McNaughton

--

http://danbri.org/

Received on Thursday, 2 October 2008 09:10:29 UTC