Re: SWEO-IG flyer comments --follow-up from Susie M Stephens on 2007-07-10 (public-sweo-ig@w3.org from July 2007)

From: Susie M Stephens <STEPHENS_SUSIE_M@LILLY.COM>
Date: Tue, 10 Jul 2007 10:33:11 -0400
To: "Bassetti, Ann" <ann.bassetti@boeing.com>
Cc: public-sweo-ig@w3.org, public-sweo-ig-request@w3.org
Message-ID: <OFF1F167CA.F5240E2F-ON85257314.004FEA9A-85257314.004FF183@EliLilly.lilly.com>
Thanks Ann. :-)

From: "Bassetti, Ann" <ann.bassetti@boeing.com>
Sent by: public-sweo-ig-request@w3.org
Date: 07/09/2007 08:13 PM
To: <public-sweo-ig@w3.org>
cc:
Subject: SWEO-IG flyer comments --follow-up

Hello again SWEO-IG --

A week or so ago, I submitted some comments on your educational flyer,
which is still in progress.

I'd also like to recommend these explanations from an Oracle white
paper, "Semantic Data Integration in the Life Sciences", written by your
very own chair, Susie Stephens (September 2005). I originally got it
from http://twp_ls_semantic_data_integration_10gr2_0905.pdf, but it no
longer seems to be there.

I have referred to this paper multiple times, long before I had ever
even heard of Susie.

I would think that Oracle might be willing to allow the W3C to use some
of this material directly.

Below I have extracted the paragraphs that I think are particularly
good for beginners, and I have put ==[brackets]== around the specific
sentences that I understand to be the key concepts.

---------------------------
INTRODUCTION
The Semantic Web has been developed as an ==[extension of the current
Web]==. It has been ==[designed to give information well-defined
meaning, thereby better enabling computers and people to work in
cooperation]==. This is important as the mix of content on the web is
shifting from exclusively human-oriented content to more and more data
content. The Semantic Web ==[also brings the idea of having data defined
and linked in a way that it can be used for more effective discovery,
automation, integration, and re-use across various applications]==.

...

Data Aggregation in the Life Sciences

Many people in the life sciences are very excited by the promise of the
Semantic Web. They ==[want to integrate data from many different data
sources, so that they can make well-informed decisions, yet data
integration has been challenging. The difficulties stem from data being
made available in different formats]==, for example different
tab-delimited file formats, different XML schemas, and different
relational models. The task is also made harder because the data models
frequently change as science progresses and individuals learn that
additional data is also relevant. In addition, acronyms collide across
data sources, and data can come in different types, for example graphs,
images, text, and chemical structures.

==[Many data integration projects currently fail. One of the most common
reasons for the failure is the inability]== to extend the data model
==[to incorporate new data, or the inability to re-use data in ways that
were not originally intended]==. RDF provides a very flexible model
for adding new data to a data model and for re-using data in ways that
were not originally intended. People are beginning to really
appreciate the flexible triple syntax, as it is becoming recognized that
things are always evolving, and that ==[people will always want to extend
their system, or to look at data in a different way]==. Being cognizant
of this constant change will be the first step towards companies saving
money. ==[People need to be able to re-use data and re-aggregate
applications. They also like the idea of the serendipitous discovery of
new information.]==
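
To make the "flexible triple" idea concrete for beginners, here is a
minimal sketch in Python. It assumes plain (subject, predicate, object)
tuples as a stand-in for RDF triples (no RDF library), and every URI in
it is a hypothetical example. It shows two sources being merged by a
simple set union, full URIs keeping a colliding acronym ("PSA") apart,
and a new kind of data being added without any schema change.

```python
# Minimal sketch: RDF-style triples as plain (subject, predicate, object) tuples.
# All URIs are hypothetical examples, not real vocabularies.
EX = "http://example.org/"

# Both sources use the acronym "PSA", but for different things.
# Naming each thing with a full URI keeps the meanings apart after merging.
source_a = {
    (EX + "sourceA/PSA", EX + "label", "prostate-specific antigen"),
}
source_b = {
    (EX + "sourceB/PSA", EX + "label", "puromycin-sensitive aminopeptidase"),
}

# Integration is a set union: no shared table layout has to be agreed first.
graph = source_a | source_b

# Later, a new kind of data turns out to be relevant. Adding it is one more
# triple; there is no schema migration, and existing triples are untouched.
graph.add((EX + "sourceA/PSA", EX + "expressedIn", EX + "tissue/prostate"))


def objects(g, subject, predicate):
    """Return the set of objects for a given subject and predicate."""
    return {o for (s, p, o) in g if s == subject and p == predicate}


print(len(graph))  # 3
print(objects(graph, EX + "sourceB/PSA", EX + "label"))
```

The same merge in a relational design would typically need an agreed
table layout up front and an ALTER TABLE (or a new table) for the new
"expressed in" relationship.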

...

<Ann: I like this whole section below, comparing data models. It's
geeky, but by far the best comparison I've seen. Very helpful for
people who have heard of at least SQL and XML. Very useful backup info
for your flyer... or maybe in the technical section.>

DATA MODEL COMPARISON

SQL/RDBMS, XQuery/XML and SPARQL/RDF offer three different ways to query
and manage information. Each of the methods serves different,
complementary purposes. By using each of these technologies in different
situations, a user can optimize the quality and efficiency of
information querying and management.

A relational database and SQL are best where concise, efficient
transactions are needed. Typically, this occurs within an enterprise
application where the user is interacting with the data through a
tightly constrained set of forms provided by the application. Given the
tightly controlled environment, the application (and the underlying
RDBMS) needs a minimal amount of input (e.g. a string, a number, a date)
to execute properly. This is because all the metadata about the
transaction is embedded or implicit in the application or database
schema itself. The benefits of SQL/RDBMS are the low overhead required
to execute a transaction and, therefore, the performance and scalability
with a known level of quality of service that can be achieved.
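
As a small illustration of "the metadata is implicit in the schema":
once the table is defined, a transaction supplies only bare values. This
is a sketch using Python's built-in sqlite3 module with an in-memory
database; the `orders` table, its columns, and `place_order` are
hypothetical.

```python
import sqlite3

# In-memory database. The schema (column names and types) is the metadata.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "part_no TEXT NOT NULL, quantity INTEGER NOT NULL, due_date TEXT NOT NULL)"
)


def place_order(conn, part_no, quantity, due_date):
    """Insert one order (hypothetical helper). The caller passes only values;
    their meaning comes from the column list fixed by the schema."""
    with conn:  # commits on success, rolls back on an exception
        conn.execute(
            "INSERT INTO orders (part_no, quantity, due_date) VALUES (?, ?, ?)",
            (part_no, quantity, due_date),
        )


place_order(conn, "BX-1042", 12, "2007-08-01")
rows = conn.execute("SELECT part_no, quantity, due_date FROM orders").fetchall()
print(rows)  # [('BX-1042', 12, '2007-08-01')]
```

The inputs are exactly the "string, number, date" the paragraph
mentions; nothing in them says what they mean.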

However, when executing a transaction across organizational boundaries,
the environment is much less tightly controlled. A supplier or customer
may use a different application and a different database schema for the
same type of transaction. In that case, SQL is very difficult to use,
at best. For this environment, XQuery/XML combined with Web services is
more appropriate, which is why Oracle's products were enhanced to
support this technology. XML documents can be used to execute
transactions just as with SQL except that XML wraps the metadata about
the transaction around the data itself. When an XML document is sent
from one organization to another, an agreed upon schema can be used to
decode the metadata about the transaction. This is feasible when you
have a well-structured federation of organizations as, for example, in a
supply chain. XQuery/XML is not as efficient as SQL/RDBMS but offers
much richer transactions and more flexibility for information sharing
across applications.
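
By contrast, an XML message carries its own element names around the
data, and the receiver decodes it against the agreed schema. The sketch
below uses Python's standard xml.etree.ElementTree. The namespace, the
`order` document, and its element names stand in for a hypothetical
agreed supply-chain schema. ElementTree's limited path expressions stand
in for XQuery, which the Python standard library does not provide.

```python
import xml.etree.ElementTree as ET

# The document names each value (partNo, quantity, dueDate), so the metadata
# travels with the data instead of living only in a database schema.
message = """
<order xmlns="http://example.org/supplychain">
  <partNo>BX-1042</partNo>
  <quantity unit="each">12</quantity>
  <dueDate>2007-08-01</dueDate>
</order>
"""

# The namespace both parties agreed on (hypothetical).
NS = {"sc": "http://example.org/supplychain"}


def decode_order(xml_text):
    """Decode an order using the element names both parties agreed on."""
    root = ET.fromstring(xml_text.strip())
    return {
        "part_no": root.findtext("sc:partNo", namespaces=NS),
        "quantity": int(root.findtext("sc:quantity", namespaces=NS)),
        "unit": root.find("sc:quantity", NS).get("unit"),
        "due_date": root.findtext("sc:dueDate", namespaces=NS),
    }


print(decode_order(message))
```

Note that decode_order still hard-codes the agreed element names; a
partner who used a different schema would need a different decoder,
which is the limitation the next paragraph describes.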

But even XQuery/XML requires some agreement among parties as to the
format of documents. Users must know ahead of time how, approximately,
the information will be used. In many cases, it is impossible to know
who will be looking for information, how they may choose to use it, and
how it may be re-used at a later point. SPARQL/RDF is designed for
information sharing with ultimate flexibility. By encoding the
relationships between data, RDF enables semantics as well as syntax to
be embedded in documents. Users can apply arbitrary ontologies to the
data and semantics to discover information that may not have even been
anticipated by the original data provider. Users with little or no
technical knowledge of where the data is located or how it is structured
can also formulate queries. This can be particularly powerful for
applications on enterprise grids.
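
To illustrate how an ontology applied after the fact can surface
something the data provider never stated, here is a toy sketch in plain
Python. It applies one RDFS-style rule (an instance of a subclass is
also an instance of its superclass) and then answers a SPARQL-like
triple-pattern query. It is not SPARQL and not a real RDF store; the
class names, the `ex:` prefix, and both helper functions are
hypothetical.

```python
# Toy sketch: triples as tuples, one RDFS-style subclass rule, and a tiny
# pattern matcher standing in for a SPARQL triple pattern.
# All names below are hypothetical.
TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"

# The data provider only says that compound42 is a KinaseInhibitor.
data = {("ex:compound42", TYPE, "ex:KinaseInhibitor")}

# A consumer later brings its own ontology, unknown to the provider.
ontology = {
    ("ex:KinaseInhibitor", SUBCLASS, "ex:EnzymeInhibitor"),
    ("ex:EnzymeInhibitor", SUBCLASS, "ex:Drug"),
}


def infer_types(triples):
    """Apply 'x type C, C subClassOf D => x type D' until nothing new appears."""
    g = set(triples)
    while True:
        new = {
            (x, TYPE, d)
            for (x, p1, c) in g if p1 == TYPE
            for (c2, p2, d) in g if p2 == SUBCLASS and c2 == c
        } - g
        if not new:
            return g
        g |= new


def match(g, pattern):
    """Return variable bindings for one triple pattern; '?x' marks a variable."""
    results = []
    for triple in g:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A repeated variable must bind to the same value each time.
                if binding.get(term, value) != value:
                    break
                binding[term] = value
            elif term != value:
                break
        else:
            results.append(binding)
    return results


graph = infer_types(data | ontology)
print(match(graph, ("?x", TYPE, "ex:Drug")))  # [{'?x': 'ex:compound42'}]
```

Run against `data` alone, the same query returns an empty list; the
answer only appears once the provider's triples are combined with
semantics the provider never anticipated.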

Each of the different information management models has distinct
strengths.
---------------------------

Again, I hope this is helpful input.  -- Ann

Ann Bassetti
Associate Technical Fellow
Boeing Information Technology
Computing and Network Operations
telephone (desk):  +1.425.865.6603
mobile:  +1.206.218.8039
email:  ann.bassetti@boeing.com
Received on Tuesday, 10 July 2007 14:33:23 UTC