- From: Susie M Stephens <STEPHENS_SUSIE_M@LILLY.COM>
- Date: Tue, 10 Jul 2007 10:33:11 -0400
- To: "Bassetti, Ann" <ann.bassetti@boeing.com>
- Cc: public-sweo-ig@w3.org, public-sweo-ig-request@w3.org
Thanks Ann. :-)

"Bassetti, Ann" <ann.bassetti@boeing.com>
Sent by: public-sweo-ig-request@w3.org
07/09/2007 08:13 PM
To: <public-sweo-ig@w3.org>
Subject: SWEO-IG flyer comments --follow-up

Hello again SWEO-IG --

A week or so ago, I submitted some comments on your educational flyer that is in work. I'd like to also recommend these explanations, from an Oracle white paper, "Semantic Data Integration in the Life Sciences", written by your very own chair Susie Stephens. (September 2005; I got it originally from http://twp_ls_semantic_data_integration_10gr2_0905.pdf but it does not seem to still be there). I have referred to this paper multiple times, long before I had ever even heard of Susie. I would think that Oracle might be willing to allow the W3C to use some of this material directly.

I have extracted below, the paragraphs that I think are particularly good for beginners, and put ==[brackets]== around the specific sentences that I understand to be the key concepts.

---------------------------

INTRODUCTION

The Semantic Web has been developed as an ==[extension of the current Web==]. It has been ==[designed to give information well-defined meaning, thereby better enabling computers and people to work in cooperation==]. This is important as the mix of content on the web is shifting from exclusively human-oriented content to more and more data content. The Semantic Web ==[also brings the idea of having data defined and linked in a way that it can be used for more effective discovery, automation, integration, and re-use across various applications==].

...

Data Aggregation in the Life Sciences

Many people in the life sciences are very excited by the promise of the semantic web. They ==[want to integrate data from many different data sources, so that they can make well-informed decisions, yet data integration has been challenging.
The difficulties stem from data being made available in different formats==], for example different tab-delimited file formats, different XML schemas, and different relational models. The task is also made harder because the data models frequently change as science progresses, and individuals learn that additional data is also relevant. In addition, there is acronym collision across the data sources, and data can be in different data types, for example graphs, images, text, and chemical structures.

==[Many data integration projects currently fail. One of the most common reasons for the failure is the inability==] to extend the data model ==[to incorporate new data, or the inability to re-use data in ways that it was not originally intended==]. RDF provides a very flexible model for adding new data to a data model and for re-using data in ways that it was not originally intended. People are beginning to really appreciate the flexible triple syntax, as it is becoming recognized that things are always evolving, that ==[people will always want to extend their system, or to look at data in a different way==]. Being cognizant of this constant change will be the first step towards companies saving money. ==[People need to be able to re-use data and re-aggregate applications. They also like the idea of the serendipitous discovery of new information.==]

...

<Ann: I like this whole section below, comparing data models. It's geeky, but by far the best comparison I've seen. Very helpful for people that have heard of at least SQL and XML. Very useful backup info for your flyer... or maybe in the technical section.>

DATA MODEL COMPARISON

SQL/RDBMS, XQuery/XML and SPARQL/RDF offer three different ways to query and manage information. Each of the methods serves different, complementary purposes. By using each of these technologies in different situations, a user can optimize the quality and efficiency of information querying and management.

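The excerpt's point about RDF's flexible triple model can be made concrete with a small sketch. This is only an illustration in plain Python, not RDF tooling and not anything from the white paper: the identifiers, the predicate names, and the `match` helper are all invented for the example. It shows the one idea the text relies on, namely that a new kind of fact is added as one more triple, with no change to any schema.

```python
# Hypothetical example data: every identifier and predicate here is invented.
# Each fact is a (subject, predicate, object) triple.
triples = {
    ("gene:BRCA1", "locatedOn", "chromosome:17"),
    ("gene:BRCA1", "associatedWith", "disease:BreastCancer"),
}


def match(store, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# A new kind of data (here, an image) needs no schema change: add a triple.
triples.add(("gene:BRCA1", "hasImage", "image:gel_0042"))

# Existing queries keep working and simply see the extra fact.
about_brca1 = match(triples, s="gene:BRCA1")
images = match(triples, p="hasImage")
```

In a relational model the same extension would typically mean altering a table or adding a new one, which is the kind of data-model change the excerpt names as a common cause of failed integration projects.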
A relational database and SQL are best where concise, efficient transactions are needed. Typically, this occurs within an enterprise application where the user is interacting with the data through a tightly constrained set of forms provided by the application. Given the tightly controlled environment, the application (and the underlying RDBMS) needs a minimal amount of input (e.g. a string, a number, a date) to execute properly. This is because all the metadata about the transaction is embedded or implicit in the application or database schema itself. The benefits of SQL/RDBMS are the low overhead required to execute a transaction and, therefore, the performance and scalability with a known level of quality of service that can be achieved.

However, when executing a transaction across organizational boundaries, the environment is much less tightly controlled. A supplier or customer may use a different application and a different database schema for the same type of transaction. In that case, SQL is at least very difficult to use. For this environment, XQuery/XML combined with Web services is more appropriate, which is why Oracle's products were enhanced to support this technology.

XML documents can be used to execute transactions just as with SQL except that XML wraps the metadata about the transaction around the data itself. When an XML document is sent from one organization to another, an agreed upon schema can be used to decode the metadata about the transaction. This is feasible when you have a well-structured federation of organizations as, for example, in a supply chain. XQuery/XML is not as efficient as SQL/RDBMS but offers much richer transactions and more flexibility for information sharing across applications.

But even XQuery/XML requires some agreement among parties as to the format of documents. Users must know ahead of time how, approximately, the information will be used.
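The contrast drawn above between SQL and XML can be sketched in a few lines. This is a minimal illustration using only the Python standard library, assuming a made-up `orders` table and made-up values; it is not taken from the white paper. The SQL insert carries only a string, a number, and a date, because their meaning is held in the schema, while the XML version wraps a label around each value so the document can be read without access to that schema.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical table and values, invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (part_no TEXT, quantity INTEGER, due_date TEXT)"
)

# SQL: the transaction supplies bare values only; the column names and
# types that give them meaning are implicit in the database schema.
conn.execute("INSERT INTO orders VALUES (?, ?, ?)", ("A-100", 25, "2007-07-31"))
row = conn.execute("SELECT part_no, quantity, due_date FROM orders").fetchone()

# XML: the same values, but each one travels with its own label, so the
# metadata is wrapped around the data itself.
order = ET.Element("order")
for tag, value in zip(("part_no", "quantity", "due_date"), row):
    ET.SubElement(order, tag).text = str(value)
xml_text = ET.tostring(order, encoding="unicode")

conn.close()
```

The receiving party still has to agree on what `order`, `part_no`, and the other element names mean, which is the residual agreement the excerpt says XQuery/XML requires.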
In many cases, it is impossible to know who will be looking for information, how they may choose to use it, and how it may be re-used at a later point. SPARQL/RDF is designed for information sharing with ultimate flexibility. By encoding the relationships between data, RDF enables semantics as well as syntax to be embedded in documents. Users can apply arbitrary ontologies to the data and semantics to discover information that may not have even been anticipated by the original data provider. Users with little or no technical knowledge of where the data is located or how it is structured can also formulate queries. This can be particularly powerful for applications on enterprise grids.

Each of the different information management models has distinct strengths.

---------------------------------

Again, I hope this is helpful input.

-- Ann

Ann Bassetti
Associate Technical Fellow
Boeing Information Technology
Computing and Network Operations
telephone (desk): +1.425.865.6603
mobile: +1.206.218.8039
email: ann.bassetti@boeing.com

Received on Tuesday, 10 July 2007 14:33:23 UTC