Re: Draft of Use Case document on Wiki (as promised) from Eric Prud'hommeaux on 2010-04-19 (public-rdb2rdf-wg@w3.org from April 2010)

From: Eric Prud'hommeaux <eric@w3.org>
Date: Mon, 19 Apr 2010 18:02:41 -0400
To: Daniel Miranker <miranker@cs.utexas.edu>
Cc: Juan Sequeda <juanfederico@gmail.com>, Michael Hausenblas <michael.hausenblas@deri.org>, RDB2RDF WG <public-rdb2rdf-wg@w3.org>
Message-ID: <20100419220239.GM4508@w3.org>
* Daniel Miranker <miranker@cs.utexas.edu> [2010-04-19 15:21-0500]
> 
> 
> 
> On Apr 19, 2010, at 12:21 PM, Eric Prud'hommeaux wrote:
> 
> >* Juan Sequeda <juanfederico@gmail.com> [2010-04-19 10:48-0500]
> >>Michael
> >>
> >>I just updated [1] again. I moved the section Expressivity,
> >>written by Eric
> >>to [2] . In the spreadsheet, we have the following taxonomy per
> >>Expressivity:
> >>
> >>
> >>   1. Expressivity
> >>      - Node Label Generation: Graph node names are synthesized
> >>from a
> >>      function of database attributes
> >>      - Datatype expressions
> >>         - Simple
> >>         - Relational data (cells) are mapped to rdf datatypes
> >>per SQL XSD
> >>         mapping.
> >>         - Microparsing: Relational data are parsed and mapped to rdf
> >>         graphs.
> >>
> >>
> >>But now that we dig into it I think it is redundant with what is
> >>already in
> >>[1] . In particular,
> >>
> >>“b Node Label Generation” appears to be the same as the role of
> >>the ontology
> >>being putative,
> >
> >I don't see the connection. If I am exporting a human resources
> >database in say FOAF and vcal, I am likely to generate resources based
> >on some function of attribute names:
> >  http://myco.example/Employee?id=218 foaf:givenName "Bob" .
> >Likewise, if I accept sort of a default ontology from the database
> >structure, I may want to do the same:
> >  http://myco.example/Employee?id=218 Employee:fname "Bob" .
> >
> >The mapping language could be much simpler if it did not handle graph
> >transformations (simple mapping of attributes to predicates), but I
> >know that several use cases are not met by that.
> 
> 
> I just looked up the (or at least a) FOAF Spec.
> 
> http://xmlns.com/foaf/spec/
> "This document is created by combining the RDFS/OWL machine-readable
> FOAF ontology"
> 
> So if you are exporting data to FOAF then per the taxonomy you are
> mapping relational data wrt an existing domain ontology.
> 
> I won't be surprised if there is confusion/ambiguity about this.
> Recall that when I first emerged with the taxonomy I qualified that I was
> more interested in making
> sure we mean the same thing when we are talking with each other than in any
> proprietary interest I have in the taxonomy.

I believe I provided a counterexample to the assertion that
functional label generation was captured by the "putative ontology".
FOAF is a pre-existing ontology, but I still may want to generate the
labels of the nodes in the resulting graph from database attributes.

> >>“c Data type expression” and its three cases appear to be the same as
> >>classifying the treatment of relational data sources at the
> >>start of the
> >>document
> >
> >Sorry, I'm not following this. Could you give examples of the
> >redundancy?
> 
> I have written
> 
>                      i.     Structured
> 
> Consider only highly structured database content. String and other
> text fields are not considered valuable.
> 
>                        ii.     Structured + Semistructured
> 
> Text fields are considered valued but are treated simply as unparsed
> strings.
> 
>                         iii.     Structured + Microparsed Tagged Text
> 
> Text fields in the database are parsed into an RDF graph per an
> existing ontology.
> 
> 
> 
> 
> 
> You have written
> 
> >Datatype expressions
> >         - Simple
> >         - Relational data (cells) are mapped to rdf datatypes per
> >SQL XSD
> >         mapping.
> >         - Microparsing: Relational data are parsed and mapped to rdf
> >         graphs.
> 
> 
> 
> We may need to hash out/refine the subcategories and their titles,
> but, at our current level of detail, I'm thinking
> 
> Simple == Structured
> Relational Data... per SQL XSD Mapping ==  Structured + Semistructured
> Microparsing == Structured + Microparsed Tagged Data

Given a row in a protein database with a primary key attribute "ID" and
another unique attribute "uniProt":
| ID | uniProt |    name | seqLength |
| 18 |   68250 | "YYHAB" | "246 AA"  |

I would like to ask the world which of the following subject mappings
they need:

  <http://mydb.example/prots/ID=18> db:name "YYHAB" .
  <http://mydb.example/prots18/more/path> db:name "YYHAB" .
  <http://www.uniprot.org/uniprot/P68250> db:name "YYHAB" .

The first uses a potentially hard-coded formula, the second uses a
user-supplied function of the primary key, and the third uses a
function of a different attribute to produce a common proteomic node
label.
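
A minimal sketch of these three subject mappings, assuming a row held
as a Python dict. The function names, the "{attr}" template syntax and
the "P" prefix rule for the uniProt attribute are hypothetical choices
made for illustration, not part of any proposed mapping language:

```python
# Hypothetical in-memory copy of the example row above.
row = {"ID": 18, "uniProt": 68250, "name": "YYHAB", "seqLength": "246 AA"}


def default_subject(table, key, row, base="http://mydb.example/"):
    """Fixed formula: base + table + '/' + key=value."""
    return f"{base}{table}/{key}={row[key]}"


def templated_subject(template, row):
    """User-supplied template; {attr} is replaced by the row's value."""
    return template.format(**row)


def triple(subject, predicate, value):
    """Serialize one triple with a plain-literal object."""
    return f'<{subject}> {predicate} "{value}" .'


subjects = [
    # 1. hard-coded formula over the primary key
    default_subject("prots", "ID", row),
    # 2. user-supplied function of the primary key
    templated_subject("http://mydb.example/prots{ID}/more/path", row),
    # 3. function of a different unique attribute
    templated_subject("http://www.uniprot.org/uniprot/P{uniProt}", row),
]

for s in subjects:
    print(triple(s, "db:name", row["name"]))
```

Run as written, this prints the three candidate triples listed above.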

The SQL string "YYHAB" is, in the above examples, expressed directly
as an RDF plain literal (perhaps the SQL draft suggests
"YYHAB"^^xsd:string, I don't recall). Expressing seqLength would be
an opportunity for micro-parsing, as it encodes both the length (an
integer) and the unit.
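
As a rough sketch of what micro-parsing the seqLength cell could look
like, assuming the cell always has the shape "<digits> <unit>". The
predicates db:value and db:unit and the blank node _:len are
hypothetical names used only for illustration:

```python
import re


def microparse_seq_length(cell):
    """Split a cell such as '246 AA' into (length, unit), or return None."""
    m = re.fullmatch(r"\s*(\d+)\s+([A-Za-z]+)\s*", cell)
    if m is None:
        return None
    return int(m.group(1)), m.group(2)


def seq_length_triples(subject, cell):
    """Emit a small graph for a parsed cell, else one plain literal.

    Note: literal escaping is omitted to keep the sketch short.
    """
    parsed = microparse_seq_length(cell)
    if parsed is None:
        # Fall back to the unparsed string.
        return [f'<{subject}> db:seqLength "{cell}" .']
    length, unit = parsed
    return [
        f"<{subject}> db:seqLength _:len .",
        f'_:len db:value "{length}"^^xsd:integer .',
        f'_:len db:unit "{unit}" .',
    ]


for t in seq_length_triples("http://mydb.example/prots/ID=18", "246 AA"):
    print(t)
```

Cells that do not match the assumed shape are passed through as plain
strings rather than rejected.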

I haven't found good examples of need for user-defined datatypes.
Perhaps there are some oddball types in SQL that don't have a defined
XSD representation. I guess any blob that could be micro-parsed could
instead be given a special type.
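
The "special type" alternative might be sketched as follows. The
lookup table is an illustrative assumption and not the mapping in the
SQL draft, and the type name SEQLEN and the datatype namespace are
made up for the example:

```python
# Illustrative subset only; NOT the SQL draft's actual mapping.
SQL_TO_XSD = {
    "INTEGER": "xsd:integer",
    "BOOLEAN": "xsd:boolean",
    "DATE": "xsd:date",
}


def literal(value, sql_type, custom_ns="http://mydb.example/datatype/"):
    """Render a cell as an RDF literal based on its SQL type."""
    if sql_type in ("CHAR", "VARCHAR"):
        # Character data becomes a plain literal.
        return f'"{value}"'
    xsd = SQL_TO_XSD.get(sql_type)
    if xsd is not None:
        return f'"{value}"^^{xsd}'
    # No known XSD type: fall back to a hypothetical user-defined datatype.
    return f'"{value}"^^<{custom_ns}{sql_type.lower()}>'


print(literal("YYHAB", "VARCHAR"))
print(literal(18, "INTEGER"))
print(literal("246 AA", "SEQLEN"))
```

The last call tags the unparsed cell with a custom datatype instead of
expanding it into a graph.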


> Possibly a difference in our thinking is that you may be looking at
> row content as a row in a CSV file, divorced from
> the column names and SQL data types;  thus the entire content of the
> row depends on parsing.
> 
> This is why, in part, my category names are Structured + something.
> 
> 
> I'm also wondering if your designations of expressivity belong in
> the requirement section on the language.
> Note I've broken up the requirements into two parts: 1) the
> mechanical requirements on the language, e.g. its connections to RIF,
> and requirements on syntactic convention, in other words
> requirements on languages that come from the Semantic Web community;
> 2) the requirements that come from the applications/end users.
> 
> 
> 
> 
> >
> >
> >>• We have updated that section to include Eric’s requirement that
> >>microparsing produce an RDF graph.
> >
> >Do we have use cases supporting that? I merely meant to point out that
> >it was an option, but I don't think anyone has asked for it yet.
> 
> 
> No we don't have any use cases.  However, an entire half of my
> application facing life
> is with systematic biologists.  They have so many databases of
> tables of 4 or 5 columns of structured
> data, with another 3 or 4 columns of text field, it is painful.
> Even something like the geographic
> location where a specimen was collected will usually be in a text
> field that could contain anything
> from a lat/long to "50 feet in front of Tom Miller Dam in Austin",
> and everything in between.
> 
> 
> 
> >
> >
> >>• Similarly, we have penciled in the rdf datatype mapping in
> >>Section 2.
> >>
> >>This is just to let everybody know what happened to this part.
> >>
> >>We can discuss this tomorrow.
> >>
> >>In conclusion, [1] is ready (even though it still needs to be
> >>expanded)
> >>
> >>[1] http://www.w3.org/2001/sw/rdb2rdf/wiki/Use_Cases_and_Requirements
> >>[2] http://www.w3.org/2001/sw/rdb2rdf/wiki/Draft_of_Use_Cases
> >>Juan Sequeda
> >>+1-575-SEQ-UEDA
> >>www.juansequeda.com
> >>
> >>
> >>On Mon, Apr 19, 2010 at 10:32 AM, Michael Hausenblas <
> >>michael.hausenblas@deri.org> wrote:
> >>
> >>>
> >>>Great work, Juan!
> >>>
> >>>We (Eric and I) take over for now (consider the Wiki stable
> >>>for the moment)
> >>>in order to compile a version for tomorrow's meeting at [1].
> >>>
> >>>
> >>>Cheers,
> >>>     Michael
> >>>
> >>>[1] http://www.w3.org/2001/sw/rdb2rdf/use-cases/
> >>>
> >>>--
> >>>Dr. Michael Hausenblas
> >>>LiDRC - Linked Data Research Centre
> >>>DERI - Digital Enterprise Research Institute
> >>>NUIG - National University of Ireland, Galway
> >>>Ireland, Europe
> >>>Tel. +353 91 495730
> >>>http://linkeddata.deri.ie/
> >>>http://sw-app.org/about.html
> >>>
> >>>
> >>>
> >>>>From: Juan Sequeda <juanfederico@gmail.com>
> >>>>Date: Mon, 19 Apr 2010 10:29:17 -0500
> >>>>To: RDB2RDF WG <public-rdb2rdf-wg@w3.org>
> >>>>Subject: Draft of Use Case document on Wiki (as promised)
> >>>>Resent-From: RDB2RDF WG <public-rdb2rdf-wg@w3.org>
> >>>>Resent-Date: Mon, 19 Apr 2010 15:29:52 +0000
> >>>>
> >>>>Hi Everybody
> >>>>
> >>>>You can find an updated version of the Use Case document
> >>>>here [1]. This
> >>>is
> >>>>the original page that we have been adding all the use
> >>>>cases. I created a
> >>>>Draft of Use Cases page here [2].
> >>>>
> >>>>Therefore, we should be focusing on [1]. I still need to add
> >>>>some of the
> >>>>UML, DDL, etc.
> >>>>
> >>>>I'm following the example that Michael once gave out from
> >>>>the RDFa Use
> >>>Case
> >>>>document [3] where they originally show HTML (before) and then show
> >>>>HTML+RDFa (after). I think this is something that we should
> >>>>do in this
> >>>>document. However, I guess this may be up for discussion.
> >>>>
> >>>>Looking forward to your comments
> >>>>
> >>>>[1] http://www.w3.org/2001/sw/rdb2rdf/wiki/Use_Cases_and_Requirements
> >>>>[2] http://www.w3.org/2001/sw/rdb2rdf/wiki/Draft_of_Use_Cases
> >>>>[3] http://www.w3.org/TR/xhtml-rdfa-scenarios/#use-case-1
> >>>>
> >>>>Juan Sequeda
> >>>>+1-575-SEQ-UEDA
> >>>>www.juansequeda.com
> >>>
> >>>
> >
> >-- 
> >-ericP
> 

-- 
-ericP
Received on Monday, 19 April 2010 22:03:17 UTC