- From: Juan Sequeda <juanfederico@gmail.com>
- Date: Tue, 2 Aug 2011 09:25:00 -0500
- To: "Eric Prud'hommeaux" <eric@w3.org>
- Cc: Michael Hausenblas <michael.hausenblas@deri.org>, rdb2RDF WG <public-rdb2rdf-wg@w3.org>
- Message-ID: <CAMVTWDxDzMftGMC4T87_Ksyo9A=YHprvovH_AAtCXzDWPj2Xqw@mail.gmail.com>
On Tue, Aug 2, 2011 at 9:19 AM, Eric Prud'hommeaux <eric@w3.org> wrote:
> * Juan Sequeda <juanfederico@gmail.com> [2011-08-02 08:05-0500]
> > Eric, all
> >
> > This is my proposal. Just a few changes, and added subsections.
> >
> > PROPOSAL: that the English definition of the direct mapping be defined
> > as:
> >
> > [[
> >
> > Section 3: The Direct Mapping
> >
> > The Direct Mapping is a formula for creating an RDF graph from the rows
> > of all the tables in a database.
> >
> > A base IRI defines a web space for the labels in this graph; all labels
> > are generated by appending to the base.
> >
> > The functions scalar and reference extract the scalar and reference
> > attributes (those participating in a foreign key) respectively:
> >
> > dfn scalars: the attributes in a table which are NOT in any foreign key.
> >
> > dfn references: the attributes in a table's foreign keys.
> >
> > dfn non-unary references: the references for which the table's foreign
> > key is NOT composed of a single attribute.
> >
> > Section 3.1: Generating Row Identifiers
> >
> > SQL table and attribute identifiers compose RDF IRIs in the direct graph.
> > These identifiers are separated by the punctuation characters '#', ',',
> > '/' and '='. All SQL identifiers are escaped following URL-encoding
> > <http://www.w3.org/TR/html5/association-of-controls-and-forms.html#url-encoded-form-data>
> > except that only the above punctuation and the characters not permitted
> > in RDF IRIs are escaped.
> >
> > In the direct graph, there is an identifier for each row in a database
> > table. If the row is in a table with a primary key, this is formed from
> > the table name and the attribute names and values of each attribute in
> > the primary key.
> > If there is no primary key for the table, the row identifier is a
> > fresh blank node:
> >
> > dfn row identifier:
> >
> >   if the table has a primary key with attributes, the relative IRI for
> >   the row identifier is the concatenation of the table name, '/', and
> >   a ','-separated concatenation of each attribute name, '=', and the
> >   attribute value.
> >
> >   if the table has no primary key, the row identifier is a fresh blank
> >   node.
> >
> > A (potentially unary) list of attribute names in a table form a
> > property IRI:
> >
> > dfn property IRI: the concationation of the table name, '/', and a
> >   ','-separated concatonation of each attribute name, and a '#' at
> >   the end of the property IRI.
> >
> > Section 3.2: Mapping database values to RDF Literals
> >
> > The values in a row are mapped to RDF literals:
> >
> > dfn literal map: a mapping from a SQL value with a datatype to an RDF
> >   literal with and XML Schema datatype where the RDF literal has a
> >   lexical value equivalent to the SQL lexical value and the datatype
> >   mapping is found in this table:
> >
> >   SQL        XSD datatype
> >   ___        ____________
> >   INT        http://www.w3.org/TR/xmlschema-2/#integer
> >   FLOAT      http://www.w3.org/TR/xmlschema-2/#float
> >   DATE       http://www.w3.org/TR/xmlschema-2/#date
> >   TIME       http://www.w3.org/TR/xmlschema-2/#time
> >   TIMESTAMP  http://www.w3.org/TR/xmlschema-2/#dateTime
> >   CHAR       plain literal
> >   VARCHAR    plain literal
> >   STRING     plain literal
> >
> > Section 3.3: Generating RDF Triples
> >
> > The Direct Mapping is defined by a set of mapping functions from table
> > rows to RDF triples:
> >
> > dfn direct mapping: the set of RDF triples produced by invoking the
> >   <table mapping> on each table in a database.
> >
> > dfn table mapping: the set of RDF triples created by invoking the
> >   <row mapping> on each row in a table.
> >
> > dfn row mapping: using a Row Identifier S for each row,
> >   the type triple:
> >     (S, rdf:type, <table type>)
> >   plus the scalar triples:
> >     for each attribute in the list of <scalars> where the attribute
> >     value is non-NULL:
> >       (S, the <property IRI> for the attribute, the <literal map> for
> >       the attribute value).
> >   plus the reference triples:
> >     for each list of attributes in the <non-unary references> where
> >     none of the attribute values are NULL:
> >       (S, the <property IRI> for the attributes, the <row identifier>
> >       for the referenced triple)
> >
> > ]]
>
> Thank you for the careful review and for correcting typos.
> Ignoring whitespace, I see:
>
> added numbered section headings:
>   I propose that we first agree on the definition and do markup
>   separately.

Ok. But I think that adding the subsections is crucial.

> my precious typos were corrected.
>   I can live without them.
>
> re-ordered dfn references: and dfn scalars:
>   sure.
>
> DM is for "all the tables in a database"
>   I debated this; I didn't want to alarm folks who would think
>   they'd have to expose everything if they didn't want to. The
>   alternative is to parametrize; neither is terribly attractive. I
>   guess "all tables" is fine.

I understand. I was a bit hesitant about this too, but just wrote it to see
if you would catch it :) "all tables" is fine with me.

> s/an SQL/a SQL/
>   This depends on whether you call it "S Q L" or "sequel". The SQL
>   spec uses "an", e.g. "Effects of SQL-statements in an SQL-transaction".
>
> row mapping defined to be over each row.
>   The calling function <table mapping> already "invokes the <row
>   mapping> on each row in a table" so the row mapping should just be
>   for a single row.
>
> capitalize "Row Identifier" in dfn row mapping.
>   I suspect this wasn't an intended change proposal.
>
> below are the diffs and an incorporated proposal
>
> == white-space-normalized diffs ==
> @@ -1,18 +1,23 @@
> +Section 3: The Direct Mapping
> +
>  The Direct Mapping is a formula for creating an RDF graph from the
> -rows in a table. A base IRI defines a web space for the labels in
> +rows of all the tables in a database. A base IRI defines a web space for the labels in
>  this graph; all labels are generated by appending to the base.
>
>  The functions scalar and reference extract the scalar and reference
>  attributes (those participating in a foreign key) respectively:
>
> -dfn references: the attributes in a table's foreign keys.
> -
>  dfn scalars: the attributes in a table which are NOT in any foreign
>  key.
>
> +dfn references: the attributes in a table's foreign keys.
> +
>  dfn non-unary references: the references for which the table's
>  foreign key is NOT composed of a single attribute.
>
> +
> +Section 3.1: Generating Row Identifiers
> +
>  SQL table and attribute identifiers compose RDF IRIs in the direct
>  graph. These identifiers are separated by the punctuation characters
>  '#', ',', '/' and '='. All SQL identifiers are escaped following URL-
> @@ -30,8 +35,8 @@
>  dfn row identifier:
>
>  if the table has a primary key with attributes, the relative IRI for
> - the row identifier is the concationation of the table name, '/', and
> - a ','-separated concatonation of each attribute name, '=', and the
> + the row identifier is the concatenation of the table name, '/', and
> + a ','-separated concatenation of each attribute name, '=', and the
>  attribute value.
>
>  if the table has no primary key, the row identifier is a fresh blank
> @@ -44,9 +49,11 @@
>  ','-separated concatonation of each attribute name, and a '#' at
>  the end of the property IRI.
>
> +Section 3.2: Mapping database values to RDF Literals
> +
>  The values in a row are mapped to RDF literals:
>
> -dfn litaral map: a mapping from an SQL value with a datatype to an RDF
> +dfn literal map: a mapping from a SQL value with a datatype to an RDF
>  literal with and XML Schema datatype where the RDF literal has a
>  lexical value equivalent to the SQL lexical value and the datatype
>  mapping is found in this table:
> @@ -62,16 +69,18 @@
>  VARCHAR plain literal
>  STRING plain literal
>
> -The Direct Maping is defined by a set of mapping functions from table
> +Section 3.3: Generating RDF Triples
> +
> +The Direct Mapping is defined by a set of mapping functions from table
>  rows to RDF triples:
>
> -dfn direct mapping: the set of triples produced by invoking the
> +dfn direct mapping: the set of RDF triples produced by invoking the
>  <table mapping> on each table in a database.
>
>  dfn table mapping: the set of RDF triples created by invoking the
>  <row mapping> on each row in a table.
>
> -dfn row mapping: using a row identifier S for the row,
> +dfn row mapping: using a Row Identifier S for each row,
>  the type triple:
>  (S, rdf:type, <table type>)
>  plus the scalar triples:
>
>
> == incorporated proposal ==
> PROPOSAL: that the English definition of the direct mapping be defined as:
>
> [[
> The Direct Mapping is a formula for creating an RDF graph from the
> rows of each table in a database. A base IRI defines a web space for
> the labels in this graph; all labels are generated by appending to the
> base.
>
> The functions scalar and reference extract the scalar and reference
> attributes (those participating in a foreign key) respectively:
>
> dfn scalars: the attributes in a table which are NOT in any foreign
> key.
>
> dfn references: the attributes in a table's foreign keys.
>
> dfn non-unary references: the references for which the table's
> foreign key is NOT composed of a single attribute.
>
> SQL table and attribute identifiers compose RDF IRIs in the direct
> graph. These identifiers are separated by the punctuation characters
> '#', ',', '/' and '='. All SQL identifiers are escaped following
> URL-encoding
> <http://www.w3.org/TR/html5/association-of-controls-and-forms.html#url-encoded-form-data>
> except that only the above punctuation and the characters not
> permitted in RDF IRIs are escaped.
>
> In the direct graph, there is an identifier for each row in a database
> table. If the row is in a table with a primary key, this is formed
> from the table name and the attribute names and values of each attribute
> in the primary key. If there is no primary key for the table, the row
> identifier is a fresh blank node:
>
> dfn row identifier:
>
>   if the table has a primary key with attributes, the relative IRI for
>   the row identifier is the concatenation of the table name, '/', and
>   a ','-separated concatenation of each attribute name, '=', and the
>   attribute value.
>
>   if the table has no primary key, the row identifier is a fresh blank
>   node.
>
> A (potentially unary) list of attribute names in a table form a
> property IRI:
>
> dfn property IRI: the concationation of the table name, '/', and a
>   ','-separated concatonation of each attribute name, and a '#' at
>   the end of the property IRI.
>
> The values in a row are mapped to RDF literals:
>
> dfn literal map: a mapping from an SQL value with a datatype to an RDF
>   literal with and XML Schema datatype where the RDF literal has a
>   lexical value equivalent to the SQL lexical value and the datatype
>   mapping is found in this table:
>
>   SQL        XSD datatype
>   ___        ____________
>   INT        http://www.w3.org/TR/xmlschema-2/#integer
>   FLOAT      http://www.w3.org/TR/xmlschema-2/#float
>   DATE       http://www.w3.org/TR/xmlschema-2/#date
>   TIME       http://www.w3.org/TR/xmlschema-2/#time
>   TIMESTAMP  http://www.w3.org/TR/xmlschema-2/#dateTime
>   CHAR       plain literal
>   VARCHAR    plain literal
>   STRING     plain literal
>
> The Direct Mapping is defined by a set of mapping functions from table
> rows to RDF triples:
>
> dfn direct mapping: the set of RDF triples produced by invoking the
>   <table mapping> on each table in a database.
>
> dfn table mapping: the set of RDF triples created by invoking the
>   <row mapping> on each row in a table.
>
> dfn row mapping: using a row identifier S for the row,
>   the type triple:
>     (S, rdf:type, <table type>)
>   plus the scalar triples:
>     for each attribute in the list of <scalars> where the attribute
>     value is non-NULL:
>       (S,
>        the <property IRI> for the attribute,
>        the <literal map> for the attribute value).
>   plus the reference triples:
>     for each list of attributes in the <non-unary references> where none
>     of the attribute values are NULL:
>       (S,
>        the <property IRI> for the attributes,
>        the <row identifier> for the referenced triple)
> ]]
>
>
> > Juan Sequeda
> > +1-575-SEQ-UEDA
> > www.juansequeda.com
> >
> >
> > On Tue, Aug 2, 2011 at 6:44 AM, Juan Sequeda <juanfederico@gmail.com> wrote:
> >
> > > Eric,
> > >
> > > This is great. I was planning to write up a proposal myself, but you
> > > saved my time. I do have some comments and suggestions. I'm writing up
> > > a new proposal based on what you have.
> > > I should have it done before the meeting
> > >
> > > Juan Sequeda
> > > www.juansequeda.com
> > >
> > > On Aug 2, 2011, at 2:01 AM, Michael Hausenblas <michael.hausenblas@deri.org> wrote:
> > >
> > > >
> > > > Eric,
> > > >
> > > >> PROPOSAL: that the English definition of the direct mapping be defined as:
> > > >> [[
> > > >> The Direct Mapping is a formula for creating an RDF graph from the
> > > >> rows in a table. A base IRI defines a web space for the labels in
> > > >
> > > > ...
> > > >
> > > > Thanks a lot for this proposal, Eric! I'm wondering if we're ready to
> > > > resolve this today or if the WG feels that we need to discuss a bit
> > > > more. In any case I'm flexible to change today's agenda [1] if the WG
> > > > thinks it makes sense ...
> > > >
> > > > Cheers,
> > > > Michael
> > > >
> > > > [1] http://lists.w3.org/Archives/Public/public-rdb2rdf-wg/2011Jul/0183.html
> > > > --
> > > > Dr. Michael Hausenblas, Research Fellow
> > > > LiDRC - Linked Data Research Centre
> > > > DERI - Digital Enterprise Research Institute
> > > > NUIG - National University of Ireland, Galway
> > > > Ireland, Europe
> > > > Tel. +353 91 495730
> > > > http://linkeddata.deri.ie/
> > > > http://sw-app.org/about.html
> > > >
> > > > On 2 Aug 2011, at 00:22, Eric Prud'hommeaux wrote:
> > > >
> > > >> * Richard Cyganiak <richard@cyganiak.de> [2011-07-26 19:41+0100]
> > > >>> Hi all,
> > > >>>
> > > >>> The Direct Mapping document is stuck because we have a stalemate
> > > >>> between the editors. With Last Call approaching, we need *some* way
> > > >>> of breaking the stalemate. So here's a proposal. This is a possible
> > > >>> new outline for the document, along with assignments of separate
> > > >>> sections to separate editors.
> > > >>>
> > > >>>
> > > >>> 1. Introduction
> > > >>>    - What is this?
> > > >>>    - How does it relate to R2RML
> > > >>>    - Target audience, assumed level of knowledge
> > > >>>    - RDF terms and SQL/relational terms are used as defined in
> > > >>>      documents XXX and YYY
> > > >>>
> > > >>> 2. Example (Informative)
> > > >>>    - A simple two-table example
> > > >>>    - Quick explanation of foreign key handling
> > > >>>    - Quick explanation of tables w/o PKs
> > > >>>
> > > >>> 3. The Direct Mapping [in Plain English]
> > > >>>    - The Direct Graph of a database is the union of the Table Graphs
> > > >>>      of all tables in the database.
> > > >>>    - The Table Graph of a table is the union of the Row Graphs...
> > > >>>    - The Row Graph of a row is ...
> > > >>>    - ...
> > > >>
> > > >> PROPOSAL: that the English definition of the direct mapping be defined as:
> > > >> [[
> > > >> The Direct Mapping is a formula for creating an RDF graph from the
> > > >> rows in a table. A base IRI defines a web space for the labels in
> > > >> this graph; all labels are generated by appending to the base.
> > > >>
> > > >> The functions scalar and reference extract the scalar and reference
> > > >> attributes (those participating in a foreign key) respectively:
> > > >>
> > > >> dfn references: the attributes in a table's foreign keys.
> > > >>
> > > >> dfn scalars: the attributes in a table which are NOT in any foreign
> > > >> key.
> > > >>
> > > >> dfn: non-unary references: the references for which the table's
> > > >> foreign key is NOT composed of a single attribute.
> > > >>
> > > >> SQL table and attribute identifiers compose RDF IRIs in the direct
> > > >> graph. These identifiers are separated by the punctuation characters
> > > >> '#', ',', '/' and '='.
> > > >> All SQL identifiers are escaped following URL-encoding
> > > >> <http://www.w3.org/TR/html5/association-of-controls-and-forms.html#url-encoded-form-data>
> > > >> except that only the above punctuation and the characters not
> > > >> permitted in RDF IRIs are escaped.
> > > >>
> > > >> In the direct graph, there is an identifier for each row in a database
> > > >> table. If the row is in a table with a primary key, this is formed
> > > >> from the table name and the attribute names and values of each attribute
> > > >> in the primary key. If there is no primary key for the table, the row
> > > >> identifier is a fresh blank node:
> > > >>
> > > >> dfn row identifier:
> > > >>
> > > >>   if the table has a primary key with attributes, the relative IRI for
> > > >>   the row identifier is the concationation of the table name, '/', and
> > > >>   a ','-separated concatonation of each attribute name, '=', and the
> > > >>   attribute value.
> > > >>
> > > >>   if the table has no primary key, the row identifier is a fresh blank
> > > >>   node.
> > > >>
> > > >> A (potentially unary) list of attribute names in a table form a
> > > >> property IRI:
> > > >>
> > > >> dfn property IRI: the concationation of the table name, '/', and a
> > > >>   ','-separated concatonation of each attribute name, and a '#' at
> > > >>   the end of the property IRI.
> > > >>
> > > >> The values in a row are mapped to RDF literals:
> > > >>
> > > >> dfn litaral map: a mapping from an SQL value with a datatype to an RDF
> > > >>   literal with and XML Schema datatype where the RDF literal has a
> > > >>   lexical value equivalent to the SQL lexical value and the datatype
> > > >>   mapping is found in this table:
> > > >>
> > > >>   SQL        XSD datatype
> > > >>   ___        ____________
> > > >>   INT        http://www.w3.org/TR/xmlschema-2/#integer
> > > >>   FLOAT      http://www.w3.org/TR/xmlschema-2/#float
> > > >>   DATE       http://www.w3.org/TR/xmlschema-2/#date
> > > >>   TIME       http://www.w3.org/TR/xmlschema-2/#time
> > > >>   TIMESTAMP  http://www.w3.org/TR/xmlschema-2/#dateTime
> > > >>   CHAR       plain literal
> > > >>   VARCHAR    plain literal
> > > >>   STRING     plain literal
> > > >>
> > > >> The Direct Maping is defined by a set of mapping functions from table
> > > >> rows to RDF triples:
> > > >>
> > > >> dfn direct mapping: the set of triples produced by invoking the
> > > >>   <table mapping> on each table in a database.
> > > >>
> > > >> dfn table mapping: the set of RDF triples created by invoking the
> > > >>   <row mapping> on each row in a table.
> > > >>
> > > >> dfn row mapping: using a row identifier S for the row,
> > > >>   the type triple:
> > > >>     (S, rdf:type, <table type>)
> > > >>   plus the scalar triples:
> > > >>     for each attribute in the list of <scalars> where the attribute
> > > >>     value is non-NULL:
> > > >>       (S,
> > > >>        the <property IRI> for the attribute,
> > > >>        the <literal map> for the attribute value).
> > > >>   plus the reference triples:
> > > >>     for each list of attributes in the <non-unary references> where none
> > > >>     of the attribute values are NULL:
> > > >>       (S,
> > > >>        the <property IRI> for the attributes,
> > > >>        the <row identifier> for the referenced triple)
> > > >> ]]
> > > >>
> > > >>> A. Appendix: Formalisms (Informative)
> > > >>>    - should be crisp, short, precise, with only minimum explanation
> > > >>>      and examples
> > > >>>    A.1 Datalog Rules
> > > >>>    A.2 Denotational Semantics
> > > >>>    A.3 Set-Style Direct Mapping
> > > >>>
> > > >>> B. Acknowledgements (Informative)
> > > >>>
> > > >>> C. References
> > > >>>
> > > >>>
> > > >>> I see Juan and Marcelo editing A.1.
> > > >>>
> > > >>> I see Alexandre editing A.2.
> > > >>>
> > > >>> I see Eric editing 2 (which he already wrote), 3 (which *mostly*
> > > >>> exists), and A.3.
> > > >>>
> > > >>> I don't know about 1, B, and C.
> > > >>>
> > > >>> My reasoning is that there is no objective way of picking any of the
> > > >>> formalisms over another formalism, so the normative expression should
> > > >>> be the lowest common denominator: plain English. By making the
> > > >>> formalisms all informative, we free them from the burden of having to
> > > >>> explain the direct mapping itself in a generally accessible way. The
> > > >>> focus can be totally on presenting the formalisms in all their
> > > >>> terseness to an audience that is familiar with datalog/denotational
> > > >>> semantics/whatever.
> > > >>>
> > > >>> I hope this proposal aids discussion.
> > > >>>
> > > >>> Best,
> > > >>> Richard
> > > >>
> > > >> --
> > > >> -ericP
> > > >>
> > > >
> > >
>
> --
> -ericP
>
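
For readers who want to see the proposed definitions in action, here is a rough Python sketch of the row identifier, property IRI, literal map and row mapping as worded in the proposal above. It is only an illustration, not text from the thread. The function names, the example tables (People, Office), the base IRI http://example.org/db/, the exact set of characters treated as "not permitted in RDF IRIs", and the choice of base + table name for <table type> are all assumptions made for the example. The caller passes the foreign keys to treat as references; the proposal applies the reference triple to the <non-unary references>, so the example uses a two-column key.

```python
from itertools import count

SEPARATORS = set("#,/=")          # punctuation named in the proposal
NOT_IN_IRI = set(' <>"{}|\\^`%')  # assumption: what this sketch treats as non-IRI characters

# SQL type -> datatype IRI, copied from the table in the proposal.
# Types not listed here (CHAR, VARCHAR, STRING) become plain literals.
DATATYPES = {
    "INT": "http://www.w3.org/TR/xmlschema-2/#integer",
    "FLOAT": "http://www.w3.org/TR/xmlschema-2/#float",
    "DATE": "http://www.w3.org/TR/xmlschema-2/#date",
    "TIME": "http://www.w3.org/TR/xmlschema-2/#time",
    "TIMESTAMP": "http://www.w3.org/TR/xmlschema-2/#dateTime",
}

_blank = count()  # source of fresh blank node labels


def escape(text):
    """Percent-encode only the separators and the assumed non-IRI characters."""
    return "".join(
        "".join("%{:02X}".format(b) for b in ch.encode("utf-8"))
        if ch in SEPARATORS or ch in NOT_IN_IRI else ch
        for ch in text
    )


def row_identifier(base, table, key_attrs, values):
    """Table name, '/', then ','-separated name=value pairs; blank node if no key."""
    if not key_attrs:
        return "_:row{}".format(next(_blank))
    pairs = ",".join(escape(a) + "=" + escape(str(v))
                     for a, v in zip(key_attrs, values))
    return base + escape(table) + "/" + pairs


def property_iri(base, table, attrs):
    """Table name, '/', ','-separated attribute names, and a trailing '#'."""
    return base + escape(table) + "/" + ",".join(escape(a) for a in attrs) + "#"


def literal_map(value, sql_type):
    """Return (lexical form, datatype IRI); datatype None means plain literal."""
    return (str(value), DATATYPES.get(sql_type.upper()))


def row_mapping(base, table, types, primary_key, references, row):
    """Type triple, scalar triples, and reference triples for a single row.

    references: list of (attrs, referenced table, referenced key attrs).
    """
    s = row_identifier(base, table, primary_key, [row[a] for a in primary_key])
    triples = [(s, "rdf:type", base + escape(table))]  # <table type> is assumed
    in_fk = {a for attrs, _, _ in references for a in attrs}
    for attr, sql_type in types.items():
        if attr not in in_fk and row[attr] is not None:   # scalars, non-NULL
            triples.append((s, property_iri(base, table, [attr]),
                            literal_map(row[attr], sql_type)))
    for attrs, ref_table, ref_key in references:
        vals = [row[a] for a in attrs]
        if all(v is not None for v in vals):              # no NULL in the key
            triples.append((s, property_iri(base, table, attrs),
                            row_identifier(base, ref_table, ref_key, vals)))
    return triples


base = "http://example.org/db/"
triples = row_mapping(
    base, "People",
    {"ID": "INT", "fname": "VARCHAR", "bldg": "VARCHAR", "room": "INT"},
    ["ID"],
    [(["bldg", "room"], "Office", ["building", "room"])],
    {"ID": 7, "fname": "Bob", "bldg": "B 2", "room": 18},
)
for t in triples:
    print(t)
```

Triples are returned as plain tuples to keep the sketch short; a real implementation would also need to handle quoting of literal lexical forms and referenced tables that have no primary key, which the sketch does not attempt.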
Received on Tuesday, 2 August 2011 14:25:50 UTC