Re: Comments on the SQL-based proposal

Dear All,

I would like to ask a question related to the point 2.5:Lookup table.

Does this lookup table work with range? I am aware that this lookup table is
supported in D2R, but it features only one-to-one mapping.
So, I was not able to produce a delivery lookup table with the following
entries:
- entry less than 3             value="express delivery"
- entry between 3 and 7     value="normal delivery"
- entry more than 7            value="slow delivery"

Maybe if this lookup-table is to be included in future proposal, this could
be a nice feature.

Thanks and Regards
Freddy

On Tue, Aug 3, 2010 at 11:48 PM, Richard Cyganiak <richard@cyganiak.de>wrote:

> All,
>
> Here is a collection of comments on the mapping language proposal that
> Souri presented in today's call. It's great that we now have a concrete
> proposal on the table that we can iteratively refine. I refer to the XML
> schema and example instances here:
>
>
> http://www.w3.org/2001/sw/rdb2rdf/wiki/An_XML_Schema_for_SQL-based_RDB2RDF_Mapping
>
> http://www.w3.org/2001/sw/rdb2rdf/wiki/Revised_Example_of_SQL-Query_based_Approach
>
> I have also previously studied Souri's presentation from the F2F, which
> presented a previous version of the same idea in an abstract syntax:
>
>
> http://www.w3.org/2001/sw/rdb2rdf/wiki/File:RDB2RDF_Map_viaSQL_F2F_at_SemTech2010.pdf
>
> I grouped my comments into three categories:
>
> 1. Non-functional improvements, that is, issues where I believe that the
> same effect could be achieved in a better way;
>
> 2. Proposals for adding expressivity, to make sure that the language covers
> all of RDF and is suitable for some real-world mapping tasks that currently
> are impossible or very hard to achieve;
>
> 3. Proposals related to authoring convenience, that is, cases where
> maintainability and usability can be improved by adding some syntactic sugar
> or other optional features.
>
> As a further comment, I don't believe that XML is the right choice for the
> surface syntax of this language, and I think that the mapping itself should
> be expressed as an RDF graph. But that is a separate discussion, and I will
> for now assume that an RDF representation of the mapping itself could be
> derived easily at a later time.
>
> Cheers,
> Richard
>
>
>
> 1. Non-functional improvements
> ==============================
>
> 1.1. Don't enclose generated URIs in pointy brackets
> ----------------------------------------------------
> The proposal demands the generation of URIs enclosed in pointy brackets,
> for example:
>
> SELECT '<xyz.com/emp/' || empno || '>' AS empURI
>
> This is unnecessary, the following works just as well:
>
> SELECT 'xyz.com/emp/' || empno AS empURI
>
>
> 1.2. Name of InstURIPropertyMap
> -------------------------------
> The element defines instance URIs, but has nothing to do with properties,
> it should be renamed to reflect that.
>
>
> 1.3. Issues with GraphURIPropertyMap
> ------------------------------------
> It appears that GraphURIPropertyName is unnecessary; just providing a way
> of generating a graph name URI is sufficient for placing triples into an
> appropriate graph. What is the use of this property?
>
> To selectively place different properties of a ClassMap in different
> graphs, the proposal allows referencing of the PropertyMaps from the
> GraphURIPropertyMap by specifying the RDF property name used in the
> PropertyMap. This only works if we assume that RDF properties are unique
> within a ClassMap. That's an unnecessary restriction. A cleaner design would
> be to invert the direction of the reference, pointing from the PropertyMap
> to the GraphURIPropertyMap that specifies where the generated triple should
> reside.
>
>
> 1.4. Issues with KeyMaps
> ------------------------
> It appears that the proposal requires specification of all key constraints
> in the mapping language. Often this will duplicate information that can be
> determined automatically from the database schema. This should be avoided if
> at all possible.
>
> It appears that the proposal works by expressing constraints (PK, unique,
> FK) on RDF properties. It would be better to express them as constraints on
> the columns in the result sets generated by the views/queries. This is
> because the constraint-carrying column may actually not be exposed directly
> in the generated RDF. For example, the PK column of a table is often not
> mapped to a property, but becomes part of the subject URI. Thus, in the emp
> table, the PK constraint should not be expressed on the emp:empno property,
> but on the empno (or empURI) column in the view.
>
>
> 2. Adding expressivity
> ======================
>
> 2.1. Allow blank nodes as subjects
> ----------------------------------
> The proposal only allows URIs in the subject positions of triples. RDF also
> allows blank nodes. This could be addressed by allowing an InstBNodeMap with
> one or more DB columns in place of the InstURIPropertyMap.
>
>
> 2.2. Allow language tags for literals
> -------------------------------------
> The proposal does not allow the specification of language tags on literals.
> This could be addressed by adding an attribute to the PropertyMap. It is
> worth exploring wether computed language tags, whose values come from a view
> column rather than from a constant attribute value, should be supported as
> well.
>
>
> 2.3. Allow custom datatypes for literals
> ----------------------------------------
> The proposal doesn't yet fully describe how datatypes would be handled, but
> the intention seems to be that the datatype is determined by the expression
> that defines the view column. There language should support overriding the
> datatype with an arbitrary datatype URI. It would be the mapping author's
> responsibility to ensure that the view column contains values that are legal
> lexical forms for that datatype. The datatype could be specified in an
> attribute of the PropertyMap. It is worth exploring wether computed
> datatypes should be allowed.
>
>
> 2.4. Computed property URIs
> ---------------------------
> It is worth exploring wether computed property URIs should be allowed. In
> addition to the ProperyName attribute, which provides a constant property
> URI, there could be a PropertyColumnName that refers to a view column which
> contains computed property URIs. A use case is a table with an
> entity-attribute-value schema, which is reasonably common in complex
> database schemas.
>
>
> 2.5. Lookup tables
> ------------------
> The proposal doesn't support the use of lookup tables. Often, a database
> column contains a short alphanumeric code whose meaning is described in some
> documentation outside of the database. If this is to be mapped into “nice”
> RDF, with proper meaningful URIs instead cryptic code, then a lookup table
> has to be used. If that table is not part of the database, and we cannot
> provide it as part of the database, then the lookup table has to be provided
> as part of the mapping. A sketch:
>
> <LookupTable id="expense-types">
>  <Entry code="04" value="http://example.org/expense-types/airfare" />
>  <Entry code="05" value="http://example.org/expense-types/bus-train-ferry"
> />
>  <Entry code="24" value="http://example.org/expense-types/subsistence" />
>  <Entry code="53" value="http://example.org/expense-types/hospitality" />
> </LookupTable>
>
> <ClassMap ...>
>   <SQLdefString>SELECT ... empno ...</SQLdefString>
> </ClassMap>
>
> <PropertyMap DBColumnName="exptype" LookupTable="expense-types"
> PropertyName="ex:expenseType" />
>
> In theory, this can already been done by writing a lot of nested IFF
> statements in the view definition, but it appears to be a reasonably common
> case that warrants special syntax.
>
>
> 3. Usability and authoring considerations
> =========================================
>
> 3.1. Support for URI templates
> ------------------------------
> While it is possible to construct pretty much arbitrary URIs in SQL by
> means of concatenation and string functions, SQL is not always the best
> approach. There should be a way of constructing URIs from view columns
> outside of the SQL query. Reasons include: i) The same URI-generating
> expression is often needed multiple times throughout a mapping, which leads
> to redundancy and imposes maintenance costs. ii) URI generation often
> involves string escaping, case changes etc, which quickly leads to unwieldy
> SQL expressions. iii) Standard SQL functions may not cover all needs for URL
> character escaping. A proposal would be to use a syntax inspired by the URI
> Templates draft [1] to declare URI patterns at the beginning of the file.
> The patterns can then be referenced from throughout the file. This would be
> optional; SQL expressions could still be used for URI construction.
>
> [1] http://www.ietf.org/id/draft-gregorio-uritemplate-04.txt
>
> Example:
>
> <URITemplate id="employee" value="http://example.com/emp/{empno}<http://example.com/emp/%7Bempno%7D>"
> />
>
> <ClassMap ...>
>   <SQLdefString>SELECT ... empno ...</SQLdefString>
>   <InstURIPropertyMap template="employee" />
>   ...
> </ClassMap>
>
>
> 3.2. Multiple maps for a single class
> -------------------------------------
> I believe that the proposal assumes that the value of @ClassName on the
> ClassMap element is unique throughout In many schemas there might be
> multiple tables whose records represent instances of the same domain
> vocabulary class. For example, a mapping that generates contact information
> from a university database might map records in the STAFF table and in the
> STUDENT table to the same class foaf:Person. This *can* be achieved with a
> single ClassMap, by forming a SQL UNION of the two tables after appropriate
> transformations. But this can be awkward because the two tables could have
> quite different columns, e.g., staff have OFFICE_NO and CONSULTATION_HOURS,
> while students have MAJOR and FACEBOOK_ID. For authoring convenience it
> should be allowed to specify two separate ClassMaps, rather than shoehorning
> everything into a single view. (At that point, ClassMap should probably be
> renamed to ViewMap or QueryMap or something similar, because it then
> represents the mapping from a single SQL view/query, and no longer the
> mapping to a single class.)
>
> I note that both 3.1 and 3.2 are helpful to support scenarios where an
> ontologist prepares the mapping, and a DBA fills in the SQL queries.
>
>
> 3.3. Referring to a table instead of a query/view
> -------------------------------------------------
> To make the simple case simple, there should be a <tableName> element that
> could be used in place of <SQLdefString>. The following markup
>
>   <tableName>EMP</tableName>
>
> would be a shorthand for the following:
>
>   <SQLdefString>SELECT * FROM EMP</SQLdefString>
>
>
> 3.4. Allow constant values in place of column names
> ---------------------------------------------------
> There are reasonably frequent cases where some constant value is to be
> added to all resources generated by a ClassMap. For example, in the case of
> the emp table, a triple of the shape
>
>    <http://xyz.com/emp/1234> emp:employeeOf <http://xyz.com/the_company> ..
>
> might be added to each employee resource. Note that only the subject of the
> triple depends on the query result; predicate and object are the same for
> all triples. This could be achieved by having
>
>    SELECT ... '<http://xyz.com/the_company>' AS company
>
> in the query, along with a PropertyMap
>
>    <PropertyMap PropertyName="emp:employeeOf" RDFTerm="IRI"
>                 DBColumnName="company"/>
>
> but this seems awkward and inefficient; a better alternative relying on
> syntactic sugar would be
>
>    <PropertyMap PropertyName="emp:employeeOf" RDFTerm="IRI"
>                 ConstantValue="http://xyz.com/the_company" />
>
> and omit the company column from the query.
>

Received on Wednesday, 4 August 2010 16:12:18 UTC