Re: Comments on the SQL-based proposal

On 4 Aug 2010, at 12:45, Freddy Priyatna wrote:
> I would like to ask a question related to the point 2.5:Lookup table.
>
> Does this lookup table work with range?

No.

> I am aware that this lookup table is
> supported in D2R, but it features only one-to-one mapping.
> So, I was not able to produce a delivery lookup table with the  
> following
> entries:
> - entry less than 3             value="express delivery"
> - entry between 3 and 7     value="normal delivery"
> - entry more than 7            value="slow delivery"

This can be done in D2R via a sqlExpression that checks the range on  
the PropertyBridge, something along these lines:

     d2rq:sqlExpression "IF(table.entry < 3, 'express delivery', 'slow  
delivery')";

You can nest another IF to deal with the third case. The same general  
idea works in the SQL-based approach. So you don't need lookup tables  
for this, although they would perhaps make the mapping more readable.

Best,
Richard



> Maybe if this lookup-table is to be included in future proposal,  
> this could
> be a nice feature.
>
> Thanks and Regards
> Freddy
>
> On Tue, Aug 3, 2010 at 11:48 PM, Richard Cyganiak  
> <richard@cyganiak.de>wrote:
>
>> All,
>>
>> Here is a collection of comments on the mapping language proposal  
>> that
>> Souri presented in today's call. It's great that we now have a  
>> concrete
>> proposal on the table that we can iteratively refine. I refer to  
>> the XML
>> schema and example instances here:
>>
>>
>> http://www.w3.org/2001/sw/rdb2rdf/wiki/An_XML_Schema_for_SQL-based_RDB2RDF_Mapping
>>
>> http://www.w3.org/2001/sw/rdb2rdf/wiki/Revised_Example_of_SQL-Query_based_Approach
>>
>> I have also previously studied Souri's presentation from the F2F,  
>> which
>> presented a previous version of the same idea in an abstract syntax:
>>
>>
>> http://www.w3.org/2001/sw/rdb2rdf/wiki/File:RDB2RDF_Map_viaSQL_F2F_at_SemTech2010.pdf
>>
>> I grouped my comments into three categories:
>>
>> 1. Non-functional improvements, that is, issues where I believe  
>> that the
>> same effect could be achieved in a better way;
>>
>> 2. Proposals for adding expressivity, to make sure that the  
>> language covers
>> all of RDF and is suitable for some real-world mapping tasks that  
>> currently
>> are impossible or very hard to achieve;
>>
>> 3. Proposals related to authoring convenience, that is, cases where
>> maintainability and usability can be improved by adding some  
>> syntactic sugar
>> or other optional features.
>>
>> As a further comment, I don't believe that XML is the right choice  
>> for the
>> surface syntax of this language, and I think that the mapping  
>> itself should
>> be expressed as an RDF graph. But that is a separate discussion,  
>> and I will
>> for now assume that an RDF representation of the mapping itself  
>> could be
>> derived easily at a later time.
>>
>> Cheers,
>> Richard
>>
>>
>>
>> 1. Non-functional improvements
>> ==============================
>>
>> 1.1. Don't enclose generated URIs in pointy brackets
>> ----------------------------------------------------
>> The proposal demands the generation of URIs enclosed in pointy  
>> brackets,
>> for example:
>>
>> SELECT '<xyz.com/emp/' || empno || '>' AS empURI
>>
>> This is unnecessary, the following works just as well:
>>
>> SELECT 'xyz.com/emp/' || empno AS empURI
>>
>>
>> 1.2. Name of InstURIPropertyMap
>> -------------------------------
>> The element defines instance URIs, but has nothing to do with  
>> properties,
>> it should be renamed to reflect that.
>>
>>
>> 1.3. Issues with GraphURIPropertyMap
>> ------------------------------------
>> It appears that GraphURIPropertyName is unnecessary; just providing  
>> a way
>> of generating a graph name URI is sufficient for placing triples  
>> into an
>> appropriate graph. What is the use of this property?
>>
>> To selectively place different properties of a ClassMap in different
>> graphs, the proposal allows referencing of the PropertyMaps from the
>> GraphURIPropertyMap by specifying the RDF property name used in the
>> PropertyMap. This only works if we assume that RDF properties are  
>> unique
>> within a ClassMap. That's an unnecessary restriction. A cleaner  
>> design would
>> be to invert the direction of the reference, pointing from the  
>> PropertyMap
>> to the GraphURIPropertyMap that specifies where the generated  
>> triple should
>> reside.
>>
>>
>> 1.4. Issues with KeyMaps
>> ------------------------
>> It appears that the proposal requires specification of all key  
>> constraints
>> in the mapping language. Often this will duplicate information that  
>> can be
>> determined automatically from the database schema. This should be  
>> avoided if
>> at all possible.
>>
>> It appears that the proposal works by expressing constraints (PK,  
>> unique,
>> FK) on RDF properties. It would be better to express them as  
>> constraints on
>> the columns in the result sets generated by the views/queries. This  
>> is
>> because the constraint-carrying column may actually not be exposed  
>> directly
>> in the generated RDF. For example, the PK column of a table is  
>> often not
>> mapped to a property, but becomes part of the subject URI. Thus, in  
>> the emp
>> table, the PK constraint should not be expressed on the emp:empno  
>> property,
>> but on the empno (or empURI) column in the view.
>>
>>
>> 2. Adding expressivity
>> ======================
>>
>> 2.1. Allow blank nodes as subjects
>> ----------------------------------
>> The proposal only allows URIs in the subject positions of triples.  
>> RDF also
>> allows blank nodes. This could be addressed by allowing an  
>> InstBNodeMap with
>> one or more DB columns in place of the InstURIPropertyMap.
>>
>>
>> 2.2. Allow language tags for literals
>> -------------------------------------
>> The proposal does not allow the specification of language tags on  
>> literals.
>> This could be addressed by adding an attribute to the PropertyMap.  
>> It is
>> worth exploring wether computed language tags, whose values come  
>> from a view
>> column rather than from a constant attribute value, should be  
>> supported as
>> well.
>>
>>
>> 2.3. Allow custom datatypes for literals
>> ----------------------------------------
>> The proposal doesn't yet fully describe how datatypes would be  
>> handled, but
>> the intention seems to be that the datatype is determined by the  
>> expression
>> that defines the view column. There language should support  
>> overriding the
>> datatype with an arbitrary datatype URI. It would be the mapping  
>> author's
>> responsibility to ensure that the view column contains values that  
>> are legal
>> lexical forms for that datatype. The datatype could be specified in  
>> an
>> attribute of the PropertyMap. It is worth exploring wether computed
>> datatypes should be allowed.
>>
>>
>> 2.4. Computed property URIs
>> ---------------------------
>> It is worth exploring wether computed property URIs should be  
>> allowed. In
>> addition to the ProperyName attribute, which provides a constant  
>> property
>> URI, there could be a PropertyColumnName that refers to a view  
>> column which
>> contains computed property URIs. A use case is a table with an
>> entity-attribute-value schema, which is reasonably common in complex
>> database schemas.
>>
>>
>> 2.5. Lookup tables
>> ------------------
>> The proposal doesn't support the use of lookup tables. Often, a  
>> database
>> column contains a short alphanumeric code whose meaning is  
>> described in some
>> documentation outside of the database. If this is to be mapped into  
>> “nice”
>> RDF, with proper meaningful URIs instead cryptic code, then a  
>> lookup table
>> has to be used. If that table is not part of the database, and we  
>> cannot
>> provide it as part of the database, then the lookup table has to be  
>> provided
>> as part of the mapping. A sketch:
>>
>> <LookupTable id="expense-types">
>> <Entry code="04" value="http://example.org/expense-types/airfare" />
>> <Entry code="05" value="http://example.org/expense-types/bus-train-ferry 
>> "
>> />
>> <Entry code="24" value="http://example.org/expense-types/ 
>> subsistence" />
>> <Entry code="53" value="http://example.org/expense-types/ 
>> hospitality" />
>> </LookupTable>
>>
>> <ClassMap ...>
>>  <SQLdefString>SELECT ... empno ...</SQLdefString>
>> </ClassMap>
>>
>> <PropertyMap DBColumnName="exptype" LookupTable="expense-types"
>> PropertyName="ex:expenseType" />
>>
>> In theory, this can already been done by writing a lot of nested IFF
>> statements in the view definition, but it appears to be a  
>> reasonably common
>> case that warrants special syntax.
>>
>>
>> 3. Usability and authoring considerations
>> =========================================
>>
>> 3.1. Support for URI templates
>> ------------------------------
>> While it is possible to construct pretty much arbitrary URIs in SQL  
>> by
>> means of concatenation and string functions, SQL is not always the  
>> best
>> approach. There should be a way of constructing URIs from view  
>> columns
>> outside of the SQL query. Reasons include: i) The same URI-generating
>> expression is often needed multiple times throughout a mapping,  
>> which leads
>> to redundancy and imposes maintenance costs. ii) URI generation often
>> involves string escaping, case changes etc, which quickly leads to  
>> unwieldy
>> SQL expressions. iii) Standard SQL functions may not cover all  
>> needs for URL
>> character escaping. A proposal would be to use a syntax inspired by  
>> the URI
>> Templates draft [1] to declare URI patterns at the beginning of the  
>> file.
>> The patterns can then be referenced from throughout the file. This  
>> would be
>> optional; SQL expressions could still be used for URI construction.
>>
>> [1] http://www.ietf.org/id/draft-gregorio-uritemplate-04.txt
>>
>> Example:
>>
>> <URITemplate id="employee" value="http://example.com/emp/{empno}<http://example.com/emp/%7Bempno%7D 
>> >"
>> />
>>
>> <ClassMap ...>
>>  <SQLdefString>SELECT ... empno ...</SQLdefString>
>>  <InstURIPropertyMap template="employee" />
>>  ...
>> </ClassMap>
>>
>>
>> 3.2. Multiple maps for a single class
>> -------------------------------------
>> I believe that the proposal assumes that the value of @ClassName on  
>> the
>> ClassMap element is unique throughout In many schemas there might be
>> multiple tables whose records represent instances of the same domain
>> vocabulary class. For example, a mapping that generates contact  
>> information
>> from a university database might map records in the STAFF table and  
>> in the
>> STUDENT table to the same class foaf:Person. This *can* be achieved  
>> with a
>> single ClassMap, by forming a SQL UNION of the two tables after  
>> appropriate
>> transformations. But this can be awkward because the two tables  
>> could have
>> quite different columns, e.g., staff have OFFICE_NO and  
>> CONSULTATION_HOURS,
>> while students have MAJOR and FACEBOOK_ID. For authoring  
>> convenience it
>> should be allowed to specify two separate ClassMaps, rather than  
>> shoehorning
>> everything into a single view. (At that point, ClassMap should  
>> probably be
>> renamed to ViewMap or QueryMap or something similar, because it then
>> represents the mapping from a single SQL view/query, and no longer  
>> the
>> mapping to a single class.)
>>
>> I note that both 3.1 and 3.2 are helpful to support scenarios where  
>> an
>> ontologist prepares the mapping, and a DBA fills in the SQL queries.
>>
>>
>> 3.3. Referring to a table instead of a query/view
>> -------------------------------------------------
>> To make the simple case simple, there should be a <tableName>  
>> element that
>> could be used in place of <SQLdefString>. The following markup
>>
>>  <tableName>EMP</tableName>
>>
>> would be a shorthand for the following:
>>
>>  <SQLdefString>SELECT * FROM EMP</SQLdefString>
>>
>>
>> 3.4. Allow constant values in place of column names
>> ---------------------------------------------------
>> There are reasonably frequent cases where some constant value is to  
>> be
>> added to all resources generated by a ClassMap. For example, in the  
>> case of
>> the emp table, a triple of the shape
>>
>>   <http://xyz.com/emp/1234> emp:employeeOf <http://xyz.com/the_company 
>> > .
>>
>> might be added to each employee resource. Note that only the  
>> subject of the
>> triple depends on the query result; predicate and object are the  
>> same for
>> all triples. This could be achieved by having
>>
>>   SELECT ... '<http://xyz.com/the_company>' AS company
>>
>> in the query, along with a PropertyMap
>>
>>   <PropertyMap PropertyName="emp:employeeOf" RDFTerm="IRI"
>>                DBColumnName="company"/>
>>
>> but this seems awkward and inefficient; a better alternative  
>> relying on
>> syntactic sugar would be
>>
>>   <PropertyMap PropertyName="emp:employeeOf" RDFTerm="IRI"
>>                ConstantValue="http://xyz.com/the_company" />
>>
>> and omit the company column from the query.
>>

Received on Wednesday, 4 August 2010 14:49:03 UTC