Re: SQL in R2RML mappings

David,

Thanks very much for your input. Seema and I discussed the issues you 
raised.
Please see our comments inline.

Thanks,
- Souri/Seema.

David McNeil wrote:
> We have implemented the core of R2RML (as defined in the 2010/12/07 
> Editors Draft) in our relational-to-RDF product and we are working on 
> converting several mappings to use R2RML. In the course of this work 
> we have encountered the following issues/questions related to the SQL 
> statements that are embedded in R2RML mappings. A theme of this list 
> is a desire to avoid duplicating SQL in the mappings. It is possible 
> that we are not interpreting the specification properly, in which case 
> we would appreciate clarification. Also, I am happy to provide more 
> details about any of these issues and to talk about how we are working 
> around them.
>
> Thank you.
> -David McNeil
>
> ====
>
> 1) Reuse a SQL query string in multiple TriplesMaps.
>
> For example, consider the case where several tables must be joined in 
> order to identify the rows to map. If these rows are to be used to 
> generate multiple subjects, then the entire query must be copied.
>
> A possible solution would be to allow a SQL query to be represented as 
> a resource (rather than as a string) and used by multiple TriplesMaps.
>
> 2) Reuse a SQL query as a sub-query under multiple TriplesMaps.
>
> For example, there may be a repeated sub-query that is used by several 
> queries. Rather than copy-and-paste the query multiple times, it is 
> desirable to write it once and join it into several other queries. In 
> particular, consider the case where the mapping process is not allowed 
> to make changes to the database (e.g. by adding a view to the database).
>
> A possible solution is to allow a logical table (i.e. query) to be 
> built in the mapping from sub-logical-tables.
>
In general, the idea of defining a SQL subquery as the value of a 
resource and then combining such resources to create a complete SQL 
query could be a feasible extension.
(A specific question about your issue #1: we are not sure what the 
requirement behind "generate multiple subjects" is. If you meant 
multiple TriplesMaps, then we understand the requirement to be re-use 
of the SQL query across those TriplesMaps.)
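
To make the idea concrete, here is a minimal sketch in Python of how a 
processor might keep a SQL fragment as a named resource and build 
complete queries from it. Everything in it is an assumption for 
illustration only: the identifiers (ex:EmpDept, ex:EmployeeMap, 
ex:ResearchMap), the emp/dept tables, and the logical_table helper are 
hypothetical and not part of R2RML.

```python
# Hypothetical sketch: SQL fragments defined once and referenced by name.
# None of these identifiers come from the R2RML draft.

# Each "resource" maps an identifier to a SQL string written exactly once.
SQL_RESOURCES = {
    "ex:EmpDept": (
        "SELECT e.empno, e.ename, d.deptno, d.dname "
        "FROM emp e JOIN dept d ON e.deptno = d.deptno"
    ),
}


def logical_table(resource_id, where=None):
    """Build a complete query that uses a shared fragment as a sub-query."""
    sql = f"SELECT * FROM ({SQL_RESOURCES[resource_id]}) t"
    if where:
        sql += f" WHERE {where}"
    return sql


# Two hypothetical TriplesMaps reuse the same fragment without copying it.
triples_maps = {
    "ex:EmployeeMap": logical_table("ex:EmpDept"),
    "ex:ResearchMap": logical_table("ex:EmpDept", where="t.dname = 'RESEARCH'"),
}
```

The point of the sketch is only that the join is written once; whether 
such composition belongs in the mapping language itself is the open 
question.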

> 3) Allow a predicate/object map to join in another table.
>
> RefPredicateObjectMaps allow TriplesMaps to be "joined" in very 
> specific ways. But they do not allow the expressions used to compute 
> the predicate or object to reference both queries that are joined. 
> Consider a case where the type of the relationship (and therefore the 
> specific predicate) is determined by columns of the object query. For 
> example a predicate of either "mother" or "father" might be needed 
> depending on the gender of the object. This type of mapping is not 
> supported by RefPredicateObjectMaps because the predicate cannot refer 
> to any of the columns used to produce the object. Similarly the object 
> produced cannot use columns from the subject's query. This can be 
> worked around by simply creating a new SQL query and basing the 
> subject, predicate, and object on it. However, this requires 
> undesirable copying of the SQL.
>
> This could be addressed by allowing a predicate object map to 
> explicitly define an additional query as a logical query.
>
This is not clear to us. Could you please clarify this requirement 
with a concise example?
> 4) Allow two queries to be joined via either an inner or an outer join.
>
> I don't see any means to specify inner/outer joins as part of a 
> joinCondition. This would be useful in cases where null values in the 
> joined table are used to generate triples in the output.
>
This is a good idea. We will consider providing a way to specify the 
following: 1) the type of join (inner or outer), 2) the ON clause, and 
3) (already provided) the WHERE clause.
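
For reference, the difference that matters here can be sketched in 
Python against an in-memory SQLite database. The emp/dept tables, 
their rows, and the join_rows helper are assumptions made up for this 
illustration; the join_type argument stands in for whatever 
mapping-level setting might eventually be defined.

```python
import sqlite3

# Hypothetical sample data: JONES has no department (NULL deptno).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (empno INTEGER, ename TEXT, deptno INTEGER);
    CREATE TABLE dept (deptno INTEGER, dname TEXT);
    INSERT INTO emp VALUES (1, 'SMITH', 10), (2, 'JONES', NULL);
    INSERT INTO dept VALUES (10, 'ACCOUNTING');
""")


def join_rows(join_type):
    """Join emp to dept using the given join type ("INNER" or "LEFT OUTER")."""
    sql = (
        "SELECT e.ename, d.dname FROM emp e "
        f"{join_type} JOIN dept d ON e.deptno = d.deptno "
        "ORDER BY e.empno"
    )
    return conn.execute(sql).fetchall()


inner_rows = join_rows("INNER")       # only rows with a matching dept
outer_rows = join_rows("LEFT OUTER")  # also keeps emp rows with no match
```

With an inner join the JONES row is dropped; with a left outer join it 
is kept with a NULL dname, which is the case where null values in the 
joined table could drive triple generation.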
> 5) Support for database vendor specific SQL statements.
>
> For example, this is needed if a mapping needs to use an Oracle 
> specific statement that cannot be parsed as standard SQL.
>
> A possible solution is to allow SQL statements to be flagged as 
> "opaque". This would indicate that the statements are not to be parsed 
> by the mapping tool, but simply passed down to the underlying database.
It is already assumed to be opaque. That is, the mapping processor does 
not need to parse the SQL. However, for informational purposes, we 
could consider adding a way to specify the compatibility requirement 
(as a "comment" attribute of the SQL), without requiring the mapping 
processor to validate it.
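
A minimal sketch of that behavior in Python, assuming a hypothetical 
SqlQuery record with an optional compatibility note and a hypothetical 
run_opaque helper (neither is defined by R2RML; SQLite is used only so 
the sketch can run):

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class SqlQuery:
    """A SQL string plus an optional, purely informational annotation."""
    text: str
    compatibility: Optional[str] = None  # hypothetical attribute, never validated


def run_opaque(conn, query):
    """Hand the SQL text to the database unchanged, without parsing it."""
    # The compatibility note is deliberately ignored by the processor.
    return conn.execute(query.text).fetchall()


conn = sqlite3.connect(":memory:")
query = SqlQuery(text="SELECT 1 + 1", compatibility="SQLite")
rows = run_opaque(conn, query)
```

Any vendor-specific syntax would succeed or fail in the database 
itself, not in the mapping processor.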

Received on Tuesday, 25 January 2011 16:03:36 UTC