- From: Richard Cyganiak <richard@cyganiak.de>
- Date: Tue, 3 Aug 2010 22:48:00 +0100
- To: RDB2RDF WG <public-rdb2rdf-wg@w3.org>
- Cc: Souripriya Das <souripriya.das@oracle.com>
All, Here is a collection of comments on the mapping language proposal that Souri presented in today's call. It's great that we now have a concrete proposal on the table that we can iteratively refine. I refer to the XML schema and example instances here: http://www.w3.org/2001/sw/rdb2rdf/wiki/An_XML_Schema_for_SQL-based_RDB2RDF_Mapping http://www.w3.org/2001/sw/rdb2rdf/wiki/Revised_Example_of_SQL-Query_based_Approach I have also previously studied Souri's presentation from the F2F, which presented a previous version of the same idea in an abstract syntax: http://www.w3.org/2001/sw/rdb2rdf/wiki/File:RDB2RDF_Map_viaSQL_F2F_at_SemTech2010.pdf I grouped my comments into three categories: 1. Non-functional improvements, that is, issues where I believe that the same effect could be achieved in a better way; 2. Proposals for adding expressivity, to make sure that the language covers all of RDF and is suitable for some real-world mapping tasks that currently are impossible or very hard to achieve; 3. Proposals related to authoring convenience, that is, cases where maintainability and usability can be improved by adding some syntactic sugar or other optional features. As a further comment, I don't believe that XML is the right choice for the surface syntax of this language, and I think that the mapping itself should be expressed as an RDF graph. But that is a separate discussion, and I will for now assume that an RDF representation of the mapping itself could be derived easily at a later time. Cheers, Richard 1. Non-functional improvements ============================== 1.1. Don't enclose generated URIs in pointy brackets ---------------------------------------------------- The proposal demands the generation of URIs enclosed in pointy brackets, for example: SELECT '<xyz.com/emp/' || empno || '>' AS empURI This is unnecessary, the following works just as well: SELECT 'xyz.com/emp/' || empno AS empURI 1.2. Name of InstURIPropertyMap ------------------------------- The element defines instance URIs, but has nothing to do with properties, it should be renamed to reflect that. 1.3. Issues with GraphURIPropertyMap ------------------------------------ It appears that GraphURIPropertyName is unnecessary; just providing a way of generating a graph name URI is sufficient for placing triples into an appropriate graph. What is the use of this property? To selectively place different properties of a ClassMap in different graphs, the proposal allows referencing of the PropertyMaps from the GraphURIPropertyMap by specifying the RDF property name used in the PropertyMap. This only works if we assume that RDF properties are unique within a ClassMap. That's an unnecessary restriction. A cleaner design would be to invert the direction of the reference, pointing from the PropertyMap to the GraphURIPropertyMap that specifies where the generated triple should reside. 1.4. Issues with KeyMaps ------------------------ It appears that the proposal requires specification of all key constraints in the mapping language. Often this will duplicate information that can be determined automatically from the database schema. This should be avoided if at all possible. It appears that the proposal works by expressing constraints (PK, unique, FK) on RDF properties. It would be better to express them as constraints on the columns in the result sets generated by the views/ queries. This is because the constraint-carrying column may actually not be exposed directly in the generated RDF. For example, the PK column of a table is often not mapped to a property, but becomes part of the subject URI. Thus, in the emp table, the PK constraint should not be expressed on the emp:empno property, but on the empno (or empURI) column in the view. 2. Adding expressivity ====================== 2.1. Allow blank nodes as subjects ---------------------------------- The proposal only allows URIs in the subject positions of triples. RDF also allows blank nodes. This could be addressed by allowing an InstBNodeMap with one or more DB columns in place of the InstURIPropertyMap. 2.2. Allow language tags for literals ------------------------------------- The proposal does not allow the specification of language tags on literals. This could be addressed by adding an attribute to the PropertyMap. It is worth exploring wether computed language tags, whose values come from a view column rather than from a constant attribute value, should be supported as well. 2.3. Allow custom datatypes for literals ---------------------------------------- The proposal doesn't yet fully describe how datatypes would be handled, but the intention seems to be that the datatype is determined by the expression that defines the view column. There language should support overriding the datatype with an arbitrary datatype URI. It would be the mapping author's responsibility to ensure that the view column contains values that are legal lexical forms for that datatype. The datatype could be specified in an attribute of the PropertyMap. It is worth exploring wether computed datatypes should be allowed. 2.4. Computed property URIs --------------------------- It is worth exploring wether computed property URIs should be allowed. In addition to the ProperyName attribute, which provides a constant property URI, there could be a PropertyColumnName that refers to a view column which contains computed property URIs. A use case is a table with an entity-attribute-value schema, which is reasonably common in complex database schemas. 2.5. Lookup tables ------------------ The proposal doesn't support the use of lookup tables. Often, a database column contains a short alphanumeric code whose meaning is described in some documentation outside of the database. If this is to be mapped into “nice” RDF, with proper meaningful URIs instead cryptic code, then a lookup table has to be used. If that table is not part of the database, and we cannot provide it as part of the database, then the lookup table has to be provided as part of the mapping. A sketch: <LookupTable id="expense-types"> <Entry code="04" value="http://example.org/expense-types/airfare" /> <Entry code="05" value="http://example.org/expense-types/bus-train-ferry " /> <Entry code="24" value="http://example.org/expense-types/ subsistence" /> <Entry code="53" value="http://example.org/expense-types/ hospitality" /> </LookupTable> <ClassMap ...> <SQLdefString>SELECT ... empno ...</SQLdefString> </ClassMap> <PropertyMap DBColumnName="exptype" LookupTable="expense-types" PropertyName="ex:expenseType" /> In theory, this can already been done by writing a lot of nested IFF statements in the view definition, but it appears to be a reasonably common case that warrants special syntax. 3. Usability and authoring considerations ========================================= 3.1. Support for URI templates ------------------------------ While it is possible to construct pretty much arbitrary URIs in SQL by means of concatenation and string functions, SQL is not always the best approach. There should be a way of constructing URIs from view columns outside of the SQL query. Reasons include: i) The same URI- generating expression is often needed multiple times throughout a mapping, which leads to redundancy and imposes maintenance costs. ii) URI generation often involves string escaping, case changes etc, which quickly leads to unwieldy SQL expressions. iii) Standard SQL functions may not cover all needs for URL character escaping. A proposal would be to use a syntax inspired by the URI Templates draft [1] to declare URI patterns at the beginning of the file. The patterns can then be referenced from throughout the file. This would be optional; SQL expressions could still be used for URI construction. [1] http://www.ietf.org/id/draft-gregorio-uritemplate-04.txt Example: <URITemplate id="employee" value="http://example.com/emp/{empno}" /> <ClassMap ...> <SQLdefString>SELECT ... empno ...</SQLdefString> <InstURIPropertyMap template="employee" /> ... </ClassMap> 3.2. Multiple maps for a single class ------------------------------------- I believe that the proposal assumes that the value of @ClassName on the ClassMap element is unique throughout In many schemas there might be multiple tables whose records represent instances of the same domain vocabulary class. For example, a mapping that generates contact information from a university database might map records in the STAFF table and in the STUDENT table to the same class foaf:Person. This *can* be achieved with a single ClassMap, by forming a SQL UNION of the two tables after appropriate transformations. But this can be awkward because the two tables could have quite different columns, e.g., staff have OFFICE_NO and CONSULTATION_HOURS, while students have MAJOR and FACEBOOK_ID. For authoring convenience it should be allowed to specify two separate ClassMaps, rather than shoehorning everything into a single view. (At that point, ClassMap should probably be renamed to ViewMap or QueryMap or something similar, because it then represents the mapping from a single SQL view/query, and no longer the mapping to a single class.) I note that both 3.1 and 3.2 are helpful to support scenarios where an ontologist prepares the mapping, and a DBA fills in the SQL queries. 3.3. Referring to a table instead of a query/view ------------------------------------------------- To make the simple case simple, there should be a <tableName> element that could be used in place of <SQLdefString>. The following markup <tableName>EMP</tableName> would be a shorthand for the following: <SQLdefString>SELECT * FROM EMP</SQLdefString> 3.4. Allow constant values in place of column names --------------------------------------------------- There are reasonably frequent cases where some constant value is to be added to all resources generated by a ClassMap. For example, in the case of the emp table, a triple of the shape <http://xyz.com/emp/1234> emp:employeeOf <http://xyz.com/the_company > . might be added to each employee resource. Note that only the subject of the triple depends on the query result; predicate and object are the same for all triples. This could be achieved by having SELECT ... '<http://xyz.com/the_company>' AS company in the query, along with a PropertyMap <PropertyMap PropertyName="emp:employeeOf" RDFTerm="IRI" DBColumnName="company"/> but this seems awkward and inefficient; a better alternative relying on syntactic sugar would be <PropertyMap PropertyName="emp:employeeOf" RDFTerm="IRI" ConstantValue="http://xyz.com/the_company" /> and omit the company column from the query.
Received on Tuesday, 3 August 2010 21:48:45 UTC