Re: ISSUE-37, ACTION-138: DBMS connection details in R2RML mappings

Hi Ted,

On 12 Jul 2011, at 17:07, Ted Thibodeau Jr wrote:
> As we've been discussing, a given R2RML mapping is tied to a given
> RDB schema

Right.

> and indeed, to the data in that RDB at a point-in-time.

Not as it is specified at the moment, although that is contentious as we can see in the ongoing issue 47 thread.

(Technically speaking, an R2RML mapping is just an inert data structure. One can write it down in R2RML mapping document. An R2RML processor takes as its input an R2RML mapping, a SQL connection, and a base IRI; and produces from that a (possibly virtual) RDF dataset. One cannot make statements about the conformance of an R2RML document in the absence of a SQL connection, but at the moment it is sufficient to investigate the schema of the database it connects to.)

> Given this, there are certain identifying characteristics of the
> RDB schema that should be kept with (and I think, within) the 
> R2RML mapping.
> 
> These may include --
> 
> - DB engine identifier
>  -- as different SQL syntax may be needed against different
>     engines.  there are ways to query the engine itself for
>     identification including version, which allow tools such
>     as R2RML engines to sculpt their SQL appropriately to 
>     the RDB engine...
> - DB host
>  -- as there may be other instances on other hosts, with
>     similar or even identical schema, but different content
> - DB listening port
>  -- as there may be multiple instances on the same host, with
>     similar or even identical schema, but different content
> - authenticating user ID
>  -- as different users may see different tables, fields, 
>     rows, etc., due to differing permissions
> 
> The above may well be an incomplete list!  

I'd add:

- authenticating user password
- JDBC driver class name
- JDBC connection properties
- the base IRI

This would complete the list of things one can specify in D2RQ (all optional). A couple of them are wrapped into a JDBC URL.

> It has been suggested that a few fields might be used, possibly 
> holding a JDBC Driver ClassName and the associated JDBC Connect 
> String.  This is potentially problematic, as it assumes that 
> every DBMS will have such a JDBC driver,

I think that's not a showstopper, especially if inclusion of the information is optional. We have no requirement to design something that supports absolutely every RDBMS on the planet.

> and may suggest to users that every connection should be made thereby.

Right, it suggests a particular implementation, which is not optimal.

> It has also been suggested that this information be optional, which
> I could accept *if* the standard/spec states R2RML SHOULD include
> such data (retains validity of existing mappings which lack it, 
> and allows for justifiable exclusion in future mappings, but 
> strongly discourages such lack.)

There are a couple of reasons against including such information:

1. We have a use case in the UC&R document that calls for mapping the schema of a particular app (Wordpress in this case) to RDF; this app can be installed anywhere and the mapping should be generic

2. Sysadmins might prefer to keep connection information separate from the mapping (cf. common practice of putting database connection details into a config file rather than into application source code)

3. Mappings might be edited by one group of people and deployed by another group, with implications for version control of the mapping file

4. It's hard to specify in R2RML because there is no single standard for providing database connection information. Existing standards (JDBC, ODBC) are strongly tied to a particular implementation/platform.

5. Not all databases are accessed via a TCP/IP hostname and port. See SQLite for example.

6. One set of R2RML mappings that this WG is supposed to produce (the test cases) most definitely should not contain identifiers of a particular database

7. There might be security issues with divulging some or all of the connection information

I think there are valid and normal scenarios where one doesn't want to include such information, so I'm hesitant about strongly encouraging their inclusion with a SHOULD.

For D2RQ I'd certainly want the ability to optionally include this information in the mapping document, but given that I can't really think of any good way to do it besides our current JDBC approach, I'd be happy enough to do so in a proprietary extension vocabulary for now, and POSTPONE ISSUE-37.

Best,
Richard

Received on Tuesday, 12 July 2011 19:26:37 UTC