Re: Comments on the R2RM Editors' draft from Ivan Herman on 2010-12-15 (public-rdb2rdf-wg@w3.org from December 2010)

From: Ivan Herman <ivan@w3.org>
Date: Wed, 15 Dec 2010 13:10:59 +0100
To: Souripriya Das <SOURIPRIYA.DAS@oracle.com>
Cc: Seema Sundara <seema.sundara@oracle.com>, Richard Cyganiak <richard.cyganiak@deri.org>, RDB2RDF WG <public-rdb2rdf-wg@w3.org>
Message-Id: <192A32F8-50BF-424A-B7F3-44425D779A41@w3.org>

Hi Souri!

On Dec 14, 2010, at 18:00 , Souripriya Das wrote:

> Ivan,
> 
> Thanks a lot for your comments.

You are most welcome.

> Please see our answers inline below.

I have removed all comments from below that you would just pick up and take into the next version, and kept only those where I have a comment...

[snip]
>> 
>> ----
>> Intro, second paragraph on direct mapping: I find the text a little bit 'incomplete' in comparing the two approaches. I changed the last sentence and added one before that:
>> 
>> [[[
>> Besides the R2RML language, this working group will also define a fixed "default mapping" from relational databases to RDF. In the default mapping of a database, the structure of the resulting RDF graph directly reflects the structure of the database, the target RDF vocabulary directly reflects the names of database schema elements, and neither structure nor target vocabulary can be changed. To generate a graph using structures and terms that are more appropriate to the final application, graph transformation tools (e.g., SPARQL, RIF) should be used. With R2RML on the other hand, a mapping author can define highly customized views over the relational data and the full transformation is performed by the R2RML engine itself.
>> ]]]
>> 
>> This may be better...
>> 
> [TBD]

Sure. Note that Harry also commented on this section; I would propose you guys try to come up with some sort of merge of those texts and we can see further...

[snip]


>> ---
>> (This comment actually came up at a presentation I gave essentially on this version of R2RML, I am relaying it here)
>> 
>> At the moment, the values of rr:termtype are strings ("BlankNode", etc.). Wouldn't it be more 'Semantic Webish' if some predefined URI-s were used there? I.e.,
>> 
>> [] rr:termtype <URI-for-the-concept-of-blank-node>
>> 
>> I do not have strong feelings about this, but I thought it was worth conveying to the group...
>> 
> [TBD]

One of your responses below seems to suggest that, after all, you would want to change these to explicit URI-s... Is that correct?

[snip]
>> 
>> ---
>> Issue (maybe to be labelled as such in the tracker and added to the text?): what happens exactly when, say, SubjectMapClass is missing for a TriplesMapClass instance? We may not have to answer this in the document, but label that as an issue to be solved (and I guess those are the connection point to the direct mapping!)
>> 
>> 
> Good point. We propose to add text to refer to relevant section of the Direct Mapping draft.

Right, that is fine, but I think having a reference in the R2RML document would also be good.

[snip]

>> ---
>> Similar question: what happens if the user adds more than one subjectMap? I know there is a table at the end of the document that sets maximum cardinality for things. But there is no statement on what the error response of an R2RML processor should be if those cardinality constraints are breached. Taking into account that an R2RML instance is in RDF, we cannot rely on, say, the order within the specification (i.e., something like 'the second one wins').
>> 
>> We may just open an issue and label it as such in the document for now, b.t.w.
>> 
>> 
> We propose to remove the restriction about max cardinality = 1 so that for a single row, one can have multiple subjects. The set of triples generated from the row will be associated with each of the subjects.

That makes sense to me, although I believe this decision should be discussed and approved by the WG.

[snip]

> 
>> ---
>> 3.2.1.1: I am not sure what the role of table owner is. Is it some sort of metadata?
>> 
> Any database table is owned by a user and hence needs to be referred to using the pair <tableOwner, tableName>.

O.k.


>> 
>> ---
>> 3.3.1.1: It is not clear from the text why one can have a _set_ of IRIs and blank nodes. Does it mean that all the triples in a row are, sort of, multiplied with different subjects? If this is indeed the idea, then it should be stated explicitly and maybe an example should be used in the appendix to show its usage.
>> 
> The *set* (i.e., <Set of valid IRIs and blank nodes>) is the range of rr:subject. So, a subject is an element of this set and so can be a valid IRI or a blank node.
> However, if this is causing confusion, we could change it from <Set of valid IRIs and blank nodes> to just <valid IRIs and blank nodes>.

My problem was not the fact that you used Sets, but simply that you allow more than one possible URI. I was not sure what it meant to have several subject URI-s, but you seem to say that all triples generated for the row are then repeated with all the different subjects. Which is fine with me; it just was not clear.

However... in RDF terms, what does <Set of valid IRIs and blank nodes> mean? There is no structure in RDF, unless you want to use RDF Lists (which I do not think we should). So the subject is only one URI; what you seem to suggest (in line with a remark above) is that the cardinality of the rr:subject property may be more than one.

I suggest that

rr:subject rdfs:range <Valid IRI or Blank Node> .

is enough for the specification, and the text would make it clear that there might be several triples using rr:subject, in which case the same triples are repeated with the different subjects.

Note that in Turtle that would mean that one could also write something like:

[] rr:subject exa:dummySubject1, exa:dummySubject2 .

but that is only syntactic sugar for

[] rr:subject exa:dummySubject1 ;
   rr:subject exa:dummySubject2 .


B.t.w., I think it would be a good idea to have one of the examples use multiple subjects, just to make the situation clear.
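
A minimal sketch of that reading, in plain Python rather than any real R2RML engine: the predicate/object pairs generated for a row are emitted once per subject. The function name `triples_for_row`, the `exa:` terms and the sample values are made up for illustration; nothing here is taken from the draft.

```python
# Hypothetical illustration only: not part of R2RML or of any processor.
def triples_for_row(subjects, predicate_objects):
    """Repeat every (predicate, object) pair of a row for each subject."""
    return {(s, p, o) for s in subjects for (p, o) in predicate_objects}


# Two rr:subject values, as in the Turtle snippet above (names are made up).
subjects = ["exa:dummySubject1", "exa:dummySubject2"]
# Pairs assumed to have been generated from one database row.
row_pairs = [("exa:name", "Alice"), ("exa:dept", "exa:dept10")]

triples = triples_for_row(subjects, row_pairs)
for triple in sorted(triples):
    print(triple)
```

Modelling the result as a set also mirrors the point about Turtle: the comma shorthand and the repeated rr:subject statements describe the same triples, so they would lead to the same output here.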

>> 
>> ---
>> 3.3.1.1: another issue is: what does a blank node mean in this respect? What is the 'scope' of that blank node, i.e., which graph does it belong to? I guess it is scoped to the default or named graph where all the triples are put; in which case this should be explained explicitly. But see also my question below on 3.3.1.5: what happens if there are several target graphs? (I guess the warning in the appendix applies...)
>> 
>> I think some more explanatory text is warranted here on this.
>> 
> In Section 5.1, we have pointed out the issue that may arise with the use of blank nodes as subjects and sending triples from the same row to multiple graphs. We can add a line to state that the scope of a blank node is the graph (that is, either the named graph or the default graph) where the corresponding triple is being stored.

I think that would be a good idea.
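
A rough sketch of what graph-scoped blank nodes could look like, assuming (this is only an assumption, not something the draft states) that a processor creates a separate blank node per target graph for the same row. The names `store_row`, `exa:graphA` and `exa:graphB` are hypothetical.

```python
# Hypothetical illustration only: blank node labels are local to one graph.
from collections import defaultdict


def store_row(graphs, target_graphs, row_key, predicate_objects):
    """Store a row's triples in each target graph with a graph-local blank node."""
    for g in target_graphs:
        # The node is identified by (graph, label): the same row gets a
        # separate blank node in every graph it is sent to.
        bnode = (g, f"_:row{row_key}")
        for p, o in predicate_objects:
            graphs[g].add((bnode, p, o))


graphs = defaultdict(set)
store_row(graphs, ["exa:graphA", "exa:graphB"], 7, [("exa:name", "Alice")])

subj_a = next(iter(graphs["exa:graphA"]))[0]
subj_b = next(iter(graphs["exa:graphB"]))[0]
print(subj_a, subj_b, subj_a == subj_b)
```

Under this assumption the two subjects are different nodes, which is the situation the warning about multiple target graphs is concerned with: triples from one row that land in different graphs no longer share a subject.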

>> 
>> ---
>> 3.3.1.2: what happens if I have both an rr:column and an rr:subject structure in the same SubjectMapClass? Will all of them be valid and will I get all the triples, or does rr:column invalidate rr:subject? It should be stated explicitly somewhere.
>> 
> As pointed out in Section 4 (Table containing Summary of the Properties), rr:subject and rr:column cannot be used together for a SubjectMap. (Note that, since we are planning to allow multiple SubjectMaps, each of them could use either rr:subject or rr:column).

Ok, indeed, the table contains that info. But we still do not say what happens if the user violates that rule. More generally, the question is what happens if the R2RML file violates any of the rules in that table?
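
To illustrate one possible answer (rejecting the mapping outright), here is a minimal sketch in plain Python. It is an assumption about how a processor might behave, not something the draft defines; `MappingError`, `check_subject_map` and the column name `EMPNO` are made up for illustration. The only rule checked is the one quoted above, that rr:subject and rr:column cannot be used together.

```python
# Hypothetical illustration only: the draft does not define this behaviour.
class MappingError(ValueError):
    """Raised when a mapping breaks a rule from the summary table."""


def check_subject_map(properties):
    """Reject a SubjectMap that uses rr:subject and rr:column together.

    `properties` maps a property name to the list of values given for it.
    """
    if properties.get("rr:subject") and properties.get("rr:column"):
        raise MappingError("rr:subject and rr:column used in the same SubjectMap")
    return True


ok = check_subject_map({"rr:column": ["EMPNO"]})

try:
    check_subject_map({"rr:subject": ["exa:dummySubject1"], "rr:column": ["EMPNO"]})
    rejected = False
except MappingError as err:
    rejected = True
    print("rejected:", err)
```

Other policies are conceivable, such as ignoring one of the two properties with a warning; the point is only that the specification would need to pick one.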

[snip]
>> 
>> ---
>> 3.3.1.5: same question vis-à-vis sets. What does it mean if I give several graph IRI-s here?
>> 
> Same comment as in 3.3.1.1

And my comments are the same on your comments:-)


>> 
>> ---
>> 3.4.1.3 and 3.4.1.4: will the storage of that triple in a graph happen _additionally to_ or _instead of_ the graph storage defined for an entire row? With the knowledge that there might be no graph definition for the row, ie, the triples just go into a default graph by default...
>> 
> We will clarify that it is done "additionally", however, we could discuss having an option to specify "instead of".

'Additionally' is fine with me (unless there is a strong use case for the additional option). 
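
The difference between the two options can be sketched as follows, in plain Python. The function `target_graphs`, the `DEFAULT_GRAPH` marker and the `exa:` graph names are hypothetical, and treating the default graph as the row-level target when no graph is defined for the row is my reading of the question above, not a statement from the draft.

```python
# Hypothetical illustration only: "additionally" versus "instead of".
DEFAULT_GRAPH = "default"


def target_graphs(row_graphs, triple_graphs, mode="additionally"):
    """Return the set of graphs in which one triple is stored."""
    # Assumption: with no graph defined for the row, the row-level
    # target is the default graph.
    row_level = set(row_graphs) or {DEFAULT_GRAPH}
    if not triple_graphs:
        return row_level
    if mode == "additionally":
        return row_level | set(triple_graphs)
    if mode == "instead of":
        return set(triple_graphs)
    raise ValueError(f"unknown mode: {mode}")


print(sorted(target_graphs(["exa:g0"], ["exa:g1"])))
print(sorted(target_graphs(["exa:g0"], ["exa:g1"], mode="instead of")))
```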

[snip]
>> 
>> ---
>> 3.8.1.: 
>> I am surprised that the rr:column and rr:template are not valid for RefPredicateMapClass. Any reason for that? If so, it may be worth explaining...
>> 
> Use of rr:template for foreign key property is not very practical.

True. Withdraw my comment:-)

[snip]
>> 
>> ---
>> Appendix A.2.3: I wonder whether this example (which is way simpler than the other two) brings any new aspect to the examples. If not, maybe we can drop it
>> 
>> 
> This example illustrates the use of column value as predicate (e.g., in vertical tables).

Ah! Indeed...

Thanks!

Ivan


----
Ivan Herman, W3C Semantic Web Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
PGP Key: http://www.ivan-herman.net/pgpkey.html
FOAF: http://www.ivan-herman.net/foaf.rdf

Received on Wednesday, 15 December 2010 12:08:08 UTC