Re: Addressing ISSUE-47 (invalid and relative IRIs) from David McNeil on 2011-07-08 (public-rdb2rdf-wg@w3.org from July 2011)

From: David McNeil <dmcneil@revelytix.com>
Date: Fri, 8 Jul 2011 08:30:16 -0500
To: Richard Cyganiak <richard@cyganiak.de>
Cc: RDB2RDF WG <public-rdb2rdf-wg@w3.org>
Message-ID: <CA+8VvdxBP9p9M1mFrFY0kbhN0C4rbjeNtw1PYjgZzn6rBqeZwA@mail.gmail.com>

Richard - I thought about these questions more and my thoughts are inline
below.

On Thu, Jul 7, 2011 at 12:57 PM, Richard Cyganiak <richard@cyganiak.de> wrote:

> On 7 Jul 2011, at 16:26, David McNeil wrote:
> > On Mon, Jul 4, 2011 at 6:14 PM, Richard Cyganiak <richard@cyganiak.de>
> wrote:
> >> 3. Invalid IRIs (e.g., anything containing spaces and so on) are
> skipped, and if any triple would include such an IRI then that triple is
> skipped
> >>
> >
> > This worries me. I am uncomfortable with rows of data silently
> disappearing based on their contents.
>
> The question is, what's the alternative. The only workable other option I
> can think of is to make this an error.
>

I see two different perspectives on the mapping issue.

1) a relatively casual user wants to expose a relational database as RDF and
wants it to "just work".  In this mode I can see that it could make sense to
silently ignore rows that might cause trouble (e.g. rows with null
values, rows that produce IRIs with spaces, or rows that produce text
values that claim to be numbers).

2) a software developer building an application that includes mapping a
relational database to RDF. In this mode I think it is very troublesome for
rows to just silently disappear from the output. This is like software
silently swallowing exceptions (typically a bad practice that makes
debugging much more difficult).

Because of my background and the product I am working on I am more concerned
with the second use case. Driven by this I would say that for ISSUE-47 and
ISSUE-51 the R2RML implementation should simply generate these triples and
pass them downstream.

This thinking also causes me to reconsider silently suppressing rows with
null values in template expressions.

> >> 4. rr:template is changed so that it %-encodes most characters. This
> means that rr:column "person/{NAME}" will work even if the name contains
> spaces, the result will be "http://base.uri/person/Alice%20Smith"
> >>
> >
> > * I think a consequence of what you are proposing is that the following
> two R2RML snippets would behave differently with respect to encoding:
> >   rr:subject [ rr:column "Name" ]
> >   rr:subject [ rr:template "{Name}" ]
> >
> > I think it would be less surprising to users if these two constructs had
> the same behavior.
>
> You are right, it's a bit surprising, but I think it's easy enough to learn
> and remember that "{name}" performs %-encoding when generating IRIs while
> "name" doesn't.
>
> I don't see how we could reasonably make both behave the same. Both can't
> be %-encoding, because then rr:column would %-encode already valid IRIs,
> making them invalid. If both are non-%-encoding, then we need some other
> mechanism for %-encoding, so we'd need a proposal for that.
>

What if we made the %-encoding optional in templates? So for example this
would not perform %-encoding:
    rr:template "{Name}"

But this would:
    rr:template "{%Name}"

On the last telecon we discussed defining functions for the user to invoke
to perform %-encoding but there was some concern about making it more
difficult to parse the mapping. Using a solution like {%Name} seems like it
would accomplish the objective without making the templates significantly
harder to parse.
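
As a rough illustration of how cheap the parsing could be, here is a minimal
Python sketch of the {Name} / {%Name} idea. It is not taken from any R2RML
draft: the function name expand_template, the dict-based row, and the choice
to encode every reserved character (safe="") are all assumptions made for
the example.

```python
import re
from urllib.parse import quote

# Hypothetical placeholder syntax from the proposal above:
#   {Name}  inserts the column value unchanged
#   {%Name} inserts the column value %-encoded
_PLACEHOLDER = re.compile(r"\{(%?)([^{}%]+)\}")


def expand_template(template, row):
    """Expand a template string against one row (a dict of column -> value)."""

    def substitute(match):
        encode_flag, column = match.group(1), match.group(2)
        value = str(row[column])
        # Assumption: encode everything outside the unreserved set,
        # including "/", so the value stays within one path segment.
        return quote(value, safe="") if encode_flag else value

    return _PLACEHOLDER.sub(substitute, template)


row = {"Name": "Alice Smith"}
print(expand_template("http://base.uri/person/{%Name}", row))
print(expand_template("http://base.uri/person/{Name}", row))
```

The first call yields http://base.uri/person/Alice%20Smith; the second leaves
the space in place, so whether the result is a valid IRI depends entirely on
the data. A single regular expression covers both forms, which is the sense
in which the template does not become significantly harder to parse.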

On a related issue, I think that as we introduce %-encoding on the mapping
side we need to define how the inverse operation is performed in inverse
expressions.
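
To make the inverse question concrete, here is a minimal Python sketch of one
way a template could be run backwards: the template is turned into a regular
expression, and any value captured from a {%Name} placeholder is %-decoded.
The function name invert_template and the matching strategy are assumptions
for illustration only, not something the working group has defined.

```python
import re
from urllib.parse import unquote

# Same hypothetical syntax as discussed above: {Name} is raw, {%Name} is %-encoded.
_PLACEHOLDER = re.compile(r"\{(%?)([^{}%]+)\}")


def invert_template(template, iri):
    """Recover column values from an IRI; return None if the IRI does not match."""
    pattern = ""
    columns = []  # (column name, was %-encoded) in order of appearance
    pos = 0
    for match in _PLACEHOLDER.finditer(template):
        # Literal text between placeholders must match exactly.
        pattern += re.escape(template[pos:match.start()])
        pattern += "(.+?)"
        columns.append((match.group(2), bool(match.group(1))))
        pos = match.end()
    pattern += re.escape(template[pos:])

    found = re.fullmatch(pattern, iri)
    if found is None:
        return None
    # Undo the %-encoding only for placeholders that applied it.
    return {
        name: unquote(value) if encoded else value
        for (name, encoded), value in zip(columns, found.groups())
    }


print(invert_template("http://base.uri/person/{%Name}",
                      "http://base.uri/person/Alice%20Smith"))
```

This prints {'Name': 'Alice Smith'}. The sketch ignores the harder cases, such
as templates with adjacent placeholders where the split between values is
ambiguous, which is part of why the inverse needs to be specified.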

-David

Received on Friday, 8 July 2011 13:30:45 UTC