Proposal for allowing URIs in CURIE-only attributes

Hello all,

On yesterday's telecon we discussed one way we might tackle the
problem that a number of people have raised about putting URIs into
@rel and @rev.

I think we should say from the outset that in many ways there is no
problem to solve; I believe Toby, Shane and Manu have pointed out a
number of reasons why confusion about the generated data should not
arise.

However, we discussed on the call that rather than coming at the issue
from the standpoint of 'there's nothing to fix, so let's do nothing',
we should instead look at it from the point of view of an additional
feature.


FALSE POSITIVE CURIES

The scenario that seems to be causing concern is that when a URI
begins with a mapped prefix followed by a colon, it creates a 'false
positive' CURIE.

Using Julian's example:

  <a xmlns:urn="http://purl.org/dc/terms/"
   rel="urn:rights urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a"
   href="http://example.com/terms_of_service.html"
  >Terms of service</a>

This would generate two triples:

  <> <http://purl.org/dc/terms/rights>
    <http://example.com/terms_of_service.html> .

  <> <http://purl.org/dc/terms/uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a>
    <http://example.com/terms_of_service.html> .
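To show where the second triple comes from, here is a rough Python sketch of how a processor might expand the CURIEs in a @rel value: split on whitespace, split each token at its first colon, look up the prefix, and concatenate. The function names are hypothetical, reserved values such as 'next' are ignored, and tokens with an unmapped prefix are dropped, as in current processing.

```python
# Hypothetical sketch of current CURIE expansion in @rel; not a real
# processor. Reserved values (e.g. 'next') are not handled here.

def expand_curie(token, mappings):
    """Expand 'prefix:reference' with the in-scope prefix mappings.

    Returns None when there is no colon or the prefix is unmapped,
    because current processing ignores such tokens.
    """
    prefix, sep, reference = token.partition(":")  # split at FIRST colon
    if not sep or prefix not in mappings:
        return None
    return mappings[prefix] + reference


def expand_rel(rel_value, mappings):
    """Expand every whitespace-separated token of a @rel value."""
    predicates = []
    for token in rel_value.split():
        iri = expand_curie(token, mappings)
        if iri is not None:
            predicates.append(iri)
    return predicates


# xmlns:urn="http://purl.org/dc/terms/" from Julian's example.
mappings = {"urn": "http://purl.org/dc/terms/"}
rel = "urn:rights urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a"
for predicate in expand_rel(rel, mappings):
    print(f"<> <{predicate}> <http://example.com/terms_of_service.html> .")
```

Because only the first colon is significant, 'urn:uuid:...' splits into the prefix 'urn' and the reference 'uuid:...', and the mapped 'urn' prefix turns it into a Dublin Core term.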

The problem with this example is that, to make its point, it relies
on a situation that won't arise; as RDFa stands at the moment, why
would anyone place a URI into the @rel or @rev attribute? Nothing in
the spec gives the impression that URIs are valid in @rel, and no
other host language seems to suggest this either.

So all that's happening here is that an extra, probably redundant,
triple is being created, but there is no _loss_ of information.


ALLOWING URIs IN @REL, @REV, @TYPEOF, @DATATYPE ETC.

But it does raise the question of why we don't support URIs in @rel,
@rev, and other attributes.

Let's look at the current state of affairs.

The advantage of using prefix mappings is that you can shorten your
mark-up when you are using lots of terms from the same vocabulary:

  <div xmlns:foaf="http://xmlns.com/foaf/0.1/"
    about="#me" typeof="foaf:Person"
    >
    <span property="foaf:name">Mark Birbeck</span>
    <a rel="foaf:weblog" href="http://internet-apps.blogspot.com/"
      >XForms and Internet Applications</a>
    <a rel="foaf:knows" href="http://www.w3.org/People/Ivan/#me"
      >Ivan Herman</a>
    <span rel="foaf:img">
      <img src="picture-11.jpg"
        alt="Picture of Mark Birbeck"/>
    </span>
  </div>

However, the downside of this is that even when you only want to add
one or two properties, you need to declare a prefix mapping:

  <div xmlns:foaf="http://xmlns.com/foaf/0.1/"
    about="#me" typeof="foaf:Person"
    >
    <a rel="foaf:knows" href="http://www.w3.org/People/Ivan/#me"
      >Ivan Herman</a>
  </div>

This creates a lot of extra work, and in particular makes
cut-and-paste examples more difficult. An obvious question then is why
RDFa doesn't simply support this kind of mark-up:

  <div about="#me" typeof="http://xmlns.com/foaf/0.1/Person">
    <a rel="http://xmlns.com/foaf/0.1/knows"
      href="http://www.w3.org/People/Ivan/#me"
      >Ivan Herman</a>
  </div>


WAY BACK...

This is a pretty obvious technique, so it probably won't surprise
anyone that it was considered...way back. :)

The problem we had at the time we discussed this was that if we went
for URIs in @rel, then we'd also need to go for safe-CURIEs. And that
in turn raised problems with the pre-existing tokens from HTML.

The issue was how we would tell whether we had a relative path or a
reserved value:

  @rel="next"

  @rel="relative-url"

At the time, the only solution we could think of was to insist on
safe-CURIEs for all non-URI values:

  @rel="[next]"

  @rel="[dc:rights]"

  @rel="relative-url"

With this technique there would be no ambiguity. But of course, even
putting aside the annoying overhead of adding those square brackets,
the big problem is that it's not actually backwards compatible, since
existing documents already say:

  @rel="next"

So at that point we simply agreed that there would be no URIs in @rel.


RESERVED VALUES V. RELATIVE PATHS

However, CURIE processing has since evolved a little, and what
effectively happens now is that predefined tokens are checked for
first, before prefix processing kicks in. So it's actually possible to
differentiate between a predefined token and a relative URI, since it's
either in the list or it's not:

  @rel="next"

  @rel="relative-url"
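As a rough illustration of that ordering, the Python sketch below checks the reserved list first and only otherwise treats a colon-free token as a relative URI resolved against the document's base. The base URI `http://example.org/doc` is a hypothetical assumption, RESERVED is only an illustrative subset of the XHTML reserved values, and the relative-URI branch models the possibility under discussion, not current RDFa behaviour (which ignores such tokens).

```python
from urllib.parse import urljoin

# Illustrative subset of the XHTML reserved @rel values, not the full list.
RESERVED = {"alternate", "license", "next", "prev", "stylesheet"}
XHTML_VOCAB = "http://www.w3.org/1999/xhtml/vocab#"


def resolve_plain_token(token, base):
    """Resolve a token that contains no colon.

    Reserved values are checked first; anything not in the list is
    (hypothetically) treated as a relative URI against the base.
    """
    if token in RESERVED:
        return XHTML_VOCAB + token
    return urljoin(base, token)


base = "http://example.org/doc"  # hypothetical document base URI
for token in ["next", "relative-url", "#my-term"]:
    print(token, "->", resolve_plain_token(token, base))
```

Because membership in the reserved list is a simple yes/no test, 'next' and 'relative-url' can never be confused, whatever the base URI happens to be.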

I should say in passing that this kind of example is really unlikely
to arise, since anyone whose document defines its own terms and uses
them as predicate values (i.e., via a relative path) would probably do
it like this:

  @rel="next"

  @rel="#my-term"

But that doesn't mean we shouldn't think through the various scenarios
that come about with relative paths.


CURIEs V. URIS

Allowing URIs into @rel and other attributes doesn't just mean that
they have to coexist with reserved words like 'next' and 'prev', in
the way we just saw; they must also coexist with CURIEs. However,
since we have safe-CURIEs, it's already easy to differentiate between
the two:

  @rel="http://xmlns.com/foaf/0.1/knows"

  @rel="[foaf:knows]"

And to return to Julian's example:

  @rel="[urn:rights]"

  @rel="urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a"
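A minimal Python sketch of this bracket-based differentiation, assuming square brackets always mark a CURIE and every unbracketed token is an absolute URI used as-is. The function name `resolve` is hypothetical, and reserved values such as 'next' are deliberately left out to keep the two cases in focus.

```python
# Hypothetical sketch: brackets mean CURIE, no brackets mean URI.
# Reserved values (e.g. 'next') are not handled here.

def resolve(token, mappings):
    """Return the IRI for one @rel token, or None if it is ignored."""
    if token.startswith("[") and token.endswith("]"):
        prefix, sep, reference = token[1:-1].partition(":")
        if sep and prefix in mappings:
            return mappings[prefix] + reference
        return None  # safe-CURIE with an unmapped prefix: ignored
    return token  # no brackets: taken as an absolute URI


mappings = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "urn": "http://purl.org/dc/terms/",
}
for token in ["http://xmlns.com/foaf/0.1/knows", "[foaf:knows]",
              "[urn:rights]", "urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a"]:
    print(token, "->", resolve(token, mappings))
```

Even with 'urn' mapped to the Dublin Core terms namespace, the unbracketed 'urn:uuid:...' survives as a URI. The catch is that this treats every unbracketed token as a URI, which is exactly what existing unbracketed CURIEs such as foaf:knows would not expect.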


BACKWARDS-COMPATIBILITY

Unfortunately, this now creates a backwards-compatibility problem;
even if we recommend that future mark-up use the square brackets,
existing mark-up already does this:

  @rel="foaf:knows"


DIFFERENTIATING BETWEEN CURIES AND URIs

Interestingly enough, it is possible to solve this problem simply by
saying that any string of characters that begins with a prefix that
has a mapping in scope is a CURIE, and anything else is a URI.

This is quite different from the approach normally taken to
differentiation. The usual argument is that it's impossible to tell
whether:

  x:y

is a URI or a CURIE because 'x' could be a protocol, and 'y' could be
any part of a URI. It might seem an obvious remedy to say that if 'x'
is a protocol, then treat 'x:y' as a URI. However, this would 'break
the web', since new protocols can be added at any time, so we can
never be definitive on whether we have a URI or not.

However, the change in our approach is to say that actually you _can_
tell if 'x:y' is a CURIE, by looking at its context. Absolute URIs can
operate out of context -- on billboards, TV, newspapers, and so on --
because they contain all of the information needed to process them.

But CURIEs cannot; they depend on a prefix mapping being in scope, so
we can be quite strict and say that if there is no prefix mapping,
then it's not a CURIE.
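The context rule can be sketched in a few lines of Python. The function name `interpret` and the namespace `http://example.org/x#` are hypothetical; the point is that the decision depends only on the in-scope prefix mappings, and no list of known protocols is ever consulted.

```python
# Hypothetical sketch of the context rule: 'x:y' is a CURIE only if
# 'x' has a prefix mapping in scope; otherwise it is a URI.

def interpret(token, mappings):
    """Classify a token of the form 'x:y' by its context alone."""
    prefix, sep, reference = token.partition(":")
    if not sep:
        raise ValueError("expected a token of the form 'x:y'")
    if prefix in mappings:
        return ("CURIE", mappings[prefix] + reference)
    # No mapping for 'x': treat the whole string as a URI. No list of
    # protocols is checked, so a newly invented one still works (unless
    # an author happens to map a prefix with the same name).
    return ("URI", token)


# The same string, read in two different contexts.
print(interpret("x:y", {}))                                # ('URI', 'x:y')
print(interpret("x:y", {"x": "http://example.org/x#"}))    # ('CURIE', 'http://example.org/x#y')
```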


PROPOSAL

So to bring everything together, the proposal is:

 (a) RDFa should add support for URIs in attributes that currently only
     support CURIEs;

 (b) authors should be encouraged to use safe-CURIEs in those
     attributes;

 (c) but since ordinary CURIEs may still be used, we should differentiate
     by saying that anything appearing before a colon that is not a
     mapped prefix is a protocol.

(From an implementation point of view this is extremely easy to add;
if after splitting a 'potential CURIE' you find that the prefix does
not map to anything, then just treat the 'potential CURIE' as a URI.
Current processing requires the 'potential CURIE' to be ignored
altogether.)
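Putting the pieces together, here is a rough Python sketch of how a processor might resolve @rel/@rev tokens under the proposal. The names `resolve_token`, `resolve_rel` and the `proposed` flag are hypothetical, RESERVED is an illustrative subset of the XHTML reserved values, and relative paths are left out. The flag shows how small the change is: with `proposed=False` an unmapped prefix is ignored, as now; with `proposed=True` the whole 'potential CURIE' becomes a URI.

```python
# Hypothetical sketch of proposed @rel/@rev token resolution.
RESERVED = {"alternate", "license", "next", "prev", "stylesheet"}  # subset only
XHTML_VOCAB = "http://www.w3.org/1999/xhtml/vocab#"


def resolve_token(token, mappings, proposed=True):
    """Return the IRI for one token, or None if the token is ignored."""
    # 1. Reserved values are checked before any prefix processing.
    if token in RESERVED:
        return XHTML_VOCAB + token
    # 2. Safe-CURIE: the brackets say 'CURIE', so it is never a URI.
    if token.startswith("[") and token.endswith("]"):
        prefix, sep, reference = token[1:-1].partition(":")
        if sep and prefix in mappings:
            return mappings[prefix] + reference
        return None
    # 3. Split the 'potential CURIE' at its first colon.
    prefix, sep, reference = token.partition(":")
    if not sep:
        return None  # relative paths are left out of this sketch
    if prefix in mappings:
        return mappings[prefix] + reference  # mapped prefix: a CURIE
    # 4. Unmapped prefix: ignored today; a URI under the proposal.
    return token if proposed else None


def resolve_rel(value, mappings, proposed=True):
    """Resolve every whitespace-separated token in a @rel value."""
    iris = []
    for token in value.split():
        iri = resolve_token(token, mappings, proposed)
        if iri is not None:
            iris.append(iri)
    return iris


foaf = {"foaf": "http://xmlns.com/foaf/0.1/"}
rel = "next foaf:knows http://xmlns.com/foaf/0.1/made"
print(resolve_rel(rel, foaf, proposed=False))  # current behaviour
print(resolve_rel(rel, foaf, proposed=True))   # with the proposal
```

Run against the @rel value from Julian's example, with 'urn' mapped to the Dublin Core terms namespace, the sketch still expands 'urn:uuid:...' as a CURIE, because step 3 finds a mapping for 'urn' before the URI fallback is ever reached.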

Note that this proposal doesn't actually solve Julian's problem:

  <a xmlns:urn="http://purl.org/dc/terms/"
   rel="urn:rights urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a"
   href="http://example.com/terms_of_service.html"
  >Terms of service</a>

because the 'urn' prefix will still cause the URI to be processed as a CURIE.

However, by allowing URIs into @rel et al., it will hopefully make the
author more conscious of the different possibilities at play, and so
make them more careful in choosing prefix mappings.

Regards,

Mark

-- 
Mark Birbeck, webBackplane

mark.birbeck@webBackplane.com

http://webBackplane.com/mark-birbeck

webBackplane is a trading name of Backplane Ltd. (company number
05972288, registered office: 2nd Floor, 69/85 Tabernacle Street,
London, EC2A 4RR)

Received on Friday, 10 July 2009 12:23:43 UTC