Re: Fine-tuning CURIEs (reply #2 :-)

From: Mark Birbeck <mark.birbeck@formsPlayer.com>
Date: Wed, 12 Sep 2007 10:59:36 +0100
Message-ID: <a707f8300709120259o4f9cf75ek9dc82870b35198fd@mail.gmail.com>
To: "Ivan Herman" <ivan@w3.org>
Cc: "W3C RDFa task force" <public-rdf-in-xhtml-tf@w3.org>

Hi Ivan,

Sorry that this is getting complicated. I'll try to explain the issue
a bit better than I have!

First, setting the 'anonymous prefix' to use the 'default namespace'
was not intended to help the OpenID situation. The 'anonymous prefix'
is just the same as in SPARQL where you can define a namespace for
':', and then you can use ':x' anywhere you like.

I think we should support this, since (a) it will have no effect on
novice authors--they don't have to use it, but (b) it will be useful
for advanced authors, and it makes it easy to convert data and samples
from other formats, such as Turtle.

You don't have to agree with that, but let's leave that to one side,
since it doesn't have much effect on anything other than authoring
convenience. Instead, let's move on to the 'no prefix' situation,
i.e., when there is no colon.

This situation brings with it different issues than the 'anonymous
prefix' (AP) one. Whilst the AP situation only arises in N3-style
languages, the 'no prefix' (NP) situation is something we find in the
current use of QNames. Some languages say that a QName with no
prefix is evaluated in the current default namespace, and some
languages say that it is treated as if there is no namespace at all.

The fact that AP and NP are distinct means that we can define a
different rule for each, if we want to. (Of course we could make them
the same, but if there is a case for diverging, we have the option to
use it.) The possibilities for NP are:

  1. Don't allow it. There is therefore no such thing as a CURIE with
     no colon in, and the values of 'next' and 'prev' get converted by
     a preprocessor to full CURIEs before processing continues.

  2. Allow NP, and say that the default namespace is used.

  3. Allow NP, and say that a hard-coded namespace is used, the XHTML namespace.

(You could say that there is a fourth option of saying that there is
no namespace, but since the whole point is to abbreviate URIs, there
doesn't seem to be much of a benefit there.)
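
To make the difference between the three options concrete, here is a
rough Python sketch of a CURIE resolver. It is only an illustration:
the function name, the np_policy switch and the example namespaces are
made up for this message, and it assumes that a namespace and a
reference are joined by plain concatenation.

```python
XHTML_NS = "http://www.w3.org/1999/xhtml"  # hard-coded namespace for option 3


def resolve_curie(value, prefixes, default_ns, np_policy):
    """Expand a CURIE of the form [ [ prefix ] ':' ] reference to a URI.

    np_policy (a hypothetical switch) selects the 'no prefix' rule:
    "forbid" (option 1), "default" (option 2) or "xhtml" (option 3).
    Returns None when the value cannot be expanded.
    """
    if ":" in value:
        prefix, reference = value.split(":", 1)
        if prefix == "":
            # Anonymous prefix (':x'): use the default namespace, as in SPARQL.
            return default_ns + reference
        if prefix in prefixes:
            return prefixes[prefix] + reference
        return None  # undeclared prefix
    # No prefix and no colon: the three options differ only here.
    if np_policy == "forbid":
        return None
    if np_policy == "default":
        return default_ns + value
    if np_policy == "xhtml":
        return XHTML_NS + value
    raise ValueError("unknown np_policy: " + np_policy)
```

With a default namespace of 'http://example.org/ns#' (invented for the
example), 'next' expands into that namespace under option 2 and into
the XHTML namespace under option 3, while ':next' uses the default
namespace in both cases.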

1. The problem with not allowing NP, is that this would have to be an
RDFa-exception to CURIEs. The idea of CURIEs is that they are
backwards compatible with QNames, and QNames allow values with no
colon, so CURIEs must too. You might argue that we don't care in RDFa,
and could have our own version of CURIEs, but as I mentioned
yesterday, RDFa is part of a bigger 'semantic web' package, which
includes things like the role attribute. Having different rules for
our abbreviated URIs in different attributes will soon get very messy.

(I'm not implying that you are indifferent to the bigger picture,
Ivan. :) I'm just trying to anticipate one kind of objection.)

2. The problem with allowing NP, and using the default namespace is
that, as both you and Shane rightly pointed out, if the author changes
the default namespace, all of the values like 'next' and 'prev'
suddenly move into the new namespace. Although I knew that was
probably a show-stopper, I made the suggestion anyway because it did
have the side-effect of providing a way to do the
OpenID-kind-of-stuff. I thought it at least worth a mention, but I
don't have a problem with not going that way. :)

3. The problem with hard-coding the namespace is that it raises the
question of what to do with values that are not currently defined in

My view is that we should do number 3, and that all we have to do is
put all triples into the XHTML namespace.

There is an assumption that HTML defined @rel and @rev in the same way
as @class, but it didn't. Whilst @class is actually defined to allow
any value, @rel and @rev are defined to allow only a limited set of
values. Of course people do use it like @class, and we should try to
accommodate that, but the key thing is that if a future version of
HTML wanted to add a new value to @rel it could, and it would not be
breaking anything since HTML 'owns' @rel. But if that future version
of HTML tried to specify a value for @class, things would be very
different, since @class has been specifically defined to be 'owned' by
the author.

So the situation is that any value in @rel could legitimately be
regarded as being an XHTML value. The question then becomes one of
what to do with it. You could simply reject any non-recognised values,
which means that a triple like:

    <> <http://www.w3.org/1999/xhtmlopenid.delegate> <xyz> .

would just get dropped. I'm not quite sure how we would define
that--perhaps we just say that only the values that come from XHTML 1
are accepted.
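
A minimal sketch of that 'reject' behaviour, in Python. The accepted
set is meant to mirror the link types listed in HTML 4.01 and should be
checked against the spec before anyone relies on it; the function name,
the lower-casing, and the plain concatenation of namespace and value
are assumptions made for illustration only.

```python
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Link types as listed in HTML 4.01 (assumed here to carry over to XHTML 1).
XHTML1_LINK_TYPES = {
    "alternate", "stylesheet", "start", "next", "prev", "contents",
    "index", "glossary", "copyright", "chapter", "section",
    "subsection", "appendix", "help", "bookmark",
}


def predicates_from_rel(rel_value):
    """Return predicate URIs for the recognised, unprefixed values in @rel."""
    predicates = []
    for value in rel_value.split():  # @rel is a space-separated list
        if ":" in value:
            continue  # prefixed CURIEs are not handled in this sketch
        if value.lower() in XHTML1_LINK_TYPES:
            predicates.append(XHTML_NS + value.lower())
        # Anything else (e.g. 'openid.delegate') is silently dropped.
    return predicates
```

So rel="next openid.delegate" would yield a single predicate, for
'next', and the OpenID value would produce no triple at all.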

Note that this doesn't mean we can't support legacy predicates. One
way is to use the preprocessor approach to alter the
@rel="openid.delegate" to some other value (perhaps, openid:delegate)
and so a completely different triple will be generated.

Myself, I've gone off the preprocessor idea. In my demos I just let
legacy predicates get parsed into the XHTML namespace, and then test
for URLs like the one above. It allows me to look at any web-page I
like, and know that I can process every single piece of metadata on
that page, without having to edit a preprocessor, even if the page
contains something like:

  <link rel="valueneverseenbefore" href="..." />

I don't intend to change that aspect of my parser, since having
'extra' triples that you can easily ignore, seems no cost at all,
compared to the converse which would be to have no way to get at
useful metadata that is in the document. But the question is whether
my parser's behaviour should just be its own feature, or whether we
adopt this approach across the board.

One way to make it feel a little more 'comfortable', might be to
define the NP namespace to be something like:

    http://www.w3.org/1999/xhtml/legacy#

That would have the effect of placing all currently used predicates
that have not adopted a proper CURIE format, into a legacy namespace.
For example:

    <> <http://www.w3.org/1999/xhtml/legacy#openid.delegate> <xyz> .

    <> <http://www.w3.org/1999/xhtml/legacy#valueneverseenbefore> <xyz> .
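
As a rough Python sketch of how that could look to a processor and to
a consumer: unprefixed values that are not recognised go into the
legacy namespace, and anyone who doesn't want them filters on that
namespace. The KNOWN set is just a stand-in, and the function names are
invented for this example.

```python
XHTML_NS = "http://www.w3.org/1999/xhtml"
LEGACY_NS = "http://www.w3.org/1999/xhtml/legacy#"

KNOWN = {"next", "prev"}  # stand-in for the values XHTML actually defines


def predicate_for(value, known=KNOWN):
    """Map an unprefixed @rel value to the XHTML or the legacy namespace."""
    namespace = XHTML_NS if value in known else LEGACY_NS
    return namespace + value


def without_legacy(triples):
    """Drop (subject, predicate, object) triples with a legacy predicate."""
    return [t for t in triples if not t[1].startswith(LEGACY_NS)]
```

A consumer that only cares about 'proper' predicates calls
without_legacy() on the parser output; one that wants the OpenID data
looks for the legacy predicate instead.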

It at least means that processors have a way of finding *all* metadata
in a page, and of course it can easily be ignored if you don't want
it. But it has the benefit of not trying to come up with hacks to make
data in a variety of formats become 'first order' triples in RDFa.
(Such as allowing a '.' as a prefix separator, and that kind of



On 12/09/2007, Ivan Herman <ivan@w3.org> wrote:
> Mark,
> I thought about this again but, I must admit, I did not change my
> opinion, even reading through the thread again...
> I was actually a bit surprised to read, in Shane's reply[1] that
> [[[
> XHTML prohibits the introduction of other values that are not namespace
> qualified into @rel / @rev (and @role).  That whole space is reserved.
> The DTD does not enforce this because it is not possible to do so, but
> conforming RDFa processors should  ignore any incorrect values IMHO.
> ]]]
> Is this an XHTML1 or an XHTML2 feature? As far as XHTML1 is concerned,
> I looked up in [2] and it says:
> [[[
> Authors may use the following recognized link types, listed here with
> their conventional interpretations. A LinkTypes value refers to a
> space-separated list of link types. White space characters are not
> permitted within link types.
> ]]]
> I am not sure how to interpret the word 'may' in this respect, but my
> impression is that this means the author can have his/her own, or use
> these predefined ones.
> However... regardless of what the spec says, the reality out there is
> that authors _do_ use @rel/@rev with values that are _not_ defined by
> XHTML. The example of openid or DC are the obvious ones. What this also
> means is that the _second_ scheme you propose:-) would actually lead to new
> and somewhat unexpected triples. (By the way, one of the
> questions/comments asked by the DCMI guys in Singapore was exactly that:
> they _hope_ that RDFa will not generate extra triples if they are
> encoded using the dc.title trick...)
> Regardless of how it is described in the CURIE document, my feeling is that:
> - 1. We should obviously interpret 'a:b'
> - 2. We should interpret 'b' as 'http://www.w3.org/1999/xhtml/b'
> _provided_ that 'b' is one of the entries listed by the relevant XHTML
> document (I fully agree with Shane on that one for the reasons above)
> - 3. For ':b' we can either say that it behaves exactly like 'b' above,
> or we introduce the usage of a default namespace. I am mildly in favour
> of _not_ using the concept of default namespace at all. Yes, that would
> invalidate your openid example, but I do not think that is so important
> (and the DCMI use case is, as I said before, moot)
> Ivan
> [1] http://www.w3.org/mid/46E6BB62.9070204@aptest.com
> Mark Birbeck wrote:
> > Hello all,
> >
> > During the course of finishing off the Syntax document a couple of
> > issues have popped up. I'll deal with them in separate threads.
> >
> > This thread relates specifically to the way that we ensure that
> > mark-up like this yields the kind of triples we'd expect:
> >
> >   <link rel="next" href="o" />
> >
> > At the moment we say that some kind of preprocessor runs and that the
> > mark-up above is 'mapped' to this:
> >
> >   <link rel="xh:next" href="o" />
> >
> > This is fine, and if we're happy with that, we can just leave it.
> > However, there is another way to come at this, which I'll describe.
> >
> > Myself and Shane changed the CURIE definition recently so that *both*
> > the prefix and the colon were optional:
> >
> >   [ [ prefix ] ':' ] reference
> >
> > This is so that all of the following are valid:
> >
> >   a:b
> >   :b
> >   b
> >
> > We did this because the second format is needed in N3 and Turtle-based
> > languages such as SPARQL, whilst the third format is needed if we want
> > to be able to handle legacy QNames.
> >
> > I was therefore looking more closely at what exactly these three
> > different formats should mean since we don't have that defined clearly
> > in our specification. The most obvious route for the second format is
> > to say that it should use the current default namespace, making it
> > consistent with SPARQL, etc.
> >
> > However, there is no general practice for non-prefixed QNames--in some
> > situations the default namespace is used (such as in declarations of
> > type in XML Schema), and in some situations it is explicitly ignored
> > (such as when defining a template in XSLT). This means that we could
> > choose to use the default namespace, or define some other rule like
> > always using the XHTML namespace, or even the current value of [base].
> >
> > An interesting thing comes about though, if we were to choose to use
> > the default namespace; returning to the syntax we had earlier:
> >
> >   <link rel="next" href="o" />
> >
> > we could obtain a predicate of 'xh:next' without having to do _any_
> > preprocessing, but *only* if the default namespace was XHTML:
> >
> > <html xmlns="http://www.w3.org/1999/xhtml">
> >   <head>
> >     <title>...</title>
> >     <link rel="next" href="o" />
> >   </head>
> >   ...
> > </html>
> >
> > I like this approach since I think it gives future authors a lot of
> > flexibility. It also, quite by accident, provides a way to remove the
> > need for a lot of the preprocessing we have been discussing. For
> > example, one could mark-up OpenID using a layout like this:
> >
> >    <link rel="openid.server" xmlns="http://openid.net/"
> >       href="https://api.screenname.aol.com/auth/openidServer" />
> >
> >    <link rel="openid.delegate" xmlns="http://openid.net/"
> >        href="http://openid.aol.com/wezfurlong" />
> >
> > Note that instead of worrying about trying to make "openid." into some
> > kind of prefix, we simply use the full string as the reference.
> >
> > Anyway, there you have it. The choices seem to be:
> >
> >   * have a preprocessing step to get at 'legacy' properties and
> >     short-forms, such as xh:next. In this case we'd still need to say
> >     what unprefixed CURIEs mean, but whichever we choose would make no
> >     difference to the preprocessing step; they could be in the default
> >     namespace, the current document, or some explicit namespace;
> >
> >   * or, we say that CURIEs with no prefix--with or without the
> >     colon--use the default namespace, and then leverage this to cope
> >     with some of the legacy properties like 'xh:next' and
> >     'openid:openid.delegate' _without_ the need for a preprocessing
> >     step.
> >
> > Myself, I can go either way; I'd prefer the second solution, since I
> > think it would be quite neat if we only used the preprocessing step
> > when it is really necessary. This is because although the
> > preprocessing seems pretty benign, we've never really discussed things
> > like the fact that the preprocessor must operate across all of the
> > attributes, in a consistent way. For example, the value of 'next'
> > would need mapping in both @rel and @about, for the following
> > statements to work:
> >
> >   <html xmlns:skos="http://www.w3.org/2004/02/skos/core#">
> >     <head>
> >       <link rel="next" href="o" />
> >       .
> >       .
> >       .
> >       <div about="[next]" instanceof="skos:Concept">
> >         <span property="skos:prefLabel">Next</span>
> >         <div property="skos:definition">
> >           Refers to the next document in a linear sequence of documents. User
> >           agents may choose to preload the "next" document, to reduce the
> >           perceived load time.
> >         </div>
> >       </div>
> >
> > However, if the CURIEs were using the default namespace you can see
> > that this mark-up would 'just work'.
> >
> > Your thoughts and votes please. :)
> >
> > Regards,
> >
> > Mark
> >
> --
> Ivan Herman, W3C Semantic Web Activity Lead
> Home: http://www.w3.org/People/Ivan/
> PGP Key: http://www.ivan-herman.net/pgpkey.html
> FOAF: http://www.ivan-herman.net/foaf.rdf

Received on Wednesday, 12 September 2007 09:59:46 UTC

