Re: Fine-tuning CURIEs (reply #2 :-) from Ivan Herman on 2007-09-12 (public-rdf-in-xhtml-tf@w3.org from September 2007)

From: Ivan Herman <ivan@w3.org>
Date: Wed, 12 Sep 2007 12:12:56 +0200
To: Mark Birbeck <mark.birbeck@formsPlayer.com>
Cc: W3C RDFa task force <public-rdf-in-xhtml-tf@w3.org>
Message-ID: <46E7BBA8.10807@w3.org>
Mark

(Sigh...)

I keep to my original opinion which was, reworded in your language:

[[[
1. Don't allow it. There is therefore no such thing as a CURIE with no
colon in, and the values of 'next' and 'prev' get converted by a
preprocessor to full CURIEs before processing continues.
]]]

with the additional restriction that this happens only with values
defined by XHTML and not with others.
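
A minimal sketch of that rule, in Python, might look like the following.
The function name, the 'xh' prefix binding, and the choice to silently drop
unrecognised values are assumptions made for illustration; the set of
reserved values is the HTML 4.01 link-type list, used here as a stand-in
for "values defined by XHTML". Matching is exact (case-sensitive), which is
also an assumption.

```python
# Hypothetical preprocessor sketch; nothing here comes from the RDFa or
# CURIE drafts themselves.

# Link types listed by HTML 4.01 (assumed stand-in for "defined by XHTML").
XHTML_LINK_TYPES = frozenset({
    "alternate", "stylesheet", "start", "next", "prev", "contents",
    "index", "glossary", "copyright", "chapter", "section",
    "subsection", "appendix", "help", "bookmark",
})


def preprocess_rel(value, prefix="xh"):
    """Turn a space-separated @rel/@rev value into a list of full CURIEs.

    - Tokens that already contain a colon are passed through untouched.
    - Colon-less tokens defined by XHTML are rewritten to 'prefix:token'.
    - Any other colon-less token (e.g. 'openid.delegate') is dropped, so
      it never produces a triple.
    """
    curies = []
    for token in value.split():
        if ":" in token:
            curies.append(token)
        elif token in XHTML_LINK_TYPES:
            curies.append(f"{prefix}:{token}")
    return curies


print(preprocess_rel("next openid.delegate dc:title"))
```

With this behaviour, rel="next" ends up as 'xh:next', while
rel="openid.delegate" generates nothing at all.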

I have not seen any argument (sorry about that...) that would make my
opinion change.

Ivan


Mark Birbeck wrote:
> Hi Ivan,
> 
> Sorry that this is getting complicated. I'll try to explain the issue
> a bit better than I have!
> 
> First, setting the 'anonymous prefix' to use the 'default namespace'
> was not intended to help the OpenID situation. The 'anonymous prefix'
> is just the same as in SPARQL where you can define a namespace for
> ':', and then you can use ':x' anywhere you like.
> 
> I think we should support this, since (a) it will have no effect on
> novice authors--they don't have to use it, but (b) it will be useful
> for advanced authors, and it makes it easy to convert data and samples
> from other formats, such as Turtle.
> 
> You don't have to agree with that, but let's leave that to one side,
> since it doesn't have much effect on anything other than authoring
> convenience. Instead, let's move on to the 'no prefix' situation,
> i.e., when there is no colon.
> 
> This situation brings with it different issues than the 'anonymous
> prefix' (AP) one. Whilst the AP situation only arises in N3-style
> languages, the 'no prefix' (NP) situation is something we find in the
> current use of QNames. Some languages say that a QName with no
> prefix is evaluated in the current default namespace, and some
> languages say that it is treated as if there is no namespace at all.
> 
> The fact that AP and NP are distinct means that we can define a
> different rule for each, if we want to. (Of course we could make them
> the same, but if there is a case for diverging, we have the option to
> use it.) The possibilities for NP are:
> 
>   1. Don't allow it. There is therefore no such thing as a CURIE with
>      no colon in, and the values of 'next' and 'prev' get converted by
>      a preprocessor to full CURIEs before processing continues.
> 
>   2. Allow NP, and say that the default namespace is used.
> 
>   3. Allow NP, and say that a hard-coded namespace is used, the XHTML namespace.
> 
> (You could say that there is a fourth option of saying that there is
> no namespace, but since the whole point is to abbreviate URIs, there
> doesn't seem to be much of a benefit there.)
> 
> 
> 1. The problem with not allowing NP, is that this would have to be an
> RDFa-exception to CURIEs. The idea of CURIEs is that they are
> backwards compatible with QNames, and QNames allow values with no
> colon, so CURIEs must too. You might argue that we don't care in RDFa,
> and could have our own version of CURIEs, but as I mentioned
> yesterday, RDFa is part of a bigger 'semantic web' package, which
> includes things like the role attribute. Having different rules for
> our abbreviated URIs in different attributes will soon get very messy.
> 
> (I'm not implying that you are indifferent to the bigger picture,
> Ivan. :) I'm just trying to anticipate one kind of objection.)
> 
> 2. The problem with allowing NP, and using the default namespace is
> that, as both you and Shane rightly pointed out, if the author changes
> the default namespace, all of the values like 'next' and 'prev'
> suddenly move into the new namespace. Although I knew that was
> probably a show-stopper, I made the suggestion anyway because it did
> have the side-effect of providing a way to do the
> OpenID-kind-of-stuff. I thought it at least worth a mention, but I
> don't have a problem with not going that way. :)
> 
> 3. The problem with hard-coding the namespace is that it raises the
> question of what to do with values that are not currently defined in
> XHTML.
> 
> My view is that we should do number 3, and that all we have to do is
> put all triples into the XHTML namespace.
> 
> There is an assumption that HTML defined @rel and @rev in the same way
> as @class, but it didn't. Whilst @class is actually defined to allow
> any value, @rel and @rev are defined to allow only a limited set of
> values. Of course people do use it like @class, and we should try to
> accommodate that, but the key thing is that if a future version of
> HTML wanted to add a new value to @rel it could, and it would not be
> breaking anything since HTML 'owns' @rel. But if that future version
> of HTML tried to specify a value for @class, things would be very
> different, since @class has been specifically defined to be 'owned' by
> the author.
> 
> So the situation is that any value in @rel could legitimately be
> regarded as being an XHTML value. The question then becomes one of
> what to do with it. You could simply reject any non-recognised values,
> which means that a triple like:
> 
>   <>
>     <http://www.w3.org/1999/xhtmlopenid.delegate> <xyz> .
> 
> would just get dropped. I'm not quite sure how we would define
> that--perhaps we just say that only the values that come from XHTML 1
> are accepted.
> 
> Note that this doesn't mean we can't support legacy predicates. One
> way is to use the preprocessor approach to alter the
> @rel="openid.delegate" to some other value (perhaps, openid:delegate)
> and so a completely different triple will be generated.
> 
> Myself, I've gone off the preprocessor idea. In my demos I just let
> legacy predicates get parsed into the XHTML namespace, and then test
> for URLs like the one above. It allows me to look at any web-page I
> like, and know that I can process every single piece of metadata on
> that page, without having to edit a preprocessor, even if the page
> has:
> 
>   <link rel="valueneverseenbefore" href="..." />
> 
> I don't intend to change that aspect of my parser, since having
> 'extra' triples that you can easily ignore, seems no cost at all,
> compared to the converse which would be to have no way to get at
> useful metadata that is in the document. But the question is whether
> my parser's behaviour should just be its own feature, or whether we
> adopt this approach across the board.
> 
> One way to make it feel a little more 'comfortable', might be to
> define the NP namespace to be something like:
> 
>   <http://www.w3.org/1999/xhtml/legacy#>
> 
> That would have the effect of placing all currently used predicates
> that have not adopted a proper CURIE format, into a legacy namespace.
> For example:
> 
>   <>
>     <http://www.w3.org/1999/xhtml/legacy#openid.delegate> <xyz> .
> 
>   <>
>     <http://www.w3.org/1999/xhtml/legacy#valueneverseenbefore> <xyz> .
> 
> It at least means that processors have a way of finding *all* metadata
> in a page, and of course it can easily be ignored if you don't want
> it. But it has the benefit of not trying to come up with hacks to make
> data in a variety of formats become 'first order' triples in RDFa.
> (Such as allowing a '.' as a prefix separator, and that kind of
> thing.)
> 
> Regards,
> 
> Mark
> 
> 
> On 12/09/2007, Ivan Herman <ivan@w3.org> wrote:
>> Mark,
>>
>> I thought about this again but, I must admit, I did not change my
>> opinion, even reading through the thread again...
>>
>> I was actually a bit surprised to read, in Shane's reply[1] that
>>
>> [[[
>> XHTML prohibits the introduction of other values that are not namespace
>> qualified into @rel / @rev (and @role).  That whole space is reserved.
>> The DTD does not enforce this because it is not possible to do so, but
>> conforming RDFa processors should  ignore any incorrect values IMHO.
>> ]]]
>>
>> Is this an XHTML1 or an XHTML2 feature? As far as XHTML1 is concerned,
>> I looked up in [2] and it says:
>>
>> [[[
>> Authors may use the following recognized link types, listed here with
>> their conventional interpretations. A LinkTypes value refers to a
>> space-separated list of link types. White space characters are not
>> permitted within link types.
>> ]]]
>>
>> I am not sure how to interpret the word 'may' in this respect, but my
>> impression is that this means the author can have his/her own, or use
>> these predefined ones.
>>
>> However... regardless of what the spec says, the reality out there is
>> that authors _do_ use @rel/@rev with values that are _not_ defined by
>> XHTML. The examples of openid and DC are the obvious ones. What this also
>> means is that the _second_ scheme you propose:-) would actually lead to new
>> and somewhat unexpected triples. (By the way, one of the
>> questions/comments asked by the DCMI guys in Singapore was exactly that:
>> they _hope_ that RDFa will not generate extra triples if they are
>> encoded using the dc.title trick...)
>>
>> Regardless of how it is described in the CURIE document, my feeling is that:
>>
>> - 1. We should obviously interpret 'a:b'
>> - 2. We should interpret 'b' as 'http://www.w3.org/1999/xhtml/b'
>> _provided_ that 'b' is one of the entries listed by the relevant XHTML
>> document (I fully agree with Shane on that one for the reasons above)
>> - 3. For ':b' we can either say that it behaves exactly like 'b' above,
>> or we introduce the usage of a default namespace. I am mildly in favour
>> of _not_ using the concept of default namespace at all. Yes, that would
>> invalidate your openid example, but I do not think that is so important
>> (and the DCMI use case is, as I said before, moot)
>>
>>
>>
>> Ivan
>>
>>
>>
>> [1] http://www.w3.org/mid/46E6BB62.9070204@aptest.com
>>
>>
>> Mark Birbeck wrote:
>>> Hello all,
>>>
>>> During the course of finishing off the Syntax document a couple of
>>> issues have popped up. I'll deal with them in separate threads.
>>>
>>> This thread relates specifically to the way that we ensure that
>>> mark-up like this yields the kind of triples we'd expect:
>>>
>>>   <link rel="next" href="o" />
>>>
>>> At the moment we say that some kind of preprocessor runs and that the
>>> mark-up above is 'mapped' to this:
>>>
>>>   <link rel="xh:next" href="o" />
>>>
>>> This is fine, and if we're happy with that, we can just leave it.
>>> However, there is another way to come at this, which I'll describe.
>>>
>>> Myself and Shane changed the CURIE definition recently so that *both*
>>> the prefix and the colon were optional:
>>>
>>>   [ [ prefix ] ':' ] reference
>>>
>>> This is so that all of the following are valid:
>>>
>>>   a:b
>>>   :b
>>>   b
>>>
>>> We did this because the second format is needed in N3 and Turtle-based
>>> languages such as SPARQL, whilst the third format is needed if we want
>>> to be able to handle legacy QNames.
>>>
>>> I was therefore looking more closely at what exactly these three
>>> different formats should mean since we don't have that defined clearly
>>> in our specification. The most obvious route for the second format is
>>> to say that it should use the current default namespace, making it
>>> consistent with SPARQL, etc.
>>>
>>> However, there is no general practice for non-prefixed QNames--in some
>>> situations the default namespace is used (such as in declarations of
>>> type in XML Schema), and in some situations it is explicitly ignored
>>> (such as when defining a template in XSLT). This means that we could
>>> choose to use the default namespace, or define some other rule like
>>> always using the XHTML namespace, or even the current value of [base].
>>>
>>> An interesting thing comes about though, if we were to choose to use
>>> the default namespace; returning to the syntax we had earlier:
>>>
>>>   <link rel="next" href="o" />
>>>
>>> we could obtain a predicate of 'xh:next' without having to do _any_
>>> preprocessing, but *only* if the default namespace was XHTML:
>>>
>>> <html xmlns="http://www.w3.org/1999/xhtml">
>>>   <head>
>>>     <title>...</title>
>>>     <link rel="next" href="o" />
>>>   </head>
>>>   ...
>>> </html>
>>>
>>> I like this approach since I think it gives future authors a lot of
>>> flexibility. It also, quite by accident, provides a way to remove the
>>> need for a lot of the preprocessing we have been discussing. For
>>> example, one could mark-up OpenID using a layout like this:
>>>
>>>    <link rel="openid.server" xmlns="http://openid.net/"
>>>       href="https://api.screenname.aol.com/auth/openidServer" />
>>>
>>>    <link rel="openid.delegate" xmlns="http://openid.net/"
>>>        href="http://openid.aol.com/wezfurlong" />
>>>
>>> Note that instead of worrying about trying to make "openid." into some
>>> kind of prefix, we simply use the full string as the reference.
>>>
>>> Anyway, there you have it. The choices seem to be:
>>>
>>>   * have a preprocessing step to get at 'legacy' properties and
>>>     short-forms, such as xh:next. In this case we'd still need to say
>>>     what unprefixed CURIEs mean, but whichever we choose would make no
>>>     difference to the preprocessing step; they could be in the default
>>>     namespace, the current document, or some explicit namespace;
>>>
>>>   * or, we say that CURIEs with no prefix--with or without the
>>>     colon--use the default namespace, and then leverage this to cope
>>>     with some of the legacy properties like 'xh:next' and
>>>     'openid:openid.delegate' _without_ the need for a preprocessing
>>>     step.
>>>
>>> Myself, I can go either way; I'd prefer the second solution, since I
>>> think it would be quite neat if we only used the preprocessing step
>>> when it is really necessary. This is because although the
>>> preprocessing seems pretty benign, we've never really discussed things
>>> like the fact that the preprocessor must operate across all of the
>>> attributes, in a consistent way. For example, the value of 'next'
>>> would need mapping in both @rel and @about, for the following
>>> statements to work:
>>>
>>>   <html xmlns:skos="http://www.w3.org/2004/02/skos/core#">
>>>     <head>
>>>       <link rel="next" href="o" />
>>>       .
>>>       .
>>>       .
>>>       <div about="[next]" instanceof="skos:Concept">
>>>         <span property="skos:prefLabel">Next</span>
>>>         <div property="skos:definition">
>>>           Refers to the next document in a linear sequence of documents. User
>>>           agents may choose to preload the "next" document, to reduce the
>>>           perceived load time.
>>>         </div>
>>>       </div>
>>>
>>> However, if the CURIEs were using the default namespace you can see
>>> that this mark-up would 'just work'.
>>>
>>> Your thoughts and votes please. :)
>>>
>>> Regards,
>>>
>>> Mark
>>>
>> --
>>
>> Ivan Herman, W3C Semantic Web Activity Lead
>> Home: http://www.w3.org/People/Ivan/
>> PGP Key: http://www.ivan-herman.net/pgpkey.html
>> FOAF: http://www.ivan-herman.net/foaf.rdf
>>
>>
> 
> 

-- 

Ivan Herman, W3C Semantic Web Activity Lead
Home: http://www.w3.org/People/Ivan/
PGP Key: http://www.ivan-herman.net/pgpkey.html
FOAF: http://www.ivan-herman.net/foaf.rdf
Received on Wednesday, 12 September 2007 10:12:54 UTC