Re: Official RDFa Response: ISSUE-90: CURIEorURI Value Space Collisions from Niklas Lindström on 2011-05-31 (public-rdfa-wg@w3.org from May 2011)

From: Niklas Lindström <lindstream@gmail.com>
Date: Tue, 31 May 2011 17:19:12 +0200
To: Manu Sporny <msporny@digitalbazaar.com>
Cc: RDFa WG <public-rdfa-wg@w3.org>
Message-ID: <BANLkTi=s=km9e2DpOLShKf_5xKpROCRumA@mail.gmail.com>

Hi Manu, all!


2011/5/28 Manu Sporny <msporny@digitalbazaar.com>:
> Niklas,
>
> This is an official response from the RDF Web Applications Working
> Group, previously the RDFa Working Group, on your 2nd Last Call comment
> on RDFa Core 1.1. The issue has been tracked here:
>
> ISSUE-90: CURIEorURI Value Space Collisions
> http://www.w3.org/2010/02/rdfa/track/issues/90
>
> Your comment raised the point that the value-space of a CURIE is
> open-ended enough that a future Internet Protocol Scheme may conflict
> with a CURIE prefix chosen before the scheme was realized. The best
> real-world example is the 'http' vocabulary, which can be found at this URL:
>
> http://www.w3.org/2006/http#
>
> You assert that what the author intended is ambiguous when they specify
> the following text:
>
> http://example.org/
>
> Do they mean:
>
> http://example.org/
>
> or do they mean:
>
> http://www.w3.org/2006/http#//example.org/
>
> The second one is probably not what they intended, but your suggestion
> was that we may want to limit what is allowed in a CURIE to something
> more strict, such as the regex: "[A-Za-z0-9_-.]+". Doing so would ensure
> that the URI is the one that would be chosen by the RDFa Processor in
> the example above.


I thank the working group for reviewing my issue!

However, it seems I haven't quite gotten my point across. I didn't
propose to limit the lexical value space of CURIEs in general. It is
only the construct SafeCURIEorCURIEorURI I am concerned about. And
that is a new construct, hitherto used *only* in RDFa 1.1.

See my comments below. (I also elaborated on this in my reply [1]
during the discussion in April.)


> We discussed this at length and found the following:
>
> 1. Limiting the CURIE to a regex arbitrarily limits the allowable
>   characters such that other use cases cannot be supported, such as
>   CURIE references containing "@" or ":" or any internationalized
>   character in them.

As said above, I didn't propose to reduce the current lexical space of
CURIEs everywhere, only in SafeCURIEorCURIEorURI (and not necessarily
by restricting it with a regex; it could e.g. be redefined as
QNameOrSafeCURIEorURI).
If one wants to use complex CURIEs there, e.g. "dpb:resource/Concept",
SafeCURIEs would work fine, just as before.
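
To make the QNameOrSafeCURIEorURI idea concrete, here is a rough
sketch in Python. It is only an illustration under assumptions of my
own: the function name resolve_restricted is hypothetical, the NCNAME
pattern is a simplified ASCII-only stand-in for the real XML NCName
production, the namespace for "dpb" is a made-up example.org URI, and
returning None for an undeclared SafeCURIE is my shorthand for "value
ignored".

```python
import re

# Simplified, ASCII-only stand-in for an XML NCName (an assumption,
# not the full production from the XML Namespaces spec).
NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*\Z")


def resolve_restricted(value, prefixes):
    """Resolve a value under a QNameOrSafeCURIEorURI-like rule."""
    # SafeCURIE: "[prefix:reference]" is always treated as a CURIE.
    if value.startswith("[") and value.endswith("]"):
        prefix, sep, reference = value[1:-1].partition(":")
        if sep and prefix in prefixes:
            return prefixes[prefix] + reference
        return None  # undeclared prefix: treat the value as ignored
    # Bare value: expand only if it looks like a QName, i.e. the part
    # after the colon is a plain local name.
    prefix, sep, reference = value.partition(":")
    if sep and prefix in prefixes and NCNAME.match(reference):
        return prefixes[prefix] + reference
    # Anything else is left alone as a URI.
    return value


prefixes = {
    "http": "http://www.w3.org/2006/http#",
    "dpb": "http://example.org/dpb/",  # hypothetical namespace
}
print(resolve_restricted("http://example.org/", prefixes))
print(resolve_restricted("http:Request", prefixes))
print(resolve_restricted("dpb:resource/Concept", prefixes))
print(resolve_restricted("[dpb:resource/Concept]", prefixes))
```

Under this rule "http://example.org/" stays a URI even though "http"
is declared, while the complex CURIE is expanded only in its bracketed
SafeCURIE form.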


> 2. Limiting the character set still doesn't prevent false positives
>   for very simple schemes like SIP. For example, to prevent a
>   false positive for "sip:niklas@example.org", one would have to
>   limit the "@" in all CURIEs. However, there may be some vocabularies
>   that want to utilize the "@" sign. That is, we may think we know
>   which characters are important now, but all that must happen for
>   this approach to fail is that an Internet Scheme would appear that
>   uses a character in the list of acceptable characters - for example,
>   "-" or "."

Again, my issue only concerns the collision-prone *mixing* of CURIEs
and URIs, where URIs are the norm. I do not find TERMorCURIEorAbsURI
nearly as problematic (as used for e.g. @rel and @property). That is
simply because there CURIEs are the norm and AbsURIs the exception
(since in RDFa 1.0, only CURIEs were allowed).


> 3. CURIEs are not allowed in @href and @src, so the likelihood that
>   this will become a practical concern is lessened.

I disagree. Since @about and @resource are fundamental to RDFa (indeed
needed in certain places), I don't see how the collision risk is
substantially reduced.


> 4. There is no ambiguity as far as an RDFa Processor is concerned.
>   For example, if an "http" prefix is defined, then anything that
>   accepts a CURIE would expand the "http" prefix. That is, if the
>   prefix is defined, it is a CURIE. If it is not defined, it is an
>   IRI. Authors will discover this very quickly and vocabulary
>   maintainers are advised to avoid naming their vocabularies after
>   Internet Protocol Schemes.

Certainly. But my issue didn't concern processing ambiguities. There's
a conflation of value spaces, which as you point out authors have to
be aware of and avoid. The need for the proposed advice is what
concerns me, since vocabulary prefix naming and URI schemes evolve
completely independent of each other!
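
For illustration, the rule in point 4 can be sketched in a few lines
of Python. This is my own minimal reading of it, not the processing
steps from the spec: the function name resolve_bare is hypothetical,
and it deliberately ignores SafeCURIEs, terms and relative URIs.

```python
def resolve_bare(value, prefixes):
    """If the text before the first colon is a declared prefix, treat
    the value as a CURIE; otherwise leave it alone as a URI."""
    prefix, sep, reference = value.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + reference
    return value


# No "http" prefix declared: the value is read as a URI.
print(resolve_bare("http://example.org/", {}))

# "http" declared for the HTTP vocabulary: the same text is now
# expanded as a CURIE.
print(resolve_bare("http://example.org/",
                   {"http": "http://www.w3.org/2006/http#"}))
```

The processor is never in doubt, but the same attribute value names
two different resources depending on a declaration that may sit far
away from it.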

I basically don't see how this shorthand feature can be warranted when
it leads to this conflation. Especially since SafeCURIEs have been
there for this use case all along. (Now in RDFa 1.1, it seems
SafeCURIEs are effectively a legacy.)

I understand that e.g. the suggested QNameOrSafeCURIEorURI is more
complex to read, though. This is because the tokens in the "local
name" determine whether prefix expansion is triggered. Given the
choice, I'd probably prefer to revert to just SafeCURIEorURI for
@about and @resource!

Requiring users who want to use CURIEs in @about and @resource to
surround them with "[" and "]" just seems wiser to me than making
any declared prefix, perhaps in a profile beyond the author's control,
expand in every subject and object supplied via these attributes,
for every RDFa 1.1 document created.


> 5. It would create a backwards-incompatible change to RDFa. The
>   Working Group is not chartered to make this sort of change to
>   RDFa.

On the contrary. The SafeCURIEorCURIEorURI construct did not exist in
RDFa 1.0. In fact, the current situation does actually introduce a
backwards-incompatible change. In RDFa 1.0, a defined prefix is never
expanded in @about and @resource values that merely start with it;
only SafeCURIEs are expanded there.
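
A small Python sketch of the difference, as I read the two datatypes
(URIorSafeCURIE in RDFa 1.0, SafeCURIEorCURIEorURI in RDFa 1.1). The
function names and the "sip" vocabulary namespace are hypothetical,
and returning None for an undeclared SafeCURIE is just my shorthand
for "value ignored".

```python
def expand_safe_curie(value, prefixes):
    """Expand "[prefix:reference]"; None if the prefix is undeclared."""
    prefix, sep, reference = value[1:-1].partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + reference
    return None


def resolve_rdfa10(value, prefixes):
    """@about/@resource in RDFa 1.0: only bracketed values are CURIEs."""
    if value.startswith("[") and value.endswith("]"):
        return expand_safe_curie(value, prefixes)
    return value


def resolve_rdfa11(value, prefixes):
    """@about/@resource in RDFa 1.1: bare values may be CURIEs too."""
    if value.startswith("[") and value.endswith("]"):
        return expand_safe_curie(value, prefixes)
    prefix, sep, reference = value.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + reference
    return value


# Hypothetical vocabulary that happens to use "sip" as its prefix.
prefixes = {"sip": "http://example.org/sip-vocab#"}
value = "sip:niklas@example.org"
print(resolve_rdfa10(value, prefixes))  # unchanged: still a sip: URI
print(resolve_rdfa11(value, prefixes))  # expanded via the prefix
```

The same markup thus yields a different subject or object under the
two versions as soon as a matching prefix is in scope.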


> In the end the group didn't think that limiting the value space of
> CURIEs would actually solve the problem you are concerned about. It may
> lessen the problem, theoretically, but nobody has demonstrated where
> this leads to a critical real-world problem with RDFa. In the worst
> case, the vocabulary prefix is changed in the RDFa document. In the end
> the Working Group decided to not place additional limitations on the
> value-space for CURIEs for the reasons listed above.

I am quite aware that by just avoiding the definition of a handful of
common URI schemes (e.g. http, https, possibly ftp, mailto, sip), and
provided nothing new comes along, this is not much of a problem
today. But SafeCURIEorCURIEorURI is an issue of conflation (of
prefixes and schemes), and I want to emphasize that.

I can only reiterate the risk of a potential rise in popularity of
some protocol other than http(s) amongst linked data users, in
combination with the definition of prefixes in e.g. profiles or
publishing systems beyond the author's immediate control. It is a
small but complex
problem which could cause a lot of *dynamically published* RDFa to
become problematic in a year, or 5, or 10. And this might be hard to
detect, unless publishers monitor all protocols and prefixes used in
their publishing systems. If RDFa 1.1 is published with
SafeCURIEorCURIEorURI as it is now, this would be very hard to
rectify.

It only takes *one* prefix in the (decentrally) growing list of
common prefixes becoming the name of a popular protocol for this to
become a real problem.
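
The kind of monitoring a publisher would need can be sketched roughly
like this in Python. The scheme list is just the handful named above,
not a complete registry, and the function name colliding_prefixes and
the sample "sip" namespace are hypothetical.

```python
# Only the schemes mentioned in this thread; a real check would need
# an up-to-date list of registered URI schemes.
KNOWN_SCHEMES = {"http", "https", "ftp", "mailto", "sip"}


def colliding_prefixes(prefixes, schemes=KNOWN_SCHEMES):
    """Return the declared prefixes that are also URI scheme names."""
    # URI schemes are case-insensitive, so compare in lower case.
    return sorted(p for p in prefixes if p.lower() in schemes)


declared = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "http": "http://www.w3.org/2006/http#",
    "sip": "http://example.org/sip-vocab#",  # hypothetical vocabulary
}
print(colliding_prefixes(declared))
```

Such a check is only as good as the scheme list behind it, which is
exactly the part that keeps changing independently of the prefixes.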

As I said in [1], I am also worried that this practice may be
carelessly adopted in other scenarios. Particularly RDF APIs, where
one may want to define lots of prefixes for authors' convenience, and
where it may very well be desirable to make statements about resources
identified with protocols other than http. (And we've already found
cases where "http" is defined as a prefix in code libraries.
Furthermore, it's not uncommon for prefixes to be automatically
generated.)


> We discussed it during two telecons:
>
> http://www.w3.org/2010/02/rdfa/meetings/2011-05-05#CURIEorURI_Value_Space_Collisions
> http://www.w3.org/2010/02/rdfa/meetings/2011-05-19#ISSUE__2d_90__3a__CURIEorURI_Value_Space_Collisions
>
> The decision is recorded here:
>
> http://www.w3.org/2010/02/rdfa/meetings/2011-05-19#resolution_2
>
> Since this is an official Last Call response, could you please respond
> as soon as possible and let us know whether or not the Working Group has
> considered your request and responded accordingly. Please let us know if
> this is an acceptable outcome and whether you can live with the
> decision. Thank you for reviewing the RDFa specification and sending in
> your comments. :)

I could live with it, if it comes to that. :) But I cannot really agree.

Have you discussed this combination of CURIEorURI in e.g. the RDF
working group, or the RDF community in general? I'd be somewhat
surprised if I'm the only one feeling uneasy about it.

I know this might seem like an innocent issue with few real-world
problems, but I hope I've made clearer my view of the potential risks
and the difficulty of managing them. I genuinely wish for RDFa 1.1 to
succeed, and I have the utmost respect for your work on it!

Best regards,
Niklas
--
<http://neverspace.net/>


[1]: http://lists.w3.org/Archives/Public/public-rdfa-wg/2011Apr/0095.html

Received on Tuesday, 31 May 2011 15:27:49 UTC