Re: ISSUE-143 (Prefixes too complicated): Use of prefixes is too complicated for a Web technology [RDFa 1.1 in HTML5]

On Wed, Oct 24, 2012 at 10:37 AM, Manu Sporny <msporny@digitalbazaar.com> wrote:
> On 10/24/12 12:30, RDFa Working Group Issue Tracker wrote:
>> ISSUE-143 (Prefixes too complicated): Use of prefixes is too
>> complicated for a Web technology [RDFa 1.1 in HTML5]
>>
>> http://www.w3.org/2010/02/rdfa/track/issues/143
>
> Hi Tab,
>
> The RDFa WG has officially recorded your formal objection for the
> HTML+RDFa 1.1 specification. We're tracking it in our issue tracker now.
> Could you please outline one or more proposals that would result in the
> withdrawal of your formal objection?

Yes.

As outlined in the original threads that introduced this issue, usage
in the wild shows that authors very commonly write "invalid" markup
that uses a common prefix without declaring it.  Consumers
have evolved to recognize these common prefixes without the
declaration, and in some (most?) cases may actually ignore the
declaration entirely and simply always assume that the common prefix
translates to the common URL.

This presents us with several problems:

1. Authors appear to usually use only a handful of common prefixes,
and assign intrinsic meaning to these prefixes.  This suggests that
the indirection of prefixes may be too complex and unnecessary in the
first place, and we would be better served by just treating the
prefixes themselves as meaningful, rather than as a shortener for the
"real" meaningful things, the URLs.

2. The developers of consumers either *also* share this
misunderstanding, or just don't find it worthwhile to be correct when
they can do just as well in practice by treating the prefix as
meaningful.  This suggests that there may be a real interoperability
danger if an author *properly* declares a prefix where the prefix is a
common one, but the URL is to something other than what common use
points to - in "correct" consumers the document will be interpreted as
the author intended, but in many common consumers it will instead be
misinterpreted to be using the common vocabulary rather than what the
author intended.

3. In addition to the theoretical interop problem above, we have a
real interop problem already - many consumers will happily consume
pages that don't declare their prefix, as long as they use a
"well-known" prefix for it.  A conformant consumer, on the other hand,
would *not* do so, and would find no valid data on the pages.  You
have to reverse-engineer the web to find out which prefixes need to be
supported without a declaration, and what URL they should be bound to.
This is an obvious failure mode of a standard.
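
To make problems 2 and 3 concrete, here is a minimal sketch of the two
kinds of consumer. It is not taken from any real parser: the function
names and the WELL_KNOWN table are hypothetical, and the two entries in
it are only illustrative examples of "common" prefixes.

```python
# Hypothetical table of prefixes a lenient consumer hard-codes.
# The entries are illustrative, not a surveyed list.
WELL_KNOWN = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dc": "http://purl.org/dc/terms/",
}


def expand_strict(curie, declared):
    """Conformant behaviour: only prefixes declared in the page resolve."""
    prefix, _, local = curie.partition(":")
    base = declared.get(prefix)
    return base + local if base is not None else None


def expand_lenient(curie, declared):
    """In-the-wild behaviour: a well-known prefix wins, declared or not."""
    prefix, _, local = curie.partition(":")
    base = WELL_KNOWN.get(prefix, declared.get(prefix))
    return base + local if base is not None else None


# Problem 3: the page never declares "foaf".
undeclared = {}
print(expand_strict("foaf:name", undeclared))   # no data found
print(expand_lenient("foaf:name", undeclared))  # common vocabulary assumed

# Problem 2: the page properly binds "foaf" to something else.
rebound = {"foaf": "http://example.org/my-vocab/"}
print(expand_strict("foaf:name", rebound))      # what the author intended
print(expand_lenient("foaf:name", rebound))     # silently misinterpreted
```

The two consumers disagree in both cases, which is the interop gap
described above.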

There are two possible changes that would resolve my objection:

1. Discover and document the common prefixes in use, define them to
always be bound to the URL they're commonly bound to, even without an
actual declaration, and don't allow them to be bound to a URL other
than that predefined one.

2. Drop the indirection of prefixes entirely, and simply declare that
prefixes themselves are meaningful.  Predefine the common prefixes in
use.
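
A rough sketch of what each change could look like in a consumer,
under stated assumptions: the names are hypothetical, the single
PREDEFINED entry is illustrative, and raising an error on a conflicting
declaration is just one way to "not allow" a rebinding (ignoring the
declaration would be another).

```python
# Illustrative predefined binding; a real list would come from
# surveying prefixes actually in use.
PREDEFINED = {"foaf": "http://xmlns.com/foaf/0.1/"}


def bind_prefix(bindings, prefix, url):
    """Change 1: record a declaration, refusing to rebind a predefined prefix."""
    fixed = PREDEFINED.get(prefix)
    if fixed is not None and url != fixed:
        raise ValueError(f"prefix {prefix!r} is predefined as {fixed!r}")
    bindings[prefix] = url


def resolve_prefix(bindings, prefix):
    """Change 1: predefined prefixes resolve even with no declaration."""
    return PREDEFINED.get(prefix, bindings.get(prefix))


def term_key(curie):
    """Change 2: no URL lookup at all; the prefix itself identifies the vocabulary."""
    prefix, _, local = curie.partition(":")
    return (prefix, local)
```

With change 2 there is nothing to declare or resolve; a term such as
"foaf:name" is identified by the pair ("foaf", "name").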

Either would be acceptable, though I greatly prefer #2.  I argue that
#2 is perfectly acceptable for two reasons:

1. If people adopted the convention of simply using their domain name
(quite reasonable, I think, and likely more-or-less what people will
naturally use anyway), it would convey the exact same meaning and
uniqueness as a full URL, but with less typing - "http://foo.com" is
11 characters longer than "foo".

2. This does not harm the ability of generic consumers to process
data.  The URL that a prefix is bound to has no official meaning
anyway - it's solely a uniquifying mechanism - so generic consumers can
infer nothing from it in the general case.  They can do exactly as
much with a non-URL prefix.  When a consumer *does* know what the URL
means (it's a vocabulary it recognizes), it can do something special
(inferring defaults, etc.), but it can do the exact same thing when it
knows what a particular prefix means (which is what consumers do
today).
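
Both reasons can be illustrated with a small sketch. The handler and
the lookup tables are hypothetical; the point is only that recognizing
a vocabulary is the same table lookup whether the key is a URL or a
bare prefix.

```python
# Reason 1: the character saving claimed above.
saved = len("http://foo.com") - len("foo")  # 14 - 3 = 11


def handle_foaf(local):
    """Hypothetical vocabulary-specific processing."""
    return f"known FOAF term: {local}"


# Reason 2: special handling keyed by URL versus keyed by prefix.
HANDLERS_BY_URL = {"http://xmlns.com/foaf/0.1/": handle_foaf}
HANDLERS_BY_PREFIX = {"foaf": handle_foaf}


def process(key, local, handlers):
    """Use a special handler if the key is recognized, else treat it generically."""
    handler = handlers.get(key)
    if handler is None:
        return f"opaque term: {key} {local}"
    return handler(local)


by_url = process("http://xmlns.com/foaf/0.1/", "name", HANDLERS_BY_URL)
by_prefix = process("foaf", "name", HANDLERS_BY_PREFIX)
```

Here by_url and by_prefix are the same result, and an unrecognized key
is equally opaque in either scheme.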

However, if #2 is for whatever reason unacceptable, #1 is the *bare
minimum* that needs to be done for the RDFa spec to document reality,
such that a consumer can follow the spec and reasonably expect to
correctly consume content already on the web.  If this is not done,
the RDFa spec is vastly less useful, and shouldn't be pursued.

~TJ

Received on Tuesday, 6 November 2012 00:17:20 UTC