Regarding @vocab and terms (ISSUE-129)

(Mail originally from Niklas, redirecting it to the mailing list with a new Issue header)

Hi all!

In applying RDFa 1.1 to various use cases lately (working with legal,
educational and library data, and tinkering with examples in the
wild), I have made some observations regarding @vocab and terms that
I think we need to address.


## The effect of @vocab on undefined terms ##

It should be noted that the following:

   <div vocab="http://schema.org/">
       <a rel="nofollow"
          href="http://www.seo-blog.com/rel-nofollow.php">Nofollow in Google, Yahoo and MSN</a>
   </div>

produces:

   <> <http://schema.org/nofollow> <http://www.seo-blog.com/rel-nofollow.php> .

To turn that triple off, you can clear the @vocab with:

   <div vocab="http://schema.org/">
       <a vocab="" rel="nofollow"
          href="http://www.seo-blog.com/rel-nofollow.php">Nofollow in Google, Yahoo and MSN</a>
   </div>

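To make the inheritance explicit, here is a minimal Python sketch (not spec text) of how the in-scope @vocab is found and applied to a term that has no predefined mapping. It models each element only by its @vocab attribute (None when absent) and assumes, as in HTML, that there is no initial default vocabulary, so vocab="" leaves none in effect. The function names are hypothetical.

```python
from typing import Optional, Sequence


def effective_vocab(vocab_attrs: Sequence[Optional[str]]) -> Optional[str]:
    """Return the default vocabulary in scope for the last element.

    vocab_attrs holds each element's @vocab, from the outermost ancestor
    down to the current element; None means the attribute is absent.
    """
    vocab: Optional[str] = None  # assumption: no initial default vocabulary
    for value in vocab_attrs:
        if value is None:
            continue  # no @vocab here: keep the inherited value
        vocab = value or None  # vocab="" clears it, anything else replaces it
    return vocab


def resolve_undefined_term(term: str, vocab: Optional[str]) -> Optional[str]:
    """A term with no predefined mapping is appended to @vocab, or dropped."""
    return vocab + term if vocab is not None else None


SCHEMA = "http://schema.org/"
# <div vocab="http://schema.org/"><a rel="nofollow" ...>
print(resolve_undefined_term("nofollow", effective_vocab([SCHEMA, None])))
# -> http://schema.org/nofollow
# <div vocab="http://schema.org/"><a vocab="" rel="nofollow" ...>
print(resolve_undefined_term("nofollow", effective_vocab([SCHEMA, ""])))
# -> None
```
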
Note also that this happens with e.g. 'stylesheet', which is not a
reserved term by default (nor in RDFa 1.1 for HTML5). So, in
general, one should only use @vocab on <body> or on elements inside
it. If you have more demanding requirements, you should use the same
pattern as above, e.g.:

   <html vocab="http://example.org/ns#">
       <head>
           <link vocab="" rel="stylesheet" href="/style.css" />
           ... lots of link and meta elements relying on the parent @vocab ...
       </head>
   </html>

While I believe that we fully accept the above, I want to make it
abundantly clear. We may also want to explain this in the Primer.

Perhaps some of this should also be noted in RDFa 1.1 Lite, especially
the behaviour of @rel, which is still in effect there even though Lite
does not mention it.

(By the way, is it fully understood that the terms defined in the
XHTML context do *not* apply in XHTML5? We must make it very clear
that the XHTML initial context is for 1.0 and 1.1, and *not* XHTML5.
That is a different host language, and it has the same limited set of
predefined terms as HTML (i.e. only those of the default RDFa initial
context).)


## Predefined terms ##

The following is more troubling.

I wonder whether an HTML author in general will understand that this:

   <div vocab="http://purl.org/dc/terms/">
       <a rel="license" href="/cc-by">CC-BY</a>
   </div>

actually produces:

   <> xhv:license </cc-by> .

and *not*:

   <> dc:license </cc-by> .

To work around that, one *has* to use:

   <a rel="dc:license" href="/cc-by">CC-BY</a>

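A small Python sketch of why the CURIE works where the bare term does not, assuming the dc prefix maps to http://purl.org/dc/terms/ and that the only predefined term in play is HTML5's 'license' (mapped to xhv:license). The function name is hypothetical, and the sketch deliberately ignores absolute IRIs, unknown prefixes and the case-insensitive term match.

```python
from typing import Dict, Optional

XHV = "http://www.w3.org/1999/xhtml/vocab#"
DC = "http://purl.org/dc/terms/"

PREFIXES: Dict[str, str] = {"dc": DC}                 # assumed prefix mapping
TERMS: Dict[str, str] = {"license": XHV + "license"}  # assumed predefined term


def expand_rel_value(value: str, vocab: Optional[str]) -> Optional[str]:
    """Expand one @rel/@property token under the current rules (simplified)."""
    if ":" in value:
        prefix, reference = value.split(":", 1)
        # A CURIE is expanded through its prefix; @vocab plays no part.
        return PREFIXES[prefix] + reference if prefix in PREFIXES else None
    if value in TERMS:
        return TERMS[value]  # a predefined term wins over @vocab
    return vocab + value if vocab is not None else None


print(expand_rel_value("license", DC))     # -> http://www.w3.org/1999/xhtml/vocab#license
print(expand_rel_value("dc:license", DC))  # -> http://purl.org/dc/terms/license
```
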
One very clear indication that this is *not* fully understood can be
found in the RDFa 1.1 Primer itself! The last example in section "2.4
Setting a Default Vocabulary" reads:

   <p vocab="http://creativecommons.org/ns#">All content on this site is licensed under
       <a property="license" href="http://creativecommons.org/licenses/by/3.0/">
       a Creative Commons License</a>.</p>

The clear intent is to produce cc:license, not xhv:license. Either
this example must be changed (along with the expectations that caused
it), or the Core rules for terms must.

From a general design perspective, this effect of predefined terms in
conjunction with @vocab is problematic. It is harder for authors to
remember that some terms (even just three) are *reserved* and are never
resolved against the active @vocab.

More crucially, these terms differ between host languages, at least
between HTML5 and XHTML 1.1. Note that in XHTML 1.1, one cannot use
BIBO exclusively with @vocab, since 'chapter' is a predefined term
there and thus must be written as 'bibo:chapter' if that's the intent.

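One way to see the host-language problem is to intersect the terms an author means to resolve against @vocab with each host language's predefined terms. The sets below are assumptions for illustration: the HTML5 set is the three terms of the default RDFa initial context, while the XHTML 1.1 set is only a partial subset of the XHTML initial context (enough to include 'chapter'). The BIBO term list and the function name are hypothetical.

```python
from typing import Dict, Iterable, List, Set

# Assumed predefined terms per host language (the XHTML 1.1 set is partial).
PREDEFINED: Dict[str, Set[str]] = {
    "HTML5": {"describedby", "license", "role"},
    "XHTML 1.1": {"describedby", "license", "role", "chapter", "stylesheet"},
}


def shadowed_terms(intended: Iterable[str], host_language: str) -> List[str]:
    """Terms meant for @vocab that a predefined term would capture instead."""
    return sorted(set(intended) & PREDEFINED[host_language])


BIBO_SAMPLE = ["chapter", "editor", "pages"]  # hypothetical sample of BIBO terms
print(shadowed_terms(BIBO_SAMPLE, "HTML5"))      # -> []
print(shadowed_terms(BIBO_SAMPLE, "XHTML 1.1"))  # -> ['chapter']
```
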
Perhaps most authors and vocabulary publishers (e.g. Schema.org) are
well aware of and accept this fact. However, if this concerns anybody
besides me, please consider the following suggestion.


## Changing the power of @vocab ##

The term mechanism can be changed so that terms only provide
*defaults*, used if *no* @vocab is active.

This would make @vocab behave predictably, by uniformly capturing all
regular terms. It would still ensure that, by default (if no @vocab is
used), @rel="license" means xhv:license.

In practice, the change would be in section "7.4.3 General Use of
Terms in Attributes". The specific rules today are:

[[[
* Check if the term matches an item in the list of local term
mappings. First compare against the list case-sensitively, and if
there is no match then compare case-insensitively. If there is a
match, use the associated IRI.
* If there is a local default vocabulary the IRI is obtained by
concatenating that value and the term.
* If there is no local default vocabulary, the term has no associated
IRI and must be ignored.
]]]

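Read as code, the current rules amount to something like the following Python sketch. The HTML5 term mappings are my reading of the default RDFa initial context and should be checked against it; the function names are hypothetical, and a vocab of None stands for "no local default vocabulary".

```python
from typing import Dict, Optional

XHV = "http://www.w3.org/1999/xhtml/vocab#"
DC = "http://purl.org/dc/terms/"
# Assumed term mappings of the default RDFa 1.1 initial context (HTML5).
HTML5_TERMS: Dict[str, str] = {
    "describedby": "http://www.w3.org/2007/05/powder-s#describedby",
    "license": XHV + "license",
    "role": XHV + "role",
}


def match_term(term: str, terms: Dict[str, str]) -> Optional[str]:
    """Case-sensitive match first, then case-insensitive."""
    if term in terms:
        return terms[term]
    for key, iri in terms.items():
        if key.lower() == term.lower():
            return iri
    return None


def resolve_term_current(term: str, terms: Dict[str, str],
                         vocab: Optional[str]) -> Optional[str]:
    """Current order: term mappings first, then @vocab, else ignore."""
    iri = match_term(term, terms)
    if iri is not None:
        return iri
    if vocab is not None:
        return vocab + term
    return None  # no mapping and no @vocab: the term is ignored


print(resolve_term_current("license", HTML5_TERMS, DC))   # -> http://www.w3.org/1999/xhtml/vocab#license
print(resolve_term_current("License", HTML5_TERMS, DC))   # -> http://www.w3.org/1999/xhtml/vocab#license
print(resolve_term_current("creator", HTML5_TERMS, DC))   # -> http://purl.org/dc/terms/creator
```
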
I propose to change this to:

[[[
* If there is a local default vocabulary, the IRI is obtained by
concatenating that value and the term.
* Otherwise, check if the term matches an item in the list of local
term mappings. First compare against the list case-sensitively, and if
there is no match then compare case-insensitively. If there is a
match, use the associated IRI.
* Otherwise, the term has no associated IRI and must be ignored.
]]]

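The proposed order only swaps which check comes first. A minimal Python sketch, assuming HTML5's predefined 'license' and 'role' terms map into the xhv namespace; the function names are hypothetical, and a vocab of None stands for "no local default vocabulary".

```python
from typing import Dict, Optional

XHV = "http://www.w3.org/1999/xhtml/vocab#"
DC = "http://purl.org/dc/terms/"
# Assumed predefined terms (subset of the HTML5 initial context).
TERMS: Dict[str, str] = {"license": XHV + "license", "role": XHV + "role"}


def match_term(term: str, terms: Dict[str, str]) -> Optional[str]:
    """Case-sensitive match first, then case-insensitive."""
    if term in terms:
        return terms[term]
    for key, iri in terms.items():
        if key.lower() == term.lower():
            return iri
    return None


def resolve_term_proposed(term: str, terms: Dict[str, str],
                          vocab: Optional[str]) -> Optional[str]:
    """Proposed order: an active @vocab captures every term; the term
    mappings are only a fallback when no @vocab is in effect."""
    if vocab is not None:
        return vocab + term
    return match_term(term, terms)  # None means the term is ignored


print(resolve_term_proposed("license", TERMS, DC))     # -> http://purl.org/dc/terms/license
print(resolve_term_proposed("license", TERMS, None))   # -> http://www.w3.org/1999/xhtml/vocab#license
print(resolve_term_proposed("nofollow", TERMS, None))  # -> None
```

The second call is the backwards-compatible case: without @vocab, @rel="license" still means xhv:license.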
This would of course not affect any deployed RDFa 1.0 (since @vocab
is new in 1.1). It would *only* affect existing RDFa 1.1 markup
where @vocab is used *and* the markup relies on predefined terms
within those regions. Considering the examples given above, I would
say that all bets are off regarding what people in general would
expect from such markup. This suggestion attempts to simplify all
these expectations.


## Reserved terms left behind ##

Now, this does reduce the power of the term mechanism substantially.
It does so for the sake of authors and for simplicity, since it makes
@vocab work uniformly.

*If*, in the future, direct mixing of vocabularies were to become
desirable, the term definition mechanism can very well be extended
without breaking backwards compatibility. This would be done by adding
a means for declaring certain terms as "reserved" (say by typing them
as rdfa:ReservedTermMapping). Those, and only those, would behave as
all terms do right now, i.e. being fully reserved, regardless of
@vocab.

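For illustration only, the hypothetical extension could look like this as a Python sketch. The reserved flag stands in for typing a mapping as rdfa:ReservedTermMapping; marking 'role' as reserved is purely an example, the names are hypothetical, and case-insensitive matching is left out for brevity.

```python
from typing import Dict, NamedTuple, Optional

XHV = "http://www.w3.org/1999/xhtml/vocab#"


class TermMapping(NamedTuple):
    iri: str
    reserved: bool = False  # stands in for rdfa:ReservedTermMapping


def resolve_term_extended(term: str, terms: Dict[str, TermMapping],
                          vocab: Optional[str]) -> Optional[str]:
    mapping = terms.get(term)
    if mapping is not None and mapping.reserved:
        return mapping.iri  # reserved terms win regardless of @vocab
    if vocab is not None:
        return vocab + term  # otherwise an active @vocab captures the term
    return mapping.iri if mapping is not None else None


TERMS = {
    "license": TermMapping(XHV + "license"),
    "role": TermMapping(XHV + "role", reserved=True),  # example only
}
VOCAB = "http://example.org/ns#"
print(resolve_term_extended("license", TERMS, VOCAB))  # -> http://example.org/ns#license
print(resolve_term_extended("role", TERMS, VOCAB))     # -> http://www.w3.org/1999/xhtml/vocab#role
```

With no mapping marked as reserved, this reduces to the proposed rules.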
But I don't think we should do that now. We backed away from the term
mechanism in general, and mostly use it to preserve backwards
compatibility. For mixing vocabularies, we rely on the use of CURIEs,
@vocab and vocabulary expansion. Which is good.


What do you think?

Kind regards,
Niklas

Received on Monday, 13 February 2012 08:27:40 UTC