Re: Short briefing/background doc't regarding RDFa, prefixes and HTML from Nathan on 2011-02-03 (www-tag@w3.org from February 2011)

From: Nathan <nathan@webr3.org>
Date: Thu, 03 Feb 2011 12:02:38 +0000
To: "Henry S. Thompson" <ht@inf.ed.ac.uk>
CC: www-tag@w3.org
Message-ID: <4D4A995E.8060906@webr3.org>
Hi Henry,

/Personal/ comments are indented; note that I'm not speaking on behalf
of the RDFa WG here.

Henry S. Thompson wrote:
> I've prepared a short introduction [1] to this issue in preparation
> for possible discussion at the upcoming TAG f2f.  Comments welcome.
> 
> [1] http://www.w3.org/2001/tag/doc/RDFa_HTML_prefix_issue.html


RDFa uses CURIEs extensively as the values of attributes. The 
non-empty prefixes in those CURIEs are interpreted relative to

   You've missed @profile, which allows prefix and term definitions to
   be imported from an external profile (read: hosted on the web
   somewhere). There is also a default RDFa processor profile (meaning
   some prefixes are always supported and mapped to a specific URI).
   It also means that if a specified profile can't be dereferenced and
   read correctly (yes, at parse / process time), then the element on
   which the @profile is declared, and all its child elements, are
   skipped (no triples will be extracted from these elements).
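
   As a rough sketch of that skipping behaviour (not the RDFa
   processing rules themselves): the element tree is modelled as
   nested dicts, and PROFILES, load_profile and process are
   hypothetical names standing in for fetching and reading a profile
   from the web.

```python
# Hypothetical profile store standing in for documents fetched from the web.
PROFILES = {
    "http://example.org/profile": {"foaf": "http://xmlns.com/foaf/0.1/"},
}


def load_profile(uri):
    """Return the prefix mappings for a profile, or None if it cannot be read."""
    return PROFILES.get(uri)


def process(element, prefixes, out):
    """Walk a toy element tree, collecting expanded property URIs into `out`."""
    profile_uri = element.get("profile")
    if profile_uri is not None:
        mappings = load_profile(profile_uri)
        if mappings is None:
            # Unreadable profile: skip this element and all of its children.
            return
        prefixes = {**prefixes, **mappings}
    prop = element.get("property")
    if prop is not None:
        prefix, sep, reference = prop.partition(":")
        if sep and prefix in prefixes:
            out.append(prefixes[prefix] + reference)
    for child in element.get("children", []):
        process(child, prefixes, out)


good = {"profile": "http://example.org/profile",
        "children": [{"property": "foaf:name"}]}
bad = {"profile": "http://example.org/missing",
       "children": [{"property": "foaf:name"}]}

found_good, found_bad = [], []
process(good, {}, found_good)  # profile is readable, so foaf:name expands
process(bad, {}, found_bad)    # profile is unreadable, so the subtree is skipped
```

   The same markup yields a property URI in one case and nothing at
   all in the other, depending only on whether the profile could be
   read at processing time.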


Neither proposal contains any concrete evidence about the utility, or 
lack of it, of prefixes for authors, or the importance, or lack of it, 
of that utility to authors and consequently to uptake.

   Very roughly, RDFa is at the convergence point between XML/XHTML,
   RDF and HTML. In both the RDF and HTML worlds, authors find it very
   beneficial to be able to write simple string tokens like "foaf:name"
   or "title" rather than full URIs (which people often get wrong
   and which increase the bandwidth required to author, send and
   receive RDF(a)).

   HTML typically uses well-defined simple tokens in @rel like "author"
   and "stylesheet" which have universal meaning (in HTML and Web
   Linking at least), and both authors and consumers treat these as
   simple tokens, not paired to URIs.

   RDF uses web names (URIs) for properties / relations, so that they
   can be looked up (via dereferencing) by humans / machines to get
   a description of the property. This allows new vocabularies and
   properties to be created at any time, and for the range of things
   that can be described with RDF to be unbounded in a webized and
   extensible manner.

   RDF authors practically need to be able to write "foaf:name" in
   serializations and have it resolve to a full URI when being
   processed.

   Due to RDF's XML heritage, QNames and the XML prefix approach were
   adopted; likewise for RDFa+XHTML, which has heritage in both RDF
   and XHTML, both of which take the same prefix approach.

   What authors require is for a string token "foaf:name" to be
   paired correctly with a distinct URI.

   The prefix-based method does this pairing by splitting "foaf:name"
   on the colon, resolving "foaf" to a string, and then concatenating
   "name" to that string.

   In other words, they don't get "foaf:name" paired to a URI, they
   get "foaf" paired to another string. This is the indirection being
   referred to by Hickson (I believe).

   The drawbacks of this approach are:
     - non-existent properties can easily be referenced:
       "foaf:foobarbaz" will still expand to a URI
     - missing prefix/xmlns declarations result in no URI being
       created. (example: people copy and paste RDFa snippets, missing
       the xmlns or prefix declaration, and consequently the copied
       RDFa doesn't produce RDF triples, or worse, incorrect triples.)
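
   The prefix mechanism and both drawbacks can be sketched roughly as
   follows. This is a minimal illustration, not a conforming CURIE
   processor; expand_curie is a hypothetical helper, and it models
   only the "no URI is created" outcome for a missing declaration.

```python
def expand_curie(curie, prefixes):
    """Split on the first colon, look up the prefix, append the rest."""
    prefix, sep, reference = curie.partition(":")
    if not sep or prefix not in prefixes:
        return None  # no declaration in scope, so no URI is produced
    return prefixes[prefix] + reference


declared = {"foaf": "http://xmlns.com/foaf/0.1/"}

# The intended case: "foaf" is paired to a string, then "name" is appended.
name_uri = expand_curie("foaf:name", declared)

# Drawback 1: a property that does not exist still expands to a URI.
bogus_uri = expand_curie("foaf:foobarbaz", declared)

# Drawback 2: a pasted snippet without its prefix declaration yields nothing.
pasted_uri = expand_curie("foaf:name", {})
```

   Nothing in the expansion step ever looks at the vocabulary itself,
   which is why the bogus property goes unnoticed.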

   Additionally, there could be an issue in that XML namespaced names
   are not URIs; they are (namespace, token) pairs, and no
   specification defines the concatenation of namespace and token in
   order to produce a single string URI. I certainly can't find one,
   anyway. Harry Halpin wrote this up well here:
     http://xml.coverpages.org/HHalpinXMLVS-Extreme.html

   The requirements for RDFa authors are:
     - to be able to use "namespaced" terms: there are too many
       vocabularies to be able to refer to a property as simply
       "name", and authors need to be able to say "foaf:name"
       or "bar:name" in their RDFa.
     - to be able to write "foaf:name" and have it paired to a URI.

   The requirements for HTML consumers are:
     - to be able to treat all properties as simple string tokens.

   The requirements for RDF(a) consumers are:
     - to be able to pair tokens back up to the correct URIs.
     - to reference those URIs by their own preferred string tokens.

   Related RDFa 1.1 features

   @vocab and terms
   RDFa 1.1 introduces a new method, whereby a URI can be defined on an
   element, and simple terms (not containing a colon) can be used as
   properties:
     <p vocab="http://example.org/foo#">
       <a href="baz.html" rel="bar">something</a>
     </p>
   The full URI of the property is produced by concatenating the term
   (in @rel "bar") to the vocab string:
      http://example.org/foo#bar
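
   That concatenation can be sketched as below; resolve_term is a
   hypothetical helper covering only colon-free terms with a @vocab in
   scope, not the full RDFa 1.1 term resolution rules.

```python
def resolve_term(term, vocab):
    """Resolve a colon-free term against the vocab URI in scope."""
    if vocab is None or ":" in term:
        return None  # not a simple term, or no @vocab in scope
    return vocab + term


# Mirrors the markup above: vocab="http://example.org/foo#" and rel="bar".
bar_uri = resolve_term("bar", "http://example.org/foo#")
```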

   This addresses all issues other than:
    1- the case where you have two properties which are both "name"
       ("foaf:name" and "foo:name")
    2- the case where authors want to "mash" a set of vocabularies
       together to have their own suite of terms (profiles allow this)

   Personally (stressing personally):
   As someone who understands the space quite well, I believe
   Hickson's concerns are valid, and that neither change proposal
   properly addresses the issues and requirements of both parties.
   Hickson's proposal sticks with how things are in HTML.
   Inkster's proposal sticks with how things are in RDF. Neither
   proposal merges the RDF and HTML worlds suitably / perfectly.
   The introduction of profiles (in the current form) only adds more
   issues.
   The introduction of @vocab and terms partially addresses the issues.
   Microdata caters for some of the HTML needs.
   RDFa caters for most of the RDF needs.
   Neither is an optimal solution for the combined RDF+HTML+Metadata
   space.

   I do not have a change proposal at this time (and would be wary of
   introducing a third option into the mix that conflicts with the
   stances of both the HTML and RDFa WGs), but feel that the
   combination of "vocab" and "profile" should be considered to address
   issue 2, and that CURIEs/xmlns/prefixes could be abandoned in favour
   of terms which could include colons, to address issue 1 and the
   long-standing issues of CURIEs, xmlns and prefix-based indirection.
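
   One possible reading of that last suggestion, purely as a sketch:
   each whole token, colon included, is paired directly to a URI, with
   no splitting and no concatenation. TERMS and resolve_token are
   hypothetical names, and the "foo:name" URI is made up for the
   example.

```python
# Hypothetical term table: each whole token maps directly to one URI.
TERMS = {
    "foaf:name": "http://xmlns.com/foaf/0.1/name",
    "foo:name": "http://example.org/foo#name",
}


def resolve_token(token, terms):
    """Look the whole token up; it is never split on the colon."""
    return terms.get(token)


foaf_name = resolve_token("foaf:name", TERMS)
foo_name = resolve_token("foo:name", TERMS)

# An undefined token is simply not resolved, rather than expanded blindly.
unknown = resolve_token("foaf:foobarbaz", TERMS)
```

   Under this reading the two "name" properties stay distinct, and a
   token that was never defined does not silently turn into a URI.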

Hope that helps a bit,

Best,

Nathan
Received on Thursday, 3 February 2011 12:03:43 UTC