Re: RDFa in HTML issues wiki page created

Hi Julian,

> So, as the lists of reserved terms are different for HTML4, XHTML+RDF, and
> HTML5, is RDFa extraction supposed to return different results? Is it, for
> that matter, supposed to check media type and doc type of the containing
> element before generating triples for non-CURIE values?

The 'architecture' is that the RDFa parser 'receives' values from the
CURIE-processing step. So if you want to create a generic RDFa parser,
the 'pluggable' language-specific part would go into the
CURIE-processing code.

(And all that involves is loading a list of predefined tokens at the
beginning of processing -- see below.)
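To make that split concrete, here is a minimal Python sketch, assuming a
hypothetical CurieProcessor class. The generic part never changes; the
only host-language-specific step is choosing which list of predefined
tokens to load at start-up. The token lists are illustrative (a few of
the XHTML reserved values, mapped into the XHTML vocabulary namespace),
not a complete or normative list.

```python
from typing import Dict, Optional

# Illustrative only: XHTML reserved values resolve into this vocabulary.
XHV = "http://www.w3.org/1999/xhtml/vocab#"

# Hypothetical per-host-language token lists, chosen at start-up.
HOST_LANGUAGE_TOKENS: Dict[str, Dict[str, str]] = {
    "xhtml+rdfa": {t: XHV + t for t in ("next", "prev", "license", "stylesheet")},
    "generic-xml": {},
}


class CurieProcessor:
    """Turns attribute values into IRIs; the RDFa parser only sees the results."""

    def __init__(self, predefined: Dict[str, str]) -> None:
        self.tokens = dict(predefined)      # language-specific predefined tokens
        self.prefixes: Dict[str, str] = {}  # e.g. populated from xmlns:foaf="..."

    def resolve(self, value: str) -> Optional[str]:
        # Predefined tokens are checked first, before normal CURIE processing.
        if value in self.tokens:
            return self.tokens[value]
        prefix, sep, reference = value.partition(":")
        if sep and prefix in self.prefixes:
            return self.prefixes[prefix] + reference
        return None  # neither a token nor a CURIE with a known prefix


def make_processor(host_language: str) -> CurieProcessor:
    # The only pluggable, language-specific step: which token list to load.
    return CurieProcessor(HOST_LANGUAGE_TOKENS[host_language])


p = make_processor("xhtml+rdfa")
p.prefixes["foaf"] = "http://xmlns.com/foaf/0.1/"
print(p.resolve("next"))       # http://www.w3.org/1999/xhtml/vocab#next
print(p.resolve("foaf:name"))  # http://xmlns.com/foaf/0.1/name
print(make_processor("generic-xml").resolve("next"))  # None
```

With this arrangement the same value ('next') can resolve in one host
language and not in another, purely because a different token list
was loaded; nothing in the generic parser has to check media types.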


>> Also, although the RDFa spec doesn't currently use the CURIE spec,
>> they are in sync, and you'll see that the latter also specifically
>> allows for predefined tokens to be processed first, before doing
>> normal CURIE-processing.
>
> ...which makes it impossible to process IRIs as-is (without workarounds), as
> they use the colon character.

I think 'workaround' is the wrong term.

As with anything, there is often ambiguity without context.

Is the following a URL?

  http://www.google.com

Well, yes, it is here:

  <http://www.google.com> a foaf:Document .

but no, it isn't here:

  <> foaf:name "http://www.google.com" .

Is the use of quotation marks and angle brackets a 'workaround'? Of
course not; they provide the context needed to disambiguate.

There is no universal rule that reliably distinguishes CURIEs from
URIs, but in context you can tell. You could do that at the level of
the attribute -- you can say that 'this attribute only takes CURIEs,
and no URIs', or 'this attribute takes both, but CURIEs must have
square brackets around them' -- but you could equally come up with
some other technique that suits the language using CURIEs.
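A minimal Python sketch of attribute-level disambiguation, assuming two
hypothetical per-attribute policies: one attribute accepts only CURIEs,
the other accepts both but requires square brackets around CURIEs (the
convention RDFa 1.0 calls a 'safe CURIE'). The policy names and helper
functions are illustrative, not taken from any specification.

```python
from enum import Enum
from typing import Dict, Optional


class AttrPolicy(Enum):
    CURIE_ONLY = "curie-only"          # hypothetical: attribute takes CURIEs only
    URI_OR_SAFE_CURIE = "uri-or-safe"  # hypothetical: both, CURIEs in [brackets]


def expand_curie(value: str, prefixes: Dict[str, str]) -> Optional[str]:
    prefix, sep, reference = value.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + reference
    return None


def interpret(value: str, policy: AttrPolicy, prefixes: Dict[str, str]) -> Optional[str]:
    """Return an IRI for an attribute value, using the attribute's policy as context."""
    if policy is AttrPolicy.CURIE_ONLY:
        # Every value is read as a CURIE, so 'http://...' only expands
        # if 'http' happens to be a declared prefix.
        return expand_curie(value, prefixes)
    # Both allowed: square brackets mark a CURIE; anything else is a URI as-is.
    if value.startswith("[") and value.endswith("]"):
        return expand_curie(value[1:-1], prefixes)
    return value
```

The string 'http://www.google.com' is never ambiguous here: the policy
attached to the attribute decides how it is read, just as the quotes and
angle brackets do in the Turtle examples.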


> Allowing both short names and longer names based on URIs is indeed a good
> compromise, which I support. But I'm not convinced that per-language
> registries for short names are a good idea.

But that isn't the end of the story -- that's just the foundation.

If you look at it from a processor point of view, it might be a little
easier to see.

Imagine an RDFa processor starts up, and the first thing it does is to
load a CURIE processor, which it then initialises with a set of
predefined tokens. In HTML-based languages that list might be 'next',
'prev' and so on.

Now imagine that, during the course of processing, some other list of
tokens is added. These might come from @profile, or from some other
extension mechanism yet to be invented; tokens loaded in this way
would override the language-specific tokens.

In this way we've allowed the host language to define a few defaults
that are useful in its domain, but we've also left open the
possibility of providing a mechanism for adding more tokens.
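The layering described above maps naturally onto a stack of lookup
tables. A minimal Python sketch, assuming a hypothetical TokenScope
class and using collections.ChainMap so that the most recently added
layer is searched first; the override IRI in the test is a placeholder.

```python
from collections import ChainMap
from typing import Dict, Optional

XHV = "http://www.w3.org/1999/xhtml/vocab#"


class TokenScope:
    """Host-language defaults at the bottom; later extension layers on top."""

    def __init__(self, host_defaults: Dict[str, str]) -> None:
        self._layers = ChainMap(dict(host_defaults))

    def add_tokens(self, extra: Dict[str, str]) -> None:
        # Hypothetical hook for @profile or a future mechanism: the new layer
        # is searched first, so its tokens override the language-specific ones.
        self._layers = self._layers.new_child(dict(extra))

    def lookup(self, token: str) -> Optional[str]:
        return self._layers.get(token)


scope = TokenScope({"next": XHV + "next", "prev": XHV + "prev"})
scope.add_tokens({"fn": "http://xmlns.com/foaf/0.1/name"})
print(scope.lookup("fn"))    # http://xmlns.com/foaf/0.1/name
print(scope.lookup("prev"))  # http://www.w3.org/1999/xhtml/vocab#prev
```

Tokens that no later layer redefines fall through to the host-language
defaults, so backwards compatibility is kept without freezing the list.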

The domain-specific tokens are important, because through them we
maintain backwards compatibility with @rel/@rev in HTML/XHTML. But by
making them part of the more generic mechanism, we haven't stifled
the possibilities for other solutions.


> Any new syntax will have to compete for followers with existing systems like
> RDFa, DC-HTML, RDFa, or (gasp) "microdata". So I personally think it makes
> more sense to get RDFa specified for HTML the way it is (using xmlns-based
> prefixes).

I'm not talking about a new syntax for defining a prefix, but about
how to provide additional _tokens_. So it's not really about
'competing'.

I want to be able to do something along these lines:

  <html
   token="
    Person=http://xmlns.com/foaf/0.1/Person
    title=http://xmlns.com/foaf/0.1/title
    fn=http://xmlns.com/foaf/0.1/name
   "
  >
    <head>
      <title>Ivan's homepage</title>
    </head>
    <body>
      <div about="http://www.ivan-herman.net/me"
       typeof="Person"
      >
        <span property="title">Dr</span>
        <span property="fn">Ivan Herman</span>
      </div>
    </body>
  </html>

Ideally, of course, the tokens would be in an external file, which
would mean that interest groups could create and share tokens without
needing the central registry that is generally discussed.
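Whether the definitions sit inline in the hypothetical @token attribute
or in a shared external file, a processor would need to read the same
'name=IRI' format. A minimal Python sketch, assuming whitespace-separated
pairs as in the example above, and showing the triples that example
would be expected to produce; parse_token_attribute is a hypothetical
helper, not part of any RDFa specification.

```python
from typing import Dict

FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def parse_token_attribute(value: str) -> Dict[str, str]:
    """Parse whitespace-separated name=IRI pairs (hypothetical @token format)."""
    tokens: Dict[str, str] = {}
    for pair in value.split():
        name, sep, iri = pair.partition("=")
        if not sep or not name or not iri:
            raise ValueError(f"malformed token definition: {pair!r}")
        tokens[name] = iri
    return tokens


TOKEN_ATTR = """
    Person=http://xmlns.com/foaf/0.1/Person
    title=http://xmlns.com/foaf/0.1/title
    fn=http://xmlns.com/foaf/0.1/name
"""

tokens = parse_token_attribute(TOKEN_ATTR)
subject = "http://www.ivan-herman.net/me"

# Triples the <div> in the example would be expected to yield.
triples = [
    (subject, RDF_TYPE, tokens["Person"]),
    (subject, tokens["title"], "Dr"),
    (subject, tokens["fn"], "Ivan Herman"),
]
for s, p, o in triples:
    print(s, p, o)
```

Because the format is the same either way, moving the definitions out
into a shared file would only change where the string comes from.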

Regards,

Mark

-- 
Mark Birbeck, webBackplane

mark.birbeck@webBackplane.com

http://webBackplane.com/mark-birbeck

webBackplane is a trading name of Backplane Ltd. (company number
05972288, registered office: 2nd Floor, 69/85 Tabernacle Street,
London, EC2A 4RR)

Received on Tuesday, 26 May 2009 12:17:13 UTC