Re: ISSUE-67 (Core - Henri Sivonen): RDFa Core 1.1 LC comments from Henri Sivonen [LC Comment - RDFa Core 1.1] from Toby Inkster on 2010-12-09 (public-rdfa-wg@w3.org from December 2010)

From: Toby Inkster <tai@g5n.co.uk>
Date: Thu, 9 Dec 2010 00:18:21 +0000
To: RDFa Working Group WG <public-rdfa-wg@w3.org>
Cc: sysbot+tracker@w3.org
Message-ID: <20101209001821.1fe146b2@miranda.g5n.co.uk>
On Tue, 07 Dec 2010 15:25:53 +0000
RDFa Working Group Issue Tracker <sysbot+tracker@w3.org> wrote:

> ISSUE-67 (Core - Henri Sivonen): RDFa Core 1.1 LC comments from Henri
> Sivonen [LC Comment - RDFa Core 1.1]

This does not constitute an official WG reply, but just a few of my
jumbled thoughts...

> To be Compatible with Existing Content, RDFa 1.1 doesn't need to be
> backwards compatible in the sense of parsing the same triples out of
> any valid RDFa 1.0 input as RDFa 1.0. Instead, it needs only to
> produce the right triples for the content that's already out there.
> Thus, Compatibility with Existing Content could be mostly achieved by
> performing by hard-coding the meanings of the common prefixes used in
> deployed content that purports to use RDFa.

While I can see the advantages of this, I do not think this would work.
There are plenty of CURIE prefixes which are customarily used as
abbreviations for different URI prefixes. While many of them might only
occur in very small sets of existing RDFa content, two spring to mind
that seem very common:

  * "dc" which sometimes refers to http://purl.org/dc/elements/1.1/
    and at other times http://purl.org/dc/terms/. (Occasionally also
    the /1.0/ version but that is vanishingly rare.) While both are
    versions of the same vocabulary, the differences between them were
    considered great enough to assign them different URIs, so conflating
    the two could be harmful.

  * "v" is often used to refer to the W3C's vCard vocab, but also
    to Google's Rich Snippets vocab. There is a small overlap between
    these two, but they're substantially different.
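To make the "dc" case concrete, here is a toy sketch in Python, not taken
from any real RDFa processor. The two page mappings and the HARDCODED
table are hypothetical. The point is that any fixed table has to pick one
expansion for "dc", so pages that declared the other one silently get the
wrong predicate.

```python
from typing import Dict, List

# Two hypothetical pages that both use the "dc" prefix but bind it differently.
PAGE_ELEMENTS: Dict[str, str] = {"dc": "http://purl.org/dc/elements/1.1/"}
PAGE_TERMS: Dict[str, str] = {"dc": "http://purl.org/dc/terms/"}

# A hypothetical hard-coded table: it can only give "dc" one meaning.
HARDCODED: Dict[str, str] = {"dc": "http://purl.org/dc/terms/"}


def expand(curie: str, mappings: Dict[str, str]) -> str:
    """Expand prefix:reference by concatenating the mapped URI and the reference."""
    prefix, _, reference = curie.partition(":")
    return mappings[prefix] + reference


def misread(curie: str, pages: List[Dict[str, str]],
            hardcoded: Dict[str, str]) -> List[Dict[str, str]]:
    """Return the pages whose intended expansion differs from the hard-coded one."""
    wanted = expand(curie, hardcoded)
    return [page for page in pages if expand(curie, page) != wanted]
```

Whichever namespace the table picks, the pages that used the other one
end up asserting something their authors never wrote.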

>  * It seems questionable that formsplayer.com (site of a product that
> one of the Editors has a commercial interest in) is used in an
> example.

Agreed. I imagine that this was just copied and pasted from an example
in one of the tutorials or demonstrations Mark has published, but it
doesn't look good in the spec.

>  * The Creative Commons license example in section 2.2 uses the
> anti-pattern of saying "a Creative Commons license" (instead of
> saying which one of the numerous licenses) in the human-readable
> prose.

+1

>  * I reiterate my previous comment that prefix-based indirection
> confuses authors and complicates implementation. Please use absolute
> URLs only instead of CURIEs. I'm not going to elaborate on this
> point, because I realize that the WG isn't going to change this.

I disagree that this is inherently confusing for authors. In other
technologies, indirection (not prefix-based, I grant you) is positively
encouraged.

Authors, for example, are typically advised to use:

	<style type="text/css">
		.important { color: red; }
	</style>
	<span class="important">Red!</span>

Rather than:

	<span style="color:red">Red</span>

The latter is simpler for new authors to grok - they don't have to
learn about the tangled webs of CSS inheritance (which changed BTW
between CSS 1 and 2). But the former does end up as a more flexible and
efficient way of working once you're used to it.

Something I think we should consider is moving the full URI example to
before the CURIE examples in section 2 - right now CURIEs are shown
first. That would make it clearer that CURIEs are just syntactic sugar,
and not meaningful in their own right.

>  * Loading prefix definitions from an external file seems to make
> RDFa brittle in case the external file can't be loaded. Also,
> blocking RDFa processing in order to do IO to fetch the prefix
> definitions complicates implementation.

My initial stance on external profiles was much the same. And I agree
that these are valid concerns. I was swayed by the rest of the WG
though - profiles do seem overall beneficial for RDFa, even if they do
have their downsides.

Performance problems can be offset by performing profile fetches in
parallel with each other where multiples are provided on a single
element, and by using in-memory caches of profiles, and even
hard-coding common ones. In my own RDFa parser I've not noticed
significant performance issues for those profiles I've hard-coded.
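As a rough illustration of the caching and parallel-fetch approach, here
is a Python sketch. It is not taken from any existing parser: ProfileCache,
the injected fetch callable and the example.org profile URIs are all
hypothetical, and the rule that later profiles override earlier ones is an
assumption made for the sketch rather than something taken from the spec.

```python
from concurrent.futures import ThreadPoolExecutor
from threading import Lock
from typing import Callable, Dict, List, Optional

Mapping = Dict[str, str]  # term or prefix -> URI


class ProfileCache:
    """Hypothetical in-memory cache for RDFa profile mappings."""

    def __init__(self, fetch: Callable[[str], Mapping],
                 hardcoded: Optional[Dict[str, Mapping]] = None) -> None:
        self._fetch = fetch  # caller-supplied loader, e.g. HTTP GET plus parsing
        # Pre-seed the cache with hard-coded common profiles so they never hit the network.
        self._cache: Dict[str, Mapping] = dict(hardcoded or {})
        self._lock = Lock()

    def _get_one(self, uri: str) -> Mapping:
        with self._lock:
            cached = self._cache.get(uri)
        if cached is not None:
            return cached
        mapping = self._fetch(uri)  # network IO happens outside the lock
        with self._lock:
            # If two threads raced on the same URI, keep whichever result landed first.
            return self._cache.setdefault(uri, mapping)

    def load_all(self, uris: List[str]) -> Mapping:
        """Resolve every profile listed on one element in parallel, then merge in order."""
        if not uris:
            return {}
        with ThreadPoolExecutor(max_workers=len(uris)) as pool:
            results = list(pool.map(self._get_one, uris))  # map preserves input order
        merged: Mapping = {}
        for mapping in results:
            merged.update(mapping)  # later profiles win (assumption, not spec text)
        return merged
```

Because fetch is injected, a hard-coded profile never touches the network,
and a real implementation could plug in an HTTP client with its own
timeout and failure handling.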

As far as brittleness is concerned, while in theory I'm with you, in
practice the web relies on linking to external data sources all over
the place. If a Flickr page were marked up in RDFa using a profile, we
should probably be more concerned about the brittleness of <img src>
than about the brittleness of the RDFa profile.

It may be useful, though, to add an informative note that people who are
more than averagely concerned about the longevity of their data may wish
to avoid using profiles.

>  * (This is a general RDF problem but...) It seems author-hostile to
> require authors to specify the datatype of e.g. date literals instead
> of making the datatype of a property a characteristic of the property
> in the vocabulary/ontology.

This is a problem to be addressed at a different level of the RDF
stack. Rumour has it that a new RDF Working Group will be formed, so
you could try raising this concern with them. There are already
solutions for this using RIF, but that's a rather heavyweight solution
to a simple problem.

>  * It seems unfortunate to use XML Schema Datatype as an example
> considering how much weird variability XML Schema Datatypes allow.

XSD datatypes are extremely common in practice, though. In much RDF
data, those (and rdf:XMLLiteral) will be the only datatypes you see.

>  * Under 4.1 the statement about whitespace seems to say that authors
> should assume non-conforming processors.

Hmmm... I hadn't noticed this paragraph before. Does anyone have any
specific examples to justify this?

	"However, it may be the case that the architecture in which a
	 processor operates does not make all white space available." 

>  * xmlns:prefix is marked as an optional feature. Please remove the
> feature altogether, because xmlns:prefix parses differently in
> text/html and application/xhtml+xml which are the media types most
> likely to be used to transfer RDFa.

I've not heard of anyone else who has found this feature technically
problematic. (It may be tricky in XSLT, but then again, most things are
tricky in XSLT.)

My RDFa implementation, Ivan's implementation and Damian Steer's (not
sure if he subscribes to this list) all use HTML5 parsers for text/html
content. When I switched to an HTML5 parser, I didn't need to change my
xmlns-handling code from how it had worked under XHTML. I'm not aware
of any problems suffered by Ivan or Damian.
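A minimal sketch of why reading xmlns:prefix by its qualified attribute
name can work the same way for both media types, using Python's standard
library as a stand-in: html.parser is not a conformant HTML5 tree builder,
and xml.dom.minidom stands in for an XHTML parser. The helper names are
hypothetical. One caveat: HTML parsers lowercase attribute names, so a
mixed-case prefix such as xmlns:DC would come out differently in the two
paths; the example sticks to lower-case prefixes.

```python
from html.parser import HTMLParser
from typing import Dict, Iterable, Optional, Tuple
from xml.dom import minidom


def xmlns_prefixes(attrs: Iterable[Tuple[str, Optional[str]]]) -> Dict[str, str]:
    """Pick prefix mappings out of (qualified-name, value) attribute pairs."""
    return {name[len("xmlns:"):]: value for name, value in attrs
            if name.startswith("xmlns:") and value is not None}


class FirstTagAttrs(HTMLParser):
    """Records the attributes of the first start tag seen (the text/html path)."""

    def __init__(self) -> None:
        super().__init__()
        self.attrs: Optional[list] = None

    def handle_starttag(self, tag, attrs):
        if self.attrs is None:
            self.attrs = attrs


def from_html(markup: str) -> Dict[str, str]:
    parser = FirstTagAttrs()
    parser.feed(markup)
    parser.close()
    return xmlns_prefixes(parser.attrs or [])


def from_xhtml(markup: str) -> Dict[str, str]:
    # minidom keeps namespace declarations as attributes named "xmlns:..."
    root = minidom.parseString(markup).documentElement
    return xmlns_prefixes(root.attributes.items())
```

The same xmlns_prefixes function serves both paths, because both parsers
expose the declaration under the qualified name "xmlns:dc".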

Personally I'd categorise any unnecessary differences between HTML and
XHTML with regards to how syntax is parsed into a DOM as a bug in HTML5.

-1 to removing the feature altogether, as we're chartered to maintain
compatibility with RDFa 1.0. However I'd be in favour of deprecating it
in order to leave a path to an xmlns-free version of RDFa open for a
future revision of the spec.

>  * It's weird that the prefix attribute requires a single space
> between the colon following the prefix and the URI but allows
> multiple spaces between the URI and the next prefix.

This probably needs adjusting slightly.

>  * If the spec contains rules for how to extract a set of prefix to
> URI mappings from the prefix attribute, the rules are hard to locate.

We did have a regex for it... why has this not made it into the spec?
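The regex mentioned above isn't reproduced here, so the following is only
a hypothetical Python reconstruction, not the WG's wording. It
approximates NCName with [A-Za-z_][\w.-]* and, unlike the current text,
accepts one or more whitespace characters after the colon, which would
also address the single-space oddity.

```python
import re
from typing import Dict

# Hypothetical lenient pattern: an NCName-like prefix, a colon, then ONE OR MORE
# whitespace characters (not exactly one), then the URI as a run of non-whitespace.
PREFIX_PAIR = re.compile(r"([A-Za-z_][\w.-]*):\s+(\S+)")


def parse_prefix_attribute(value: str) -> Dict[str, str]:
    """Extract prefix -> URI mappings from an RDFa 1.1 @prefix attribute value."""
    mappings: Dict[str, str] = {}
    for prefix, uri in PREFIX_PAIR.findall(value):
        mappings[prefix] = uri  # a repeated prefix: the last declaration wins (assumption)
    return mappings
```

A pair written without any whitespace after the colon
("foaf:http://...") simply doesn't match, so it is ignored rather than
misparsed.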

-- 
Toby A Inkster
<mailto:mail@tobyinkster.co.uk>
<http://tobyinkster.co.uk>
Received on Thursday, 9 December 2010 00:18:47 UTC