[whatwg] Trying to work out the problems solved by RDFa from Benjamin Hawkes-Lewis on 2009-02-04 (public-whatwg-archive@w3.org from February 2009)

From: Benjamin Hawkes-Lewis <bhawkeslewis@googlemail.com>
Date: Wed, 04 Feb 2009 08:13:05 +0000
Message-ID: <49894E11.6080304@googlemail.com>
On 4/2/09 03:15, Calogero Alex Baldacchino wrote:
> For what concerns XHTML, I disagree with the introduction of RDFa
> attribute into the basic namespace, and I wouldn't encourage the same in
> HTML5 spec. In first place, I think there is a possible conflict with
> respect to the "content" attribute semantics, because it now requires a
> different processing when used as an RDFa attribute and as a <meta>
> attribute associated to an "http-equiv" or a "name" value (for instance).

What conflict?

1. Attributes in XHTML can be distinguished by the elements they apply 
to as well as their name (e.g. the "name" attribute).

2. In XHTML+RDFa, "content" actually means the same thing on "meta" as 
on any other element in XHTML, which is presumably why they reused that 
attribute rather than introducing a new (better-named?) one:

http://www.w3.org/TR/rdfa-syntax/#rdfa-attributes
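
To illustrate point 2, here is a minimal Python sketch (not a conforming
RDFa processor) that reads "property"/"content" pairs the same way on
"meta" as on any other element. The sample document, the name "Jane Doe",
and the helper property_content_pairs are assumptions made up for this
illustration. CURIE resolution, "about", "datatype", and literals taken
from element text are deliberately ignored.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for illustration only.
DOC = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/elements/1.1/">
  <head>
    <meta property="dc:creator" content="Jane Doe"/>
  </head>
  <body>
    <span property="dc:date" content="2009-02-04">4 February 2009</span>
  </body>
</html>"""


def property_content_pairs(xml_text):
    """Collect (element, property, content) for every element carrying both.

    No element is special-cased: "meta" and "span" take the same code path.
    """
    root = ET.fromstring(xml_text)
    pairs = []
    for el in root.iter():
        prop = el.get("property")
        content = el.get("content")
        if prop is not None and content is not None:
            # Strip the "{namespace}" prefix ElementTree puts on tag names.
            local_name = el.tag.split("}", 1)[-1]
            pairs.append((local_name, prop, content))
    return pairs


pairs = property_content_pairs(DOC)
for local_name, prop, content in pairs:
    print(local_name, prop, content)
```

The loop never branches on the element name, which is the sense in which
"content" carries one meaning everywhere in this sketch.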

> In second place, it might be confusing for authors and lead to the
> misconception that every xhtml 1.x processor is also capable to process
> rdfa metadata (this is a limit of namespace + dtd/schema based
> modularization, because one can define the structure of a document, but
> not "orthogonal" behaviours requiring a specific support, not covered by
> the basic document model - such as collecting rdf triples declared by
> rdfa attributes, or calling a plugin and embedding its output - however,
> defining a proper namespace, maybe including its creation date somehow,
> may suggest what to expect from UAs).

There's no way to query a user agent about support for the 
specifications associated with a particular namespace, and namespaces 
are an unreliable guide to what user agents actually support, so I don't 
buy this concern.

Existing XHTML 1.x user agents don't always implement all the features 
of XHTML 1.x (e.g. exposing "longdesc" and "cite" to the user). HTML5 is 
introducing new elements and attributes into the same namespace, and 
authors would be wrong to assume that any XHTML-supporting browser will 
know what to do with them beyond inserting them into the DOM. XHTML 
modularization means you can't count on an XHTML user agent to implement 
any particular feature in the XHTML namespace.

A more reliable guide to what user agents support is looking at the list 
of supported features (as opposed to namespaces or modules or any other 
proxy) in their documentation.

> In third place, creating a different namespace would have resulted in a
> far easier introduction of RDFa attributes into other xml languages
> without having to change the language to host them (by the way, the
> xhtml namespace and a related prefix can be used, but this require a
> more specific support due to the "content" attribute issue, especially
> by UAs not supporting DTDs or schemata - that is, what should happen if
> an element were declared with both xhtml:name or xhtml:http-equiv,
> xhtml:content and xhtml:datatype, in an xml document accepting any
> attributes from external namespaces?

I cannot understand how RDFa attributes in a different namespace would 
be easier to reuse, either in another language or in an XML document 
where the host is not XHTML.

"content" and "datatype" mean the same on all elements, so your 
particular example seems like a non-problem to me - at least from the 
perspective of RDFa, which doesn't define processing for "name" or 
"http-equiv".

In so far as there is a problem, it's already a problem with 
bog-standard XHTML. How should <myml:bar xhtml:name="foo" 
xhtml:http-equiv="baz" xhtml:content="quux"> be processed?
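
As a rough sketch of what a generic XML parser hands to an application
for that element, the following Python uses xml.etree.ElementTree. The
"myml" namespace URI is a hypothetical placeholder. The parser only
reports namespace-qualified attribute names and values; it assigns no
processing to them, which is why the question above is left to the host
language.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
MYML_NS = "http://example.org/myml"  # hypothetical namespace for "myml"

SNIPPET = (
    '<myml:bar xmlns:myml="%s" xmlns:xhtml="%s" '
    'xhtml:name="foo" xhtml:http-equiv="baz" xhtml:content="quux"/>'
) % (MYML_NS, XHTML_NS)

element = ET.fromstring(SNIPPET)

# ElementTree exposes qualified names as "{namespace-uri}local-name".
xhtml_attrs = {
    key.split("}", 1)[1]: value
    for key, value in element.attrib.items()
    if key.startswith("{%s}" % XHTML_NS)
}

print(element.tag)
print(xhtml_attrs)
```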

> of course, this is solvable, but
> rdfa:content, rdfa:datatype and so on would make things easier, or at
> least _cleaner_ and less confusing for authors having to understand that
> an XML and RDF processor can/must support the xhtml namespace and its
> _whole_ semantics, not just dom-related structures, but limited to RDFa
> attributes, so that no <meta> or <object> or <link> can be used hoping
> their semantics is supported, despite the support for the xhtml
> namespace...).

An "XML and RDF processor" doesn't have to support XHTML or RDFa - XML 
and RDF are independent specifications.

A conforming XHTML+RDFa "user agent MUST support all of the features 
required in this specification. A conforming user agent must also 
support the User Agent conformance requirements as defined in XHTML 
Modularization [XHTMLMOD] section on 'XHTML Family User Agent 
Conformance'."

http://www.w3.org/TR/rdfa-syntax/#uaconf

Those further requirements can be read at:

http://www.w3.org/TR/xhtml-modularization/conformance.html#s_conform_user_agent

An XHTML+RDFa conforming user agent does not have to implement "meta", 
"object", or "link", and as I explained above, authors cannot assume 
support for particular features based on namespaces.

> Also there might have been fewer attributes, each one
> with a different semantic (assuming someone might not find useful to
> have a link with rel="stylesheet" representing a triple, for instance).

I don't follow. link with rel="stylesheet" _does_ represent information 
expressible as a triple, why would it be useful to pretend otherwise? 
And how would doing so make for fewer attributes?
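
A minimal Python sketch of that reading, under stated assumptions: the
document URL http://example.org/page.html and the helper link_triples
are made up for illustration, and every "rel" token is naively mapped
into the XHTML vocabulary namespace, which is a simplification of what
an RDFa processor does with reserved values and CURIEs.

```python
from urllib.parse import urljoin
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
XHTML_VOCAB = "http://www.w3.org/1999/xhtml/vocab#"

# Hypothetical sample document, invented for illustration only.
DOC = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <link rel="stylesheet" href="style.css"/>
  </head>
  <body/>
</html>"""


def link_triples(xml_text, base_url):
    """Return (subject, predicate, object) tuples for each link element.

    Subject: the document itself. Predicate: the rel token, naively
    placed in the XHTML vocabulary. Object: the resolved href.
    """
    root = ET.fromstring(xml_text)
    triples = []
    for el in root.iter("{%s}link" % XHTML_NS):
        href = el.get("href")
        if href is None:
            continue
        for rel in el.get("rel", "").split():
            triples.append(
                (base_url, XHTML_VOCAB + rel.lower(), urljoin(base_url, href))
            )
    return triples


triples = link_triples(DOC, "http://example.org/page.html")
for triple in triples:
    print(triple)
```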

> If there were a general agreement, a new element/attribute would be
> introduced as a result of a "bottom up" process (starting from
> experimentations) integrated with a "top down" community evaluation -
> for specific purposes, not generic machine exposure, I mean.

There is no general agreement to that AFAICT, and I doubt that using 
unstandardized elements or attributes or using data-* for public use 
would be good approaches to extending HTML: the former blocks potential 
extension points (e.g. "canvas") and the latter pointlessly introduces 
the risk that a private use might be confused with a public one.

> (I'm not sure a generic machine data attribute - in general, not just
> referring to rdfa - would solve that, because each new occurrence of the
> problem might require a "brand new" datatype that only newer, updated
> UAs would understand (older ones would just parse the attribute and
> provide it as a string for further elaboration by a script, at most, but
> this might not be much better than using a data-* attribute for private
> script consumption), therefore, that wouldn't be necessarily different
> than creating a new appropriate attribute/element as needed and
> providing such new feature in newer, compliant UAs).

It would be very different in practice, because (like new "class" 
names), new "content" values wouldn't need to go through the W3C/WHATWG 
standards process.

That has a cost of course. You might end up with a worse design, 
especially if you don't go through a community like microformats. But 
that cost arguably isn't so bad when you're talking about embedding 
arbitrary data rather than features like "canvas" or "datagrid" that 
require new parsing, DOM APIs, and user interface from popular user 
agents. This cost appears to be acceptable in the case of microformat 
"class" names, for example. Now, you could already embed data with a bad 
design using HTML5's other extension mechanisms (e.g. "script"). It's 
just that microformats choose to abuse other attributes ("title") 
instead, partly because they allow you to wrap some human-readable 
content with its machine-readable equivalent (i.e. it's a more 
"markup-like" way of doing things). My feeling is that the cost of bad 
designs for embedded data is (1) unavoidable and (2) less than the 
benefits of avoiding misuse of other (X)HTML features for embedding data.

--
Benjamin Hawkes-Lewis
Received on Wednesday, 4 February 2009 00:13:05 UTC