Re: HTML+RDFa (3.1 Document Conformance) from Dr. Olaf Hoffmann on 2009-09-17 (public-rdf-in-xhtml-tf@w3.org from September 2009)

From: Dr. Olaf Hoffmann <Dr.O.Hoffmann@gmx.de>
Date: Thu, 17 Sep 2009 12:30:20 +0200
To: public-rdf-in-xhtml-tf@w3.org
Message-Id: <200909171230.20728.Dr.O.Hoffmann@gmx.de>
Manu Sporny:

>Dr. Olaf Hoffmann wrote:
>> I think, it would be more consistent to use
>> something like 'HTML5+RDFa1.0' as a value
>> for the HTML5 superset to avoid confusion, because
>> the semantics, content models and element and attribute
>> collection of HTML5 are different (and partly incompatible) from
>> XHTML1.1 and none of them is a superset of the other,
>> therefore another version indication for the HTML5 variant
>> seems to be essential to be able to distinguish.
>
>Hi Olaf,
>
>This is one of the items that will be covered on one of the upcoming
>RDFa telecons, so the language in that spec is preliminary and hasn't
>been discussed in detail.
>
>I was attempting to find a balance between backwards compatibility and
>semantic accuracy when writing that section.
>
>My reading is that the WHAT WG and HTML WG do not want to be able to
>distinguish between HTML4 and HTML5, as HTML5 is meant to be a
>backwards-compatible, natural progression to HTML4.

I basically agree with this observation, but as far as I understand,
they care only about HTML4 and XHTML1.0, not about XHTML1.1.
And obviously they have not managed to produce drafts that are
semantically backwards compatible, because the meanings and content
models of several elements have changed in the current drafts.
In some cases there may be good reasons for this, and combined with
the new elements and attributes it is in the end a big step forward.
However, it is surely not compatible with HTML4 or XHTML1.x on a
semantic and structural level.

Personally I think this is a major problem for careful authors:
because it is not known how to indicate the (X)HTML version for
'HTML5', they cannot use it at all.
For HTML4, XHTML1.0, XHTML1.1 or XHTML+RDFa 1.0 they can
indicate what they use (partly with strings and constructions they
do not really understand, but with a clear relation to some
specification), and therefore these are usable in practice.

With a version indication for HTML5+RDFa1.0 this becomes usable
too. This matters even more for the RDFa variant, because authors
who use it are more likely to care that there is a well defined
relation to a specification that says what the elements mean.
They can use it in a similar way as they use RDF(a) to indicate
a relation to other specifications, to get a well defined
meaning for their constructions.

Semantic and author issues are not very relevant for the current
HTML5 drafts at all, therefore a version indication seems
irrelevant to many people for the HTML5-tag-soup variant.
But authors using RDFa at all can be expected to care about 
semantics and well defined relations. Therefore such a relation 
indication becomes much more relevant for the HTML5+RDFa variant 
than for the HTML5-tag-soup variant.


>Several people have 
>asserted that HTML5 shouldn't be versioned, so if we were to put "HTML5"
>in the @version attribute, there would probably be push-back.

Well, authors who do not care about a defined relation to the meaning
of elements can simply not use the version attribute. This will happen
anyway, regardless of whether some RDFa appears within
the document by copy and paste techniques or intentionally ;o)
Authors who use RDF(a) because they understand the mechanism
may want to care about the version they use, and can do this with
this extension.
If this RDFa draft is designed to be an extension to the HTML5 draft
concerning semantics and readability for simple programs, it looks
like progress if a simple program is also enabled to
identify the HTML version used in the document.


>
>The first draft of the HTML+RDFa spec actually specified only one
>version: "XHTML+RDFa 1.0", which seemed inaccurate when the version
>attribute was specified in a non-XML mode document (HTML5 vs. XHTML5).
>

It is even more inaccurate, because this string is already reserved to
indicate the relation to XHTML1.1. Maybe a better string for this
variant would have been 'XHTML1.1 + RDFa 1.0', but it is now a little
late to change that.
As far as I understand the XHTML+RDFa 1.0 recommendation,
version="XHTML+RDFa 1.0" simply identifies XHTML1.1 + RDFa 1.0.
Currently a program can, for example, use it to validate the document
and to report errors and so on. Everything in (X)HTML5 that does not
fit XHTML1.1 + RDFa 1.0 would have to be reported as an error, for
example new elements introduced in HTML5. This can only be avoided
with another string for (X)HTML5+RDFa.
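
To illustrate the validation problem, here is a rough Python sketch.
It is only an assumption of how a simple checker might work: the
element sets are tiny samples, not the real content models, and the
names PROFILES and unexpected_elements are invented for this example.
The string 'HTML5+RDFa1.0' is the value suggested above, not something
any draft defines.

```python
# Tiny illustrative samples only, not the complete element collections.
XHTML11_SAMPLE = {"html", "head", "title", "body", "div", "p", "span", "a"}
HTML5_NEW_SAMPLE = {"section", "article", "nav"}

# Hypothetical mapping from a version string to the elements it allows.
PROFILES = {
    "XHTML+RDFa 1.0": XHTML11_SAMPLE,
    "HTML5+RDFa1.0": XHTML11_SAMPLE | HTML5_NEW_SAMPLE,
}


def unexpected_elements(version, elements):
    """Return the elements that the indicated version does not define."""
    allowed = PROFILES.get(version)
    if allowed is None:
        # Without a known version there is nothing to validate against.
        return []
    return [name for name in elements if name not in allowed]


document = ["html", "body", "section", "p"]
print(unexpected_elements("XHTML+RDFa 1.0", document))
print(unexpected_elements("HTML5+RDFa1.0", document))
```

With the XHTML+RDFa 1.0 string the HTML5 element 'section' has to be
reported, with a separate string for the HTML5 variant it does not.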

>So, we could have two acceptable @version attribute values for RDFa (one
>for non-XML mode and another for XML mode):
>
>version="HTML+RDFa 1.0"
>version="XHTML+RDFa 1.0"
>
>This is a bit annoying because we really only care about the "RDFa 1.0"
>part of the version string. 

Another option could be a URI (with fragment identifier) as the value
of the version attribute, pointing to the definition of the variant used -
or a whitespace separated list of such pointers, indicating all
versions and formats used. Then one does not always have to update the
RDFa extension if a new format or version appears ...
This almost matches the URI/CURIE approach of RDFa itself ;o)
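
A minimal Python sketch of how a program might read such a list.
Nothing like this is defined in any draft; the table KNOWN_SPECS, the
function name and the choice of URIs and fragment are only
illustrative assumptions.

```python
# Illustrative assumption: specification URIs a program happens to know.
KNOWN_SPECS = {
    "http://www.w3.org/TR/rdfa-syntax/": "RDFa 1.0",
    "http://www.w3.org/TR/xhtml11/": "XHTML 1.1",
}


def parse_version_list(value):
    """Split a version value into known (name, fragment) pairs and unknown pointers."""
    known, unknown = [], []
    for pointer in value.split():  # any run of whitespace separates entries
        base, _, fragment = pointer.partition("#")
        if base in KNOWN_SPECS:
            known.append((KNOWN_SPECS[base], fragment))
        else:
            unknown.append(pointer)
    return known, unknown


# Hypothetical attribute value; the fragment and the last URI are made up.
value = ("http://www.w3.org/TR/xhtml11/ "
         "http://www.w3.org/TR/rdfa-syntax/#example "
         "http://example.org/cool-extension")
known, unknown = parse_version_list(value)
print(known)
print(unknown)
```

A pointer the program does not know is simply reported as unknown, so
a new format or version does not require a change to the RDFa
extension itself.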


>So, to be backwards-compatible with 
>XHTML+RDFa 1.0 and to provide some degree of future-proofing, we could
>say that the @version string should contain the text "RDFa 1.0" in it
>somewhere. The following regular expression could be used to detect the
>string in the @version attribute in any language employing RDFa:
>
>\+?RDFa 1\.0(\+|\+.*|)$
>
>Basically, if the string "RDFa 1.0" exists in the @version attribute
>(either surrounded by '+' characters or not), then the document contains
>RDFa 1.0 syntax. This allows people to do stuff like:
>
>version="SVGTiny 1.2+RDFa 1.0" or

This is not necessary, because SVGT1.2 already has the
necessary attributes defined. One can simply use them. This would
be another good approach for HTML5, especially because
it is not a modularised version, unlike XHTML1.1.

SVGT1.2 defines only:
version = "1.0" | "1.1" | "1.2"
and
baseProfile = "none" | "full" | "basic" | "tiny"
RDFa attributes are already covered by version="1.2" baseProfile="tiny".
version="SVGTiny 1.2+RDFa 1.0" would be an unsupported value,
which means almost the same as if the attribute had not been
specified at all.
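
The difference can be tried out with a small Python sketch. The
regular expression is the one quoted above, unchanged; I assume it is
applied as a search and not as a full match. The function names are
my own, and treating an unsupported value as None is only a
simplification of 'almost the same as not specified'.

```python
import re

# The expression from the quoted mail, applied with search().
RDFA_VERSION_RE = re.compile(r"\+?RDFa 1\.0(\+|\+.*|)$")

# The values SVG Tiny 1.2 defines for its version attribute, as listed above.
SVG_VERSIONS = {"1.0", "1.1", "1.2"}


def declares_rdfa_1_0(version):
    """True if the proposed expression finds 'RDFa 1.0' in the value."""
    return RDFA_VERSION_RE.search(version) is not None


def effective_svg_version(version):
    """Return the value if SVG Tiny 1.2 defines it, otherwise None."""
    return version if version in SVG_VERSIONS else None


for value in ["XHTML+RDFa 1.0", "SVGTiny 1.2+RDFa 1.0", "1.2"]:
    print(value, declares_rdfa_1_0(value), effective_svg_version(value))
```

The expression accepts "SVGTiny 1.2+RDFa 1.0", but for SVG itself
this value is not one of the defined ones.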

>version="HTML+RDFa 1.0+CoolLanguageExtension 2.1"

>This is beneficial because we do want RDFa to be easily mixed-in with
>future element/attribute-based languages.

Then the URI/CURIE-list approach for the version attribute looks
perfect for this. Typically the URIs of specifications are unique and
persistent, therefore there is a meaningful (bijective) relation between
the string and the definitions.


>> By the way, 2.1 explains, that the document structure
>> can be changed. Maybe it could be useful to add a
>> note for authors, that they can avoid this, if they note
>> implied elements and other HTML artefacts explicitly
>> to ensure, that such a modification does not change
>> their intents...
>
>I'm not quite sure I understand what you mean completely, so the
>following may not address your concern.
>
>AFAIK, there is currently no way to signal that a document's elements
>shouldn't be re-arranged by an HTML5 parser. Henri, Ian, is this correct?

I think there is no way to prevent this. But if authors do not nest
elements wrongly and do not leave out elements that are then added to
the DOM automatically (this happens for example in tables, I think with
tbody, which has already caused some confusion with CSS), they can in
practice avoid that the parser manipulates anything. This manipulation
does not happen randomly; it has defined causes that one can avoid.
If authors do something stupid, such a DOM manipulation due to
implied elements or error fixing may change the structure they
intended for the extraction of the RDFa content.
At least an informational note/hint to authors might help them to be
more careful to avoid such nonsense. They can do this if they want to,
and if there is a danger that something is misinterpreted when
such RDFa structures are extracted by a simple program.
And an HTML5-tag-soup-parser gives them no hint that something
went wrong, or that the structure had to be manipulated to fix nonsense
or to add implied elements before something is extracted.
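
Authors could check the tbody case themselves before publishing. A
rough Python sketch with the standard html.parser module, which does
not insert implied elements itself and therefore sees the markup as
written. The class name and the warning text are invented for this
example, and it only covers the single case of tr written directly
inside table.

```python
from html.parser import HTMLParser

# Table-related elements tracked by this sketch; everything else is ignored.
TRACKED = {"table", "thead", "tbody", "tfoot", "tr"}


class ImpliedTbodyChecker(HTMLParser):
    """Warn about <tr> written directly inside <table>."""

    def __init__(self):
        super().__init__()
        self.open_elements = []  # currently open tracked elements
        self.warnings = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr" and self.open_elements[-1:] == ["table"]:
            line, _ = self.getpos()
            self.warnings.append(
                f"line {line}: tr directly in table, "
                "an HTML5 parser adds tbody here")
        if tag in TRACKED:
            self.open_elements.append(tag)

    def handle_endtag(self, tag):
        if tag in self.open_elements:
            # Close everything up to and including the matching element.
            while self.open_elements.pop() != tag:
                pass


def check(markup):
    checker = ImpliedTbodyChecker()
    checker.feed(markup)
    checker.close()
    return checker.warnings


print(check("<table><tr><td>a</td></tr></table>"))
print(check("<table><tbody><tr><td>a</td></tr></tbody></table>"))
```

If tbody is noted explicitly, nothing is reported and the parser has
nothing to add at this place.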



>Since RDFa usually sits on top of the DOM layer in XHTML and HTML, or is
>provided a SAX-based interface to the document, the RDFa Processor
>doesn't know if the input stream was or wasn't modified by a document
>parser. In short, I don't think there is any way for us to say that
>authors can avoid document restructuring as that happens outside of the
>RDFa Processor's purview.
>
>I've added your primary concern to the wiki:
>
>http://rdfa.info/wiki/Html5-rdfa-wd-issues#3.1_Document_Conformance
>
>-- manu
Received on Thursday, 17 September 2009 10:42:15 UTC