Re: RDFa in HTML 5

Philip Taylor wrote:
> Seeing as people are implementing RDFa parsers for text/html, I guess 
> it would be good to have a specification that says how they should work.
>
> http://www3.aptest.com/standards/rdfa-html/ doesn't answer the 
> questions I'd want answered (e.g. in 
> http://lists.w3.org/Archives/Public/public-rdf-in-xhtml-tf/2009May/0102.html), 
> and HTML 4 seems to make it impossible to express an answer. Some 
> existing RDFa-in-text/html parsers are based on document models that 
> closely match the DOM-like model used by HTML 5 (e.g. browser-based JS 
> implementations, and some Python ones using an html5lib DOM, and maybe 
> others), and the model used by HTML 5 can be implemented in a variety 
> of other ways (e.g. unbuffered SAX) so it's not too restrictive, and 
> so it seems like the most useful way to define RDFa-in-text/html 
> processing.
>
> I've not seen anyone else working on this, so I started writing a 
> rough draft at <http://philip.html5.org/docs/rdfa/>. Some of it is 
> copied from the RDFa-in-XHTML specification, and just tweaked to use 
> some new definitions and to share concepts (like base and lang) with 
> HTML 5 and to cope with text/html parsing (for xmlns:* attributes). 
> The CURIE definitions are new, since I didn't see any existing 
> document that defined them in an appropriate way.
>
> There are several unresolved design issues (e.g. handling of 
> case-sensitivity, use of xmlns:* vs other mechanisms that cause fewer 
> problems, etc) - I haven't intended to make any decisions on such 
> issues, I've just attempted to define the behaviour with sufficient 
> detail that it should make those issues visible.
>
> The current draft is far from complete or correct, but it shows 
> roughly the way I'd like to have things defined (and I hope it's 
> roughly the way that HTML5/WHATWG people would like it to be defined, 
> in order to support implementers and to be testable), and maybe it 
> could end up being useful for something, so I'm just throwing it out 
> here for discussion.
>
Philip and I started an email exchange because of some postings on 
Twitter.  I wanted to replicate the discussion here, with Philip's 
permission. Some of it is unimportant, but I wanted to preserve context. Note 
that these are from my perspective, so quoted material is from Philip 
and unquoted material is mine.

First email from Philip and my reply:

Philip Taylor wrote:
> I saw some discussion on Twitter, so just to clarify what the 
> situation is (as far as I'm aware of it):
>
> I wrote the draft without having talked about it to anybody at all, 
> because I thought (and still think) it might lead to something useful, 
> and it seemed easier to just write something concrete rather than 
> discuss it first. I posted about it to public-html and 
> public-rdf-in-xhtml-tf, since that seems the easiest way to contact 
> people who might be interested. A few people from the RDF side replied 
> privately, including Manu (expressing a desire to discuss things 
> further). Sam replied in public. That's about all there is.
>
> Re "My input was not sought"/"This wasn't a party I was invited to" - 
> I haven't sought input from anybody (except the public-* lists). If 
> this triggered some internal conversation in the RDFa world that you 
> were excluded from, I know nothing about it. If I continue working on 
> this, I'd be happy to hear technical comments about the content from 
> anywhere.
>
> Re "a better chance of getting RDFa into HTML5" - that's not my aim at 
> all; I'm not currently convinced that RDFa is a good solution that 
> ought to be part of the language. But that's largely irrelevant - if 
> people are going to use it anyway (which it looks like they are, at 
> least to some extent) then I'd prefer it to be specified based on 
> HTML5 rather than on XHTML1.1/HTML4, so that it's easier to implement 
> correctly and so that it doesn't conflict with HTML5's requirements, 
> and I'm not aware that anyone else is planning to specify it that way 
> (but I'd be happy if someone else did so).
>
> I don't care much about the politics of where the text ends up - it 
> just seems easier to do it as a separate document, effectively 
> defining a new "HTML5+RDFa" language rather than modifying the 
> original HTML5 language definition, which achieves the goal of making 
> sure the precise behaviour of RDFa-in-text/html is actually specified 
> somewhere (regardless of whether it's a part of HTML5 or not).
>
Sam specifically mentioned me working with you. I checked with the RDFa 
folks, and they'd already initiated discussions with you.

Sam asked about Manu, Ben et al, and my answer was for him to ask. My 
further response was that discussions are, or will be, underway, but I 
am not part of the effort, and I'm the wrong person to ask.

I agree with you in a way that this shouldn't be 'part' of HTML5. 
Neither should any of the predefined vocabularies, or microdata, either. 
The only reason they are, is because HTML5 is not extensible.

The confused concept of "validation" associated with HTML5, though, 
makes it important to at least reference RDFa in such a way that a) 
attributes are not redefined and b) people know how to use RDFa in a 
"conforming" manner with HTML5 -- based on the condition that people 
can't use one version of annotation for RDFa for XHTML 1.1, and another 
for HTML5. The whole @prefix thing was foolish. Sorry, but that's my 
opinion.

So a document as an addendum, or complementary proposal issued by some 
organization that describes how RDFa works with HTML5 (without impacting 
on how it works with HTML4, or XHTML), is good.  It allows people to use 
RDFa with HTML5, without adverse impact on the underlying RDF model, and 
without requiring changes in behavior or syntax from what currently 
works with XHTML (including XHTML5). And it sounds like you're going to 
be working with the RDFa folks moving forward on this. That's what I 
meant by "RDFa into HTML5". And I hope you all succeed.

I don't have a part in this, and that's cool. I'll continue to do my own 
thing, which is primarily writing in my own space.

You know, the biggest problem with all of this is that you have 
processing people and you have data people, but you don't necessarily 
have a lot of people who understand both worlds.

Anyway, good luck with your efforts.

---

A second email I sent based on Philip's original email:

PS I will say one thing, and I'm parroting Henri in this regard, to me a 
conforming implementation of RDFa in HTML5 is not necessarily one that 
only meets what's required for HTML5 -- it has to meet a conformance 
requirement for RDF, too. How would we know if the document is 
conforming? Because the same annotation in a document served up as 
XHTML5, should generate the exact same RDF graph, as would be generated 
if the document is served up as HTML5. To ensure this, how the 
annotation is interpreted from a data perspective must be defined in a 
single document, such as RDFa-in-XHTML.
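
The conformance test described above can be sketched in a few lines of 
Python. This is only an illustration under stated assumptions: the 
triples are hypothetical hand-written (subject, predicate, object) 
tuples rather than the output of a real RDFa extractor, the 
example.org URL is a placeholder, and the helper names graph_diff and 
same_graph are made up for this sketch. It also assumes no blank nodes, 
since comparing graphs with blank nodes needs an isomorphism check 
rather than plain set equality.

```python
# Hypothetical triples as (subject, predicate, object) tuples. In practice
# they would come from running an RDFa extractor over the same document,
# once served as application/xhtml+xml and once served as text/html.
DC_TITLE = "http://purl.org/dc/elements/1.1/title"
DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"

triples_from_xhtml5 = {
    ("http://example.org/post", DC_TITLE, "RDFa in HTML 5"),
    ("http://example.org/post", DC_CREATOR, "Shelley"),
}
triples_from_html5 = {
    ("http://example.org/post", DC_CREATOR, "Shelley"),
    ("http://example.org/post", DC_TITLE, "RDFa in HTML 5"),
}


def graph_diff(a, b):
    """Return (only_in_a, only_in_b) for two sets of ground triples."""
    return a - b, b - a


def same_graph(a, b):
    """True when both serializations produced exactly the same triples.

    Assumes ground triples only (no blank nodes).
    """
    return graph_diff(a, b) == (set(), set())
```

If same_graph returns False, graph_diff shows which triples one 
serialization produced and the other did not, which is the kind of 
divergence two separate sets of processing rules could introduce.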

If you have two separate documents providing rules about how triples are 
to be formed based on the same annotation, you have a failed system. You 
would be better off just ignoring RDFa and let folks generate 
"non-conforming HTML5" documents, with foreign annotation. At least 
then, RDFa extractors would have only one set of rules to apply when it 
comes to building the underlying RDF graph.

The reason why Shane's document is "sparse" on parsing (processing) 
information (according to the WhatWG IRC entries) is that Shane was 
deferring the RDFa processor conformance to the RDFa-XHTML syntax and 
processing document. This was right and proper. He was using good 
technique.

If you cross over the boundaries that define the markup specification 
from other specifications, you leave the potential for conflicting 
conformance requirements. An example is the color section in the HTML5 
document. What if how colors are defined is changed in CSS? Well, then, 
you'd have two sets of differing conformance requirements. I still 
can't figure out why there's a section on processing color values in 
HTML, when there shouldn't even be color values within the HTML markup, 
directly. Legacy, I suppose.

Philip, you specify the attributes, which is good, because that ensures 
they're reserved, and Ian doesn't do something like @property again. 
Working through issues of existing shared attributes is also a goodness.

Then you copy the RDFaSyntax document bits, and redefine them into HTML5 
speak, which opens the door for conflicting conformance requirements, 
and worse, differing underlying RDF graphs. I can understand noting 
where specific terms in the RDFaSyntax document map to other terms in 
the HTML5 document, but providing a separate processing model...

I have to assume this was to generate a dialog, not based on actually 
delivering the document in this way -- with a "separate" processing 
model section.

Those are my initial notes. I'd put them into the email lists, but frankly, 
I'm tired of everything I write or say being joked over on the WhatWG IRC.

---

Some of the correspondence was irrelevant to this group. I'm only 
duplicating it to be consistently public. Philip's follow-up reply and 
mine are much more relevant to a larger discussion, in my opinion at least:




First, clarification: when I respond, I'm responding only for myself, 
not the RDF/RDFa folks.
>
> The problem in that document is it doesn't define how to map from the 
> syntax onto the RDFa-in-XHTML processing model, which leaves a gap 
> where the behaviour is undefined. E.g. I can write <div xmlns:="..."> 
> in HTML, and I don't know whether that attribute should be ignored or 
> should redefine the default prefix mapping, because it's impossible in 
> XHTML and so the RDFa-in-XHTML specification doesn't explain how to 
> handle it.
But you don't have to re-specify a section to explain gaps. Or you don't 
have to re-state those sections with which you're in agreement.

The RDFa document, itself, falls back on certain processing rules -- 
defined both in XHTML, and indirectly, in XML. I don't think there's any 
conflict by specifying in the RDFa in HTML5 document that where such 
rules exist implicitly in the RDFa in XHTML document, they're explicitly 
given in the HTML5 document.
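
To make the gap Philip describes concrete, here is a minimal Python 
sketch that lists the xmlns:* attributes one text/html tokenizer 
reports. It uses the standard library's html.parser, which is not the 
HTML5 parsing algorithm, so treat it only as an example of what a 
non-XML parser can hand to an RDFa processor. The class and function 
names (PrefixCollector, collect_prefixes) and the sample URIs are 
hypothetical.

```python
from html.parser import HTMLParser


class PrefixCollector(HTMLParser):
    """Collect xmlns:* attributes as seen by one text/html tokenizer."""

    def __init__(self):
        super().__init__()
        self.mappings = []  # (tag, prefix, uri) tuples in document order

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name.startswith("xmlns:"):
                # Everything after "xmlns:" is the prefix; it may be empty.
                prefix = name[len("xmlns:"):]
                self.mappings.append((tag, prefix, value))


def collect_prefixes(markup):
    """Return the xmlns:* mappings found in a text/html string."""
    collector = PrefixCollector()
    collector.feed(markup)
    collector.close()
    return collector.mappings


sample = (
    '<div xmlns:="http://example.org/ns#">'
    '<span xmlns:FOAF="http://xmlns.com/foaf/0.1/">x</span>'
    "</div>"
)
mappings = collect_prefixes(sample)
```

With this tokenizer the xmlns:="..." attribute comes through with an 
empty prefix, and the FOAF prefix is lowercased, because html.parser 
lowercases attribute names. Neither case can arise from well-formed 
XHTML, so a text/html rule set has to say explicitly what a processor 
should do with them.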

>
> One idea for fixing the gap is to produce a more detailed mapping from 
> text/html onto the RDFa-in-XHTML processing model. But that seems like 
> an unpleasantly difficult solution, since RDFa-in-XHTML wasn't really 
> designed to be used like that and there are lots of small mismatches and 
> edge cases that make it tricky.
But if you create a _new_ processing model, there will eventually be two 
sets of rules to follow, which introduces corruption in the underlying 
data models (RDF graphs).

You keep talking about processing the data _within_ the document using 
JS, and I'm trying to make a point that the majority of RDF ends up 
merged with other RDF from other documents in much larger pools of data. 
Personally I don't give a damn about processing RDF in my pages with JS. 
And I don't think I'm necessarily an exception. I can tell that most of 
the work being done with Drupal 7 is based on the data being consumed 
outside the pages, rather than within.

So from a mindset perspective, we have to get away from this JS/Ajax, 
in-page view of the data and look at it from a broader perspective. It 
would be better not to have any data, than to have "bad" data.

I'm assuming you've worked with databases created by other entities 
where you've not had control over the creation of the data model 
underlying the database, or the validation of the data going into the 
database. If you've participated in any kind of a data clean up 
operation, you must know that no data at all is actually easier to 
manage than not being able to tell good data from "bad". Once 
that's happened, good and bad mixed, with no clear clue as to which is 
which, the database is completely corrupted, and has to be discarded.


>
> Since HTML 5 already defines how to handle text/html and 
> application/xhtml+xml in a common processing model, ...
Has it, though? I've looked through the document, and if you are talking 
about processing, how do we handle xmlns in HTML5 land? How do we deal 
with <svg:svg in HTML5 land?

I really don't think the current HTML5 document really has dealt with a 
"common processing model" for both HTML5 and XHTML5. That's just my 
opinion, though.

> I think redefining the RDFa processing model on top of the HTML 5 
> processing model is possibly the best way to get well-defined, 
> consistent behaviour between HTML and XHTML. So it would entirely 
> replace the current RDFa-in-XHTML spec, ensuring there's only a single 
> document telling people how to parse RDFa in both HTML and XHTML. 
> Maybe it should be thought of as a new edition of the existing spec, 
> rather than a totally new spec.
>
Again, I cannot agree. The microdata model generates RDF triples that 
don't map to what the supposed equivalent RDFa annotation would provide. 
Even with the new additions of rdf:type and about. I don't feel sanguine 
that things would improve if the HTML5 document actually replaces the 
RDFa-in-XHTML spec -- in fact I think you better have a heart to heart 
with Manu et al about that one, right away.

I admire the confidence of the WhatWG group, but I don't think that the 
way into the future of the web is to have every specification washed 
through the HTML5 group, just because that's the only way to _ensure_ 
that it's "processed properly". Sometimes I come away from reading the 
WhatWG IRC absolutely astonished that the web we have today actually 
exists, because all of it is so darn crappy.

Regardless of what Manu, Ben, et al say, I feel confident in saying that 
the RDFa-in-XHTML spec is not going to be replaced by the HTML5 working 
group. I believe that compromise and cooperation, rather than replacement, is a 
better way forward.

> I guess there are lots of political/process issues with doing that, 
> but it'd be nice to have a technically sound solution before getting 
> blocked by those issues.
>
Well, I think you have more than political issues going now. Google just 
took RDFa and exploded it all over the place. This in addition to the 
other uses of RDFa that will be introduced in Drupal 7, and elsewhere. 
Uses that will probably incorporate more sophisticated uses of RDFa than 
Google's use. RDFa, as documented in the RDFaSyntax document, will 
continue to exist, regardless of what happens with HTML5. I believe it 
would be in everyone's best interest to assume this is so.

Either we all come to some kind of agreement (with supporting 
documentation) to live and let live, or we just ignore each other, and 
go on like we are now. Amicably, hopefully. One subsuming the other is 
not going to happen.

But then, that's just my opinion. I'm not a member of the RDFa group, 
and can't speak for their opinions.

---

Sorry for the length of posting, typos, asides and so on. Hopefully 
there might be something of interest to folks in the exchange.

Shelley

Received on Friday, 22 May 2009 13:47:25 UTC