- From: Shelley Powers <shelleyp@burningbird.net>
- Date: Fri, 22 May 2009 08:46:40 -0500
- To: Philip Taylor <pjt47@cam.ac.uk>
- CC: RDFa mailing list <public-rdf-in-xhtml-tf@w3.org>, HTML WG <public-html@w3.org>
Philip Taylor wrote:
> Seeing as people are implementing RDFa parsers for text/html, I guess it would be good to have a specification that says how they should work.
>
> http://www3.aptest.com/standards/rdfa-html/ doesn't answer the questions I'd want answered (e.g. in http://lists.w3.org/Archives/Public/public-rdf-in-xhtml-tf/2009May/0102.html), and HTML 4 seems to make it impossible to express an answer. Some existing RDFa-in-text/html parsers are based on document models that closely match the DOM-like model used by HTML 5 (e.g. browser-based JS implementations, and some Python ones using an html5lib DOM, and maybe others), and the model used by HTML 5 can be implemented in a variety of other ways (e.g. unbuffered SAX) so it's not too restrictive, and so it seems like the most useful way to define RDFa-in-text/html processing.
>
> I've not seen anyone else working on this, so I started writing a rough draft at <http://philip.html5.org/docs/rdfa/>. Some of it is copied from the RDFa-in-XHTML specification, and just tweaked to use some new definitions and to share concepts (like base and lang) with HTML 5 and to cope with text/html parsing (for xmlns:* attributes). The CURIE definitions are new, since I didn't see any existing document that defined them in an appropriate way.
>
> There are several unresolved design issues (e.g. handling of case-sensitivity, use of xmlns:* vs other mechanisms that cause fewer problems, etc) - I haven't intended to make any decisions on such issues, I've just attempted to define the behaviour with sufficient detail that it should make those issues visible.
>
> The current draft is far from complete or correct, but it shows roughly the way I'd like to have things defined (and I hope it's roughly the way that HTML5/WHATWG people would like it to be defined, in order to support implementers and to be testable), and maybe it could end up being useful for something, so I'm just throwing it out here for discussion.

Philip and I started an email exchange because of some postings on Twitter. I wanted to replicate the discussion here, with Philip's permission. Some of it is unimportant, but I wanted to preserve context. Note that these emails are from my perspective, so quoted material is from Philip and unquoted material is mine.

First email from Philip and my reply:

Philip Taylor wrote:
> I saw some discussion on Twitter, so just to clarify what the situation is (as far as I'm aware of it):
>
> I wrote the draft without having talked about it to anybody at all, because I thought (and still think) it might lead to something useful, and it seemed easier to just write something concrete rather than discuss it first. I posted about it to public-html and public-rdf-in-xhtml-tf, since that seems the easiest way to contact people who might be interested. A few people from the RDF side replied privately, including Manu (expressing a desire to discuss things further). Sam replied in public. That's about all there is.
>
> Re "My input was not sought"/"This wasn't a party I was invited to" - I haven't sought input from anybody (except the public-* lists). If this triggered some internal conversation in the RDFa world that you were excluded from, I know nothing about it. If I continue working on this, I'd be happy to hear technical comments about the content from anywhere.
>
> Re "a better chance of getting RDFa into HTML5" - that's not my aim at all; I'm not currently convinced that RDFa is a good solution that ought to be part of the language.
> But that's largely irrelevant - if people are going to use it anyway (which it looks like they are, at least to some extent) then I'd prefer it to be specified based on HTML5 rather than on XHTML1.1/HTML4, so that it's easier to implement correctly and so that it doesn't conflict with HTML5's requirements, and I'm not aware that anyone else is planning to specify it that way (but I'd be happy if someone else did so).
>
> I don't care much about the politics of where the text ends up - it just seems easier to do it as a separate document, effectively defining a new "HTML5+RDFa" language rather than modifying the original HTML5 language definition, which achieves the goal of making sure the precise behaviour of RDFa-in-text/html is actually specified somewhere (regardless of whether it's a part of HTML5 or not).

Sam specifically mentioned me working with you. I checked with the RDFa folks, and they'd already initiated discussions with you. Sam asked about Manu, Ben et al, and my answer was that he should ask them. My further response was that discussions are, or will be, underway, but I am not part of the effort, and I'm the wrong person to ask.

I agree with you, in a way, that this shouldn't be 'part' of HTML5. Neither should any of the predefined vocabularies, or microdata. The only reason they are is that HTML5 is not extensible.

The confused concept of "validation" associated with HTML5, though, makes it important to at least reference RDFa in such a way that a) attributes are not redefined and b) people know how to use RDFa in a "conforming" manner with HTML5 -- based on the condition that people can't use one version of annotation for RDFa for XHTML 1.1, and another for HTML5. The whole @prefix thing was foolish. Sorry, but that's my opinion.

So a document issued by some organization as an addendum or complementary proposal, describing how RDFa works with HTML5 (without affecting how it works with HTML4 or XHTML), is good.
It allows people to use RDFa with HTML5, without adverse impact on the underlying RDF model, and without requiring changes in behavior or syntax from what currently works with XHTML (including XHTML5).

And it sounds like you're going to be working with the RDFa folks moving forward on this. That's what I meant by "RDFa into HTML5". And I hope you all succeed.

I don't have a part in this, and that's cool. I'll continue to do my own thing, which is primarily writing in my own space. You know, the biggest problem with all of this is that you have processing people and you have data people, but you don't necessarily have a lot of people who understand both worlds.

Anyway, good luck with your efforts.

---

A second email I sent based on Philip's original email:

PS I will say one thing, and I'm parroting Henri in this regard: to me, a conforming implementation of RDFa in HTML5 is not necessarily one that only meets what's required for HTML5 -- it has to meet a conformance requirement for RDF, too.

How would we know if the document is conforming? Because the same annotation in a document served up as XHTML5 should generate the exact same RDF graph as would be generated if the document is served up as HTML5.

To ensure this, how the annotation is interpreted from a data perspective must be defined in a single document, such as RDFa-in-XHTML. If you have two separate documents providing rules about how triples are to be formed based on the same annotation, you have a failed system. You would be better off just ignoring RDFa and letting folks generate "non-conforming HTML5" documents, with foreign annotation. At least then, RDFa extractors would have only one set of rules to apply when it comes to building the underlying RDF graph.

The reason why Shane's document is "sparse" on parsing (processing) information (according to the WhatWG IRC entries) is that Shane was deferring the RDFa processor conformance to the RDFa-XHTML syntax and processing document. This was right and proper.
He was using good technique. If you cross over the boundaries that separate the markup specification from other specifications, you leave the potential for conflicting conformance requirements.

An example is the color section in the HTML5 document. What if how colors are defined is changed in CSS? Well, then, you'd have two sets of differing conformance requirements. I still can't figure out why there's a section on processing color values in HTML, when there shouldn't even be color values within the HTML markup directly. Legacy, I suppose.

Philip, you specify the attributes, which is good, because that ensures they're reserved, and Ian doesn't do something like @property again. Working through issues of existing shared attributes is also good. Then you copy the RDFaSyntax document bits, and redefine them into HTML5 speak, which opens the door for conflicting conformance requirements, and worse, differing underlying RDF graphs.

I can understand noting where specific terms in the RDFaSyntax document map to other terms in the HTML5 document, but providing a separate processing model... I have to assume this was to generate a dialog, not based on actually delivering the document in this way -- with a "separate" processing model section.

Those are my initial notes. I'd put them into the email lists, but frankly, I'm tired of everything I write or say being joked over on the WhatWG IRC.

---

Some of the correspondence was irrelevant to this group. I'm only duplicating it to be consistently public. Philip's follow-up reply and mine are much more relevant to a larger discussion, in my opinion at least.

First, a clarification: when I respond, I'm responding only for myself, not the RDF/RDFa folks.

> The problem in that document is it doesn't define how to map from the syntax onto the RDFa-in-XHTML processing model, which leaves a gap where the behaviour is undefined. E.g.
> I can write <div xmlns:="..."> in HTML, and I don't know whether that attribute should be ignored or should redefine the default prefix mapping, because it's impossible in XHTML and so the RDFa-in-XHTML specification doesn't explain how to handle it.

But you don't have to re-specify a section to explain gaps. Nor do you have to re-state those sections with which you're in agreement.

The RDFa document itself falls back on certain processing rules -- defined both in XHTML and, indirectly, in XML. I don't think there's any conflict in specifying, in the RDFa in HTML5 document, that where such rules exist implicitly in the RDFa in XHTML document, they're explicitly given in the HTML5 document.

> One idea for fixing the gap is to produce a more detailed mapping from text/html onto the RDFa-in-XHTML processing model. But that seems like an unpleasantly difficult solution, since RDFa-in-XHTML wasn't really designed to be used like that and there lots of small mismatches and edge cases that make it tricky.

But if you create a _new_ processing model, there will eventually be two sets of rules to follow, which introduces corruption in the underlying data models (RDF graphs).

You keep talking about processing the data _within_ the document using JS, and I'm trying to make the point that the majority of RDF ends up merged with other RDF from other documents in much larger pools of data. Personally I don't give a damn about processing RDF in my pages with JS. And I don't think I'm necessarily an exception. I can tell that most of the work being done with Drupal 7 is based on the data being consumed outside the pages, rather than within.

So from a mindset perspective, we have to get away from this JS/Ajax, in-page view of the data and look at it from a broader perspective. It would be better not to have any data than to have "bad" data.
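To make the `<div xmlns:="...">` gap concrete, here is a minimal sketch. It is not taken from either specification. It assumes Python's standard-library `html.parser` as a stand-in for a text/html tokenizer (it is not the HTML5 parsing algorithm), and the policy names `ignore` and `bind` are hypothetical labels for the two readings Philip describes. The example.org URI is a placeholder.

```python
from html.parser import HTMLParser


class XmlnsCollector(HTMLParser):
    """Record every xmlns:* attribute exactly as the tokenizer reports it."""

    def __init__(self):
        super().__init__()
        self.declarations = []  # list of (element, prefix, uri)

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name.startswith("xmlns:"):
                prefix = name[len("xmlns:"):]  # empty string for "xmlns:"
                self.declarations.append((tag, prefix, value or ""))


def collect(markup):
    collector = XmlnsCollector()
    collector.feed(markup)
    collector.close()
    return collector.declarations


def prefix_map(declarations, empty_prefix_policy):
    """Build a prefix map under a hypothetical policy for the empty prefix.

    "ignore": drop the xmlns:="..." declaration.
    "bind":   let it redefine the default (empty) prefix mapping.
    """
    mapping = {}
    for _element, prefix, uri in declarations:
        if prefix == "" and empty_prefix_policy == "ignore":
            continue
        mapping[prefix] = uri
    return mapping


markup = (
    '<div xmlns:="http://example.org/a#" '
    'xmlns:dc="http://purl.org/dc/elements/1.1/"></div>'
)
decls = collect(markup)
print(prefix_map(decls, "ignore"))
print(prefix_map(decls, "bind"))
```

The two calls return different prefix maps for the same markup. Two processors that each picked a different policy would resolve CURIEs differently, which is how one annotation could end up producing two different RDF graphs.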
I'm assuming you've worked with databases created by other entities where you've not had control over the creation of the data model underlying the database, or the validation of the data going into the database. If you've participated in any kind of data clean-up operation, you must know that no data at all is actually easier to manage than not being able to tell good data from "bad". Once that's happened, with good and bad mixed and no clear clue as to which is which, the database is completely corrupted and has to be discarded.

> Since HTML 5 already defines how to handle text/html and application/xhtml+xml in a common processing model, ...

Has it, though? I've looked through the document, and if you are talking about processing, how do we handle xmlns in HTML5 land? How do we deal with <svg:svg in HTML5 land? I don't think the current HTML5 document really has dealt with a "common processing model" for both HTML5 and XHTML5. That's just my opinion, though.

> I think redefining the RDFa processing model on top of the HTML 5 processing model is possibly the best way to get well-defined, consistent behaviour between HTML and XHTML. So it would entirely replace the current RDFa-in-XHTML spec, ensuring there's only a single document telling people how to parse RDFa in both HTML and XHTML. Maybe it should be thought of as a new edition of the existing spec, rather than a totally new spec.

Again, I cannot agree. The microdata model generates RDF triples that don't map to what the supposed equivalent RDFa annotation would provide, even with the new additions of rdf:type and about. I don't feel sanguine that things would improve if the HTML5 document actually replaces the RDFa-in-XHTML spec -- in fact I think you'd better have a heart to heart with Manu et al about that one, right away.
I admire the confidence of the WhatWG group, but I don't think that the way into the future of the web is to have every specification washed through the HTML5 group, just because that's the only way to _ensure_ that it's "processed properly". Sometimes I come away from reading the WhatWG IRC absolutely astonished that the web we have today actually exists, because all of it is so darn crappy.

Regardless of what Manu, Ben, et al say, I feel confident in saying that the RDFa-in-XHTML spec is not going to be replaced by the HTML5 working group. I believe that compromise and cooperation, rather than replacement, are a better way forward.

> I guess there are lots of political/process issues with doing that, but it'd be nice to have a technically sound solution before getting blocked by those issues.

Well, I think you have more than political issues going now. Google just took RDFa and exploded it all over the place. This is in addition to the other uses of RDFa that will be introduced in Drupal 7 and elsewhere, uses that will probably be more sophisticated than Google's.

RDFa, as documented in the RDFaSyntax document, will continue to exist, regardless of what happens with HTML5. I believe it would be in everyone's best interest to assume this is so. Either we all come to some kind of agreement (with supporting documentation) to live and let live, or we just ignore each other, and go on like we are now. Amicably, hopefully.

One subsuming the other is not going to happen. But then, that's just my opinion. I'm not a member of the RDFa group, and can't speak for their opinions.

---

Sorry for the length of posting, typos, asides and so on. Hopefully there might be something of interest to folks in the exchange.

Shelley
Received on Friday, 22 May 2009 13:47:25 UTC