- From: Ian Hickson <ian@hixie.ch>
- Date: Tue, 9 Jun 2009 23:29:15 +0000 (UTC)
On Thu, 14 May 2009, Eduard Pascual wrote:
>
> I have put online a document that describes my idea/proposal for a
> selector-based solution to metadata. The document can be found at
> http://herenvardo.googlepages.com/CRDF.pdf Feel free to copy and/or link
> the file wherever you deem appropriate.
>
> Needless to say, feedback and constructive criticism to the proposal is
> always welcome. (Note: if discussion about this proposal should take
> place somewhere else, please let me know.)

This proposal is very similar to RDF EASE.

While I sympathise with the goal of making semantic extraction easier, I
feel this approach has several fundamental problems which make it
inappropriate for the specific use cases that were brought up and which
resulted in the microdata proposal:

 * It separates (by design) the semantics from the data with those
   semantics. I think this is a level of indirection too far -- when
   something is a heading, it should _be_ a heading, it shouldn't be
   labeled opaquely with a transformation sheet elsewhere defining that
   it maps to the heading semantic.

 * It is even more brittle in the face of copy-and-paste and regular
   maintenance than, say, namespace prefixes. It is very easy to forget
   to copy the semantic transformation rules. It is very easy to edit the
   document such that the selectors no longer match what they used to
   match. It's not at all obvious from looking at the page that there are
   semantics there.

 * It relies on selectors to do something subtle. Authors have a great
   deal of trouble understanding selectors -- if you watch a typical Web
   author writing CSS, he will either use just class selectors, or he
   will write selectors by trial and error until he gets the style he
   wants. This isn't fatal for CSS because you can see the results right
   there; for something as subtle as semantic data mining, it is
   extremely likely that authors will make mistakes that turn their data
   into garbage, which would make the feature impractical for large-scale
   use.

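The brittleness described in the second point can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `SHEET` mapping is a hypothetical stand-in for an external transformation sheet (it is not CRDF syntax and matches on class names only), and the `class` attributes and helper names are invented for the example. It compares that external mapping with inline `itemprop` attributes after a routine edit to the markup.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect [attributes, text] for every <td> in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._current = [dict(attrs), ""]
            self.cells.append(self._current)
        else:
            self._current = None

    def handle_endtag(self, tag):
        self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self._current[1] += data


def cells(html):
    parser = CellCollector()
    parser.feed(html)
    parser.close()
    return [(attrs, text.strip()) for attrs, text in parser.cells]


def extract_by_sheet(html, sheet):
    """External mapping: class name (stand-in for a selector) -> property."""
    found = []
    for attrs, text in cells(html):
        for cls in (attrs.get("class") or "").split():
            if cls in sheet:
                found.append((sheet[cls], text))
    return found


def extract_inline(html):
    """Inline annotation: the property name travels with the data."""
    return [(a["itemprop"], t) for a, t in cells(html) if a.get("itemprop")]


# Hypothetical external mapping, kept apart from the markup.
SHEET = {"name": "name", "color": "color"}

original = ("<table><tr><td class=name itemprop=name>Hedral"
            "<td class=color itemprop=color>Black</table>")
# Routine maintenance: a class is renamed for styling reasons.
edited = original.replace("class=name", "class=cat-name")

print(extract_by_sheet(original, SHEET))  # both properties found
print(extract_by_sheet(edited, SHEET))    # "name" is silently lost
print(extract_inline(edited))             # unaffected by the rename
```

Nothing in the edited page signals that the mapping stopped matching, and copying the fragment without the sheet would lose both properties.
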
I say this despite really wanting Selectors to succeed (disclosure: I'm
one of the editors of the Selectors specification and spent years working
on its test suite).

I think CRDF has a bright future in doing the kind of thing GRDDL does,
and in extracting data from pages that were written by authors who did
not want to provide semantic data (i.e. screen scraping). It's an
interesting way of converting, say, Microformats to RDF.

Having said that, I do agree that the repetition that microdata requires
in common scenarios with blocks of repeated data is unfortunate. It is
worse than the repetition one has just from the basic HTML markup. e.g.
this:

   <table>
    <tr> <td> Hedral <td> Black
    <tr> <td> Pillar <td> White
   </table>

...becomes this:

   <table>
    <tr item> <td itemprop=name> Hedral <td itemprop=color> Black
    <tr item> <td itemprop=name> Pillar <td itemprop=color> White
   </table>

...or even:

   <table>
    <tr item=com.example.cat>
     <td itemprop=com.example.name> Hedral
     <td itemprop=com.example.color> Black
    <tr item>
     <td itemprop=com.example.name> Pillar
     <td itemprop=com.example.color> White
   </table>

...which is far more verbose than ideal.

I considered special casing tables (using <col itemprop> to set
itemprop="" for all cells in a column) but it would require quite a lot
of complexity in processors since they'd additionally have to implement
the table model, and having seen the quality of some of the
implementations of metadata extractors used on Web content, I fear that
that will be far too much complexity. (I fear even subject="" might
already be too much.) The simpler we make it the more reliable it will
be. It also wouldn't solve the problem with other patterns, e.g. <dl>
(which approaches like CRDF's handle fine).

I don't have a good answer for the repetition problem.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Tuesday, 9 June 2009 16:29:15 UTC