- From: Eduard Pascual <herenvardo@gmail.com>
- Date: Wed, 10 Jun 2009 12:03:50 +0200
First of all, Ian, thank you for your reply. I appreciate any opinions on this subject.

On Wed, Jun 10, 2009 at 1:29 AM, Ian Hickson<ian at hixie.ch> wrote:
> This proposal is very similar to RDF EASE.

Indeed, they are both CSS-based, and they fulfill similar purposes. Let me, however, highlight some differences:

1st, EASE is tightly bound to RDFa. However, RDFa is meant for embedding metadata, and was built with that purpose in mind, while EASE is meant for linked metadata, so building it on top of RDFa's embedding constructs is quite unnatural. In contrast, CRDF is built from CSS's syntax and RDF's (not RDFa's) concepts: it only shares with RDFa what they both inherit from RDF: the concepts and data model.

2nd, EASE is meant to be complementary to RDFa: they address (or attempt to address) different use cases / needs (embedding vs. linking). On the other hand (more on this below), CRDF attempts to address both cases, plus the case where a hybrid approach is appropriate (inlining some metadata, and linking other).

> While I sympathise with the
> goal of making semantic extraction easier, I feel this approach has
> several fundamental problems which make it inappropriate for the specific
> use cases that were brought up and which resulted in the microdata
> proposal:
>
>  * It separates (by design) the semantics from the data with those
>    semantics.

That's not accurate. CRDF *allows* separating the semantics, but doesn't require doing so. Everything could be inlined, and the possibility of separation is just for when it is needed.

> I think this is a level of indirection too far -- when
>    something is a heading, it should _be_ a heading, it shouldn't be
>    labeled opaquely with a transformation sheet elsewhere defining that is
>    maps to the heading semantic.

That doesn't make much sense. When something is a heading, it *is* a heading. What do you mean by "should be a heading"?
CRDF (as well as many other syntaxes for RDF) allows parsers that don't know the specific semantics of the markup language to find out that something is actually a heading anyway; and it allows expressing semantics that the markup language has no direct support for (for example, is it a site-section heading? a news heading? an iguana's name (used as the main title for each iguana's page in the iguana collection example)? something else?).

>  * It is even more brittle in the face of copy-and-paste and regular
>    maintenance than, say, namespace prefixes. It is very easy to forget to
>    copy the semantic transformation rules. It is very easy to edit the
>    document such that the selectors no longer match what they used to
>    match. It's not at all obvious from looking at the page that there are
>    semantics there.

I think the whole copy-paste thing should be broken into two separate scenarios:

Copy-pasting source code: with the next version of the document (which I'm already cleaning up, and which will allow "@namespace" rules inside the inlining attribute), this will be as brittle (and as resilient) as prefixes are: when a fragment that includes the "@namespace"s or prefixes it needs is copy-pasted, it will work as expected; OTOH, if a rule relies on a namespace that is not available (declared outside of the copy-pasted fragment), the rule will just be ignored. The risk of the copied code clashing with declarations in its new location is lower than it may seem: an author who is already adding CRDF code to his pages is quite likely to review the code he's copying for the semantics that may be there; and authoring tools that automatically add semantic code should check whether things make sense when code is pasted into them (for example, the author could/should be notified of invalid/redundant properties).
Copy-pasting content: currently, browser support for copy-pasting CSS-styled content is mediocre and inconsistent (some browsers do it right, some don't, some don't even try), but this is already more than what is supported for RDFa, Microdata, or other semantic formats. With a bit of luck, pressure for browsers to include CRDF properties when copying content could help to get decent support for CSS properties as well (since most of the code for these tasks would be shared).

>  * It relies on selectors to do something subtle. Authors have a great
>    deal of trouble understanding selectors -- if you watch a typical Web
>    authors writing CSS, he will either use just class selectors, or he
>    will write selectors by trial and error until he gets the style he
>    wants. This isn't fatal for CSS because you can see the results right
>    there; for something as subtle as semantic data mining, it is extremely
>    likely that authors will make mistakes that turn their data into
>    garbage, which would make the feature impractical for large-scale use.

It relies on selectors to do what they do: select things. Nobody is *asking* authors to use over-complicated selectors for each piece of metadata they want to add; but CRDF tries to *allow* using any valid selector for each case that needs it. From what I have seen, most "newbies" can handle both class and element selectors, and the descendant combinator. A good portion of them are even aware that they can use the descendant combinator to combine selectors of either type, and even that they can chain this. In general, they use the few selectors they can manage when that's enough for the job, and rely on classes and inline styles to simplify their lives.
Even if some users went for trial and error when crafting their selectors for CRDF, trial implies trying, and trying implies somehow checking that the result matches expectations. Even if catching the errors requires a bit more attention, this can be simplified: just take a look at this sample (only relevant parts of the markup are included, for simplicity):

myFile.crdf:
    @namespace foo "http://www.example.com/";
    myOverComplexSelector1 {
        color: red; /* the CRDF parser will ignore this rule */
        foo|myProperty: someValue;
    }
    myOverComplexSelector2 {
        color: blue;
        foo|myProperty: someOtherValue;
    }

myFile.html:
    <link href="myFile.crdf" type="text/css">
    <link href="myFile.crdf" type="text/crdf">
    ...

For most content, a red or blue color will denote each of the foo|myProperty values. Mixing CSS with CRDF is ugly (but safe), and this is not 100% reliable: something may be red or blue without having the relevant value, or have that value while its color is overridden by a more specific style. But looking at patterns (and CRDF is intended to use selectors essentially for clearly patterned cases, such as tables or <dl>'s), it should allow checking, in most cases, whether the selector is doing what's expected or not. For example, if the selector takes the form td:nth-of-type(<whatever>), this method allows checking it at a quick glance: even if in some cells the contents override the color, one can see whether the column as a whole is getting the relevant color, and hence the relevant property value. One could go further and disable the actual CSS while testing the CRDF (by commenting it out in the HTML, or by adding a "~" to the file name so that its fetch fails), and then there'll be no risk of style overrides. And, if I try, I bet I can find other tricks to visually test CRDF rules.
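The split in the sample above, where a CSS engine reads `color` and a CRDF parser reads only the `foo|myProperty` declarations, can be sketched in a few lines of Python. This is a minimal illustration, not the CRDF parsing algorithm: the function name `read_crdf_like`, the regular-expression parsing, the extra `bar|other` declaration (added only to show an undeclared prefix being dropped), and the expansion of a prefixed name by concatenating the namespace IRI with the local name are all assumptions made for the example.

```python
import re

# Sample sheet modeled on myFile.crdf; "bar|other" is a hypothetical
# extra declaration whose prefix is never declared.
SHEET = """
@namespace foo "http://www.example.com/";
myOverComplexSelector1 {
    color: red; /* plain CSS: skipped by the CRDF-style reader */
    foo|myProperty: someValue;
}
myOverComplexSelector2 {
    color: blue;
    foo|myProperty: someOtherValue;
    bar|other: dropped;
}
"""


def read_crdf_like(sheet):
    """Return (selector, property IRI, value) triples for prefixed declarations."""
    sheet = re.sub(r"/\*.*?\*/", "", sheet, flags=re.S)  # strip comments
    # Collect declared prefixes: @namespace <prefix> "<iri>";
    namespaces = dict(re.findall(r'@namespace\s+(\w+)\s+"([^"]*)"\s*;', sheet))
    triples = []
    for selector, body in re.findall(r"([^{};]+)\{([^}]*)\}", sheet):
        for declaration in body.split(";"):
            if ":" not in declaration:
                continue  # empty or malformed chunk
            name, value = (part.strip() for part in declaration.split(":", 1))
            prefix, bar, local = name.partition("|")
            if not bar:
                continue  # unprefixed name: ordinary CSS, not metadata
            if prefix not in namespaces:
                continue  # undeclared prefix: the declaration is ignored
            # Assumption: a prefixed name expands to namespace IRI + local name.
            triples.append((selector.strip(), namespaces[prefix] + local, value))
    return triples


if __name__ == "__main__":
    for triple in read_crdf_like(SHEET):
        print(triple)
```

With the sample sheet, this yields one triple per rule, both for `foo|myProperty`; the `color` declarations and the undeclared `bar|other` are skipped, which mirrors the "ignored, not an error" behaviour described for copy-pasted fragments.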
Of course, this doesn't replace accurate testing, which should always be done, but it is an example of how the similarities between CRDF and CSS can be exploited to make authors' lives easier.

There is one case left: those authors who overcomplicate things for themselves and just don't test at all. I have already stated that, while I am ok with some foolproofing, I'm totally against suicide-proofing: if someone wants to doom their own page, they don't need the specs' help to do so, there are plenty of ways; so adding complexity to specs to deal with such cases is just a waste.

> I say this despite really wanting Selectors to succeed (disclosure: I'm
> one of the editors of the Selectors specification and spent years working
> on its test suite).

I am aware of your work on the Selectors spec and tests, and that's why I value your feedback on this area so much.

> I think CRDF has a bright future in doing the kind of thing GRDDL does,

I'm not sure about what GRDDL does: I just took a look through the spec, and it seems to me that it's just an overcomplication of what XSLT can already do; so I'm not sure whether I should take that statement as a good or a bad thing.

> and in extracting data from pages that were written by authors who did not
> want to provide semantic data (i.e. screen scraping).

Now that you mention it, I have to agree... I hadn't noticed that before, but I'll probably make some room for this detail in the document's intro and/or the examples.

> It's an interesting way of converting, say, Microformats to RDF.

The ability to convert Microformats to RDF was intended (although not fully achieved: some "bad" content would be treated differently between CRDF and Microformats); and in the same way CRDF also provides the ability to define decentralized Microformats.org-like vocabularies (I'm not sure if referring to these as "microformats" would still be appropriate).

Once again, I want to thank you for your feedback.
Besides some fixes, your mail has also convinced me to add some clarifications to the document for some recurrent misconceptions (for example, CRDF doesn't require, nor even encourage, taking all the semantics out of the main document: semantics should be kept as close as possible to the content as long as this doesn't force redundancy/repetition).

Regards,
Eduard Pascual
Received on Wednesday, 10 June 2009 03:03:50 UTC