- From: Eduard Pascual <herenvardo@gmail.com>
- Date: Thu, 21 May 2009 13:26:54 +0200
On Wed, May 20, 2009 at 6:56 PM, Toby Inkster <mail at tobyinkster.co.uk> wrote:
> Given that one of the objections people cite with RDFa is complexity,
> I'm not sure how this resolves things. It seems twice as complicated to
> me. It creates fewer new attributes, true, but number of attributes
> themselves don't create much confusion.

The reduced number of attributes in CRDF is not aimed at reducing complexity, but at a separate issue: it is easier for a host language to add a rel value for <link>s and an extra attribute with no predefined name than the bunch of attributes RDFa defines. Actually, there have been some complaints [1] about why HTML5 should restrain itself from using quite useful attribute names such as "content" or "resource", just because RDFa decided to use them without giving non-X HTML a thought.

In other words: currently, it should be enough for RDFa parsers to ignore non-X HTML content (or, more specifically, documents with no default xmlns in <body>, so they can also cope with the XHTML1.1+RDFa served as text/html aberration, which is wrong no matter how you look at it). If RDFa were taken into HTML5, then parsers would also have to care about non-X documents, which binds HTML to not use these attribute names for any future extension (actually, as pointed out in Ian's mail referenced above, @content has already been used on <meta> since HTML4, so this can't even be fulfilled).

CRDF takes a *less intrusive* approach: it minimizes the number of attributes, and even lets the host language choose the name for them; the only requirement is that they are defined in the spec as "CRDF inline stuff" (the document suggests one wording for this, but explicitly allows for any equivalent wording). The goal of having fewer attributes, hence, is not to be *simpler*, but to be *less intrusive*; and the referenced mail is the main reason to want to be so.
On the simplicity/complexity debate, I'll point out that there isn't a goal to make things much simpler with CRDF (although there is a goal to not make them more complex than needed). Also, keep in mind that the document is at quite an early stage: many things are still too vaguely defined, and will become clearer once it matures. Again, let me insist that the goal of these early versions is to describe the idea and concept, and to draw feedback about it: it is *not* a spec, and there are many details that are just left implicit or even undefined yet. I'll make sure to clearly state the design goals of CRDF in the next iteration of the document, to avoid confusion.

> e.g. which is a simpler syntax:
>
> <a href="http://foo.example.com/"
>    ping="http://tracker.example.com/">Foo</a>
>
> or:
>
> <a href="primary:url('http://foo.example.com/');
>          secondary:url('http://tracker.example.com/');">Foo</a>

Like Tab, I think this example is completely unrelated. Fortunately, I could understand your point without it ;-)

> Stuffing multiple discrete pieces of information makes things harder for
> parsing, harder for authoring tools and harder for authors.

Parsing a @crdf (or whatever it gets named) attribute shouldn't be much harder than parsing a @style attribute. Furthermore, a good deal of CSS parsing code can be reused to build CRDF parsers. Similarly, authoring tools that already handle CSS styling may reuse a good deal of that code to handle CRDF metadata; and authors may apply most concepts from CSS (such as Selectors, or the property:value syntax) to CRDF.

> In RDFa, each attribute performs a simple role - e.g. @rel specifies the
> relationship between two resources; @rev specifies the relationship in
> the reverse direction; @content allows you to override the
> human-readable text of an element. Combining these into a single
> attribute would not make things simpler.

Nope, it doesn't. It doesn't have to make things much more complex either.
But it makes the format easier to integrate into the host language, since it requires fewer changes and such changes are more flexible.

Now, let's speak about simplicity: CRDF gives you two ways to define the values of properties (or, actually, two variants of the same way): the CSSish property:value syntax and the shorthand syntax that just omits the value and defaults it to the "contents" keyword. RDFa can define a value with @href, or with @content, or have it implicit; which one applies depends on whether the rel/rev or property attributes are used, and on whether the content attribute is present or not. That makes three ways of defining property values, which depend on up to four different attributes, against CRDF's two ways, which differ only in whether the value is given or not.

Next, RDFa may define the property itself in @property, in @rel, or in @rev; while CRDF always defines them the same way.

For subjects, OMG, both @about and @src may define them, or they may be "inherited" from parent elements; while in CRDF they are always defined via @|subject or inherited through the well-defined CSS cascading rules (actually, they may also be re-defined for "reversed" properties, but that part is being considered for removal since it's unclear that it is really needed and it's too complex).

On resources, I'd prefer to hold this, because implicit and explicit types will completely handle that in CRDF (the resource() notation will disappear in the next iteration of the document), but they are still too vaguely defined in the document. In summary, anything that is defined as a URI (either by the implicit type rules or by the user's explicit typing) will be a resource; after all, a URI is a Uniform *Resource* Identifier.

On datatypes, it seems quite like a tie: RDFa's typeof and datatype vs.
CRDF's @|typeof and explicit type syntax; maybe a bit more complex in CRDF due to the implicit type rules, but this is just a convenience: you can explicitly define the type of everything if you want to, and most cases will be quite obvious anyway. OTOH, RDFa also has its own default type rules (for example, values are normally string literals, but when they are taken from @href they are URIs/resources).

Simplicity is actually just a secondary goal for CRDF, but it's pursued whenever it can be achieved without compromising the primary goals.

> Looking at the comparison given in section 4.2, CRDF appears to suffer
> from several disadvantages compared to RDFa:
>
> 1. It's pretty ugly.

This is completely subjective, so I won't go any deeper into it. If you are capable of translating that into specific, objective issues, then I'll be eager to hear about them and try to address them.

> 2. It's more verbose - though only by eleven bytes by my reckoning, so
> this isn't a major issue.

This is a small cost. It is small because it's only eleven characters (the actual number of bytes depends on the encoding chosen), and because it only applies in quite specific scenarios: when all metadata needs to be inlined. In exchange for that cost, we get other benefits (primarily the iguana collection case: take the example in 4.3 and try to write the RDFa equivalent; then check how much the RDFa code grows when adding the 4th, 5th, etc. specimens to the collection).

> 3. It divorces the CURIE prefix definitions from the use of CURIEs in
> the markup. This makes it more vulnerable to copy-paste problems. (As I
> understand <link rel="metadata"> in the proposal, CURIE prefix
> definitions can even be separated out into an external file. This
> obscures them greatly and will certainly be a cause of copy-paste
> issues!)

You either misunderstood the way namespaces are handled in CRDF, or are deliberately misrepresenting it.
I prefer to assume the former rather than the latter, so let me clarify some things:

1st: RDFa allows you to define all the prefixes in <body> if you want. This is not too different from CRDF prefixes defined within <head>.

2nd: CRDF allows you to put some prefixes in a <script> as close as you need to the code that will use them. There is a whole subsection in the document (3.4. Additional considerations, very likely to be renamed in further versions to something more descriptive) that deals with the scoping of such <script> elements so, if you were copying content from multiple sources, it would be unlikely for

> 4. It's ugly. I'm sorry, I just can't emphasise that enough.

That's subjective. I just can't emphasize that enough. Again, if you can turn that into specific issues, I'll do my best to give them the proper attention.

> Apart from the fact that *sometimes* RDFa involves a bit of repetition,
> I don't see what problems this proposal is actually supposed to solve.

I'll make sure to clearly state the goals and use cases CRDF attempts to address in the next version of the document. Until then, here is a summary:

RDFa *sometimes* involves *a bit* of repetition; but in several cases, it involves *a lot* of *error-prone* repetition. I want to emphasize the "error-prone" aspect because, IMO, it is critical: if there are errors in the metadata, then automated extraction is not reliable, so manual extraction is required and the whole purpose of machine-readable metadata is lost. In other words, RDFa's error-proneness in some scenarios makes it too brittle a solution. Also, this makes maintenance of those pages terribly painful: a minor change in structure may require dozens of identical changes across the document, for example. There is also the issue of bloating the markup, even if that's more secondary.
After reviewing many examples of RDFa vs CRDF vs many other options, I have come up with an abstraction that I think is quite accurate:

- When the semantics are conveyed to the user via prose, the best way to convey these semantics to the machine is with inline metadata. In these cases, RDFa does a good enough job because it's entirely focused on inline metadata.
- When semantics are conveyed to the user via structure/layout (for example, with tables or even lists), inline metadata becomes inadequate, due to the error-proneness and brittleness issue described above. If a table only needs to state its column headers (which are the element that conveys the semantics to the user) once per column, it seems obvious that the best approach would be to allow the author to state these semantics also only once. A selector-based approach like CRDF or EASE is one way to achieve this. There may be other ways, but I don't know of any which doesn't heavily rely on HTML's semantics.
- On the huge web, both cases should be taken into account. RDFa handles the first (semantics-in-prose) case, but can't properly deal with the second (semantics-in-structure) one. EASE, on the other hand, can do an acceptable job on the second case, but can't handle the first at all. This is why CRDF attempts to deal with both scenarios.

Another problem is intrusion in the host language. In the RDFa case, the issue comes directly from its X-centrism (not to blame the RDFa guys, keeping in mind that the format was built at a time when the "XML is the future" idea was quite a common belief), to the point that it's a non-issue for XHTML itself: with XHTML1.1 and 2.0 modularization, the many attributes in RDFa can be defined in a separate module and, voilà, no conflict: you can use the RDFa module in your DTD, or you can use some other module that makes a different use of these attribute names.
For tag-soup HTML, RDFa attributes are too many croutons for the already loaded soup: they prevent the reuse of these names for future uses, and add extra crowding to the already crowded content model.

Still within the intrusion problem, there is the prefix issue: RDFa requires the host language to define namespace prefixes. Again, this is reasonable in an "XML is the future" world, since XML-based languages already have a built-in mechanism for this task; but it raises too many problems in non-X HTML.

For these reasons, CRDF attempts to minimize intrusion: it only requires one attribute, which the host language is free to name however it pleases, and it takes the prefix definition task on itself, so it doesn't burden the host. Actually, I find it quite beautiful how the prefix thing has, by itself, shielded CRDF against being mixed with CSS: CRDF parsers would ignore non-prefixed properties, while CSS parsers would puke at and ignore the prefixed ones.

> Repetition in practise seems to be something that page authors can deal
> with. We don't provide a mechanism for setting the src or alt attributes
> of multiple <img> elements which need to load the external image; or
> setting the class attribute of the third cell in every row of a table.

No, but we provide mechanisms to just avoid multiple <img> elements for the same image, or repeating the class attribute for the third cell in every row of a table, to follow your examples. For the table (the simpler case), why would you repeat the class over and over when you can select it via tr>td:nth-child(3)? (My apologies if I messed up the selector; I'm not really used to the :nth-whatever pseudo-classes, but I hope the idea is clear.) For the multiple img elements, the solution is still a draft, and you can find it at [2].
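As a rough illustration of the "state it once" idea behind tr>td:nth-child(3), here is a minimal Python sketch. The table markup and the third_cells helper are hypothetical, made up for this example; since the Python standard library has no CSS selector engine, the XPath step td[3] stands in for the selector, which is only equivalent for rows made entirely of <td> cells.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed table used only for illustration.
TABLE = """
<table>
  <tr><th>name</th><th>species</th><th>status</th></tr>
  <tr><td>Iggy</td><td>Iguana iguana</td><td>healthy</td></tr>
  <tr><td>Spike</td><td>Iguana delicatissima</td><td>sick</td></tr>
</table>
"""


def third_cells(markup):
    """Return the text of the third <td> in every row of the table.

    The path "./tr/td[3]" is written once and applies to all rows,
    much like the CSS selector tr>td:nth-child(3) would. Note that
    td[3] counts only <td> siblings, whereas :nth-child(3) counts all
    element siblings, so the two differ for rows mixing <th> and <td>.
    """
    root = ET.fromstring(markup)
    return [cell.text for cell in root.findall("./tr/td[3]")]


if __name__ == "__main__":
    # The header row has no <td> cells, so only the data rows match.
    print(third_cells(TABLE))
```

Adding more rows to the table requires no change to the path, which is the point being made about selectors versus per-cell attributes.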
However, I want to point out that content repetition and code repetition are not the same: if you are authoring a page with repeated content (for whatever reason), you can see that the content shows up properly in your browser. For metadata code, this is trickier, since you aren't dealing with something intended for humans to deal with even if you review the output of an RDFa parser

> So again, while I can see that this proposal would "work", in what way
> is it supposed to be preferable to RDFa?

1) It's less intrusive for non-X HTML languages (fewer attributes, no "xmlns issue").
2) It eases maintenance and reduces the chance of errors in documents where inline metadata is far from optimal.
3) It can still handle inline metadata, even though it's not limited to inlining.

On Thu, May 21, 2009 at 12:10 AM, Tab Atkins Jr. <jackalmage at gmail.com> wrote:
> On Wed, May 20, 2009 at 11:56 AM, Toby Inkster <mail at tobyinkster.co.uk> wrote:
>> [...]
>> 2. It's more verbose - though only by eleven bytes by my reckoning, so
>> this isn't a major issue.
>
> When used inline, it may be. It's not *intended* to be used inline,
> though - that's just there for the occasional case when you absolutely
> need to do so, just as @style is available but discouraged in favor of
> external CSS.

Please, let me state again that CRDF is intended to be used *both* inline and via selectors (be it in <script> or similar embedding elements, or <link>ed from an external file), since both uses have their own use cases and scenarios. The bit of extra verbosity in the inline case, when compared to an inline-only format such as RDFa, is just a small cost of this flexibility.

>> 3. It divorces the CURIE prefix definitions from the use of CURIEs in
>> the markup. This makes it more vulnerable to copy-paste problems. (As I
>> understand <link rel="metadata"> in the proposal, CURIE prefix
>> definitions can even be separated out into an external file.
>> This obscures them greatly and will certainly be a cause of copy-paste
>> issues!)
>
> If you're using inline CRDF, then yeah, the prefix definitions may be
> far from the content.

The point here is that in RDFa, the prefix definitions *may* also be far from the content that uses them. Both in RDFa and in CRDF, however, there is quite some flexibility to put them close to the content that uses them.

> The prefixes are defined globally for the document, and may appear anywhere.

Ehm. Please check "3.4. Additional considerations" in the doc: prefixes in CRDF *can* be defined globally, but may also be defined for a limited scope. Actually, this closely mimics the XMLish ability to scope "xmlns:" definitions to any arbitrary element (although it's a bit more limited, because it requires a <script> in CRDF).

> In practice, inline CRDF should be
> rare, and the prefixes should appear at the top of the .crdf file
> where they can be easily seen.

I wouldn't make judgments about how CRDF would be used in practice until/unless CRDF actually gets used in practice. I have already stated that inline CRDF is perfectly legitimate; and IMO it has a wider range of use cases than inline @style. Let me point out that, the same way it would be silly to put an "xmlns:" definition in the <body> tag for a prefix that only gets used once or twice, deep in the document, it would be equally silly to define prefixes within <head> or externally for a similar usage. Again, section 3.4 of the CRDF document is deliberately intended to deal with these cases, and it has been there since the first version posted on these lists (although I don't know for sure if the section number was the same). I'll definitely be giving it a more descriptive name in the next iteration of the document.

On Thu, May 21, 2009 at 1:29 AM, Toby A Inkster <mail at tobyinkster.co.uk> wrote:
> On 20 May 2009, at 23:10, Tab Atkins Jr. wrote:
>> We are going to have to massively disagree on this point.
>> ^_^ I love CSS syntax.
> So do I, but CRDF as defined is no more like CSS in terms of syntax than C
> or Perl are - they share the curly braces and semicolons, but not much else.

Ehm. What about Selectors? And the @namespace syntax, which is directly taken from CSS3 Namespaces? Or the classic property:value syntax? The calc(), attr(), and url() notations, maybe?

Sure, there are differences with CSS. There are enough differences that I put a rationale and justification for them in "2.2. Rationale on the differences with CSS syntax"; maybe you should review it. Keep in mind that, if you don't like the shorthand syntax, you are also allowed to explicitly put the "contents" keyword as much as you please.

Maybe you could complain about the "|" inside properties, but it's also a given: CSS3 Namespaces defines "|" for separating prefixes; it's just that CSS would only use it in selectors, while CRDF uses it in properties because, basically, CSS properties are not namespaced but RDF ones are. Would it be better to define yet another separator? Even if so, the colon is out of the question: that would be a nightmare for parsers, and horrendous to read.

If you complain about the "@|subject" pseudo-property syntax, I can agree on it being ugly. Do you have any better suggestion? The reasons to choose that syntax were:

1st: it needs to carry a "|" *or* use some other foolproofing mechanism (to ensure that things don't become weird if an author makes the mistake of putting the CSS and the CRDF in the same file).

2nd: to decide what to put before the "|", I thought that pseudo-properties are quite comparable to at-rules, so the "@" seemed a good choice.

Other options considered were to put nothing (so we'd have "|subject"), or to use a reserved prefix (having something like "crdf|subject"). I don't really like the reserved prefix idea, but if the "@" is that ugly, it may be taken as a "lesser evil". The idea of no namespace at all might work.
Originally, I discarded it because I wasn't sure how it would interact with default namespaces; but afterwards default namespaces were completely avoided, so I could fall back to it if people find it less ugly. Feedback on either of these alternatives, or any other suggestion, would be quite welcome. What's clear is that there is a need to define, at least, subjects for the triples, and their types.

>> It is rarely, if ever, necessary to set multiple <img> elements to the
>> same @src or @alt.
>
> I'm thinking of things like a table which has a check-mark column with a
> green tick image repeated all the way down, or a traffic-light indicator
> column with red, green and perhaps amber images indicating different
> statuses. I quite often see such things in web applications.

Let's take the traffic-light example, just because I find it clearer: doesn't [2] deal with that? For example:

<html>
<head>
<style type="text/css">
td.red { content: url(http://example.org/images/red.png) }
td.green { content: url(http://example.org/images/green.png) }
</style>
</head>
<body>
<table>
<tr><th>state</th><th>whatever</th></tr>
<tr><td class="red">Wrong</td><td>This has the red image</td></tr>
<tr><td class="green">Good</td><td>And this has the green image</td></tr>
</table>
</body>
</html>

In this case, you would need the "red" and "green" classes; but no matter what you do, you need some mechanism to distinguish between the "red" and "green" rows. Note also that if the images fail to load (for example, images disabled or network issues), the browser would fall back and show the contents of the element, which works mostly like @alt would, but also allows additional markup. And if you want tooltips, just swap @class for @title, and change the selectors to td[title=red] and the same for green ;-).

So, in summary: repetition in general is perceived as a problem, and this is why efforts are being made to address it.
Regards,
Eduard Pascual

PS: BTW, replying to three emails at once is *not* sane

References:
[1] http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-May/019635.html
[2] http://www.w3.org/TR/css3-content/#replacedContent
Received on Thursday, 21 May 2009 04:26:54 UTC