- From: Philip Jägenstedt <philipj@opera.com>
- Date: Thu, 12 Nov 2009 03:23:54 +0100
I've been playing with the microdata DOM APIs again, continuing the JavaScript experimental implementation <http://gitorious.org/microdatajs>. It's not small or elegant, but at least some spec issues have come up in the process.

What is the http://www.w3.org/1999/xhtml/microdata# URI? Just leftovers from earlier revisions to the spec?

Why are the algorithms for extracting RDF gone? All that's left is the book example with the equivalent Turtle, but it would be nice if it were actually defined how to extract RDF. The same for the JSON stuff, was that no good?

http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html#associating-names-with-items

"Otherwise, if one of the other elements in pending is an ancestor element of candidate, and that element is scope, then remove candidate from pending."

"Otherwise, if one of the other elements in pending is an ancestor element of candidate, and that element also has scope as its nearest ancestor element with an itemscope attribute specified, then remove candidate from pending."

The intention of these requirements seems to be to eliminate redundant elements in pending, but a comment on the intention of each in the spec would be helpful as it's quite cryptic right now.

http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html#microdata-dom-api

itemtype and itemid are both URL attributes and therefore when getting itemType and itemId relative URLs should be resolved (even if only absolute URLs are valid). Correct?

itemprop and itemref are both "unordered set of unique space-separated tokens", but in HTMLElement only itemProp is a DOMSettableTokenList while itemRef is a DOMString. This doesn't really make sense, so make itemRef a DOMSettableTokenList too?

From reading the spec it's not obvious (without following cross-references) that itemProp isn't just a plain string. An example using .itemProp.contains(name) or similar would make this more difficult to miss.
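To make the itemProp point concrete, here is a minimal sketch of how token-list semantics differ from plain-string semantics. It runs outside the browser, so parseTokens and containsToken are hypothetical stand-ins for what .itemProp.contains(name) would do on a real element; neither name comes from the spec, and the sample attribute value is made up.

```javascript
// Hypothetical stand-ins for token-list behaviour; not spec APIs.

// Split an attribute value on HTML space characters into unique tokens,
// keeping first-seen order.
function parseTokens(attributeValue) {
  var seen = {};
  var tokens = [];
  var parts = attributeValue.split(/[ \t\n\f\r]+/);
  for (var i = 0; i < parts.length; i++) {
    var token = parts[i];
    // Skip the empty strings produced by leading/trailing whitespace.
    if (token !== '' && !Object.prototype.hasOwnProperty.call(seen, token)) {
      seen[token] = true;
      tokens.push(token);
    }
  }
  return tokens;
}

// Whole-token membership test, the behaviour a contains() method implies.
function containsToken(attributeValue, name) {
  return parseTokens(attributeValue).indexOf(name) !== -1;
}

// Made-up attribute value with a duplicate and stray whitespace.
var itemprop = ' fn  nickname fn ';

// Treating the value as a plain string finds "name" inside "nickname".
var asString = itemprop.indexOf('name') !== -1;   // true

// Treating it as a token list does not.
var asTokens = containsToken(itemprop, 'name');   // false
```

A script that assumes a plain string and does a substring test would silently give the wrong answer here, which is the kind of mistake a spec example using contains() would help prevent.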
http://www.whatwg.org/specs/vocabs/current-work/#vcard

Having clickable cross-references in this spec would help a lot when reviewing!

Grammar: Let value *be* the result of collecting the first vCard subproperty named value in subitem.

"Let n1 be the value of the first property named family-name in subitem, or the empty string if there is no such property or the property's value is itself an item."

Why not use "collecting the first vCard subproperty" here? Not doing so had me trying to find how the two were different, but I couldn't find any differences given that the values are later escaped.

There's also the issue of how newlines from textContent values are escaped. Applying the vCard extraction algorithm to the spec example gives:

BEGIN:VCARD
PROFILE:VCARD
VERSION:3.0
SOURCE:http://foolip.org/microdatajs/demo/vcard.html
NAME:vCard demo
FN:Jack Bauer
PHOTO;VALUE=URI:http://foolip.org/microdatajs/demo/jack-bauer.jpg
ORG:Counter-Terrorist Unit;Los Angeles Division
ADR:;;10201 W. Pico Blvd.;Los Angeles;CA;90064;United States
GEO:34.052339;-118.410623
TEL;TYPE=work:+1 (310)\n 597 3781
URL;VALUE=URI:http://en.wikipedia.org/wiki/Jack_Bauer
URL;VALUE=URI:http://www.jackbauerfacts.com/
EMAIL:j.bauer at la.ctu.gov.invalid
TEL;TYPE=cell:+1 (310) 555\n 3781
NOTE:If I'm out in the field\, you may be better off\n contacting Chloe O'B
 rian if it's about\n work\, or ask Tony Almeida if\n you're interested in the CTU five-a-side football team we're trying\n to get going.
AGENT;VALUE=VCARD:BEGIN:VCARD\nPROFILE:VCARD\nVERSION:3.0\nSOURCE:http://fo
 olip.org/microdatajs/demo/vcard.html\nNAME:vCard demo\nEMAIL\;VALUE=URI:ma
 ilto:c.obrian at la.ctu.gov.invalid\nFN:Chloe O'Brian\nN:O'Brian\;Chloe\;\;\;
 \nEND:VCARD\n
AGENT:Tony Almeida
REV:2008-07-20T21:00:00+0100
TEL;TYPE=home:01632 960 123
N:Bauer;Jack;;;
END:VCARD

TEL and NOTE have line breaks that are there only because of how the HTML source is formatted. Importing this into Gmail preserves these line breaks, which looks quite broken.
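The stray \n sequences can be reproduced outside the browser. The sketch below is only an illustration: escapeText is a hypothetical helper that assumes the escaping amounts to backslash-escaping backslashes, commas and semicolons and turning line breaks into \n, collapseWhitespace is likewise a made-up name, and the sample strings are invented stand-ins for textContent values whose line breaks come from source formatting.

```javascript
// Hypothetical helpers; names and exact escaping rules are assumptions.

// Assumed escaping: backslash-escape "\", "," and ";", and replace
// CRLF, CR or LF with a literal backslash followed by "n".
function escapeText(value) {
  return value
    .replace(/\\/g, '\\\\')
    .replace(/,/g, '\\,')
    .replace(/;/g, '\\;')
    .replace(/\r\n|\r|\n/g, '\\n');
}

// Collapse every run of HTML space characters into a single space.
function collapseWhitespace(value) {
  return value.replace(/[ \t\n\f\r]+/g, ' ');
}

// Invented textContent values with line breaks from source formatting.
var tel = '+1 (310)\n 597 3781';
var note = 'out in the field, you may be better off\n  contacting Chloe';

// Escaping the raw text keeps the formatting line break as "\n".
var rawTel = escapeText(tel);                            // +1 (310)\n 597 3781

// Collapsing first yields a value without the stray break.
var cleanTel = escapeText(collapseWhitespace(tel));      // +1 (310) 597 3781
var cleanNote = escapeText(collapseWhitespace(note));
```

Collapsing before escaping throws away any line break an author actually intended, so it is only reasonable if text fields are not expected to carry meaningful formatting.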
Unless we expect text fields to contain meaningful formatting, perhaps simply collapsing all whitespace into a single space is OK? In the best of worlds <br> would be converted to \n, but I'm not sure if it's worth the trouble.

Finally on vCard, the final part of the extraction algorithm goes to great trouble to guess which part is the family name and which is the given name. This guess will be broken for transliterated East Asian names (CJKV that I know of, maybe others too). Just saying. Also, why is it important to explicitly add N:;;;; for organizations?

http://www.whatwg.org/specs/vocabs/current-work/#vevent

"Add an iCalendar line with the type name and the value value to output."

At this point value is undefined.

Given the algorithm for extracting iCal, it seems that dtstart and dtend must be specified using <time datetime="">, as it's only for time elements that the time stamps will be properly formatted (stripping - and :).

There are some errors in the example. I got it working by applying this diff:

--- vevent.js.orig 2009-11-11 10:52:37.000000000 +0100
+++ vevent.js 2009-11-11 23:54:15.000000000 +0100
@@ -1,3 +1,3 @@
 function getCalendar(node) {
-  while (node && (!node.nodeScope || !node.itemType == 'http://microformats.org/profile/hcalendar#vevent'))
+  while (node && (!node.itemScope || !node.itemType == 'http://microformats.org/profile/hcalendar#vevent'))
     node = node.parentNode;
@@ -26,3 +26,3 @@
       value = value.replace(/;/g, '\\;');
-      value = value.replace(/,/g, \\,');
+      value = value.replace(/,/g, '\\,');
       value = value.replace(/\n/g, '\\n');
@@ -31,3 +31,3 @@
       var name = prop.itemProp[nameIndex];
-      if (!name.match(':') && !name.match('.'))
+      if (!name.match(':') && !name.match('\\.'))
         calendar += name.toUpperCase() + parameters + ':' + value + '\r\n';

Perhaps /\./ would be better to make it clear that it's a regexp.

Also:

if (prop.date && prop.time)

date and time aren't properties on HTMLTimeElement, I don't know what this is.
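Two of the vEvent points above can be checked in plain JavaScript. This is only a sketch: isPlainName and toICalendarDateTime are hypothetical helper names, not functions from the spec example, and the sample property names and timestamp are made up.

```javascript
// A string passed to String.prototype.match is converted to a RegExp,
// so '.' means "any character" rather than a literal dot.
var unescapedDot = 'summary'.match('.') !== null;    // true: matches "s"
var escapedDot = 'summary'.match('\\.') !== null;    // false: no literal dot
var regexpLiteral = /\./.test('summary');            // false, and clearly a regexp

// Hypothetical helper with the intent of the corrected check:
// a name is "plain" if it contains neither ":" nor ".".
function isPlainName(name) {
  return !/:/.test(name) && !/\./.test(name);
}

// Hypothetical helper for the "stripping - and :" step applied to
// time element values.
function toICalendarDateTime(datetime) {
  return datetime.replace(/[-:]/g, '');
}

var plain = isPlainName('summary');                  // true
var dotted = isPlainName('com.example.custom');      // false
var stamp = toICalendarDateTime('2009-11-11T23:54:15Z'); // 20091111T235415Z
```

With the unescaped '.', the original condition rejects every non-empty property name, so no iCalendar lines would be emitted at all.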
Is there or should there be a DOM API for determining if a string is a valid date string other than implementing those algorithms in script?

http://www.whatwg.org/specs/vocabs/current-work/#licensing-works

What's the n in http://n.whatwg.org/work? If this URL is going to stick, it would be nice if there were also something to be seen at that page.

Also, the conversion to RDF section isn't really useful and seems to hide some assumptions about how the properties vocabulary should be prefixed with http://n.whatwg.org/work and how the http://www.w3.org/1999/xhtml/microdata# prefix is supposed to be used.

http://www.whatwg.org/specs/web-apps/current-work/multipage/urls.html#domtokenlist

The DOM intro box doesn't explain the return value for .toggle(), you have to consult the algorithm to figure it out.

I'm sure there will be more issues, but that's it for now.

--
Philip Jägenstedt
Received on Wednesday, 11 November 2009 18:23:54 UTC