- From: Eduard Pascual <herenvardo@gmail.com>
- Date: Thu, 21 May 2009 13:57:15 +0200
Interesting.
Despite my PoV against the microdata proposal, I've taken a look at it and
found a minor typo: within "5.4.1 vCard", by the end of the "n" property
description, the spec reads:
"The value of the fn property a name in one of the following forms:"
Shouldn't it read:
"The value of the fn property is a name in one of the following forms:"?
Maybe this will grant me a seat for posterity in the acknowledgements
section =P.

On Wed, May 20, 2009 at 1:07 AM, Ian Hickson <ian at hixie.ch> wrote:
>
> Some of the use cases I collected from the e-mails sent in over the past
> few months were the following:
>
>   USE CASE: Exposing contact details so that users can add people to their
>   address books or social networking sites.
>
>   SCENARIOS:
>     * Instead of giving a colleague a business card, someone gives their
>       colleague a URL, and that colleague's user agent extracts basic
>       profile information such as the person's name along with references to
>       other people that person knows and adds the information into an
>       address book.
>     * A scholar and teacher wants other scholars (and potentially students)
>       to be able to easily extract information about who he is to add it to
>       their contact databases.
>     * Fred copies the name of one of his Facebook friends and pastes it
>       into his OS address book; the contact information is imported
>       automatically.
>     * Fred copies the name of one of his Facebook friends and pastes it
>       into his Webmail's address book feature; the contact information is
>       imported automatically.
>     * David can use the data in a web page to generate a custom browser UI
>       for including a person in our address book without using brittle
>       screen-scraping.
>
>   REQUIREMENTS:
>     * A user joining a new social network should be able to identify himself
>       to the new social network in a way that enables the new social network
>       to bootstrap his account from existing published data (e.g. from
>       another social network) rather than having to re-enter it, without the
>       new site having to coordinate (or know about) the pre-existing site,
>       without the user having to give either site's credentials to the other,
>       and without the new site finding out about relationships that the user
>       has intentionally kept secret.
>       (http://w2spconf.com/2008/papers/s3p2.pdf)
>     * Data should not need to be duplicated between machine-readable and
>       human-readable forms (i.e. the human-readable form should be
>       machine-readable).
>     * Shouldn't require the consumer to write XSLT or server-side code to
>       read the contact information.
>     * Machine-readable contact information shouldn't be on a separate page
>       from human-readable contact information.
>     * The information should be convertible into a dedicated form (RDF,
>       JSON, XML, vCard) in a consistent manner, so that tools that use this
>       information separate from the pages on which it is found have a
>       standard way of conveying the information.
>     * Should be possible for different parts of a contact to be given in
>       different parts of the page. For example, a page with contact details
>       for people in columns (with each row giving the name, telephone
>       number, etc) should still have unambiguous grouped contact details
>       parseable from it.
>     * Parsing rules should be unambiguous.
>     * Should not require changes to HTML5 parsing rules.
>
>
>   USE CASE: Exposing calendar events so that users can add those events to
>   their calendaring systems.
>
>   SCENARIOS:
>     * A user visits the Avenue Q site and wants to make a note of when
>       tickets go on sale for the tour's stop in his home town. The site says
>       "October 3rd", so the user clicks this and selects "add to calendar",
>       which causes an entry to be added to his calendar.
>     * A student is making a timeline of important events in Apple's history.
>       As he reads Wikipedia entries on the topic, he clicks on dates and
>       selects "add to timeline", which causes an entry to be added to his
>       timeline.
>     * TV guide listings - browsers should be able to expose to the user's
>       tools (e.g. calendar, DVR, TV tuner) the times that a TV show is on.
>     * Paul sometimes gives talks on various topics, and announces them on
>       his blog. He would like to mark up these announcements with proper
>       scheduling information, so that his readers' software can
>       automatically obtain the scheduling information and add it to their
>       calendar. Importantly, some of the rendered data might be more
>       informal than the machine-readable data required to produce a calendar
>       event.
>     * David can use the data in a web page to generate a custom browser UI
>       for adding an event to our calendaring software without using brittle
>       screen-scraping.
>     * http://livebrum.co.uk/: the author would like people to be able to
>       grab events and event listings from his site and put them on their
>       site with as much information as possible retained. "The fantasy would
>       be that I could provide code that could be cut and pasted into someone
>       else's HTML so the average blogger could re-use and re-share my data."
>     * User should be able to subscribe to http://livebrum.co.uk/ then sort
>       by date and see the items sorted by event date, not publication date.
>
>   REQUIREMENTS:
>     * Should be discoverable.
>     * Should be compatible with existing calendar systems.
>     * Should be unlikely to get out of sync with prose on the page.
>     * Shouldn't require the consumer to write XSLT or server-side code to
>       read the calendar information.
>     * Machine-readable event data shouldn't be on a separate page from
>       human-readable dates.
>     * The information should be convertible into a dedicated form (RDF,
>       JSON, XML, iCalendar) in a consistent manner, so that tools that use
>       this information separate from the pages on which it is found have a
>       standard way of conveying the information.
>     * Should be possible for different parts of an event to be given in
>       different parts of the page. For example, a page with calendar events
>       in columns (with each row giving the time, date, place, etc) should
>       still have unambiguous calendar events parseable from it.
>     * Should be possible for authors to find out if people are reusing the
>       information on their site.
>     * Code should not be ugly (e.g. should not be mixed in with markup used
>       mostly for styling).
>     * There should be "obvious parsing tools for people to actually do
>       anything with the data (other than add an event to a calendar)".
>     * Solution should not feel "disconnected" from the Web the way that
>       calendar file downloads do.
>     * Parsing rules should be unambiguous.
>     * Should not require changes to HTML5 parsing rules.
>
>
>   USE CASE: Allow users to maintain bibliographies or otherwise keep track
>   of sources of quotes or references.
>
>   SCENARIOS:
>     * Frank copies a sentence from Wikipedia and pastes it in some word
>       processor: it would be great if the word processor offered to
>       automatically create a bibliographic entry.
>     * Patrick keeps a list of his scientific publications on his web site.
>       He would like to provide structure within this publications page so
>       that Frank can automatically extract this information and use it to
>       cite Patrick's papers without having to transcribe the bibliographic
>       information.
>     * A scholar and teacher wants other scholars (and potentially students)
>       to be able to easily extract information about what he has published
>       to add it to their bibliographic applications.
>     * A scholar and teacher wants to publish scholarly documents or content
>       that includes extensive citations that readers can then automatically
>       extract so that they can find them in their local university library.
>       These citations may be for a wide range of different sources: an
>       interview posted on YouTube, a legal opinion posted on the Supreme
>       Court web site, a press release from the White House.
>
>   REQUIREMENTS:
>     * Machine-readable bibliographic information shouldn't be on a separate
>       page from human-readable bibliographic information.
>     * The information should be convertible into a dedicated form (RDF,
>       JSON, XML, BibTeX) in a consistent manner, so that tools that use this
>       information separate from the pages on which it is found have a
>       standard way of conveying the information.
>     * Parsing rules should be unambiguous.
>     * Should not require changes to HTML5 parsing rules.
>
>
> The first two use cases can basically be done today using the hCard and
> hCalendar Microformats, but the parsing rules for these Microformats are
> somewhat vague, and they aren't easily extensible without hardcoding
> extensions into parsers.
>
> I propose, therefore, to take the hCard and vCalendar vocabularies, and
> recast them onto the new microdata model.
>
>   http://www.whatwg.org/specs/web-apps/current-work/#vcard
>   http://www.whatwg.org/specs/web-apps/current-work/#vevent
>
> I have used the knowledge and experience collected and carefully
> documented by the Microformats team on their wiki, and written a direct
> mapping of those vocabularies to microdata, along with very explicit
> definitions for how to convert this data to vCard and iCalendar files,
> something which was lacking in the hCard and hCalendar definitions:
>
>   http://www.whatwg.org/specs/web-apps/current-work/#vcard-0
>   http://www.whatwg.org/specs/web-apps/current-work/#icalendar
>
> The third use case requires a vocabulary for citations, which isn't
> something for which a widely deployed solution exists in text/html yet.
>
> There are a large number of options:
>
>  - Refer
>  - RIS
>  - BibTeX
>  - Metadata Object Description Schema
>  - Z39.80
>  - Dublin Core and variants thereof
>  - part of Journal Publishing Tag Set Tag Library
>  - part of XML Resume
>  - part of OOXML
>  - part of ODF
>  - part of DocBook
>  - the Ann Arbor District Library XML format
>  - SRU
>  - My alma mater's format (University of Bath reference type)
>  - Bibliontology
>  - The Citation Oriented Bibliographic Vocabulary
>  - ISBD
>  - OpenURL COinS
>
> ...and many more.
>
> A case could probably be made for any one of these. Based on availability
> of tools, simplicity in the format (just name-value pairs vs deeply nested
> trees of typed data), actual use in citation-happy fields, extensibility,
> use of an understandable vocabulary (e.g. "author" vs "%A"), etc, I ended
> up picking the BibTeX vocabulary. It isn't perfect; for example, it's not
> going to be a great solution for citing YouTube clips yet. But since it is
> relatively easy to extend (and indeed, it has historically been extended
> by several groups), it seems like if this feature gets good adoption, we
> will be able to extend it to support more types.
>
> Thus, BibTeX vocabulary for microdata:
>
>   http://www.whatwg.org/specs/web-apps/current-work/#bibtex
>
> Exporting microdata to BibTeX:
>
>   http://www.whatwg.org/specs/web-apps/current-work/#bibtex-0
>
>
> The vocabularies and exports are pretty much useless on their own, though.
> There are two ways that make this actually useful:
>
>  - There's a scripting API that exposes the microdata and so people can
>    write generic client-side scripts to expose data on the page, and
>
>  - User agents are now required to export vCard, iCalendar, and BibTeX
>    when someone drags a selection that includes data marked up with those
>    vocabularies.
>
> The latter in particular is IMHO very important. Both of these features
> require browser implementation support, which IMHO is important to making
> anything like this work widely (and has been a sore point with previous
> solutions in this space).
>
>
> I shall now go through the scenarios and requirements to show how they can
> now be addressed.
>
>   USE CASE: Exposing contact details so that users can add people to their
>   address books or social networking sites.
>
>   SCENARIOS:
>     * Instead of giving a colleague a business card, someone gives their
>       colleague a URL, and that colleague's user agent extracts basic
>       profile information such as the person's name along with references to
>       other people that person knows and adds the information into an
>       address book.
>
> This is possible today without using HTML, just make the URL point to a
> vCard text/directory resource.
>
>
>     * A scholar and teacher wants other scholars (and potentially students)
>       to be able to easily extract information about who he is to add it to
>       their contact databases.
>
> This is now easy -- given microdata with a vCard, the scholars need but
> drag that information to their contact databases, and assuming those
> contact databases support vCard, they can import the information directly.
> Alternatively, a script can be written in less than 200 lines of code to
> convert the microdata to vCard (or other formats) for direct download. (I
> wrote proof-of-concept scripts using the APIs in the spec to export vCard,
> vEvent, and BibTeX data. The vCard one was about 140 lines; the BibTeX one
> was about 60 lines.
> The vEvent one is in the spec as an example -- search
> for getCalendar() -- and is less than 40 lines.)
>
>
>     * Fred copies the name of one of his Facebook friends and pastes it
>       into his OS address book; the contact information is imported
>       automatically.
>
> Assuming the OS address book supports vCard, this is now supported
> natively -- all Facebook has to do is encode the data as vCard microdata.
>
>
>     * Fred copies the name of one of his Facebook friends and pastes it
>       into his Webmail's address book feature; the contact information is
>       imported automatically.
>
> If his Webmail supports HTML5 drag and drop (copy-and-paste is defined in
> terms of drag-and-drop), then an HTML5 user agent will include all the
> microdata of the copied selection in a JSON blob, including the vCard
> data. (Actual vCard will also be included.) This is now thus automatically
> supported assuming that the sites both use the same vocabulary, implement
> the drag-and-drop API, and the user has an HTML5 browser.
>
>
>     * David can use the data in a web page to generate a custom browser UI
>       for including a person in our address book without using brittle
>       screen-scraping.
>
> The spec defines exactly how to get a vCard out of a random HTML page, so
> screen-scraping should no longer be necessary.
>
>
>   REQUIREMENTS:
>     * A user joining a new social network should be able to identify himself
>       to the new social network in a way that enables the new social network
>       to bootstrap his account from existing published data (e.g. from
>       another social network) rather than having to re-enter it, without the
>       new site having to coordinate (or know about) the pre-existing site,
>       without the user having to give either site's credentials to the other,
>       and without the new site finding out about relationships that the user
>       has intentionally kept secret.
>       (http://w2spconf.com/2008/papers/s3p2.pdf)
>
> Assuming both sites support the same vocabulary and can identify people
> uniquely somehow, this is now possible using microdata (just as it has
> been possible using custom microformat-like vocabularies before, or RDFa
> and other embedded data formats before). Whether sites will support this
> is up to the sites in question; I see no way to force the issue.
>
> As far as I can tell the privacy problem listed above is not intrinsically
> solved by the microdata solution. I cannot find a solution to those
> problems at the HTML level; they seem inherently application-bound.
>
>
>     * Data should not need to be duplicated between machine-readable and
>       human-readable forms (i.e. the human-readable form should be
>       machine-readable).
>
> By and large, this is met. For some of the more esoteric vEvent features
> (like repeating rules) I have opted for not really supporting them
> natively, but just allowing authors to use the vEvent rules directly. This
> is not really an issue as far as I can tell because those features aren't
> widely used (and even seem to be getting dropped in the newer version of
> iCalendar).
>
>
>     * Shouldn't require the consumer to write XSLT or server-side code to
>       read the contact information.
>
> While it's possible for people to write custom code to process this data,
> the spec requires browsers to support this natively, making this
> unnecessary for these vocabularies.
>
>
>     * Machine-readable contact information shouldn't be on a separate page
>       from human-readable contact information.
>
> This requirement is met.
>
>
>     * The information should be convertible into a dedicated form (RDF,
>       JSON, XML, vCard) in a consistent manner, so that tools that use this
>       information separate from the pages on which it is found have a
>       standard way of conveying the information.
>
> I haven't defined a way to convert this data to XML, but I have provided
> explicit ways to convert to JSON, RDF, and vCard.
>
>
>     * Should be possible for different parts of a contact to be given in
>       different parts of the page. For example, a page with contact details
>       for people in columns (with each row giving the name, telephone
>       number, etc) should still have unambiguous grouped contact details
>       parseable from it.
>
> Using subject="", this is possible.
>
>
>     * Parsing rules should be unambiguous.
>
> I hope the parsing rules described in the spec are clear enough. Please
> let me know if there are any problems.
>
>
>     * Should not require changes to HTML5 parsing rules.
>
> The HTML5 parsing rules did not change.
>
>
>   USE CASE: Exposing calendar events so that users can add those events to
>   their calendaring systems.
>
>   SCENARIOS:
>     * A user visits the Avenue Q site and wants to make a note of when
>       tickets go on sale for the tour's stop in his home town. The site says
>       "October 3rd", so the user clicks this and selects "add to calendar",
>       which causes an entry to be added to his calendar.
>
> As demonstrated in the spec, it is now relatively easy to expose this data
> and requires little code to convert this data into a form supported by
> most calendars. In addition, this can also be supported using
> copy-and-paste or drag-and-drop if the source, destination, and browser
> all cooperate according to the spec.
>
>
>     * A student is making a timeline of important events in Apple's history.
>       As he reads Wikipedia entries on the topic, he clicks on dates and
>       selects "add to timeline", which causes an entry to be added to his
>       timeline.
>
> I couldn't find a way to address this as described unless Wikipedia and
> the timeline utility cooperated directly. (Drag-and-drop and
> copy-and-paste cases can be easily supported, though.)
>
>
>     * TV guide listings - browsers should be able to expose to the user's
>       tools (e.g. calendar, DVR, TV tuner) the times that a TV show is on.
>
> Assuming TV guide listings can be described in vEvent form, this is now
> possible using drag-and-drop and copy-and-paste.
>
>
>     * Paul sometimes gives talks on various topics, and announces them on
>       his blog. He would like to mark up these announcements with proper
>       scheduling information, so that his readers' software can
>       automatically obtain the scheduling information and add it to their
>       calendar. Importantly, some of the rendered data might be more
>       informal than the machine-readable data required to produce a calendar
>       event.
>
> This seems easily handled now.
>
>
>     * David can use the data in a web page to generate a custom browser UI
>       for adding an event to our calendaring software without using brittle
>       screen-scraping.
>
> The example in the spec demonstrates that this is now possible with
> relatively little code.
>
>
>     * http://livebrum.co.uk/: the author would like people to be able to
>       grab events and event listings from his site and put them on their
>       site with as much information as possible retained. "The fantasy would
>       be that I could provide code that could be cut and pasted into someone
>       else's HTML so the average blogger could re-use and re-share my data."
>
> I have included an example in the spec from livebrum.co.uk showing how
> this is possible.
>
>
>     * User should be able to subscribe to http://livebrum.co.uk/ then sort
>       by date and see the items sorted by event date, not publication date.
>
> This isn't directly possible, but if a tool exists that can sort event
> data by date, then given the event data it seems possible to do this
> easily.
> For example, a Web Calendar product could support parsing
> microdata vEvents out of a Web page and then could offer to subscribe to
> such a page as a feed.
>
>
>
>   REQUIREMENTS:
>     * Should be discoverable.
>
> This isn't met by the microdata vEvent vocabulary intrinsically. I expect
> that a convention will arise where people put little icons near their
> microdata saying "look, we have vEvent data you can drag to your
> calendar!" or some such.
>
>
>     * Should be compatible with existing calendar systems.
>
> The vEvent part of iCalendar is well established, so this seems met, at
> least in principle. The details (e.g. drag and drop support) probably need
> some work.
>
>
>     * Should be unlikely to get out of sync with prose on the page.
>
> By making the prose on the page the source for the microdata, this seems
> resolved.
>
>
>     * Shouldn't require the consumer to write XSLT or server-side code to
>       read the calendar information.
>
> This is mostly met in the same way as for contact data.
>
>
>     * Machine-readable event data shouldn't be on a separate page from
>       human-readable dates.
>
> This is achieved using inline microdata.
>
>
>     * The information should be convertible into a dedicated form (RDF,
>       JSON, XML, iCalendar) in a consistent manner, so that tools that use
>       this information separate from the pages on which it is found have a
>       standard way of conveying the information.
>
> Output in all those formats except raw XML is explicitly supported in the
> spec.
>
>
>     * Should be possible for different parts of an event to be given in
>       different parts of the page. For example, a page with calendar events
>       in columns (with each row giving the time, date, place, etc) should
>       still have unambiguous calendar events parseable from it.
>
> subject="" supports this.
>
>
>     * Should be possible for authors to find out if people are reusing the
>       information on their site.
>
> This isn't met. I couldn't find a good way to do this. When JavaScript is
> enabled, drag-and-drop, copy-and-paste, and other mechanisms can be
> detected and logged via script, but really there's no good way to detect
> all uses of microdata. (Providing a ping=""-like feature for this seems
> like overkill and wouldn't help with non-end-user use anyway.)
>
>
>     * Code should not be ugly (e.g. should not be mixed in with markup used
>       mostly for styling).
>
> This appears to be met.
>
>
>     * There should be "obvious parsing tools for people to actually do
>       anything with the data (other than add an event to a calendar)".
>
> There aren't any obvious tools yet, but since two separate implementations
> arose in less than 24 hours from the point where the microdata stuff was
> released, it seems like this will prove easy enough to do.
>
>
>     * Solution should not feel "disconnected" from the Web the way that
>       calendar file downloads do.
>
> This seems met.
>
>
>     * Parsing rules should be unambiguous.
>     * Should not require changes to HTML5 parsing rules.
>
> The same applies here as with vCard.
>
>
>   USE CASE: Allow users to maintain bibliographies or otherwise keep track
>   of sources of quotes or references.
>
>   SCENARIOS:
>     * Frank copies a sentence from Wikipedia and pastes it in some word
>       processor: it would be great if the word processor offered to
>       automatically create a bibliographic entry.
>
> This will require new code in the word processor, but the information, in
> an HTML5-compliant browser according to this proposal, would include the
> information required to do this.
>
>
>     * Patrick keeps a list of his scientific publications on his web site.
>       He would like to provide structure within this publications page so
>       that Frank can automatically extract this information and use it to
>       cite Patrick's papers without having to transcribe the bibliographic
>       information.
>
> This seems to be handled directly now if the page is written using the
> BibTeX vocabulary.
>
>
>     * A scholar and teacher wants other scholars (and potentially students)
>       to be able to easily extract information about what he has published
>       to add it to their bibliographic applications.
>
> This seems met in the same way.
>
>
>     * A scholar and teacher wants to publish scholarly documents or content
>       that includes extensive citations that readers can then automatically
>       extract so that they can find them in their local university library.
>       These citations may be for a wide range of different sources: an
>       interview posted on YouTube, a legal opinion posted on the Supreme
>       Court web site, a press release from the White House.
>
> Not all of these types are immediately supported by the BibTeX vocabulary.
> I recommend that we extend the BibTeX set over time if this feature gains
> a critical mass.
>
>
>   REQUIREMENTS:
>     * Machine-readable bibliographic information shouldn't be on a separate
>       page from human-readable bibliographic information.
>
> This is met.
>
>
>     * The information should be convertible into a dedicated form (RDF,
>       JSON, XML, BibTeX) in a consistent manner, so that tools that use this
>       information separate from the pages on which it is found have a
>       standard way of conveying the information.
>
> This is met explicitly for three of those types; for other types it can
> be done easily enough also though it is not defined in the spec.
>
>
>     * Parsing rules should be unambiguous.
>     * Should not require changes to HTML5 parsing rules.
>
> These are met in the same way as with vCard and vEvent microdata.
>
>
> In conclusion, to address these use cases and scenarios I've introduced
> three vocabularies based on past practices -- vCard, vEvent, and BibTeX --
> to the HTML5 specification, and I've defined how these vocabularies work
> in the context of the drag-and-drop model, which I believe is the core
> part of this proposal that has been lacking in other proposals previously.
>
>
> A number of further use cases remain to be examined, including one with
> scenarios regarding validating custom vocabularies and allowing editors to
> provide help with custom vocabularies. I will send further e-mail next
> week as I address them.
>
> --
> Ian Hickson               U+1047E                )\._.,--....,'``.    fL
> http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
> Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
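To make the "convert the microdata to vCard" step in the quoted message more concrete, here is a minimal Python sketch. It is not the conversion algorithm defined in the spec. It assumes a hypothetical item that has already been extracted into a dict, mapping vCard property names (such as fn, tel, note) to lists of plain-text values. The sketch escapes TEXT values in the vCard 3.0 (RFC 2426) style and joins content lines with CRLF. It deliberately ignores structured properties such as n and adr, parameters such as TYPE=, and line folding, all of which a complete converter would need to handle.

```python
from __future__ import annotations


def escape_vcard_text(value: str) -> str:
    """Escape a TEXT value in the style of RFC 2426 (vCard 3.0)."""
    return (
        value.replace("\\", "\\\\")  # backslash first, so later escapes are not doubled
        .replace(",", "\\,")
        .replace(";", "\\;")
        .replace("\r\n", "\\n")      # normalise CRLF before handling lone LF
        .replace("\n", "\\n")
    )


def item_to_vcard(props: dict[str, list[str]]) -> str:
    """Serialise a hypothetical, already-extracted item as a vCard 3.0 object.

    `props` maps property names (e.g. "fn", "tel") to lists of plain-text
    values. Structured properties (n, adr), parameters such as TYPE=, and
    line folding are not handled, and required properties are not checked.
    """
    lines = ["BEGIN:VCARD", "VERSION:3.0"]
    for name, values in props.items():
        for value in values:
            lines.append(f"{name.upper()}:{escape_vcard_text(value)}")
    lines.append("END:VCARD")
    # vCard content lines are delimited by CRLF.
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    card = item_to_vcard({"fn": ["Jane Doe"], "note": ["Talks on a, b; c"]})
    print(card, end="")
```

For the example item, the NOTE line comes out as `NOTE:Talks on a\, b\; c`. Doing the escaping in a single helper keeps the per-property loop trivial. A real exporter would instead branch on the property name, because structured properties use unescaped semicolons as component separators.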
Received on Thursday, 21 May 2009 04:57:15 UTC