- From: timeless <timeless@gmail.com>
- Date: Fri, 24 Apr 2009 14:35:05 +0300
The contacts section uses "event" where it meant "contact" (see its last
requirement, "different parts of an event").

On 4/23/09, Ian Hickson <ian at hixie.ch> wrote:
>
> [bcc'ed previous participants in this discussion]
>
> Earlier this year I asked for use cases that HTML5 did not yet cover, with
> an emphasis on use cases relating to semantic microdata. I list below the
> use cases and requirements that I derived from the response to that
> request, and from related discussions.
>
> I would appreciate it if people could review this list for errors or
> important omissions, before I go through the list to work out whether
> these use cases already have solutions, or whether we should have
> solutions for these use cases in HTML, or whether we should address these
> use cases with other technologies, or whatnot.
>
> I encourage people to focus on the use cases themselves, rather than on
> potential solutions; various solutions to all these use cases have already
> been argued in great detail and I have already read all those e-mails,
> blog comments, wiki faqs, etc, carefully.
>
> My primary concern right now is in making sure that these are indeed the
> use cases people care about, so that whatever we add to the spec can be
> carefully evaluated to make sure it is in fact solving the problems that
> we want solving.
>
> ==============================================================================
>
> Exposing known data types in a reusable way
>
> USE CASE: Exposing calendar events so that users can add those events to
> their calendaring systems.
>
> SCENARIOS:
>
>   * A user visits the Avenue Q site and wants to make a note of when
>     tickets go on sale for the tour's stop in his home town. The site says
>     "October 3rd", so the user clicks this and selects "add to calendar",
>     which causes an entry to be added to his calendar.
>   * A student is making a timeline of important events in Apple's
>     history.
>     As he reads Wikipedia entries on the topic, he clicks on dates and
>     selects "add to timeline", which causes an entry to be added to his
>     timeline.
>   * TV guide listings - browsers should be able to expose to the user's
>     tools (e.g. calendar, DVR, TV tuner) the times that a TV show is on.
>   * Paul sometimes gives talks on various topics, and announces them on
>     his blog. He would like to mark up these announcements with proper
>     scheduling information, so that his readers' software can
>     automatically obtain the scheduling information and add it to their
>     calendar. Importantly, some of the rendered data might be more
>     informal than the machine-readable data required to produce a calendar
>     event. Also of importance: Paul may want to annotate his event with a
>     combination of existing vocabularies and a new vocabulary of his own
>     design. (why?)
>   * David can use the data in a web page to generate a custom browser UI
>     for adding an event to our calendaring software without using brittle
>     screen-scraping.
>
> REQUIREMENTS:
>
>   * Should be discoverable.
>   * Should be compatible with existing calendar systems.
>   * Should be unlikely to get out of sync with prose on the page.
>   * Shouldn't require the consumer to write XSLT or server-side code to
>     read the calendar information.
>   * Machine-readable event data shouldn't be on a separate page than
>     human-readable dates.
>   * The information should be convertible into a dedicated form (RDF,
>     JSON, XML, iCalendar) in a consistent manner, so that tools that use
>     this information separate from the pages on which it is found have a
>     standard way of conveying the information.
>   * Should be possible for different parts of an event to be given in
>     different parts of the page. For example, a page with calendar events
>     in columns (with each row giving the time, date, place, etc) should
>     still have unambiguous calendar events parseable from it.
>
>
> ---------------------------------------------------------------------------
>
> USE CASE: Exposing contact details so that users can add people to their
> address books or social networking sites.
>
> SCENARIOS:
>
>   * Instead of giving a colleague a business card, someone gives their
>     colleague a URL, and that colleague's user agent extracts basic
>     profile information such as the person's name along with references to
>     other people that person knows and adds the information into an
>     address book.
>   * A scholar and teacher wants other scholars (and potentially students)
>     to be able to easily extract information about who he is to add it to
>     their contact databases.
>   * Fred copies the name of one of his Facebook friends and pastes it
>     into his OS address book; the contact information is imported
>     automatically.
>   * Fred copies the name of one of his Facebook friends and pastes it
>     into his Webmail's address book feature; the contact information is
>     imported automatically.
>   * David can use the data in a web page to generate a custom browser UI
>     for including a person in our address book without using brittle
>     screen-scraping.
>
> REQUIREMENTS:
>
>   * A user joining a new social network should be able to identify himself
>     to the new social network in a way that enables the new social network
>     to bootstrap his account from existing published data (e.g. from
>     another social network) rather than having to re-enter it, without the
>     new site having to coordinate (or know about) the pre-existing site,
>     without the user having to give either site's credentials to the
>     other, and without the new site finding out about relationships that
>     the user has intentionally kept secret.
>     (http://w2spconf.com/2008/papers/s3p2.pdf)
>   * Data should not need to be duplicated between machine-readable and
>     human-readable forms (i.e. the human-readable form should be
>     machine-readable).
>   * Shouldn't require the consumer to write XSLT or server-side code to
>     read the contact information.
>   * Machine-readable contact information shouldn't be on a separate page
>     than human-readable contact information.
>   * The information should be convertible into a dedicated form (RDF,
>     JSON, XML, vCard) in a consistent manner, so that tools that use this
>     information separate from the pages on which it is found have a
>     standard way of conveying the information.
>   * Should be possible for different parts of an event to be given in
>     different parts of the page. For example, a page with contact details
>     for people in columns (with each row giving the name, telephone
>     number, etc) should still have unambiguous grouped contact details
>     parseable from it.
>
>
> ---------------------------------------------------------------------------
>
> USE CASE: Allow users to maintain bibliographies or otherwise keep track
> of sources of quotes or references.
>
> SCENARIOS:
>
>   * Frank copies a sentence from Wikipedia and pastes it in some word
>     processor: it would be great if the word processor offered to
>     automatically create a bibliographic entry.
>   * Patrick keeps a list of his scientific publications on his web site.
>     He would like to provide structure within this publications page so
>     that Frank can automatically extract this information and use it to
>     cite Patrick's papers without having to transcribe the bibliographic
>     information.
>   * A scholar and teacher wants other scholars (and potentially students)
>     to be able to easily extract information about what he has published
>     to add it to their bibliographic applications.
>   * A scholar and teacher wants to publish scholarly documents or content
>     that includes extensive citations that readers can then automatically
>     extract so that they can find them in their local university library.
>     These citations may be for a wide range of different sources: an
>     interview posted on YouTube, a legal opinion posted on the Supreme
>     Court web site, a press release from the White House.
>   * A blog, say htmlfive.net, copies content wholesale from another, say
>     blog.whatwg.org (as permitted and encouraged by the license). The
>     author of the original content would like the reader of the reproduced
>     content to know the provenance of the content. The reader would like
>     to find the original blog post so he can leave comments for the
>     original author.
>   * Chaals could improve the Opera intranet if he had a mechanism for
>     identifying the original source of various parts of a page. (why?)
>
> REQUIREMENTS:
>
>   * Machine-readable bibliographic information shouldn't be on a separate
>     page than human-readable bibliographic information.
>   * The information should be convertible into a dedicated form (RDF,
>     JSON, XML, BibTeX) in a consistent manner, so that tools that use this
>     information separate from the pages on which it is found have a
>     standard way of conveying the information.
>
>
> ---------------------------------------------------------------------------
>
> USE CASE: Help people searching for content to find content covered by
> licenses that suit their needs.
>
> SCENARIOS:
>
>   * If a user is looking for recipes of pies to reproduce on his blog, he
>     might want to exclude from his results any recipes that are not
>     available under a license allowing non-commercial reproduction.
>   * Lucy wants to publish her papers online. She includes an abstract of
>     each one in a page, but because they are under different copyright
>     rules, she needs to clarify what the rules are. A harvester such as
>     the Open Access project can actually collect and index some of them
>     with no problem, but may not be allowed to index others.
>     Meanwhile, a human finds it more useful to see the abstracts on a page
>     than have to guess from a bunch of titles whether to look at each
>     abstract.
>   * There are mapping organisations and data producers and people who take
>     photos, and each may place different policies. Being able to keep that
>     policy information helps people with further mashups avoiding
>     violating a policy. For example, if GreatMaps.com has a public domain
>     policy on their maps, CoolFotos.org has a policy that you can use data
>     other than images for non-commercial purposes, and Johan Ichikawa has
>     a photo there of my brother's cafe, which he has licensed as "must pay
>     money", then it would be reasonable for me to copy the map and put it
>     in a brochure for the cafe, but not to copy the data and photo from
>     CoolFotos. On the other hand, if I am producing a non-commercial guide
>     to cafes in Melbourne, I can add the map and the location of the cafe
>     photo, but not the photo itself.
>   * At University of Mary Washington, many faculty encourage students to
>     blog about their studies to encourage more discussion using an
>     instance of WordPress MultiUser. A student with a blog might be
>     writing posts relevant to more than one class. Professors would like
>     to then aggregate relevant posts into one blog.
>   * Tara runs a video sharing web site for people who want licensing
>     information to be included with their videos. When Paul wants to blog
>     about a video, he can paste a fragment of HTML provided by Tara
>     directly into his blog. The video is then available inline in his
>     blog, along with any licensing information about the video.
>   * Fred's browser can tell him what license a particular video on a site
>     he is reading has been released under, and advise him on what the
>     associated permissions and restrictions are (can he redistribute this
>     work for commercial purposes, can he distribute a modified version of
>     this work, how should he assign credit to the original author, what
>     jurisdiction the license assumes, whether the license allows the work
>     to be embedded into a work that uses content under various other
>     licenses, etc).
>
> REQUIREMENTS:
>
>   * Content on a page might be covered by a different license than other
>     content on the same page.
>   * When licensing a subpart of the page, existing implementations must
>     not just assume that the license applies to the whole page rather than
>     just part of it.
>   * License proliferation should be discouraged.
>   * License information should be able to survive from one site to another
>     as the data is transferred.
>   * Expressing copyright licensing terms should be easy for content
>     creators, publishers, and redistributors to provide.
>   * It should be more convenient for the users (and tools) to find and
>     evaluate copyright statements and licenses than it is today.
>   * Shouldn't require the consumer to write XSLT or server-side code to
>     process the license information.
>   * Machine-readable licensing information shouldn't be on a separate page
>     than human-readable licensing information.
>   * There should not be ambiguous legal implications.
>
> ==============================================================================
>
> Annotations
>
> USE CASE: Annotate structured data that HTML has no semantics for, and
> which nobody has annotated before, and may never again, for private use or
> use in a small self-contained community.
>
> SCENARIOS:
>
>   * A group of users want to mark up their iguana collections so that they
>     can write a script that collates all their collections and presents
>     them in a uniform fashion.
>   * A scholar and teacher wants other scholars (and potentially students)
>     to be able to easily extract information about what he teaches to add
>     it to their custom applications.
>   * The list of specifications produced by W3C, for example, and various
>     lists of translations, are produced by scraping source pages and
>     outputting the result. This is brittle. It would be easier if the data
>     was unambiguously obtainable from the source pages. This is a custom
>     set of properties, specific to this community.
>   * Chaals wants to make a list of the people who have translated W3C
>     specifications or other documents, and then use this to search for
>     people who are familiar with a given technology at least at some
>     level, and happen to speak one or more languages of interest.
>   * Chaals wants to have a reputation manager that can determine which of
>     the many emails sent to the WHATWG list might be "more than usually
>     valuable", and would like to seed this reputation manager from
>     information gathered from the same source as the scraper that
>     generates the W3C's TR/ page.
>   * A user wants to write a script that finds the price of a book from an
>     Amazon page.
>   * Todd sells an HTML-based content management system, where all
>     documents are processed and edited as HTML, sent from one editor to
>     another, and eventually published and indexed. He would like to build
>     up the editorial metadata used by the system within the HTML documents
>     themselves, so that it is easier to manage and less likely to be lost.
>   * Tim wants to make a knowledge base seeded from statements made in
>     Spanish and English, e.g. from people writing down their thoughts
>     about George W. Bush and George H.W.
>     Bush, and has either convinced the people making the statements that
>     they should use a common language-neutral machine-readable vocabulary
>     to describe their thoughts, or has convinced some other people to come
>     in after them and process the thoughts manually to get them into a
>     computer-readable form.
>
> REQUIREMENTS:
>
>   * Vocabularies can be developed in a manner that won't clash with future
>     more widely-used vocabularies, so that those future vocabularies can
>     later be used in a page making use of private vocabularies without
>     making the earlier annotations ambiguous.
>   * Using the data should not involve learning a plethora of new APIs,
>     formats, or vocabularies (today it is possible, e.g., to get the price
>     of an Amazon product, but it requires learning a new API; similarly
>     it's possible to get information from sites consistently using 'class'
>     values in a documented way, but doing so requires learning a new
>     vocabulary).
>   * Shouldn't require the consumer to write XSLT or server-side code to
>     process the annotated data.
>   * Machine-readable annotations shouldn't be on a separate page than
>     human-readable annotations.
>   * The information should be convertible into a dedicated form (RDF,
>     JSON, XML) in a consistent manner, so that tools that use this
>     information separate from the pages on which it is found have a
>     standard way of conveying the information.
>   * Should be possible for different parts of an item's data to be given
>     in different parts of the page, for example two items described in the
>     same paragraph. ("The two lamps are A and B. The first is $20, the
>     second $30. The first is 5W, the second 7W.")
>   * It should be possible to define globally-unique names, but the syntax
>     should be optimised for a set of predefined vocabularies.
>   * Adding this data to a page should be easy.
>   * The syntax for adding this data should encourage the data to remain
>     accurate when the page is changed.
>   * The syntax should be resilient to intentional copy-and-paste
>     authoring: people copying data into the page from a page that already
>     has data should not have to know about any declarations far from the
>     data.
>   * The syntax should be resilient to unintentional copy-and-paste
>     authoring: people copying markup from the page who do not know about
>     these features should not inadvertently mark up their page with
>     inapplicable data.
>
>
> ---------------------------------------------------------------------------
>
> USE CASE: Allow authors to annotate their documents to highlight the key
> parts, e.g. as when a student highlights parts of a printed page, but in a
> hypertext-aware fashion.
>
> SCENARIOS:
>
>   * Fred writes a page about Napoleon. He can highlight the word Napoleon
>     in a way that indicates to the reader that that is a person. Fred can
>     also annotate the page to indicate that Napoleon and France are
>     related concepts.
>
> ==============================================================================
>
> Search
>
> USE CASE: Site owners want a way to provide enhanced search results to the
> engines, so that an entry in the search results page is more than just a
> bare link and snippet of text, and provides additional resources for users
> straight on the search page without them having to click into the page and
> discover those resources themselves.
>
> SCENARIOS:
>
>   * For example, in response to a query for a restaurant, a search engine
>     might want to have the result from yelp.com provide additional
>     information, e.g. info on price, rating, and phone number, along with
>     links to reviews or photos of the restaurant.
>
> REQUIREMENTS:
>
>   * Information for the search engine should be on the same page as
>     information that would be shown to the user if the user visited the
>     page.
>
>
> ---------------------------------------------------------------------------
>
> USE CASE: Search engines and other site categorisation and aggregation
> engines should be able to determine the contents of pages with more
> accuracy than today.
>
> SCENARIOS:
>
>   * Students and teachers should be able to discover each other -- both
>     within an institution and across institutions -- via their blogging.
>   * A blogger wishes to categorise his posts such that he can see them in
>     the context of other posts on the same topic, including posts by
>     unrelated authors (i.e. not via a pre-agreed tag or identifier, not
>     via a single dedicated and preconfigured aggregator).
>   * A user whose grandfather is called "Napoleon" wishes to ask Google the
>     question "Who is Napoleon", and get as his answer a page describing
>     his grandfather.
>   * A user wants to ask about "Napoleon" but, instead of getting an
>     answer, wants the search engine to ask him which Napoleon he wants to
>     know about.
>
> REQUIREMENTS:
>
>   * Should not disadvantage pages that are more useful to the user but
>     that have not made any effort to help the search engine.
>   * Should not be more susceptible to spamming than today's markup.
>
>
> ---------------------------------------------------------------------------
>
> USE CASE: Web browsers should be able to help users find information
> related to the items discussed by the page that they are looking at.
>
> SCENARIOS:
>
>   * Finding more information about a movie when looking at a page about
>     the movie, when the page contains detailed data about the movie.
>       * For example, where the movie is playing locally.
>       * For example, what your friends thought of it.
>   * Exposing music samples on a page so that a user can listen to all the
>     samples.
>   * Students and teachers should be able to discover each other -- both
>     within an institution and across institutions -- via their blogging.
>   * David can use the data in a web page to generate a custom browser UI
>     for calling a phone number using our cellphone without using brittle
>     screen-scraping.
>
> REQUIREMENTS:
>
>   * Should be discoverable, because otherwise users will not use it, and
>     thus users won't be helped.
>   * Should be consistently available, because if it only works on some
>     pages, users will not use it (see, for instance, the rel=next story).
>   * Should be bootstrapable (rel=next failed because UAs didn't expose it
>     because authors didn't use it because UAs didn't expose it).
>
>
> ---------------------------------------------------------------------------
>
> USE CASE: Finding distributed comments on audio and video media.
>
> SCENARIOS:
>
>   * Sam has posted a video tutorial on how to grow tomatoes on his video
>     blog. Jane uses the tutorial and would like to leave feedback to
>     others that view the video regarding certain parts of the video she
>     found most helpful. Since Sam has comments disabled on his blog, his
>     users cannot comment on the particular sections of the video other
>     than linking to it from their blog and entering the information there.
>     Jane uses a video player that aggregates all the comments about the
>     video found on the Web, and displays them as subtitles while she
>     watches the video.
>
> REQUIREMENTS:
>
>   * It shouldn't be possible for Jane to be exposed to spam comments.
>   * The comment-aggregating video player shouldn't need to crawl the
>     entire Web for each user independently.
>
>
> ---------------------------------------------------------------------------
>
> USE CASE: Allow users to price-check digital media (music, TV shows, etc)
> and purchase such content without having to go through a special website
> or application to acquire it, and without particular retailers being
> selected by the content's producer or publisher.
>
> SCENARIOS:
>
>   * Joe wants to sell his music, but he doesn't want to sell it through a
>     specific retailer, he wants to allow the user to pick a retailer. So
>     he forgoes the chance of an affiliate fee, negotiates to have his
>     music available in all retail stores that his users might prefer, and
>     then puts a generic link on his page that identifies the product but
>     doesn't identify a retailer. Kyle, a fan, visits his page, clicks
>     the link, and Amazon charges his credit card and puts the music into
>     his Amazon album downloader. Leo instead clicks on the link and is
>     automatically charged by Apple, and finds later that the music is in
>     his iTunes library.
>   * Manu wants to go to Joe's website but check the price of the offered
>     music against the various retailers that sell it, without going to
>     those retailers' sites, so that he can pick the cheapest retailer.
>   * David can use the data in a web page to generate a custom browser UI
>     for buying a song from our favorite online music store without using
>     brittle screen-scraping.
>
> REQUIREMENTS:
>
>   * Should not be easily prone to clickjacking (sites shouldn't be able to
>     charge the user without the user's consent).
>   * Should not make transactions harder when the user hasn't yet picked a
>     favourite retailer.
>
> ==============================================================================
>
> Cross-site communication
>
> USE CASE: Copy-and-paste should work between Web apps and native apps and
> between Web apps and other Web apps.
>
> SCENARIOS:
>
>   * Fred copies an e-mail from Apple Mail into GMail, and the e-mail
>     survives intact, including headers, attachments, and multipart/related
>     parts.
>   * Fred copies an e-mail from GMail into Hotmail, and the e-mail survives
>     intact, including headers, attachments, and multipart/related parts.
>
>
> ---------------------------------------------------------------------------
>
> USE CASE: Allow users to share data between sites (e.g.
> between an online store and a price comparison site).
>
> SCENARIOS:
>
>   * Lucy is looking for a new apartment and some items with which to
>     furnish it. She browses various web pages, including apartment
>     listings, furniture stores, kitchen appliances, etc. Every time she
>     finds an item she likes, she points to it and transfers its details to
>     her apartment-hunting page, where her picks can be organized, sorted,
>     and categorized.
>   * Lucy uses a website called TheBigMove.com to organize all aspects of
>     her move, including items that she is tracking for the move. She goes
>     to her "To Do" list and adds some of the items she collected during
>     her visits to various Web sites, so that TheBigMove.com can handle the
>     purchasing and delivery for her.
>
> REQUIREMENTS:
>
>   * Should be discoverable, because otherwise users will not use it, and
>     thus users won't be helped.
>   * Should be consistently available, because if it only works on some
>     pages, users will not use it (see, for instance, the rel=next story).
>   * Should be bootstrapable (rel=next failed because UAs didn't expose it
>     because authors didn't use it because UAs didn't expose it).
>   * The information should be convertible into a dedicated form (RDF,
>     JSON, XML) in a consistent manner, so that tools that use this
>     information separate from the pages on which it is found have a
>     standard way of conveying the information.
>
> ==============================================================================
>
> Blogging
>
> USE CASE: Remove the need for feeds to restate the content of HTML pages
> (i.e. replace Atom with HTML).
>
> SCENARIOS:
>
>   * Paul maintains a blog and wishes to write his blog in such a way that
>     tools can pick up his blog post tags, authors, titles, and his
>     blogroll directly from his blog, so that he does not need to maintain
>     a parallel version of his data in a "structured format." In other
>     words, his HTML blog should be usable as its own structured feed.
>
>
> ---------------------------------------------------------------------------
>
> USE CASE: Allow users to compare subjects of blog entries when the
> subjects are hard to tersely identify relative to other subjects in the
> same general area.
>
> SCENARIOS:
>
>   * Paul blogs about proteins and genes. His colleagues also blog about
>     proteins and genes. Proteins and genes are identified by long
>     hard-to-compare strings, but Paul and his colleagues can determine if
>     they are talking about the same things by having their user agent
>     compare some sort of flags embedded in the blogs.
>   * Rob wants to publish a large vocabulary in RDFS and/or OWL. Rob also
>     wants to provide a clear, human readable description of the same
>     vocabulary, that mixes the terms with descriptive text in HTML.
>
> ==============================================================================
>
> Data extraction from uncooperative sources
>
> USE CASE: Getting data out of poorly written Web pages, so that the user
> can find more information about the page's contents.
>
> SCENARIOS:
>
>   * Alfred merges data from various sources in a static manner, generating
>     a new set of data. Bob later uses this static data in conjunction with
>     other data sets to generate yet another set of static data. Julie then
>     visits Bob's page later, and wants to know where and when the various
>     sources of data Bob used come from, so that she can evaluate its
>     quality. (In this instance, Alfred and Bob are assumed to be
>     uncooperative, since creating a static mashup would be an example of a
>     poorly-written page.)
>   * TV guide listings - If the TV guide provider does not render a link to
>     IMDB, the browser should recognise TV shows and give implicit links.
>     (In this instance, it is assumed that the TV guide provider is
>     uncooperative, since it isn't providing the links the user wants.)
>   * Students and teachers should be able to discover each other -- both
>     within an institution and across institutions -- via their blogging.
>     (In this instance, it is assumed that the teachers and students aren't
>     cooperative, since they would otherwise be able to find each other by
>     listing their blogs in a common directory.)
>   * Tim wants to make a knowledge base seeded from statements made in
>     Spanish and English, e.g. from people writing down their thoughts
>     about George W. Bush and George H.W. Bush. (In this instance, it is
>     assumed that the people writing the statements aren't cooperative,
>     since if they were they could just add the data straight into the
>     knowledge base.)
>
> REQUIREMENTS:
>
>   * Does not need cooperation of the author (if the page author was
>     cooperative, the page would be well-written).
>   * Shouldn't require the consumer to write XSLT or server-side code to
>     derive this information from the page.
>
>
> ---------------------------------------------------------------------------
>
> USE CASE: Remove the need for RDF users to restate information in online
> encyclopedias (i.e. replace DBpedia).
>
> SCENARIOS:
>
>   * A user wants to have information in RDF form. The user visits
>     Wikipedia, and his user agent can obtain the information without
>     relying on DBpedia's interpretation of the page.
>
> REQUIREMENTS:
>
>   * All the data exposed by DBpedia should be derivable from Wikipedia
>     without using DBpedia.
>
> ==============================================================================
>
> --
> Ian Hickson               U+1047E                )\._.,--....,'``.    fL
> http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
> Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
>

--
Sent from my mobile device
Received on Friday, 24 April 2009 04:35:05 UTC