- From: Eduard Pascual <herenvardo@gmail.com>
- Date: Tue, 28 Apr 2009 23:31:53 +0200
On Thu, Apr 23, 2009 at 10:46 PM, Ian Hickson <ian at hixie.ch> wrote:
[...]
> Exposing known data types in a reusable way
>
>   USE CASE: Exposing calendar events so that users can add those events to
>   their calendaring systems.
[...]
>   REQUIREMENTS:
[...]
>     * Should be unlikely to get out of sync with prose on the page.
>     * Machine-readable event data shouldn't be on a separate page than
>       human-readable dates.
[...]
>   ---------------------------------------------------------------------------
>
>   USE CASE: Exposing contact details so that users can add people to their
>   address books or social networking sites.
[...]
>   REQUIREMENTS:
[...]
>     * Data should not need to be duplicated between machine-readable and
>       human-readable forms (i.e. the human-readable form should be
>       machine-readable).
>     * Machine-readable contact information shouldn't be on a separate page
>       than human-readable contact information.
[...]
>   ---------------------------------------------------------------------------
>
>   USE CASE: Allow users to maintain bibliographies or otherwise keep track
>   of sources of quotes or references.
[...]
>   REQUIREMENTS:
>
>     * Machine-readable bibliographic information shouldn't be on a separate
>       page than human-readable bibliographic information.
[...]
>   ---------------------------------------------------------------------------
>
>   USE CASE: Help people searching for content to find content covered by
>   licenses that suit their needs.
[...]
>   REQUIREMENTS:
[...]
>     * License information should be able to survive from one site to another
>       as the data is transfered.
[...]
>     * Machine-readable licensing information shouldn't be on a separate page
>       than human-readable licensing information.
[...]
> ==============================================================================
>
> Annotations
>
>   USE CASE: Annotate structured data that HTML has no semantics for, and
>   which nobody has annotated before, and may never again, for private use or
>   use in a small self-contained community.
> [...]
>   REQUIREMENTS:
> [...]
>     * Machine-readable annotations shouldn't be on a separate page than
>       human-readable annotations.
[...]
>     * The syntax for adding this data should encourage the data to remain
>       accurate when the page is changed.
>     * The syntax should be resilient to intentional copy-and-paste
>       authoring: people copying data into the page from a page that already
>       has data should not have to know about any declarations far from the
>       data.
>     * The syntax should be resilient to unintentional copy-and-paste
>       authoring: people copying markup from the page who do not know about
>       these features should not inadvertently mark up their page with
>       inapplicable data.
>
>   ---------------------------------------------------------------------------
[...]
>   USE CASE: Site owners want a way to provide enhanced search results to the
>   engines, so that an entry in the search results page is more than just a
>   bare link and snippet of text, and provides additional resources for users
>   straight on the search page without them having to click into the page and
>   discover those resources themselves.
[...]
>   REQUIREMENTS:
>
>     * Information for the search engine should be on the same page as
>       information that would be shown to the user if the user visited the
>       page.
>
> ==============================================================================
>
> Cross-site communication
>
>   USE CASE: Copy-and-paste should work between Web apps and native apps and
>   between Web apps and other Web apps.

I have noticed (as highlighted by the quoted fragments above) that some of the requirements recur quite a bit, namely:
- Information for the machine / agent / whatever should be on the same page as information for the (human) user.
- Copy-paste resilience.
- (In some cases) Data shouldn't be duplicated for humans and for machines (although this is not always achievable, for example with dates).

There is a requirement that has been put forward previously [1], which IMO may interact with these, and which didn't show up in Ian's original mail:
- Meta-data (or any additional markup or data used to allow the machine to understand the actual information) shouldn't be redundantly repeated.

Examples:
-> An author puts up a page with contact information for several people (for example, the people responsible for the website; a list of entities that are somehow related to the website, like sponsors; or a list of friends on a restricted-access social website, such as Microsoft's "Live Spaces"). Let's say the author puts this info in a table, with the contact name in the first column, the e-mail address in the second column, and so on, simply because that's the kind of job tables are for. Of course, the first row of the table would hold the headers describing what each column means. The author *should* be able to tell the machine something like "the first column (or the first cell in each row) holds the names, the second column (or 2nd cell in each row) holds the e-mail addresses, ...", rather than having to repeat "this is the name", "this is the e-mail address", and so on for each contact.
-> A website lists a series of software projects or products (from something as huge as SourceForge to something as small as a company's site listing its own few products), stating each product's title, author/s (in case the products have different authors), license, version, and date of the last release.
Again, the author of that site should be able to tell the machine something like "these are the products, these the authors, these the licenses, ...", rather than stating "this is the product's name, this is the product's author, this is the product's license, ..." for each and every product listed.

Rationale:
I hope it is clear how ignoring this need would raise some serious issues. First and foremost, having to repeat the meta-information for each "entry" is tedious and error-prone: if an author forgets to add a meta-data field to a new entry s/he has just added, the whole purpose of using metadata is ruined, since users would need to retrieve the information manually anyway (wasn't error-proneness the main reason to require keeping the metadata as close as possible to the actual information?). Next, redundant data means larger files, which translates directly into slower page loads for the user and higher bandwidth costs for the publisher. There may be some secondary issues arising from this (for example, some search engines tend to "truncate" large files and ignore everything beyond a certain threshold), but those come from the needless enlargement of the file; so file bloat is the actual issue to keep in mind and deal with here.

Additional considerations:
- Fulfilling this requirement could make it harder to deal with the copy-paste tasks, but not impossible. Some browsers can preserve the formatting applied from an external CSS file when copying, so preserving the metadata when it has been defined upon structure would be equally achievable.
- There *are* cases where repeating the metadata a few times can be better than having it centralized. I have nothing against any solution that *allows redundancy*, as long as it *does not enforce redundancy*.
- I want to make clear that there is a difference between having the human-readable and machine-readable information in the same place (even reusing the same info for both consumers when doable) and having the metadata (the data that defines how to interpret the actual data) there as well. There might even be cases where having the metadata somewhere else makes sense (for example, in the SourceForge example above, it would be quite reasonable to have a single file defining how to retrieve the useful data for each SERP (Search Result Page), rather than defining it on every SERP). Again, I feel that the ideal solution should allow either practice and force neither (after all, from an author's PoV, more choice means more power, which is always better for us).

References:
[1] http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2008-August/016037.html
(You might want to review other messages on that thread as well, but I think this is the one that best describes the actual issue. Also, keep in mind that, while my intention with this post is to bring the problem/need into consideration, that thread evolved into discussing some solution ideas. I think we should have the list of needs and use-cases properly defined before we start discussing solutions.)

Regards,
Eduard Pascual
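
Illustration: the contact-table example can be sketched in a few lines of Python. This is only meant to make the need concrete, not to propose any syntax. The names `COLUMN_FIELDS`, `rows_to_records` and `annotation_counts`, as well as the contact data, are hypothetical and made up for this sketch. It shows one column-level declaration being applied to every row, and compares how many metadata statements each approach would require.

```python
# Hypothetical declaration, stated once for the whole table:
# column 0 holds names, column 1 holds e-mail addresses.
COLUMN_FIELDS = ["name", "email"]


def rows_to_records(rows, column_fields):
    """Interpret every data row using the single column-level declaration."""
    # zip pairs each cell with the field declared for its column.
    return [dict(zip(column_fields, row)) for row in rows]


def annotation_counts(n_rows, n_columns):
    """Return (per-cell, per-column) counts of metadata statements needed."""
    return n_rows * n_columns, n_columns


# Made-up contact data; the header row is assumed to be already stripped.
contacts = [
    ["Alice Example", "alice@example.org"],
    ["Bob Example", "bob@example.org"],
]

records = rows_to_records(contacts, COLUMN_FIELDS)
per_cell, per_column = annotation_counts(len(contacts), len(COLUMN_FIELDS))
print(records)
print(per_cell, per_column)
```

Adding a third contact to `contacts` needs no new metadata in the declare-once case, whereas the per-cell count grows with every row added.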
Received on Tuesday, 28 April 2009 14:31:53 UTC