- From: Jeni Tennison <jeni@jenitennison.com>
- Date: Wed, 23 Nov 2011 17:55:36 +0000
- To: tantek@cs.stanford.edu
- Cc: "HTML Data Task Force WG" <public-html-data-tf@w3.org>, "Ivan Herman" <ivan@w3.org>
Thanks Tantek,

On 23 Nov 2011, at 15:19, Tantek Çelik wrote:
> Generic consumers can absolutely pickup all necessary information from microformats 2 syntax (again by design), and at least some generic information from microdata syntax as well. E.g. an HTML5 Drag & Drop implementation can do generic parsing of microformats 2 and microdata, convert them to a standard (and interoperable) JSON data model, and incorporate them into the data being dragged/dropped.

OK, let's try to put together some wording for a separate section on generic consumers. Here's a start, but I'd appreciate input about what microformats-2 processors can and can't do, particularly around locating additional machine-readable information about the vocabulary:

Microdata, RDFa and microformats-2 all use a generic syntax, which means that it's possible to have generic parsers operate over them to extract data. In the case of microdata and microformats-2, the data has a JSON structure; data extracted from RDFa has an RDF structure (microdata can also be converted into RDF).

Generic applications can work in the browser to do things such as highlighting markup that follows a particular syntax or enabling users to download the data embedded within a page into a separate file. These can also use the context in which the HTML data is found to provide additional features. For example, generic consumers may detect that each row in a table is associated with a distinct entity, and each cell with a particular property, and enable users to sort that table based on property values. In this case, a consumer could ensure that when values are marked up as dates, times or durations using the <time> element, the items are sorted by date/time/duration rather than alphabetically.

Both microformats-2 and RDFa provide additional facilities that enable publishers to indicate the type of values to support generic consumers.
Microformats-2 properties have a prefix that can indicate when a value is a URL (u-*), a date/time (dt-*), extended HTML (e-*) or a string (p-*). RDFa supports a @datatype attribute that publishers can use to indicate the datatype of a value, usually an XML Schema datatype such as xsd:integer or xsd:language. Note that once microformats-2 data is extracted from a page into JSON, these prefixes are no longer available, so a consumer of the JSON has to know the vocabulary to tell whether a given value should be interpreted as a string or as HTML markup, for example. In contrast, the datatypes used to annotate RDFa values are carried within the RDF data.

RDFa also adheres to a follow-your-nose principle, whereby vocabulary authors are encouraged to provide a machine-readable description of classes and properties at the URL used for the class or property. This can enable generic processors to automatically pick up additional information about the class or property such as labels, help text, superclasses, property cardinality and ranges and so on. While microdata also uses URLs for types and properties, microdata consumers are not permitted to dereference URLs that they do not already recognise.

>> And we can bring out the guidance on the vocabulary side about not making vocabularies where the datatype of a value can't be determined from the property and its syntax.
>
> The double negative in that statement is confusing.
>
> I'm not sure how this is necessary. I'd need specific examples of how this helps to understand what you're saying.

Well, as an example, there's a particular RDF vocabulary, SKOS, which states that the skos:notation property can be used to give a code for a skos:Concept. Some concepts might have codes from different coding schemes. So that vocabulary says that the RDF datatype of the skos:notation value should be used to indicate the type of the coding scheme.
So you're actually encouraged in this vocabulary to end up with something like:

  <dog> a skos:Concept ;
    skos:notation "3-12"^^eg:CodingScheme1 ;
    skos:notation "7-53"^^eg:CodingScheme2 ;
    .

There's some more about this at http://patterns.dataincubator.org/book/custom-datatype.html

What I was trying to say is that this pattern is bad in HTML data vocabularies, because it limits what syntaxes can be used with the vocabulary (you have to use RDFa) and because it places a burden on publishers and leads to unreliable data. It should always be possible for a vocabulary-aware application to tell the type of a value based on (a) what property it's given as a value for and (b) what the syntax of the value is.

Cheers,

Jeni
--
Jeni Tennison
http://www.jenitennison.com
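
To make the (a)/(b) rule above a bit more concrete, here is a rough Python sketch of how a vocabulary-aware consumer might work out the type of a value from the property name and the syntax of the value alone. Everything in it is an assumption made up for illustration: the VOCAB and SYNTAX tables, the property names and the value_type function are hypothetical and do not come from any of the specs or from a real parser.

```python
import re

# Hypothetical vocabulary: for each property, the value types it allows,
# most specific first. A real vocabulary would document this in prose.
VOCAB = {
    "name": ["string"],
    "url": ["url"],
    "published": ["datetime", "date"],
    "notation": ["string"],
}

# Hypothetical syntax checks, one per value type (deliberately simplified).
SYNTAX = {
    "url": re.compile(r"https?://\S+"),
    "date": re.compile(r"\d{4}-\d{2}-\d{2}"),
    "datetime": re.compile(
        r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}(:\d{2})?(Z|[+-]\d{2}:\d{2})?"
    ),
    "string": re.compile(r".*", re.S),
}


def value_type(prop, value):
    """Return the type of `value` using only (a) the property it is a
    value for and (b) the syntax of the value, or None if neither the
    vocabulary nor the syntax allows a decision."""
    for candidate in VOCAB.get(prop, []):
        if SYNTAX[candidate].fullmatch(value):
            return candidate
    return None


if __name__ == "__main__":
    print(value_type("published", "2011-11-23"))
    print(value_type("published", "2011-11-23T17:55:36+00:00"))
    print(value_type("notation", "3-12"))
    print(value_type("notation", "7-53"))
```

Under these assumptions the two skos:notation-style codes "3-12" and "7-53" both come out as plain strings: the property and the syntax are the same for both, so the coding scheme cannot be recovered without the RDF datatype, which is the problem with that pattern in HTML data vocabularies.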
Received on Wednesday, 23 November 2011 17:56:13 UTC