- From: Charles McCathieNevile <chaals@opera.com>
- Date: Sun, 04 Jan 2009 12:14:49 +1100
On Sat, 03 Jan 2009 04:52:35 +1100, Tab Atkins Jr. <jackalmage at gmail.com> wrote:

> On Fri, Jan 2, 2009 at 12:12 AM, Charles McCathieNevile
> <chaals at opera.com> wrote:
>> On Fri, 02 Jan 2009 05:43:05 +1100, Andi Sidwell <andi at takkaria.org>
>> wrote:
>>
>>> On 2009-01-01 15:24, Toby A Inkster wrote:
>>>>
>>>> The use cases for RDFa are pretty much the same as those for
>>>> Microformats.
>>>
>>> Right, but microformats can be used without any changes to the HTML
>>> language, whereas RDFa requires such changes. If they fulfill the
>>> same use cases, then there's not much point in adding RDFa.
>>
>> ...
>
> Why the non-response?

Because the response comes in the next paragraph, to the first question that was worth asking.

>>>> So why RDFa and not Microformats?
>>
>> (I think the question should be why RDFa is needed *as well as*
>> µformats)
>
> This is correct. Microformats exist already. They solve current
> problems.

(Elsewhere in this thread you wrote [[[ It has not yet been established that there is a problem worth solving that metadata would address at all. ]]] Do you consider that µformats do not encode metadata? Otherwise, I am not sure how to reconcile these statements. In any case I would greatly appreciate clarification of what you think microformats do, since I do believe that microformats are very explicitly directed to allowing the encoding of metadata, and therefore it is not clear that we are discussing from similar premises.)

> Are there further problems that Microformats don't address
> which can be solved well by RDFa? Are these problems significant
> enough to authors to be worth addressing in the spec, or can we wait
> and let the community work out its own solutions further before we
> make a move?

In my opinion, yes, there are further problems µformats don't solve (that RDFa does); yes, they are significant; and the community has come up with a way to solve them - RDFa.

> Microformats are the metadata equivalent of Flash-based video players.
> They are hacks used to allow authors to accomplish something not
> explicitly accounted for in the language. Are there significant
> problems with this approach?

Yes. The problems are that they rely on pre-coordination on a per-vocabulary basis before you can do anything useful with the data. In practical usage they rely on choosing attribute names that hopefully don't clash with anything - in other words, trying to solve the problem of disambiguation that namespaces solve, but by choosing names that are weird enough not to clash, or by circumscribing the problem spaces that can be addressed to the extent that you can expect no clashes. (This is hardly news, by the way.)

> Is metadata embedding used widely enough
> to justify extending the language for it, or are the current hacks
> (Microformats, in this case) enough? Are current metadata embedding
> practices mature enough that we can be relatively sure we're solving
> actual problems with our extension?

Current metadata embedding is done using µformats, and it's pretty clear that they are not sufficient. A large body of work uses RDF data models (Dublin Core, IMS, LOM, FOAF and POWDER are all large-scale formats; the people who are testing RDF engines with hundreds of millions of triples and more are doing it with real data, not stuff generated for the experiment).
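To make the name-clash point concrete, here is a minimal sketch in XHTML+RDFa 1.0 syntax (the Dublin Core and FOAF vocabularies are real; the page and fragment names are invented for illustration):

  <div xmlns:dc="http://purl.org/dc/elements/1.1/"
       xmlns:foaf="http://xmlns.com/foaf/0.1/"
       about="/recipes/pie">
    <!-- each short name expands to a full URI, e.g. dc:title becomes
         http://purl.org/dc/elements/1.1/title, so properties from
         independently developed vocabularies cannot collide -->
    <h2 property="dc:title">Apple pie</h2>
    <span rel="foaf:maker" resource="#chaals">chaals</span>
  </div>

A µformat consumer, by contrast, has to know in advance that a bare name like class="fn" means "formatted name", and to hope that nobody else uses "fn" to mean something different.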
It is also clear that people would like to develop further small-scale formats, and that the µformats process, with its requirement for community consultation, is effectively too heavyweight for the purposes of many developers.

> These are all questions that must
> be asked of any extension to the language.
>
>>>> Firstly, RDFa provides a single unified parsing algorithm that
>>>> Microformats do not. ...
>>
>>> This is not necessarily beneficial. If you have separate parsing
>>> algorithms, you can code in shortcuts for common use-cases and thus
>>> optimise the authoring experience.
>>
>> On the other hand, you cannot parse information until you know how it is
>> encoded, and information encoded in RDFa can be parsed without knowing
>> more.
>>
>> And not only can you optimise your parsing for a given algorithm, you
>> can also do so for a known vocabulary - or you can optimise the
>> post-parsing treatment.
>
> What is the benefit to authors of having an easily machine-parsed
> format?

Assuming that the format is sufficiently easy to write, and to generate, I am not sure what isn't obvious about the answer to the question. (In case I am somehow very clever, and others aren't, the benefit is that it is easy to machine-parse and use the information.)

> Are they greater than the benefits of a
> format that is harder to parse, but easier for authors to write?

For a certain set of authors, yes, the benefits are greater.

>>> Also, as has been pointed out before in the distributed extensibility
>>> debate, parsing is a very small part of doing useful things with
>>> content.
>>
>> Yes. However many of the use cases that I think justify the inclusion of
>> RDFa are already very small on their own, and valuable when several
>> vocabularies are combined. So being able to do off-the-shelf parsing is
>> valuable, compared to working out how to parse a combination of formats
>> together.
>
> Can you provide these use-cases? The discussion has an astonishing
> dearth of use-cases by which we can evaluate the effectiveness of
> proposals.

The small-scale use cases are difficult to provide, since they are based on the fact that people do something quickly because they need it. One set of potential use cases is all the microformats that haven't been blessed by the µformats community as formally agreed "standards" - writing them in RDFa is sufficient to have them be usable.

Another use case is noting the source of data in mashups. This enables information to be carried about the licensing, the date at which the data was mashed (or smushed, to use the older terminology from the Semantic Web), and so on.

Another (the second time I have noted it in two emails) is to provide information useful for improving the accessibility of Web content.

The set of use cases that led to the development of GRDDL are also use cases for RDFa - since RDFa allows a direct extraction to RDF without having to develop a new parser for each data model, authors can simplify the way they extract data by using RDFa to encode it, saving themselves the bother of explaining how to extract it. This time saving means that they can afford to develop a smaller, more specialised vocabulary.

> Is there any indication that use of
> ambiguous names produces significant problems for authors?

Not that I am aware of, although I think the question is poorly considered, so I haven't given it much thought. There is plenty of evidence (for example the attempts to use Dublin Core within existing HTML mechanisms) that it causes problems for data consumers.
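As a concrete sketch of the mashup use case above, in XHTML+RDFa 1.0 (the example.org feed and the page path are placeholders):

  <div xmlns:dc="http://purl.org/dc/elements/1.1/"
       about="/mashups/weather">
    <span rel="dc:source" resource="http://example.org/observations">
      Data from example.org</span>, retrieved on
    <span property="dc:date" content="2009-01-03">3 January 2009</span>.
    <!-- any off-the-shelf RDFa parser extracts, with no
         Dublin-Core-specific code:
           </mashups/weather> dc:source <http://example.org/observations> ;
                              dc:date   "2009-01-03" .
         and a second vocabulary (a licence, an accessibility note)
         could be mixed into the same block without touching the parser -->
  </div>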
>>>> It can be argued that going through a
>>>> community to develop vocabularies is beneficial, as it allows the
>>>> vocabulary to be built by "many minds" - RDFa does not prevent this,
>>>> it just gives people alternatives to community development.
>>>
>>> RDFa does not give anything over what the class attribute does in
>>> terms of community vs individual development, so this doesn't really
>>> speak in RDFa's favour.
>>
>> In principle no, but in real-world usage the class attribute is
>> considered something that is primarily local, whereas RDFa is generally
>> used by people who have a broader outlook on the desirable permanence
>> and re-usability of their data.
>
> Can we extract a requirement from this, then?

A poor formulation (I hope that those who are better at very detailed requirements can help improve my phrasing) could be: provide an easy mechanism to encode new data in a way that can be machine-extracted without requiring any explanation of the data model.

>>>> Lastly, there are a lot of parsing ambiguities for many Microformats.
>>>> One area which is especially fraught is that of scoping. The editors
>>>> of many current draft Microformats[1] would like to allow page authors
>>>> to embed licensing data - e.g. to say that a particular recipe for a
>>>> pie is licensed under a Creative Commons licence. However, it has been
>>>> noted that the current rel=license Microformat can not be re-used
>>>> within these drafts, because virtually all existing rel=license
>>>> implementations will just assume that the license applies to the whole
>>>> page rather than just part of it. RDFa has strong and unambiguous
>>>> rules for scoping - a license, for example, could apply to a section
>>>> of the page, or one particular image.
>>>
>>> Are there other cases where this granularity of scoping would be
>>> genuinely helpful? If not, it would seem better to work out a solution
>>> for scoping licence information...
>>
>> Yes.
>>
>> Being able to describe accessibility of various parts of content, or
>> point to potential replacement content for particular use cases,
>> benefits enormously from such scoping (this is why people who do
>> industrial-scale accessibility often use RDF as their infrastructure).
>> ARIA has already taken the approach of looking for a special-purpose
>> way to do this, which significantly bloats HTML but at least allows
>> important users to satisfy their needs to be able to produce content
>> with certain information included.
>>
>> Government and large enterprises produce content that needs to be
>> maintained, and being able to include production, cataloguing, and
>> similar metadata directly, scoped to the document, would be helpful.
>> As a trivial example, it would be useful to me in working to improve
>> the Web content we produce at Opera to have a nice mechanism for
>> identifying the original source of various parts of a page.
>
> Can we distill this into use-cases, then?

Sure. It just takes a small amount of thinking. How many use cases do you think would be sufficient to demonstrate that this is important? Or do you measure it by how many people each use case applies to? (It is far easier to justify the cost of developing use cases where there is more clarity about the goals for those use cases - and it enables people to decide whether to develop their own, or go find the people who are doing this and ask them to provide the information.)
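Meanwhile, to make the scoping rules concrete, here is a sketch of the pie example in XHTML+RDFa 1.0 (the image path and the choice of licence are invented for illustration):

  <div about="/images/pie.jpg">
    <img src="/images/pie.jpg" alt="The finished pie" />
    <a rel="license"
       href="http://creativecommons.org/licenses/by-nc/3.0/">
      photo licensed CC Attribution-NonCommercial 3.0</a>
  </div>
  <!-- the about attribute makes /images/pie.jpg, not the page, the
       subject of the license triple; an existing rel=license consumer
       that ignores RDFa would misread this as page-wide -->

The same mechanism scopes accessibility metadata, or production and cataloguing data, to whichever element it describes.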
> You, as an author, want to
> be able to specify the original source of a piece of content. What's
> the practical use of this? Does it require an embedded,
> machine-readable vocabulary to function? Are existing solutions
> adequate (frex, footnotes)?

...

> Not quite. Specifically, is there any practical use for marking up
> various sections of a site with licensing information specific to that
> section *in an embedded, machine-readable manner*? Are the existing
> solutions adequate (frex, simply putting a separate copyright notice
> on each section, or noting the various copyrights on a licensing
> page)?

Let me treat these as the same question, since I don't think they introduce anything usefully different between them. I will add to them Henri's questions about my use case for this, already published elsewhere in this thread.

A practical use case is in an organisation where different people are responsible for different parts of content. Instead of having to look up, myself, who is responsible for each piece, and what rights are associated with it, I can automate the process. (This is one of the value propositions offered by content management systems. I hope we can agree that these are sufficiently widely used to assume a use case a priori, but if not, please say so.) This means that instead of manually checking many pages for things like accessibility or being up to date, and then having to find which part of the page was produced by which part of the organisation (which is what I do at Opera), I can simply have this information trawled and presented as I please by a program (which many large organisations do, or partially do).

Another example is that certain W3C pages (the list of specifications produced by W3C, for example, and various lists of translations) are produced from RDF data that is scraped from each page through a customised and thus fragile scraping mechanism. Being able to use RDFa would free authors of the draconian constraints on the source-code formatting of specifications, and merely require them to use the right attributes, in order to maintain this data.

An example of how this data can be re-used is that it is possible to determine many of the people who have translated W3C specifications or other documents - and thus to search for people who are familiar with a given technology at least at some level, and happen to speak one or more languages of interest. This is at least as important to me in looking for potential people to recruit as any free-text search I can do - and has the benefit that while I don't have the resources to develop large-scale free-text searching, I do have the resources to develop simple queries based on a standardised data model and an encoding of it. Alternatively, I could use the same information to seed a reputation manager, so I can determine which of the many emails I have no time to read in WHAT-WG might be more than usually valuable.

cheers

Chaals

--
Charles McCathieNevile  Opera Software, Standards Group
    je parle français -- hablo español -- jeg lærer norsk
http://my.opera.com/chaals   Try Opera: http://www.opera.com