- From: Manu Sporny <msporny@digitalbazaar.com>
- Date: Tue, 06 Nov 2012 13:27:46 -0500
- To: "Tab Atkins Jr." <jackalmage@gmail.com>
- CC: RDFa Working Group <public-rdfa-wg@w3.org>
On 11/05/12 19:16, Tab Atkins Jr. wrote:
> As outlined in the original threads that introduced this issue,
> usage in the wild shows that authors very commonly author "invalid"
> markup which uses a common prefix without specifying the prefix.

For some value of "very commonly". One of the biggest problems with this discussion is that we don't have good data on how common this is. That said, the RDFa WG was concerned enough about this possibility that we introduced the RDFa Initial Context in RDFa 1.1 (which pre-defines these common prefixes that authors may forget to define in their documents):

http://www.w3.org/2011/rdfa-context/rdfa-1.1

Do you have any data to support the claim that this is a wide-spread occurrence in RDFa documents? Something like this (that demonstrates that prefixes for vocabularies were not declared in the document):

http://webdatacommons.org/vocabulary-usage-analysis/index.html

> 2. The developers of consumers either *also* share this
> misunderstanding, or just don't find it worthwhile to be correct
> when they can do just as well in practice by treating the prefix as
> meaningful. This suggests that there may be a real interoperability
> danger if an author *properly* declares a prefix where the prefix is
> a common one, but the URL is to something other than what common use
> points to - in "correct" consumers the document will be interpreted
> as the author intended, but in many common consumers it will instead
> be misinterpreted to be using the common vocabulary rather than what
> the author intended.

We have not seen reports of this sort of wide-spread abuse. Is there new data to demonstrate that there are consumers out there that are doing this?

> 3. In addition to the theoretical interop problem above, we have a
> real interop problem already - many consumers will happily consume
> pages that don't declare their prefix, as long as they use a
> "well-known" prefix for it.
> A conformant consumer, on the other
> hand, would *not* do so, and would find no valid data on the pages.

As others in this thread have pointed out, it seems that you are not aware of the RDFa Initial Context feature introduced in RDFa 1.1? A conformant consumer /would/ find data on the pages where the author has forgotten to declare their prefixes.

> You have to reverse-engineer the web to find out which prefixes
> need to be supported without a declaration, and what URL they should
> be bound to.

One of your suggestions below is that we pre-define common prefixes in use. How are we supposed to do that if we don't do crawls of the Web to understand the common prefixes in use? Is that what you mean by "reverse-engineer the Web"? If so, it has also been shown that this can be done. In fact, Yahoo and Common Crawl did crawls of a section of their corpus to give us statistically significant results which were then used to define the RDFa Initial Context. If we were to adopt your option #1 below, we'd have to do this.

> 1. Discover and document the common prefixes in use, define them to
> always be bound to the URL they're commonly bound to, even without
> an actual declaration, and don't allow them to be bound to a URL
> other than that predefined one.

What is implicit in your proposal above is that we'd continue to support decentralized extensibility, but only for prefixes that are not pre-defined? If that's the case, what happens when a developer uses 'foo' today, but then the W3C decides to pre-define 'foo' in the future to map to a different URL? Wouldn't that break the author's document?

That said, this is an interesting proposal, and one that I think we should implement *IF* there is data to back up the premise of the argument. The data should show that:

1) Declaring the same prefix to point to two different URLs is a common practice on the Web, and
2) It leads to consumers mis-interpreting the data in a way that generates the wrong data.
I think #1 is true for the 'dc' prefix, but in that case, both 'elements' and 'terms' are largely compatible with one another. I can't think of any other vocabulary where this holds true. Seeing data showing that something like 'ogp' or 'schema' are being commonly mapped to two different vocabulary URLs would be very compelling evidence.

We haven't seen any data to raise the concern that #2 is true. Do you have data demonstrating this assertion?

> 2. Drop the indirection of prefixes entirely, and simply declare
> that prefixes themselves are meaningful. Predefine the common
> prefixes in use.

This would effectively break any document that is using a "statistically insignificant" prefix. It would also harm innovation in vocabularies as the barrier to entry would be much higher. It would break decentralized extensibility in RDFa. I think you realize all of this, but I just wanted to make sure that others in the thread understand the ramifications of a change like this.

> 1. If people adopted the convention of simply using their domain
> name (quite reasonable, I think, and likely more-or-less what people
> will naturally use anyway), it would convey the exact same meaning
> and uniqueness as a full URL, but with less typing - "http://foo.com"
> is 11 characters longer than "foo".

While that's true, it would also push people to not put a great deal of thought into creating a vocabulary, or publish their vocabularies, and it would effectively produce data that is not portable at all across websites.

> However, if #2 is for whatever reason unacceptable, #1 is the *bare
> minimum* that needs to be done for the RDFa spec to document
> reality, such that a consumer can follow the spec and reasonably
> expect to correctly consume content already on the web.

Of your two proposals, I think #1 has the best chance of being adopted (if not for RDFa 1.1, then for RDFa 2.0). The part that is missing is the data that supports the premise of your argument.
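
To make the two consumer behaviours in this thread concrete, here is a rough sketch of the lookup order being discussed. It is not taken from the RDFa spec text or from any real processor: the function names and the `declared` dictionary are hypothetical, and `INITIAL_CONTEXT` is only an illustrative subset of the pre-defined prefixes (the authoritative list is the RDFa 1.1 initial context document linked above). The first function lets a declaration in the document win and falls back to the pre-defined prefixes; the second models a consumer that treats the prefix itself as meaningful and ignores declarations.

```python
from typing import Dict, Optional

# Illustrative subset only; assumed here for the sketch.
INITIAL_CONTEXT: Dict[str, str] = {
    "dc": "http://purl.org/dc/terms/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "og": "http://ogp.me/ns#",
    "schema": "http://schema.org/",
}


def expand_curie(curie: str, declared: Dict[str, str]) -> Optional[str]:
    """Expand prefix:reference, preferring the document's own declarations."""
    prefix, sep, reference = curie.partition(":")
    if not sep:
        return None  # not a prefixed name
    if prefix in declared:
        # The author's declaration wins (decentralized extensibility).
        return declared[prefix] + reference
    if prefix in INITIAL_CONTEXT:
        # Fallback for authors who forgot to declare a common prefix.
        return INITIAL_CONTEXT[prefix] + reference
    return None  # unknown prefix: nothing to generate


def expand_prefix_as_meaningful(curie: str) -> Optional[str]:
    """Hypothetical consumer that ignores declarations entirely."""
    return expand_curie(curie, declared={})


# A document that binds 'dc' to the older 'elements' namespace.
doc_prefixes = {"dc": "http://purl.org/dc/elements/1.1/"}
print(expand_curie("dc:title", doc_prefixes))
print(expand_prefix_as_meaningful("dc:title"))
```

The two calls at the end produce different IRIs for the same markup, which is the kind of divergence that crawl data would need to show is actually happening in deployed consumers.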
-- manu

--
Manu Sporny (skype: msporny, twitter: manusporny)
President/CEO - Digital Bazaar, Inc.
blog: HTML5 and RDFa 1.1
http://manu.sporny.org/2012/html5-and-rdfa/
Received on Tuesday, 6 November 2012 18:28:44 UTC