- From: Ian Hickson <ian@hixie.ch>
- Date: Tue, 17 Feb 2009 06:47:24 +0000 (UTC)
- To: Ben Adida <ben@adida.net>, Kjetil Kjernsmo <kjetil@kjernsmo.net>, Jeremy Carroll <jeremy@topquadrant.com>, Julian Reschke <julian.reschke@gmx.de>, Kingsley Idehen <kidehen@openlinksw.com>, Henri Sivonen <hsivonen@iki.fi>
- Cc: public-rdfa@w3.org, RDFa mailing list <public-rdf-in-xhtml-tf@w3.org>, 'Karl Dubost' <karl@la-grange.net>, Mark Birbeck <mark.birbeck@webbackplane.com>, Manu Sporny <msporny@digitalbazaar.com>, Sam Ruby <rubys@intertwingly.net>, Dan Brickley <danbri@danbri.org>, Michael Bolger <michael@michaelbolger.net>, Tim Berners-Lee <timbl@w3.org>, Dan Connolly <connolly@w3.org>
This is a bulk reply to several e-mails on this thread. I apologise for its length.

On Fri, 13 Feb 2009, Ben Adida wrote:
> Ian Hickson wrote:
> > > So, can we look at the use cases as a whole?
> >
> > In a word, no.
> >
> > A very common architectural mistake that software engineers make is
> > looking at five problems, seeing their commonality, and attempting to
> > solve all five at once. The result is almost always a solution that is
> > sub-par for all five problems.
>
> I think you're taking a good piece of advice -- "don't over-generalize"
> -- to the other and equally dangerous extreme. You want to refuse to
> look for *any* common patterns?

Oh don't get me wrong, if there are solutions that have commonalities, then obviously we should reuse solutions where possible. For example, we had several use cases -- offline Hotmail, offline Google Spreadsheets, offline Flickr -- and we came up with a single solution that covers all of these. But we evaluated the solution against each case independently.

In other words, to get a good result, instead of:

 1. Find problems.
 2. Extract commonalities of problems.
 3. Adopt a solution that solves the commonalities.

...the process needs to be:

 1. Find problems.
 2. Propose solutions that solve one or more of those problems.
 3. Evaluate the solutions against each problem.
 4. If a solution is found that addresses many of the problems, adopt it.

That is, the use cases have to be used both at the start of the process _and_ at the end of the process. Otherwise, we risk ending up with something that doesn't actually solve any of the use cases we were attempting to solve.

The reason I bring this up is that I have noticed that whenever I am talking about RDF with someone, the conversation tends to go

   me: "Tell me a problem that RDF solves."
   other: "RDF solves X!"
   me: "Wouldn't Y be a better solution for X?"
   other: "Well, X was a bad example."

I don't know that I've ever heard a _good_ example!

> Consider, for example, Creative Commons. We can't afford to get everyone
> to build Creative Commons support into their tools if that involves
> buying into a CC-specific language and toolset. Neither can Bitmunk with
> respect to music. But if we come together, use the same markup and
> parsing technology, and even share relevant pieces of our respective
> vocabularies, then it becomes tractable. The work that Manu does
> benefits me, and vice-versa.

IMHO, the syntax and data model is the easy part. If you had trouble getting adoption of your vocabulary with a trivial dedicated syntax, I don't think you're likely to have any more luck now that your vocabulary comes with a general-purpose data model and half a dozen different syntaxes.

But your mileage may vary, I guess. This line of argumentation (that small problems should share solutions so as to leverage each others' work) is not convincing to me.

> Using the same principle, we also future-proof our work. At CC, we're
> not sure what other fantastic media will appear next. 3D video? Full
> virtual reality? Who knows. But when those come out, with their custom
> attributes to describe properties we don't even know about yet, we'll
> still be able to use the same RDF and RDFa to express their licensing
> terms, and the same parser to pick things up.

Personally I prefer to address today's problems today and tomorrow's problems tomorrow, so that as we meet new problems, they are addressed with surgical precision, rather than trying to come up with systems that can solve everything forever.

But again, to each his own. This line of argumentation (that we should design systems that solve all future needs, whether foreseeable or not) is also not convincing to me.

On Fri, 13 Feb 2009, Ben Adida wrote:
>
> [...] we're not asking browsers to implement any specific features other
> than make those attributes officially available in the DOM.

You presumably do want some user agents somewhere at some time to do something with these triples, otherwise what's the point? Whether this is through extensions, or through browsers in ten years when the state of the art is at the point where something useful can be done with any RDFa, or through search engines processing RDFa data, there has to be _some_ user agent somewhere that uses this data, otherwise what's the point?

> In fact, I would say the cost of doing it *differently* is higher for
> HTML5, too, since none of our test suite, none of our parsing rules,
> none of our existing work could be reused. Currently, as Mark has
> mentioned, a *lot* of our work can be easily reused by HTML5, including
> our test suite.

I agree that if it is the case that there are problems that are best solved through RDFa, that it would make sense to use RDFa as is and that not using it would be silly.

Of course, it may be that there are no such problems, or that such problems aren't compelling enough to need to solve them in HTML5, or that all these problems that are solved through RDFa are in fact a subset of the problems that can all be solved using a common feature. In these cases, reusing RDFa wouldn't make sense -- we'd want to (respectively) not use anything, not use anything yet, or use something else from which one could obtain triples as well as other things.

On Sat, 14 Feb 2009, Kjetil Kjernsmo wrote:
> On Saturday 14 February 2009, you wrote:
> >
> > Please don't take these questions as personal attacks. I honestly am
> > trying to find out how RDF and RDFa are to work in HTML5, to see if
> > they make sense to add.
>
> Sure! Skepticism is sound, but you have be aware that the questions you
> raise has all been discussed at length elsewhere, and sometimes all this
> advocacy seems to be a waste of time, time that would be better spent
> actually writing code (and stick to XHTML for the web page needs) to
> prove the case by actual running code.
> Thus, I will be very brief.

The problem is that every time I ask these questions, I get that reply -- we've answered these questions long ago, so the answers will be brief. Unfortunately this doesn't really end up answering my questions.

> > > > Note that you can already "ask questions" on the Web. For example,
> > > > I just searched for "which country napolean", which is neither the
> > > > right question nor correctly spelt (though that wasn't
> > > > intentional), and Google answered:
> > >
> > > Well, you just proved that google sucks, didn't you? It couldn't get
> > > the answer to that basic question right...
> >
> > Would a system based on RDF or RDFa give a better answer to the same
> > question? How? Is there a system running somewhere that can
> > demonstrate this? Does it require all data to be marked up as RDFa?
>
> I suggested a SPARQL query builder for KDE yesterday. It would be very
> good at cases as this.

A SPARQL query builder does nothing for most people. It is not a substitute for a freeform query UI. How would a system based on RDF or RDFa give a better answer to the same freeform question? Would such systems require all data to be marked up as RDFa or other RDF variants? I assume a SPARQL query builder can't do free-form searches across the Web corpus -- what should happen if the RDF stores of the world don't include the data you're looking for, or have contradictory data?

These aren't rhetorical questions. Without real, complete answers to these questions, the problem isn't solved.

I'm not trying to be difficult here. It would be far easier for me to just say "why yes, you're right, RDFa solves this problem" and just ignore all these problems. But I wouldn't be doing my job if I did that.

> > > Another example, I'd like to have the latest version of the SPARQL
> > > Update spec, and I expect to get it if I ask for "sparql update".
> >
> > How does RDF or RDFa solve this problem?
>
> dct:date

I beg your pardon?

> > Do we have reason to believe that it is more likely that we will get
> > authors to widely and reliably include such relations than it is that
> > we will get high quality natural language processing? Why?
>
> Yeah. Because high quality natural language processing is very unlikely
> to ever happen. It will remain a niche auxiliary system, and something
> that is only half-decent for English.

IMHO, the odds of us getting authors to widely and reliably include such relations are zero, which is even less likely than "very unlikely".

Note that natural language processing today is the solution we are using to the problem of "finding information" (qv. Google, Yahoo! Search, etc). Sure, it's extremely primitive, but it works better than structured data analysis does today, and it doesn't rely on the authors really marking anything up -- most of the semantics of HTML documents are basically ignored by search engines. This is necessary because authors have difficulty doing the most basic things in HTML, such as using <h1> correctly, or using <table> only for tabular data, etc. What makes you think we can get authors to widely and reliably include data relations?

> > How would an RDF/RDFa system deal with people gaming the system?
>
> trust networks.

Ok, so let me describe a system I might expect to see in an ideal RDF- and RDFa-enabled world. Stop me if I go wrong.

In this world, wikipedia, instead of being a presentational wiki, is a wiki with relationships marked up, so that, for instance, if I visit the page on Paul Desmond, my browser can tell that every instance of the word "Paul Desmond" on that page is actually a Person, described by that page, and further can tell that this Person has released Audio CDs, including one with the track "Caravan", recorded in 1969.
It knows all this, so that after I have visited this page, I can later ask my browser (using some mechanism that I won't describe here) for it to give me the name of the person who played "Caravan" and the date that they played it. It knows that if I ask this question, it can trust Wikipedia, because I trust Jimmy Wales, and Jimmy Wales trusts content on Wikipedia. And thus it can tell me that the answer is "Paul Desmond".

I also trust Amazon, and I visit the page for the MP3 for Caravan and it includes an assertion regarding the price. So later, when I ask the browser for the name and price of the track that was released in 1969 that was played by Desmond, it can tell me what it is: Caravan, $2.99.

Or so I think.

Unbeknownst to me, or Jimmy Wales, or Amazon, at the time I visited Wikipedia, there was another assertion on the Paul Desmond page. That assertion was written by another user, and was reverted shortly after I visited the page, but it was present while I visited. That assertion said that there was another track by Paul Desmond, released in 1969 (and indeed every year from 1900 to 2009), called "Viagra", which costs $0.99.

What stops my browser from telling me that this is the answer?

This isn't farfetched. There is a multibillion dollar industry doing this 24/7, writing software that automates such spamming. Such software is actually on the leading edge of massively parallelised programming, with clusters of hundreds of thousands of nodes (computers owned by the unsuspecting people who can't even use <h1> correctly) posting on blogs, forums, wikis, etc, non-stop. Right now it's not such a problem for the end user, because search engine vendors spend millions and millions of dollars every year to combat the problem with their own huge computational power.

> > How would an RDF/RDFa system deal with the problem of the _questions_
> > being unstructured natural language?
>
> See my tuberculosis use case.
> You make the false assumption that the user needs to formulate a
> question.

The assumption I'm making is that the user has a question, and that they want an answer to it. But I'm willing to accept that there might be other interfaces -- what are they? The tuberculosis example didn't include sample UI. Could you give me an example of what the aforementioned users are going to see?

> > How would an RDF/RDFa system deal with data provided by companies that
> > have no interest in providing the data in RDF or RDFa? (e.g. companies
> > providing data dumps in XML or JSON.)
>
> I think we need something I've called GRLLA, i.e. the guerilla version
> of GRDDL ;-)

If we're assuming that there'll be a way to convert from dedicated formats to RDF, why bother with RDFa? Why not just have people output the data in their most convenient format, and then convert from that? That way RDF isn't made special, and if an even better data model comes along, people can convert from the native format to that one too.

> > How would an RDF/RDFa system deal with companies that do not want to
> > provide the data free of charge?
>
> That's OK. As long as there are links to something that the rest of the
> world likes, this is not a problem, it is a good thing.

I don't understand your answer.

> > How would an RDF/RDFa system deal with companies that want to track
> > per-developer usage of their data?
>
> Wrong question, developers as we see them today will be an anachronism,
> that's part of the fun.

I don't understand your answer. Where are the developers going to go?

> > > > How does RDFa solve the problem that they have that I described
> > > > but that you cut from the above quotes, namely that they want to
> > > > track usage on a per-developer basis?
> > >
> > > OK, it doesn't.
> >
> > If the problem is that we want price data out of Amazon pages, and
> > RDFa doesn't solve the problem to Amazon's satisfaction, then why is
> > RDFa being put forward as a solution?
>
> I think Amazon will realise that they do not act in their own best
> interest, though it may take some time.

Do you not see the parallel here between what you just said and what you say you are hearing from non-RDF people? You can't tell someone that they are wrong and that your way is the right way and that they'll come to realise it eventually. You have to actually listen to their needs, and then actually address them. Just ignoring their needs and saying "they'll realise they're wrong in due course" is not going to make them adopt your solution. It'll just sideline your work and make it irrelevant.

> > What did you do with the genres once you had them all aligned with
> > union, intersection, and sams-as relationships? That doesn't seem like
> > the most useful structure for data to be exposed to a random user.
>
> We did a bit of reasoning, constructed a graph from it where all the
> relations between genres are expressed, then found that the we didn't
> have the hardware to do what we wanted, so we chopped it up to a tree
> again. So the user has a nice 2D tree on a ball that can be rotated at
> 30 frames per second. With better hardware, we want to do a 3D rendering
> of it.

Holy mackerel. And all this was easier than the few lines of code to read the ID3 tags out of the music files?! I certainly wouldn't know how to do all of the above in a dozen lines of code!

> > > I want to provide pointers to detailed descriptions of the things I
> > > mention in what I write.
> >
> > Isn't an <a href=""> suitable for this already?
>
> Nope, this should be self-evident.

Providing pointers to things is what <a href=""> does. So it's not really evident at all that it isn't suitable for providing pointers, no. Could you elaborate on this?

> > > I want to be able to express myself succinctly with pointers to
> > > other places on the Web where descriptions of the people, places,
> > > subject matter can be obtained.
> >
> > Again, <a href=""> seems to have solved this problem well until now,
> > why does it no longer solve the problem?
>
> I really don't understand that you cannot see the problem with how this
> is done today...

How is what we have today broken? I honestly don't understand how if you have a document, and you want to point to other documents, <a href=""> doesn't do what you want.

> > > Note, I don't want to point them to another chunk of blurb, I want
> > > to point my readers to a page that has the sole function of
> > > describing the aforementioned entities via their attributes and
> > > relationships.
> >
> > Why?
>
> Oh, please... This is the kind of questions that gives people a strong
> impression that talking to you is a total waste of time...

I'm sorry if that is the impression I give. I haven't been in the RDF world, so things that may seem obvious to you are really not obvious to me. Could you humour me and explain why you would want to do this?

> > > As a page reader:
> > > I want to have access to the entities behind the blurb. Today I can
> > > see an opaque but nice looking Web page, I can also see the markup
> > > behind the page, but I cannot easily discern the description of
> > > entities mentioned in a Web Page.
> >
> > What good are these entities? What is my dad supposed to do with them?
>
> The same thing that the people talking with our librarians are doing
> with them, actually find the information they look for.

Could you be more specific? Are these HTML files? PNG files? RDF triples? Is the user expected to store them? Read them? Print them and go to the library with them?

> > If the above represents the state of the art for RDF or RDFa, then we
> > are a _long_ way from RDF being ready to be exposed to regular users.
>
> Yeah... Well, it is a question of how you'd expose it... It is the data,
> not the model that is interesting to expose to the user right now.

If the state of the art does not yet make this data actually usable by the user, then we shouldn't expose it, or we will permanently break the feature and make it unusable.

Here's an example of this actually happening: HTML4 has a "longdesc" attribute on <img> elements that was intended to allow the author to provide a URI to a Web page that described the image. The state of the art in accessibility tools wasn't really ready for this, and browsers didn't do anything with it. However, there was evangelisation for people to use it. People didn't know how to use it, but knew they should use it. They had no feedback loop to determine if they were using it correctly. They ended up uniformly using it wrongly (on 99.9987% of pages it is either missing or used wrong, according to our data [1]). This made the feature essentially useless, because once the tools supported the feature, users were actually worse off for using it, even though it was intended to help them.

[1] http://blog.whatwg.org/the-longdesc-lottery

If the state of the art is not ready for RDF to be used widely, then we shouldn't expose it yet, because otherwise we will poison the well and make it unusable.

> > People have a hard enough time (as you point out!) doing simple
> > natural language queries where all they have to do is express
> > themselves in their own native tongue.
> >
> > Asking them to understand "yago:BattlesOfTheNapoleonicWars" or
> > "dbpedia-owl:MilitaryConflict" isn't going to fly.
>
> Actually, this is an easier problem that you'd might think, it just
> hasn't had any attention yet. It is easy enough to attach an rdfs:label
> to those URIs, in any language, which would make it a lot more friendly.

IMHO these kinds of problems need to be resolved _before_ we unleash RDF onto the world in HTML. In practice, I fear you'll find localisation of a huge number of terms like the above is far, far harder a problem than just attaching an rdfs:label as you suggest.

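As a rough illustration of the label lookup being discussed, here is a minimal Python sketch that maps a term to a human-readable label per language and falls back to the raw identifier when no label exists. The LABELS table, the label_for function, and the label strings are all invented for the example; only the two term names come from the text quoted above. It does not reflect how any particular RDF toolkit works, it only shows where the localisation gap would surface.

```python
# Hypothetical label table: (term, language) -> human-readable label.
# In RDF these would be rdfs:label literals carrying language tags;
# the strings below are invented purely for illustration.
LABELS = {
    ("yago:BattlesOfTheNapoleonicWars", "en"): "Battles of the Napoleonic Wars",
    ("dbpedia-owl:MilitaryConflict", "en"): "Military conflict",
    ("dbpedia-owl:MilitaryConflict", "fr"): "Conflit militaire",
}


def label_for(term: str, lang: str) -> str:
    """Return the label for `term` in `lang`, or the raw term if none exists."""
    return LABELS.get((term, lang), term)


if __name__ == "__main__":
    terms = ("yago:BattlesOfTheNapoleonicWars", "dbpedia-owl:MilitaryConflict")
    for lang in ("en", "fr"):
        for term in terms:
            # A missing (term, lang) pair prints the unfriendly identifier.
            print(lang, label_for(term, lang))
```

Every (term, language) pair that is missing from such a table leaks the raw identifier to the user, so the friendliness of the result depends entirely on how completely the labels have been written and translated.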
On Fri, 13 Feb 2009, Jeremy Carroll wrote:
>
> e.g. could these additional attributes be included in a script data block element?
>
> <script type="text/rdfa">
> about="http://example.org"
> datatype="xsd:int"
> </script>

You could actually just include raw RDF/XML or n3 or any other RDF serialisation straight into the <script> block, no need for RDFa at that point. This is allowed and possible in HTML5 today. If it turns out that RDFa or some other data annotation mechanism isn't added to HTML5, this would be the (suboptimal, certainly) alternative.

> (although not wanting to get back to the rdf/xml in comments within
> HTML) ... are we allowed an XML element inside a script element - that
> would be less ugly.

Yes, this is allowed, provided you don't have the "</script" sequence anywhere in the content.

On Sat, 14 Feb 2009, Kingsley Idehen wrote:
>
> In a sense, we are actually playing out via this debate the very thing
> we are hoping the Web will ultimately simplify: discourse discovery and
> participation.

I assume you don't mean RDFa will actually literally help with discussions like this... if you do mean that, could you elaborate on how? That would be something that would be convincing. It doesn't sound like it needs broad uptake to work; is there something I can do to obtain this benefit immediately? Is there software that already helps with this?

> NLP is not the issue at hand here. This isn't about linguistics. It is
> about structured data, more like a DBMS.

We haven't added any kind of declarative DBMS mechanism to HTML5 either. We have added an API for working with SQL, though. I could understand wanting an API for working with RDF data directly, but that doesn't seem to be what is being requested here.

> > How would an RDF/RDFa system deal with people gaming the system?
>
> Great question!
>
> It would help identify the people gaming the system. See the recent
> foaf+ssl [1] endeavor for instance.

It appears you didn't include the link.
I would be quite interested in finding out more about this. How would the people be identified when they are anonymous wikipedia contributors posting using automated distributed malware running on the aforementioned unsuspecting users' computers, either generating new certificates for each edit, or hijacking the user's certificates?

> > How would an RDF/RDFa system deal with the problem of the _questions_
> > being unstructured natural language?
>
> RDF has a query language: SPARQL.

Ok, but none of these users are going to learn SPARQL, so that's mostly an academic concern. What is the UI going to look like? Where is it going to come from?

> > How would an RDF/RDFa system deal with data provided by companies that
> > have no interest in providing the data in RDF or RDFa? (e.g. companies
> > providing data dumps in XML or JSON.)
>
> We transform, and the expose as RDFa, as per this example which does
> expose RDFa:
>
> http://linkeddata.uriburner.com/about/html/http://en.wikipedia.org/wiki/Napoleon_I_of_France

This seems to assume that you have the license to do this, which in most cases you would not.

Incidentally, this brings up an interesting question. The above Wikipedia page says "An autopsy concluded he died of stomach cancer". How would this be exposed in RDFa? Or is this not the kind of thing that we would expose?

> Amazon should not be a factor in this discussion. Ditto Google, or any
> other entity. We are talking about the Web.

Amazon was brought up as a use case for RDFa:

   http://lists.w3.org/Archives/Public/public-rdfa/2009Feb/0035.html

...which is the only reason I mentioned it. Google was brought up in the context of a search engine (I also brought up Microsoft's search engine at the same time), because search engines are how users find information on the Web today, and RDFa was put forward as a way to address the problem of users wanting to find out information.

These seem like reasonable reasons for them to be a factor in the discussion.
I'd also like to point out that both Google and Amazon are part of the Web, so they are reasonable topics of discussion when we are talking about the Web, as you put it.

> > This is what I mean by evaluating solutions, by the way. I don't
> > personally care whether we use RDFa or something else. I _do_,
> > however, want to make sure that whatever solution we end up using is a
> > solution that actually solves the problems we set out to solve.
> >
> > Here, if the problem is "associate price with item on Amazon pages",
> > RDFa does not solve the problem.
>
> RDFa allows us to choose to associate price with an item in a structured
> way.

Right, but if the problem is "associate price with item on Amazon pages", as opposed to the pages of someone else, then RDFa does not solve the problem, because, as demonstrated by Amazon's use of APIs rather than a simple class value, Amazon has needs that aren't addressed by RDFa.

> Here is a document (with RDFa) about "Mosquitoes" from the GoeSpecies
> Linked Data space. A few email exchanges between the GeoSpecies kbase
> author and I lead to this:
>
> http://linkeddata.uriburner.com/about/html/http://species.geospecies.org/family_concept_uuid/1e0e9bfe-f1ee-4b14-9511-cb896e8ebf97/
>
> The document above is itself a purveyor of structured data for anyone
> esle on the Web to exploit.
>
> I don't want to be the only one capable of doing this on the Web, I want
> anyone that uses the Web to be able to do this, and RDFa is a very low
> cost mechanism for achieving this goal.
>
> I was to express myself clearly and succinctly without compromising
> clarity or brevity, when I publish documents on the Web. Likewise, I
> want to read documents from others who are able to do the very same
> thing: express themselves clearly and succinctly without compromising
> clarity or brevity.

How does my biologist friend, who knows nothing about computers, but does know about mosquitoes, make use of this information?
How does it help her more than this page would?:

   http://species.geospecies.org/family_concept_uuid/1e0e9bfe-f1ee-4b14-9511-cb896e8ebf97/

This isn't a rhetorical question. I'm sure that there is indeed something that would help my friend here. I just don't know what it is.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Tuesday, 17 February 2009 06:48:05 UTC