Re: RDFa and Web Directions North 2009 from Ian Hickson on 2009-02-13 (public-rdfa@w3.org from February 2009)

From: Ian Hickson <ian@hixie.ch>
Date: Fri, 13 Feb 2009 23:58:56 +0000 (UTC)
To: Kjetil Kjernsmo <kjetil@kjernsmo.net>, Karl Dubost <karl@la-grange.net>, Kingsley Idehen <kidehen@openlinksw.com>
Cc: public-rdfa@w3.org, RDFa mailing list <public-rdf-in-xhtml-tf@w3.org>, Sam Ruby <rubys@intertwingly.net>, Dan Brickley <danbri@danbri.org>, Michael Bolger <michael@michaelbolger.net>, Tim Berners-Lee <timbl@w3.org>, Dan Connolly <connolly@w3.org>
Message-ID: <Pine.LNX.4.62.0902132324180.952@hixie.dreamhostps.com>

Just to clarify -- while I'm being devil's advocate below, that doesn't
mean I personally have an opinion on this matter. I apply the same
skeptical standards to everything that is proposed for HTML5. (If you
think HTML5 is bloated, just imagine how big it would be if I didn't!)

Please don't take these questions as personal attacks. I honestly am 
trying to find out how RDF and RDFa are to work in HTML5, to see if they 
make sense to add.


On Fri, 13 Feb 2009, Kjetil Kjernsmo wrote:
> On Friday 13 February 2009, Ian Hickson wrote:
> > Note that you can already "ask questions" on the Web. For example, I 
> > just searched for "which country napolean", which is neither the right 
> > question nor correctly spelt (though that wasn't intentional), and 
> > Google answered:
> 
> Well, you just proved that google sucks, didn't you? It couldn't get the 
> answer to that basic question right...

Would a system based on RDF or RDFa give a better answer to the same 
question? How? Is there a system running somewhere that can demonstrate 
this? Does it require all data to be marked up as RDFa?


> Another example, I'd like to have the latest version of the SPARQL 
> Update spec, and I expect to get it if I ask for "sparql update".

How does RDF or RDFa solve this problem?


> Add some relations, and you have a decent chance, so that's what we 
> did, so what you can do now is just search for what you know, and with 
> two clicks, you'll find several things that will give you some 
> associations, such as "inflammation of the lining of the lungs", or 
> pleuritis, which gets you on the right track. Google requires you to 
> offer these associations up front, it doesn't assist with anything 
> beyond simple spelling mistakes, whereas just a little bit of knowledge 
> organisation will assist people in finding what they want.

Do we have reason to believe that it is more likely that we will get 
authors to widely and reliably include such relations than it is that we 
will get high quality natural language processing? Why?

How would an RDF/RDFa system deal with people gaming the system?

How would an RDF/RDFa system deal with the problem of the _questions_ 
being unstructured natural language?

How would an RDF/RDFa system deal with data provided by companies that 
have no interest in providing the data in RDF or RDFa? (e.g. companies 
providing data dumps in XML or JSON.)

How would an RDF/RDFa system deal with companies that do not want to 
provide the data free of charge?

How would an RDF/RDFa system deal with companies that want to track 
per-developer usage of their data?


On Fri, 13 Feb 2009, Kjetil Kjernsmo wrote:
> On Friday 13 February 2009, Ian Hickson wrote:
> > If Amazon couldn't even be bothered to add a class for "price" in the 
> > last decade, why do we believe they will add RDFa? 
> 
> Because RDF(a) is actually powerful, class isn't. That's what I think 
> anyway...

The problem description was just to get a relationship between an item and 
a price. Both a simple set of classes and RDFa completely solve this 
problem. Being more powerful is irrelevant in the context of that problem.
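
As a rough illustration of why the two approaches are equivalent for this 
narrow problem, here is a minimal sketch that pulls a price out of markup 
annotated either way. Everything in it is an assumption made for the 
example: the markup snippets, the "ex:" vocabulary prefix, and the 
PriceExtractor and extract names are hypothetical, and nothing here 
reflects Amazon's actual pages. It also ignores unclosed void elements 
such as <br>, which a real extractor would have to handle.

```python
from html.parser import HTMLParser


class PriceExtractor(HTMLParser):
    """Collect the text of elements whose `attr` contains the token `value`."""

    def __init__(self, attr, value):
        super().__init__()
        self.attr = attr
        self.value = value
        self.depth = 0      # > 0 while we are inside a matching element
        self.prices = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.depth += 1  # nested element inside a match
            return
        tokens = (dict(attrs).get(self.attr) or "").split()
        if self.value in tokens:
            self.depth = 1
            self.prices.append("")

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.prices[-1] += data


def extract(html, attr, value):
    parser = PriceExtractor(attr, value)
    parser.feed(html)
    parser.close()
    return [text.strip() for text in parser.prices]


# Hypothetical markup: the same item annotated with classes and with RDFa.
CLASS_MARKUP = (
    '<div class="item"><span class="title">Example Book</span> '
    '<span class="price">$12.99</span></div>'
)
RDFA_MARKUP = (
    '<div about="#book" xmlns:ex="http://example.org/vocab#">'
    '<span property="ex:title">Example Book</span> '
    '<span property="ex:price">$12.99</span></div>'
)

print(extract(CLASS_MARKUP, "class", "price"))       # ['$12.99']
print(extract(RDFA_MARKUP, "property", "ex:price"))  # ['$12.99']
```

The consumer code is the same in both cases; only the attribute name and 
the agreed-upon token differ.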


> > How does RDFa solve the problem that they have that I described but 
> > that you cut from the above quotes, namely that they want to track 
> > usage on a per-developer basis?
> 
> OK, it doesn't.

If the problem is that we want price data out of Amazon pages, and RDFa 
doesn't solve the problem to Amazon's satisfaction, then why is RDFa being 
put forward as a solution?

This is what I mean by evaluating solutions, by the way. I don't 
personally care whether we use RDFa or something else. I _do_, however, 
want to make sure that whatever solution we end up using is a solution 
that actually solves the problems we set out to solve.

Here, if the problem is "associate price with item on Amazon pages", RDFa 
does not solve the problem.


> > How did you resolve issues such as different vocabularies having 
> > different enumerations of music genres?
> 
> Mostly, they are owl:sameAs. In some cases, they are owl:unionOf, and in 
> some cases owl:intersectionOf. It solved every problem we had, but 
> conceivably, you could have genres that we weren't aware of existed. In 
> which case we would have looked up in large ontologies managed by 
> someone else. The archive/library/museum sector has a lot of useful 
> stuff here.
>
> > Why was getting data out of those varied formats hard? For ID3, for 
> > example, it's three lines of perl to get most of the common 
> > information.
> 
> Yeah, but this was written in Java, it takes at least 10 times as much 
> code ;-) Besides, I didn't say it was hard, I said it was harder than 
> e.g. aligning genres. Which says a lot about how easy this is to do! :-)

What did you do with the genres once you had them all aligned with union, 
intersection, and same-as relationships? That doesn't seem like the most 
useful structure for data to be exposed to a random user.
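
For concreteness, the same-as part of such an alignment can be thought of 
as merging labels into equivalence classes. The sketch below does that 
with a small union-find; it is only an assumption about what "aligned" 
could mean in practice, not a description of the system discussed above, 
and the genre labels and the align_same_as name are made up for the 
example. Union and intersection relationships are not modelled.

```python
def align_same_as(pairs):
    """Group labels into equivalence classes from (a, b) same-as pairs."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for label in list(parent):
        groups.setdefault(find(label), set()).add(label)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical same-as statements between three genre vocabularies.
SAME_AS = [
    ("id3:Hip-Hop", "store:HipHop"),
    ("store:HipHop", "radio:hip_hop"),
    ("id3:Jazz", "store:Jazz"),
]

for group in align_same_as(SAME_AS):
    print(group)
```

Each resulting group could then be shown to a user under a single display 
name, which is one possible answer to what the aligned data is for.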


On Sat, 14 Feb 2009, Karl Dubost wrote:
> 
> I love natural language processing too. It is useful, though it doesn't 
> solve everything (except maybe in an English centric world.)

Nobody said it would solve everything. My point was just that it solved 
one specific problem as well as RDFa does.


On Fri, 13 Feb 2009, Kingsley Idehen wrote:
> Ian Hickson wrote:
> > On Fri, 13 Feb 2009, Kingsley Idehen wrote:
> >   
> > > When writing HTML (by hand or indirectly via a program) I want to 
> > > isolate and describe what the content is about in terms of people, 
> > > places, and other real-world things. I want to isolate "Napoleon" 
> > > from a paragraph or heading, and state that the aforementioned 
> > > entity is of type "Person" and is associated with another 
> > > entity, "France".
> > 
> > Why? (I do not ask this rhetorically.)
>
> As a page writer:
> 
> I want to provide pointers to detailed descriptions of the things I 
> mention in what I write.

Isn't an <a href=""> suitable for this already?


> I want to be able to express myself succinctly with pointers to other 
> places on the Web where descriptions of the people, places, subject 
> matter can be obtained.

Again, <a href=""> seems to have solved this problem well until now, why 
does it no longer solve the problem?


> Note, I don't want to point them to another chunk of blurb, I want to 
> point my readers to a page that has the sole function of describing the 
> aforementioned entities via their attributes and relationships.

Why?


> I want to point to a resource that exposes a granular description of 
> Napoleon e.g., <http://dbpedia.org/resource/Napoleon_I_of_France> 
> instead of one that is quite opaque, e.g., 
> <http://en.wikipedia.org/wiki/Napoleon_I_of_France> .

Why?


> As a page reader:
> I want to have access to the entities behind the blurb. Today I can see 
> an opaque but nice looking Web page, I can also see the markup behind 
> the page, but I cannot easily discern the description of entities 
> mentioned in a Web Page.

What good are these entities? What is my dad supposed to do with them?


> > How do you envisage these annotations being used?
>
> To spread knowledge via the Web. Basically, accelerate propagation and 
> exploitation of collective knowledge.

Wikipedia seems to be doing this pretty well; how is RDFa going to make 
the thousands or millions of Wikipedia contributors faster?


> Let's say Google did provide the perfect answer (which it absolutely 
> doesn't), what bearing does that actually have on my example? The fact 
> that you can do something one way is stylistic and not the basis for 
> invalidating alternatives.

In general with HTML5 we have tried to reduce redundancy in solutions -- 
once we have solved a problem once, we try not to solve it again unless 
there are compelling reasons to do so. This is because anything we add to 
the Web platform will -- even if mostly unsuccessful -- have to be 
supported for decades to come, with a resulting high price.


> Back to your Google example, Google answered, but where is the 
> disambiguation? How does it determine the context for the search? There 
> are many dimensions to "Napoleon" and Google statistically guessed one 
> based on link density of its subjectively assembled index and page rank 
> algorithm. How do you as a writer or reader efficiently navigate the many 
> aspects/facets associated with the pattern: "Napoleon"?

Can an RDF/RDFa system do better from a natural language query?


> The essence of the Web is openness, no presumptions, and allegiance to 
> "open world assumptions".
> 
> Here is what I can do instead of depending solely on Google. I call 
> this "Search" + "Find" as the following sequence is a simple demo.
> 
> Visit: http://lod.openlinksw.com/fct/facet.vsp .
> 
> The service above allows me use the attributes, relationships, and types 
> of my search subject to find what I want. All of this happens because I 
> am interacting with the data entities distilled from Web pages and other 
> data sources. Remember, RDFa is about simplifying the distillation 
> process above all else without the tedium of alternative approaches such 
> as RDF/XML.
> 
> Here are my steps to context driven search and find re. Napoleon:
> 
> 1. Enter the pattern "Napoleon" and get what I would normally get from
>    Google (without the page rank algorithm applied).
> 2. Click on "Type" in the Navigation section to filter the search results
>    by entity type.
> 3. Pick "dbpedia-owl:MilitaryConflict", "yago:BattlesOfTheNapoleonicWars",
>    and "yago:BattlesInvolvingRussia".
> 4. From the drop-down (in the Navigation section), click on Map.
> 
> Steps 1-4 basically translate to:
> 
> e1 has any property whose value contains "Napoleon".
> e1 is a dbpedia-owl:MilitaryConflict .
> e1 is a yago:BattlesOfTheNapoleonicWars .
> e1 is a yago:BattlesInvolvingRussia .
> 
> Where "e1" is an entity (or S in SPO parlance).
> 
> I hope I've made my use-case for knowledge dissemination as part of an effort
> to objectively drive collective intelligence via the Web a little clearer?
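
The four patterns quoted above read as a conjunctive filter over 
subject-predicate-object triples. A minimal in-memory sketch of that 
filter follows. The triples, the "ex:" identifiers, and the find_entities 
function are all invented for illustration; this is not how the faceted 
browser at the quoted URL is implemented, and a real system would more 
likely run a query against a triple store.

```python
# Toy data: (subject, predicate, object) triples, made up for this example.
TRIPLES = {
    ("ex:BattleOfBorodino", "rdfs:label", "Battle of Borodino (Napoleon in Russia)"),
    ("ex:BattleOfBorodino", "rdf:type", "dbpedia-owl:MilitaryConflict"),
    ("ex:BattleOfBorodino", "rdf:type", "yago:BattlesOfTheNapoleonicWars"),
    ("ex:BattleOfBorodino", "rdf:type", "yago:BattlesInvolvingRussia"),
    ("ex:BattleOfWaterloo", "rdfs:label", "Battle of Waterloo, Napoleon's final defeat"),
    ("ex:BattleOfWaterloo", "rdf:type", "dbpedia-owl:MilitaryConflict"),
    ("ex:BattleOfWaterloo", "rdf:type", "yago:BattlesOfTheNapoleonicWars"),
    ("ex:NapoleonDynamite", "rdfs:label", "Napoleon Dynamite"),
    ("ex:NapoleonDynamite", "rdf:type", "ex:Film"),
}


def find_entities(triples, text, required_types):
    """Return subjects with any value containing `text` and all given types."""
    needle = text.lower()
    # Pattern 1: e1 has any property whose value contains the text.
    mentions = {s for s, _, o in triples if needle in o.lower()}

    # Patterns 2-4: e1 is a <type>, for every required type.
    types = {}
    for s, p, o in triples:
        if p == "rdf:type":
            types.setdefault(s, set()).add(o)

    wanted = set(required_types)
    return sorted(s for s in mentions if wanted <= types.get(s, set()))


print(find_entities(TRIPLES, "Napoleon", [
    "dbpedia-owl:MilitaryConflict",
    "yago:BattlesOfTheNapoleonicWars",
    "yago:BattlesInvolvingRussia",
]))
```

Each additional type narrows the result set, which is the effect of 
clicking through the facets in steps 2 and 3.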

If the above represents the state of the art for RDF or RDFa, then we are 
a _long_ way from RDF being ready to be exposed to regular users.

People have a hard enough time (as you point out!) doing simple natural 
language queries where all they have to do is express themselves in their 
own native tongue.

Asking them to understand "yago:BattlesOfTheNapoleonicWars" or 
"dbpedia-owl:MilitaryConflict" isn't going to fly.

I'm also rather confused as to how this is going to scale. I am already 
completely overwhelmed by this interface. How will it work when there are 
hundreds of millions of users all using their own ontologies and relations?

Again, these are not rhetorical questions. I'm trying to work out how RDF 
or RDFa solves real problems for typical end users.


On Fri, 13 Feb 2009, Kingsley Idehen wrote:
> 
> Does it take an attribute to make a new language? But let's say that it 
> does, is that costly when it doesn't break anything?

If you are asking "What is the cost of adding an attribute to HTML5?", 
then the answer is actually quite high. Features need implementing, 
testing, documenting; tutorial authors need to understand and explain it; 
data mining tools would need to support the new feature; bugs would need 
to be worked out; user interfaces would need to be designed and 
usability-tested, iterated upon, deployed; problems with spam and 
malicious data would need tracking down, solutions found, deployed; 
security bugs, if any, would need fixing; problems with data reliability 
would need to be addressed and fixed; etc.

If features solve new problems in practical ways, then the cost is worth 
it. If they don't, then it behooves us not to add them, as the total cost 
to the community would not be trivial.


On Fri, 13 Feb 2009, Kingsley Idehen wrote:
> Ian Hickson wrote:
> > On Fri, 13 Feb 2009, Kjetil Kjernsmo wrote:
> >
> > If Amazon couldn't even be bothered to add a class for "price" in the 
> > last decade, why do we believe they will add RDFa?
>
> Great question.
> 
> Because RDFa isn't a decade old.

On what basis do you believe that newer technologies get adopted more than 
older ones?


> It's taken a decade to get to the point where making statements for the 
> RDF model is simple and unobtrusive enough for it to happen within HTML. 
> In a nutshell, what wasn't cost-effective to Amazon over the last 10 
> years, no longer necessarily holds today.

It's no less expensive to use RDFa than use class attribute values.


> But for argument's sake, let's assume Amazon doesn't use RDFa, so what? 
> That's no different than your earlier analogy re. Google.

If Amazon doesn't use RDFa, then the problem (how to expose prices of 
Amazon products) isn't solved by RDFa, and thus the use case is not an 
argument for RDFa.


> The Web is not about behemoths, it is about a democratic space with 
> multi-tiered dexterity along the following vectors:
> 
> 1. Data
> 2. Information
> 3. Knowledge.
> 
> The Web was bootstrapped via point 2 (as an information space), but it 
> needs the trinity above to stay true to its essence.

The Web is not about behemoths, no. But I think it would be more accurate 
to depict it as meritocratic and evolutionary rather than democratic; I 
think it would be more accurate to say that it is based on social 
interactions, end-user needs, and porn, than data, information, or 
knowledge; and I do not think that we have any kind of responsibility to 
keep the Web true to its original essence if the Web community as a whole 
moves in another direction.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Friday, 13 February 2009 23:59:35 UTC