W3C home > Mailing lists > Public > www-archive@w3.org > May 2009

Public e-mails that fed into the microdata use cases

From: Ian Hickson <ian@hixie.ch>
Date: Fri, 8 May 2009 21:26:27 +0000 (UTC)
To: www-archive@w3.org
Message-ID: <Pine.LNX.4.62.0905082122280.7824@hixie.dreamhostps.com>

Attached is a MIME digest of e-mails that were used as the source of 
microdata use cases, scenarios, and requirements.

This collection excludes a dozen or so private e-mails.

In addition, feedback was collected from the following Web pages:

http://realtech.burningbird.net/semantic-web/semantic-markup/stop-justifying-rdfa
http://realtech.burningbird.net/semantic-web/semantic-markup/stop-justifying-rdfa?page=1
http://rdfa.info/wiki/Rdfa-use-cases and related pages
http://developer.yahoo.com/searchmonkey/ and related pages

...as well as IRC discussions and a number of private conversations.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

attached mail follows:



Summary:

I believe that there are use cases for RDFa - and that they are precisely  
the sort of thing that Yahoo, Google, Ask, and their ilk are not going to  
be interested in, since they are based on solving problems that those  
search engines do not efficiently solve, such as (among others) using  
private data or dealing with trustworthy data to answer very specific  
questions automatically.

If Ian needs to understand the Semantic Web Industry and why people have  
invested in the RDFa proposal, then it is important to identify the right  
questions, and having him alone identify the sub-questions when he doesn't  
understand the issue isn't going to help him make a well-informed decision.

Some of Ian's questions are discussed here. I cut the mail "short" since I  
think it is already too long for many people, which means that the debate  
will simply pass without their reading or input.

On Wed, 31 Dec 2008 20:46:01 +1100, Ian Hickson <ian@hixie.ch> wrote:

> One of the outstanding issues for HTML5 is the question of whether HTML5
> should solve the problem that RDFa solves, e.g. by embedding RDFa
...
> Before I can determine whether we should solve this problem, and before I
> can evaluate proposals for solving this problem, I need to learn what the
> problem is.
>
> Earlier this year, there was a thread on RDFa on the WHATWG list. Very
> little of the thread focused on describing the problem. This e-mail is an
> attempt to work out what the problem is based on that feedback, on
> discussions at the recent TPAC, and on other research I have done.
>
>
> On Mon, 25 Aug 2008, Manu Sporny wrote:
>> Ian Hickson wrote:
>> > I have no idea what problem RDFa is trying to solve. I have no idea
>> > what the requirements are.
>>
>> Web browsers currently do not understand the meaning behind human
>> statements or concepts on a web page. If web browsers could understand
>> that a particular page was describing a piece of music, a movie, an
>> event, a person or a product, the browser could then help the user find
>> more information about the particular item in question. It would help
>> automate the browsing experience. Not only would the browsing experience
>> be improved, but search engine indexing quality would be better due to a
>> spider's ability to understand the data on the page with more accuracy.
>
> Let's see if I can rephrase that in terms of requirements.
>
> * Web browsers should be able to help users find information related to
>   the items that page they are looking at discusses.
>
> * Search engines should be able to determine the contents of pages with
>   more accuracy than today.
>
> Is that right?
>
> Are those the only requirements/problems that RDFa is attempting to
> address? If not, what other requirements are there?

I don't think so. I think there are some other requirements:

A standard way to include arbitrary data in a web page and extract it for  
machine processing, without having to pre-coordinate their data models.

Since many people use RDF as an interchange, storage and processing format  
for this kind of data (because it provides for automated mapping of data  
 from one schema to many others, without requiring anyone to touch the  
original schemata or agree in advance how they should be created), I  
believe there is a requirement for a method that allows third parties to  
include RDF data in, and extract it from information encoded within an  
HTML page.

>> The Microformats community has done a remarkable job of working on the
>> web semantics problem, creating several different methods of expressing
>> common human concepts (contact information (hCard), events (hCalendar),
>> and audio recordings (hAudio)).
>
> Right; with Microformats, each Microformat has its own problem space and
> thus each one can be evaluated separately. It is much harder to evaluate
> something when the problem space is as generic as it appears RDFa's is.

The point is that there are a very large set of very small problem spaces  
relevant to a small group at a time. Like RDF itself, RDFa is meeting the  
problem of allowing these people to share machine-processable data without  
previously coordinating their approach.

>> The results of the first set of Microformats efforts were some pretty
>> cool applications, like the following one demonstrating how a web
>> browser could forward event information from your PC web browser to your
>> phone via Bluetooth:
>>
>> http://www.youtube.com/watch?v=azoNnLoJi-4
>
> It's a technically very interesting application. What has the adoption
> rate been like? How does it compare to other solutions to the problem,
> like CalDav, iCal, or Microsoft Exchange? Do people publish calendar
> events much? There are a lot of Web-based calendar systems, like MobileMe
> or WebCalendar. Do people expose data on their Web page that can be used
> to import calendar data to these systems?

In some cases this data is indeed exposed to Webpages. However, anecdotal  
evidence (which unfortunately is all that is available when trying to  
study the enormous collections of data in private intranets) suggests that  
this is significantly more valuable when it can be done within a  
restricted access website.

...
>> In short, RDFa addresses the problem of a lack of a standardized
>> semantics expression mechanism in HTML family languages.
>
> A standardized semantics expression mechanism is a solution. The lack of  
> a solution isn't a problem description. What's the problem that a
> standardized semantics expression mechanism solves?

There are many many small problems involving encoding arbitrary data in  
pages - apparently at least enough to convince you that the data-*  
attributes are worth incorporating.

There are many cases where being able to extract that data with a simple  
toolkit from someone else's content, or using someone else's toolkit  
without having to tell them about your data model, solves a local problem.  
The data-* attributes, because they do not represent a formal model that  
can be manipulated, are insufficient to enable sharing of tools which can  
extract arbitrary modelled data.

RDF, in particular, also provides estabished ways of merging existing data  
encoded in different existing schemata.

There are many cases where people build their own dataset and queries to  
solve a local problem. As an example, Opera is not intersted in asking  
Google to index data related to internal developer documents, and use it  
to produce further documentation we need. However, we do automatically  
extract various kinds of data from internal documents and re-use it. While  
Opera does not in fact use the RDF toolstack for that process, there are  
many other large companies and organisations who do, and who would benefit  
 from being able to use RDFa in that process.

>> RDFa not only enables the use cases described in the videos listed
>> above, but all use cases that struggle with enabling web browsers and
>> web spiders understand the context of the current page.
>
> It would be helpful if we could list these use cases clearly and in  
> detail so that we could evaluate the solutions proposed against them.
>
> Here's a list of the use cases and requirements so far in this e-mail:
>
> * Web browsers should be able to help users find information related to
>   the items that page they are looking at discusses.
>
> * Search engines should be able to determine the contents of pages with
>   more accuracy than today.
>
> * Exposing calendar events so that users can add those events to their
>   calendaring systems.
>
> * Exposing music samples on a page so that a user can listen to all the
>   samples.
>
> * Getting data out of poorly written Web pages, so that the user can find
>   more information about the page's contents.
>
> * Finding more information about a movie when looking at a page about the
>   movie, when the page contains detailed data about the movie.
>
> Can we list some more use cases?
>
>
> Here are some other questions that I would like the answers to so that I
> can better understand what is being proposed here:
>
> Does it make sense to solve all these problems with the same syntax?

That depends on the answers to your next two questions.

Moreover, that is not actually a very good question in this case. I think  
the judgement call should be whether a syntax that allows people to solve  
the identified problem set consistently is sufficiently valuable (measured  
in terms of the advantages weighed against the disadvantages) to justify  
being part of HTML5.

> What are the disadvantanges of doing so?

I am not sure.

> What are the advantages?

Many people will be able to use standard tools which are part of their  
existing infrastructure to manipulate important data. They will be able to  
store that data in a visible form, in web pages. They will also be able to  
present the data easily in a form that does not force them to lose  
important semantics.

People will be able to build toolkits that allow for processing of data  
 from webpages without knowing, a priori, the data model used for that  
information.

> What is the
> opportunity cost of encouraging everyone to expose data in the same way?

I don't know. I don't see much of an opportunity cost.

> What is the cost of having different data use specialised formats?

If the data model, or a part of it, is not explicit as in RDF but is  
implicit in code made to treat it (as is the case with using scripts to  
process things stored in arbitrarily named data-* attributes, and is also  
the case in using undocumented or semi-documented XML formats, it requires  
people to understand the code as well as the data model in order to use  
the data. In a corporate situation where hundreds or tens of thousands of  
people are required to work with the same data, this makes the data model  
very fragile.

Such considerations also apply to larger communities, for example those  
dealing with complex scientific information.

> Do publishers actually want to use a common data format?

It would appear so - even in cases where they don't want to publish their  
data in such an easy-to-use format for commercial reasons.

> How have past efforts in creating data formats fared?

Some have been pretty successful. Dublin Core is a general format for  
labelling content that is widely used. MARC records have been very  
successful.

> Are enough data providers actually willing to expose their data in a
> machine readable manner for this to be truly useful?

To make this truly useful it doesn't need to be exposed to the public. It  
would appear that organisations are prepared to make large investments in  
RDF data whether they expose them or not (and some very large ones do  
expose data), which suggests that this data is truly useful.

> If data providers
> will be willing to expose their data as RDFa, why are they not already
> exposing their data in machine-readable form today?
>
>  - For example, why doesn't Amazon expose a CSV file of your usage
>    history, or an Atom feed of the comments for each product, or an
>    hProduct annotated form of their product data? (Or do they? And if so,
>    do we know if users use this data?)

Why would they need to?

>  - As another example, why doesn't Craigslist like their data being  
>    reused in mashups? Would they be willing to allow their users to reuse
>    their data in these new and exciting ways, or would they go out of
>    their way to prevent the data from being accessible as soon as a
>    critical mass of users started using it?

This is a key question. Why *should* a data provider be required to offer  
their product (data) for other people to use, in order to demonstrate that  
the data is useful. Google, a large provider of data, insists on certain  
conditions being met before it makes its services available, and that  
seems perfectly reasonably to me.

Whether Craigslist actively attempts to make their data easier to  
aggregate, or actively avoids facilitating that process, strikes me as  
irrelevant to the question of whether there is value in enabling them to  
do so. Because large organisations specialising in gathering people's  
data, from Flickr to Google and Facebook to Government taxation  
departments are not the only consumers and producers of data that  
determine value for users.

It would seem important that the Web easily enable small-time users of  
data to efficiently communicate with one another, without the need to have  
one of the giants as an intermediary. When libraries in the Dominican  
Republic want to share data, and librarians in Léon want to use that data,  
it seems that the Web should facilitate that without resorting to  
intermediaries like Amazon or Yahoo! and since we already have the  
technology to do so in a way that enables very powerful data models to be  
used without requiring coordination, it seems odd that you don't even  
understand how this could be valuable.

> What will the licensing situation be like for this data? Will the  
> licenses allow for the reuse being proposed to solve the problems and
> use cases listed above?

In some cases yes, and in some cases no. In other words, making such data  
available does not distort natural market conditions one way or another.

> How are Web browsers going to expose user interfaces to answer user
> questions?

I am glad to see that you think user interface behaviour is in fact  
important to the process of specifying HTML (I had been under the  
impression that you believed the spec should not touch on it). There are  
various query systems already available in browsers, from the search  
engine in Opera that lets you do a free-text search on pages stored in  
your history to Tabulator - a substantial RDF browser available as a  
Widget for Opera or as an extension to Firefox, that allows for a variety  
of pre-configured questions as well as free-form questions.

> Can only previously configured, hard-coded questions be asked,
> or will Web browsers be able to answer arbitrary free-form questions from
> users using the data exposed by RDFa?

Both of these are possible. The value of RDFa is that it actually supports  
the possibility of asking free-form questions by using a data model that  
is sufficiently well specified to enable constructions of tools that are  
not dependent on being preconfigured to recognise the exact type of data  
being queried (unlike, say, microformats, which require an intermediate  
agreement to enable people to extract the data, and don't provide for  
merging data of different types for rich queries).

> How are Web browsers that expose this data going to handle data that is
> not exposed in the same format? For example, if a site exposes data in
> JSON or CSV format rather than RDFa, will that data be available to the
> user in the same way?

Who cares? But for those who do, this is up to Web browsers. They can  
choose to implement transformations between some particular CSV data and  
RDFa. The difficulty here (and therefore illustration of the value of  
RDFa) is that CSV data has important details of the meaning of the data  
only available out of band in looking at how the data is recorded, while  
RDF allows for automating the process of merging data originally encoded  
in different RDFa vocabularies.

...

> What is the expected strategy to fight spam in these systems? Is it
> expected that user agents will just collect data in the background? If  
> so, how are user agents expected to distinguish between pages that have
> reliable data and pages that expose data that is misleading or wrong?

Aggregating data in real-time is relatively expensive, so is a strategy  
more suited to dealing with asking new questions. Typical systems so far  
have aggregated data in the background to deal with known queries (one  
example is Google, which crawls pages in advance, anticipating searches  
that match terms against the content of those pages), and use live  
querying for cases where the result cannot reliably be stored (e.g.  
airline reservation systems like TravelJungle or LastMinute which  
determine price and availability based on constantly changing data).

Different use cases will imply different strategies for fighting spam.  
Some obvious ones are to rely on trusted sites and secured and signed  
data, to use reputation managers, to follow the "shape" of data over time  
so that anamolies can be highlighted and checked more carefully (in the  
manner of Bayesian filters for email). Some use cases don't care much  
about spam, or are not very interesting to spammers. Some use cases are  
private data anyway.

>  - Systems like Yahoo! Search and Live Search expend extraordinary  
>    amounts of resources on spam fighting technology; such technology
>    would not be accessible to Web browsers unless they interacted with
>    anti-spam services much like browsers today interact with
>    anti-phishing services.

Actually, at least Opera already incorporates anti-spam technology in its  
mail client. Where browsers are the primary consumers of data there is  
nothing at all to suggest that they cannot incorporate anti-spam  
technology directly. (Indeed, the POWDER specification is designed in part  
to make that easy - and it is exactly the sort of data that might  
sometimes be usefully encoded in RDFa since it is based on an RDF model).

>    Yet anti-phishing services have been controversial, since they involve
>    exposing the user's browsing history to third parties; anti-spam
>    services would be a significantly greater problem due to the vastly
>    greater level of spamming compared to phishing. What is the solution
>    proposed to tackle this problem?

It is not clear that this problem is any different in the context of RDFa  
to the general problem already faced by the Web. In general, the solutions  
proposed are the same as those already used on the Web, and of course  
those in development.

>  - Even with a mechanism to distinguish trusted sites from spammy sites,
>    how would Web browsers deal with trusted sites that have been subject
>    to spamming attacks? This is common, for instance, on blogs or wikis.

Right. But that doesn't mean we question whether browsers should enable  
blogs or wikis. Why would RDFa data be different enough to make this  
question relevant?

> These are not rhetorical questions, and I don't know the answers to them.

Some of them seem to be poorly phrased, although if you don't understand  
why people have been working on this technology and why they think it  
would be valuable to have it available in HTML I guess that is almost  
inevitable.

> We need detailed answers to all those questions before we can really
> evaluate the various proposals that have been made here.

No, we apparently need you to personally understand the Semantic Web  
Industry. Determining answers to the questions which are important is  
probably helpful, but also helpful is explaining when your questions are  
irrelevant because they are based on a lack of understanding. This is not  
intended as a slight, but to clarify the process required to have  
something as large as the "Sematic Web" (capital letters, implying the  
whole W3C activity, the industry based around RDF, and so on) evaluated  
for potential inclusion in the HTML5 specification.

I presume the same would apply if the "Web Services" people came and asked  
to have all of their things included in HTML, and offered a specification  
that could be used to achieve their desires.
...

[not clear what the context was here, so citing as it was]
>> > I don't think more metadata is going to improve search engines. In
>> > practice, metadata is so highly gamed that it cannot be relied upon.
>> > In fact, search engines probably already "understand" pages with far
>> > more accuracy than most authors will ever be able to express.
>>
>> You are correct, more erroneous metadata is not going to improve search
>> engines. More /accurate/ metadata, however, IS going to improve search
>> engines. Nobody is going to argue that the system could not be gamed. I
>> can guarantee that it will be gamed.
>>
>> However, that's the reality that we have to live with when introducing
>> any new web-based technology. It will be mis-used, abused and corrupted.
>> The question is, will it do more good than harm? In the case of RDFa
>> /and/ Microformats, we do think it will do more good than harm.
>
> For search engines, I am not convinced. Google's experience is that
> natural language processing of the actual information seen by the actual
> end user is far, far more reliable than any source of metadata. Thus from
> Google's perspective, investing in RDFa seems like a poorer investment
> than investing in natural language processing.

Indeed. But Google is something of an edge case, since they can afford to  
run a huge organisation with massive computer power and many engineers to  
address a problem where a "near-enough" solution brings themn the users  
who are in turn the product they sell to advertisers. There are many other  
use cases where a small group of people want a way to reliably search  
trusted data.

 From global virtual library systems to a single websites, there are many  
others who find that processing structured data is more efficient for  
their needs than doing free-text analysis of web pages (something that  
they effectively contract out to Google, Ask, Yahoo! and their many  
competitors who specialise in it). Some of these are the people whe have  
decided that investing in RDFa is a far more valuable exercis than trying  
to out-invest Google in natural language processing.

This email is already too long for most people to get through it :( I  
believe that this discussion is going to last for some time (I cannot  
imagine why, given the HTML timeline, it would need to be resolved before  
June), so there will be time for others to discuss more fully the many  
points Ian raises as ones he would like to understand.

cheers

Chaals

-- 
Charles McCathieNevile  Opera Software, Standards Group
     je parle français -- hablo español -- jeg lærer norsk
http://my.opera.com/chaals       Try Opera: http://www.opera.com

attached mail follows:



On Jan 1, 2009, at 06:41, Charles McCathieNevile wrote:

> There are many cases where people build their own dataset and  
> queries to solve a local problem. As an example, Opera is not  
> intersted in asking Google to index data related to internal  
> developer documents, and use it to produce further documentation we  
> need. However, we do automatically extract various kinds of data  
> from internal documents and re-use it. While Opera does not in fact  
> use the RDF toolstack for that process, there are many other large  
> companies and organisations who do, and who would benefit from being  
> able to use RDFa in that process.

If the data production and consumption are both under the control of  
one entity (Opera in this case), why does the solution need to be  
engineered for spontaneous integration of decentralized data sources?

Do the savings of using off-the-shelf tools outweigh the cost they  
impose by not being quite right for any specific purpose? Presumably  
the Opera-specific processing is more significant than generic  
parsing. Or is it?

It seems that RDFa is motivated by private data and by interchange at  
the same time. This suggests multiple bilateral access control  
agreements instead of a Web-like system where data is made available  
for GETting without prior agreement between the parties.

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/



attached mail follows:



On Wed, Dec 31, 2008 at 10:41 PM, Charles McCathieNevile
<chaals@opera.com> wrote:
> A standard way to include arbitrary data in a web page and extract it for
> machine processing, without having to pre-coordinate their data models.

This isn't a requirement (or in other words, a problem), it's a
solution.  What are the problems that need to be solved, and for which
having a standard way to include arbitrary data in a web page and have
it easily extractable would be helpful?  (Note:  I think there
certainly *are* problems that *would* find this helpful, I'm just
trying to lead your argument into the right direction.)  (As well,
since the discussion is about RDFa specifically, not data-markup in
general, what are the problems that need RDFa *specifically* as a
solution, as compared to the myriad other ways to embed data?)

> Since many people use RDF as an interchange, storage and processing format
> for this kind of data (because it provides for automated mapping of data
> from one schema to many others, without requiring anyone to touch the
> original schemata or agree in advance how they should be created), I believe
> there is a requirement for a method that allows third parties to include RDF
> data in, and extract it from information encoded within an HTML page.

Solutions for this already exist; embedded N3 in a <script> tag, just
to name something that Ian already mentioned, allows you to mash RDF
data into a page in a machine-extractable way, and brings in any of
the specific ancillary benefits of RDF.

>>> The Microformats community has done a remarkable job of working on the
>>> web semantics problem, creating several different methods of expressing
>>> common human concepts (contact information (hCard), events (hCalendar),
>>> and audio recordings (hAudio)).
>>
>> Right; with Microformats, each Microformat has its own problem space and
>> thus each one can be evaluated separately. It is much harder to evaluate
>> something when the problem space is as generic as it appears RDFa's is.
>
> The point is that there are a very large set of very small problem spaces
> relevant to a small group at a time. Like RDF itself, RDFa is meeting the
> problem of allowing these people to share machine-processable data without
> previously coordinating their approach.

Not quite correct.  Again, the problem of embedded shareable data in a
web page has been solved multiple times.  The specific problem of
sharing *RDF* data (due to needing/wanting the specific benefits RDF
can offer) has also been solved.  What are the precise problems that
require *RDFa* as a solution?

(I won't belabor this point, though it could be brought up several
times more in your email.  This is and was the primary point of
contention between RDFa supporters and those of us who aren't
convinced it belongs in the HTML5 spec.  It is the major thrust of
much of Ian's email; he's trying to help you (RDFa supporters in
general, that is) find exactly what the problem is that RDFa
specifically is trying to solve.)

> Moreover, that is not actually a very good question in this case. I think
> the judgement call should be whether a syntax that allows people to solve
> the identified problem set consistently is sufficiently valuable (measured
> in terms of the advantages weighed against the disadvantages) to justify
> being part of HTML5.

Well, there are many things that would offer more advantages than
disadvantages by themselves.  We can't possibly include all of them in
the spec; you can think about this as including a hidden large
disadvantage of 'will grow the size of the spec and the amount of work
implementors have to do'.  Thus the advantages must generally be
significantly larger than the disadvantages; this is why the best
argument for including something in the spec is often "there are
already widespread hacks to accomplish this".  <video>, for example,
was included based on pretty much precisely that argument.

Of course, that just means that we've identified a problem that is
significant enough to be solved in the spec.  There is still
significant work involved in ensuring that we identify a solution that
actually hits the problem squarely; the existing hacks are usually
inadequate, not through any true fault of their own, but merely
because they had not considered the problem broadly enough, or lacked
enough eyes to find rough edges and missing spots.

>> What are the advantages?
>
> Many people will be able to use standard tools which are part of their
> existing infrastructure to manipulate important data. They will be able to
> store that data in a visible form, in web pages. They will also be able to
> present the data easily in a form that does not force them to lose important
> semantics.
>
> People will be able to build toolkits that allow for processing of data from
> webpages without knowing, a priori, the data model used for that
> information.

Part of the point of Ian's email is that this is not a problem that is
solved by RDFa, it's a problem that's solved by *any* sufficient data
format.  Many solutions currently exist which don't require any
addition to the spec.


>> What is the
>> opportunity cost of encouraging everyone to expose data in the same way?
>
> I don't know. I don't see much of an opportunity cost.

There is no perfect data model, or perfect representation method.
Every group of data is different, has different ideal representations,
and incurs some degree of cost when forced into an existing data model
(that is, one not tailored to the data's specs).  This must thus be
considered.

>>  - As another example, why doesn't Craigslist like their data being
>> reused in mashups? Would they be willing to allow their users to reuse
>>   their data in these new and exciting ways, or would they go out of
>>   their way to prevent the data from being accessible as soon as a
>>   critical mass of users started using it?
>
> This is a key question. Why *should* a data provider be required to offer
> their product (data) for other people to use, in order to demonstrate that
> the data is useful. Google, a large provider of data, insists on certain
> conditions being met before it makes its services available, and that seems
> perfectly reasonably to me.
>
> Whether Craigslist actively attempts to make their data easier to aggregate,
> or actively avoids facilitating that process, strikes me as irrelevant to
> the question of whether there is value in enabling them to do so. Because
> large organisations specialising in gathering people's data, from Flickr to
> Google and Facebook to Government taxation departments are not the only
> consumers and producers of data that determine value for users.
>
> It would seem important that the Web easily enable small-time users of data
> to efficiently communicate with one another, without the need to have one of
> the giants as an intermediary. When libraries in the Dominican Republic want
> to share data, and librarians in Lon want to use that data, it seems that
> the Web should facilitate that without resorting to intermediaries like
> Amazon or Yahoo! and since we already have the technology to do so in a way
> that enables very powerful data models to be used without requiring
> coordination, it seems odd that you don't even understand how this could be
> valuable.

This is precisely a key question because of many of the arguments that
RDFa supporters have brought up (specifically, in the last flurry of
emails to the group on this subject), that having RDFa will allow web
users to query their browsers, which can then seek out structured data
to answer their questions.  If large websites are not willing to
provide their data to the web-at-large in a structured format, though,
then all the data formats in the world won't accomplish the goal.

In this email, though, you are largely arguing for smaller, more
personal use cases.  Most of the questions are still valid, however.
Problem: Librarians across the world want to share data.  What are the
requirements here?  How is RDFa meet those requirements?  Are there
other solutions which meet those requirements better?  Are existing
solutions adequate if deployed consistently (thus negating the need
for a new technology)?

Specifically, small-time users seem (to me, at least) to need RDFa as
a solution the least.  They can negotiate a shared data format
themselves, or at least present an API that can be engineered against
by others.  RDF itself may be a useful tool here, if it allows reuse
of existing tools and thus simplifies the process of sharing and
consuming the data, but RDFa specifically is a solution for embedding
this data within a web page and allowing browsers to digest it as they
encounter it.  This is not an appropriate solution for the sharing of
catalog data between libraries; it *may* be a solution for the average
web user to have their browser grab the embedded information on a page
for a specific book and query for reviews on the product across the
web.

This, though, then once again brings up the traditional questions.  Is
RDFa the best solution for this?  Are there existing solutions to
this?  Ian specifically mentioned simply Googling for the book title;
this is indeed often quite adequate for a web user.  Does the use of
RDFa and the active involvement of the browser in this process offer
enough of a benefit above just typing a phrase into the search bar to
justify inclusion into the spec?  If you believe so, can you explain
precisely why?

>> Can only previously configured, hard-coded questions be asked,
>> or will Web browsers be able to answer arbitrary free-form questions from
>> users using the data exposed by RDFa?
>
> Both of these are possible. The value of RDFa is that it actually supports
> the possibility of asking free-form questions by using a data model that is
> sufficiently well specified to enable constructions of tools that are not
> dependent on being preconfigured to recognise the exact type of data being
> queried (unlike, say, microformats, which require an intermediate agreement
> to enable people to extract the data, and don't provide for merging data of
> different types for rich queries).

This is not a benefit of RDFa.  It *may* be a benefit of RDF.  What
does RDFa bring to the table that other solutions do not?  What does
it take away?

> Aggregating data in real-time is relatively expensive, so is a strategy more
> suited to dealing with asking new questions. Typical systems so far have
> aggregated data in the background to deal with known queries (one example is
> Google, which crawls pages in advance, anticipating searches that match
> terms against the content of those pages),

Google is a large company, and can indeed invest resources into
trawling and recording such data.  This is explicitly not an option
for the smaller uses you seem to be highlighting in this email,
though.  RDFa is specifically a (very) distributed data storage
system.  Can it address these sorts of problems, if the small-time
users simply can't trawl the entire web for matching information?
When the info is relatively contained (such that finding and reading
the pages it exists on is feasible), is trawling the pages for RDFa
data the best solution?  Are there other solutions which would work
better (such as providing an API for hitting a database)?  Are there
existing solutions which work adequately?

> and use live querying for cases
> where the result cannot reliably be stored (e.g. airline reservation systems
> like TravelJungle or LastMinute which determine price and availability based
> on constantly changing data).

Similarly, would these sites work by trawling reservation sites for
RDFa data?  As well, what if the reservation sites aren't interested
in providing the data in a machine-readable format (for example, if
they want users to go directly to their sites)?  Would it be better
for these types of sites to hit an API provided by the reservation
sites directly?  Would it be better for the discount sites to trawl
with custom algorithms that don't require the cooperation of the
reservation sites?  Within the space of page-embedded data, are there
better solutions, or existing adequate solutions?

>>  - Systems like Yahoo! Search and Live Search expend extraordinary
>> amounts of resources on spam fighting technology; such technology
>>   would not be accessible to Web browsers unless they interacted with
>>   anti-spam services much like browsers today interact with
>>   anti-phishing services.
>
> Actually, at least Opera already incorporates anti-spam technology in its
> mail client. Where browsers are the primary consumers of data there is
> nothing at all to suggest that they cannot incorporate anti-spam technology
> directly. (Indeed, the POWDER specification is designed in part to make that
> easy - and it is exactly the sort of data that might sometimes be usefully
> encoded in RDFa since it is based on an RDF model).

Fighting email spam is a different problem from fighting black-hat SEO
spamming.  The attack surfaces presented by RDFa are much closer to
the latter than the former.

>>  - Even with a mechanism to distinguish trusted sites from spammy sites,
>>   how would Web browsers deal with trusted sites that have been subject
>>   to spamming attacks? This is common, for instance, on blogs or wikis.
>
> Right. But that doesn't mean we question whether browsers should enable
> blogs or wikis. Why would RDFa data be different enough to make this
> question relevant?

Users are interacting with blogs/wikis on a human level, and thus can
exercise their own (admittedly poor in practice) judgement.  This is a
different problem from the browser automatically parsing data on a
page and removing the spam.

> I presume the same would apply if the "Web Services" people came and asked
> to have all of their things included in HTML, and offered a specification
> that could be used to achieve their desires.

It would be the case that they would be subject to the same questions
as the RDFa spec is, yes.

> ...
>
> [not clear what the context was here, so citing as it was]
>>>
>>> > I don't think more metadata is going to improve search engines. In
>>> > practice, metadata is so highly gamed that it cannot be relied upon.
>>> > In fact, search engines probably already "understand" pages with far
>>> > more accuracy than most authors will ever be able to express.
>>>
>>> You are correct, more erroneous metadata is not going to improve search
>>> engines. More /accurate/ metadata, however, IS going to improve search
>>> engines. Nobody is going to argue that the system could not be gamed. I
>>> can guarantee that it will be gamed.
>>>
>>> However, that's the reality that we have to live with when introducing
>>> any new web-based technology. It will be mis-used, abused and corrupted.
>>> The question is, will it do more good than harm? In the case of RDFa
>>> /and/ Microformats, we do think it will do more good than harm.
>>
>> For search engines, I am not convinced. Google's experience is that
>> natural language processing of the actual information seen by the actual
>> end user is far, far more reliable than any source of metadata. Thus from
>> Google's perspective, investing in RDFa seems like a poorer investment
>> than investing in natural language processing.
>
> Indeed. But Google is something of an edge case, since they can afford to
> run a huge organisation with massive computer power and many engineers to
> address a problem where a "near-enough" solution brings themn the users who
> are in turn the product they sell to advertisers. There are many other use
> cases where a small group of people want a way to reliably search trusted
> data.
>
> From global virtual library systems to a single websites, there are many
> others who find that processing structured data is more efficient for their
> needs than doing free-text analysis of web pages (something that they
> effectively contract out to Google, Ask, Yahoo! and their many competitors
> who specialise in it). Some of these are the people whe have decided that
> investing in RDFa is a far more valuable exercis than trying to out-invest
> Google in natural language processing.

"Processing structured data" is something that can be done without
RDFa.  The reason for the resistance to RDFa from this working group
so far is the lack of sufficient significant problems that are best
solved by RDFa specifically.

As well, the use cases for in-the-small data interchange and
in-the-large data interchange are significantly different.  Again,
RDFa is a very distributed data storage format; you don't see the
entire 'database' until you've trawled all the pages which include it.
 This is why there is such a focus on whether RDFa is a decent
solution for search engines - they *see* the web better than anyone
else, and thus appear to be able to utilize such a distributed data
format most effectively than anyone else.  However, Ian is pointing
out that those same search engines (at least Google, though I expect
Yahoo, etc. feel the same) believe that natural-language processing is
a far more effective method of gathering information.  It is less
prone to gaming (natural language being naturally unstructured, it's
harder to emit spam data that has the same statistical
characteristics), and allows for extracting far more data
automatically than any one user would ever think to include.

> This email is already too long for most people to get through it :( I
> believe that this discussion is going to last for some time (I cannot
> imagine why, given the HTML timeline, it would need to be resolved before
> June), so there will be time for others to discuss more fully the many
> points Ian raises as ones he would like to understand.

The HTML timeline is partially a joke (2023 is the date for 'full
compliance'; there isn't a single browser yet who has fully
implemented *html4* ^_^).  We still would like things resolved with
all due speed; the faster they hit the spec, the faster they'll be
integrated into browsers.


Conclusion
==========

There is significant confusion (or at least lack of distinction) in
your email (and generally in the arguments from RDFa supporters in my
experience) between RDFa and RDF, RDF and the general concept of data
interchange formats, distributed and centralized data storage,
in-the-small data interchange and in-the-large data interchange, and
personal use (ie web users) and organization use (ie search engines).
Each of these individually confuse the argument; when brought together
as they typically are, they render many arguments completely useless.

Separating RDFa from RDF
------------------------

The bonuses/maluses of RDF itself are completely irrelevant to this
discussion.  This is because there already exists several methods in
active use for embedding RDF in a web page.  In other words, whatever
problem requires you to embed RDF in a webpage has been *solved*, and
without any necessity of cooperation from the html language itself.
RDFa is specifically a proposal to embed structured data in a web page
using attributes on elements.  *This* is the solution we need to find
problems for if we want RDFa merged into the spec.

Separating RDF from general data interchange formats
----------------------------------------------------

Many of the problems that can be solved by using a common data
interchange format don't require specifically what RDF brings to the
table.  As noted earlier in this email, every collection of data has
its own shape, and its own particular 'ideal' representation.  RDF
forces a particular method of representation.  This has its bonuses
and maluses, but they are *completely separate* from the
bonuses/maluses of generically using a data interchange format.
Libraries don't need RDF to exchange data, they just need *some*
agreement on data representation.  What problems are specifically
solved by RDF and its specific representation being favored in the
spec over a more general method of data representation?

Separating distributed and centralized data storage
----------------------------------------------------

RDFa is a distributed data storage format - a single page includes
only a fraction of the relevant data.  The opposite possibility is
centralized data storage - a single entity holding the data in a
particular place (such as a database on their servers).  The latter is
very common, simple, and natural.  To get at the data, you just run
queries against the single database.  This does require the entity
with the data to produce an API to run queries against, but the same
is required for use of a distributed data format (the company in
charge of the site has to specifically code to expose that data in the
given format).  Both storage methods, though, allow sharing of data
and enable all manner of useful web services.  What problems are
specifically solved by a distributed data strategy which are solved
worse or not at all by a centralized data strategy?

Separating in-the-small and in-the-large data interchange
---------------------------------------------------------

In-the-small data interchange involves a small number of entities who
can trust each other and generally receive a direct benefit from
structuring and sharing their data.  In-the-large data interchange
involves a large number of disparate entities who *can't* trust each
other and won't generally receive direct benefit for structuring their
data.  What problems are shared by these two situations?  Which are
best solved by RDFa?  Are there existing solutions to these problems
that are adequate?

If RDFa is intended to be for one or the other of these situations, it
would be convenient for advocates to agree which it is, so that we can
then focus the discussion on that.  As it is we are getting into
useless arguments where someone is talking about one situation, and
then someone else brings up a "Yes, but..." involving the other
situation.

Separating personal consumption from corporate consumption
----------------------------------------------------------

It has already been noted that existing search engines have found
metadata to be generally unreliable, and instead rely on
natural-language processing to extract information from pages.  Can
RDFa offer better solutions to the problems of search engines than
they currently employ?

Personal use is an entirely different issue.  RDFa is often touted as
making it easy for users to look up information about data on the
page.  It has also been noted, though, that simply highlighting some
text (say, a song title) and selecting "Search Google for the text
'...'" (specific text is from my machine; your experience may vary)
does essentially the same thing, and possibly offers much more.  As
well, new features such as IE8's accelerators offer even more advanced
functionality when you need it, such as allowing you to search
IMBD.com specifically for your highlighted text, using IMDB's own
search form.  Are there significant problems left in this space?  Does
RDFa solve them?  Are they better solved by other solutions?

~TJ

attached mail follows:



Tab Atkins Jr. wrote:
> ...
> Solutions for this already exist; embedded N3 in a <script> tag, just
> to name something that Ian already mentioned, allows you to mash RDF
> data into a page in a machine-extractable way, and brings in any of
> the specific ancillary benefits of RDF.
> ...

Well, it'll require an N3 parser where previously none was needed. Also, 
it separates the metadata from the text, a situation most people want to 
avoid.

This may work, but as far as I can tell, the use of <script> for "data 
blocks" is an afterthought -- for instance, it's described in a section 
about, well, Scripting.

So, is anybody using this successfully in practice?

> ...
> Not quite correct.  Again, the problem of embedded shareable data in a
> web page has been solved multiple times.  The specific problem of
> sharing *RDF* data (due to needing/wanting the specific benefits RDF
> can offer) has also been solved.  What are the precise problems that
> require *RDFa* as a solution?
> ...

Could you elaborate a bit on these solutions?

My understanding was that RDFa has been produced in order to address 
problems with other approaches, such as using <meta> elements, eRDF, or 
microformats.

If there is a *successful* alternative to RDFa that does not require new 
attributes, please let us know :-).

> ...
> Well, there are many things that would offer more advantages than
> disadvantages by themselves.  We can't possibly include all of them in
> the spec; you can think about this as including a hidden large
> disadvantage of 'will grow the size of the spec and the amount of work
> implementors have to do'.  Thus the advantages must generally be
> significantly larger than the disadvantages; this is why the best
> argument for including something in the spec is often "there are
> already widespread hacks to accomplish this".  <video>, for example,
> was included based on pretty much precisely that argument.
> ...

Reminder: RDFa is one of the things the (W3C) Working Group's Charter 
mentions as candidate for inclusion (either by a generic extensibility 
mechanism, or otherwise by extending the language):

"The HTML WG is encouraged to provide a mechanism to permit 
independently developed vocabularies such as Internationalization Tag 
Set (ITS), Ruby, and RDFa to be mixed into HTML documents." 
<http://www.w3.org/2007/03/HTML-WG-charter.html#other>

 > ...

Best regards, Julian

attached mail follows:



On Fri, Jan 2, 2009 at 11:55 AM, Julian Reschke <julian.reschke@gmx.de> wrote:
> Tab Atkins Jr. wrote:
>>
>> ...
>> Solutions for this already exist; embedded N3 in a <script> tag, just
>> to name something that Ian already mentioned, allows you to mash RDF
>> data into a page in a machine-extractable way, and brings in any of
>> the specific ancillary benefits of RDF.
>> ...
>
> Well, it'll require an N3 parser where previously none was needed.

RDFa requires an RDFa parser as well, and in general *any* metadata
requires a parser, so this point is moot.  The only metadata that
doesn't require a parser is no metadata at all.

> Also, it
> separates the metadata from the text, a situation most people want to avoid.

That sounds like a requirement, but it's one that already presumes
that metadata is useful to embed in webpages.  It has not yet been
established that there is a problem worth solving that metadata would
address at all.  (Clarifying this was the primary purpose of Ian's
mail, and my first mail in this thread.)

> This may work, but as far as I can tell, the use of <script> for "data
> blocks" is an afterthought -- for instance, it's described in a section
> about, well, Scripting.
>
> So, is anybody using this successfully in practice?

I have no idea.  The point is, though, that it *is* an existing
possibility that requires no further effort from this working group or
browser developers.  As such, if it solves the problem (whatever it
is, since that hasn't yet been well-established) sufficiently, we can
leave it alone.  It is in the best interests of everybody if a
solution can be found without any changes to the language, because it
means browser uptake is quick (immediate and retroactive, to be
precise ^_^).

We have to ensure that the problem isn't already solved by the
language first, and only after that can we evaluate whether the
language is the correct place to solve the problem, and only after
*that* can we start discussing how to actually go about solving the
problem in the language.  Too much of this discussion is jumping
straight to step 3, so Ian, I, and others are trying to focus it on
step 1.

>> ...
>> Not quite correct.  Again, the problem of embedded shareable data in a
>> web page has been solved multiple times.  The specific problem of
>> sharing *RDF* data (due to needing/wanting the specific benefits RDF
>> can offer) has also been solved.  What are the precise problems that
>> require *RDFa* as a solution?
>> ...
>
> Could you elaborate a bit on these solutions?

Microformats, embedded data in <script> blocks, embedded XML, custom
attributes, other miscellaneous uses of @class and related attributes,
and simply putting the data in natural language.

These solutions already exist, and in several cases are easier to use
than RDFa.  Do they have specific failings that RDFa addresses?  Are
these failings significant enough to warrant extending the language to
solve them?  To we *understand* the failings (assuming they exist and
are significant) well enough to be confident we can solve them
correctly in the language, rather than waiting for the community to
solve them themselves and then simply reifying their solutions?

> My understanding was that RDFa has been produced in order to address
> problems with other approaches, such as using <meta> elements, eRDF, or
> microformats.
>
> If there is a *successful* alternative to RDFa that does not require new
> attributes, please let us know :-).

The most successful alternative is nothing at all.  ^_^  We can
extract copious data from web pages reliably without metadata, either
using our human senses (in personal use) or natural-language-based
processing (in search engine use).  It has not yet been established
that sufficient and significant enough problems *exist* to justify a
solution, let alone one that requires an addition to html.  That is
what Ian is specifically looking for.

Unfortunately, you really do need to justify metadata anew; you can't
just point at Microformats or something similar and say "we're doing
the same things as those guys!".  They exist currently because they
can fit their solutions into the language as it is; there is no
further need to justify them in this group.  Modifying the language,
though, is an explicit admission that this is a problem worth solving
and worth solving in a particular way, and so requires significant
justification.

>> ...
>> Well, there are many things that would offer more advantages than
>> disadvantages by themselves.  We can't possibly include all of them in
>> the spec; you can think about this as including a hidden large
>> disadvantage of 'will grow the size of the spec and the amount of work
>> implementors have to do'.  Thus the advantages must generally be
>> significantly larger than the disadvantages; this is why the best
>> argument for including something in the spec is often "there are
>> already widespread hacks to accomplish this".  <video>, for example,
>> was included based on pretty much precisely that argument.
>> ...
>
> Reminder: RDFa is one of the things the (W3C) Working Group's Charter
> mentions as candidate for inclusion (either by a generic extensibility
> mechanism, or otherwise by extending the language):
>
> "The HTML WG is encouraged to provide a mechanism to permit independently
> developed vocabularies such as Internationalization Tag Set (ITS), Ruby, and
> RDFa to be mixed into HTML documents."
> <http://www.w3.org/2007/03/HTML-WG-charter.html#other>

As a note, this isn't the W3C's HTML WG.  The WHATWG is independent
from the W3C.

~TJ

attached mail follows:



Tab Atkins Jr. wrote:
> ...
>> Well, it'll require an N3 parser where previously none was needed.
> 
> RDFa requires an RDFa parser as well, and in general *any* metadata
> requires a parser, so this point is moot.  The only metadata that
> doesn't require a parser is no metadata at all.

With RDFa, most of the parsing is done by HTML. So I would call it an 
"RDFa processor". And yes, that doesn't change the fact that code needs 
to be written. But it affects the type of the code that needs to be written.

> ...
> I have no idea.  The point is, though, that it *is* an existing
> possibility that requires no further effort from this working group or
> browser developers.  As such, if it solves the problem (whatever it
> is, since that hasn't yet been well-established) sufficiently, we can
> leave it alone.  It is in the best interests of everybody if a
> solution can be found without any changes to the language, because it
> means browser uptake is quick (immediate and retroactive, to be
> precise ^_^).
> ...

Well, there are lots of conditionals in this statement :-)

> We have to ensure that the problem isn't already solved by the
> language first, and only after that can we evaluate whether the
> language is the correct place to solve the problem, and only after
> *that* can we start discussing how to actually go about solving the
> problem in the language.  Too much of this discussion is jumping
> straight to step 3, so Ian, I, and others are trying to focus it on
> step 1.

I would say this is because the research and design in this area totally 
predates HTML5. Are you seriously suggesting that all of that needs to 
start from scratch?

>>> ...
>>> Not quite correct.  Again, the problem of embedded shareable data in a
>>> web page has been solved multiple times.  The specific problem of
>>> sharing *RDF* data (due to needing/wanting the specific benefits RDF
>>> can offer) has also been solved.  What are the precise problems that
>>> require *RDFa* as a solution?
>>> ...
>> Could you elaborate a bit on these solutions?
> 
> Microformats, embedded data in <script> blocks, embedded XML, custom
> attributes, other miscellaneous uses of @class and related attributes,
> and simply putting the data in natural language.
> ...

- Microformats: how do they solve sharing RDF data?

- embedded data in <script>: see discussion above

- embedded XML: embedded in where?

- custom attributes: wow, that sounds like RDFa

- @class and friends: that sounds like eRDF, which the way it is 
currently specified is broken in HTML5 (@profile)

- natural language: hey great, please elaborate :-)

> ...
>> My understanding was that RDFa has been produced in order to address
>> problems with other approaches, such as using <meta> elements, eRDF, or
>> microformats.
>>
>> If there is a *successful* alternative to RDFa that does not require new
>> attributes, please let us know :-).
> 
> The most successful alternative is nothing at all.  ^_^  We can
> extract copious data from web pages reliably without metadata, either
> using our human senses (in personal use) or natural-language-based
> processing (in search engine use).  It has not yet been established
> that sufficient and significant enough problems *exist* to justify a
> solution, let alone one that requires an addition to html.  That is
> what Ian is specifically looking for.

That's what you and Ian claim. Many disagree.

> Unfortunately, you really do need to justify metadata anew; you can't
> just point at Microformats or something similar and say "we're doing
> the same things as those guys!".  They exist currently because they
> can fit their solutions into the language as it is; there is no
> further need to justify them in this group.  Modifying the language,
> though, is an explicit admission that this is a problem worth solving
> and worth solving in a particular way, and so requires significant
> justification.

Disagreed.

The very existence of Microformats prove that people want to augment 
their content with metadata that is machine-readable. Some of the 
shortcomings of Microformats are caused by the way they are retrofitted 
into HTML. So it's totally natural to discuss whether a better solution 
can be reached by adding new stuff to the language.

>>> ...
>> Reminder: RDFa is one of the things the (W3C) Working Group's Charter
>> mentions as candidate for inclusion (either by a generic extensibility
>> mechanism, or otherwise by extending the language):
>>
>> "The HTML WG is encouraged to provide a mechanism to permit independently
>> developed vocabularies such as Internationalization Tag Set (ITS), Ruby, and
>> RDFa to be mixed into HTML documents."
>> <http://www.w3.org/2007/03/HTML-WG-charter.html#other>
> 
> As a note, this isn't the W3C's HTML WG.  The WHATWG is independent
> from the W3C.
 > ...

Sounds like we need to restart the thread on the HTML WG's mailing list 
then.

Best regards, Julian

attached mail follows:



On 3/1/09 14:02, Julian Reschke wrote:
> Tab Atkins Jr. wrote:
>> ...
>>> Well, it'll require an N3 parser where previously none was needed.
>>
>> RDFa requires an RDFa parser as well, and in general *any* metadata
>> requires a parser, so this point is moot. The only metadata that
>> doesn't require a parser is no metadata at all.
>
> With RDFa, most of the parsing is done by HTML. So I would call it an
> "RDFa processor". And yes, that doesn't change the fact that code needs
> to be written. But it affects the type of the code that needs to be
> written.

Somewhat of an aside, but for the curious - here is an RDFa 
parser/processor app:

http://code.google.com/p/rdfquery/wiki/Introduction
example: http://rdfquery.googlecode.com/svn/trunk/demos/markup/markup.html
js: http://rdfquery.googlecode.com/svn/trunk/jquery.rdfa.js

[...]

>> The most successful alternative is nothing at all. ^_^ We can
>> extract copious data from web pages reliably without metadata, either
>> using our human senses (in personal use) or natural-language-based
>> processing (in search engine use). It has not yet been established
>> that sufficient and significant enough problems *exist* to justify a
>> solution, let alone one that requires an addition to html. That is
>> what Ian is specifically looking for.
>
> That's what you and Ian claim. Many disagree.

My main problem with the natural language processing option is that it 
feels too close to waiting for Artificial Intelligence. I'd rather add 6 
attributes to HTML and get on with life.

But perhaps a more practical concern is that it unfairly biases things 
towards popular languages - lucky English, lucky Spanish, etc., and 
those that lend themselves more to NLP analysis. The Web is for 
everyone, and people shouldn't be forced to read and write English to 
enjoy the latest advances in Web automation. Since HTML5 is going 
through W3C, such considerations need to be taken pretty seriously.

>> As a note, this isn't the W3C's HTML WG. The WHATWG is independent
>> from the W3C.

But the WHATWG HTML5 *work* is no longer entirely independent of W3C; 
the two organizations embarked on a major joint venture. It seems 
reasonable for members of the WHATWG world to take W3C-oriented 
considerations seriously, regardless of mailing list.

cheers,

Dan

--
http://danbri.org/

attached mail follows:



Also sprach Dan Brickley:

 > My main problem with the natural language processing option is that it 
 > feels too close to waiting for Artificial Intelligence. I'd rather add 6 
 > attributes to HTML and get on with life.

:-)

Personally, I think the 'class' attribute may still be a more
compelling option in a less-is-more way. It already exists and can
easily be used for styling purposes. Styling is bait for authors to
disclose semantics.

Cheers,

-h&kon
              Hkon Wium Lie                          CTO e
howcome@opera.com                  http://people.opera.com/howcome

attached mail follows:



On 3/1/09 16:54, Hkon Wium Lie wrote:
> Also sprach Dan Brickley:
>
>   >  My main problem with the natural language processing option is that it
>   >  feels too close to waiting for Artificial Intelligence. I'd rather add 6
>   >  attributes to HTML and get on with life.
>
> :-)

Another thought re NLP. RDFa (and similar, ...) are formats that can be 
used for writing down the conclusions of NLP analysis. For example here 
see the BBC's recent Muddy Boots experiment, using DBPedia (Wikipedia in 
RDF) data to drive autoclassification / named entity recognition. So 
here we can agree with Ian and others that text analysis has much to 
offer, and still use RDFa (or other semantic markup - i'll sidestep that 
debate for now) as a notation for marking up the words with a 
machine-friendly indicator of their NLP-guessed meaning.

http://www.bbc.co.uk/blogs/journalismlabs/2008/12/muddy_boots.html

> Personally, I think the 'class' attribute may still be a more
> compelling option in a less-is-more way. It already exists and can
> easily be used for styling purposes. Styling is bait for authors to
> disclose semantics.

I'm sure there's mileage to be had there. I'm somehow incapable of 
writing XSLT so GRDDL hasn't really charmed me, but 'class' certainly 
corresponds to a lot of meaningful markup. Naturally enough it is 
stronger at tagging bits of information with a category than at defining 
relationships amongst the things defined when they're scattered around 
the page. But that's no reason to dismiss it entirely.

Did you see the RDF-EASE draft, 
http://buzzword.org.uk/2008/rdf-ease/spec? From which comes: "Ten second 
sales pitch: CSS is an external file that specifies how your document 
should look; RDF-EASE is an external file that specifies what your 
document means."

RDF-EASE uses CSS-based syntax. More discussion here, 
http://lists.w3.org/Archives/Public/semantic-web/2008Dec/0148.html 
including question of whether it ought to be expressed using 
css3-namespace, 
http://lists.w3.org/Archives/Public/semantic-web/2008Dec/0175.html

chers,

Dan

--
http://danbri.org/


attached mail follows:


I've tried to follow all the discussion besides of its lengths and my
conclusion is:
You're asking the wrong question

People against RDFa in HTML5 are asking "why do you need RDFa?", and
supporters of the proposal are actually describing the benefits of RDFa
itself.

The right question is: why do you need RDFa *inside HTML5*?

My personal answer to this question is:
There is no needing for RDFa inside HTML5. There are other markup languages
which support RDFa natively (XHTML for example).
You may say that in this way you help to divide the web in two sides, users
of HTML5 and users of XHTML2.

Actually the web, is already divided in two big groups:
- Web of data
- Web of interaction

Web of data means all the page whose primary objective is to provide some
information, either user-readable or machine-readable to the users, while
web of interaction include web application, whose primary purpose is to
provide additional services to the users.
These two groups have very different requirements (GMail doesn't need RDFa
in application code, while Wikipedia doesn't need a progress element), so
specific markup languages may suit better the web site.
Moreover, this distinction is not a requirement, is just an advice: you can
put metadata inside HTML5 using Microformats and you can put interactivity
inside XHTML2 using XMLEvents.

Summing up: if you author feel the absolute needing for metadata, because
delivering content to the users is your primary goal, then switch from HTML5
to something else, and leave HTML5 to web application, focused on user
interaction.

Giovanni


attached mail follows:



On Sun, 04 Jan 2009 02:54:18 +1100, Håkon Wium Lie <howcome@opera.com>
wrote:

> Also sprach Dan Brickley:
>
>  > My main problem with the natural language processing option is that it
>  > feels too close to waiting for Artificial Intelligence. I'd rather  >  
> add 6 attributes to HTML and get on with life.
...
> Personally, I think the 'class' attribute may still be a more
> compelling option in a less-is-more way. It already exists and can
> easily be used for styling purposes. Styling is bait for authors to
> disclose semantics.

I agree that this is a clear first step - and microformats were developed
by paving a cowpath from authors who had done this on their own initiative.

I think the reason for adding the RDFa attributes is that there are cases
where the semantic richness offered by class is insufficient. The relevant
cases are where people are already dealing in rich formalised semantics,
not those where it is a battle to get people to provide any semantics at
all. I think there is a clear benefit in drawing these people to HTML5
rather than suggesting they go off into some different Web.

I used the pattern of adding semantics through class, a decade or so ago,
and in some cases it met my needs perfectly, but in others was
insufficient to enable re-use of the data directly from pages, and forced
me to adopt external systems for managing my data which in turn implied an
increased cost in management because I had to keep the data model clear
although I did not have a simple formalism to specify it at the time.

cheers

Chaals

-- 
Charles McCathieNevile  Opera Software, Standards Group
       je parle français -- hablo español -- jeg lærer norsk
http://my.opera.com/chaals       Try Opera: http://www.opera.com

attached mail follows:



Dan Brickley ha scritto:
> On 3/1/09 14:02, Julian Reschke wrote:
>> Tab Atkins Jr. wrote:
>>> The most successful alternative is nothing at all. ^_^ We can
>>> extract copious data from web pages reliably without metadata, either
>>> using our human senses (in personal use) or natural-language-based
>>> processing (in search engine use). It has not yet been established
>>> that sufficient and significant enough problems *exist* to justify a
>>> solution, let alone one that requires an addition to html. That is
>>> what Ian is specifically looking for.
>>
>> That's what you and Ian claim. Many disagree.
>
> My main problem with the natural language processing option is that it 
> feels too close to waiting for Artificial Intelligence. I'd rather add 
> 6 attributes to HTML and get on with life.
>
> But perhaps a more practical concern is that it unfairly biases things 
> towards popular languages - lucky English, lucky Spanish, etc., and 
> those that lend themselves more to NLP analysis. *The Web is for 
> everyone*, and people shouldn't be forced to read and write English to 
> enjoy the latest advances in *Web automation*. Since HTML5 is going 
> through W3C, such considerations need to be taken pretty seriously.
>

My concern is: is RDFa really suitable for everyone and for Web 
automation? My own answer, at first glance, is no. That's because RDF(a) 
can perhaps address nicely very niche needs, where determining how much 
data can be trusted is not a problem, but in general misuses AND 
deliberate abuses may harm automation heavily, since an automaton is 
unlikely to be able to understand whether metadata express the real 
meaning of a web page or not (without a certain degree of AI).

If an external mechanism is needed to determine trust level for 
metadata, that is to establish when an automation results are good or 
bad, such a mechanism may involve human beings at some stage, thus 
breaking automation (this is somehow similar to the problem of defining 
an "oracle machine" described by Turing, according to whom such a 
machine isn't an automaton).

On another hand, a very custom model thought for very custom needs (and 
not requiring wide support) may be less prone to abuses, since it's 
unlikely to find someone willing to cheat himself. Thus, having third 
parties agreeing a certain model and related APIs, and implementing APIs 
on their own sides, might be more reliable in some cases (anyway, third 
parties should agree their respective metadata are reliable and find a 
way to evaluate they really are).

Dan Brickley ha scritto:
> On 3/1/09 16:54, Hkon Wium Lie wrote:
>> Also sprach Dan Brickley:
>>
>>   >  My main problem with the natural language processing option is 
>> that it
>>   >  feels too close to waiting for Artificial Intelligence. I'd 
>> rather add 6
>>   >  attributes to HTML and get on with life.
>>
>> :-)
>
> Another thought re NLP. RDFa (and similar, ...) are formats that can 
> be used for writing down the conclusions of NLP analysis. For example 
> here see the BBC's recent Muddy Boots experiment, using DBPedia 
> (Wikipedia in RDF) data to drive autoclassification / named entity 
> recognition. So here we can agree with Ian and others that text 
> analysis has much to offer, and still use RDFa (or other semantic 
> markup - i'll sidestep that debate for now) as a notation for marking 
> up the words with a machine-friendly indicator of their NLP-guessed 
> meaning.
>
> http://www.bbc.co.uk/blogs/journalismlabs/2008/12/muddy_boots.html
>
>> Personally, I think the 'class' attribute may still be a more
>> compelling option in a less-is-more way. It already exists and can
>> easily be used for styling purposes. Styling is bait for authors to
>> disclose semantics.
>
> I'm sure there's mileage to be had there. I'm somehow incapable of 
> writing XSLT so GRDDL hasn't really charmed me, but 'class' certainly 
> corresponds to a lot of meaningful markup. Naturally enough it is 
> stronger at tagging bits of information with a category than at 
> defining relationships amongst the things defined when they're 
> scattered around the page. But that's no reason to dismiss it entirely.
>
> Did you see the RDF-EASE draft, 
> http://buzzword.org.uk/2008/rdf-ease/spec? From which comes: "Ten 
> second sales pitch: CSS is an external file that specifies how your 
> document should look; *RDF-EASE is an external file that specifies 
> what your document means.*"
>
> RDF-EASE uses CSS-based syntax. More discussion here, 
> http://lists.w3.org/Archives/Public/semantic-web/2008Dec/0148.html 
> including question of whether it ought to be expressed using 
> css3-namespace, 
> http://lists.w3.org/Archives/Public/semantic-web/2008Dec/0175.html
>
> chers,
>
> Dan
>
> -- 
> http://danbri.org/
>

My question is: how often can I trust such a file specifies what your 
document really means, without evaluating its content?

I'd distinguish two cases (not pretendig to make a complete classification),

- The semantics described by metadata is used for server-side 
computations: there's no need to evaluate content (since I'm trusting to 
you when navigating your site, and it's unlikely to find you purposedly 
messing with yourself), as well as to have client-side support for such 
metadata (by the UA). This is the case of a centralised database.

For instance, a *pedia page may send queries to the server, which 
elaborates them and sends results back the the user.

- The UA must understand metadata and automatically gather informations 
meshed-up in a page from several sources: each source must be actively 
evaluated and trusted (a bot can't do such). This is the case of a 
decentralized database.

For instance, that's easy to think of a spamming advertiser who 
apparently puts honest content into your pages (which maybe take 
reliable content from dbpedia), whereas he uses fake metadata to cheat 
my browser and send me irrelevant informations (or infos I'm not 
interested in) when I ask for related content [1], perhaps without you 
even guessing what's going on (and you may be loosing visitors because 
of that).

For obvious reasons, a trust evaluation mechanism can't be as easy as 
getting/creating a signature to be used in a secure connection, because 
someone must actively evaluate at least two things:
- the metadata really reflects a resource content, and
- the metadata is properly used with respect to an external schema 
involved to model data (otherwise, no relationship would be reliable -- 
however, such might be a minor concern from a certain angle, since 
misused metadata might be less harmful than deliberately abused ones).

The result can be very expensive (as certifying a driver or an 
application for a certain platform), or lead to a free choice to avoid 
any evaluation and instead to trust to any third parties. Both solutions 
may work, perhaps, for niche/limited cases, but I don't think such may 
be a good base for a "global" - and general purpose - automation.

[1] That's not the same as using the @rel attribute without any 
relationship with other metadata: a UA may just provide a link somehow 
described as pointing to a related resource with respect to the 
surrounding content, so that I can choose to follow such a link or not; 
if the @rel attribute is used by an automated mechanism in response to a 
query and with respect to other metadata, the UA must decide on its own 
if a link is worth to be followed or not, and I don't think there is any 
easy way to take automated decisions involving trust.

Best regards,
Alex
 
 
 --
 Caselle da 1GB, trasmetti allegati fino a 3GB e in piu' IMAP, POP3 e SMTP autenticato? GRATIS solo con Email.it http://www.email.it/f
 
 Sponsor:
 Incrementa la visibilita' della tua azienda con l'invio di newsletter e campagne email marketing.
* Con investimento di soli 250 Euro puoi incrementare la tua visibilita'
 Clicca qui: http://adv.email.it/cgi-bin/foclick.cgi?mid=8350&d=3-1

attached mail follows:



On Jan 3, 2009, at 17:05, Dan Brickley wrote:

> But perhaps a more practical concern is that it unfairly biases  
> things towards popular languages - lucky English, lucky Spanish,  
> etc., and those that lend themselves more to NLP analysis. The Web  
> is for everyone, and people shouldn't be forced to read and write  
> English to enjoy the latest advances in Web automation.

Some languages are higher in the pecking order than others when  
software development is prioritized, and RDFa cannot level the playing  
field here.

Suppose there's a use case that can be satisfactorily addressed by  
applying NLP heuristics to content for the top-tier languages. Even if  
there were an RDF mechanism for addressing the same use case without  
relying on natural language, software aimed for serving the top-tier  
languages would still do the NLP thing for the use case. Thus, the  
development of the parallel RDF-based solution would be borne by the  
communities using the other languages. If the other languages can't  
get the users of the top-tier languages to use the same technical  
solution, they are still at a disadvantage even if an alternative  
technology stack is theoretically possible, because most software  
development effort goes into what makes sense for the top-tier  
languages without the results being applicable also for the other  
languages.

Instead of bearing the cost of developing a totally alternative  
technology stack for the other languages without benefiting from any  
spillover from the effort done for the top-tier languages, it makes  
more sense to invest the effort into building upon the reusable parts  
already developed for the top-tier languages.

(Quick case study about language-sensitive technology adoption and  
markets: When movable type was developed, a *subset* of the alphabet  
used for German--the native language of printing press suppliers--was  
adopted for Finnish. Today, hundreds of years later, digital font  
availability for Finnish is better than font availability for  
languages of comparable installed base that adopted *extensions* for  
the alphabet used for German or that used a totally different script.  
That is, NIH *still* hasn't caught up with the first-mover advantage  
as far as type goes.)

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/



attached mail follows:



On Mon, 05 Jan 2009 00:17:39 +1100, Henri Sivonen <hsivonen@iki.fi> wrote:

> On Jan 3, 2009, at 17:05, Dan Brickley wrote:
>
>> But perhaps a more practical concern is that it unfairly biases things  
>> towards popular languages - lucky English, lucky Spanish, etc., and  
>> those that lend themselves more to NLP analysis. The Web is for  
>> everyone, and people shouldn't be forced to read and write English to  
>> enjoy the latest advances in Web automation.
>
> Some languages are higher in the pecking order than others when software  
> development is prioritized, and RDFa cannot level the playing field here.
>
> Suppose there's a use case that can be satisfactorily addressed by  
> applying NLP heuristics to content for the top-tier languages. Even if  
> there were an RDF mechanism for addressing the same use case without  
> relying on natural language, software aimed for serving the top-tier  
> languages would still do the NLP thing for the use case.

No. There is no reason for most developers to prefer one over the other  
under the circumstances described.

Clearly Google has an investment in text-harvesting in a bunch of  
languages. Equally clearly its competitors who are more sucessfeul in  
various languages (Yandex, Baidu, etc) have an investment in the  
technology they use.

But developing a new indexing process, there is no a priori reason to  
favour NLP over some other technique that is also satisfactory, and if you  
happen to be interested in a global market, it makes sense to develop a  
system that can be more easily adapted, other things being equal.
...
> Instead of bearing the cost of developing a totally alternative  
> technology stack for the other languages without benefiting from any  
> spillover from the effort done for the top-tier languages, it makes more  
> sense to invest the effort into building upon the reusable parts already  
> developed for the top-tier languages.

Except that it turns out that the re-usable parts of most search engines,  
for the general developer, are pretty limited. Whereas the re-usable parts  
of the RDF stack are numerous, available for many different platforms,  
 from GPL open source to bespoke commercial closed-source and everything  
between.

All this does not necessarily establish the case for using RDF in HTML, it  
is just meant to demonstrate that this particular case *against* doesn't  
seem to be established, to me.

cheers

Chaals

-- 
Charles McCathieNevile  Opera Software, Standards Group
     je parle français -- hablo español -- jeg lærer norsk
http://my.opera.com/chaals       Try Opera: http://www.opera.com

attached mail follows:



Charles McCathieNevile ha scritto:
>>> The results of the first set of Microformats efforts were some pretty
>>> cool applications, like the following one demonstrating how a web
>>> browser could forward event information from your PC web browser to 
>>> your
>>> phone via Bluetooth:
>>>
>>> http://www.youtube.com/watch?v=azoNnLoJi-4
>>
>> It's a technically very interesting application. What has the adoption
>> rate been like? How does it compare to other solutions to the problem,
>> like CalDav, iCal, or Microsoft Exchange? Do people publish calendar
>> events much? There are a lot of Web-based calendar systems, like 
>> MobileMe
>> or WebCalendar. Do people expose data on their Web page that can be used
>> to import calendar data to these systems?
>
> In some cases this data is indeed exposed to Webpages. However, 
> anecdotal evidence (which unfortunately is all that is available when 
> trying to study the enormous collections of data in private intranets) 
> suggests that this is significantly more valuable when it can be done 
> within a restricted access website.
>
> ...
>>> In short, RDFa addresses the problem of a lack of a standardized
>>> semantics expression mechanism in HTML family languages.
>>
>> A standardized semantics expression mechanism is a solution. The lack 
>> of a solution isn't a problem description. What's the problem that a
>> standardized semantics expression mechanism solves?
>
> There are many many small problems involving encoding arbitrary data 
> in pages - apparently at least enough to convince you that the data-* 
> attributes are worth incorporating.
>
> There are many cases where being able to extract that data with a 
> simple toolkit from someone else's content, or using someone else's 
> toolkit without having to tell them about your data model, solves a 
> local problem. The data-* attributes, because they do not represent a 
> formal model that can be manipulated, are insufficient to enable 
> sharing of tools which can extract arbitrary modelled data.
>

That's because the data-* attributes are meant to create custom models 
for custom use cases not (necessarily) involving interchange and (let me 
say) "agnostic extraction" of data. However, data-* attributes might be 
used to "emulate" support for RDFa attributes, so that each one might be 
mapped to, let's say, a "data-rdfa-<attribute>" one and viceversa (I 
don't think "data-rdfa-about" vs "about" would make a great difference, 
at least in a test phase, since it wouldn't be much different from 
"rdfa:about", which might be used to embed RDFa attributes in a somewhat 
xml language (e.g. an "external" markup embedded in a xhtml document 
through the extension mechanism)).

Since it seems there are several problems which may be addressed (beside 
other, more custom models) by RDFa for organization-wide internal use 
and intranet publication, without the explicit requirement of external 
interchange, when both HTML5 specific features and RDFa attributes are 
felt as necessary, it shouldn't be too difficoult to create a custom 
parser, comforming to RDFa spec and availing of data-* attributes, to be 
plugged in a certain browser supporting html5 (and data-*) for internal 
test first, then exposed to the community, so that html5+rdfa can be 
tested on a wider scale (especially once alike parsers are provided for 
all main browsers), looking for a widespread adoption to point out an 
effective need to merge RDFa into HTML5 spec (or to standardize an 
approach based on data-* attributes).

That is, since RDFa can be "emulated" somehow in HTML5 and tested 
without changing current specification, perhaps there isn't a strong 
need for an early adoption of the former, and instead an "emulated" 
mergence might be tested first within current timeline.

>> What is the cost of having different data use specialised formats?
>
> If the data model, or a part of it, is not explicit as in RDF but is 
> implicit in code made to treat it (as is the case with using scripts 
> to process things stored in arbitrarily named data-* attributes, and 
> is also the case in using undocumented or semi-documented XML formats, 
> it requires people to understand the code as well as the data model in 
> order to use the data. In a corporate situation where hundreds or tens 
> of thousands of people are required to work with the same data, this 
> makes the data model very fragile.
>

I'm not sure RDF(a) solves such a problem. AIUI, RDFa just binds (xml) 
properties and attributes (in the form of curies) to RDF concepts, 
modelling a certain kind of relationships, whereas it relies on external 
schemata to define such properties. Any undocumented or semi-documented 
XML formats may lead to misuses and, thus, to unreliably modelled data, 
and it is not clear to me how just creating an explicit relationship 
between properties is enough to ensure that a property really represents 
a subject and not a predicate or an object (in its wrongly documented 
schema), if the problem is the correct definition of the properties 
themselves. Perhaps it is enough to parse them, and perhaps it can 
"inspire" a better definition of the external schemata (if the RDFa 
"vision" of data as triples is suitable for the effective data to 
model), but if the problem is the right understanding of "what 
represents what" because of a lack in documentations, I think that's 
something RDF/RDFa can't solve.

I think the same applies to data-* attributes, because _they_ describe 
data (and data semantics) in a custom model and thus _they_ need to be 
documented for others to be able to manipulate them; the use of a custom 
script rather than a built-in parser does not change much from this 
point of view.


> [not clear what the context was here, so citing as it was]
>>> > I don't think more metadata is going to improve search engines. In
>>> > practice, metadata is so highly gamed that it cannot be relied upon.
>>> > In fact, search engines probably already "understand" pages with far
>>> > more accuracy than most authors will ever be able to express.
>>>
>>> You are correct, more erroneous metadata is not going to improve search
>>> engines. More /accurate/ metadata, however, IS going to improve search
>>> engines. Nobody is going to argue that the system could not be gamed. I
>>> can guarantee that it will be gamed.
>>>
>>> However, that's the reality that we have to live with when introducing
>>> any new web-based technology. It will be mis-used, abused and 
>>> corrupted.
>>> The question is, will it do more good than harm? In the case of RDFa
>>> /and/ Microformats, we do think it will do more good than harm.
>>
>> For search engines, I am not convinced. Google's experience is that
>> natural language processing of the actual information seen by the actual
>> end user is far, far more reliable than any source of metadata. Thus 
>> from
>> Google's perspective, investing in RDFa seems like a poorer investment
>> than investing in natural language processing.
>
> Indeed. But Google is something of an edge case, since they can afford 
> to run a huge organisation with massive computer power and many 
> engineers to address a problem where a "near-enough" solution brings 
> themn the users who are in turn the product they sell to advertisers. 
> There are many other use cases where a small group of people want a 
> way to reliably search trusted data.
>

I think the point with general purpose search engines is another one: 
natural language processing, whereas being expensive, grants a far more 
accurate solution than RDFa and/or any other kind of metadata can bring 
to a problem requiring data must never need to be trusted (and, instead, 
a data processor must be able to determine data's level of trust without 
any external aid). Since there is no "direct" relationship between the 
semantics expressed by RDFa and the real semantics of a web page 
content, relying on RDFa metadata would lead to widespread cheats, as it 
was when the keywords meta tag was introduced. Thus, a trust 
chain/evaluation mechanism (such as the use of signatures) would be 
needed, and so a general purpose search engine relying on RDFa would 
seem to be working more as a search directory, where human beings 
analyse content to classify pages, resulting in a more accurate result, 
but also in a smaller and very slowly growing database of classified 
sites (since obviously there will always be far more sites not caring of 
metadata and/or of making their metadata trusted, than sites using 
trusted RDFa metadata).

(the same reasoning may apply to a local search made by a browser in its 
local history: results are reliable as far as the expressed semantics is 
reliable, that is as far as its source is reasonably trusted, which may 
not be true in general - in general, misuses and deliberate abuses 
whould be the most common case without a trust evaluation mechanism, 
which, in turn, would restrict the number of pages where the presence of 
rdf(a) metadata is really helpful).

My concern is that any data model requiring any level of trust to 
achieve a good-working interoperability may address very small (and 
niche) use cases, and even if a lot of such niche use cases might be 
grouped in a whole category consistently addressed by RDFa (perhaps 
beside other models), the result might not be an enough significant use 
case fitting actual specification guidelines (which are somehow hostile 
to (xml) extensibility, as far as I've understood them) -- though they 
might be changed when and if really needed.

Best regards,
Alex
 
 
 --
 Caselle da 1GB, trasmetti allegati fino a 3GB e in piu' IMAP, POP3 e SMTP autenticato? GRATIS solo con Email.it http://www.email.it/f
 
 Sponsor:
 Con Meetic trovi milioni di single, iscriviti adesso e inizia subito a fare nuove amicizie
 Clicca qui: http://adv.email.it/cgi-bin/foclick.cgi?mid=8290&d=3-1

attached mail follows:



On Sun, 04 Jan 2009 03:51:53 +1100, Calogero Alex Baldacchino  
<alex.baldacchino@email.it> wrote:

> Charles McCathieNevile ha scritto:
> ... it shouldn't be too difficoult to create a custom parser, comforming  
> to RDFa spec and availing of data-* attributes...
>
> That is, since RDFa can be "emulated" somehow in HTML5 and tested  
> without changing current specification, perhaps there isn't a strong  
> need for an early adoption of the former, and instead an "emulated"  
> mergence might be tested first within current timeline.

In principle this is possible. But the data-* attributes are designed for  
private usage, and introducing a public usage means creating a risk of  
clashes that pollute RDFa data gathered this way. In other words, this is  
indeed feasible, but one would expect it to show that the data generated  
was unreliable (unless privately nobody is interested in basic terms like  
about). Such results have been used to suggest that poorly implemented  
features should be dropped, but this hypothetical case suggests to me that  
the argument is wrong, and that if in the face of reasons why the data  
would be bad people use them, one might expect better usage by formalising  
the status of such features and getting decent implementations.

>>> What is the cost of having different data use specialised formats?
>>
>> If the data model, or a part of it, is not explicit as in RDF but is  
>> implicit in code made to treat it (as is the case with using scripts to  
>> process things stored in arbitrarily named data-* attributes, and is  
>> also the case in using undocumented or semi-documented XML formats, it  
>> requires people to understand the code as well as the data model in  
>> order to use the data. In a corporate situation where hundreds or tens  
>> of thousands of people are required to work with the same data, this  
>> makes the data model very fragile.
>>
>
> I'm not sure RDF(a) solves such a problem. AIUI, RDFa just binds (xml)  
> properties and attributes (in the form of curies) to RDF concepts,  
> modelling a certain kind of relationships, whereas it relies on external  
> schemata to define such properties. Any undocumented or semi-documented  
> XML formats may lead to misuses and, thus, to unreliably modelled data,
...

> I think the same applies to data-* attributes, because _they_ describe  
> data (and data semantics) in a custom model and thus _they_ need to be  
> documented for others to be able to manipulate them; the use of a custom  
> script rather than a built-in parser does not change much from this  
> point of view.

RDFa binds data to RDF. RDF provides a well-known schema language with  
machine-processable definition of vocabularies, and how to merge  
information between them. In other words, if you get the underlying model  
for your data right enough, people will be able to use it without needing  
to know what you do.

Naturally not everyone will get their data model right, and naturally not  
all information will be reliable anyway. However, it would seem to me that  
making it harder to merge the data in the first place does not assist in  
determining whether it is useful. On the other hand, certain forms of RDF  
data such as POWDER, FOAF, Dublin Core and the like have been very  
carefully modelled, and are relatively well-known and re-used in other  
data models. Making it easy to parse this data and merge it, according to  
the existing well-developed models seems valuable.


>> Ian wrote:
>>> For search engines, I am not convinced. Google's experience is that
>>> natural language processing of the actual information seen by the  
>>> actual end user is far, far more reliable than any source of metadata.
>>> Thus from Google's perspective, investing in RDFa seems like a poorer
>>> investment than investing in natural language processing.
>>
>> Indeed. But Google is something of an edge case, since they can afford  
>> to run a huge organisation with massive computer power and many  
>> engineers to address a problem where a "near-enough" solution brings  
>> themn the users who are in turn the product they sell to advertisers.  
>> There are many other use cases where a small group of people want a way  
>> to reliably search trusted data.
>>
>
> I think the point with general purpose search engines is another one:  
> natural language processing, whereas being expensive, grants a far more  
> accurate solution than RDFa and/or any other kind of metadata can bring  
> to a problem requiring data must never need to be trusted (and, instead,  
> a data processor must be able to determine data's level of trust without  
> any external aid).

No, I don't think so. Google searches based on analysis of the open web  
are *not* generally more reliable than faceted searches over a reliable  
dataset, and in some instances are less reliable.

The point is that only a few people can afford to invest in being a  
general-purpose search engine, whereas many can afford to run a  
metadata-based search system over a chosen dataset, that responds to their  
needs (and doesn't require either publishing their data, or paying Google  
to index it).

> Since there is no "direct" relationship between the semantics expressed  
> by RDFa and the real semantics of a web page content, relying on RDFa  
> metadata would lead to widespread cheats, as it was when the keywords  
> meta tag was introduced.

Sure. There would also be many many cases of organisations using decent  
metadata, as with existing approaches. My point was that I don't expect  
Google to naively trust metadata it finds on the open web, and in the  
general case probably not even to look at it. However, Google is not the  
measure of the Web, it is a company that sells advertising based on  
information it has gleaned about users by offering them services.

So the fact that some things on the Web are not directly beneficial to  
Google isn't that important. I do not see how the presence of explicit  
metadata threatens google any more than the presence of plain text (which  
can also be misleading).

> Thus, a trust chain/evaluation mechanism (such as the use of signatures)  
> would be needed,

Indeed such a thing is needed for a general purpose search engine. But  
there are many cases where an alternative is fine. For example, T-mobile  
publish POWDER data about web pages. Opera doesn't need to believe all the  
POWDER data it finds on the Web in order to improve its offerings based on  
T-mobile's data, if we can decide how to read that specific data. Which  
can be done by deciding that we trust a particular set of URIs more than  
others. No signature necessary, beyond the already ubiquitous TLS and the  
idea that we trust people we have a relationship with and whose domains we  
know.

> My concern is that any data model requiring any level of trust to  
> achieve a good-working interoperability may address very small (and  
> niche) use cases, and even if a lot of such niche use cases might be  
> grouped in a whole category consistently addressed by RDFa (perhaps  
> beside other models), the result might not be an enough significant use  
> case fitting actual specification guidelines (which are somehow hostile  
> to (xml) extensibility, as far as I've understood them) -- though they  
> might be changed when and if really needed.

A concern of mine is that it is unclear what the required level of  
usefulness is. The "google highlight" element (once called m but I think  
it changed its name again) is currently in the spec, the longdesc  
attribute currently isn't.  I presume these facts boil down to judgement  
calls by the editor while the spec is still an early draft, but it is not  
easy to understand what information would determine whether something is  
"sufficiently important". Which makes it hard to determine whether it is  
worth the considerable investment of discussing in this group, or easier  
to just go through the W3C process of objecting later on.

cheers

Chaals

-- 
Charles McCathieNevile  Opera Software, Standards Group
     je parle français -- hablo español -- jeg lærer norsk
http://my.opera.com/chaals       Try Opera: http://www.opera.com

attached mail follows:



Calogero Alex Baldacchino wrote:
> ...
> This is why I was thinking about somewhat "data-rdfa-about", 
> "data-rdfa-property", "data-rdfa-content" and so on, so that, for the 
> purposes of an RDFa processor working on top of HTML5 UAs (perhaps in a 
> test phase, if needed at all, of course), an element dataset would give 
> access to "rdfa-about", instead of just "about", that is using the 
> prefix "rdfa-" as acting as a namespace prefix in xml (hence, as if 
> there were "rdfa:about" instead of "data-rdfa-about" in the markup).
> ...

That clashed with the documented purpose of data-*.

*If* we want to support RDFa, why not add the attributes the way they 
are already named???

> ...
> However, AIUI, actual xml serialization (xhtml5) allows the use of 
> namespaces and prefixed attributes, thus couldn't a proper namespace be 
> introduced for RDFa attributes, so they can be used, if needed, in 
> xhtml5 documents? I think such might be a valuable choice, because it 
> seems to me RDFa attributes can be used to address such cases where 
> metadata must stay as close as possible to correspondent data, but a 
> mistake in a piece of markup may trigger the adoption agency or foster 
> parenting algorithms, eventually causing a separation between metadata 
> and content, thus possibly breaking reliability of gathered 
> informations. From this perspective, a parser stopping on the very first 
> error might give a quicker feedback than one rearranging misnested 
> elements as far as it is reasonably possible (not affecting, and instead 
> improving, content presentation and users' "direct" experience, but 
> possibly causing side-effects with metadata).
> ...

That would make RDFa as used in XHTML 1.* and RDFa used in HTML 5 
incompatible. What for?

 > ...

BR, Julian

attached mail follows:



On Fri, Jan 9, 2009 at 5:46 AM, Julian Reschke <julian.reschke@gmx.de> wrote:
> Calogero Alex Baldacchino wrote:
>>
>> ...
>> This is why I was thinking about somewhat "data-rdfa-about",
>> "data-rdfa-property", "data-rdfa-content" and so on, so that, for the
>> purposes of an RDFa processor working on top of HTML5 UAs (perhaps in a test
>> phase, if needed at all, of course), an element dataset would give access to
>> "rdfa-about", instead of just "about", that is using the prefix "rdfa-" as
>> acting as a namespace prefix in xml (hence, as if there were "rdfa:about"
>> instead of "data-rdfa-about" in the markup).
>> ...
>
> That clashed with the documented purpose of data-*.
>
> *If* we want to support RDFa, why not add the attributes the way they are
> already named???

Because the issue is that we don't yet know if we want to support
RDFa.  That's the whole point of this thread.  Nobody's given a useful
problem statement yet, so we can't evaluate whether there's a problem
we need to solve, or how we should solve it.

Alex's suggestion, while officially against spec, has the benefit of
allowing RDFa supporters to sort out their use cases through
experience.  That's the back door into the spec, after all; you don't
have to do as much work to formulate a problem statement if you can
point to large amounts of people hacking around a current lack, as
that's a pretty strong indicator that there *is* a problem needing to
be solved.  As an added benefit, the fact that there's already
multiple independent attempts at a solution gives us a wide pool of
experience to draw from in formulating the actual spec, so as to make
the use as easy as possible for authors.

(An example that comes to mind in this regard is rounded corners.
Usually you have to break semantics and put in junk elements to get
rounded corners on a flexible box.  This became so common that the
question of whether or not rounded corners were significant enough to
be added in CSS answered itself - people are trying hard to hack the
support in, so it's clearly something they want, and thus it's
worthwhile to spec a method (the border-radius property) to give them
it.  It solves a problem that authors, through their actions, made
extremely clear, and it does so in a way that is enormously simpler
99% of the time.  Win-win.)

~Tj

attached mail follows:



Tab Atkins Jr. wrote:
>> *If* we want to support RDFa, why not add the attributes the way they are
>> already named???
> 
> Because the issue is that we don't yet know if we want to support
> RDFa.  That's the whole point of this thread.  Nobody's given a useful
> problem statement yet, so we can't evaluate whether there's a problem
> we need to solve, or how we should solve it.

For the record: I disagree with that. I have the impression that no 
matter how many problems are presented, the answer is going to be: "not 
that stone -- fetch me another stone".

> Alex's suggestion, while officially against spec, has the benefit of
> allowing RDFa supporters to sort out their use cases through
> experience.  That's the back door into the spec, after all; you don't

If something that is against the spec is acceptable, then it's *much* 
easier to just use the already defined attributes. Better breaking the 
spec by using new attributes then abusing existing ones.

 > ...

BR, Julian

attached mail follows:



Julian Reschke wrote:
>> Because the issue is that we don't yet know if we want to support
>> RDFa.  That's the whole point of this thread.  Nobody's given a useful
>> problem statement yet, so we can't evaluate whether there's a problem
>> we need to solve, or how we should solve it.
> 
> For the record: I disagree with that. I have the impression that no
> matter how many problems are presented, the answer is going to be: "not
> that stone -- fetch me another stone".

For the record: I completely agree with Julian. This is why I haven't
jumped into this thread yet again.

The key piece of evidence here is SearchMonkey, a product by Yahoo that
specifically uses RDFa. Even its microformat support funnels everything
to an RDF-like metadata approach. With thousands of application
developers and some concrete examples that specifically use RDFa (the
Creative Commons application being one of them), the message from many
on this list remains "not good enough."

I'm not sure where the bar is, but it seems far from objective.

-Ben

attached mail follows:



On Fri, Jan 9, 2009 at 1:48 PM, Ben Adida <ben@adida.net> wrote:
> Julian Reschke wrote:
>>> Because the issue is that we don't yet know if we want to support
>>> RDFa.  That's the whole point of this thread.  Nobody's given a useful
>>> problem statement yet, so we can't evaluate whether there's a problem
>>> we need to solve, or how we should solve it.
>>
>> For the record: I disagree with that. I have the impression that no
>> matter how many problems are presented, the answer is going to be: "not
>> that stone -- fetch me another stone".
>
> For the record: I completely agree with Julian. This is why I haven't
> jumped into this thread yet again.
>
> The key piece of evidence here is SearchMonkey, a product by Yahoo that
> specifically uses RDFa. Even its microformat support funnels everything
> to an RDF-like metadata approach. With thousands of application
> developers and some concrete examples that specifically use RDFa (the
> Creative Commons application being one of them), the message from many
> on this list remains "not good enough."
>
> I'm not sure where the bar is, but it seems far from objective.

Actually, SearchMonkey is an excellent use case, and provides a
problem statement.

Problem
=======

Site owners want a way to provide enhanced search results to the
engines, so that an entry in the search results page is more than just
a bare link and snippet of text, and provides additional resources for
users straight on the search page without them having to click into
the page and discover those resources themselves.

For example (taken directly from the SearchMonkey docs), yelp.com may
want to provide additional information on restaurants they have
reviews for, pushing info on price, rating, and phone number directly
into the search results, along with links straight to their reviews or
photos of the restaurant.

Different sites will have vastly different needs and requirements in
this regard, preventing natural discovery by crawlers from being
effective.

(SearchMonkey itself relies on the user registering an add-in on their
Yahoo account, so spammers can't exploit this - the user has to
proactively decide they want additional information from a site to
show up in their results, then they click a link and the rest is
automagical.)


That really wasn't hard.  I'd never seen SearchMonkey before (it's
possible it was mentioned, but I know that it was never explicitly
described), but it's a really sweet app that helps both authors and
users.  That's a check mark in my book.

~TJ

attached mail follows:



Tab Atkins Jr. wrote:
> Actually, SearchMonkey is an excellent use case, and provides a
> problem statement.

I'm surprised, but very happily so, that you agree.

My confusion stems from the fact that Ian clearly mentioned SearchMonkey
in his email a few days ago, then proceeded to say it wasn't a good use
case.

-Ben


attached mail follows:



On Fri, Jan 9, 2009 at 2:17 PM, Ben Adida <ben@adida.net> wrote:
> Tab Atkins Jr. wrote:
>> Actually, SearchMonkey is an excellent use case, and provides a
>> problem statement.
>
> I'm surprised, but very happily so, that you agree.
>
> My confusion stems from the fact that Ian clearly mentioned SearchMonkey
> in his email a few days ago, then proceeded to say it wasn't a good use
> case.

I apologize; looking back into my archives, it appears there was an
entire subthread specifically about SearchMonkey!  Also, Ian did
indeed mention it in his first email in this thread.  He actually gave
it more attention than any other single use-case, though.  I'll quote
the relevant part:

> On Tue, 26 Aug 2008, Ben Adida wrote:
> >
> > Here's one example. This is not the only way that RDFa can be helpful,
> > but it should help make things more concrete:
> >
> >   http://developer.yahoo.com/searchmonkey/
> >
> > Using semantic markup in HTML (microformats and, soon, RDFa), you, as a
> > publisher, can choose to surface more relevant information straight into
> > Yahoo search results.
>
> This doesn't seem to require RDFa or any generic data syntax at all. Since
> the system is site-specific anyway (you have to list the URLs you wish to
> act against), the same kind of mechanism could be done by just extracting
> the data straight out of the page. This would have the advantage of
> working with any Web page without requiring the page to be written using a
> particular syntax.
>
> However, if SearchMonkey is an example of a use case, then we should
> determine the requirements for this feature. It seems, based on reading
> the documentation, that it basically boils down to:
>
>  * Pages should be able to expose nested lists of name-value pairs on a
>   page-by-page basis.
>
>  * It should be possible to define globally-unique names, but the syntax
>   should be optimised for a set of predefined vocabularies.
>
>  * Adding this data to a page should be easy.
>
>  * The syntax for adding this data should encourage the data to remain
>   accurate when the page is changed.
>
>  * The syntax should be resilient to intentional copy-and-paste authoring:
>   people copying data into the page from a page that already has data
>   should not have to know about any declarations far from the data.
>
>  * The syntax should be resilient to unintentional copy-and-paste
>   authoring: people copying markup from the page who do not know about
>   these features should not inadvertently mark up their page with
>   inapplicable data.
>
> Are there any other requirements that we can derive from SearchMonkey?

I agree with Ian in that SearchMonkey is not *necessarily* speaking in
favor of RDFa; that may be what caused you to think he was dismissing
it.  In truth, Ian is merely trying to take current examples of RDFa
use and distill them into their essence.  (To grab my previous
example, it is similar to seeing what all the various rounded-corners
hacks were doing, without necessarily implying that the final solution
will be anything like them.  It's important to distill the actual
problems that users are solving from the details of particular
solutions they are using.)

Like I said, I think SearchMonkey sounds absolutely awesome, and
genuinely useful on a level I haven't yet seen any apps of similar
nature reach.  I'm exclusively a Google user, but that's something I'd
love to have ported over.  It's similar in nature to IE8's
Accelerators, in that it's an opt-in application for users that
reduces clicks to get to information they actively decide they want.

However, Ian has a point in his first paragraph.  SearchMonkey does
*not* do auto-discovery; it relies entirely on site owners telling it
precisely what data to extract, where it's allowed to extract it from,
and how to present it.  It is likely that this can be done entirely
within the confines of current html, and the fact that SearchMonkey
can use Microformats suggests that this is true.  A possible approach
is a site-owner producing an ad-hoc microformat (little m) that the
crawler can match against pages and index the information of, and then
offer to the SearchMonkey application for presentation as the
developer wills.  This would require specified parsing rules for such
things (which, as mentioned in an earlier email, the big-m
Microformats community is working on).

The question is, would this be sufficient?  Are other approaches
easier for authors?  RDFa, as noted, already has a specified parsing
model.  Does this make it easier for authors to design data templates?
 Easier to communicate templates to a crawler?  Easier to deploy in a
site?  Easier to parse for a crawler?

SearchMonkey makes mention of developers producing SearchMonkey apps
without the explicit permission of site owners.  This use would almost
certainly be better served with a looser data discovery model than
RDFa, so that a site owner doesn't have to explicitly comply in order
for others to extract useful data from their pages.  How important is
this?


These are precisely the sort of questions I think Ian wants and needs
asked.  SearchMonkey is an awesome app; do we need to do anything to
support it and similar apps?  *Can* anything we do support it, or is
it best served by solutions that ignore us completely?  Yes,
SearchMonkey operates on metadata, and the problem space doesn't allow
natural-language processing to stand in for it; it is not clear,
though, that a strict markup approach is best for authors or users.
Nevertheless, it is an excellent use-case to distill requirements from
so we *can* determine if a spec-based solution is desirable.

~TJ

attached mail follows:



Tab Atkins Jr. wrote:
> However, Ian has a point in his first paragraph.  SearchMonkey does
> *not* do auto-discovery; it relies entirely on site owners telling it
> precisely what data to extract, where it's allowed to extract it from,
> and how to present it.

That's incorrect.

You can build a SearchMonkey infobar that is set to function on all URLs
(just use "*" in your URL field.)

For example, the Creative Commons SearchMonkey application:

http://gallery.search.yahoo.com/application?smid=kVf.s

(currently broken because of a recent change in the SearchMonkey PHP API
that we need to address, so here's a photo:

http://www.flickr.com/photos/ysearchblog/2869419185/
)

By adding the CC RDFa markup to your page, it will show up with the
infobar in Yahoo searches.

So site-specific microformats are clearly less powerful. And
vocabulary-specific microformats, while useful, are also not as useful
here (consider a SearchMonkey application that picks up CC-licensed
items, be they video, audio, books, scientific data, etc... Different
microformats = development hell.)

Have you read the RDFa Primer?
http://www.w3.org/TR/xhtml-rdfa-primer/

It describes (pre-SearchMonkey) the kind of applications that can be
built with RDFa. SearchMonkey is an ideal example, but it's by no means
the only one.

-Ben

attached mail follows:



On Fri, Jan 9, 2009 at 3:22 PM, Ben Adida <ben@adida.net> wrote:
> Tab Atkins Jr. wrote:
>> However, Ian has a point in his first paragraph.  SearchMonkey does
>> *not* do auto-discovery; it relies entirely on site owners telling it
>> precisely what data to extract, where it's allowed to extract it from,
>> and how to present it.
>
> That's incorrect.
>
> You can build a SearchMonkey infobar that is set to function on all URLs
> (just use "*" in your URL field.)
>
> For example, the Creative Commons SearchMonkey application:
>
> http://gallery.search.yahoo.com/application?smid=kVf.s
>
> (currently broken because of a recent change in the SearchMonkey PHP API
> that we need to address, so here's a photo:
>
> http://www.flickr.com/photos/ysearchblog/2869419185/
> )
>
> By adding the CC RDFa markup to your page, it will show up with the
> infobar in Yahoo searches.

Ah, hadn't considered a net-wide SearchMonkey script.  Interesting.

This brings up different issues, however.  Something I see
immediately: Say I'm a scammer.  I know that the CC SearchMonkey app
is in wide use (pretend, here).  I start putting CC-RDF data in spam
blog comments, with my own spammy stuff in the relevant fields.  Now
people don't even have to click on the blog link in the search results
and read my obviously spammy comment to be introduced to my offers for
discount Viagra!  They'll just see a little CC bar, click on it to
have it open in-place, and there I am.  I could even hide my link in
legitimate license data, so that people only hit my malicious site
when they click the link to see more information about the license.

Issues like these make wide-scale auto-trusted use of metadata
difficult.  It also makes me more reluctant to want it in the spec
yet.  I'd rather see the community work out these problems first.  It
may be that there's a relatively simple solution.  It may be that the
crawlers can reliably distinguish between ham and spam CC data.  But
then, it may be that there *is* no good solution enabling us to use
this approach, and this kind of metadata on arbitrary sites just can't
be trusted.

I, personally, don't know the answer to this yet.  I suspect that you
don't, either; if the arbitrary-site CC infobar works at all, it's
because few people *use* CC RDF yet, and so it's still limited to a
community with implicit trust.

> So site-specific microformats are clearly less powerful. And
> vocabulary-specific microformats, while useful, are also not as useful
> here (consider a SearchMonkey application that picks up CC-licensed
> items, be they video, audio, books, scientific data, etc... Different
> microformats = development hell.)

Indeed, they are less powerful.  As I explored above, though, too much
power can be damning. It may be that the site-specific little-m
microformat (or something equivalent, allowing a developer to extract
metadata through actively targeting site structure) is powerful enough
to be useful, but weak enough to *remain* useful in the face of abuse.

(Also, I know CC is sort of the darling of the RDFa community, but
there's significant enough debate over in-band vs out-of-band
licensing info, etc. that detracts from the core issues we're trying
to discuss here that it's probably not the best example to use.)

> Have you read the RDFa Primer?
> http://www.w3.org/TR/xhtml-rdfa-primer/
>
> It describes (pre-SearchMonkey) the kind of applications that can be
> built with RDFa. SearchMonkey is an ideal example, but it's by no means
> the only one.

Yup; I was an active participant in this discussion when it started
last August.  The example applications discussed in the paper,
unfortunately, are precisely the kind where trusting metadata is
likely a *bad* idea.  For example, finding reviews of shows produced
by friends of Alice, using foaf and hreview, is rife with opportunity
for spamming.  SearchMonkey seems to avoid this for the most part;
when designing applications for particular URLs, at least, you are
relying on relatively trustworthy data, not arbitrary data scattered
across the web.  Perhaps something similar has application within
trusted networks, but in that case it comprises a completely different
use case than what SearchMonkey hits, with possibly different
requirements.

~TJ

attached mail follows:



On Fri, Jan 9, 2009 at 5:13 PM, Ben Adida <ben@adida.net> wrote:
> Tab Atkins Jr. wrote:
>> This brings up different issues, however.
>
> Is inherent resistance to spam a condition (even a consideration) for
> HTML5? If so, where is the concern around <title>, which is clearly
> featured in search engine results?

Well, it's something that we probably want to keep in mind, because
it's so relevant for the success of any such proposal.  I wouldn't
want to lend support to a feature that turned out to be immediately
useless due to spam.  Lot of wasted effort on the WG's, Ian's, and
possibly browser developer's part.

To answer your specific question, <title> is under the control of the
site author, and search engines already have elaborate methods to tell
a spammy site from a hammy one, thus downranking them.

On the other hand, the hypothetical attack scenario I outlined was
about metadata that could be added to the page by external parties.

If we were today discussing adding <title> to HTML5 to help search
engines provide a short summary of a page, and part of the proposal
might allow blog commenters to change the title of pages on a whim,
I'd certainly be equally concerned.  ^_^

~TJ

attached mail follows:



Tab Atkins Jr. wrote:
> To answer your specific question, <title> is under the control of the
> site author, and search engines already have elaborate methods to tell
> a spammy site from a hammy one, thus downranking them.

And RDFa is also entirely under the control of the site author.

> On the other hand, the hypothetical attack scenario I outlined was
> about metadata that could be added to the page by external parties.

I thought your attack concerned both author markup and commenter markup.
But it seems we agree on author markup: no additional risk there.

So on to commenter markup.

Most blogging software already white-lists the HTML elements and
attributes they allow, otherwise they are easily hacked with XSS. This
means that, by default, most blogging software will strip RDFa from
comments, which is exactly the right approach, since comments should not
have authority over the structured data of the page.

-Ben

attached mail follows:



Ian Hickson wrote:
> We have to make sure that whatever we specify in HTML5 actually is going 
> to be useful for the purpose it is intended for. If a feature intended for 
> wide-scale automated data extraction is especially susceptible to spamming 
> attacks, then it is unlikely to be useful for wide-scale automated data 
> extraction.

It's no more susceptible to spam than existing HTML, as per my previous
response.

> Nobody is suggesting that user agents derive any behavior from <title>, so 
> it doesn't matter if <title> is spammed or not.

And RDFa does not mandate any specific behavior, only the ability to
express structure. The power lies in products like SearchMonkey that
make use of this structure with innovative applications.

Can one imagine tools that make poor use of this structured data so that
they incentivize spam? Absolutely. Is this the bar for HTML5? If bad or
poorly conceived applications can be imagined, then it's not in the
standard?

> It is less likely for a user to intentionally visit a 
> spammy page than for a user to visit a page that happens to contain spammy 
> content embedded within it (e.g. in blog comments).

You've done plenty of web security work, and I suspect you know well
that spammy RDFa is the least in a large set of problems that come with
accepting arbitrary markup in blog comments. This is a strawman.

> However, browsers don't do this kind of processing -- 
> indeed, this kind of processing appears to be exactly what RDFa proponents 
> are trying to enable (though to what end, I'm still trying to find out, 
> since nobody has actually replied to all the questions I asked yet [1]).

While client-side processing is indeed an important use case (Ubiquity,
Fuzzbot, etc...), it's not the only one. SearchMonkey, which you
continue to ignore, is an important use case.

Before I invest significant time in responding to your barrage of
questions, I'm looking for a hint of objective evaluation on your end. I
thought I saw an opportunity for productive discussion based on common
ground with SearchMonkey, but this has led again into a new and
close-to-bogus reason for blocking consideration of RDFa.

> Note that search engines aren't the problem here

Actually, we were discussing SearchMonkey, so I think it's very much the
context for this sub-thread. You continue to ignore SearchMonkey, for
reasons which, as I've pointed out in a response earlier today, are
factually incorrect.

-Ben

attached mail follows:



Ben Adida ha scritto:
> Ian Hickson wrote:
>   
>> We have to make sure that whatever we specify in HTML5 actually is going 
>> to be useful for the purpose it is intended for. If a feature intended for 
>> wide-scale automated data extraction is especially susceptible to spamming 
>> attacks, then it is unlikely to be useful for wide-scale automated data 
>> extraction.
>>     
>
> It's no more susceptible to spam than existing HTML, as per my previous
> response.
>
>   

Perhaps this is why general purpose search engines do not rely 
(entirely) on metadata and markup semantics to classify content, nor 
does Yahoo with SearchMonkey. SearchMonkey documentation points out that 
metadata never affects page ranks, nor is semantics interpreted for any 
purpose; metadata only affects additional informations presented to the 
user at the user will, and if the user chose to get informations of a 
certain kind (gathered by a certain data service), thus spammy metadata 
can be thought as circumscribed in this case, they might corrupt 
SearchMonkey additional data, but not the user's overall experience with 
the search engine. From this point of view, SearchMonkey is some kind of 
wide-range but small-scale use case (with respect to each tool and each 
site the user might enable), because the user can easily choose which 
sources to trust (e.g. which data services to use, or which sites to 
look for additional infos), and in any case he can get enough infos 
without metadata.

On the other hand, a client UA implementing a feature entirely based on 
metadata couldn't easily circumscribe abused metadata and bring valid 
informations to the user attention, nor could the average user take 
easily trusted and spammy sites apart, because he wouldn't understand 
the problem (and a site with spammy metadata might still contain 
informations users were interested in previously, or in a different 
context), whereas in SearchMonkey the average user would notice 
something doesn't work in enhanced results, but he'd also get the basic 
infos he was looking for. Thus there are different requirements to be 
taken into account for different scenarios (SearchMonkey and client UA 
are such different scenarios)

Moreover, SearchMonkey is a kind of centralised service based on 
distributed metadata, it doesn't need collaboration by any other UA 
(that is, it doesn't need support for metadata in other software) by 
default (whereas it allows custom data services to autonomously extract 
metadata, but always for the purposes of SearchMonkey), it only requires 
that web sites adhering to the project (or just willing to provide 
additional infos) embed some kind of metadata only for the purpose of 
making them available to SearchMonkey services, or at least that authors 
create appropriate metadata and send them to Yahoo (in the form of 
dataRSS embedded in a Atom document). That is, SearchMonkey seems to me 
a clear example of a use case for metadata not requiring any changes to 
html5 spec, since any kind of supported metadata are used by 
SearchMonkey as if they were custom, private metadata; whatever happens 
to such metadata client-side, even if they're just stripped by a 
browser, doesn't really matter.

Furthermore, SearchMonkey supports several kinds of metadata, not only 
RDFa, but also eRDF, microformats and dataRSS external to the document. 
So, why should SearchMonkey be the reason to introduce explicit support 
to RDFa and not also for eRDF, which doesn't require new attributes, but 
just a parser? One might think one solution is better than the other, 
and this might be true in theory, but what really counts is what people 
do find easier to use, and this might be determined by experience with 
SearchMonkey (that is, let's see what people use more often, then decide 
what's more needed).

Moreover, RDFa is thought for xhtml, thus it can't be introduced in html 
serialization just by defining a few new attributes: a processor would 
or might need some knowledge over /namespaces/, thus the whole "family" 
of *xmlns* attributes (with and without prefixes) should be specified 
for use with the html serialization, unless an alternative mechanism, 
similar to the one chosen for eRDF, were defined, and maybe such would 
result in a new, hybrid mechanism (stitching together pieces from eRDF 
and RDFa). Buf if we introduce xmlns and xmlns:<prefix> into html 
serialization, why not also prefixed attributes? That is, can RDFa be 
introduced into html serialization "as is", without resorting to the 
whole xml extensibility? This should be taken into account as well, 
because just adding new attributes to the language might work fine for 
xml-serialized documents, but might not for html-serialized ones. This 
means RDFa support might be more difficult than it may seem at first 
glance, whereas it might not be needed for custom and/or small scale use 
cases (and I think SearchMonkey is one such case).

>> Nobody is suggesting that user agents derive any behavior from <title>, so 
>> it doesn't matter if <title> is spammed or not.
>>     
>
> And RDFa does not mandate any specific behavior, only the ability to
> express structure. The power lies in products like SearchMonkey that
> make use of this structure with innovative applications.
>
> Can one imagine tools that make poor use of this structured data so that
> they incentivize spam? Absolutely. Is this the bar for HTML5? If bad or
> poorly conceived applications can be imagined, then it's not in the
> standard?
>
>   

I think the right question should be whether there are effective counter 
measures to circumscribe bad uses and make possible damages less 
significant then advantages from good uses. When a feature in the 
standard is thought to be a possible security (or privacy) issue, 
counter-measures are proposed. Since spam is a possible immediate issue 
for abused metadata, especially in wide-scale and automated data 
extraction, we should also think to possible counter-measures to be 
specc'ed out along with RDFa attributes.

WBR, Alex
 
 
 --
 Caselle da 1GB, trasmetti allegati fino a 3GB e in piu' IMAP, POP3 e SMTP autenticato? GRATIS solo con Email.it http://www.email.it/f
 
 Sponsor:
 Innammorarsi  facile con Meetic, milioni di single si sono iscritti, si sono conosciuti e hanno riscoperto l'amore. Tutto con Meetic, prova anche tu!
 Clicca qui: http://adv.email.it/cgi-bin/foclick.cgi?mid=8292&d=10-1

attached mail follows:



On 10/1/09 00:37, Ian Hickson wrote:
> On Fri, 9 Jan 2009, Ben Adida wrote:
>> Is inherent resistance to spam a condition (even a consideration) for
>> HTML5?
>
> We have to make sure that whatever we specify in HTML5 actually is going
> to be useful for the purpose it is intended for. If a feature intended for
> wide-scale automated data extraction is especially susceptible to spamming
> attacks, then it is unlikely to be useful for wide-scale automated data
> extraction.

I've been looking at such concerns a bit for RDFa. One issue (shared 
with HTML in general I think) is user-supplied content, eg. blog 
comments and 'rel=nofollow' scenarios).  Is there any way in HTML5 to 
indicate that a whole chunk of Web page is from an (in some 
to-be-defined sense) untrusted source?

I see http://www.whatwg.org/specs/web-apps/current-work/#link-type-nofollow

"The nofollow keyword indicates that the link is not endorsed by the 
original author or publisher of the page, or that the link to the 
referenced document was included primarily because of a commercial 
relationship between people affiliated with the two pages."

While I'm unsure about the "commercial relationship" clause quite 
capturing what's needed, the basic idea seems sound. Is there any 
provision (or plans) for applying this notion to entire blocks of 
markup, rather than just to simple hyperlinks? This would be rather 
useful for distinguishing embedded metadata that comes from the page 
author from that included from blog comments or similar.

Thanks for any pointers,

cheers,

Dan

--
http://danbri.org/

attached mail follows:



Ben Adida ha scritto:
> Tab Atkins Jr. wrote:
>   
>> Actually, SearchMonkey is an excellent use case, and provides a
>> problem statement.
>>     
>
> I'm surprised, but very happily so, that you agree.
>
> My confusion stems from the fact that Ian clearly mentioned SearchMonkey
> in his email a few days ago, then proceeded to say it wasn't a good use
> case.
>
> -Ben
>
>   

It seems to me that's a very custom use case - though requiring metadata 
to be embedded in a big number of pages, but that's an optional 
requirement, because search results don't rely only on metadata -  since 
metadata are used as an optional source for informations by the server 
and don't require any collaboration by other kinds of UA (excluding, at 
most, some custom data services - whereas, for instance, a search engine 
using the mark element to highlight a keyword would require a client UA 
to understand and style it properly -- I expect it not to be working on 
IE6, for instance, because IEx browsers deal with unknown elements as if 
their content where misplaced). That is, Yahoo might develop his own 
data model and work fine with sites implementing it; perhaps RDF(a) was 
chosen because they might think RDF is a natural way to model data which 
are sparse in a web page (and re-mapping microformats on RDF might 
result in an easier implementation); anyway, in this case the only UA 
needing to understand RDFa, in this case, is SearchMonkey itself, thus a 
client browser might just drop RDFa attributes without breaking 
SearchMonkey functionalities -- at least, this is my first impression.

Furthermore, it's a very recent (yet potentially interesting) 
application, so why not to wait and see how it grows, if the opt-in 
mechanism will effectively prevent spam (e.g. spammers might model data 
basing on widely diffused vocabularies and data services, and find a way 
to make such data available in searches when users asks for additional 
infos, for instance through an ad within a page of an accomplice author, 
or exploiting some kind of errors in authors' selection of URLs to be 
crawled for metadata, or the alike), or just which model become the most 
used among RDFa, eRDF, Microformats, Atom embedding dataRSS and whatever 
else Yahoo might decide to support, before choosing to include one or 
the other into html5 specification (or to include each one because 
equally diffused)? Moreover, it seems that some xml processing is needed 
to create a custom data service, thus it might be natural to use xhtml 
(possibly along with namespaces and prefixed attributes) to provide 
metadata to such a data service, which might rely on an xml parser 
instead of implementing one from scratch (and html parser might not 
support namespaces for the purpose to expose them through DOM 
interfaces, as I understand html serialization) -- the use of prefixed 
RDFa attributes, or perhaps even unprefixed ones, within an 
xml-serialized document, shouldn't require a formalization in html5 
spec, as far as there is no strict requirement for UAs to support RDF 
processing - as it is for the purposes of SearchMonkey and its related 
data services.

WBR, Alex
 
 
 --
 Caselle da 1GB, trasmetti allegati fino a 3GB e in piu' IMAP, POP3 e SMTP autenticato? GRATIS solo con Email.it http://www.email.it/f
 
 Sponsor:
 Con Danone Activia, puoi vincere cellulari Nokia e Macbook Air. Scopri come
 Clicca qui: http://adv.email.it/cgi-bin/foclick.cgi?mid=8551&d=9-1

attached mail follows:



On Sat, 10 Jan 2009 06:41:10 +1100, Julian Reschke <julian.reschke@gmx.de>  
wrote:

> Tab Atkins Jr. wrote:
>>> *If* we want to support RDFa, why not add the attributes the way they  
>>> are
>>> already named???
>>  Because the issue is that we don't yet know if we want to support
>> RDFa.  That's the whole point of this thread.  Nobody's given a useful
>> problem statement yet, so we can't evaluate whether there's a problem
>> we need to solve, or how we should solve it.
>
> For the record: I disagree with that. I have the impression that no  
> matter how many problems are presented, the answer is going to be: "not  
> that stone -- fetch me another stone".

There does appear to be some of this. I have no idea if that is just an  
impression or the truth. Hence my continued following of the thread.

>> Alex's suggestion, while officially against spec, has the benefit of
>> allowing RDFa supporters to sort out their use cases through
>> experience.  That's the back door into the spec, after all; you don't
>
> If something that is against the spec is acceptable, then it's *much*  
> easier to just use the already defined attributes. Better breaking the  
> spec by using new attributes then abusing existing ones.

Indeed. I the data-* attributes had some reserved values, then one might  
expect people to invest in them on the scale that they have typically made  
RDF investments. But then there would be no need to change the attribute  
names at all (nor, for that matter, to put much effort into other  
attribute names following the design pattern. It just becomes another  
approach to namespaces with another centralisation process required). The  
question is what would convince the editors of the spec that there is in  
fact a use case for RDF in HTML which is what has led to the request to  
include RDFa (a form of RDF carefully designed to fit into HTML).

cheers

Chaals

-- 
Charles McCathieNevile  Opera Software, Standards Group
     je parle français -- hablo español -- jeg lærer norsk
http://my.opera.com/chaals       Try Opera: http://www.opera.com

attached mail follows:



Julian Reschke ha scritto:
> Calogero Alex Baldacchino wrote:
>> ...
>> This is why I was thinking about somewhat "data-rdfa-about", 
>> "data-rdfa-property", "data-rdfa-content" and so on, so that, for the 
>> purposes of an RDFa processor working on top of HTML5 UAs (perhaps in 
>> a test phase, if needed at all, of course), an element dataset would 
>> give access to "rdfa-about", instead of just "about", that is using 
>> the prefix "rdfa-" as acting as a namespace prefix in xml (hence, as 
>> if there were "rdfa:about" instead of "data-rdfa-about" in the markup).
>> ...
>
> That clashed with the documented purpose of data-*.

Hmm, I'm not sure there is a clash, since I was suggesting a *custom* 
and essentially *private* mechanism to experiment with RDFa in 
conjunction with HTML serialization, for the *small-scale* needs of some 
organizations willing to embed RDFa metadata in text/html documents, and 
to exchange them with each other by using a convention likely avoiding 
name clashes with other private metadata. Since I think it's unlikely to 
find data-rdfa-* used with different semantics in the very same page, 
and in a small-scale scenario involving a few *selected* sources for 
RDFa-modelled information, it should be likely to know in advance that 
someone else is using the same conventions. Such a modelled document 
might be used in conjunction with an external RDFa processor, thus 
avoiding any direct support in a browser.

However, such a convention might be enough "clash-free" to work on a 
wider scale, thus it might become widespread and provide an evidence 
that the web /needs/, or at least /has chosen/ to use RDFa as (one of) 
the most common way to embed metadata in a document, and such might be 
enough to add a native support for the whole range of "RDFa" attributes, 
eventually along with support for earlier experimental ones (such as 
"data-rdfa-*" and "rdfa:*" ones, for backward compatibility). And 
actually I can't see much of a problem if a private-born feature became 
the base of a widespread and widely accepted convention (I'm not saying 
the spec should name data-rdfa-* as a mean to implement RDFa, instead I 
think that, if a general agreement on if and how RDFa must be spec'ed 
out and implemented can't be found, such an experiment might be proposed 
to the semantic web industry and wait for the results - given a lack in 
support might prevent any interested party to use RDFa and HTML5 
altogether).

>
> *If* we want to support RDFa, why not add the attributes the way they 
> are already named???
>

For instance, to experiment whether it is worth to change the "if we 
want" into "we do want", without requiring an early implementation and 
specification, nor relying on if and what a certain browser vendor might 
want to experiment differently from others (such a convention would only 
require support for HTML5 datasets and a script or a plugin capable to 
handle them as representing RDFa metadata). -- the point here is that 
after introducing data-* attributes as a mean to support custom 
attributes any browser vendors might decide to drop support for other 
kind of custom attributes in html serialization (that is, for attributes 
being neither part of the language nor data-* ones), therefore if they 
(or any of them) decided to avoid to support RDFa attributes until they 
were introduced in a specification there might be no mean to experiment 
with them (in general, that is cross-browser) without resorting either 
to data-* or to "rdfa:*" (the latter in xhtml).

Anyway, /in general/ what should a browser do with RDFa metadata, on a 
*wide scale*, other than classifying a portion of the open web (e.g. in 
its local history), eventually allowing users to select trusted sources?

Actually, I don't think such would bring enough benefits for *average* 
users, compared to the risk to get a lot of spam metadata from 
/heterogeneous/ sources. I really don't expect average users to 
understand how to filter sites basing on metadata reliability (and just 
for the purpose to use a metadata-based query interface, because a site 
with wrong metadata might still contain usefull informations); instead 
they might just try and use a query interface the same way they use a 
default search bar, get wrong results (once spam metadata became 
widespread) and decide the mechanism doesn't work fine (eventually 
complaining for that). A somewhat antispam filter might help, but I 
think that understanding if metadata are reliable, that is if they 
really correspond to a web page content, is an odd problem to be solved 
by a bot without a good degree of Artificial Intelligence (filtering 
emails by looking for suspicious patterns is far easier than 
implementing a filter capable to /understand/ metadata, /understand/ 
natural language and compare /semantics/ ).

As well, I don't expect the great majority of web pages to contain 
"valid" metadata: most people would not care of them, and a potentially 
growing number might copy&paste code containing metadata from other 
sites as a kind of template, then edit the content and ignore any 
metadata, thus breaking reliability. I do think wide-scale use of 
metadata coming from heterogeneous sources can be more harmful than 
useful. *If* we do agree that small-scale needs is the main context 
where RDFa can bring benefits, perhaps a custom mechanism and external 
plugins are all we need; otherwise, it should be proved that /misused/ 
and /abused/ metadata can be filtered out *easily* and *automatically*, 
without requiring average users to understand the problem, nor affecting 
the overall efficiency. IMHO.

>> ...
>> However, AIUI, actual xml serialization (xhtml5) allows the use of 
>> namespaces and prefixed attributes, thus couldn't a proper namespace 
>> be introduced for RDFa attributes, so they can be used, if needed, in 
>> xhtml5 documents? I think such might be a valuable choice, because it 
>> seems to me RDFa attributes can be used to address such cases where 
>> metadata must stay as close as possible to correspondent data, but a 
>> mistake in a piece of markup may trigger the adoption agency or 
>> foster parenting algorithms, eventually causing a separation between 
>> metadata and content, thus possibly breaking reliability of gathered 
>> informations. From this perspective, a parser stopping on the very 
>> first error might give a quicker feedback than one rearranging 
>> misnested elements as far as it is reasonably possible (not 
>> affecting, and instead improving, content presentation and users' 
>> "direct" experience, but possibly causing side-effects with metadata).
>> ...
>
> That would make RDFa as used in XHTML 1.* and RDFa used in HTML 5 
> incompatible. What for?
>
> > ...
>
> BR, Julian

Because I'm not sure RDFa can work fine with HTML serialization. To 
clarify that, let me take and modify an example from W3C Recommendation 
(without pretending it to be a good example to build a good worst-case 
scenario, but just to give an idea):

[...]
<p>
   I'm holding
   <span property="cal:summary">
     one last summer Barbecue
   </span>, to meet friends and have a party before the end of holidays
   on
   <span property="cal:dtstart" content="2007-09-16T16:00:00-05:00"
         datatype="xsd:dateTime">
     September 16th at 4pm
   </span>.
</p>
[...]


Now let consider it written as:

[...]
<p>
  I'm holding
  <span property="cal:summary">
    one last summer Barbecue
 <!-- now the </span> close tag is missing here -->,
  to meet friends and have a party before the end of holidays
  on
  <span property="cal:dtstart" content="2007-09-16T16:00:00-05:00"
        datatype="xsd:dateTime">
    September 16th at 4pm
  </span>.
</p>
[...]


The above would result in a parse error as an xml-serialized document, 
since the document isn't well formed. Instead, as part of an 
html-serialized document, the above fragment would be processed anyway, 
improving users' experience (with respect to a page stopping rendering 
on a missing close tag), but potentially causing metadata to be 
imprecisely binded to any data, thus potentially harming automated data 
extraction (for some purpose). Therefore, perhaps using such metadata 
only inside xml serialized pages might give a quick feedback on such a 
problem as soon as the author checked a page appearance (which I think 
would be the very first check, as well as I think about no one would 
check the _whole_ range of possible queries people might make over a 
document, to look for errors).

*If* this is meaningful, supporting RDFa attributes as "rdfa:*" might 
ensure that xml serialization is preferred by people really needing to 
use this kind of metadata (while leaving a chance to experiment RDFa 
with html serialization, because no one can be prohibited to use 
data-<prefix>-* for this purpose beside a proper script or plugin), 
whereas introducing "about", "property", "content", "datatype" and so on 
directly in html namespace, as attributes shared by all elements, would 
make the choice of one serialization or the other indifferent, thus 
leading to every possible side-effects html serialization may cause.

As a side note, It seems that people from the W3C are evaluating a 
resort to extensibility to introduce RDFa attributes into xml-serialized 
html documents, and they also have some doubts whether allow use of RDFa 
attributes within html serialization or not:

"The HTML WG is encouraged to provide a mechanism to permit 
independently developed vocabularies such as Internationalization Tag 
Set (ITS), Ruby, and RDFa to be mixed into HTML documents. /Whether this 
occurs through the extensibility mechanism of XML, *whether it is also 
allowed in the classic HTML serialization*, and whether it uses the DTD 
and Schema modularization techniques/, is for the HTML WG to determine."
(from <http://www.w3.org/2007/03/HTML-WG-charter#deliverables>)

WBR, Alex
 
 
 --
 Caselle da 1GB, trasmetti allegati fino a 3GB e in piu' IMAP, POP3 e SMTP autenticato? GRATIS solo con Email.it http://www.email.it/f
 
 Sponsor:
 Meetic: il leader italiano ed europeo per trovare l'anima gemella online. Provalo ora
 Clicca qui: http://adv.email.it/cgi-bin/foclick.cgi?mid=8291&d=9-1

attached mail follows:



Calogero Alex Baldacchino wrote:
> That is, choosing a proper level of integration for RDF(a) support into
> a web browser might divide success from failure. I don't know what's the
> best possible level, but I guess the deepest may be the worst, thus
> starting from an external support through out plugins, or scripts to be
> embedded in a webbapp, and working on top of other feature might work
> fine and lead to a better, native support by all vendors, yet limited to
> an API for custom applications

There seems to be a bit of confusion over what RDFa can and can't do as
well as the current state of the art. We have created an RDFa Firefox
plugin called Fuzzbot (for Windows, Linux and Mac OS X) that is a very
rough demonstration of how an browser-based RDFa processor might
operate. If you're new to RDFa, you can use it to edit and debug RDFa
pages in order to get a better sense of how RDFa works.

There is a primer[1] to the semantic web and an RDFa basics[2] tutorial
on YouTube for the completely un-initiated. The rdfa.info wiki[3] has
further information.

----------------
(sent to public-rdfa@w3.org earlier this week):

We've just released a new version of Fuzzbot[4], this time with packages
for all major platforms, which we're going to be using at the upcoming
RDFa workshop at the Web Directions North 2009 conference[5].

Fuzzbot uses librdfa as the RDFa processing back-end and can display
triples extracted from webpages via the Firefox UI. It is currently most
useful when debugging RDFa web page triples. We use it to ensure that
the RDFa web pages that we are editing are generating the expected
triples - it is part of our suite of Firefox web development plug-ins.

There are three versions of the Firefox XPI:

Windows XP/Vista (i386)
http://rdfa.digitalbazaar.com/fuzzbot/download/fuzzbot-windows.xpi

Mac OS X (i386)
http://rdfa.digitalbazaar.com/fuzzbot/download/fuzzbot-macosx-i386.xpi

Linux (i386) - you must have xulrunner-1.9 installed
http://rdfa.digitalbazaar.com/fuzzbot/download/fuzzbot-linux.xpi

There is also very preliminary support for the Audio RDF and Video RDF
vocabularies, demos of which can be found on YouTube[6][7].

To try it out on the Audio RDF vocab, install the plugin, then click on
the Fuzzbot icon at the bottom of the Firefox window (in the status bar):

http://bitmunk.com/media/6566872

There should be a number of triples that show up in the frame at the
bottom of the screen as well as a music note icon that shows up in the
Firefox 3 AwesomeBar.

To try out the Video RDF vocab, do the same at this URL:

http://rdfa.digitalbazaar.com/fuzzbot/demo/video.html

Please report any installation or run-time issues (such as the plug-in
not working on your platform) to me, or on the librdfa bugs page:

http://rdfa.digitalbazaar.com/librdfa/trac

-- manu

[1] http://www.youtube.com/watch?v=OGg8A2zfWKg
[2] http://www.youtube.com/watch?v=ldl0m-5zLz4
[3] http://rdfa.info/wiki
[4] http://rdfa.digitalbazaar.com/fuzzbot/
[5] http://north.webdirections.org/
[6] http://www.youtube.com/watch?v=oPWNgZ4peuI
[7] http://www.youtube.com/watch?v=PVGD9HQloDI

-- 
Manu Sporny
President/CEO - Digital Bazaar, Inc.

blog: Fibers are the Future: Scaling Past 100K Concurrent Requests
http://blog.digitalbazaar.com/2008/10/21/scaling-webservices-part-2

attached mail follows:



Manu Sporny ha scritto:
> Calogero Alex Baldacchino wrote:
>   
>> That is, choosing a proper level of integration for RDF(a) support into
>> a web browser might divide success from failure. I don't know what's the
>> best possible level, but I guess the deepest may be the worst, thus
>> starting from an external support through out plugins, or scripts to be
>> embedded in a webbapp, and working on top of other feature might work
>> fine and lead to a better, native support by all vendors, yet limited to
>> an API for custom applications
>>     
>
> There seems to be a bit of confusion over what RDFa can and can't do as
> well as the current state of the art. We have created an RDFa Firefox
> plugin called Fuzzbot (for Windows, Linux and Mac OS X) that is a very
> rough demonstration of how an browser-based RDFa processor might
> operate. If you're new to RDFa, you can use it to edit and debug RDFa
> pages in order to get a better sense of how RDFa works.
>
>   

The concern is about every kind of metadata with respect to their 
possible uses; but, while it's been stated that Microforamts (for 
instance) don't require any purticular support by UAs (thus they're 
backward compatible), RDFa would be a completely new feature, thus html5 
specification should say what UAs are espected to do with such new 
attributes.

Shall UAs just "accept" them and expose an API to extract triples, so 
that a web application can build a query mechanism upon such an API? 
This might work fine, and fulfill small-scale scenarios, such as 
organization-wise data modelling and interchanging, as suggested by 
Charls McCathieNevile; this can also be accomplished by an external plugin.

Shall UAs (browsers) also provide an interface to view bare triples (as 
does Fuzzbot), as a kind of debugging tool? As above.

Shall UAs (browsers) also provide metadata-based features, such as a 
query interface to look for content in a local history? This is a wider 
scale application, and also a use case where problems may arise. From 
this angle, metadata can't be assumed as reliable apriori (instead, 
their reliability is uncertain), nor can users be deemed capable to 
understand the problem and filter out wrong/misused/abused metadata (in 
general). This is the scenario were spammy metadata may become an issue. 
For instance, some code like,

<div  typeof="foaf:Person">
    <p property="foaf:name" content="Manu Sporny">We sell
        <a href="http://www.cheatingcarseller.com" 
rel="foaf:homepage">cars</a>
    </p>
</div>

would produce the following triples,

_:bnode0     rdf:type     http://xmlns.com/foaf/0.1/Person
_:bnode0     foaf:homepage     http://www.cheatingcarseller.com
_:bnode0     foaf:name     Manu Sporny

(this is exactly what Fuzzbot outputs)

thus, a metadata-based search feature might output a link to a 
"metadata-spammy" site when queried for "Manu Sporny". That is, cheating 
a metadata-based bot by the mean of fake metadata can be very easy.

Metadata-based features, but this is true for most of xml-related 
technologies (such as RDF/RDFa), work fine if properly used. Unluckily, 
"things must be used properly to work fine" is not the basic principle 
of the web (and this is specially true for html and related 
technologies), which instead has always been about "people will mess 
everything up, but UAs will work fine as well", that is "robustness 
before all, as far as possible". For what concerns html serialization, 
in particular, I'd consider some code like,

<p typeof="cal:Vevent">
  I'm holding
  <span property="cal:summary">
    one last summer Barbecue
  <!-- /span -->, to meet friends and have a party before the end of 
holidays
  on
  <span property="cal:dtstart" content="2007-09-16T16:00:00-05:00"
        datatype="xsd:dateTime">
    September 16th at 4pm
  </span>.
</p>

(taken from <http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014/> and 
purposedly modified)

which is rendered properly, but produces,

_:bnode1     rdf:type     http://www.w3.org/2002/12/cal/icaltzd#Vevent
_:bnode1     cal:dtstart     2007-09-16T16:00:00-05:00
_:bnode1     cal:summary     one last summer Barbecue , to meet friends 
and have a party before the end of holidays on <span 
xmlns:cal="http://www.w3.org/2002/12/cal/icaltzd#" 
xmlns:foaf="http://xmlns.com/foaf/0.1/" 
xmlns:xsd="http://www.w3.org/2001/XMLSchema#" 
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" 
datatype="xsd:dateTime" datatype="xsd:dateTime" 
content="2007-09-16T16:00:00-05:00" property="cal:dtstart">September 
16th at 4pm</span>

(taken from Fuzzbot keeping namespace declarations in the root element; 
without xmlns:* attributes all triples are lost)

which is not the desired result. Perhaps it might work better as an xml 
feature on a "strict" xml parser (aborting with an error because of a 
missing end tag), even considering RDFa relies on namespaces (thus, 
adding RDFa attributes to HTML5 spec would require some features from 
xml extensibility to be added to html serialization). But RDFa in an 
XHTML document might look like "rdfa:about", "rdfa:property", 
"rdfa:content", and so on, that is as an external module, thus not 
requiring any changes to the spec.

WBR, Alex
 
 
 --
 Caselle da 1GB, trasmetti allegati fino a 3GB e in piu' IMAP, POP3 e SMTP autenticato? GRATIS solo con Email.it http://www.email.it/f
 
 Sponsor:
 Incrementa la visibilita' della tua azienda con l'invio di newsletter e campagne email marketing.
* Con investimento di soli 250 Euro puoi incrementare la tua visibilita'
 Clicca qui: http://adv.email.it/cgi-bin/foclick.cgi?mid=8350&d=11-1

attached mail follows:



On 09.01.2009, at 01:54, Calogero Alex Baldacchino wrote:
>
> This is why I was thinking about somewhat "data-rdfa-about", "data- 
> rdfa-property", "data-rdfa-content" and so on, so that, for the  
> purposes of an RDFa processor working on top of HTML5 UAs

One can also use <link rel="alternate" href="description.rdf">. I  
don't see why RDF metadata must be in the HTML document. It could be  
in a separated file, maybe embedded in RSS/Atom feeds (RSS1.0 is  
pretty close already).

Websites that have a lot of useful data to share usually keep it in a  
database, and this allows them to easily generate RDF as separate  
documents without risk of getting out of sync with the HTML version.

IMHO even RDFa metadata is invisible, and errors in RDFa wouldn't be  
much easier to spot than erorrs in external RDF files, e.g.:

<section typeof="atom:Entry" xmlns:foaf="http://xmlns.com/foaf/1.0/"
xmlns:atom="http://purl.org/atom/ns#">
  <address rel="atom:author">
    On <time property="atom:published" content="2009-01-10"
     >10 Jan 2009</time>,
    <a property="foaf:name" rel="foaf:page"
    href="http://joe.example.com">Joe Bloggs</a> wrote:
  </address>

-- 
regards, Kornel




attached mail follows:



Kornel Lesi„ski ha scritto:
> On 09.01.2009, at 01:54, Calogero Alex Baldacchino wrote:
>>
>> This is why I was thinking about somewhat "data-rdfa-about", 
>> "data-rdfa-property", "data-rdfa-content" and so on, so that, for the 
>> purposes of an RDFa processor working on top of HTML5 UAs
>
> One can also use <link rel="alternate" href="description.rdf">. I 
> don't see why RDF metadata must be in the HTML document. It could be 
> in a separated file, maybe embedded in RSS/Atom feeds (RSS1.0 is 
> pretty close already).
>
> Websites that have a lot of useful data to share usually keep it in a 
> database, and this allows them to easily generate RDF as separate 
> documents without risk of getting out of sync with the HTML version.
>

In principle, I agree (also, Atom 1.0 embedding RDFa as dataRSS is the 
base of SearchMonkey). But if people feel the need to embed metadata in 
their documents and to use them as a distributed database, well, let's 
give them a chance to do so. :-P

eRDF might be a working compromise, because it doesn't need any changes 
to the spec; RDFa covers a wider range of RDF semantics, but requires 
new attributes and also namespaces (a sort of hybrid beteween them might 
avoid the need to bring namespaces - xmlns:* attributes - into html 
serialization). My suggestion was meant as a mean to test RDFa in HTML 
documents without changing the spec (perhaps in conjunction with 
data-xmlns-*, data-xmlns-prefixes="rdfa foaf <whatever>" to "emulate" 
namespaces - an ugly hack, I know, but at least would avoid changes to 
html serialization, at least in a test phase) -- even if I think that 
xml serialization should work better for such rdf metadata.

WBR, Alex
 
 
 --
 Caselle da 1GB, trasmetti allegati fino a 3GB e in piu' IMAP, POP3 e SMTP autenticato? GRATIS solo con Email.it http://www.email.it/f
 
 Sponsor:
 Con Danone Activia, puoi vincere cellulari Nokia e Macbook Air. Scopri come
 Clicca qui: http://adv.email.it/cgi-bin/foclick.cgi?mid=8549&d=11-1

attached mail follows:



On 11/1/09 02:51, Calogero Alex Baldacchino wrote:
> eRDF might be a working compromise, because it doesn't need any changes
> to the spec

It's not possible to author conforming HTML5 that functions as eRDF 
since eRDF requires a 'profile' attribute, but HTML5 has removed the 
attribute.

http://research.talis.com/2005/erdf/wiki/Main/RdfInHtml

; RDFa covers a wider range of RDF semantics, but requires
> new attributes and also namespaces (a sort of hybrid beteween them might
> avoid the need to bring namespaces - xmlns:* attributes - into html
> serialization).

To avoid xmlns:* attributes, one could drop CURIEs in the text/html 
serialization and use markup like:

<div>
   <div about="http://dbpedia.org/resource/Albert_Einstein">
     ...
   </div>
</div>

instead of

<div xmlns:db="http://dbpedia.org/">
   <div about="[db:resource/Albert_Einstein]">
     ...
   </div>
</div>

There's no data loss.

> My suggestion was meant as a mean to test RDFa in HTML
> documents without changing the spec (perhaps in conjunction with
> data-xmlns-*, data-xmlns-prefixes="rdfa foaf <whatever>" to "emulate"
> namespaces - an ugly hack, I know, but at least would avoid changes to
> html serialization, at least in a test phase) -- even if I think that
> xml serialization should work better for such rdf metadata.

I really can't see anybody violating the spec in that way rather than 
violating the spec by just adding the RDFa attributes outright, 
especially given that there are already people publishing these 
attributes in text/html so the "namespace" has already been polluted and 
we already have services like SearchMonkey not only using these 
attributes but promoting them. It may therefore already be problematic 
for a future version of HTML to use these attributes as extension points 
without breaking existing sites. The "test" is already in progress, for 
better or worse. HTML5 conformance checkers don't have to bless this 
test, of course, any more than CSS validators have to give the all clear 
to vendor-specific properties.

Moreover, the damage done by immediately breaking the principle that 
data-* should be for private use only and turning it into a distributed 
extension point may be worse than the alternatives.

--
Benjamin Hawkes-Lewis

attached mail follows:



Benjamin Hawkes-Lewis ha scritto:
> On 11/1/09 02:51, Calogero Alex Baldacchino wrote:
>> eRDF might be a working compromise, because it doesn't need any changes
>> to the spec
>
> It's not possible to author conforming HTML5 that functions as eRDF 
> since eRDF requires a 'profile' attribute, but HTML5 has removed the 
> attribute.
>

I didn't noticed that before, thanks for the info :-)

However, actually it's the same for RDFa attributes, because they're not 
in the spec. From this point of view, introducing six new attributes, or 
resorting to an older one is not very different, thus (again) why RDFa 
and not eRDF? Or why not both? Or not also RDFa embedded in Atom 
embedded, in turn, in HTML (like SVG or MathML)? It seems to me, for 
instance, that at this stage SearchMonkey might be a reason to consider 
all of them.

>
> ; RDFa covers a wider range of RDF semantics, but requires
>> new attributes and also namespaces (a sort of hybrid beteween them might
>> avoid the need to bring namespaces - xmlns:* attributes - into html
>> serialization).
>
> To avoid xmlns:* attributes, one could drop CURIEs in the text/html 
> serialization and use markup like:
>
> <div>
>   <div about="http://dbpedia.org/resource/Albert_Einstein">
>     ...
>   </div>
> </div>
>
> instead of
>
> <div xmlns:db="http://dbpedia.org/">
>   <div about="[db:resource/Albert_Einstein]">
>     ...
>   </div>
> </div>
>
> There's no data loss.
>

Well, that's a chance, of course, but that's *not* RDFa as specified by 
W3C; for instance, @property is specified as accepting _only_ CURIEs 
(whereas @about can accept also URIs - and eRDF allows curies, even if 
in a different format than what specified for RDFa and what is used for 
XML in general). That is, to do that not one, but _two_ specifications 
need to be changed, current HTML5 (which is a draft, thus  not a 
problem) and RDFa (which now is a Recommendation, thus, might it be more 
difficoult? should a different specification be derived?), unless we 
want that to be just an unofficial, yet widely accepted, convention - 
and I think that an unofficial convention is worth the others (any 
processors conforming to standard RDFa would need deep changes to cope 
with that - it doesn't work in Fuzzbot when CURIEs are expected, for 
instance). I'm the first to say that my suggestion was an ugly hack, but 
at least it would have been working and conformant without changing 
anything.

>> My suggestion was meant as a mean to test RDFa in HTML
>> documents without changing the spec (perhaps in conjunction with
>> data-xmlns-*, data-xmlns-prefixes="rdfa foaf <whatever>" to "emulate"
>> namespaces - an ugly hack, I know, but at least would avoid changes to
>> html serialization, at least in a test phase) -- even if I think that
>> xml serialization should work better for such rdf metadata.
>
> I really can't see anybody violating the spec in that way rather than 
> violating the spec by just adding the RDFa attributes outright, --

Indeed, current specs are violated, and I was just considering a way to 
use RDFa without such violations before deciding if it's worth to be 
added to the spec, no more (and I don't want to push that hack anymore, 
just trying to point out my aim).

> --especially given that there are already people publishing these 
> attributes in text/html so the "namespace" has already been polluted 
> and we already have services like SearchMonkey not only using these 
> attributes but promoting them.

It seems to me that SearchMontky doesn't promote RDFa more than it 
promotes Microformats, eRDF and dataRSS (RDFa embedded in external Atom 
feeds). It's also a very recent feature, and I really can't guess which 
kind of RDF serialization is going to "win the battle" (that is, 
choosing one against the others *might* be a premature choice right now, 
as well as introducing all of them).

> It may therefore already be problematic for a future version of HTML 
> to use these attributes as extension points without breaking existing 
> sites. The "test" is already in progress, for better or worse. HTML5 
> conformance checkers don't have to bless this test, of course, any 
> more than CSS validators have to give the all clear to vendor-specific 
> properties.

It's the same with every possible existing custom (non-standard) 
attributes and elements out there, since there is no standard for them, 
and instead data-* has been created; it's also the same for accesskey, 
actually, since it's not in current spec (whereas it was in HTML4). 
After all, support for unknown attributes/elements has never been a 
standard "de jure", but more of a quirk, and there are no grants it will 
work fine in the future (as well as actually it doesn't work 
consistently for unknown elements cross-browsers -- there are strong 
differences between IE and other browsers with this respect).

Moreover, the use of such attributes /for the purposes of SearchMonkey/ 
is a very, very custom use case, since they're used just for server-side 
computations, thus no collaboration is required by other UAs; if 
browsers just ignored and dropped such attributes (as they do with 
unknown, proprietary CSS extensions), no page would be broken, whereas 
SearchMonkey would work as fine. Problems might arise if they were used 
in different contexts (e.g. as CSS selectors - but dropping unknown CSS 
rules is allowed by CSS spec), but who cares of them might just run a 
regex tool to map them to a new, standard-compliant version (given that, 
for instance, "data-rdfa-about", "rdfa:about" and "about" are in a 
1-to-1 correspondence, thus such might be done very easily by UAs as a 
quirk).

 From this point of view, SearchMonkey might use its own custom dataset 
and model without any changes to its functionalities (AIUI, the basic 
format for RDF metadata in SearchMonkey is dataRSS). Since there are 
standards for embedding RDF into (x)html documents, it just makes sense 
to support them all for Yahoo.

>
> Moreover, the damage done by immediately breaking the principle that 
> data-* should be for private use only and turning it into a 
> distributed extension point may be worse than the alternatives.
>
> -- 
> Benjamin Hawkes-Lewis

I really don't see the problem if a *custom* convention became widely 
accepted and reused by other people (given that my idea started from a 
Charls McCathieNevile's mail presenting small-scale scenarios, such as 
organizations' internal use and external interchange with other selected 
organizations, as a main context for RDFa - and I've never said HTML 
specification should even mention it, I was thinking to it just as an 
unofficial convention to experiment with in such scenarios).

I really can't get, right now, why it should be different, for instance, 
from the case of a freely reusable widget using a custom data model 
based on private data-* attributes inserted by people in thousands of 
websites (the widget with relitive metadata, I mean), then liked by 
other people and reused in different contexts (the same data model based 
on data-*, now), unless we agree this should be avoided, but I can't 
guess how to prevent people from reusing a "private-only" data model 
they happened to like (unless it resulted in a copyright infringment, 
but I'm not sure this may happen because of the mere use of the same 
name for some "variables" elaborated by a similar script, yet different 
in source code -- given that copyright is evaluated at source code 
level, not per the resulting functionalities, as far as I know).

WBR, Alex
 
 
 --
 Caselle da 1GB, trasmetti allegati fino a 3GB e in piu' IMAP, POP3 e SMTP autenticato? GRATIS solo con Email.it http://www.email.it/f
 
 Sponsor:
 Partecipa al concorso Danone Activia e vinci MacBook Air e Nokia N96. Prova
 Clicca qui: http://adv.email.it/cgi-bin/foclick.cgi?mid=8552&d=11-1

attached mail follows:



On 11/1/09 16:52, Calogero Alex Baldacchino wrote:

> Well, that's a chance, of course, but that's *not* RDFa as specified by
> W3C; for instance, @property is specified as accepting _only_ CURIEs

Good point; I hadn't spotted that.

> It's the same with every possible existing custom (non-standard)
> attributes and elements out there, since there is no standard for them,
> and instead data-* has been created;

Emphatically, data-* has been created for private use data encoding 
(basically for scripting purposes) not as a replacement for the existing 
practices of adding new elements and attributes to HTML without going 
through W3C/WHATWG.

Existing custom attributes intended for use by scripts (e.g. "action" in 
Gmail and Yahoo! Mail), have a direct migration path open for them (i.e. 
to "data-action" or a HTML5-native feature). Proprietary attributes 
intended for use by user agents (e.g. "autocomplete"), on the other 
hand, must be adopted by HTML5 if they are not to be remain non-conforming.

> it's also the same for accesskey,
> actually, since it's not in current spec (whereas it was in HTML4).

I suspect the behavior for "accesskey" will ultimately be defined by the 
spec, whether or not it is made conforming.

> After all, support for unknown attributes/elements has never been a
> standard "de jure", but more of a quirk

Depends what you mean by "support" I guess.

> I really don't see the problem if a *custom* convention became widely
> accepted and reused by other people

Then you I think you don't agree with the fundamental design principle 
of the "data-*" attribute. The theory is that extensions to HTML benefit 
from going through a community process like WHATWG or W3C, and blessing 
extension points encourages people to circumvent that process, with the 
result that browsers have to support poorly designed features in order 
to have an interoperable web.

> I really can't get, right now, why it should be different, for instance,
> from the case of a freely reusable widget using a custom data model
> based on private data-* attributes inserted by people in thousands of
> websites (the widget with relitive metadata, I mean), then liked by
> other people and reused in different contexts (the same data model based
> on data-*, now)

Reuse of "data-*" by DHTML widgets would not impose any additional 
requirements on user agents, so it would be fine from the perspective 
elaborated above. It wouldn't change the language by the back door.

--
Benjamin Hawkes-Lewis

attached mail follows:



Benjamin Hawkes-Lewis wrote:
> On 11/1/09 16:52, Calogero Alex Baldacchino wrote:
> 
>> Well, that's a chance, of course, but that's *not* RDFa as specified by
>> W3C; for instance, @property is specified as accepting _only_ CURIEs
> 
> Good point; I hadn't spotted that.
> 
>> It's the same with every possible existing custom (non-standard)
>> attributes and elements out there, since there is no standard for them,
>> and instead data-* has been created;
> 
> Emphatically, data-* has been created for private use data encoding 
> (basically for scripting purposes) not as a replacement for the existing 
> practices of adding new elements and attributes to HTML without going 
> through W3C/WHATWG.

It should, perhaps set alarm bells ringing that almost every time data-* 
attributes come up, people suggest using them to publish data to the web 
at large rather than as internal scripting hooks. Since the restrictions 
on data-* are not machine checkable, even the majority of "standards 
aware" authors are unlikely to heed them. Therefore the net effect of 
the restriction will be to prevent conscientious standards bodies from 
using data-* attributes in their specifications. It is quite possible 
that popular technologies will arise from sources other than such 
standards organisations and so use of data-* for more than just private 
scripting may be inevitable.

It is also possible that features that start off as private scripting 
hooks will evolve into data publishing features. This again would lead 
to the natural breaking of the restriction of data-* attributes.

(I know I have said this before but I forget whether I posted it or just 
discussed it on IRC.)


attached mail follows:



James Graham wrote:
> It should, perhaps set alarm bells ringing that almost every time data-* 
> attributes come up, people suggest using them to publish data to the web 
> at large rather than as internal scripting hooks. Since the restrictions 
> on data-* are not machine checkable, even the majority of "standards 
> aware" authors are unlikely to heed them. Therefore the net effect of 
> the restriction will be to prevent conscientious standards bodies from 
> using data-* attributes in their specifications. It is quite possible 
> that popular technologies will arise from sources other than such 
> standards organisations and so use of data-* for more than just private 
> scripting may be inevitable.
> 
> It is also possible that features that start off as private scripting 
> hooks will evolve into data publishing features. This again would lead 
> to the natural breaking of the restriction of data-* attributes.
> 
> (I know I have said this before but I forget whether I posted it or just 
> discussed it on IRC.)

Agreed.

So what does this tell us about the point of view that distributed 
extensibility should not be supported by HTML5?

Best regards, Julian




attached mail follows:



Benjamin Hawkes-Lewis ha scritto:
>
>> After all, support for unknown attributes/elements has never been a
>> standard "de jure", but more of a quirk
>
> Depends what you mean by "support" I guess.
>

I just mean that, as far as I know, there is no official standard 
requiring UAs to support (parse and expose through the DOM) attributes 
and elements which are not part of the HTML language but are found in 
text/html documents. Usually, browsers support them for robustness sake 
and/or backward compatibility with existing pages, but they might do it 
with significant differences (actually it happens for unknown elements 
but not for unknown attributes, but one shouldn't assume such common 
behavior might not change in the future, or that will be adopted by 
newer vendors (even if that might be a quite safe assumption), thus any 
hack to the language /for custom purposes and script elaboration/ should 
be done by the mean of existing attributes/elements instead of creating 
new ones (I mean, "data-rdfa-about" might be a bit safer than just 
"about" to use a conservative approach based on the assumption "I know 
what happens today, not what will happen tomorrow") -- before data-* it 
was possible through the class attribute, now also data-* can be used 
for custom hacks)

>> I really don't see the problem if a *custom* convention became widely
>> accepted and reused by other people
>
> Then you I think you don't agree with the fundamental design principle 
> of the "data-*" attribute. The theory is that extensions to HTML 
> benefit from going through a community process like WHATWG or W3C, and 
> blessing extension points encourages people to circumvent that 
> process, with the result that browsers have to support poorly designed 
> features in order to have an interoperable web.
>

Yet it is *possible* to use data-* attributes to define a proper 
*private* convention by choosing names carefully in order to avoid 
clashes with other private conventions (for instance, a widget might 
need metadata to be put within the host page, and a careful choice of 
data-* names might avoid clashes with other metadata needed by other 
widgets or by the page itself). More people might find a certain 
convention useful and enough reusable for their purposes (because of 
non-clashing names), and the result would be a clearer "caw path" that 
community "cawboys" might follow to catch the free problem running away 
from standards.

The *only* difference with "data-rdfa-*" here would be that a higher 
number of authors/developers should agree with such a convention from 
the beginning, but only if they were interested in exchanging the same 
metadata with each others for their respective *custom* uses (through a 
custom script or plugin, either developed independently or shared). From 
this point of view, the only difference between "data-rdfa-about" and 
"about" - as used for the purposes of SearchMonkey - is that the former 
is immediately conforming to HTML5 spec and, thus, surely exposed 
through the DOM by every possible HTML5 compliant UA, as it happens for 
classes used by Microformats. I've never thought to any requirements for 
UAs not coming from a clearly traced "caw path", the same way there is 
no requirement for UAs not involved in SearchMonkey to support any kind 
of metadata - for the purposes of SearchMonkey itself.

Unless one thinks that everyone facing a problem not solved (at all or 
enough for his purposes) by an official standard should either create a 
private hack disregarding any possible hacks for similar problems he 
might have happened to find on the web, or start a new community process 
eventually without knowing if other people are facing the same problem, 
or a similar one, I really can't understand why a *custom* and 
*born-private* (eventually within a group of authors/developers) and 
then become a widely accepted convention should be a problem, as far as 
it is based on existing, standard features and doesn't require any 
additional support and results in a possible cawpath to be then 
standardized as needed. And I really don't understand why class="xyz" is 
a good hack whereas "data-some-thing" is not, assuming both are designed 
for and used by "caws opening a path" ( :-P )

>> I really can't get, right now, why it should be different, for instance,
>> from the case of a freely reusable widget using a custom data model
>> based on private data-* attributes inserted by people in thousands of
>> websites (the widget with relitive metadata, I mean), then liked by
>> other people and reused in different contexts (the same data model based
>> on data-*, now)
>
> Reuse of "data-*" by DHTML widgets would not impose any additional 
> requirements on user agents, so it would be fine from the perspective 
> elaborated above. It wouldn't change the language by the back door.

Really? Is it so much different from the case of the pattern attribute 
(which addresses, at the UA and language level, a problem earlier solved 
by scripts -- e.g. getting elements by their ids)? I don't think it's 
very different. From this perspective, if data-* attributes existed 
before the pattern attribute, someone might have used them to declare a 
regex then used by a script implementing a generic checking, and such 
might have been a good reason to add the pattern attribute to form 
inputs, requiring UAs to contrast the input value to its relative 
regular expression (a solution wich also works for UAs not supporting 
scripts, for instance).

I guess closing a language to every kind of "back-door changes" may be 
in contrast with the principle of paving a cawpath. I also guess that, 
if microformats experience (or the "realworld semantics" they claim to 
be based on) had suggested the need to add a new element/attribute to 
the language, a new element/attribute would have been added.

WBR, Alex

 
 
 --
 Caselle da 1GB, trasmetti allegati fino a 3GB e in piu' IMAP, POP3 e SMTP autenticato? GRATIS solo con Email.it http://www.email.it/f
 
 Sponsor:
 Partecipa al concorso Danone Activia e vinci MacBook Air e Nokia N96. Prova
 Clicca qui: http://adv.email.it/cgi-bin/foclick.cgi?mid=8550&d=12-1

attached mail follows:



On 12/1/09 20:26, Calogero Alex Baldacchino wrote:
> I just mean that, as far as I know, there is no official standard
> requiring UAs to support (parse and expose through the DOM) attributes
> and elements which are not part of the HTML language but are found in
> text/html documents.

Perhaps, but then prior to HTML5, much of what practical user agents 
must do with HTML has not been required by any official standard. ;)

RFC 2854 does say that "Due to the long and distributed development of 
HTML, current practice on the Internet includes a wide variety of HTML 
variants. Implementors of text/html interpreters must be prepared to be 
'bug-compatible' with popular browsers in order to work with many HTML 
documents available the Internet."

http://tools.ietf.org/html/rfc2854

HTML 4.01 does recommend that "[i]f a user agent encounters an element 
it does not recognize, it should try to render the element's content" 
and "[i]f a user agent encounters an attribute it does not recognize, it 
should ignore the entire attribute specification (i.e., the attribute 
and its value)".

http://www.w3.org/TR/html401/appendix/notes.html#h-B.3.2

Clearly these suggestions are incompatible with respect to attributes; 
AFAIK all popular UAs insert unrecognized attributes into the DOM and 
plenty of web content depends on that behaviour.

>> Reuse of "data-*" by DHTML widgets would not impose any additional
>> requirements on user agents, so it would be fine from the perspective
>> elaborated above. It wouldn't change the language by the back door.
>
> Really? Is it so much different from the case of the pattern attribute
> (which addresses, at the UA and language level, a problem earlier solved
> by scripts -- e.g. getting elements by their ids)? I don't think it's
> very different. From this perspective, if data-* attributes existed
> before the pattern attribute, someone might have used them to declare a
> regex then used by a script implementing a generic checking, and such
> might have been a good reason to add the pattern attribute to form
> inputs, requiring UAs to contrast the input value to its relative
> regular expression (a solution wich also works for UAs not supporting
> scripts, for instance).

Just like proprietary elements/attributes introduced with user agent 
behaviours (marquee, autocomplete, canvas), scripted uses of "data-*" 
might suggest new features to be added to HTML, which would then become 
requirements for UAs.

But unlike proprietary elements/attributes introduced with user agent 
behaviors, scripted uses of "data-*" do not impose new processing 
requirements on UAs.

Therefore, unlike proprietary elements/attributes introduced with user 
agent behaviors, scripted uses of "data-*" impose _no_ design 
constraints on new features.

Establishing user agent behaviours with "data-*" attributes, on the 
other hand, imposes almost as many design constraints as establishing 
them with proprietary elements and attributes. (There's just less 
pollution of the primary HTML "namespace".)

If no RDFa was in deployment, you could argue it would be less wrong 
(from this perspective) to abuse "data-*" than introduce new attributes.

But to the extent that these attributes are already in use in text/html 
and standardized within the "http://www.w3.org/1999/xhtml" namespace, 
processing requirements are effectively already being imposed on user 
agents (such as not introducing conflicting treatment of the "about" 
attribute). All that adding user agent behaviours with "data-rdfa*" 
attributes would do at this point is add _more_ requirements, without 
rescuing the polluted attributes.

 > I also guess that,
> if microformats experience (or the "realworld semantics" they claim to
> be based on) had suggested the need to add a new element/attribute to
> the language, a new element/attribute would have been added.

I'm not really sure what you mean.

(It's watching the microformats community struggle with the problem of 
encoding machine data equivalents, for things like dates and telephone 
number types and measurements, that persuaded me HTML5 should include a 
generic machine data attribute, because it seems likely to me that the 
problem will be recurrent.)

--
Benjamin Hawkes-Lewis

attached mail follows:



Benjamin Hawkes-Lewis ha scritto:
> On 12/1/09 20:26, Calogero Alex Baldacchino wrote:
>> I just mean that, as far as I know, there is no official standard
>> requiring UAs to support (parse and expose through the DOM) attributes
>> and elements which are not part of the HTML language but are found in
>> text/html documents.
>
> Perhaps, but then prior to HTML5, much of what practical user agents 
> must do with HTML has not been required by any official standard. ;)
>
> RFC 2854 does say that "Due to the long and distributed development of 
> HTML, current practice on the Internet includes a wide variety of HTML 
> variants. Implementors of text/html interpreters must be prepared to 
> be 'bug-compatible' with popular browsers in order to work with many 
> HTML documents available the Internet."
>
> http://tools.ietf.org/html/rfc2854
>
> HTML 4.01 does recommend that "[i]f a user agent encounters an element 
> it does not recognize, it should try to render the element's content" 
> and "[i]f a user agent encounters an attribute it does not recognize, 
> it should ignore the entire attribute specification (i.e., the 
> attribute and its value)".
>
> http://www.w3.org/TR/html401/appendix/notes.html#h-B.3.2
>
> Clearly these suggestions are incompatible with respect to attributes; 
> AFAIK all popular UAs insert unrecognized attributes into the DOM and 
> plenty of web content depends on that behaviour.
>

Very, very true. HTML 4.01 also says the recommended behaviours are ment 
"to facilitate experimentation and interoperability between 
implementations of various versions of HTML", whereas the "specification 
does not define how conforming user agents handle general error 
conditions, including how user agents behave when they encounter 
elements, attributes, attribute values, or entities not specified in 
this document", and since "user agents may vary in how they handle error 
conditions, authors and users must not rely on specific error recovery 
behavior". I just think the last sentence defines a best practice 
everyone should follow instead of relying on a common quirk supporting 
invalid markup. However, beside something being a good or bad practice, 
there will always be authors doing whatever they please, therefore it is 
quite safe to assume UAs will always expose invalid/unrecognized 
attributes (that's unavoidable, given the need for backward compatibility).

>
> Just like proprietary elements/attributes introduced with user agent 
> behaviours (marquee, autocomplete, canvas), scripted uses of "data-*" 
> might suggest new features to be added to HTML, which would then 
> become requirements for UAs.
>
> But unlike proprietary elements/attributes introduced with user agent 
> behaviors, scripted uses of "data-*" do not impose new processing 
> requirements on UAs.
>
> Therefore, unlike proprietary elements/attributes introduced with user 
> agent behaviors, scripted uses of "data-*" impose _no_ design 
> constraints on new features.
>
> Establishing user agent behaviours with "data-*" attributes, on the 
> other hand, imposes almost as many design constraints as establishing 
> them with proprietary elements and attributes. (There's just less 
> pollution of the primary HTML "namespace".)
>
> If no RDFa was in deployment, you could argue it would be less wrong 
> (from this perspective) to abuse "data-*" than introduce new attributes.

Oh, well, I don't want to argue about that. For me the idea to use 
"data-rdfa-*" can rest in peace, since in practice it's not different 
from using RDFa attributes as they are, at least as far as they're 
handled by scripts, either client- or server-side. However I think that,

* actually it seems not to be enough clear what UAs not involved in a 
particular project should do with RDFa attributes, beside exposing their 
content for the purpose of a script elaboration, whereas a precise 
behaviour should be defined, as well as an eventual class of UAs clearly 
identified as not required to support it, and eventual caveats on 
possible problems and relative solutions, before introducing any new 
elements/attributes in a formal specification;

* actual deployment might be harmed by the use of xml namespaces in html 
serialization.

Also, I see design suggestions more than impositions. If a new (and 
proprietary/private) attribute/element/convention is convincingly 
useful/needed, it is supported by other UAs and introduced in a 
specification, otherwise, if a not enough significant number of pages 
would be broken, it might even be redefined for use with a different 
semantics. And a possible process involving data-* attributes 
would/could be experiment privately => extend the scale involving other 
people finding it useful for their needs => get it in the primary 
namespace of an official specification (discarding the "data-" part and 
any other useless parts of the experimental name), so that existing 
pages may still work with their custom scripts or easily migrate to the 
new standard (and benefit of the new default support) by running a 
simple regex.

>
> But to the extent that these attributes are already in use in 
> text/html and standardized within the "http://www.w3.org/1999/xhtml" 
> namespace, processing requirements are effectively already being 
> imposed on user agents (such as not introducing conflicting treatment 
> of the "about" attribute). All that adding user agent behaviours with 
> "data-rdfa*" attributes would do at this point is add _more_ 
> requirements, without rescuing the polluted attributes.
>

For what concerns html serialization, introducing xml namespaces (and, 
thus, xml extensibility - as a whole or partly) might be worse than 
breaking current experimentaions. Since xhtml about all W3C production 
has converged towards XML, suggesting a direction the web didn't 
embraced completely, and instead causing objections with respect to xml 
features felt as useless or unwanted by a good number of people, herein 
namespaces and extensibility, hence the need to evolve html 
serialization to address new demands without forcing a migration towards 
xml. Therefore, introducing pieces of xml inside text/html documents may 
be problematic; of course, other surrogate mechanisms might be defined 
to indicate a namespace for the sole purposes of RDFa, but this would 
rise consitence issues between html and xhtml (as reported by Henri 
Sivonen), perhaps solvable by specifing a double mechanism as working 
for xhtml (the html specific one, and the "classic" xml one), but such a 
choice might add complexity to UAs and be confusing for authors.

For what concerns XHTML, I disagree with the introduction of RDFa 
attribute into the basic namespace, and I wouldn't encourage the same in 
HTML5 spec. In first place, I think there is a possible conflict with 
respect to the "content" attribute semantics, because it now requires a 
different processing when used as an RDFa attribute and as a <meta> 
attribute associated to an "http-equiv" or a "name" value (for instance).

In second place, it might be confusing for authors and lead to the 
misconception that every xhtml 1.x processor is also capable to process 
rdfa metadata (this is a limit of namespace + dtd/schema based 
modularization, because one can define the structure of a document, but 
not "orthogonal" behaviours requiring a specific support, not covered by 
the basic document model - such as collecting rdf triples declared by 
rdfa attributes, or calling a plugin and embedding its output - however, 
defining a proper namespace, maybe including its creation date somehow, 
may suggest what to expect from UAs).

In third place, creating a different namespace would have resulted in a 
far easier introduction of RDFa attributes into other xml languages 
without having to change the language to host them (by the way, the 
xhtml namespace and a related prefix can be used, but this require a 
more specific support due to the "content" attribute issue, especially 
by UAs not supporting DTDs or schemata - that is, what should happen if 
an element were declared with both xhtml:name or xhtml:http-equiv, 
xhtml:content and xhtml:datatype, in an xml document accepting any 
attributes from external namespaces? of course, this is solvable, but 
rdfa:content, rdfa:datatype and so on would make things easier, or at 
least _cleaner_ and less confusing for authors having to understand that 
an XML and RDF processor can/must support the xhtml namespace and its 
_whole_ semantics, not just dom-related structures, but limited to RDFa 
attributes, so that no <meta> or <object> or <link> can be used hoping 
their semantics is supported, despite the support for the xhtml 
namespace...). Also there might have been fewer attributes, each one 
with a different semantic (assuming someone might not find useful to 
have a link with rel="stylesheet" representing a triple, for instance).

Of course, this is my opinion.

> > I also guess that,
>> if microformats experience (or the "realworld semantics" they claim to
>> be based on) had suggested the need to add a new element/attribute to
>> the language, a new element/attribute would have been added.
>
> I'm not really sure what you mean.
>
> (It's watching the microformats community struggle with the problem of 
> encoding machine data equivalents, for things like dates and telephone 
> number types and measurements, that persuaded me HTML5 should include 
> a generic machine data attribute, because it seems likely to me that 
> the problem will be recurrent.)
>
> -- 
> Benjamin Hawkes-Lewis

If there were a general agreement, a new element/attribute would be 
introduced as a result of a "bottom up" process (starting from 
experimentations) integrated with a "top down" community evaluation - 
for specific purposes, not generic machine exposure, I mean.

(I'm not sure a generic machine data attribute - in general, not just 
referring to rdfa - would solve that, because each new occurrence of the 
problem might require a "brand new" datatype that only newer, updated 
UAs would understand (older ones would just parse the attribute and 
provide it as a string for further elaboration by a script, at most, but 
this might not be much better than using a data-* attribute for private 
script consumption), therefore, that wouldn't be necessarily different 
than creating a new appropriate attribute/element as needed and 
providing such new feature in newer, compliant UAs).


WBR, Alex

 
 
 --
 Caselle da 1GB, trasmetti allegati fino a 3GB e in piu' IMAP, POP3 e SMTP autenticato? GRATIS solo con Email.it http://www.email.it/f
 
 Sponsor:
 Blu American Express: gratuita a vita! 
 Clicca qui: http://adv.email.it/cgi-bin/foclick.cgi?mid=8613&d=4-2

attached mail follows:



On 4/2/09 03:15, Calogero Alex Baldacchino wrote:
> For what concerns XHTML, I disagree with the introduction of RDFa
> attribute into the basic namespace, and I wouldn't encourage the same in
> HTML5 spec. In first place, I think there is a possible conflict with
> respect to the "content" attribute semantics, because it now requires a
> different processing when used as an RDFa attribute and as a <meta>
> attribute associated to an "http-equiv" or a "name" value (for instance).

What conflict?

1. Attributes in XHTML can be distinguished by the elements they apply 
to as well as their name (e.g. the "name" attribute).

2. In XHTML+RDFa, "content" actually means the same thing on "meta" as 
on any other element in XHTML, which is presumably why they reused that 
attribute rather than introducing a new (better-named?) one:

http://www.w3.org/TR/rdfa-syntax/#rdfa-attributes

> In second place, it might be confusing for authors and lead to the
> misconception that every xhtml 1.x processor is also capable to process
> rdfa metadata (this is a limit of namespace + dtd/schema based
> modularization, because one can define the structure of a document, but
> not "orthogonal" behaviours requiring a specific support, not covered by
> the basic document model - such as collecting rdf triples declared by
> rdfa attributes, or calling a plugin and embedding its output - however,
> defining a proper namespace, maybe including its creation date somehow,
> may suggest what to expect from UAs).

There's no way to query a user agent about support for the 
specifications associated with a particular namespace, and namespaces 
are an unreliable guide to what user agents actually support, so I don't 
buy this concern.

Existing XHTML 1.x user agents don't always implement all the features 
of XHTML 1.x (e.g. exposing "longdesc" and "cite" to the user). HTML5 is 
introducing new elements and attributes into the same namespace, and 
authors would be wrong to assume that any XHTML-supporting browser will 
know what to do with them beyond inserting them into the DOM. XHTML 
modularization means you can't count on an XHTML user agent to implement 
any particular feature in the XHTML namespace.

A more reliable guide to what user agents support is looking at the list 
of supported features (as opposed to namespaces or modules or any other 
proxy) in their documentation.

> In third place, creating a different namespace would have resulted in a
> far easier introduction of RDFa attributes into other xml languages
> without having to change the language to host them (by the way, the
> xhtml namespace and a related prefix can be used, but this require a
> more specific support due to the "content" attribute issue, especially
> by UAs not supporting DTDs or schemata - that is, what should happen if
> an element were declared with both xhtml:name or xhtml:http-equiv,
> xhtml:content and xhtml:datatype, in an xml document accepting any
> attributes from external namespaces?

I cannot understand how RDFa attributes in a different namespace would 
be easier to reuse either in another language or a XML document where 
the host is not XHTML.

"content" and "datatype" mean the same on all elements, so your 
particular example seems like a non-problem to me - at least from the 
perspective of RDFa, which doesn't define processing for "name" or 
"http-equiv".

In so far as there is a problem, it's already a problem with 
bog-standard XHTML. How should <myml:bar xhtml:name="foo" 
xhtml:http-equiv="baz" xhtml:content="quux"> be processed?

> of course, this is solvable, but
> rdfa:content, rdfa:datatype and so on would make things easier, or at
> least _cleaner_ and less confusing for authors having to understand that
> an XML and RDF processor can/must support the xhtml namespace and its
> _whole_ semantics, not just dom-related structures, but limited to RDFa
> attributes, so that no <meta> or <object> or <link> can be used hoping
> their semantics is supported, despite the support for the xhtml
> namespace...).

An "XML and RDF processor" doesn't have to support XHTML or RDFA - XML 
and RDF are independent specifications.

A conforming XHTML+RDFa UA "user agent MUST support all of the features 
required in this specification. A conforming user agent must also 
support the User Agent conformance requirements as defined in XHTML 
Modularization [XHTMLMOD] section on "XHTML Family User Agent Conformance".

http://www.w3.org/TR/rdfa-syntax/#uaconf

Those further requirements can be read at:

http://www.w3.org/TR/xhtml-modularization/conformance.html#s_conform_user_agent

An XHTML+RDFa conforming user agent does not have to implement "meta", 
"object", or "link", and as a explained above, authors cannot assume 
support for particular features based on namespaces.

> Also there might have been fewer attributes, each one
> with a different semantic (assuming someone might not find useful to
> have a link with rel="stylesheet" representing a triple, for instance).

I don't follow. link with rel="stylesheet" _does_ represent information 
expressible as a triple, why would it be useful to pretend otherwise? 
And how would doing so make for fewer attributes?

> If there were a general agreement, a new element/attribute would be
> introduced as a result of a "bottom up" process (starting from
> experimentations) integrated with a "top down" community evaluation -
> for specific purposes, not generic machine exposure, I mean.

There is no general agreement to that AFAICT, and I doubt think using 
unstandardized elements or attributes or using data-* for public use 
would be good approaches to extending HTML: the former blocks potential 
extension points (e.g. "canvas") and the later pointlessly introduces 
the risk that a private use might be confused with a public one.

> (I'm not sure a generic machine data attribute - in general, not just
> referring to rdfa - would solve that, because each new occurrence of the
> problem might require a "brand new" datatype that only newer, updated
> UAs would understand (older ones would just parse the attribute and
> provide it as a string for further elaboration by a script, at most, but
> this might not be much better than using a data-* attribute for private
> script consumption), therefore, that wouldn't be necessarily different
> than creating a new appropriate attribute/element as needed and
> providing such new feature in newer, compliant UAs).

It would be very different in practice, because (like new "class" 
names), new "content" values wouldn't need to go through the W3C/WHATWG 
standards process.

That has a cost of course. You might end up with a worse design, 
especially if you don't go through a community like microformats. But 
that cost arguably isn't so bad when you're talking about embedding 
arbitrary data rather than features like "canvas" or "datagrid" that 
require new parsing, DOM APIs, and user interface from popular user 
agents. This cost appears to be acceptable in the case of microformat 
"class" names, for example. Now, you could already embed data with a bad 
design using HTML5's other extension mechanisms (e.g. "script"). It's 
just that microformats choose to abuse other attributes ("title") 
instead, partly because they allow you to wrap some human-readable 
content with its machine-readable equivalent (i.e. it's a more 
"markup-like" way of doing things). My feeling is that the cost of bad 
designs for embedded data is (1) unavoidable and (2) less than the 
benefits of avoiding misuse of other (X)HTML features for embedding data.

--
Benjamin Hawkes-Lewis

attached mail follows:



On Jan 11, 2009, at 18:52, Calogero Alex Baldacchino wrote:

> However, actually it's the same for RDFa attributes, because they're  
> not in the spec. From this point of view, introducing six new  
> attributes, or resorting to an older one is not very different, thus  
> (again) why RDFa and not eRDF?


eRDF is very different in not relying on attributes whose qname  
contains the substring "xmlns".

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/



attached mail follows:



On Fri, 09 Jan 2009 12:54:08 +1100, Calogero Alex Baldacchino  
<alex.baldacchino@email.it> wrote:

> I admit I'm not very expert in RDF use, thus I have a few questions.  
> Specifically, maybe I can guess the advantages when using the same  
> (carefully modelled, and well-known) vocabulary/ies; but when two  
> organizations develop their own vocabularies, similar yet different, to  
> model the same kind of informations, is merging of data enough? Can a  
> processor give more than a collection of triples, to be then interpreted  
> basing on knowledge on the used vocabulary/ies?

RDF consists of several parts. One of the key parts explains how to make  
an RDF vocabulary self-describing in terms of other vocabularies.

>  I mean, I assume my tools can extract RDF(a) data from whatever  
> document, but my query interface is based on my own vocabulary: when I  
> merge informations from an external vocabulary, do I need to translate  
> one vocabulary to the other (or at least to modify the query backend, so  
> that certain curies are recognized as representing the same concepts -  
> e.g. to tell my software that 'foaf:name' and 'ex:someone' are  
> equivalent, for my purposes)? If so, merging data might be the minor  
> part of the work I need to do, with respect to non-RDF(a) metadata (that  
> is, I'd have tools to extract and merge data anyway, and once I  
> translated external metadata to my format, I could use my own tools to  
> merge data), specially if the same model is used both by mine and an  
> external organization (therefore requiring an easier translation).

If a vocabulary is described, then you can do an automated translation  
 from one RDF vocabulary to another by using your original query based in  
your original vocabulary. This is one of the strengths of RDF.

>  Thus, I'm thinking the most valuable benefit of using RDF/RDFa is the  
> sureness that both parties are using the very same data model, despite  
> the possible use of different vocabularies -- it seems to me that the  
> concept of triples consisting of a subject, a predicate and an object is  
> somehow similar to a many-to-many association in a database, whereas one  
> might prefer a one-to-many approach - though, the former might be a  
> natural choice to model data which are usually sparse, as in a document  
> prose.

I don't see the ananlogy, but yes, I think the big benefit is being able  
to ensure that you know the data model without knowing the vocabulary a  
priori - since this is sufficient to automate the process of merging data  
into your model.

cheers

Chaals

-- 
Charles McCathieNevile  Opera Software, Standards Group
     je parle français -- hablo español -- jeg lærer norsk
http://my.opera.com/chaals       Try Opera: http://www.opera.com

attached mail follows:



Charles McCathieNevile ha scritto:
> On Fri, 09 Jan 2009 12:54:08 +1100, Calogero Alex Baldacchino 
> <alex.baldacchino@email.it> wrote:
>
>> I admit I'm not very expert in RDF use, thus I have a few questions. 
>> Specifically, maybe I can guess the advantages when using the same 
>> (carefully modelled, and well-known) vocabulary/ies; but when two 
>> organizations develop their own vocabularies, similar yet different, 
>> to model the same kind of informations, is merging of data enough? 
>> Can a processor give more than a collection of triples, to be then 
>> interpreted basing on knowledge on the used vocabulary/ies?
>
> RDF consists of several parts. One of the key parts explains how to 
> make an RDF vocabulary self-describing in terms of other vocabularies.
>
>>  I mean, I assume my tools can extract RDF(a) data from whatever 
>> document, but my query interface is based on my own vocabulary: when 
>> I merge informations from an external vocabulary, do I need to 
>> translate one vocabulary to the other (or at least to modify the 
>> query backend, so that certain curies are recognized as representing 
>> the same concepts - e.g. to tell my software that 'foaf:name' and 
>> 'ex:someone' are equivalent, for my purposes)? If so, merging data 
>> might be the minor part of the work I need to do, with respect to 
>> non-RDF(a) metadata (that is, I'd have tools to extract and merge 
>> data anyway, and once I translated external metadata to my format, I 
>> could use my own tools to merge data), specially if the same model is 
>> used both by mine and an external organization (therefore requiring 
>> an easier translation).
>
> If a vocabulary is described, then you can do an automated translation 
> from one RDF vocabulary to another by using your original query based 
> in your original vocabulary. This is one of the strengths of RDF.
>

Certainly, this is a strong benefit. However, when comparing different 
vocabularies in depth to their basic description (if any), I guess there 
may be a chance to find vocabularies which are not described in terms of 
each other, or of a third common vocabulary, thus a translation might be 
needed anyway. This might be true for small-time users developing a 
vocabulary for internal use before starting an external partnership, or 
regardless of the partnership (sometimes, small-time users may find it 
easier/faster to "reinvent the wheel" and modify it to address evolving 
problems; potentially someone might be unable to afford an extensive 
investigation to find an existing vocabulary fulfilling his requirments, 
or to develope a new one in conjunction with a partner having similar 
but slightly different needs, and thus potentially leading to a longer 
process to mediate respective needs. In such a case, I wouldn't expect 
that such a person will look for existing, more generic vocabularies 
which can describe the new one in order to ensure the widest possible 
interchange of data - that is, until a requirement for interchange 
arises, designing a vocabulary for that might be an overengineered task, 
and once the requirement was met, addressing it with a translation or 
with a description in term of a vocabulary known to be involved (each 
time the problem recurres) might be easier/faster than engineering a 
good description once and for all).

Anyway, let's assume we're going to deal with well-described 
vocabularies. Is the automated translation a task of a parser/processor 
creating a graph of triples, or a task of a query backend? And what are 
the requirements for a UA, from this perspective? Must it just parse the 
triples and create a graph or also take care of a vocabulary 
description? Must it be a complete query backend? Must it also provide a 
query interface? How much basic or advanced must the interface be? I 
think we should answer questions like this, and try and figure out 
possible problems arising with each answer and possible related 
solutions, because the concern here should be what UAs must do with RDF 
embedded in a non-RDF (and non-XML) document.

>>  Thus, I'm thinking the most valuable benefit of using RDF/RDFa is 
>> the sureness that both parties are using the very same data model, 
>> despite the possible use of different vocabularies -- it seems to me 
>> that the concept of triples consisting of a subject, a predicate and 
>> an object is somehow similar to a many-to-many association in a 
>> database, whereas one might prefer a one-to-many approach - though, 
>> the former might be a natural choice to model data which are usually 
>> sparse, as in a document prose.
>
> I don't see the ananlogy, but yes, I think the big benefit is being 
> able to ensure that you know the data model without knowing the 
> vocabulary a priori - since this is sufficient to automate the process 
> of merging data into your model.
>

I understand the benefit with respect to well-known and/or 
well-described vocabularies, but I wonder if an average small-time user 
would produce a well-described or a very-custom vocabulary. In the 
latter case, a good knowledge of a foreing vocabulary should be needed 
before querying it and I guess the translation can't be automated, but 
requires an understanding level which might be close to the one needed 
to translate from a (more or less) different model. In this case, the 
benefit of an automated merging of data from similar models might be 
lost in front of a non-automated translation which might be as difficult 
as translating from different models (but with a sufficient verbal 
documentation - that is with a natural language description, which 
should be easier to produce than a code-level description), given that 
translated data should be easy to merge.

I'm pushing this concept because I think it should be clear what 
scenario is more likely to happen, to avoid to introduce features 
perfectly designed for the same people who can develop a "perfect" 
vocabulary with a "perfect" generic description, and I suppose to be the 
same who can afford to develop a generic toolkit on their own, or to 
adjust an existing one (thus, they might be pleased with a basic support 
and a basic API), but not for most small-time users, who might develop a 
custom vocabulary the same way they develop a custom model, thus needing 
more custom tools (again, a basic support and a basic API might satisfay 
their needs, more than a complete backend working fine with 
well-described vocabularies but not with completely unknown ones, thus 
requiring a custom developement anyway).

Assuming this is true, there should be an evidence that the same people 
who'd produce a "bad" vocabulary do not prefer a completely custom 
model, because, if they were the great majority, we would risk to invest 
resources (on the UAs side, if we made of it a general requirement) to 
help people who may be pleased with the help, but not really need it 
(because they're not small-time users maybe, and can do it on their own 
without too much effort -- this doesn't mean that their requirements are 
less significant and worth to be taken into account, but in general UA 
developers might not be very happy to invest their resources to 
implement something which is or appear overengineered with respect to 
the real needs "in the wild", thus we should carefully establish how 
strong is the need to support RDFa and accurately define support 
requirements for UAs).

> cheers
>
> Chaals
>

WBR, Alex

 
 
 --
 Caselle da 1GB, trasmetti allegati fino a 3GB e in piu' IMAP, POP3 e SMTP autenticato? GRATIS solo con Email.it http://www.email.it/f
 
 Sponsor:
 Blu American Express: gratuita a vita! 
 Clicca qui: http://adv.email.it/cgi-bin/foclick.cgi?mid=8613&d=4-2

attached mail follows:



The use cases for RDFa are pretty much the same as those for  
Microformats.

For example, if a person's name and contact details are marked up on  
a web page using hCard, the user-agent can offer to, say, add the  
person to your address book, or add them as a friend on a social  
networking site, or add a reminder about that person's birthday to  
your calendar.

If an event is marked up on a web page using hCalendar, then the user- 
agent could offer to add it to a calendar, or provide the user with a  
map of its location, or add it to a timeline that the user is  
building for their school history project.

Providing rich semantics for the information on a web page allows the  
user-agent to know what's on a page, and step in and perform helpful  
tasks for the user.

So why RDFa and not Microformats?

Firstly, RDFa provides a single unified parsing algorithm that  
Microformats do not. Separate parsers need to be created for  
hCalendar, hReview, hCard, etc, as each Microformat has its own  
unique parsing quirks. For example, hCard has N-optimisation and ORG- 
optimisation which aren't found in hCalendar. With RDFa, a single  
algorithm is used to parse everything: contacts, events, places,  
cars, songs, whatever.

Secondly, as the result of having one single parsing algorithm,  
decentralised development is possible. If I want a way of marking up  
my iguana collection semantically, I can develop that vocabulary  
without having to go through a central authority. Because URIs are  
used to identify vocabulary terms, I can be sure that my vocabulary  
won't clash with other people's vocabularies. It can be argued that  
going through a community to develop vocabularies is beneficial, as  
it allows the vocabulary to be built by "many minds" - RDFa does not  
prevent this, it just gives people alternatives to community  
development.

Lastly, there are a lot of parsing ambiguities for many Microformats.  
One area which is especially fraught is that of scoping. The editors  
of many current draft Microformats[1] would like to allow page  
authors to embed licensing data - e.g. to say that a particular  
recipe for a pie is licensed under a Creative Commons licence.  
However, it has been noted that the current rel=license Microformat  
can not be re-used within these drafts, because virtually all  
existing rel=license implementations will just assume that the  
license applies to the whole page rather than just part of it. RDFa  
has strong and unambiguous rules for scoping - a license, for  
example, could apply to a section of the page, or one particular image.

RDFa was largely borne of looking at Microformats, looking at what  
was successful about them, considering problems with them, and  
finding ways to resolve those problems.

____
1. It has been discussed in hAudio, figure, hRecipe and others.

-- 
Toby A Inkster
<mailto:mail@tobyinkster.co.uk>
<http://tobyinkster.co.uk>


attached mail follows:



On 2009-01-01 15:24, Toby A Inkster wrote:
> The use cases for RDFa are pretty much the same as those for Microformats.

Right, but microformats can be used without any changes to the HTML 
language, whereas RDFa requires such changes.  If they fulfill the same 
use cases, then there's not much point in adding RDFa.

> For example, if a person's name and contact details are marked up on a
> web page using hCard, the user-agent can offer to, say, add the person
> to your address book, or add them as a friend on a social networking
> site, or add a reminder about that person's birthday to your calendar.
>
> If an event is marked up on a web page using hCalendar, then the
> user-agent could offer to add it to a calendar, or provide the user with
> a map of its location, or add it to a timeline that the user is building
> for their school history project.
>
> Providing rich semantics for the information on a web page allows the
> user-agent to know what's on a page, and step in and perform helpful
> tasks for the user.
>
> So why RDFa and not Microformats?
>
> Firstly, RDFa provides a single unified parsing algorithm that
> Microformats do not. Separate parsers need to be created for hCalendar,
> hReview, hCard, etc, as each Microformat has its own unique parsing
> quirks. For example, hCard has N-optimisation and ORG-optimisation which
> aren't found in hCalendar. With RDFa, a single algorithm is used to
> parse everything: contacts, events, places, cars, songs, whatever.

This is not necessarily beneficial.  If you have separate parsing 
algorithms, you can code in shortcuts for common use-cases and thus 
optimise the authoring experience.  Also, as has been pointed out before 
in the distributed extensibility debate, parsing is a very small part of 
doing useful things with content.

> Secondly, as the result of having one single parsing algorithm,
> decentralised development is possible. If I want a way of marking up my
> iguana collection semantically, I can develop that vocabulary without
> having to go through a central authority.

You can develop vocabularies without going through a central authority 
already, via class or id, and many people already do.

> Because URIs are used to
> identify vocabulary terms, I can be sure that my vocabulary won't clash
> with other people's vocabularies.

Again, you can do this with class, by putting your domain name in the 
class attribute.  It also depends on how much of an issue you think 
clashes will be with an iguana collection-- I would suggest that due to 
the specialised nature of the markup, clashes would be quite unlikely.

> It can be argued that going through a
> community to develop vocabularies is beneficial, as it allows the
> vocabulary to be built by "many minds" - RDFa does not prevent this, it
> just gives people alternatives to community development.

RDFa does not give anything over what the class attribute does in terms 
of community vs individual development, so this doesn't really speak in 
RDFa's favour.

> Lastly, there are a lot of parsing ambiguities for many Microformats.
> One area which is especially fraught is that of scoping. The editors of
> many current draft Microformats[1] would like to allow page authors to
> embed licensing data - e.g. to say that a particular recipe for a pie is
> licensed under a Creative Commons licence. However, it has been noted
> that the current rel=license Microformat can not be re-used within these
> drafts, because virtually all existing rel=license implementations will
> just assume that the license applies to the whole page rather than just
> part of it. RDFa has strong and unambiguous rules for scoping - a
> license, for example, could apply to a section of the page, or one
> particular image.

Are there other cases where this granularity of scoping would be 
genuinely helpful?  If not, it would seem better to work out a solution 
for scoping licence information instead of bringing in a whole new 
vocabulary to solve it.

What would you do with scoped copyright information, anyway?  I can see 
images being an issue, but ideally information about a resource should 
be kept in that resource, and as such the licence should be embedded in 
the image rather than given by a Web page.  In the case of particular 
sections having particular licences, is there any practical use of 
marking up different sections with different licences over just doing 
that with text?

> RDFa was largely borne of looking at Microformats, looking at what was
> successful about them, considering problems with them, and finding ways
> to resolve those problems.

Andi

attached mail follows:



On Fri, 02 Jan 2009 05:43:05 +1100, Andi Sidwell <andi@takkaria.org> wrote:

> On 2009-01-01 15:24, Toby A Inkster wrote:
>> The use cases for RDFa are pretty much the same as those for  
>> Microformats.
>
> Right, but microformats can be used without any changes to the HTML  
> language, whereas RDFa requires such changes.  If they fulfill the same  
> use cases, then there's not much point in adding RDFa.
...
>> So why RDFa and not Microformats?

(I think the question should be why RDFa is needed *as well as* µformats)

>> Firstly, RDFa provides a single unified parsing algorithm that
>> Microformats do not. ...

> This is not necessarily beneficial.  If you have separate parsing  
> algorithms, you can code in shortcuts for common use-cases and thus  
> optimise the authoring experience.

On the other hand, you cannot parse information until you know how it is  
encoded, and information encoded in RDFa can be parsed without knowing  
more.

And not only can you optimise your parsing for a given algorithm, you can  
also do for a known vocabulary - or you can optimise the post-parsing  
treatment.

>  Also, as has been pointed out before in the distributed extensibility  
> debate, parsing is a very small part of doing useful things with content.

Yes. However many of the use cases that I think justify the inclusion of  
RDFa are already very small on their own, and valuable when several  
vocabularies are combined. So being able to do off-the-shelf parsing is  
valuable, compared to working out how to parse a combination of formats  
together.

>> Secondly, as the result of having one single parsing algorithm,
>> decentralised development is possible. If I want a way of marking up my
>> iguana collection semantically, I can develop that vocabulary without
>> having to go through a central authority.
>
> You can develop vocabularies without going through a central authority  
> already, via class or id, and many people already do.
>
>> Because URIs are used to
>> identify vocabulary terms, I can be sure that my vocabulary won't clash
>> with other people's vocabularies.
>
> Again, you can do this with class, by putting your domain name in the  
> class attribute.  It also depends on how much of an issue you think  
> clashes will be with an iguana collection-- I would suggest that due to  
> the specialised nature of the markup, clashes would be quite unlikely.

It depends how many people work on iguana collections - or Old Norse and  
Anglo Saxon text, which was the use case that got me involved in the Web  
in the very early 90s. It turns out that people don't, in the µformats  
world, use unambiguous names, especially when they are privately  
developing their own information. By contrast, those who come from an RDF  
world do this by habit.

>> It can be argued that going through a
>> community to develop vocabularies is beneficial, as it allows the
>> vocabulary to be built by "many minds" - RDFa does not prevent this, it
>> just gives people alternatives to community development.
>
> RDFa does not give anything over what the class attribute does in terms  
> of community vs individual development, so this doesn't really speak in  
> RDFa's favour.

In principle no, but in real world usage the class attribute is considered  
something that is primarily local, whereas RDFa is generally used by  
people who have a broader outlook on the desirable permanence and  
re-usability of their data.

>> Lastly, there are a lot of parsing ambiguities for many Microformats.
>> One area which is especially fraught is that of scoping. The editors of
>> many current draft Microformats[1] would like to allow page authors to
>> embed licensing data - e.g. to say that a particular recipe for a pie is
>> licensed under a Creative Commons licence. However, it has been noted
>> that the current rel=license Microformat can not be re-used within these
>> drafts, because virtually all existing rel=license implementations will
>> just assume that the license applies to the whole page rather than just
>> part of it. RDFa has strong and unambiguous rules for scoping - a
>> license, for example, could apply to a section of the page, or one
>> particular image.
>
> Are there other cases where this granularity of scoping would be  
> genuinely helpful?  If not, it would seem better to work out a solution  
> for scoping licence information...

Yes.

Being able to describe accessibility of various parts of content, or point  
to potential replacement content for particular use cases, benefits  
enormously from such scoping (this is why people who do industrial-scale  
accessibility often use RDF as their infrastructure). ARIA has already  
taken the approach of looking for a special-purpose way to do this, which  
significantly bloats HTML but at least allows important users to satisfy  
their needs to be able t produce content with certain information included.

Government and large enterprises produce content that needs to be  
maintained, and being able to include production, cataloguing, and similar  
metadata directly, scoped to the document, would be helpful. As a trivial  
example, it would be useful to me in working to improve the Web content we  
produce at Opera to have a nice mechanism for identifying the original  
source of various parts of a page.

> What would you do with scoped copyright information, anyway?  I can see  
> images being an issue, but ideally information about a resource should  
> be kept in that resource, and as such the licence should be embedded in  
> the image rather than given by a Web page.  In the case of particular  
> sections having particular licences, is there any practical use of  
> marking up different sections with different licences over just doing  
> that with text?

Mash-ups. If they have a use-case, and I think it is widely accepted that  
they do, then it would seem obvious that being able to identify the source  
of each part, and any conditions that vary between different sources, is a  
use case.

cheers

Chaals

-- 
Charles McCathieNevile  Opera Software, Standards Group
     je parle français -- hablo español -- jeg lærer norsk
http://my.opera.com/chaals       Try Opera: http://www.opera.com

attached mail follows:



On Fri, Jan 2, 2009 at 12:12 AM, Charles McCathieNevile
<chaals@opera.com> wrote:
> On Fri, 02 Jan 2009 05:43:05 +1100, Andi Sidwell <andi@takkaria.org> wrote:
>
>> On 2009-01-01 15:24, Toby A Inkster wrote:
>>>
>>> The use cases for RDFa are pretty much the same as those for
>>> Microformats.
>>
>> Right, but microformats can be used without any changes to the HTML
>> language, whereas RDFa requires such changes.  If they fulfill the same use
>> cases, then there's not much point in adding RDFa.
>
> ...

Why the non-response?  This is precisely the point of contention.
Things aren't added to the spec on a whim.  Things get added when it
is demonstrated that authors will significantly benefit from the
inclusion of the feature in the language.  Microformats (used as an
example only) use only features already in the language, and thus do
not need any spec support.  If they already solve the problem
adequately, then there is no need to go further.

>>> So why RDFa and not Microformats?
>
> (I think the question should be why RDFa is needed *as well as* formats)

This is correct.  Microformats exist already.  They solve current
problems.  Are there further problems that Microformats don't address
which can be solved well by RDFa?  Are these problems significant
enough to authors to be worth addressing in the spec, or can we wait
and let the community work out its own solutions further before we
make a move?  We generally want to wait until a given item is truly
established before speccing it, so that we can work with existing
use-cases and solve known problems.  To do otherwise risks us
inventing use-cases that don't commonly exist in reality, solving
non-problems while leaving gaping holes that will cause authors
problems down the line.

For an example (used several times, but that's because it's a really
good example), consider <video>.  Flash-based video players are
already extremely common.  We know how people use them, we know what
authors generally expect from them, and we know what problems exist
with how they are currently implemented and used.  We also feel that
extending the language would allow us to solve these problems, and
help authors significantly.  Thus, <video>.

Microformats are the metadata equivalent of Flash-based video players.
 They are hacks used to allow authors to accomplish something not
explicitly accounted for in the language.  Are there significant
problems with this approach?  Is metadata embedding used widely enough
to justify extending the language for it, or are the current hacks
(Microformats, in this case) enough?  Are current metadata embedding
practices mature enough that we can be relatively sure we're solving
actual problems with our extension?  These are all questions that must
be asked of any extention to the language.

>>> Firstly, RDFa provides a single unified parsing algorithm that
>>> Microformats do not. ...
>
>> This is not necessarily beneficial.  If you have separate parsing
>> algorithms, you can code in shortcuts for common use-cases and thus optimise
>> the authoring experience.
>
> On the other hand, you cannot parse information until you know how it is
> encoded, and information encoded in RDFa can be parsed without knowing more.
>
> And not only can you optimise your parsing for a given algorithm, you can
> also do for a known vocabulary - or you can optimise the post-parsing
> treatment.

What is the benefit to authors of having an easily machine-parsed
format?  (Note: this is completely separate from the question of the
benefits of metadata at all.)  Are they greater than the benefits of a
format that is harder to parse, but easier for authors to write?

>
>>  Also, as has been pointed out before in the distributed extensibility
>> debate, parsing is a very small part of doing useful things with content.
>
> Yes. However many of the use cases that I think justify the inclusion of
> RDFa are already very small on their own, and valuable when several
> vocabularies are combined. So being able to do off-the-shelf parsing is
> valuable, compared to working out how to parse a combination of formats
> together.

Can you provide these use-cases?  The discussion has an astonishing
dearth of use-cases by which we can evaluate the effectiveness of
proposals.

>>> Secondly, as the result of having one single parsing algorithm,
>>> decentralised development is possible. If I want a way of marking up my
>>> iguana collection semantically, I can develop that vocabulary without
>>> having to go through a central authority.
>>
>> You can develop vocabularies without going through a central authority
>> already, via class or id, and many people already do.
>>
>>> Because URIs are used to
>>> identify vocabulary terms, I can be sure that my vocabulary won't clash
>>> with other people's vocabularies.
>>
>> Again, you can do this with class, by putting your domain name in the
>> class attribute.  It also depends on how much of an issue you think clashes
>> will be with an iguana collection-- I would suggest that due to the
>> specialised nature of the markup, clashes would be quite unlikely.
>
> It depends how many people work on iguana collections - or Old Norse and
> Anglo Saxon text, which was the use case that got me involved in the Web in
> the very early 90s. It turns out that people don't, in the formats world,
> use unambiguous names, especially when they are privately developing their
> own information. By contrast, those who come from an RDF world do this by
> habit.

Is this a problem that needs to be solved in the spec, or is it one
that can be solved socially?  More importantly, is it a problem that
needs to be solved at all?  Is there any indication that use of
ambiguous names produces significant problems for authors?

>>> It can be argued that going through a
>>> community to develop vocabularies is beneficial, as it allows the
>>> vocabulary to be built by "many minds" - RDFa does not prevent this, it
>>> just gives people alternatives to community development.
>>
>> RDFa does not give anything over what the class attribute does in terms of
>> community vs individual development, so this doesn't really speak in RDFa's
>> favour.
>
> In principle no, but in real world usage the class attribute is considered
> something that is primarily local, whereas RDFa is generally used by people
> who have a broader outlook on the desirable permanence and re-usability of
> their data.

Can we extract a requirement from this, then?

>>> Lastly, there are a lot of parsing ambiguities for many Microformats.
>>> One area which is especially fraught is that of scoping. The editors of
>>> many current draft Microformats[1] would like to allow page authors to
>>> embed licensing data - e.g. to say that a particular recipe for a pie is
>>> licensed under a Creative Commons licence. However, it has been noted
>>> that the current rel=license Microformat can not be re-used within these
>>> drafts, because virtually all existing rel=license implementations will
>>> just assume that the license applies to the whole page rather than just
>>> part of it. RDFa has strong and unambiguous rules for scoping - a
>>> license, for example, could apply to a section of the page, or one
>>> particular image.
>>
>> Are there other cases where this granularity of scoping would be genuinely
>> helpful?  If not, it would seem better to work out a solution for scoping
>> licence information...
>
> Yes.
>
> Being able to describe accessibility of various parts of content, or point
> to potential replacement content for particular use cases, benefits
> enormously from such scoping (this is why people who do industrial-scale
> accessibility often use RDF as their infrastructure). ARIA has already taken
> the approach of looking for a special-purpose way to do this, which
> significantly bloats HTML but at least allows important users to satisfy
> their needs to be able t produce content with certain information included.
>
> Government and large enterprises produce content that needs to be
> maintained, and being able to include production, cataloguing, and similar
> metadata directly, scoped to the document, would be helpful. As a trivial
> example, it would be useful to me in working to improve the Web content we
> produce at Opera to have a nice mechanism for identifying the original
> source of various parts of a page.

Can we distill this into use-cases, then?  You, as an author, want to
be able to specify the original source of a piece of content.  What's
the practical use of this?  Does it require an embedded,
machine-readable vocabulary to function?  Are existing solutions
adequate (frex, footnotes)?

>> What would you do with scoped copyright information, anyway?  I can see
>> images being an issue, but ideally information about a resource should be
>> kept in that resource, and as such the licence should be embedded in the
>> image rather than given by a Web page.  In the case of particular sections
>> having particular licences, is there any practical use of marking up
>> different sections with different licences over just doing that with text?
>
> Mash-ups. If they have a use-case, and I think it is widely accepted that
> they do, then it would seem obvious that being able to identify the source
> of each part, and any conditions that vary between different sources, is a
> use case.

Not quite.  Specifically, is there any practical use for marking up
various sections of a site with licensing information specific to that
section *in an embedded, machine-readable manner*?  Are the existing
solutions adequate (frex, simply putting a separate copyright notice
on each section, or noting the various copyrights on a licensing
page)?

(Note: I responded to your email rather than the OP because it
presented better points to respond to.)

~TJ

attached mail follows:



On Fri, Jan 2, 2009 at 12:02 PM, Julian Reschke <julian.reschke@gmx.de> wrote:
> Tab Atkins Jr. wrote:
>>>>
>>>> Right, but microformats can be used without any changes to the HTML
>>>> language, whereas RDFa requires such changes.  If they fulfill the same
>>>> use
>>>> cases, then there's not much point in adding RDFa.
>>>
>>> ...
>>
>> Why the non-response?  This is precisely the point of contention.
>> Things aren't added to the spec on a whim.  Things get added when it
>> is demonstrated that authors will significantly benefit from the
>> inclusion of the feature in the language.  Microformats (used as an
>> example only) use only features already in the language, and thus do
>> not need any spec support.  If they already solve the problem
>> adequately, then there is no need to go further.
>> ...
>
> I think the supporters of RDFa (me included) claim that Microformats only
> address a subset of the problem solved by RDFa.

The next step, then, is to list these problems, establish that they
truly aren't solved by existing solutions (not just Microformats),
establish that solving them would be of significant benefit to
authors, and finally that solving them within HTML is the most
appropriate course of action.

>>>>> So why RDFa and not Microformats?
>>>
>>> (I think the question should be why RDFa is needed *as well as* formats)
>>
>> This is correct.  Microformats exist already.  They solve current
>> problems.  Are there further problems that Microformats don't address
>> which can be solved well by RDFa?  Are these problems significant
>> enough to authors to be worth addressing in the spec, or can we wait
>> and let the community work out its own solutions further before we
>> make a move?  We generally want to wait until a given item is truly
>> established before speccing it, so that we can work with existing
>
> Oh really? That's news to me.
>
> If this is principle we agree on that we really should start cutting lots of
> things from the spec.

It is a general principle, though not a necessary one.  As Ian noted
in his earlier email, speccing a solution too early runs the risk of
solving the wrong problem, and then poisoning that area of the
solution space entirely.  If we wait for authors to develop their own
hacks around features missing in the language, we can be sure that
we're solving a problem authors want solved, and we have some measure
of implementation experience already (even if just in author-deployed
Javascript) that we can learn from.  Other groups use this principle
as well - browser vendors prefix their early versions of new CSS
properties, for example, so that authors using these early versions
don't poison the space and prevent problems from being addressed that
would 'break' uses of the property.

Most the additions in HTML5 are designed on this principle.  For
example, the <video> element and the additional values for <input> are
drawn directly from javascript and flash-based solutions currently in
use, with the intent to make them easier for authors to use.  Others,
such as the additional sectioning elements and the new header-parsing
algorithm, were meant to embrace and bless well-established authoring
practices (splitting your content into header, footer, and content
<div>s, or building documents from smaller fragments which use <h1>
and such with clear intent to create a .  Finally, some additions (the
Workers spec, the SQL spec) have little in the way of current-practice
analogues (though much of those things are presaged in Gears, frex)
because they are designed to specifically address a current lack and
enable future uses.  These, though, still solve well-defined problems
and bring benefits which significantly outweigh their downsides.

RDFa isn't a well-established authoring pattern needing to be blessed
and made explicit.  That means it's either a simplification of
existing widespread hacks (Microformats?) intended to make authors
lives easier, or it's intended to fill a gaping hole that can be
established to be of significant benefit to authors to have filled.
Either way, one needs some justification.

Frex, it's possible that the language *could* be extended to make
Microformat-type things easier to use for authors.  We'd need to
establish that Microformats (or some other embedded metadata) really
are commonly used, though, and that the proposed simplification is
really significant enough (existing validation and video libraries,
for example, are much more complex to use than <input type="email"> or
<video>, and can impose significant extra bandwidth costs which are
undesirable).

It's also possible that embedded metadata support *is* a gaping hole
that needs to be filled.  We'd still need to (a) establish the problem
clearly (so we can evaluate possible solutions) and (b) decide that
RDFa is a good solution to the problem as stated before we add it into
the language.

~TJ

attached mail follows:



On Sat, 03 Jan 2009 04:52:35 +1100, Tab Atkins Jr. <jackalmage@gmail.com>  
wrote:

> On Fri, Jan 2, 2009 at 12:12 AM, Charles McCathieNevile
> <chaals@opera.com> wrote:
>> On Fri, 02 Jan 2009 05:43:05 +1100, Andi Sidwell <andi@takkaria.org>  
>> wrote:
>>
>>> On 2009-01-01 15:24, Toby A Inkster wrote:
>>>>
>>>> The use cases for RDFa are pretty much the same as those for
>>>> Microformats.
>>>
>>> Right, but microformats can be used without any changes to the HTML
>>> language, whereas RDFa requires such changes.  If they fulfill the  
>>> same use
>>> cases, then there's not much point in adding RDFa.
>>
>> ...
>
> Why the non-response?

Because the response comes in the next paragraph, to the first question  
that was worth asking.

>>>> So why RDFa and not Microformats?
>>
>> (I think the question should be why RDFa is needed *as well as*  
>> µformats)
>
> This is correct.  Microformats exist already.  They solve current
> problems.

(Elsewhere in this thread you wrote
[[[
It has not yet been established that there is a problem worth solving that  
metadata would address at all.
]]]
Do you consider that µformats do not encode metadata? Otherwise, I am not  
sure how to reconcile these statements. In any case I would greatly  
appreciate clarification of what you think microformats do, since I do  
believe that microformats are very explicitly directed to allowing the  
encoding of metadata, anbd therefore it is not clear that we are  
discussing from similar premises).

>  Are there further problems that Microformats don't address
> which can be solved well by RDFa?  Are these problems significant
> enough to authors to be worth addressing in the spec, or can we wait
> and let the community work out its own solutions further before we
> make a move?

In my opinion, yes there are further problems µformats don't solve (that  
RDFa does), yes they are significant, and the community has come up with a  
way to solve them - RDFa.

> Microformats are the metadata equivalent of Flash-based video players.
>  They are hacks used to allow authors to accomplish something not
> explicitly accounted for in the language.  Are there significant
> problems with this approach?

Yes. The problems are that they rely on precoordination on a  
per-vocabulary basis before you can do anything useful with the data. In  
practical usage they rely on choosing attribute names that hopefully don't  
clash with anything - in other words, trying to solve the problem of  
disambiguation that namespaces solves, but by choosing names that are  
wierd enough not to clash or by circumscribing the problem spaces that can  
be addressed to the extent that you can expect no clashes.

(This is hardly news, by the way).

> Is metadata embedding used widely enough
> to justify extending the language for it, or are the current hacks
> (Microformats, in this case) enough?  Are current metadata embedding
> practices mature enough that we can be relatively sure we're solving
> actual problems with our extension?

Current metadata embedding is done using µformats, and it's pretty clear  
that they are not sufficient. A large body of work uses RDF data models  
(Dublin Core, IMS, LOM, FOAF, POWDER are all large-scale formats. The  
people who are testing RDF engines with hundreds of millions of triples  
and more are doing it with real data, not stuff generated for the  
experiment).

It is also clear that people would like to develop further small-scale  
formats, and that µformats through its requirement for community  
consultation is effectively too heavyweight for the purposes of many  
developers.

>  These are all questions that must
> be asked of any extention to the language.
>
>>>> Firstly, RDFa provides a single unified parsing algorithm that
>>>> Microformats do not. ...
>>
>>> This is not necessarily beneficial.  If you have separate parsing
>>> algorithms, you can code in shortcuts for common use-cases and thus  
>>> optimise the authoring experience.
>>
>> On the other hand, you cannot parse information until you know how it is
>> encoded, and information encoded in RDFa can be parsed without knowing  
>> more.
>>
>> And not only can you optimise your parsing for a given algorithm, you  
>> can also do for a known vocabulary - or you can optimise the
>> post-parsing treatment.
>
> What is the benefit to authors of having an easily machine-parsed
> format?

Assuming that the format is sufficiently easy to write, and to generate, I  
am not sure what isn't obvious about the answer to the question.

(In case I am somehow very clever, and others aren't, the benefit is that  
it is easy to machine parse and use the information).

> Are they greater than the benefits of a
> format that is harder to parse, but easier for authors to write?

For a certain set of authors, yes the benefits are greater.

>>>  Also, as has been pointed out before in the distributed extensibility
>>> debate, parsing is a very small part of doing useful things with  
>>> content.
>>
>> Yes. However many of the use cases that I think justify the inclusion of
>> RDFa are already very small on their own, and valuable when several
>> vocabularies are combined. So being able to do off-the-shelf parsing is
>> valuable, compared to working out how to parse a combination of formats
>> together.
>
> Can you provide these use-cases?  The discussion has an astonishing
> dearth of use-cases by which we can evaluate the effectiveness of
> proposals.

The small-scale use cases are difficult to provide, since they are based  
on the fact that people do something quickly because they need it. One set  
of potential use cases is all the microformats that haven't been blessed  
by the µformats community as formally agreed "standards" - writing them in  
RDFa is sufficient to have them be usable.

Another use case is noting the source of data in mashups. This enables  
information to be carried about the licensing, the date at which the data  
was mashed (or smushed, to use the older terminology from the Semantic  
Web), and so on.

Another (the second time I have noted it in two emails) is to provide  
information useful for improving the accessibility of Web content.

The set of use cases that led to the development of GRDDL are also use  
cases for RDFa - since RDFGa allows a direct extraction to RDF without  
having to develop a new parser for each data model, authors can simplify  
the way they extract data by using RDFa to encode it, saving themselves  
the bother of explaining how to extract it. This time saving means that  
they can afford to develop a smaller, more specialised vocabulary.

> Is there any indication that use of
> ambiguous names produces significant problems for authors?

Not that I am aware of, although I think the question is poorly considered  
so I haven't given it much thought. There is plenty of evidence (for  
example the attempts to use Dublin Core within existing HTML mechanisms)  
that it causes problems for data consumers.

>>>> It can be argued that going through a
>>>> community to develop vocabularies is beneficial, as it allows the
>>>> vocabulary to be built by "many minds" - RDFa does not prevent this,  
>>>> it
>>>> just gives people alternatives to community development.
>>>
>>> RDFa does not give anything over what the class attribute does in  
>>> terms of
>>> community vs individual development, so this doesn't really speak in  
>>> RDFa's
>>> favour.
>>
>> In principle no, but in real world usage the class attribute is  
>> considered something that is primarily local, whereas RDFa is generally
>> used by people who have a broader outlook on the desirable permanence
>> and re-usability of their data.
>
> Can we extract a requirement from this, then?

A poor formulation (I hope that those who are better at very detailed  
requirements can help improve my phrasing) could be:

Provide an easy mechanism to encode new data in a way that can be  
machine-extracted without requiring any explanation of the data model.

>>>> Lastly, there are a lot of parsing ambiguities for many Microformats.
>>>> One area which is especially fraught is that of scoping. The editors  
>>>> of
>>>> many current draft Microformats[1] would like to allow page authors to
>>>> embed licensing data - e.g. to say that a particular recipe for a pie  
>>>> is
>>>> licensed under a Creative Commons licence. However, it has been noted
>>>> that the current rel=license Microformat can not be re-used within  
>>>> these
>>>> drafts, because virtually all existing rel=license implementations  
>>>> will
>>>> just assume that the license applies to the whole page rather than  
>>>> just
>>>> part of it. RDFa has strong and unambiguous rules for scoping - a
>>>> license, for example, could apply to a section of the page, or one
>>>> particular image.
>>>
>>> Are there other cases where this granularity of scoping would be  
>>> genuinely
>>> helpful?  If not, it would seem better to work out a solution for  
>>> scoping
>>> licence information...
>>
>> Yes.
>>
>> Being able to describe accessibility of various parts of content, or  
>> point
>> to potential replacement content for particular use cases, benefits
>> enormously from such scoping (this is why people who do industrial-scale
>> accessibility often use RDF as their infrastructure). ARIA has already  
>> taken
>> the approach of looking for a special-purpose way to do this, which
>> significantly bloats HTML but at least allows important users to satisfy
>> their needs to be able t produce content with certain information  
>> included.
>>
>> Government and large enterprises produce content that needs to be
>> maintained, and being able to include production, cataloguing, and  
>> similar
>> metadata directly, scoped to the document, would be helpful. As a  
>> trivial
>> example, it would be useful to me in working to improve the Web content  
>> we
>> produce at Opera to have a nice mechanism for identifying the original
>> source of various parts of a page.
>
> Can we distill this into use-cases, then?

Sure. It just takes a small amount of thinking. How many use cases would  
you think will be sufficient to demonstrate that this would be important.  
Or do you measure it by how many people each use case applies to? (It is  
far easier to justify the cost of developing use cases where there is more  
clarity about the goals for those use cases - and it enables people to  
decide whether to develop their own, or go find the people who are doing  
this and ask them to provide the information).

>  You, as an author, want to
> be able to specify the original source of a piece of content.  What's
> the practical use of this?  Does it require an embedded,
> machine-readable vocabulary to function?  Are existing solutions
> adequate (frex, footnotes)?
...
> Not quite.  Specifically, is there any practical use for marking up
> various sections of a site with licensing information specific to that
> section *in an embedded, machine-readable manner*?  Are the existing
> solutions adequate (frex, simply putting a separate copyright notice
> on each section, or noting the various copyrights on a licensing
> page)?

Let me treat these as the same question since I don't think they introduce  
anything usefully different between them. I will add to that Henri's  
questions about my use case for this already published elsewhere in this  
thread.

A practical use case is in an organisation where different people are  
responsible for different parts of content. Instead of having to look up,  
myself, who is responsible for each piece, and what rights are associated  
with it, I can automate the process. (This is one of the value  
propositions offered by content management systems. I hope we can agree  
that these are sufficiently widely used to a priori assume a use case, but  
if not please say so). This means that instead of manually checking many  
pages for things like accessibility or being up to date, and then having  
to find which part of the page was produced by which part of the  
organisation (which is what I do at Opera) I can simply have this  
information trawled and presented as I please by a program (which many  
large organisations do, or partially do).

Another example is that certain W3C pages (the list of specifications  
produced by W3C, for example, and various lists of translations) are  
produced from RDF data that is scraped from each page through a customised  
and thus fragile scraping mechanism. Being able to use RDFa would free  
authors of the draconian constraints on the source-code formatting of  
specifications, and merely require them to us the right attributes, in  
order to maintain this data.

An example of how this data can be re-used is that it is possible to  
determine many of the people who have translated W3C specifications or  
other documents - and thus to search for people who are familiar with a  
given technology at least at some level, and happen to speak one or more  
languages of interest. This is at least as important to me in looking for  
potential people to recruit as any free-text search I can do - and has the  
benefit that while I don't have the resources to develop large-scale  
free-text searching, I do have the resources to develop simple queries  
based on a standardised data model and an encoding of it.

Alternatively I could use the same information to seed a reputation  
manager, so I can determine which of the many emails I have no time to  
read in WHAT-WG might be more than usually valuable.

cheers

Chaals

-- 
Charles McCathieNevile  Opera Software, Standards Group
     je parle français -- hablo español -- jeg lærer norsk
http://my.opera.com/chaals       Try Opera: http://www.opera.com

attached mail follows:



On Jan 1, 2009, at 17:24, Toby A Inkster wrote:

> So why RDFa and not Microformats?

There's a possibility that this is a false dichotomy and both are bad.

> Firstly, RDFa provides a single unified parsing algorithm that  
> Microformats do not. Separate parsers need to be created for  
> hCalendar, hReview, hCard, etc, as each Microformat has its own  
> unique parsing quirks. For example, hCard has N-optimisation and ORG- 
> optimisation which aren't found in hCalendar. With RDFa, a single  
> algorithm is used to parse everything: contacts, events, places,  
> cars, songs, whatever.

More to the point, Microformats not only require per-format processing  
but the processing required for each Microformat isn't specified at  
all. That's bad.

RDFa, on the other hand, uses CURIEs, which is bad. (More generally, I  
think using URIs as identifiers instead of using them for above-TCP- 
layer protocol addressing is bad, but relying on the namespace mapping  
context is even worse.)

Have there been any attempts to remove the badness of Microformats  
without introducing the badness of RDFa in the process? That is, have  
there been attempts of defining unified parsing while retaining the  
feel of Microformats without relying on the namespace mapping context  
from the layer below?

If not, why not? I'm assuming that people in the Microformat community  
have clue. Yet, on the face of it, viewed from outside the community,  
their formats seem to have a big problem. Why hasn't the community  
fixed it? Is it a non-problem after all in practice?

> Lastly, there are a lot of parsing ambiguities for many  
> Microformats. One area which is especially fraught is that of  
> scoping. The editors of many current draft Microformats[1] would  
> like to allow page authors to embed licensing data - e.g. to say  
> that a particular recipe for a pie is licensed under a Creative  
> Commons licence. However, it has been noted that the current  
> rel=license Microformat can not be re-used within these drafts,  
> because virtually all existing rel=license implementations will just  
> assume that the license applies to the whole page rather than just  
> part of it. RDFa has strong and unambiguous rules for scoping - a  
> license, for example, could apply to a section of the page, or one  
> particular image.

Is the problem in the case of recipes that the provider of the page  
navigation around the recipe is unwilling to license the navigation  
bits under the same license as the content proper?

In the case of images, why should a program inferring something about  
licensing trust assertions made in a different HTTP resource (possibly  
even from a different Origin)?

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/



attached mail follows:



On 2/1/09 10:38, Henri Sivonen wrote:
> More to the point, Microformats not only require per-format processing
> but the processing required for each Microformat isn't specified at all.
> That's bad.

Some do have processing specified (at least to some degree):

http://microformats.org/wiki/hcard-parsing

For the rest, this seems like something fixable, so I'm not sure how 
this is more to the point?

 > That is, have
> there been attempts of defining unified parsing while retaining the feel
> of Microformats without relying on the namespace mapping context from
> the layer below?

I suppose -

* http://microformats.org/wiki/design-patterns (reusable microformat 
components)

* http://microformats.org/wiki/parsing-brainstorming (attempt to 
actually specify precise parsing rules for all microformats)

* 
http://microformats.org/discuss/mail/microformats-discuss/2008-August/012435.html 
(proposal for specifying generic mapping of microformats to RDF - I 
think there's been more detailed work by various parties in this regard, 
but I'm not sure where best to link to)

- are approaching this problem from three different angles.

> Why hasn't the community fixed it?

I think the microformats community moves slowly, for better or worse, 
even when it agrees that there's a problem to solve. For example, 
progress on the problems with the abbr-design-pattern has been 
snail-like while losing the community an important user (the BBC), 
although admittedly the problems are basically intractable in HTML4/XHTML1.

I'm not sure how far the community as a whole does or doesn't view the 
lack of unified parsing as one of its bigger problems; I'm no spokesman 
though.

> Is it a non-problem after all in practice?

It's an additional barrier to creating and using (especially new) 
microformats or other extractable patterns.

The microformats community isn't there to support the creation of new 
extractable patterns outside the microformats community, which is where 
an iguana database pattern would likely need to be.

It could of course be the RDFa curie is worse than the disease.

An advantage of RDFa that is not related to curies and for which the 
three approaches towards unified extraction mentioned above are not a 
substitute is that RDFa provides a generic way to include hidden 
machine-friendly equivalents to human-readable information in the form 
of the (not especially well-named) "content" attribute.

http://www.w3.org/TR/rdfa-syntax/#rdfa-attributes

In general, this is something microformats rightly try to avoid:

http://microformats.org/wiki/principles

But sometimes it's unavoidable:

http://microformats.org/wiki/machine-data

http://microformats.org/wiki/value-excerption-pattern-issues

I do not believe that HTML5 as currently specified would remove the need 
to employ similar hacks as are mentioned on those pages, although it 
will remove the need in many cases (e.g. for datetimes within a given 
range), which is an improvement.

> Is the problem in the case of recipes that the provider of the page
> navigation around the recipe is unwilling to license the navigation bits
> under the same license as the content proper?

I thought Toby's example was that each recipe on the page needed a 
different licence, rather than a distinction between the main content 
area and the navigation.

> In the case of images, why should a program inferring something about
> licensing trust assertions made in a different HTTP resource (possibly
> even from a different Origin)?

Why should it trust assertions made in the same resource?

For example, presumably you could download an image, change its 
licencing metadata, and host it at your own Origin? Admittedly, that's a 
little more work than just hotlinking.

--
Benjamin Hawkes-Lewis



attached mail follows:



On Jan 2, 2009, at 14:01, Benjamin Hawkes-Lewis wrote:

> On 2/1/09 10:38, Henri Sivonen wrote:
>> More to the point, Microformats not only require per-format  
>> processing
>> but the processing required for each Microformat isn't specified at  
>> all.
>> That's bad.
>
> Some do have processing specified (at least to some degree):
>
> http://microformats.org/wiki/hcard-parsing

That's still not a proper parsing spec. Do all microformat consumers  
with significant market share do it that way?

> For the rest, this seems like something fixable, so I'm not sure how  
> this is more to the point?

HTML parsing is fixable, too, but actually fixing it is something that  
didn't happen until the fixing effort was taken to the spec level.

> * http://microformats.org/wiki/parsing-brainstorming (attempt to  
> actually specify precise parsing rules for all microformats)

This one I hadn't seen before. It's clearly a step into a more spec- 
like direction.

> It could of course be the RDFa curie is worse than the disease.

I suspect that is the case.

>> Is the problem in the case of recipes that the provider of the page
>> navigation around the recipe is unwilling to license the navigation  
>> bits
>> under the same license as the content proper?
>
> I thought Toby's example was that each recipe on the page needed a  
> different licence, rather than a distinction between the main  
> content area and the navigation.

Oh. That can be solved by giving each recipe its own URI & HTML page  
and scraping those pages instead of summary pages that might contain  
multiple recipes.

>> In the case of images, why should a program inferring something about
>> licensing trust assertions made in a different HTTP resource  
>> (possibly
>> even from a different Origin)?
>
> Why should it trust assertions made in the same resource?
>
> For example, presumably you could download an image, change its  
> licencing metadata, and host it at your own Origin? Admittedly,  
> that's a little more work than just hotlinking.

Good point. That's a problem if you are examining a previously unknown  
and untrusted site that might have all its content copied from  
somewhere else. Trusting the origin of the data for it licensing does  
help, though, if you are browsing a site you believe to be reputable  
and clueful and want to automate the license discovery part only.

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/



attached mail follows:



On Mon, 05 Jan 2009 01:21:33 +1100, Henri Sivonen <hsivonen@iki.fi> wrote:
> On Jan 2, 2009, at 14:01, Benjamin Hawkes-Lewis wrote:
>> On 2/1/09 10:38, Henri Sivonen wrote:

>>> Is the problem in the case of recipes that the provider of the page
>>> navigation around the recipe is unwilling to license the navigation  
>>> bits under the same license as the content proper?
>>
>> I thought Toby's example was that each recipe on the page needed a  
>> different licence, rather than a distinction between the main content  
>> area and the navigation.
>
> Oh. That can be solved by giving each recipe its own URI & HTML page and  
> scraping those pages instead of summary pages that might contain  
> multiple recipes.

Sure. In which case the problem becomes "doing mashups where data needs to  
have different metadata associated is impossible", so the requirement is  
"enable mashups to carry different metadata about bits of the content that  
are from different sources.

A use case for this:

There are mapping organisations and data producers and people who take  
photos, and each may place different policies. Being able to keep that  
policy information helps people with further mashups avoiding violating a  
policy.

For example, if GreatMaps.com has a public domain policy on their maps,  
CoolFotos.org has a policy that you can use data other than images for  
non-commercial purposes, and Johan Ichikawa has a photo there of my  
brother's café, which he has licensed as "must pay money", then it would  
be reasonable for me to copy the map and put it in a brochure for the  
café, but not to copy the data and photo from CoolFotos. On the other  
hand, if I am producing a non-commercial guide to cafés in Melbourne, I  
can add the map and the location of the cafe photo, but not the photo  
itself.

Another use case:
My wife wants to publish her papers online. She includes an abstract of  
each one in a page, but because they are under different copyright rules,  
she needs to clarify what the rules are. A harvester such as the Open  
Access project can actually collect and index some of them with no  
problem, but may not be allowed to index others. Meanwhile, a human finds  
it more useful to see the abstracts on a page than have to guess from a  
bunch of titles whether to look at each abstract.

cheers

Chaals

-- 
Charles McCathieNevile  Opera Software, Standards Group
     je parle français -- hablo español -- jeg lærer norsk
http://my.opera.com/chaals       Try Opera: http://www.opera.com

attached mail follows:



Charles McCathieNevile ha scritto:
> On Mon, 05 Jan 2009 01:21:33 +1100, Henri Sivonen <hsivonen@iki.fi> 
> wrote:
>> On Jan 2, 2009, at 14:01, Benjamin Hawkes-Lewis wrote:
>>> On 2/1/09 10:38, Henri Sivonen wrote:
>
>>>> Is the problem in the case of recipes that the provider of the page
>>>> navigation around the recipe is unwilling to license the navigation 
>>>> bits under the same license as the content proper?
>>>
>>> I thought Toby's example was that each recipe on the page needed a 
>>> different licence, rather than a distinction between the main 
>>> content area and the navigation.
>>
>> Oh. That can be solved by giving each recipe its own URI & HTML page 
>> and scraping those pages instead of summary pages that might contain 
>> multiple recipes.
>
> Sure. In which case the problem becomes "doing mashups where data 
> needs to have different metadata associated is impossible", so the 
> requirement is "enable mashups to carry different metadata about bits 
> of the content that are from different sources.
>
> A use case for this:
>
> There are mapping organisations and data producers and people who take 
> photos, and each may place different policies. Being able to keep that 
> policy information helps people with further mashups avoiding 
> violating a policy.
>
> For example, if GreatMaps.com has a public domain policy on their 
> maps, CoolFotos.org has a policy that you can use data other than 
> images for non-commercial purposes, and Johan Ichikawa has a photo 
> there of my brother's café, which he has licensed as "must pay money", 
> then it would be reasonable for me to copy the map and put it in a 
> brochure for the café, but not to copy the data and photo from 
> CoolFotos. On the other hand, if I am producing a non-commercial guide 
> to cafés in Melbourne, I can add the map and the location of the cafe 
> photo, but not the photo itself.
>

It seems a scenario where a human should carefully evaluate each licence 
and perhaps put a careful and human readable prose into the mashed-up 
page, or a link to such a prose. Metadata may or may not be accurate 
(e.g. may be misplaced and not contain the whole license, or refer to a 
wrong kind of license, different from the one stated in the prose), but 
the whole prose (and perhaps only that) is legally binding for sure (I'm 
not aware of any international law recognizing metadata and/or 
machine-processable/machine-friendly extracted content as a valid legal 
agreement/notice - in your example, Johan Ichikawa might put the "must 
pay money" license in a span containing a metadata reference to a 
creative commons license, but only the "must pay money" license is 
surely valid as a legal notice, as far as I can tell).


> Another use case:
> My wife wants to publish her papers online. She includes an abstract 
> of each one in a page, but because they are under different copyright 
> rules, she needs to clarify what the rules are. A harvester such as 
> the Open Access project can actually collect and index some of them 
> with no problem, but may not be allowed to index others. Meanwhile, a 
> human finds it more useful to see the abstracts on a page than have to 
> guess from a bunch of titles whether to look at each abstract.
>
>

I'm not strongly for one solution or the other in this case (an actual 
choice may depend on several considerations, such as harvesters 
reputation, or the need to use metadata anyway for private purposes), 
but this case might be addressed by embedding each abstract in an 
iframe, so that human users would get all of them in a single page, 
while a harvester would need to navigate each page to index/copy it, and 
a proper metadata might be put into each page, or each page might have a 
different rule to restrict access (e.g. through a robot file, or the 
Access-Control semantics, or any kind of white- or black- lists), 
specially to prevent a malicious harvester (that is one deliberately 
ignoring metadata and licenses) from accessing certain contents.

WBR, Alex

 
 
 --
 Caselle da 1GB, trasmetti allegati fino a 3GB e in piu' IMAP, POP3 e SMTP autenticato? GRATIS solo con Email.it http://www.email.it/f
 
 Sponsor:
 Email.it offre alle aziende il servizio di Email Marketing con pacchetti di invio a 10.000 utenti a soli 250 Euro
 Clicca qui: http://adv.email.it/cgi-bin/foclick.cgi?mid=8352&d=9-1

attached mail follows:



Calogero Alex Baldacchino wrote:

> My concern is: is RDFa really suitable for everyone and for Web
> automation? My own answer, at first glance, is no. That's because  
> RDF(a)
> can perhaps address nicely very niche needs, where determining how  
> much
> data can be trusted is not a problem, but in general misuses AND
> deliberate abuses may harm automation heavily

If your agent isn't going to trust the data gleaned from RDFa, then  
why should it trust the data gleaned from the web page's natural  
language? If the page has been authored by a reprobate that cannot be  
trusted to put honest and correct data in a few RDFa attributes, why  
should we trust their prose text?

An oft-quoted answer is that the prose text is "visible" whereas the  
RDFa is somehow "invisible". Apart from the fact that UIs which make  
use of data pulled in from RDFa will make this data visible, there is  
also the fact that RDFa, unlike an external RDF/XML file, or some  
metadata embedded in a <script> block, makes use of as much visible  
data as possible: visible links, visible text, etc.

	<p>My name is <span property="foaf:name"
	  about="#me">Toby Inkster</span>.</p>

If you can't trust someone to correctly mark up what their name is,  
then why trust them to mark up what deserves <em>phasis? Why believe  
the <address> they provide? What if the instance they marked up with  
<dfn> is not really the defining one? What if a <var> is really a  
constant?

-- 
Toby A Inkster
<mailto:mail@tobyinkster.co.uk>
<http://tobyinkster.co.uk>



attached mail follows:



Toby A Inkster ha scritto:
> Calogero Alex Baldacchino wrote:
>
>> My concern is: is RDFa really suitable for everyone and for Web
>> automation? My own answer, at first glance, is no. That's because RDF(a)
>> can perhaps address nicely very niche needs, where determining how much
>> data can be trusted is not a problem, but in general misuses AND
>> deliberate abuses may harm automation heavily
>
> If your agent isn't going to trust the data gleaned from RDFa, then 
> why should it trust the data gleaned from the web page's natural 
> language? If the page has been authored by a reprobate that cannot be 
> trusted to put honest and correct data in a few RDFa attributes, why 
> should we trust their prose text?
>

If you sell computers but your site talks about cars I'll never buy a 
notebook from you; thus you're not cheating me, but yourself and 
damaging your business. But if you believe cars are searched more often 
than computers (just an example), one may use false metadata to cheat 
any UAs relying on metadata instead of prose, and take me on a store 
selling computers instead of cars.

Reliability of metadata (with respect to the described data) is an issue 
separated from reliability of content: it's not up to any UA to 
understand AND filter content basing on the author being trusted to be 
saing the truth (such would be a form of censorship), but if I ask the 
UA to bring me a page talking about horses, I don't want it to bring me 
a page talking about v.i.a.g.r.a. (that's spam), thus it is up to any UA 
relying on metadata to understand AND filter them basing on their 
reliability.

> An oft-quoted answer is that the prose text is "visible" whereas the 
> RDFa is somehow "invisible". Apart from the fact that UIs which make 
> use of data pulled in from RDFa will make this data visible, there is 
> also the fact that RDFa, unlike an external RDF/XML file, or some 
> metadata embedded in a <script> block, makes use of as much visible 
> data as possible: visible links, visible text, etc.
>
>     <p>My name is <span property="foaf:name"
>       about="#me">Toby Inkster</span>.</p>
>
> If you can't trust someone to correctly mark up what their name is, 
> then why trust them to mark up what deserves <em>phasis? Why believe 
> the <address> they provide? What if the instance they marked up with 
> <dfn> is not really the defining one? What if a <var> is really a 
> constant?
>

I don't really need a proper markup to understand a name is a name, a 
variable is a variable, a definition is a definition, and so on; you can 
use plain text and I'll understand your content the same way. If one 
makes a mistake when combining a <dfn> with an anchor, the result may be 
a broken link, perhaps making me look for a better site. If one's 
misusing <var> or <em>, the worst possible consequence is a bad 
presentation, and a bad presentation can be an attempt to cheat a UA (as 
when people puts a lot of keywords in a page and style them with the 
same color as the background to cheat search engines), but such is only 
if it is a deliberate choice, not a misuse (and I'm concerning mainly on 
abuses) -- anyway, it is easier to cheat a UA by the mean of false 
metadata than cheating a human person by the mean of wrong markup.

If some markup is like,

<p>We sell <a href="www.cheatingcarseller.com" property="foaf:name" 
content="Toby Inkster">cars</a></p>

in any advertisement, I'll notice it's about cars and I'll choice 
whether to follow it or not, basing on my interest at the moment, but if 
I query "Toby Inkster" to a semantic UA blindly relying on metadata, I 
might get a page of a cars webstore instead of your homepage (for instance).

Furthermore, I started my replies from a Charles McCathieNevile's mail, 
explicitly talking about trusted data and (mainly) small use cases, not 
a wide-scale web automation. If there's no agreement about what kind of 
needs are best addressed by RDFa, maybe I have to agree with people 
saying that technology must grow and become more mature (or, at least, 
better understood) before it is merged into HTML5 specification (and 
2023 is far enough to accomplish such a goal :-) ). And I re-throw my 
suggestion to map RDFa attribute to data-rdfa-* attributes and build 
RDFa processor plugins for most common browsers, to test HTML5 and RDFa 
convergence in a wider scale before having browser natively supporting 
RDFa in HTML5 documents (for the purpose of a test - but not only - I 
don't think "data-rdfa-property" vs "rdfa:property" vs "property" would 
be much of a problem).

I'm not saying RDFa is a bad thing, or it is useless, I just don't think 
any kind of markup can fit perfectly the semantic of "random" content 
for the purposes of a "global", wide-scale and automatic classification 
of content.

Best regards,
Alex
 
 
 --
 Caselle da 1GB, trasmetti allegati fino a 3GB e in piu' IMAP, POP3 e SMTP autenticato? GRATIS solo con Email.it http://www.email.it/f
 
 Sponsor:
 Incrementa la visibilita' della tua azienda con l'invio di newsletter e campagne email marketing.
* Con investimento di soli 250 Euro puoi incrementare la tua visibilita'
 Clicca qui: http://adv.email.it/cgi-bin/foclick.cgi?mid=8350&d=4-1

attached mail follows:



Dan Brickley wrote:

> While I'm unsure about the "commercial relationship" clause quite
> capturing what's needed, the basic idea seems sound. Is there any
> provision (or plans) for applying this notion to entire blocks of
> markup, rather than just to simple hyperlinks? This would be rather
> useful for distinguishing embedded metadata that comes from the page
> author from that included from blog comments or similar.

While that might be useful for natural language processing, for RDFa  
it is actually completely unneeded. The syntax of RDFa allows for  
blocks of markup to be made "invisible" by making an ancestor node  
into an XMLLiteral.

For example, a comment might be marked up as:

<section typeof="atom:Entry" xmlns:foaf="http://xmlns.com/foaf/0.1/"
xmlns:atom="http://bblfish.net/work/atom-owl/2006-06-06/#">
   <address rel="atom:author">
     On <time property="atom:published" content="2009-01-10"
     >10 Jan 2009</time>,
     <a property="foaf:name" rel="foaf:page"
     href="http://joe.example.com">Joe Bloggs</a> wrote:
   </address>
   <div rel="atom:content">
     <blockquote property="atom:xhtml">
       <!-- The comment goes here. -->
     </blockquote>
   </div>
</section>

The RDFa processing instructions say that as the blockquote doesn't  
have an explicit datatype set, it is to be treated entirely as a  
string literal (if it doesn't have any child elements) or an XML  
literal (if it does), and that parsers must not look inside it for  
triples. Thus spammers can't use the comment form for stuffing  
triples into the page.

It should be noted in this case that RDFa also allows natural  
language parsers to be made more useful. By looking at the RDFa which  
marks up the author's name and website, they may be able to determine  
that the comment has been written by someone other than the page's  
main author, and thus not afford it the same level of trust granted  
to the rest of the page. So the natural language processing can  
benefit from RDFa.

-- 
Toby A Inkster
<mailto:mail@tobyinkster.co.uk>
<http://tobyinkster.co.uk>


attached mail follows:



Toby A Inkster ha scritto:
>
> It should be noted in this case that RDFa also allows natural language 
> parsers to be made more useful. By looking at the RDFa which marks up 
> the author's name and website, they may be able to determine that the 
> comment has been written by someone other than the page's main author, 
> and thus not afford it the same level of trust granted to the rest of 
> the page. So the natural language processing can benefit from RDFa.
>

That's true only if one can assume metadata are trustful, but they are 
only if they can be under a strict control, that is on a small-scale 
application. On a wider scale, one needs to make the opposite 
assumption, because it would or might be more common to find fake 
metadata with "honest" content (the prose of an advertisement does not 
lie, but related metadata can tell it's a different think to cheat a 
metadata-based UA), either because a site author can be a party to the 
spammer, or because authors can mess up metadata (yeah, they can mess up 
html code too, but that's either not a problem, because a UA can present 
the content as well, or it is but it might damage the author more than 
it may harm the user). If metadata are created/used for external 
consumption, they can be just ignored by authors, who instead may just 
copy&paste code or reuse templates in different contexts, without being 
able to set proper metadata for the new content. Thus UAs can't rely on 
metadata /in general/, while they might /in some/, small-scale scenarios.

WBR, Alex
 
 
 --
 Caselle da 1GB, trasmetti allegati fino a 3GB e in piu' IMAP, POP3 e SMTP autenticato? GRATIS solo con Email.it http://www.email.it/f
 
 Sponsor:
 Con Danone Activia, puoi vincere cellulari Nokia e Macbook Air. Scopri come
 Clicca qui: http://adv.email.it/cgi-bin/foclick.cgi?mid=8547&d=10-1

attached mail follows:





Hi Steven,

(cc www-archive, libby)

Re the alumni/people page scenario, I asked on the whatwg list about 
whether html5 is attempting any particular mechanism for saying which 
bits of a page are 'comments' or untrusted. But it seems from Toby's 
reply that RDFa is quite handy here.

I've been thinking about how one might use the hypertext path from 
http://www.w3.org/ to /People and ..etc/Alumni to indicate that they 
have the same creator/publisher.

1st idea - use a custom relation like 'alumniPage'
2nd idea - generalise that - 'staffInfoPage', 'aboutOrg page'
3rd idea - generalise further - use RDF to state that those pages have a 
dc:creator / foaf:maker which is the organization W3C
4th idea - use POWDER to claim that all pages matching some URI prefix 
have these properties

I think 4. is probably the way to go, but haven't dug into current state 
of POWDER. The others would cause needless proliferation of properties 
and clutter each hyperlink with additional link-typing annotations.

This would allow some Org (companies, nonprofits, whatever) to say in 
RDF on their homepage "all HTML pages whose URI matches 
http://eg.example.com/aboutus/*html" are pages whose foaf:maker is the 
organization whose homepage is http://eg.example.com/ and whose name is 
"E.G. Org.".

The point of this being that we need a way of picking out those pages 
(and pieces of pages) whose provenance/source is the main publisher, 
versus other things on the site (or in the page) that might be user 
supplied. On w3.org, the msgid: proxy that includes all of lists.w3.org 
into www.w3.org is a good use case; but also various W3C-linked people, 
WG/IG members etc., have write access to bits of the site.

In parallel to this I'm still exploring the xmldsig route. Here is a 
test (linked by wot:assurance from foaf.rdf) signing of my foaf file:
http://danbri.org/foaf.rdf.sigdata ... although done with a random 
generated key that I didn't write the java code to manage properly.

Use case for that is: how do we know whether to believe the foaf:tipjar 
property claim in http://danbri.org/foaf.rdf and buy danbri a book?

Hope this makes some sense! So I think next step is to check out POWDER. 
http://www.w3.org/TR/2008/WD-powder-primer-20081114/

I think they're using GRDDL due to the need to include quoted fragments 
of full RDF within each site 'label', something that's ugly to do in 
pure RDF (we tried in the earlier WCL design)...

cheers,

Dan


-------- Original Message --------
Subject: Re: [whatwg] Trying to work out the problems solved by RDFa
Date: Sat, 10 Jan 2009 13:51:26 +0000
From: Toby A Inkster <mail@tobyinkster.co.uk>
To: whatwg@lists.whatwg.org

Dan Brickley wrote:

> While I'm unsure about the "commercial relationship" clause quite
> capturing what's needed, the basic idea seems sound. Is there any
> provision (or plans) for applying this notion to entire blocks of
> markup, rather than just to simple hyperlinks? This would be rather
> useful for distinguishing embedded metadata that comes from the page
> author from that included from blog comments or similar.

While that might be useful for natural language processing, for RDFa
it is actually completely unneeded. The syntax of RDFa allows for
blocks of markup to be made "invisible" by making an ancestor node
into an XMLLiteral.

For example, a comment might be marked up as:

<section typeof="atom:Entry" xmlns:foaf="http://xmlns.com/foaf/0.1/"
xmlns:atom="http://bblfish.net/work/atom-owl/2006-06-06/#">
   <address rel="atom:author">
     On <time property="atom:published" content="2009-01-10"
     >10 Jan 2009</time>,
     <a property="foaf:name" rel="foaf:page"
     href="http://joe.example.com">Joe Bloggs</a> wrote:
   </address>
   <div rel="atom:content">
     <blockquote property="atom:xhtml">
       <!-- The comment goes here. -->
     </blockquote>
   </div>
</section>

The RDFa processing instructions say that as the blockquote doesn't
have an explicit datatype set, it is to be treated entirely as a
string literal (if it doesn't have any child elements) or an XML
literal (if it does), and that parsers must not look inside it for
triples. Thus spammers can't use the comment form for stuffing
triples into the page.

It should be noted in this case that RDFa also allows natural
language parsers to be made more useful. By looking at the RDFa which
marks up the author's name and website, they may be able to determine
that the comment has been written by someone other than the page's
main author, and thus not afford it the same level of trust granted
to the rest of the page. So the natural language processing can
benefit from RDFa.

-- 
Toby A Inkster
<mailto:mail@tobyinkster.co.uk>
<http://tobyinkster.co.uk>




attached mail follows:



Henri Sivonen wrote:

> eRDF is very different in not relying on attributes whose qname
> contains the substring "xmlns".


eRDF is very different in that it is incredibly annoying to use in  
real world scenarios (i.e. not hypothetical "Hello World" examples).

Calogero Alex Baldacchino wrote:

> I guess closing a language to every kind of "back-door changes" may be
> in contrast with the principle of paving a cawpath. I also guess that,
> if microformats experience (or the "realworld semantics" they claim to
> be based on) had suggested the need to add a new element/attribute to
> the language, a new element/attribute would have been added.

But Microformats experience *does* suggest that new attributes are  
needed for semantics. Look at the debate around accessibility within  
Microformats which has been going on for ages. Because of the  
Microformats process of working *within* existing HTML standards it  
has not been solved, and I can't see a solution reaching consensus in  
the foreseeable future. HTML5's <time> goes part of the way to  
solving this, but it doesn't address the whole problem like RDFa's  
"content" attribute does.

Another reason the Microformat experience suggests new attributes are  
needed for semantics is the overloading of an attribute (class)  
previously mainly used for private convention so that it is now used  
for public consumption. Yes, in real life, there are pages that use  
class="vcard" for things other than encoding hCard. (They mostly use  
it for linking to VCF files.) Incredibly, I've even come across pages  
that use class="vcard" for non-hCard uses, *and* hCard - yes, on the  
same page! As the Microformat/POSHformat space becomes more crowded,  
accidental collisions in class names become ever more likely.

The Microformats community hasn't added any new attributes for  
Microformats, because that was one of the guiding principles when the  
community was established: however, that does not mean it hasn't  
shown that new attributes are needed for encoding rich semantics in  
HTML. On the contrary, I think it's proved that they are.

-- 
Toby A Inkster
<mailto:mail@tobyinkster.co.uk>
<http://tobyinkster.co.uk>


attached mail follows:



On 2009-01-12 23:15, Toby A Inkster wrote:
> Henri Sivonen wrote:
>
>> eRDF is very different in not relying on attributes whose qname
>> contains the substring "xmlns".
>
>
> eRDF is very different in that it is incredibly annoying to use in real
> world scenarios (i.e. not hypothetical "Hello World" examples).
>
> Calogero Alex Baldacchino wrote:
>
>> I guess closing a language to every kind of "back-door changes" may be
>> in contrast with the principle of paving a cawpath. I also guess that,
>> if microformats experience (or the "realworld semantics" they claim to
>> be based on) had suggested the need to add a new element/attribute to
>> the language, a new element/attribute would have been added.
>
> But Microformats experience *does* suggest that new attributes are
> needed for semantics. Look at the debate around accessibility within
> Microformats which has been going on for ages. Because of the
> Microformats process of working *within* existing HTML standards it has
> not been solved, and I can't see a solution reaching consensus in the
> foreseeable future. HTML5's <time> goes part of the way to solving this,
> but it doesn't address the whole problem like RDFa's "content" attribute
> does.

Right, so some microformats brought to attention a need which HTML5 
could easily solve by adding <time>.  Why does this mean that RDFa 
should be added?

> Another reason the Microformat experience suggests new attributes are
> needed for semantics is the overloading of an attribute (class)
> previously mainly used for private convention so that it is now used for
> public consumption.

But HTML4 itself says that class can be used "for general purpose 
processing by user agents", so this seems to be a weird argument.  If we 
introduced RDFa and it got used, would you argue you need something more 
than RDFa, because it is being used for what it is specced for?

> Yes, in real life, there are pages that use
> class="vcard" for things other than encoding hCard. (They mostly use it
> for linking to VCF files.) Incredibly, I've even come across pages that
> use class="vcard" for non-hCard uses, *and* hCard - yes, on the same
> page! As the Microformat/POSHformat space becomes more crowded,
> accidental collisions in class names become ever more likely.

Right, but is it much of an issue?  If you have a hCard extractor, the 
user can see easily that it's not useful data.  And if doesn't follow 
any of the other rules for an hCard, then the UA can safely ignore it 
(e.g. it has no fields).  In practice, this kind of collision seems 
fairly non-problematic.

> The Microformats community hasn't added any new attributes for
> Microformats, because that was one of the guiding principles when the
> community was established: however, that does not mean it hasn't shown
> that new attributes are needed for encoding rich semantics in HTML. On
> the contrary, I think it's proved that they are.

Given that the only example of the microformats process needing an 
addition to the HTML language has been <time>, I'm not sure that's a 
conclusive proof.

Andi

attached mail follows:



Toby A Inkster ha scritto:
>
> Another reason the Microformat experience suggests new attributes are 
> needed for semantics is the overloading of an attribute (class) 
> previously mainly used for private convention so that it is now used 
> for public consumption.

Maybe this is true, but, personally, I prefere this approach to the 
addition of new features/attributes/elements to an official 
specification without a clear support requirement for UAs beside just 
parsing. A similar (if not stronger) argument may be raised against the 
reuse of the content attribute in the context of RDFa, which I think has 
caused a significant change with respect to its original semantics (now 
it should be shared by every element, originally it was a <meta> 
specific attribute; now it should be part of an RDF _triple_, in origin 
it was - and is still - part of a _pair_ when used in conjunction with 
the "name" attribute, and constitutes a pragma directive in conjunction 
with the "http-equiv" attribute, which is somehow closer to an XML 
processing instruction than to an RDF triple - the same applies to a 
<link> with rel="stylesheet", for instance).

> Yes, in real life, there are pages that use class="vcard" for things 
> other than encoding hCard. (They mostly use it for linking to VCF 
> files.) Incredibly, I've even come across pages that use class="vcard" 
> for non-hCard uses, *and* hCard - yes, on the same page! As the 
> Microformat/POSHformat space becomes more crowded, accidental 
> collisions in class names become ever more likely.
>

Indeed, that's a possible source of troubles. I think that's the same if 
people misused prefixes, e.g. if after merging some content from 
different documents they got a different namespace binded to a 
previously declared prefix in a scope where both namespaces are involved 
(in an xhtml document). Also, a custom script may distinguish between 
different uses of "vcard" by the mean of a further, private classname, 
or by enveloping elements in containers (divs) with proper ids, which 
may be a good solution in some cases, and not in other ones; a more 
generic parser, being specialized by design, has a chance to recognize a 
correct structure for a given format and to discard wrong informations, 
which may work fine in some cases, but not in others. As always, each 
choice has its own downsides, and what counts is the costs/benefits 
ratio; it seems that any solution not requiring to be supported has the 
lowest costs for UA implementors.

I do not doubt xml extensibility (which effectively is the base of 
curies) has its own benefits, it's flexible and suitable for a quick 
developement of custom solutions, but it's also got its own downsides, 
such as leading to a possible heavy fragmentation, being difficoult to 
understand and use for many people (who are usually fooled by the 
concept of namespaces) and thus potentially causing misuses and errors. 
It doesn't seems that xml extensibility brought more benefits than 
costs, and a proof can lay in the majority of the web not having 
followed the envisioned xml-alike evolution.

Anyway, I'm not strongly against RDFa in HTML, instead, I can be quite 
neutral (I'd live with it); I'm not convinced it is worth to add it to 
the spec at this stage and until it would be possible to establish what 
UAs must do with them beside parsing (and how to deal with namespaces 
while parsing). Also, I'm not fully convinced by the need to embed 
metadata in a page and keep them in sync with that page. For instance, 
it require that every page reporting the same informations must 
duplicate the same metadata structure, and this doesn't grant that those 
informations, in first place, are in sync with real world (some pages 
might be out-of-date, others might be up-to-date). Instead, a separate 
file containing metadata to be linked when appropriate might solve both 
the problems: it doesn't require duplicates and can have a somewhat 
versioning to keep trace of changes and to present updated 
machine-friendly information to help users visiting an outdated page 
(assuming users can trust those metadata). Of course, this solution has 
its own downsides too.

WBR, Alex

 
 
 --
 Caselle da 1GB, trasmetti allegati fino a 3GB e in piu' IMAP, POP3 e SMTP autenticato? GRATIS solo con Email.it http://www.email.it/f
 
 Sponsor:
 Blu American Express: gratuita a vita! 
 Clicca qui: http://adv.email.it/cgi-bin/foclick.cgi?mid=8615&d=4-2

attached mail follows:




1. THE WAR OF THE WORLDS

The Semantic Web is based on the concept of being able to express in a
standard format relations between different data and different entities
(including real-world entities) [1]. Today this is mainly based on RDF.

The traditional web focuses on web-pages ranging from interchange of
static documents to dynamic applications [2]. Today this is mainly
based on HTML.

It is clear that there is a missing link between this two worlds,
because the data are, most of the time, included in documents
in ways that are non-structured and non-standardized
or the data is managed by applications in a proprietary/closed manner.

The risk is that the two worlds (semantic web and traditional web)
instead of collaborate will be in competition.
Today we are witnessing a lot of talks (not only in this list)
where there are visionary-supporters of the "unspecified wonders"
of the "semantic-web-that-will-be" which are opposed by
pragmatic-supporters of Traditions that believes in
"unlimited-evolutionism" of full-text searches
and consider a "titanic&impossible" enterprise
the rdf-izing of the world :-).

But this conflict has any reason to exist?
We need to leave, on both sides,
all preconceived positions.

I believe the two worlds are being developed as separate reality
and this is a concrete problem that we have to resolve.

Today we have the opportunity to do so with HTML5.

2. LOWER THE BARRIER

It is clear that publishing simple web documents and applications
it is easier that structuring information in a semantic manner
but we must find ways to make this possible in a unified framework:

  documents + applications + semantics = HTML5

If we want the promises of semantic web to become a reality
we must lower the entry level for generic users.

HTML5 certainly must not solve problems that today we can't prefigure.
But there is clearly a problem that HTML5 faces today:
there is no widespread use of semantic tools
because the barrier to use them is too high for users.
This is the main reason behind the fact that semantic web
is being developed as a world in itself, mainly yet academic.

As well as the original HTML has enabled users to easily publish
hypertext documents, today HTML5 must allow users
to easily semantify their data, documents & applications.

At the moment, an user who wants to create/use
semantically structured informations finds browsers
that, natively, don't give him solutions to do that.

The user is forced to move in a "jungle" of tools
(without GUI or with poor usability), plugins and languages
that are not widespread standards.

Exactily the same situation faced by an user
who had tried to create hypertexts in 1990.

3. LINKS AND BEYOND

As well as the power of the traditional web is in "hypertextual-links"
among documents identified by URLs, the power of semantic web
is in "semantic-links" between documents/data/entities identified by URL/URI.

We must give users an easy way to create these semantic-links,
in a way that is as simple as creating classic hyperlinks.

Semantic-links could be collected by search engines (machines)
to enhance their functionalities, and could be used in other automatic
processing.
But, first of all, can represent a big value for the browser's user (human)
if we find in HTML5 a standard way to visualize/interact with these
semantic-links.

We could define a "semantic-link" as a connection to
"semantically structured informations" (embedded or in external resource),
that is presented to the user in a fashion similar
(but not the same) to classic hyperlinks. A semantic-link
could be considered as a sort of "semantic annotation"
enhancing the main content delivered to the user and
enabling him further interactions with "linked data".

We absolutely need for this a "common minimum standard"
although nothing will prevents to continue developing
additional or alternative ways of visualization/interaction
(via plugins, proprietary implementations in browsers,
new languages versions).

4. OVERVIEW OF SCENARIO'S USE CASES

With respect to use cases, are certainly to be considered
all the use cases developed by RDFa [3] but also
those developed by the "Semantic Web Activity" [4],
and other could be derived for each one of microformats [5]
or in the scenarios described by Adrian Holovaty in the article
"A fundamental way newspaper sites need to change" [6].

For example, would be interesting to have a standard for
a) structuring b) normally visualize in the page (via CSS)
b) have the possibility to interact/manipulate via the browser,
the data present in "Wikipedia's Infobox" [7].
Another example could be a standard for the visualization
of "access doors" to semantically structured informations "hidden" in the pages
and the "possible user's actions" (see "IE8 Activities" [8])?

Other interesting issues, in terms of user interface,
are raised by Alex Faaborg in the article
"User interface of microformat detection" [9],
and from the fact that we need something more user-friendly
and standardized of "bookmarklets" [10], from
the fact that structured information can improve
features in scenarios raised by projects like Ubiquity [11],
and, last but not least, some evaluation recently
exposed by Ian Hickson in WHATWG [12].

5. TWO REAL PROBLEMS

I think it's good, first of all, to abstract from single use cases
depicted above and find a solution to two fundamental problems
that lie at the root of the use cases, two problems that, today,
have no solution in the current version of HTML:

I) User agents must allow users to see that there are "semantic-links"
(connections to semantically structured informations)
in a HTML document/application. Consequently
user agents must allow users to "follow" the semantic-link,
(access/interact with the linked data, embedded or external)
and this involves primarily the ability to:
a) view the informations
b) select the informations
c) copy the informations in the clipboard
d) drag and drop the informations
e) send that informations
to another web application
(or to OS applications)
selected by the user.

II) User agents must allow users to "semantically annotate"
an existing HTML document (insert a semantic link and linked data)
and this involves primarily the ability to:
a) editing the document to insert semantically structured informations
(starting from the existing text or from information
already structured in the edited portion of the page)
b) send the result of the editing
to another web application
(or to OS applications)
selected by the user.

Solving the first problem we will spread to *all* users
the possibility to access the semantic web in normal browser
(target impossible to achieve simply through
microformats & plugins and without an effective
standard incorporation in HTML).

Solving the second problem we will spread to *all* users
(to all interested users) the possibility
to access the semantic potential at personal level
(for examle build an archive of personal semantic annotation)
and at social level (for example contribute to collective effort to
"semantify" originally unstructured web resources).

6. SEARCHING POSSIBLE SOLUTIONS

The first solution that we can think of
is a new attribute @semantic
(don't focus on his name) used like this

   <A href =".." semantic =".." class =".."
   <DIV semantic =".." class =".."

in @semantic we can have:

a) URL of a resource that semantically describes
the content (in RDF, RDFa, JSON, CSV) like this

   semantic="http://www.foo.com/desc.rdf"

b) direct semantically structured information, in @style manner,
probably something like this (thinking at RDFa)

   semantic="property: ..; about: ..;"

Furthermore, in the hypothesis of some sort of
"Cascading Semantics" (see for example cRDF [13])
we can also think  to create a new element SEMANTIC
like this

   <SEMANTIC Type=".."> ...</ SEMANTIC>

to embed semantically structured information
along the way in a CSS manner in several format.

Naturally we need further investigation on *all points*.

But, probably, we need some new properties/elements
because not all the exposed problems are simply solvable
through a generic extension mechanism [14]
that makes possible to insert RDFa in HTML.

A generic extension mechanism remains desirable
for other reasons (MathML, SVG, etc.), but we need
also a very different thing, set in the heart of HTML,
that makes it possible to bridge the gap between the two worlds
of semantic Web and traditional web...
to make them become one.

[1] http://www.w3.org/2001/sw/
[2] http://dev.w3.org/html5/spec/Overview.html#scope
[3] http://www.w3.org/TR/xhtml-rdfa-scenarios/
[4] http://www.w3.org/2001/sw/sweo/public/UseCases/
[5] http://microformats.org/wiki/Main_Page
[6] http://www.holovaty.com/blog/archive/2006/09/06/0307
[7] http://en.wikipedia.org/wiki/Help:Infobox
[8] http://blogs.msdn.com/ie/archive/2008/03/06/activities-and-webslices-in-internet-explorer-8.aspx
[9] http://blog.mozilla.com/faaborg/2007/02/04/microformats-part-4-the-user-interface-of-microformat-detection/
[10] http://en.wikipedia.org/wiki/Bookmarklet
[11] http://ubiquity.mozilla.com/
[12] http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2008-December/018023.html
[13] http://www.xanthir.com/rdfa-vs-crdf.php
[14] http://www.w3.org/html/wg/tracker/issues/41

-- 
Giovanni Gentili - giovanni.gentili@gmail.com


attached mail follows:




Tom Morris wrote:

> You can do that already with HTML 4 and XHTML 1.x using GRDDL. GRDDL
> no longer works in HTML 5 as the profile attribute has been removed.
> (We get some nonsense about GRDDL still working but just not
> 'requiring' profile. This is nonsense. It's a bit like saying that
> you've taken the wheels off the car but it still works because you can
> turn the engine on.)

I think much of this nonsense has arisen because GRDDL effectively  
uses the profile attribute for two purposes, and that has gotten  
people confused. By taking care of one of the purposes in HTML5, it  
has been assumed that GRDDL will thus work in HTML5; tick it off the  
list; done; taken care of.

GRDDL uses the profile attribute firstly as a flag which says "this  
page has some GRDDL transformations linked from it". HTML5 has said  
that all pages may have GRDDL transformations linked from them, thus  
this flag is not needed in HTML5. Fair enough, you can say "this  
works in HTML5 without requiring a profile" and that will work. It  
introduces incompatibilities between how GRDDL is processed in HTML5  
and how it is processed in earlier versions of (X)HTML, which is  
annoying (particularly as XHTML does not require a DOCTYPE, so there  
is no easy way of differentiating between XHTML5 and earlier versions  
of XHTML) but still doable.

But GRDDL uses the profile attribute in another manner: a GRDDL agent  
is supposed to loop through all the page's profiles, perform an HTTP  
request for each one, and use the data it finds in them. If you say  
"this works in HTML5 without requiring a profile" then that is  
clearly nonsense. To loop through the profiles, there has to be some  
profiles! rel="profile" does work as a substitute here, but again it  
introduces inconsistencies between HTML5 and previous versions of (X) 
HTML.

This syntax change for linking to profiles seems to be an entirely  
gratuitous one: if profiles are going to be allowed, then why change  
the syntax for them? As an analogy, there are many very good  
arguments both ways for dropping or retaining the <b> and <i>  
elements, but to suggest renaming them instead to <bold> and <italic>  
would be silly - even if the unabbreviated names would be clearer,  
the headaches caused by the syntax change would be massive.  
Similarly, rel="profile" does seem like a slightly nicer syntax than  
the profile attribute, but that small advantage comes at a cost of  
breaking compatibility with existing tools that use profiles.

-- 
Toby A Inkster
<mailto:mail@tobyinkster.co.uk>
<http://tobyinkster.co.uk>




attached mail follows:



Calogero Alex Baldacchino wrote:

> The concern is about every kind of metadata with respect to their
> possible uses; but, while it's been stated that Microforamts (for
> instance) don't require any purticular support by UAs (thus they're
> backward compatible), RDFa would be a completely new feature, thus  
> html5
> specification should say what UAs are espected to do with such new
> attributes.

RDFa doesn't require any special support beyond the special support  
that is required for Microformats. i.e. nothing. User agents are free  
to ignore the RDFa attributes. In that sense, RDFa already "works" in  
pretty much every existing browser, even going back to dinosaurs like  
Mosaic and UdiWWW.

Agents are of course free to offer more than that. Look at what they  
do with Microformats: Firefox for instance offers an API to handle  
Microformats embedded on a page; Internet Explorer offers its "Web  
Slices" feature.

> For what concerns html serialization, in particular, I'd consider  
> some code like [...] which is rendered properly


Is it though? Try adding the following CSS:

	span[property="cal:summary"] { font-weight: bold; }

And you'll see that CSS doesn't cope with a missing ending tag in  
that situation either.

If you miss out a non-optional end tag, then funny things will happen  
- RDFa isn't immune to that problem, but neither is the DOM model,  
CSS, microformats, or anything else that relies on knowing where  
elements end. A better comparison would be a missing </p> tag, which  
is actually allowed in HTML, and HTML-aware RDFa processors can  
generally handle just fine.

> considering RDFa relies on namespaces (thus,
> adding RDFa attributes to HTML5 spec would require some features from
> xml extensibility to be added to html serialization).


RDFa *does not* rely on XML namespaces. RDFa relies on eight  
attributes: about, rel, rev, property, datatype, content, resource  
and typeof. It also relies on a CURIE prefix binding mechanism. In  
XHTML and SVG, RDFa happens to use XML namespaces as this mechanism,  
because they already existed and they were convenient. In non-XML  
markup languages, the route to define CURIE prefixes is still to be  
decided, though discussions tend to be leaning towards something like:

<html prefix="dc=http://purl.org/dc/terms/ foaf=http://xmlns.com/foaf/ 
0.1/">
<address rel="foaf:maker" rev="foaf:made">This document was made by  
<a href="http://joe.example.com" typeof="foaf:Person"  
rel="foaf:homepage" property="foaf:name">Joe Bloggs</a>.</address>
</html>

This discussion seems to be about "should/can RDFa work in HTML5?"  
when in fact, RDFa already can and does work in HTML5 - there are  
approaching a dozen interoperable implementations of RDFa, the  
majority of which seem to handle non-XHTML HTML. Assuming that people  
see value in RDFa, and assuming that the same people see value in  
using HTML5, then these people will use RDFa in HTML5. The question  
we should be discussing is not "should it work?" (because it already  
does), but rather, "should it validate?"

-- 
Toby A Inkster
<mailto:mail@tobyinkster.co.uk>
<http://tobyinkster.co.uk>




attached mail follows:



Toby A Inkster wrote:
> Calogero Alex Baldacchino wrote:
> 
>> The concern is about every kind of metadata with respect to their
>> possible uses; but, while it's been stated that Microforamts (for
>> instance) don't require any purticular support by UAs (thus they're
>> backward compatible), RDFa would be a completely new feature, thus html5
>> specification should say what UAs are espected to do with such new
>> attributes.
> 
> RDFa doesn't require any special support beyond the special support that 
> is required for Microformats. i.e. nothing. User agents are free to 
> ignore the RDFa attributes. In that sense, RDFa already "works" in 
> pretty much every existing browser, even going back to dinosaurs like 
> Mosaic and UdiWWW.
> 
> Agents are of course free to offer more than that. Look at what they do 
> with Microformats: Firefox for instance offers an API to handle 
> Microformats embedded on a page; Internet Explorer offers its "Web 
> Slices" feature.
> 

If it is true that RDFa can work today with no ill-effect in downlevel 
user-agents, what's currently blocking its implementation? Concern for 
validation?

It seems to me that many HTML extensions are implemented first and 
specified later[1], so perhaps it would be in the interests of RDFa 
proponents to get some implementations out there and get RDFa adopted, 
at which point it will hopefully seem a much more useful proposition for 
inclusion in HTML5.

In the short term the RDFa community can presumably provide a 
specialized "HTML5 + RDFa" validator for adopters to use until RDFa is 
incorporated into the core spec and tools.

It would seem that it's much easier to get into the spec when your 
feature is proven to be useful by real-world adoption.



[1] canvas, keygen, frames and script are examples of this phenomenon.


attached mail follows:



Martin Atkins wrote:
> ...
> If it is true that RDFa can work today with no ill-effect in downlevel 
> user-agents, what's currently blocking its implementation? Concern for 
> validation?
> 
> It seems to me that many HTML extensions are implemented first and 
> specified later[1], so perhaps it would be in the interests of RDFa 
> proponents to get some implementations out there and get RDFa adopted, 
> at which point it will hopefully seem a much more useful proposition for 
> inclusion in HTML5.
> 
> In the short term the RDFa community can presumably provide a 
> specialized "HTML5 + RDFa" validator for adopters to use until RDFa is 
> incorporated into the core spec and tools.
> 
> It would seem that it's much easier to get into the spec when your 
> feature is proven to be useful by real-world adoption.
> ...

What he said.

Although I *do* believe that in the end we'll want RDFa-in-HTML5, what's 
really important right now is *not* RDFa-in-HTML5 but RDFa-in-HTML4. 
Define that, make it a success, and the rest will be simple.

Best regards, Julian

attached mail follows:



Toby A Inkster ha scritto:
> Calogero Alex Baldacchino wrote:
>
>> The concern is about every kind of metadata with respect to their
>> possible uses; but, while it's been stated that Microforamts (for
>> instance) don't require any purticular support by UAs (thus they're
>> backward compatible), RDFa would be a completely new feature, thus html5
>> specification should say what UAs are espected to do with such new
>> attributes.
>
> RDFa doesn't require any special support beyond the special support 
> that is required for Microformats. i.e. nothing. User agents are free 
> to ignore the RDFa attributes. In that sense, RDFa already "works" in 
> pretty much every existing browser, even going back to dinosaurs like 
> Mosaic and UdiWWW.
>
> Agents are of course free to offer more than that. Look at what they 
> do with Microformats: Firefox for instance offers an API to handle 
> Microformats embedded on a page; Internet Explorer offers its "Web 
> Slices" feature.
>

Well, at the beginning of this thread the possible need to interchange 
RDF metadata and merge triples from different vocabularies was suggested 
as a use case for RDFa serialization of RDF, and this would hint a 
requirement for supporting an  RDFa processor in every conforming UA. 
This also opens a question about what else might be needed beside 
collecting triples (is an API to build custom query applications enough, 
or should some query feature be provided by browsers? are there possible 
problems involved (like possible spam through fake metadata in cached 
ads)? possible solutions to prevent or moderate it?).

If, otherwise, nothing special must be done by browsers with RDFa 
attributes, and instead their main use is for script or plugin or 
server-side computations, or for "free" support by UA, these ones would 
be no way different from any other kind of custom attributes (thus 
should a validation requirement be let's accept every attribute?), 
herein included data-*, but for the /intended use/, which may make the 
difference but is something only a human can understand, and no 
validator can check (from this point of view, validating RDFa 
attributes, whatever else attribute, or just html5 attributes and custom 
data-* ones would be the same, as validating would not be a concern as 
it isn't for proprietary CSS extensions).

>> For what concerns html serialization, in particular, I'd consider 
>> some code like [...] which is rendered properly
>
>
> Is it though? Try adding the following CSS:
>
>     span[property="cal:summary"] { font-weight: bold; }
>
> And you'll see that CSS doesn't cope with a missing ending tag in that 
> situation either.
>
> If you miss out a non-optional end tag, then funny things will happen 
> - RDFa isn't immune to that problem, but neither is the DOM model, 
> CSS, microformats, or anything else that relies on knowing where 
> elements end. A better comparison would be a missing </p> tag, which 
> is actually allowed in HTML, and HTML-aware RDFa processors can 
> generally handle just fine.

That's definetely *not* the same issue. As I've replied in a previous 
mail, people *do not* need proper styling to understend prose, they just 
need to understand the prose language, then their /brains/ will cope 
with the rest, thus the above example results in some acceptable 
graceful degradation (it may or may not be the wanted presentation, 
depending on where the closing </span> was to be positioned (it wouldn't 
be the right presentation in this case), but it is not too harmful 
anyway). Bots based on metadata, instead *do need* reliable metadata to 
work properly, unless they're made smart enough to debug the code 
they're fed (should Artificial Intelligence be a requirement? - no 
sarcasm here).

If broken/wrong presentation caused by a missing end tag had ever been 
an issue, html-serialization would have been deprecated in favour of 
xml-one (if something really "problematic" happened, authors would 
notice it on their very first test by opening a page in a browser, 
whereas an extensive and complete debug for triples might be an odd 
problem in a large document). In contrast with that, any break in 
metadata semantics caused by html-serialization can only be a sever 
issue for a metadata-based bot (because it needs accurate metadata, 
while a non-very-accurate presentation is not a great concern for human 
beings in most cases, and if no particular presentation is attached to 
those spans, but instead they're used just to add semantics through 
metadata, as it happens to embedd RDF through RDFa attributes, a 
side-effect may arise), thus html-serialization may be more prone to 
side-effects than xml-serialization (which stops on validation errors, 
being in turn a possible cause for side-effects with metadata), from 
this point of view -- that is, since RDFa semantics is more reliable in 
a more well-formed document, xml-serialization might help to debug some 
errors, while it is not a strict requirement for content presentation, 
and instead finding more or less emboldened words is better for users 
than finding a page which is not rendered at all, thus the differences 
between xhtml and html.

But if it's or will be agreed that inaccurate metadata are reliable, or 
that uncertain reliablility is not an issue for wide-scale semantic web 
applications, well, I really don't know what to say apart than I just 
have a different opinion.

However, that was just the first example I was able to produce just to 
give an idea; better examples can surely be thought out. What if, for 
instance, foster parenting or adoption agency caused metadata to be put 
far from (part of) their correspondent data? Style is inherited, but a 
wrong triple is a wrong triple (from this perspective, a parse error 
/might/ highlight some misplaced metadata more quickly than a raw debug 
of triples).

My point is that html-serialization is enough robust with respect to 
presentational issues, in most cases (it's the same for non-screen 
media), but it might not be the same for RDFa modelled metadata, which 
require a greater "well-formedness" than content presentation to be 
enough reliable, since RDFa is conceived with the purpose to allow RDF 
serialization into xml documents in first place, without the possible 
validation problems arising by direct use of xml-serialized RDF, and as 
an alternative to RELAX NG (since strict xml parsers, as for xhtml, are 
more diffused) -- it's in the first chapter of RDFa specification: 
"1.Motivation".

That is, RDFa is born as an xml-related feature in primis, thus I think 
that concerning whether it can work as well in another kind of document 
(not if it may work, but if it may work as well in different documents 
or if it can work better in some than in other ones) is legitimate -- of 
course the same concern may apply to eRDF as well as to other kinds of 
metadata.

>
>> considering RDFa relies on namespaces (thus,
>> adding RDFa attributes to HTML5 spec would require some features from
>> xml extensibility to be added to html serialization).
>
>
> RDFa *does not* rely on XML namespaces. RDFa relies on eight 
> attributes: about, rel, rev, property, datatype, content, resource and 
> typeof. It also relies on a CURIE prefix binding mechanism. In XHTML 
> and SVG, RDFa happens to use XML namespaces as this mechanism, because 
> they already existed and they were convenient. In non-XML markup 
> languages, the route to define CURIE prefixes is still to be decided, 
> though discussions tend to be leaning towards something like:
>
> <html prefix="dc=http://purl.org/dc/terms/ 
> foaf=http://xmlns.com/foaf/0.1/">
> <address rel="foaf:maker" rev="foaf:made">This document was made by <a 
> href="http://joe.example.com" typeof="foaf:Person" rel="foaf:homepage" 
> property="foaf:name">Joe Bloggs</a>.</address>
> </html>
>

Well, yes, that's a possible solution to be considered. Anyway, that 
would require (at least) another new attribute to be specc'ed out, with 
possible new concerns. For instance, a missing space between prefix/URI 
pairs might compromise its good parsing (while space separated curies, 
for instance, being shorter than absolute URIs, can focus a major 
attention on typing errors in hand-written code, but this is a 
subtlety), thus a separate attribute for each URI might be more robust 
(for instance something like xmlns-* or just ns-* in the <html> tag, 
similar to xmlns:* but not clashing with xml namespace mechanism, on the 
same line as data-* but with a different "scope"). Also, something like 
the eRDF use of <link> elements to declare namespaces (or mappings from 
prefixes to curies, to be more consistent with RDFa conventions) inside 
the head element might work, because an html document is likely to 
present such declarations once at the beginning. However, each solution 
would have its own "pros" and "cons", wile xml namespaces perfectly fit 
the purpose, even because (one of) their main use is to represent 
prefixed attributes or elements names taken from an RDF vocabulary which 
is in turn an XML 'format' and to embed them in another kind of 
document, that is to represent something coming from a different namespace.

> This discussion seems to be about "should/can RDFa work in HTML5?" 
> when in fact, RDFa already can and does work in HTML5 - there are 
> approaching a dozen interoperable implementations of RDFa, the 
> majority of which seem to handle non-XHTML HTML. Assuming that people 
> see value in RDFa, and assuming that the same people see value in 
> using HTML5, then these people will use RDFa in HTML5. The question we 
> should be discussing is not "should it work?" (because it already 
> does), but rather, "should it validate?"
>

There should be also people seeing value in eRDF, at least enough people 
for eRDF being supported by SearchMonkey. It is sure that these people 
see value in using eRDF within html documents, since eRDF is conceived 
to work with HTML "natively", that is without any need to change HTML 
(by introducing new attributes or using unrecognized ones); 
nevertheless, eRDF can't be valid HTML5 because of the "profile" 
attribute, which has been dropped. Should eRDF validate instead? Should 
we prefere eRDF to RDFa or viceversa? Should we treat them the very same 
way? Or should we just wait and see which one works better for people, 
to avoid an early specification of something later possibly 
demonstrating to be less useful than originally thought, for instance 
because most people decided to use something else?

WBR, Alex

 
 
 --
 Caselle da 1GB, trasmetti allegati fino a 3GB e in piu' IMAP, POP3 e SMTP autenticato? GRATIS solo con Email.it http://www.email.it/f
 
 Sponsor:
 Con Danone Activia, puoi vincere cellulari Nokia e Macbook Air. Scopri come
 Clicca qui: http://adv.email.it/cgi-bin/foclick.cgi?mid=8551&d=12-1

attached mail follows:



Ian Hickson wrote:
> 
>> The question we should be discussing is not "should it work?" (because 
>> it already does), but rather, "should it validate?"
> 
> No, the question is "what problem are we solving?". Talking about RDFa, 
> RDF, eRDF, Microformats, and so forth doesn't answer this question.
> 
> The question "should it validate" is the question "do we want to solve the 
> problem and is this the right solution", which is a question we can't 
> answer without actually knowing what the problem is.
> 
> So far, all I really know is that the problem is apparently obvious.
> 

My understanding of the use-case, based on discussions so far, is:

- Allow authors to embed annotations in HTML documents such that RDF 
triples can be unambiguously extracted from human-readable data without 
duplicating the data, and thus ensuring that the machine-readable data 
and the human-readable data remain in sync.

The disconnect you're facing is that the proposers of RDFa consider the 
ability to encode RDF triples to be a goal, while you consider RDF 
triples to be a solution to a (as-yet-undetermined) higher-level 
problem. They take RDF as a given, while you do not. They have already 
solved some problems with RDF and wish only to adapt this generalized 
solution to work in HTML, while you wish to re-solve all of these 
problems from the ground up.

Would you agree with this analysis?

If this is accurate, then it's difficult to see how continued discussion 
on this topic can be productive.


attached mail follows:



Ian Hickson wrote:
> 
>> They have already solved some problems with RDF and wish only to adapt 
>> this generalized solution to work in HTML, while you wish to re-solve 
>> all of these problems from the ground up.
> 
> I don't necessarily wish to resolve the problems -- if they have existing 
> good solutions, I'm all in favour of reusing them. I just want to know 
> what those problems are that we're solving, so that we can make sure that 
> the solutions we're adopting are in fact solving the problems we want to 
> solve. It would be irresponsible to add features without knowing why.
> 

I would assume that our resident proponents are already satisfied that 
their higher-level problem have been solved, and this is why they're 
frustrated that you won't just let them map their existing solutions 
into HTML all in one fell swoop.

I'm not sure I'd put myself into the "RDF proponent" bucket, but I do 
know one use-case of RDF that I've encountered frequently so I'll post 
it as a starting point.

The FOAF schema for RDF[0] addresses the problem of making personal 
profile data machine-readable along with some of the relationships 
between people. From the outside looking in, it seems that the goal they 
set themselves was to make machine-readable the sort of information you 
find on a social networking site.

One problem this can solve is that an agent can, given a URL that 
represents a person, extract some basic profile information such as the 
person's name along with references to other people that person knows. 
This can further be applied to allow a user who provides his own URL 
(for example, by signing in via OpenID) to bootstrap his account from 
existing published data rather than having to re-enter it.

Google Social Graph API[1] apparently makes use of FOAF (when serialized 
as XML) as one of the sources of data so that given a URL that 
represents a person it can return a list of URLs that represent friends 
of that person.

The Google Profiles application[2] makes use of the output of the Social 
Graph API to suggest URLs that a user might want to list on his profile 
page, so the user only needs to fill in a couple of URLs by hand.

So, to distill that into a list of requirements:

- Allow software agents to extract profile information for a person as 
often exposed on social networking sites from a page that "represents" 
that person.

   There is a number of existing solutions for this:
     * FOAF in RDF serialized as XML, Turtle, RDFa, eRDF, etc
     * The vCard format
     * The hCard microformat
     * The PortableContacts protocol[3]
     * Natural Language Processing of HTML documents

- Allow software agents to determine who a person lists as their friends 
given a page that "represents" that person.

   Again, there are competing solutions:
     * FOAF in RDF serialized as XML, Turtle, RDFa, eRDF, etc
     * The XFN microformat[4]
     * The PortableContacts protocol[3]
     * Natural Language Processing of HTML documents

-----------------------------------------------

Assuming that the above is a convincing problem domain, now let's add in 
the following requirement:

- Allow the above to be encoded without duplicating the data in both 
machine-readable and human-readable forms.

Now our solution list is reduced to (assuming we consider both 
requirements together):
     * FOAF in RDF serialized as RDFa or eRDF
     * The hCard microformat + the XFN microformat
     * Natural Language Processing of HTML documents

All three of the above options address the use-cases as I stated them -- 
the Social Graph API apparently uses all three if you're willing to 
consider a MySpace-specific "screen-scraper" as Natural Language 
Processing -- so what would be the advantages of the first solution?

  * Existing RDF-based systems can use an off-the-shelf RDFa or eRDF 
parser and get the same data model (RDF triples of FOAF predicates) that 
they were already getting from the XML and Turtle RDF serializations, 
reducing the amount of additional work that must be done to consume this 
format.

  * FOAF has an extensive vocabulary that's based on fields that have 
been observed on social networking sites, while hCard is built on vCard 
which has a more constrained scope intended for the sort of entries 
you'd expect to find in an "address book".

  * FOAF has been adopted -- usually in the RDF-XML serialization -- by 
some number of social networking sites (e.g. LiveJournal) so they are 
presumably already somewhat familiar with the FOAF vocabulary and may 
therefore be able to adopt it more easily in the RDFa or eRDF 
serializations.

Though there are of course also some disadvantages:

  * Some sites are already publishing XFN and/or hCard so consuming 
software would need to continue to support these in addition to 
FOAF-in-HTML-somehow, which is more work than supporting only XFN and 
hCard. (In other words, "XFN/hCard already work today")

  * RDFa requires extensions to the HTML language, while XFN, hCard and 
NLP do not.

  * Many existing FOAF parsers are not actually RDF parsers but are 
rather using stock XML parsers and assuming a particular tree layout, so 
they would not be able to reuse any code in processing triples from RDFa 
or eRDF.

-------------------------------------

Is this the sort of thing you're looking for, Ian?

Much of the above section could be applied to any other RDF vocabulary 
with a bit of search and replace, but I'll leave that to others since 
FOAF is the only RDF vocabulary with which I have any experience.

(and if I've misrepresented any of the facts about FOAF or RDF I'm happy 
to be corrected. I'm writing this only in an attempt to move the 
discussion forward; I'm currently neutral on whether RDFa should be 
adopted into HTML5.)

[0]http://www.foaf-project.org/
[1]http://code.google.com/apis/socialgraph/
[2]http://www.google.com/support/accounts/bin/answer.py?answer=97703&hl=en
[3]http://portablecontacts.net/
[4]http://www.gmpg.org/xfn/


attached mail follows:



On Sun, 11 Jan 2009, Martin Atkins wrote:
> 
> One problem this can solve is that an agent can, given a URL that 
> represents a person, extract some basic profile information such as the 
> person's name along with references to other people that person knows. 
> This can further be applied to allow a user who provides his own URL 
> (for example, by signing in via OpenID) to bootstrap his account from 
> existing published data rather than having to re-enter it.
> 
> So, to distill that into a list of requirements:
> 
> - Allow software agents to extract profile information for a person as often
> exposed on social networking sites from a page that "represents" that person.
> 
> - Allow software agents to determine who a person lists as their friends 
> given a page that "represents" that person.
>
> - Allow the above to be encoded without duplicating the data in both 
> machine-readable and human-readable forms.
> 
> Is this the sort of thing you're looking for, Ian?

Yes, the above is perfect. (I cut out the bits that weren't really "the 
problem" from the quote above -- the above is what I'm looking for.)

The most critical part is "allow a user who provides his own URL to 
bootstrap his account from existing published data rather than having to 
re-enter it". The one thing I would add would be a scenario that one would 
like to be able to play out, so that we can see if our solution would 
enable that scenario.

For example:

   "I have an account on social networking site A. I go to a new social 
   networking site B. I want to be able to automatically add all my 
   friends from site A to site B."

There are presumably other requirements, e.g. "site B must not ask the 
user for the user's credentials for site A" (since that would train people 
to be susceptible to phishing attacks). Also, "site A must not publish the 
data in a manner that allows unrelated users to obtain privacy-sensitive 
data about the user", for example we don't want to let other users 
determine relationships that the user has intentionally kept secret [1].

It's important that we have these scenarios so that we can check if the 
solutions we consider are actually able to solve these problems, these 
scenarios, within the constraints and requirements we have.

[1] http://w2spconf.com/2008/papers/s3p2.pdf

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

attached mail follows:



On Jan 11, 2009, at 14:01, Toby A Inkster wrote:

> RDFa *does not* rely on XML namespaces. RDFa relies on eight  
> attributes: about, rel, rev, property, datatype, content, resource  
> and typeof. It also relies on a CURIE prefix binding mechanism. In  
> XHTML and SVG, RDFa happens to use XML namespaces as this mechanism,  
> because they already existed and they were convenient.

Convenience is debatable. In any case, it is rather disingenuous to  
say that RDFa doesn't rely on XML Namespaces when all that has been  
defined so far relies of attributes whose qname contains the substring  
"xmlns".

> In non-XML markup languages, the route to define CURIE prefixes is  
> still to be decided, though discussions tend to be leaning towards  
> something like:
>
> <html prefix="dc=http://purl.org/dc/terms/ foaf=http://xmlns.com/foaf/0.1/ 
> ">
> <address rel="foaf:maker" rev="foaf:made">This document was made by  
> <a href="http://joe.example.com" typeof="foaf:Person"  
> rel="foaf:homepage" property="foaf:name">Joe Bloggs</a>.</address>
> </html>

Unless this syntax were also used for XHTML, the above would be in  
violation of the DOM Consistency Design Principle of the W3C HTML WG.

> This discussion seems to be about "should/can RDFa work in HTML5?"  
> when in fact, RDFa already can and does work in HTML5 - there are  
> approaching a dozen interoperable implementations of RDFa, the  
> majority of which seem to handle non-XHTML HTML.

Those implementations violate the software implementation reuse  
principle that motivates the DOM Consistency Design Principle. (The  
software reuse principle being that the same code path be used for  
both HTML and XHTML on layers higher than the parser.)

The prefix mapping mechanism of CURIEs has been designed with  
disregard towards this software reuse principle (in use in Gecko,  
WebKit and, I gather, Presto) that should have been known to anyone  
working on Web-related specs far before "DOM Consistency" was written  
into the Design Principles of the HTML WG.

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/



attached mail follows:



Martin Atkins wrote:

>   * Some sites are already publishing XFN and/or hCard so consuming
> software would need to continue to support these in addition to
> FOAF-in-HTML-somehow, which is more work than supporting only XFN and
> hCard.

Mitigating this though is GRDDL which allows the hCard+XFN to be  
parsed using a subset of FOAF (e.g. http://weborganics.co.uk/hFoaF/)  
and thus merged with FOAF available as RDF/XML, RDFa, etc.

-- 
Toby A Inkster
<mailto:mail@tobyinkster.co.uk>
<http://tobyinkster.co.uk>




attached mail follows:



Martin Atkins wrote:
> One problem this can solve is that an agent can, given a URL that
> represents a person, extract some basic profile information such as the
> person's name along with references to other people that person knows.
> This can further be applied to allow a user who provides his own URL
> (for example, by signing in via OpenID) to bootstrap his account from
> existing published data rather than having to re-enter it.
>
> So, to distill that into a list of requirements:
>
> - Allow software agents to extract profile information for a person as often
> exposed on social networking sites from a page that "represents" that person.
>
> - Allow software agents to determine who a person lists as their friends
> given a page that "represents" that person.
>
> - Allow the above to be encoded without duplicating the data in both
> machine-readable and human-readable forms.
>
> Is this the sort of thing you're looking for, Ian?
>
>Much of the above section could be applied to any other RDF vocabulary
>with a bit of search and replace, but I'll leave that to others since
>FOAF is the only RDF vocabulary with which I have any experience.

Why we must restrict the use case to a single vocabulary
or analyze all the possibile vocabularies?

I think it's be better to "generalize" the problem
and find a unique solution for human/machine.

I tried to expose this here...

http://lists.w3.org/Archives/Public/public-html/2009Jan/0082.html

...where the fundamental problem is described in this way:

- User agents must allow users to see that there are "semantic-links"
(connections to semantically structured informations)
in a HTML document/application. Consequently
user agents must allow users to "follow" the semantic-link,
(access/interact with the linked data, embedded or external)
and this involves primarily the ability to:
a) view the informations
b) select the informations
c) copy the informations in the clipboard
d) drag and drop the informations
e) send that informations
to another web application
(or to OS applications)
selected by the user.

-- 
Giovanni Gentili

attached mail follows:



Giovanni Gentili wrote:

> Why we must restrict the use case to a single vocabulary
> or analyze all the possibile vocabularies?
> 
> I think it's be better to "generalize" the problem
> and find a unique solution for human/machine.

The issue when trying to abstract problems is that you can end up doing 
"architecture astronautics"; you concentrate on making generic ways to 
build solutions to weakly constrained problems without any attention to 
the details of those problems that make them unique. The solutions that 
are so produced often have the theoretical capacity to solve broad 
classes of problem, but are often found to be poor at solving any 
specific individual problem.

By looking at actual use cases we can hope to retain enough detail in 
the requirements that we satisfy at least some use cases well, rather 
than wasting out time building huge follies that serve no practical 
purpose to anyone.

attached mail follows:



James Graham:
> The issue when trying to abstract problems is that you can end up doing
> "architecture astronautics"; you concentrate on making generic ways to build
> solutions to weakly constrained problems without any attention to the
> details of those problems that make them unique.

I think the right level, like in my proposal,
is greatly under "astronautics"
but no so low as "single vocabularies".

-- 
Giovanni Gentili

attached mail follows:



Giovanni Gentili wrote:
> James Graham:
>> The issue when trying to abstract problems is that you can end up doing
>> "architecture astronautics"; you concentrate on making generic ways to build
>> solutions to weakly constrained problems without any attention to the
>> details of those problems that make them unique.
> 
> I think the right level, like in my proposal,
> is greatly under "astronautics"
> but no so low as "single vocabularies".
> 

I rather disagree. How we interact with information depends 
fundamentally on the type of information. If the information is a set of 
geographical coordinates, for example, the set of useful interactions 
are rather different to those for a bibliographic entry. Trying to 
pretend that the two problems are just interchangeable instances of the 
same "semantically structured information" problem is likely to hide the 
important distinctions between the two problem domains.

attached mail follows:



Per discussion with Ian, I am posting a link to my take on the RDFa 
discussion to this list.

http://realtech.burningbird.net/semantic-web/semantic-markup/stop-justifying-rdfa

Thank you

Shelley Powers

attached mail follows:



The debate about RDFa highlights a disconnect in the decision making 
related to HTML5.

The purpose behind RDFa is to provide a way to embed complex information 
into a web document, in such a way that a machine can extract this 
information and combine it with other data extracted from other web 
pages. It is not a way to document private data, or data that is meant 
to be used by some JavaScript-based application. The sole purpose of the 
data is for external extraction and combination.

An earlier email between Martin Atkins and Ian Hickson had the following:

"On Sun, 11 Jan 2009, Martin Atkins wrote:
 >
 > One problem this can solve is that an agent can, given a URL that
 > represents a person, extract some basic profile information such as the
 > person's name along with references to other people that person knows.
 > This can further be applied to allow a user who provides his own URL
 > (for example, by signing in via OpenID) to bootstrap his account from
 > existing published data rather than having to re-enter it.
 >
 > So, to distill that into a list of requirements:
 >
 > - Allow software agents to extract profile information for a person 
as often
 > exposed on social networking sites from a page that "represents" that 
person.
 >
 > - Allow software agents to determine who a person lists as their friends
 > given a page that "represents" that person.
 >
 > - Allow the above to be encoded without duplicating the data in both
 > machine-readable and human-readable forms.
 >
 > Is this the sort of thing you're looking for, Ian?

Yes, the above is perfect. (I cut out the bits that weren't really "the
problem" from the quote above -- the above is what I'm looking for.)

The most critical part is "allow a user who provides his own URL to
bootstrap his account from existing published data rather than having to
re-enter it". The one thing I would add would be a scenario that one would
like to be able to play out, so that we can see if our solution would
enable that scenario.

For example:

   "I have an account on social networking site A. I go to a new social
   networking site B. I want to be able to automatically add all my
   friends from site A to site B."

There are presumably other requirements, e.g. "site B must not ask the
user for the user's credentials for site A" (since that would train people
to be susceptible to phishing attacks). Also, "site A must not publish the
data in a manner that allows unrelated users to obtain privacy-sensitive
data about the user", for example we don't want to let other users
determine relationships that the user has intentionally kept secret [1].

It's important that we have these scenarios so that we can check if the
solutions we consider are actually able to solve these problems, these
scenarios, within the constraints and requirements we have."


It would seem that Ian agrees with a need to both a) provide a way to 
document complex information in a consistent, machine readable form and 
that b) the purpose of this data is for external consumption, rather 
than internal use. Where the disconnect comes in is he believes that 
RDF, and the web page serialization technique, RDFa, are only one of a 
set of possible solutions.

Yet at the same time, he references how the MathML and SVG people 
provide sufficient use cases to justify the inclusion of both of these 
into HTML5. But what is MathML. What does it solve? A way to include 
mathematical formula into a document in a formatted manner. What is SVG? 
A way to embed vector graphics into a web page, in such a way that the 
individual elements described by the graphics can become part of the 
overall DOM.

So, why accept that we have to use MathML in order to solve the problems 
of formatting mathematical formula? Why not start from scratch, and 
devise a new approach?

So, why accept that we have to use SVG in order to solve the problems of 
vector graphics? Why not start from scratch, and devise a new approach?

Come to think of it, I think we should also question the use of the 
canvas element. After all, if the problem set is that we need the 
ability to animate graphics in a web page using a non-proprietary 
technology, then wouldn't something like SVG work for this purpose? 
Isn't the canvas element redundant? But then, perhaps we should start 
over from the beginning and just create a new graphics capability from 
scratch, and reject both canvas and SVG.

We don't reject MathML, though. Neither do we reject SVG or canvas. Or 
any other of a number of entities being included in HTML5, including 
SQL. Why? Because they have a history of use, extensive documentation as 
to purpose and behavior, and there are a considerable number of 
implementations that support the specifications. It doesn't make sense 
to start from scratch. It makes more sense to make use of what already 
works.

I have to ask, then: why do we isolate RDF, and RDFa for special 
handling? If we can accept that SQL is a natural database query 
mechanism, and SVG is a natural for vector graphics, and the canvas 
element is the proper choice for a script-enabled bitmaps, and 
MathML...well, you get the picture-if we can accept that these mature, 
well documented representatives of each of their genres as the de facto 
implementation, enough to incorporate each into HTML5, why then do we 
demand that RDF and its web page serialization technique, RDFa, must 
"prove" themselves, when we don't demand the same from other external 
objects and specifications?

To do so is not consistent. To continue to do so demonstrates that 
perhaps other issues are at play in regards to RDF/RDFa.

Martin provided a use case that Ian acknowledges is justified. Ipso 
facto, we do not need to continue providing use cases for this type of 
requirement. We have established that the requirement/need/desire to 
incorporate data into a web page that is consistently machine readable, 
which can be consistently extracted, and consistently combined with data 
from other documents using automated processes is a legitimate need. RDF 
was designed specifically for this purpose, is a mature specification, 
with extensive documentation, and one can find many different 
implementations of its use. The use of RDF for FOAF is just one of many 
uses, RSS 1.0 was another, and a version of RDF embedded within photos, 
CC licensing--these are all based on the same model.

In other words, if we accept that SVG is the de facto implementation of 
vector graphics (as compared to something such as, say, VML), and we 
accept the same for MathML, the canvas element, SQL, and so on, to not 
accept RDF as the de facto implementation for the purpose behind which 
it was designed, is to single out RDF/RDFa for "special handling" within 
the group. To demand more from it, then has been demanded from any other 
element included in HTML5.

In particular, as has been documented elsewhere, very little is needed 
to support RDFa within HTML5. The requirements are much less than those 
for the canvas element, SVG, MathML, and even SQL. So the task, itself, 
is not daunting. Not as daunting as, say, the alt attribute.

This then returns us to my earlier supposition: To not support RDF/RDFa 
as the de facto implementation of complex, structured data is not 
consistent. To continue to do so demonstrates that perhaps other issues 
are at play in regards to RDF/RDFa. Such inconsistencies are not in the 
best interest when developing a new specification meant for widespread 
use on the web. If, as I believe, the inconsistency reflects an 
underlying bias against the concept behind RDF, which is that true web 
semantics is based on structured data, not natural language processing, 
or not exclusively based on natural language processing, then I believe 
it's important to highlight such bias, and deal with it accordingly.

Shelley



attached mail follows:



On Sat, Jan 17, 2009 at 11:55 AM, Shelley Powers
<shelleyp@burningbird.net> wrote:
> The debate about RDFa highlights a disconnect in the decision making related
> to HTML5.

Perhaps.  Or perhaps not.  I am far from an apologist for Hixie, (nor
for that matter and I a strong advocate for RDF), but I offer the
following question and observation.

> The purpose behind RDFa is to provide a way to embed complex information
> into a web document, in such a way that a machine can extract this
> information and combine it with other data extracted from other web pages.
> It is not a way to document private data, or data that is meant to be used
> by some JavaScript-based application. The sole purpose of the data is for
> external extraction and combination.

So, I take it that it isn't essential that RDFa information be
included in the DOM?  This is not rhetorical: I honestly don't know
the answer to this question.

> So, why accept that we have to use MathML in order to solve the problems of
> formatting mathematical formula? Why not start from scratch, and devise a
> new approach?

Ian explored (and answered) that here:

http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2008-April/014372.html

Key to Ian's decision was the importance of DOM integration for this
vocabulary.  If DOM integration is essential for RDFa, then perhaps
the same principles apply.  If not, perhaps some other principles may
apply.

- Sam Ruby

attached mail follows:



On 17/1/09 19:27, Sam Ruby wrote:
> On Sat, Jan 17, 2009 at 11:55 AM, Shelley Powers
> <shelleyp@burningbird.net>  wrote:
>> The debate about RDFa highlights a disconnect in the decision making related
>> to HTML5.
>
> Perhaps.  Or perhaps not.  I am far from an apologist for Hixie, (nor
> for that matter and I a strong advocate for RDF), but I offer the
> following question and observation.
>
>> The purpose behind RDFa is to provide a way to embed complex information
>> into a web document, in such a way that a machine can extract this
>> information and combine it with other data extracted from other web pages.
>> It is not a way to document private data, or data that is meant to be used
>> by some JavaScript-based application. The sole purpose of the data is for
>> external extraction and combination.
>
> So, I take it that it isn't essential that RDFa information be
> included in the DOM?  This is not rhetorical: I honestly don't know
> the answer to this question.

Good question. I for one expect RDFa to be accessible to Javascript.

http://code.google.com/p/rdfquery/wiki/Introduction -> 
http://rdfquery.googlecode.com/svn/trunk/demos/markup/markup.html is a 
nice example of code that does something useful in this way.

cheers,

Dan

--
http://danbri.org/

attached mail follows:



Dan Brickley wrote:
> On 17/1/09 19:27, Sam Ruby wrote:
>> On Sat, Jan 17, 2009 at 11:55 AM, Shelley Powers
>> <shelleyp@burningbird.net>  wrote:
>>> The debate about RDFa highlights a disconnect in the decision making 
>>> related
>>> to HTML5.
>>
>> Perhaps.  Or perhaps not.  I am far from an apologist for Hixie, (nor
>> for that matter and I a strong advocate for RDF), but I offer the
>> following question and observation.
>>
>>> The purpose behind RDFa is to provide a way to embed complex 
>>> information
>>> into a web document, in such a way that a machine can extract this
>>> information and combine it with other data extracted from other web 
>>> pages.
>>> It is not a way to document private data, or data that is meant to 
>>> be used
>>> by some JavaScript-based application. The sole purpose of the data 
>>> is for
>>> external extraction and combination.
>>
>> So, I take it that it isn't essential that RDFa information be
>> included in the DOM?  This is not rhetorical: I honestly don't know
>> the answer to this question.
>
> Good question. I for one expect RDFa to be accessible to Javascript.
>
> http://code.google.com/p/rdfquery/wiki/Introduction -> 
> http://rdfquery.googlecode.com/svn/trunk/demos/markup/markup.html is a 
> nice example of code that does something useful in this way.
>
> cheers,
>
> Dan
>

I agree, and appreciate Dan for pointing out a specific instance of use.

Apologies for not making the assertion explicit.

Shelley
> -- 
> http://danbri.org/
>


attached mail follows:



On Sat, Jan 17, 2009 at 1:33 PM, Dan Brickley <danbri@danbri.org> wrote:
> On 17/1/09 19:27, Sam Ruby wrote:
>>
>> On Sat, Jan 17, 2009 at 11:55 AM, Shelley Powers
>> <shelleyp@burningbird.net>  wrote:
>>>
>>> The debate about RDFa highlights a disconnect in the decision making
>>> related
>>> to HTML5.
>>
>> Perhaps.  Or perhaps not.  I am far from an apologist for Hixie, (nor
>> for that matter and I a strong advocate for RDF), but I offer the
>> following question and observation.
>>
>>> The purpose behind RDFa is to provide a way to embed complex information
>>> into a web document, in such a way that a machine can extract this
>>> information and combine it with other data extracted from other web
>>> pages.
>>> It is not a way to document private data, or data that is meant to be
>>> used
>>> by some JavaScript-based application. The sole purpose of the data is for
>>> external extraction and combination.
>>
>> So, I take it that it isn't essential that RDFa information be
>> included in the DOM?  This is not rhetorical: I honestly don't know
>> the answer to this question.
>
> Good question. I for one expect RDFa to be accessible to Javascript.
>
> http://code.google.com/p/rdfquery/wiki/Introduction ->
> http://rdfquery.googlecode.com/svn/trunk/demos/markup/markup.html is a nice
> example of code that does something useful in this way.

The fact that this works anywhere at all today implies that little, if
any, changes to browsers is required in order to support this.  Is
that a fair statement?

I've not taken a look at the code, but have taken a quick glance at
the output using IE8.0.7000.0 beta, Safari 3.2.1/Windows, Chrome
1.0.154.43, Opera 9.63, and Firefox 3.0.5.

The page is different (as in less functional) under IE8 and Safari.
Is there something that they need to do which is not already covered
in the HTML5 specification in order to support this?

- Sam Ruby

attached mail follows:



Sam Ruby wrote:
> On Sat, Jan 17, 2009 at 1:33 PM, Dan Brickley <danbri@danbri.org> wrote:
>   
>> On 17/1/09 19:27, Sam Ruby wrote:
>>     
>>> On Sat, Jan 17, 2009 at 11:55 AM, Shelley Powers
>>> <shelleyp@burningbird.net>  wrote:
>>>       
>>>> The debate about RDFa highlights a disconnect in the decision making
>>>> related
>>>> to HTML5.
>>>>         
>>> Perhaps.  Or perhaps not.  I am far from an apologist for Hixie, (nor
>>> for that matter and I a strong advocate for RDF), but I offer the
>>> following question and observation.
>>>
>>>       
>>>> The purpose behind RDFa is to provide a way to embed complex information
>>>> into a web document, in such a way that a machine can extract this
>>>> information and combine it with other data extracted from other web
>>>> pages.
>>>> It is not a way to document private data, or data that is meant to be
>>>> used
>>>> by some JavaScript-based application. The sole purpose of the data is for
>>>> external extraction and combination.
>>>>         
>>> So, I take it that it isn't essential that RDFa information be
>>> included in the DOM?  This is not rhetorical: I honestly don't know
>>> the answer to this question.
>>>       
>> Good question. I for one expect RDFa to be accessible to Javascript.
>>
>> http://code.google.com/p/rdfquery/wiki/Introduction ->
>> http://rdfquery.googlecode.com/svn/trunk/demos/markup/markup.html is a nice
>> example of code that does something useful in this way.
>>     
>
> The fact that this works anywhere at all today implies that little, if
> any, changes to browsers is required in order to support this.  Is
> that a fair statement?
>
> I've not taken a look at the code, but have taken a quick glance at
> the output using IE8.0.7000.0 beta, Safari 3.2.1/Windows, Chrome
> 1.0.154.43, Opera 9.63, and Firefox 3.0.5.
>
> The page is different (as in less functional) under IE8 and Safari.
> Is there something that they need to do which is not already covered
> in the HTML5 specification in order to support this?
>   

I would think we would have to go through the code to see what this 
specific instance of client-side access of the RDFa isn't working. The 
debugger I'm using with IE8 shows the problem is occuring in the jQuery 
code, not necessarily anything specific to the RDFa plugin.

I know other JavaScript libraries that work with RDFa work, at least 
with Safari. For instance:

http://www.w3.org/2006/07/SWD/RDFa/impl/js/

Since this library was vetted for IE7, would assume it would work for 
IE8, too.

Of course, the RDFa attributes aren't incorporated into HTML5, which 
means their use would result in an invalid document. And of course, if 
they were incorporated, the issue of namespace for them would have to be 
addressed as namespaces were for MathML and SVG.

Shelley
> - Sam Ruby
>
>   


attached mail follows:



On Jan 17, 2009, at 20:33, Dan Brickley wrote:

> Good question. I for one expect RDFa to be accessible to Javascript.
>
> http://code.google.com/p/rdfquery/wiki/Introduction -> http://rdfquery.googlecode.com/svn/trunk/demos/markup/markup.html 
>  is a nice example of code that does something useful in this way.


Does this code run the same way on both DOMs parsed from text/html and  
application/xhtml+xml in existing browsers without at any point  
branching on a condition that is a DOM difference between text/html- 
originated and application/xhtml+xml-originated DOMs?

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/



attached mail follows:



Henri Sivonen wrote:
> On Jan 17, 2009, at 20:33, Dan Brickley wrote:
>
>> Good question. I for one expect RDFa to be accessible to Javascript.
>>
>> http://code.google.com/p/rdfquery/wiki/Introduction -> 
>> http://rdfquery.googlecode.com/svn/trunk/demos/markup/markup.html is 
>> a nice example of code that does something useful in this way.
>
>
> Does this code run the same way on both DOMs parsed from text/html and 
> application/xhtml+xml in existing browsers without at any point 
> branching on a condition that is a DOM difference between 
> text/html-originated and application/xhtml+xml-originated DOMs?
>
I don't want to specifically look at just the one case, since it is not 
working in Safari, and IE8 and is too complex to debug right at this moment.

Generally, though, RDFa is based on reusing a set of attributes already 
existing in HTML5, and adding a few more. I would assume no differences 
in the DOM based on XHTML or HTML. The one issue that would occur has to 
do with the values assigned, not the syntax.

I put together a very crude demonstration of JavaScript access of a 
specific RDFa attribute, about. It's temporary, but if you go to my main 
web page, http://realtech.burningbird.net, and look in the sidebar for 
the click me text, it will traverse each div element looking for an 
"about" attribute, and then pop up an alert with the value of the 
attribute. I would use console rather than alert, but I don't believe 
all browsers support console, yet.

Access the page using Firefox, which is served the page as XHTML. Access 
it as IE8, which gets the page as HTML. You can tell the difference 
between my graphics are based in inline SVG, and will only show if the 
page is served as XHTML.

So, yes, with my quick, crude demonstration, DOM access is the same in 
both environments.

Shelley





attached mail follows:



On Jan 17, 2009, at 22:35, Shelley Powers wrote:

> Generally, though, RDFa is based on reusing a set of attributes  
> already existing in HTML5, and adding a few more.

Also, RDFa uses CURIEs which in turn use the XML namespace mapping  
context.

> I would assume no differences in the DOM based on XHTML or HTML.

The assumption is incorrect.

Please compare
http://hsivonen.iki.fi/test/moz/xmlns-dom.html
and
http://hsivonen.iki.fi/test/moz/xmlns-dom.xhtml

Same bytes, different media type.

> I put together a very crude demonstration of JavaScript access of a  
> specific RDFa attribute, about. It's temporary, but if you go to my  
> main web page,http://realtech.burningbird.net, and look in the  
> sidebar for the click me text, it will traverse each div element  
> looking for an "about" attribute, and then pop up an alert with the  
> value of the attribute. I would use console rather than alert, but I  
> don't believe all browsers support console, yet.

This misses the point, because the inconsistency is with attributes  
named xmlns:foo.

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/



attached mail follows:




>
> The assumption is incorrect.
>
> Please compare
> http://hsivonen.iki.fi/test/moz/xmlns-dom.html
> and
> http://hsivonen.iki.fi/test/moz/xmlns-dom.xhtml
>
> Same bytes, different media type.
>
>> I put together a very crude demonstration of JavaScript access of a 
>> specific RDFa attribute, about. It's temporary, but if you go to my 
>> main web page,http://realtech.burningbird.net, and look in the 
>> sidebar for the click me text, it will traverse each div element 
>> looking for an "about" attribute, and then pop up an alert with the 
>> value of the attribute. I would use console rather than alert, but I 
>> don't believe all browsers support console, yet.
>
> This misses the point, because the inconsistency is with attributes 
> named xmlns:foo.
>
And I also said that we would have to address the issue of namespaces, 
which actually may require additional effort. I said that the addition 
of RDFa would mean the addition of some attributes, and we would have to 
deal with namespace issues. Just like the HTML5 working group is having 
to deal with namespaces with MathML and SVG. And probably the next dozen 
or so innovations that come along. That is the price for not having 
distributed extensibility.

One works the issues. I assume the same could be said of any many of the 
newer additions to HTML5. Are you then saying that this will be a 
showstopper, and there will never be either a workaround or compromise?

Shelley

attached mail follows:



On Jan 18, 2009, at 01:32, Shelley Powers wrote:

> Are you then saying that this will be a showstopper, and there will  
> never be either a workaround or compromise?


Are the RDFa TF open to compromises that involve changing the XHTML  
side of RDFa not to use attribute whose qualified name has a colon in  
them to achieve DOM Consistency by changing RDFa instead of changing  
parsing?

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/



attached mail follows:



On 18/1/09 19:34, Henri Sivonen wrote:
> On Jan 18, 2009, at 01:32, Shelley Powers wrote:
>
>> Are you then saying that this will be a showstopper, and there will
>> never be either a workaround or compromise?
>
>
> Are the RDFa TF open to compromises that involve changing the XHTML side
> of RDFa not to use attribute whose qualified name has a colon in them to
> achieve DOM Consistency by changing RDFa instead of changing parsing?

I don't believe the RDFa TF are in a position to singlehandedly rescind 
a W3C Recommendation, ie. 
http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014/.

What they presumably could do is propose new work items within W3C, 
which I'd guess would be more likely to be accepted if it had the active 
enthusiasm of the core HTML5 team. Am cc:'ing TimBL here who might have 
something more to add.

Do you have an alternative design in mind, for expressing the namespace 
mappings?

cheers,

Dan

--
http://danbri.org/

attached mail follows:



On Jan 18, 2009, at 20:48, Dan Brickley wrote:

> On 18/1/09 19:34, Henri Sivonen wrote:
>> On Jan 18, 2009, at 01:32, Shelley Powers wrote:
>>
>>> Are you then saying that this will be a showstopper, and there will
>>> never be either a workaround or compromise?
>>
>>
>> Are the RDFa TF open to compromises that involve changing the XHTML  
>> side
>> of RDFa not to use attribute whose qualified name has a colon in  
>> them to
>> achieve DOM Consistency by changing RDFa instead of changing parsing?
>
> I don't believe the RDFa TF are in a position to singlehandedly  
> rescind a W3C Recommendation, ie. http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014/ 
> .
>
> What they presumably could do is propose new work items within W3C,  
> which I'd guess would be more likely to be accepted if it had the  
> active enthusiasm of the core HTML5 team. Am cc:'ing TimBL here who  
> might have something more to add.
>
> Do you have an alternative design in mind, for expressing the  
> namespace mappings?


The simplest thing is not to have mappings but to put the  
corresponding absolute URI wherever RDFa uses a CURIE.

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/



attached mail follows:



On 18/1/09 20:07, Henri Sivonen wrote:
> On Jan 18, 2009, at 20:48, Dan Brickley wrote:
>
>> On 18/1/09 19:34, Henri Sivonen wrote:
>>> On Jan 18, 2009, at 01:32, Shelley Powers wrote:
>>>
>>>> Are you then saying that this will be a showstopper, and there will
>>>> never be either a workaround or compromise?
>>>
>>>
>>> Are the RDFa TF open to compromises that involve changing the XHTML side
>>> of RDFa not to use attribute whose qualified name has a colon in them to
>>> achieve DOM Consistency by changing RDFa instead of changing parsing?
>>
>> I don't believe the RDFa TF are in a position to singlehandedly
>> rescind a W3C Recommendation, ie.
>> http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014/.
>>
>> What they presumably could do is propose new work items within W3C,
>> which I'd guess would be more likely to be accepted if it had the
>> active enthusiasm of the core HTML5 team. Am cc:'ing TimBL here who
>> might have something more to add.
>>
>> Do you have an alternative design in mind, for expressing the
>> namespace mappings?
>
> The simplest thing is not to have mappings but to put the corresponding
> absolute URI wherever RDFa uses a CURIE.

So this would be a kind of "interoperability profile" of RDFa, where 
certain features approved of by REC-rdfa-syntax-20081014 wouldn't be 
used in some hypothetical HTML5 RDFa.

If people can control their urge to use namespace abbreviations, and 
stick to URIs directly, would this make your DOM-oriented concerns go away?

cheers,

Dan

--
http://danbri.org/

attached mail follows:



Dan Brickley wrote:
> On 18/1/09 20:07, Henri Sivonen wrote:
>> On Jan 18, 2009, at 20:48, Dan Brickley wrote:
>>
>>> On 18/1/09 19:34, Henri Sivonen wrote:
>>>> On Jan 18, 2009, at 01:32, Shelley Powers wrote:
>>>>
>>>>> Are you then saying that this will be a showstopper, and there will
>>>>> never be either a workaround or compromise?
>>>>
>>>>
>>>> Are the RDFa TF open to compromises that involve changing the XHTML 
>>>> side
>>>> of RDFa not to use attribute whose qualified name has a colon in 
>>>> them to
>>>> achieve DOM Consistency by changing RDFa instead of changing parsing?
>>>
>>> I don't believe the RDFa TF are in a position to singlehandedly
>>> rescind a W3C Recommendation, ie.
>>> http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014/.
>>>
>>> What they presumably could do is propose new work items within W3C,
>>> which I'd guess would be more likely to be accepted if it had the
>>> active enthusiasm of the core HTML5 team. Am cc:'ing TimBL here who
>>> might have something more to add.
>>>
>>> Do you have an alternative design in mind, for expressing the
>>> namespace mappings?
>>
>> The simplest thing is not to have mappings but to put the corresponding
>> absolute URI wherever RDFa uses a CURIE.
>
> So this would be a kind of "interoperability profile" of RDFa, where 
> certain features approved of by REC-rdfa-syntax-20081014 wouldn't be 
> used in some hypothetical HTML5 RDFa.
>
> If people can control their urge to use namespace abbreviations, and 
> stick to URIs directly, would this make your DOM-oriented concerns go 
> away?

Took five minutes to make this change in my template. Ran through 
validator.nu. Results:

Doesn't like the content-type. Didn't like profile on head. Having to 
remove the profile attribute in my head element limits usability, but 
I'm not going to  throw myself on the sword for this one.

Doesn't like property, doesn't like about. These are the RDFa attributes 
I'm using. The RDF extractor doesn't care that I used the URIs directly.

Didn't seem to mind SVG, but a value of "none" is a valid value for 
preserveAspectRatio.

Shelley
>
> cheers,
>
> Dan
>
> -- 
> http://danbri.org/
>


attached mail follows:



On 18/1/09 21:04, Shelley Powers wrote:
> Dan Brickley wrote:
>> On 18/1/09 20:07, Henri Sivonen wrote:
>>> On Jan 18, 2009, at 20:48, Dan Brickley wrote:
>>>
>>>> On 18/1/09 19:34, Henri Sivonen wrote:
>>>>> On Jan 18, 2009, at 01:32, Shelley Powers wrote:
>>>>>
>>>>>> Are you then saying that this will be a showstopper, and there will
>>>>>> never be either a workaround or compromise?
>>>>>
>>>>>
>>>>> Are the RDFa TF open to compromises that involve changing the XHTML
>>>>> side
>>>>> of RDFa not to use attribute whose qualified name has a colon in
>>>>> them to
>>>>> achieve DOM Consistency by changing RDFa instead of changing parsing?
>>>>
>>>> I don't believe the RDFa TF are in a position to singlehandedly
>>>> rescind a W3C Recommendation, ie.
>>>> http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014/.
>>>>
>>>> What they presumably could do is propose new work items within W3C,
>>>> which I'd guess would be more likely to be accepted if it had the
>>>> active enthusiasm of the core HTML5 team. Am cc:'ing TimBL here who
>>>> might have something more to add.
>>>>
>>>> Do you have an alternative design in mind, for expressing the
>>>> namespace mappings?
>>>
>>> The simplest thing is not to have mappings but to put the corresponding
>>> absolute URI wherever RDFa uses a CURIE.
>>
>> So this would be a kind of "interoperability profile" of RDFa, where
>> certain features approved of by REC-rdfa-syntax-20081014 wouldn't be
>> used in some hypothetical HTML5 RDFa.
>>
>> If people can control their urge to use namespace abbreviations, and
>> stick to URIs directly, would this make your DOM-oriented concerns go
>> away?
>
> Took five minutes to make this change in my template. Ran through
> validator.nu. Results:
>
> Doesn't like the content-type. Didn't like profile on head. Having to
> remove the profile attribute in my head element limits usability, but
> I'm not going to throw myself on the sword for this one.
>
> Doesn't like property, doesn't like about. These are the RDFa attributes
> I'm using. The RDF extractor doesn't care that I used the URIs directly.

This sounds encouraging. Thanks for taking the time to try the 
experiment,  Shelley. But ... to be clear, are you putting full URIs in 
the @property attribute too? In 
http://www.w3.org/TR/rdfa-syntax/#s_curieprocessing it says '@property, 
@datatype and @typeof support only CURIE values.'

(Can you post an example?)

Reading ...
"""Many of the attributes that hold URIs are also able to carry 'compact 
URIs' or CURIEs. A CURIE is a convenient way to represent a long URI, by 
replacing a leading section of the URI with a substitution token. It's 
possible for authors to define a number of substitution tokens as they 
see fit; the full URI is obtained by locating the mapping defined by a 
token from a list of in-scope tokens, and then simply concatenating the 
second part of the CURIE onto the mapped value."""

... I guess the fact that @property is supposed to be CURIE-only isn't a 
problem with parsers since this can be understood as a CURIE with no (or 
empty) substitution token.

cheers,

Dan

--
http://danbri.org/

attached mail follows:



Dan Brickley wrote:
> On 18/1/09 21:04, Shelley Powers wrote:
>> Dan Brickley wrote:
>>> On 18/1/09 20:07, Henri Sivonen wrote:
>>>> On Jan 18, 2009, at 20:48, Dan Brickley wrote:
>>>>
>>>>> On 18/1/09 19:34, Henri Sivonen wrote:
>>>>>> On Jan 18, 2009, at 01:32, Shelley Powers wrote:
>>>>>>
>>>>>>> Are you then saying that this will be a showstopper, and there will
>>>>>>> never be either a workaround or compromise?
>>>>>>
>>>>>>
>>>>>> Are the RDFa TF open to compromises that involve changing the XHTML
>>>>>> side
>>>>>> of RDFa not to use attribute whose qualified name has a colon in
>>>>>> them to
>>>>>> achieve DOM Consistency by changing RDFa instead of changing 
>>>>>> parsing?
>>>>>
>>>>> I don't believe the RDFa TF are in a position to singlehandedly
>>>>> rescind a W3C Recommendation, ie.
>>>>> http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014/.
>>>>>
>>>>> What they presumably could do is propose new work items within W3C,
>>>>> which I'd guess would be more likely to be accepted if it had the
>>>>> active enthusiasm of the core HTML5 team. Am cc:'ing TimBL here who
>>>>> might have something more to add.
>>>>>
>>>>> Do you have an alternative design in mind, for expressing the
>>>>> namespace mappings?
>>>>
>>>> The simplest thing is not to have mappings but to put the 
>>>> corresponding
>>>> absolute URI wherever RDFa uses a CURIE.
>>>
>>> So this would be a kind of "interoperability profile" of RDFa, where
>>> certain features approved of by REC-rdfa-syntax-20081014 wouldn't be
>>> used in some hypothetical HTML5 RDFa.
>>>
>>> If people can control their urge to use namespace abbreviations, and
>>> stick to URIs directly, would this make your DOM-oriented concerns go
>>> away?
>>
>> Took five minutes to make this change in my template. Ran through
>> validator.nu. Results:
>>
>> Doesn't like the content-type. Didn't like profile on head. Having to
>> remove the profile attribute in my head element limits usability, but
>> I'm not going to throw myself on the sword for this one.
>>
>> Doesn't like property, doesn't like about. These are the RDFa attributes
>> I'm using. The RDF extractor doesn't care that I used the URIs directly.
>
> This sounds encouraging. Thanks for taking the time to try the 
> experiment,  Shelley. But ... to be clear, are you putting full URIs 
> in the @property attribute too? In 
> http://www.w3.org/TR/rdfa-syntax/#s_curieprocessing it says 
> '@property, @datatype and @typeof support only CURIE values.'
>
> (Can you post an example?)
>
> Reading ...
> """Many of the attributes that hold URIs are also able to carry 
> 'compact URIs' or CURIEs. A CURIE is a convenient way to represent a 
> long URI, by replacing a leading section of the URI with a 
> substitution token. It's possible for authors to define a number of 
> substitution tokens as they see fit; the full URI is obtained by 
> locating the mapping defined by a token from a list of in-scope 
> tokens, and then simply concatenating the second part of the CURIE 
> onto the mapped value."""
>
> ... I guess the fact that @property is supposed to be CURIE-only isn't 
> a problem with parsers since this can be understood as a CURIE with no 
> (or empty) substitution token.

I apologize for wasting this group's time. I misunderstood the RDFa 
documentation myself, and am using full URIs within the property 
attribute, too. I guess when I validated my RDFa only page 
(http://missourigreen.burningbird.net) with the W3C validator, and it 
gave me a valid result for RDFa, I assumed I was doing it correctly. 
Oddly enough, the RDF extractor worked with my erroneous use, too.

This presents a dilemma, as I don't now know how to represent the RDFa 
that will work either with the way the standard is now, or some nebulous 
future time in an acceptable format for HTML5.

I'm embarrassed that I wasted the group's time,  especially when I 
obviously don't have the abilities to contribute to the group, or to 
participate.

I'll refrain from responding to any future email.

Shelley

>
> cheers,
>
> Dan
>
> -- 
> http://danbri.org/
>


attached mail follows:



Dan Brickley wrote:
> On 19/1/09 15:42, Henri Sivonen wrote:
>
>>> I've been making some ill-documented tests in
>>> http://svn.foaf-project.org/foaftown/2009/rdfa/tests/ ... trying to
>>> find middle ground between current RDFa parser behaviour and something
>>> that can work in HTML5.
>>
>> Thanks.
>
> (current svn mime types now documented in 
> http://svn.foaf-project.org/foaftown/2009/rdfa/tests/mime.sh)
>
>>> It does seem that the RDFa tools should (although they don't all
>>> currently) require the ' xmlns:http="http:" ' hack. In other words I
>>> was over-optimistic in thinking this was legal RDFa without the
>>> xmlns:http hack. But that's so clearly a hack that I can imagine an
>>> errata being possible.
>>
>> Do they 'work', though, without it being 'legal'?
>
> I started trying to answer that in this email, but it quickly became 
> long and tangled. I tried t6.html (html5-ish, served as text/html, 
> using xmlns:http='http' hack) and t7.html (as t6.html minus the xmlns 
> part). The results of trying in 6 different RDFa parsers are roughly:
>
>   * more than half can swallow this "verbose form" of RDFa, so long as 
> the xmlns: is present.
>   * none of the parsers are happy if the xmlns:http is removed
>   * I didn't test a version using XHTML boilerplate or mimetype yet

I do have a site you can use for testing with the XHTML mimetype, but 
you might have to give me advice on what to change. It's at 
http://missourigreen.burningbird.net.

I've been adding and modifying based on the emails, but at the moment am 
using the RDFa DOCTYPE, as this seems to work with my ARC2 library, 
which doesn't like the namespace workaround.

This site doesn't use SVG, so it allows testing on just the RDFa stuff.

Shelley


attached mail follows:



On Jan 18, 2009, at 21:45, Dan Brickley wrote:

> If people can control their urge to use namespace abbreviations, and  
> stick to URIs directly, would this make your DOM-oriented concerns  
> go away?

Yes, it would make my DOM Consistency concern go away if the urge were  
thus controlled for both HTML and XHTML.

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/



attached mail follows:



On Sun, Jan 18, 2009 at 1:34 PM, Henri Sivonen <hsivonen@iki.fi> wrote:
> On Jan 18, 2009, at 01:32, Shelley Powers wrote:
>
>> Are you then saying that this will be a showstopper, and there will never
>> be either a workaround or compromise?
>
> Are the RDFa TF open to compromises that involve changing the XHTML side of
> RDFa not to use attribute whose qualified name has a colon in them to
> achieve DOM Consistency by changing RDFa instead of changing parsing?

Just so that we have all of the data available to make an informed
decision, do we have examples of how it would "break the web" if
attributes which started with the characters "xmlns:" (and *only*
those attribute) were placed into the DOM exactly as they would be
when those bytes are processed as XHTML?

Notes: I am *not* suggesting anything just yet, other than the
gathering of this data.  I also recognize that this would require a
parsing change by browser vendors, which also is a cost that needs to
be factored in.  But right now, I am interested in how it would affect
the web if this were done.

> --
> Henri Sivonen
> hsivonen@iki.fi
> http://hsivonen.iki.fi/

- Sam Ruby

attached mail follows:



On Sat, Jan 17, 2009 at 5:51 PM, Henri Sivonen <hsivonen@iki.fi> wrote:
> On Jan 17, 2009, at 22:35, Shelley Powers wrote:
>
>> Generally, though, RDFa is based on reusing a set of attributes already
>> existing in HTML5, and adding a few more.
>
> Also, RDFa uses CURIEs which in turn use the XML namespace mapping context.
>
>> I would assume no differences in the DOM based on XHTML or HTML.
>
> The assumption is incorrect.
>
> Please compare
> http://hsivonen.iki.fi/test/moz/xmlns-dom.html
> and
> http://hsivonen.iki.fi/test/moz/xmlns-dom.xhtml
>
> Same bytes, different media type.

The W3C Recommendation for DOM also describes a readonly attribute on
Attr named 'name'.  Discuss.

>> I put together a very crude demonstration of JavaScript access of a
>> specific RDFa attribute, about. It's temporary, but if you go to my main web
>> page,http://realtech.burningbird.net, and look in the sidebar for the click
>> me text, it will traverse each div element looking for an "about" attribute,
>> and then pop up an alert with the value of the attribute. I would use
>> console rather than alert, but I don't believe all browsers support console,
>> yet.
>
> This misses the point, because the inconsistency is with attributes named
> xmlns:foo.

There is a similar inconsistency in how xml:lang is handled.  Discuss.

> --
> Henri Sivonen
> hsivonen@iki.fi
> http://hsivonen.iki.fi/

- Sam Ruby

attached mail follows:



On Jan 18, 2009, at 02:02, Sam Ruby wrote:

> On Sat, Jan 17, 2009 at 5:51 PM, Henri Sivonen <hsivonen@iki.fi>  
> wrote:
>> On Jan 17, 2009, at 22:35, Shelley Powers wrote:
>>
>>> Generally, though, RDFa is based on reusing a set of attributes  
>>> already
>>> existing in HTML5, and adding a few more.
>>
>> Also, RDFa uses CURIEs which in turn use the XML namespace mapping  
>> context.
>>
>>> I would assume no differences in the DOM based on XHTML or HTML.
>>
>> The assumption is incorrect.
>>
>> Please compare
>> http://hsivonen.iki.fi/test/moz/xmlns-dom.html
>> and
>> http://hsivonen.iki.fi/test/moz/xmlns-dom.xhtml
>>
>> Same bytes, different media type.
>
> The W3C Recommendation for DOM also describes a readonly attribute on
> Attr named 'name'.  Discuss.

I have added this to the test cases.

In the DOM API, you can use the namespace-unaware DOM Level 1 view to  
make both cases look the same upon getting a parser-inserted value.  
(This is, of course, totally against namespace-aware programming  
practices, and in non-browser apps, the API might not even expose  
qnames or higher-level technologies like RELAX NG or XPath can't  
trigger on them.)

But it's too early to declare victory. Surely we want also scripted  
setters that mutate the DOM into a state that could have been the  
result of a parse.

Now we have tentatively seen that DOM Level 1 APIs seem to do what we  
want. So let's try using setAttribute():
http://hsivonen.iki.fi/test/moz/xmlns-dom-setter.html
The result looks the same as the HTML case earlier:
http://hsivonen.iki.fi/test/moz/xmlns-dom.html

But now, the XHTML side using the setter:
http://hsivonen.iki.fi/test/moz/xmlns-dom-setter.xhtml
...gives a result that is different from the parser-inserted attribute  
XHTML:
http://hsivonen.iki.fi/test/moz/xmlns-dom.xhtml
Furthermore, the resulting DOM is no longer serializable as XML 1.0.

So let's move to a less intuitive case and use the namespace-aware  
Level 2 setter while assuming the use of the namespace-unaware Level 1  
getter:
http://hsivonen.iki.fi/test/moz/xmlns-dom-setter-ns.xhtml
Looks good compared to the parser-inserted XHTML case:
http://hsivonen.iki.fi/test/moz/xmlns-dom.xhtml

But now, the HTML side is broken:
http://hsivonen.iki.fi/test/moz/xmlns-dom-setter-ns.html
vs.
http://hsivonen.iki.fi/test/moz/xmlns-dom.html

>>> I put together a very crude demonstration of JavaScript access of a
>>> specific RDFa attribute, about. It's temporary, but if you go to  
>>> my main web
>>> page,http://realtech.burningbird.net, and look in the sidebar for  
>>> the click
>>> me text, it will traverse each div element looking for an "about"  
>>> attribute,
>>> and then pop up an alert with the value of the attribute. I would  
>>> use
>>> console rather than alert, but I don't believe all browsers  
>>> support console,
>>> yet.
>>
>> This misses the point, because the inconsistency is with attributes  
>> named
>> xmlns:foo.
>
> There is a similar inconsistency in how xml:lang is handled.  Discuss.

The xml:lang DOM inconsistency has lead to a situation where the  
xml:lang/lang area in Validator.nu has has the highest incidence of  
validator buts per spec sentence of all areas of HTML5. You've  
reported at least one of those bugs. The amount of developer time  
needed to get it right was ridiculously high.

fantasai recently wrote: “Unless you're working on a CSS layout engine  
yourself, the level of detail, complex interactions with the rest of  
CSS, and design and implementation constraints we need to deal with  
here are more complicated than you can imagine. (Source: http://fantasai.inkedblade.net/weblog/2009/layout-is-expensive/)

 From my experience with Validator.nu (that doesn't even have a DOM!)  
I think I can say: Unless you're working on a software product whose  
code reuse between HTML and XHTML depends on the DOM Consistency  
Design principle, the badness caused by violations of the DOM  
Consistency Design principle is more complicated than you can imagine.  
(Where 'you' is not you, Sam, but the generic English you.)

xml:lang was introduced by people who were designing for an XML  
universe when it seemed that would be the way the world would go, so  
they can be forgiven, and the WHATWG can clean up the mess. Likewise,  
the syntax that the SVG WG chose made sense given that they were  
designing for an XML world. It can be accepted as legacy, and HTML5  
parser writers can spend time optimizing the conditional camel casing.

RDFa, on the other hand, was created by people who fully expected it  
to be served as text/html, even though they called it something like  
XHTML 1.1 plus RDFa instead of calling it HTML5. Furthermore, when  
they saw they wanted to have RDFa in HTML5, too, instead of addressing  
HTML issues then, they just continued pushing towards REC. It's easily  
looks like this was done so that RDFa could be presented as a done  
deal that HTML5 needs to deal with instead of something whose details  
are negotiable. Creating a new mess that would have been easily  
avoidable is not similarly forgivable. Also, it sets in very bad  
precedent if we allow other groups to keep us on the treadmill by  
injecting new HTML-hostile features and expecting us to spend cycles  
to sort them out by "working the issues".

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/



attached mail follows:



Ian Hickson wrote:
> On Sat, 17 Jan 2009, Sam Ruby wrote:
>   
>> Shelley Powers wrote:
>>     
>>> So, why accept that we have to use MathML in order to solve the 
>>> problems of formatting mathematical formula? Why not start from 
>>> scratch, and devise a new approach?
>>>       
>> Ian explored (and answered) that here:
>>
>> http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2008-April/014372.html
>>
>> Key to Ian's decision was the importance of DOM integration for this 
>> vocabulary.  If DOM integration is essential for RDFa, then perhaps the 
>> same principles apply.  If not, perhaps some other principles may apply.
>>     
>
> Sam's point here bears repeating, because there seems to be an impression 
> that we took on SVG and MathML without any consideration, while RDF is 
> getting an unfair reception.
>
> On the contrary, SVG and MathML got the same reception. For MathML, for 
> instance, a number of options were very seriously considered, most notably 
> LaTeX. For SVG, we considered a variety of options including VML.
>
> I would encourage people to read the e-mail Sam cited:
>
>    http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2008-April/014372.html
>
> It's long, but the start of it is a summary of what was considered and 
> shows that the same process derived from use cases was used for SVG and 
> MathML as is being used on this thread here.
>
>   
I'm not doubting the effort that went into getting MathML and SVG 
accepted. I've followed the effort associated with SVG since the beginning.

I'm not sure if the same procedure was also applied to the canvas 
object, as well as the SQL query capability. Will assume so.

The point I'm making is that you set a precedent, and a good one I 
think: giving precedence to "not invented here". In other words, to not 
re-invent new ways of doing something, but to look for established 
processes, models, et al already in place, implemented, vetted, etc, 
that solve specific problems. Now that you have accepted a use case, 
Martin's, and we've established that RDFa solves the problem associated 
with the use case, the issue then becomes is there another data model 
already as vetted, documented, implemented that would better solve the 
problem.

I propose that RDFa is the best solution to the use case Martin 
supplied, and we've shown how it is not a disruptive solution to HTML5.

The fact that it is based on RDF, a mature, well documented, widely used 
model with many different implementations is a perk.

Shelley


attached mail follows:



On Jan 17, 2009, at 21:38, Shelley Powers wrote:

> I'm not doubting the effort that went into getting MathML and SVG  
> accepted. I've followed the effort associated with SVG since the  
> beginning.
>
> I'm not sure if the same procedure was also applied to the canvas  
> object, as well as the SQL query capability. Will assume so.

Note that SVG, MathML and SQL have had different popularity  
trajectories in top four browser engines than RDF.

SVG is going up. At the time it was included in HTML5 (only to be  
commented out shortly thereafter), three of the top browser engines  
implemented SVG for retained-mode vector graphics and their SVG  
support was actively being improved. (One of the top four engines  
implemented VML, though.)

At the time MathML was included in HTML5, it was supported by Gecko  
with renewed investment into it as part of the Cairo migration. Also,  
Opera added some MathML features at that time. Thus, two of the top  
four engines had active MathML development going on. Further, one of  
the major MathML implementations is an ActiveX control for IE.

When SQL was included in HTML5, Apple (in WebKit) and Google (in  
Gears) had decided to use SQLite for this functionality. Even though  
Firefox doesn't have a Web-exposed database, Firefox also already  
ships with embedded SQLite. At that point it would have been futile  
for HTML5 to go against the flow of implementations.

The story of RDF is very different. Of the top four engines, only  
Gecko has RDF functionality. It was implemented at a time when RDF was  
a young W3C REC and stuff that were W3C RECs were implemented less  
critically than nowadays. Unlike SVG and MathML, the RDF code isn't  
actively developed (see hg logs). Moreover, the general direction  
seems to be away from using RDF data sources in Firefox internally.

Meanwhile, the feed example you gave--RSS 1.0--shows how the feed spec  
community knowingly moved away from RDF with RSS 2.0 and Atom.  
Furthermore, RSS 1.0 usually isn't parsed into an RDF graph but is  
treated as XML instead. If RSS 1.0 is evidence, it's evidence  
*against* RDF.

> The point I'm making is that you set a precedent, and a good one I  
> think: giving precedence to "not invented here". In other words, to  
> not re-invent new ways of doing something, but to look for  
> established processes, models, et al already in place, implemented,  
> vetted, etc, that solve specific problems. Now that you have  
> accepted a use case, Martin's, and we've established that RDFa  
> solves the problem associated with the use case, the issue then  
> becomes is there another data model already as vetted, documented,  
> implemented that would better solve the problem.

Clearly, RDFa wasn't properly vetted--as far as the desire to deploy  
it in text/html goes--when the outcome was that it ended up using  
markup that doesn't parse into the DOM the same way in HTML and XML.

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/



attached mail follows:



Henri Sivonen wrote:
> On Jan 17, 2009, at 21:38, Shelley Powers wrote:
>
>> I'm not doubting the effort that went into getting MathML and SVG 
>> accepted. I've followed the effort associated with SVG since the 
>> beginning.
>>
>> I'm not sure if the same procedure was also applied to the canvas 
>> object, as well as the SQL query capability. Will assume so.
>
> Note that SVG, MathML and SQL have had different popularity 
> trajectories in top four browser engines than RDF.
>
> SVG is going up. At the time it was included in HTML5 (only to be 
> commented out shortly thereafter), three of the top browser engines 
> implemented SVG for retained-mode vector graphics and their SVG 
> support was actively being improved. (One of the top four engines 
> implemented VML, though.)
>
> At the time MathML was included in HTML5, it was supported by Gecko 
> with renewed investment into it as part of the Cairo migration. Also, 
> Opera added some MathML features at that time. Thus, two of the top 
> four engines had active MathML development going on. Further, one of 
> the major MathML implementations is an ActiveX control for IE.
>
> When SQL was included in HTML5, Apple (in WebKit) and Google (in 
> Gears) had decided to use SQLite for this functionality. Even though 
> Firefox doesn't have a Web-exposed database, Firefox also already 
> ships with embedded SQLite. At that point it would have been futile 
> for HTML5 to go against the flow of implementations.
>
> The story of RDF is very different. Of the top four engines, only 
> Gecko has RDF functionality. It was implemented at a time when RDF was 
> a young W3C REC and stuff that were W3C RECs were implemented less 
> critically than nowadays. Unlike SVG and MathML, the RDF code isn't 
> actively developed (see hg logs). Moreover, the general direction 
> seems to be away from using RDF data sources in Firefox internally.
>

Now wait a second, you're changing the parameters of the requirements. 
Before, the criteria was based on the DOM. Now you're saying that the 
browsers actually have to do with something with it.

Who is to say what the browsers will do with RDF in the future?

In addition, is that the criteria for pages on the web -- that every 
element in them has to result in different behaviors in browsers, only? 
What about other user agents?

That seems to me to be looking for RDFa sized holes and them throwing 
them into the criteria, specifically to trip up RDF, and hence, RDFa.


> Meanwhile, the feed example you gave--RSS 1.0--shows how the feed spec 
> community knowingly moved away from RDF with RSS 2.0 and Atom. 
> Furthermore, RSS 1.0 usually isn't parsed into an RDF graph but is 
> treated as XML instead. If RSS 1.0 is evidence, it's evidence 
> *against* RDF.
>
>> The point I'm making is that you set a precedent, and a good one I 
>> think: giving precedence to "not invented here". In other words, to 
>> not re-invent new ways of doing something, but to look for 
>> established processes, models, et al already in place, implemented, 
>> vetted, etc, that solve specific problems. Now that you have accepted 
>> a use case, Martin's, and we've established that RDFa solves the 
>> problem associated with the use case, the issue then becomes is there 
>> another data model already as vetted, documented, implemented that 
>> would better solve the problem.
>
> Clearly, RDFa wasn't properly vetted--as far as the desire to deploy 
> it in text/html goes--when the outcome was that it ended up using 
> markup that doesn't parse into the DOM the same way in HTML and XML.
>
SVG and MathML were both created as XML, and hence were not vetted for 
text/html, either. And yet, here they are. Well, here they'll be, 
eventually.

Come to that -- I don't think the creators of SQL actually ever expected 
that someday SQL  queries would be initiated from HTML pages.

Shelley


attached mail follows:



On Jan 17, 2009, at 22:43, Shelley Powers wrote:

> Henri Sivonen wrote:
>> On Jan 17, 2009, at 21:38, Shelley Powers wrote:
>>
>>> I'm not doubting the effort that went into getting MathML and SVG  
>>> accepted. I've followed the effort associated with SVG since the  
>>> beginning.
>>>
>>> I'm not sure if the same procedure was also applied to the canvas  
>>> object, as well as the SQL query capability. Will assume so.
>>
>> Note that SVG, MathML and SQL have had different popularity  
>> trajectories in top four browser engines than RDF.
>>
>> SVG is going up. At the time it was included in HTML5 (only to be  
>> commented out shortly thereafter), three of the top browser engines  
>> implemented SVG for retained-mode vector graphics and their SVG  
>> support was actively being improved. (One of the top four engines  
>> implemented VML, though.)
>>
>> At the time MathML was included in HTML5, it was supported by Gecko  
>> with renewed investment into it as part of the Cairo migration.  
>> Also, Opera added some MathML features at that time. Thus, two of  
>> the top four engines had active MathML development going on.  
>> Further, one of the major MathML implementations is an ActiveX  
>> control for IE.
>>
>> When SQL was included in HTML5, Apple (in WebKit) and Google (in  
>> Gears) had decided to use SQLite for this functionality. Even  
>> though Firefox doesn't have a Web-exposed database, Firefox also  
>> already ships with embedded SQLite. At that point it would have  
>> been futile for HTML5 to go against the flow of implementations.
>>
>> The story of RDF is very different. Of the top four engines, only  
>> Gecko has RDF functionality. It was implemented at a time when RDF  
>> was a young W3C REC and stuff that were W3C RECs were implemented  
>> less critically than nowadays. Unlike SVG and MathML, the RDF code  
>> isn't actively developed (see hg logs). Moreover, the general  
>> direction seems to be away from using RDF data sources in Firefox  
>> internally.
>>
>
> Now wait a second, you're changing the parameters of the requirements.

I'm explaining how SVG, MathML and SQL are different from RDF(a) in a  
way that's very relevant to the practice of including stuff in the spec.

> Before, the criteria was based on the DOM. Now you're saying that  
> the browsers actually have to do with something with it.
>
> Who is to say what the browsers will do with RDF in the future?

http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2008-August/016045.html 
  is a message where one of the editors of RDFa mentions RDFa together  
with "client-side tools like Ubiquity". That Ubiquity is a Firefox  
extension rather than part of the core feature set is an  
implementation detail. I read this as envisioning browser-sensitivity  
to RDFa.

> In addition, is that the criteria for pages on the web -- that every  
> element in them has to result in different behaviors in browsers,  
> only?

No. However, most of the time, when people publish HTML, they do it to  
elicit browser behavior when a user loads the HTML document in a  
browser.

>> Meanwhile, the feed example you gave--RSS 1.0--shows how the feed  
>> spec community knowingly moved away from RDF with RSS 2.0 and Atom.  
>> Furthermore, RSS 1.0 usually isn't parsed into an RDF graph but is  
>> treated as XML instead. If RSS 1.0 is evidence, it's evidence  
>> *against* RDF.
>>
>>> The point I'm making is that you set a precedent, and a good one I  
>>> think: giving precedence to "not invented here". In other words,  
>>> to not re-invent new ways of doing something, but to look for  
>>> established processes, models, et al already in place,  
>>> implemented, vetted, etc, that solve specific problems. Now that  
>>> you have accepted a use case, Martin's, and we've established that  
>>> RDFa solves the problem associated with the use case, the issue  
>>> then becomes is there another data model already as vetted,  
>>> documented, implemented that would better solve the problem.
>>
>> Clearly, RDFa wasn't properly vetted--as far as the desire to  
>> deploy it in text/html goes--when the outcome was that it ended up  
>> using markup that doesn't parse into the DOM the same way in HTML  
>> and XML.
>>
> SVG and MathML were both created as XML, and hence were not vetted  
> for text/html, either. And yet, here they are. Well, here they'll  
> be, eventually.

Actually, the creators of MathML had the good sense and foresight to  
avoid name collisions with HTML even after Namespaces theoretically  
gave them a permission not to care.

Unlike the creators of RDFa, the creators of SVG weren't pushing for  
inclusion in HTML5 or saying that it's OK to serve their XML as text/ 
html--quite the contrary. And the integration would have been nicer if  
the SVG WG had had the same prudence as the Math WG.

> Come to that -- I don't think the creators of SQL actually ever  
> expected that someday SQL  queries would be initiated from HTML pages.


I don't see the creators of SQL asking for the inclusion of their  
stuff in HTML after building on another spec that is well-known to be  
trouble with HTML (Namespaces in XML in the RDFa case).

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/



attached mail follows:



On 18/1/09 00:24, Henri Sivonen wrote:

> No. However, most of the time, when people publish HTML, they do it to
> elicit browser behavior when a user loads the HTML document in a browser.

Most users of the Web barely know what a browser is, let alone HTML. 
They're just putting information online; perhaps into a closed site (eg. 
facebook), perhaps into a public-facing site (eg. a blog), or perhaps 
into 1:1, group or IM messaging (eg. webmail). HTML figures in all these 
scenarios. Browsers or HTML rendering code too, of course. But I don't 
think we can jump from that to claims about user intent, and more than 
their use of the Internet signifies an intent to have their information 
chopped up into packets and transmitted according to the rules of TCP/IP.

The reason for my pedantry here is not to be argumentative, but just to 
suggest that this (otherwise very natural) thinking leads us to forget 
about the other major consumers of HTML - search engines. Having their 
stuff found and linked by other is often a big part of the motivation 
for putting stuff online. HTML parsing is involved, impact on the needs 
and interests of mainstream users is involved; but it's not clear 
whether all/any/many users 'do it to elicit search engine behaviour when 
indexing the HTML document'.

Aren't search engines equally important consumers of HTML? Perhaps 
they're more simple-minded in their behaviour than a full UI browser. 
But from the user side, there's only slightly more value in being 
readable without being findable than vice-versa...

cheers,

Dan

--
http://danbri.org/

attached mail follows:



On Saturday 2009-01-17 22:25 +0200, Henri Sivonen wrote:
> The story of RDF is very different. Of the top four engines, only Gecko 
> has RDF functionality. It was implemented at a time when RDF was a young 
> W3C REC and stuff that were W3C RECs were implemented less critically 
> than nowadays.

Actually, the implementation was well underway *before* RDF was a
W3C REC, done by a team led by one of the designers of RDF.  In
other words, it was in Gecko because there were RDF advocates at
Netscape (although advocating, I think, a somewhat different RDF
than the current RDF recommendations).

Compare the dates on:
http://www.w3.org/TR/1999/REC-rdf-syntax-19990222/
http://www.w3.org/TR/1999/PR-rdf-schema-19990303/
http://bonsai.mozilla.org/cvsquery.cgi?treeid=default&module=all&branch=HEAD&branchtype=match&dir=mozilla%2Frdf&file=&filetype=match&who=&whotype=match&sortby=Date&hours=2&date=explicit&mindate=1998-01-01&maxdate=1999-01-01&cvsroot=%2Fcvsroot

-David

-- 
L. David Baron                                 http://dbaron.org/
Mozilla Corporation                       http://www.mozilla.com/

attached mail follows:



On 17/1/09 23:30, L. David Baron wrote:
> On Saturday 2009-01-17 22:25 +0200, Henri Sivonen wrote:
>> The story of RDF is very different. Of the top four engines, only Gecko
>> has RDF functionality. It was implemented at a time when RDF was a young
>> W3C REC and stuff that were W3C RECs were implemented less critically
>> than nowadays.
>
> Actually, the implementation was well underway *before* RDF was a
> W3C REC, done by a team led by one of the designers of RDF.  In
> other words, it was in Gecko because there were RDF advocates at
> Netscape (although advocating, I think, a somewhat different RDF
> than the current RDF recommendations).

Yes, Netscape had this stuff when it was still called MCF. W3C's RDF 
took ideas from several input activities, including MCF, Microsoft 
XML-Data, PICS, and requirements from the Dublin Core community. But it 
looks more like MCF than the others.

MCF was originally proposed by R.V.Guha at Apple; it followed him from 
Apple to Netscape in 1997, and when the Mozilla sources were later 
thrown over the wall, there was a lot of MCF in there.

MCF White Paper, 1996 http://www.guha.com/mcf/wp.html
spec, http://www.guha.com/mcf/mcf_spec.html

While this was at Apple, there was a product/viewer called HotSauce / 
Project X, and some early grassroots adoption of MCF as a text format 
for publishing website summaries.

http://web.archive.org/web/19961224042753/http://hotsauce.apple.com/
http://downlode.org/Etext/MCF/macworld_online.html

  It was at this stage that dialog started with the Library scene and 
Dublin Core folk, about how it related to their notion of catalogue 
records, and to the evolving PICS labelling system, format and protocol 
being built at W3C.
eg.
http://www.ssrc.hku.hk/tb-issues/TidBITS-355.html#lnk3
http://web.archive.org/web/19980215092626/http://www.ariadne.ac.uk/issue7/mcf/
The MCF/RSS relationship is a whole other story, eg. see
http://www.scripting.com/midas/mcf.html
http://www.scripting.com/frontier/siteMap.mcf
http://web.archive.org/web/19990222114619/http://www.xspace.net/hotsauce/sites.html

Then the thing moved to Netscape. Tim Bray helped Guha XMLize the spec, 
which was submitted to W3C in 1997, where it joined the existing efforts 
to extend PICS to include text labels and more structure - 
http://www.w3.org/TR/NOTE-pics-ng-metadata
http://www.daml.org/committee/minutes/2000-12-07-RDF-design-rationale.ppt
http://searchenginewatch.com/2165291

So the June 97 spec was
http://www.w3.org/TR/NOTE-MCF-XML/
.. you can see from the figures that the technology was very RDF-shaped, 
http://www.w3.org/TR/NOTE-MCF-XML/#sec2. Also a tutorial at 
http://www.w3.org/TR/NOTE-MCF-XML/MCF-tutorial.html

Netscape press release accompanying June 13 1997 submission -
http://web.archive.org/web/20010308150737/http://cgi.netscape.com/newsref/pr/newsrelease432.html

Less than 4 months later, this came out as a W3C Working Draft called 
"RDF": http://www.w3.org/TR/WD-rdf-syntax-971002/
... in a shape that didn't really change much subsequently. RDF wasn't 
the same design exactly as MCF but the ancestry is clear enough.

And getting back to the original point, yeah Mozilla had MCF sitemaps 
code in there.

Revisiting 
http://www.prnewswire.com/cgi-bin/stories.pl?ACCT=104&STORY=/www/story/9-8-97/312711&EDATE= 

http://www.irt.org/articles/js086/ and the like, it's clear that RDF was 
very much a child of the 1st browser wars.

In retrospect the direction it took within Mozilla didn't do anyone much 
good. The earliest MCF apps were about public data on the public Web, 
feeds, sitemaps and so on. But eventually the ambition to be a complete 
information hub led to MCF/RDF being used for pretty much everything 
*inside* Mozilla. And I don't think that turned out very well. 
http://www.mozilla.org/rdf/doc/api.html etc. The RDF vocabularies it 
used were poorly or never documented (I have some guilt there) and when 
Netscape went away, the incentive to connect to public data on the Web 
seemed to drop (no more tie-ins with the 'what's related' annotation 
server, 'dmoz' etc.). RDF drifted from being a Web data format to be 
consumed *by* the browser, into an engineering tool to be used in the 
construction *of* the browser, ie. as a datasource abstraction within 
Mozilla APIs. While I can certainly see the value of having a unified 
view of mail, news, sitemaps, and so on, the Moz code at the time wasn't 
really in a position to match up to the language in the press releases.

Not making any particular point here beyond connecting up to the MCF 
heritage...

cheers,

Dan

--
http://danbri.org/



attached mail follows:



On Sat, Jan 17, 2009 at 2:38 PM, Shelley Powers
<shelleyp@burningbird.net> wrote:
>
> I propose that RDFa is the best solution to the use case Martin supplied,
> and we've shown how it is not a disruptive solution to HTML5.

Others may differ, but my read is that the case is a strong one.  But
I will caution you that a little patience is in order.  SVG is not a
done deal yet.  I've been involved in a number of standards efforts,
and I've never seen a case of "proposed on a Saturday morning, decided
on a Saturday afternoon".  One demo is not conclusive.  Now you
mention that there exists a number of libraries.  I think that's
important.  Very important.  Possibly conclusive.

But back to expectations.  I've seen references elsewhere to Ian being
booked through the end of this quarter.  I may have misheard, but in
any case, my point is the same: if this is awaiting something from
Ian, it will be prioritized and dealt with accordingly.  If, however,
some of the legwork is done for Ian, this may help accelerate the
effort.

Even little things may help a lot.  I know what I'm about to say may
be unpopular, but I'll say it anyway: take a few good examples of RDFa
and run them through Henri's validator.  The validator will helpfully
indicate exactly what areas of the spec would need to be updated in
order to accommodate RDFa.  The next step would be to take a look at
those sections.  If the update is obvious and straightforward, perhaps
nothing more is required.  But if not, researching into the options
and making recommendations may help.

- Sam Ruby

attached mail follows:



Sam Ruby wrote:
> On Sat, Jan 17, 2009 at 2:38 PM, Shelley Powers
> <shelleyp@burningbird.net> wrote:
>   
>> I propose that RDFa is the best solution to the use case Martin supplied,
>> and we've shown how it is not a disruptive solution to HTML5.
>>     
>
> Others may differ, but my read is that the case is a strong one.  But
> I will caution you that a little patience is in order.  SVG is not a
> done deal yet.  I've been involved in a number of standards efforts,
> and I've never seen a case of "proposed on a Saturday morning, decided
> on a Saturday afternoon".  One demo is not conclusive.  Now you
> mention that there exists a number of libraries.  I think that's
> important.  Very important.  Possibly conclusive.
>   
I am patient. Look at me? I make extensive use of both SVG and RDF -- 
that is the mark of a patient woman.
> But back to expectations.  I've seen references elsewhere to Ian being
> booked through the end of this quarter.  I may have misheard, but in
> any case, my point is the same: if this is awaiting something from
> Ian, it will be prioritized and dealt with accordingly.  If, however,
> some of the legwork is done for Ian, this may help accelerate the
> effort.
>   
First of all, whatever happens has to happen with either vetting by the 
RDF/RDFa folks, if not their active help. This is my way of saying, I'd 
be willing to do much of the legwork, but I want to make I don't 
represent RDFa incorrectly.

Secondly, my finances have been caught up in the current downturn, and 
my first priority has to be on the hourly work and odd jobs I'm getting 
to keep afloat. Which means that I can't always guarantee 20+ hours a 
week on a task, nor can I travel. Anywhere.

But if both are acceptable conditions, I'm willing to help with tasks.
> Even little things may help a lot.  I know what I'm about to say may
> be unpopular, but I'll say it anyway: take a few good examples of RDFa
> and run them through Henri's validator.  The validator will helpfully
> indicate exactly what areas of the spec would need to be updated in
> order to accommodate RDFa.  The next step would be to take a look at
> those sections.  If the update is obvious and straightforward, perhaps
> nothing more is required.  But if not, researching into the options
> and making recommendations may help.
>
>   
Tasks including this one.

Shelley

> - Sam Ruby
>
>   


attached mail follows:



On Sat, Jan 17, 2009 at 3:51 PM, Shelley Powers
<shelleyp@burningbird.net> wrote:
> Sam Ruby wrote:
>>
>> On Sat, Jan 17, 2009 at 2:38 PM, Shelley Powers
>> <shelleyp@burningbird.net> wrote:
>>
>>>
>>> I propose that RDFa is the best solution to the use case Martin supplied,
>>> and we've shown how it is not a disruptive solution to HTML5.
>>>
>>
>> Others may differ, but my read is that the case is a strong one.  But
>> I will caution you that a little patience is in order.  SVG is not a
>> done deal yet.  I've been involved in a number of standards efforts,
>> and I've never seen a case of "proposed on a Saturday morning, decided
>> on a Saturday afternoon".  One demo is not conclusive.  Now you
>> mention that there exists a number of libraries.  I think that's
>> important.  Very important.  Possibly conclusive.
>>
>
> I am patient. Look at me? I make extensive use of both SVG and RDF -- that
> is the mark of a patient woman.
>>
>> But back to expectations.  I've seen references elsewhere to Ian being
>> booked through the end of this quarter.  I may have misheard, but in
>> any case, my point is the same: if this is awaiting something from
>> Ian, it will be prioritized and dealt with accordingly.  If, however,
>> some of the legwork is done for Ian, this may help accelerate the
>> effort.
>>
>
> First of all, whatever happens has to happen with either vetting by the
> RDF/RDFa folks, if not their active help. This is my way of saying, I'd be
> willing to do much of the legwork, but I want to make I don't represent RDFa
> incorrectly.
>
> Secondly, my finances have been caught up in the current downturn, and my
> first priority has to be on the hourly work and odd jobs I'm getting to keep
> afloat. Which means that I can't always guarantee 20+ hours a week on a
> task, nor can I travel. Anywhere.
>
> But if both are acceptable conditions, I'm willing to help with tasks.

I don't see any of that as being a problem.

>> Even little things may help a lot.  I know what I'm about to say may
>> be unpopular, but I'll say it anyway: take a few good examples of RDFa
>> and run them through Henri's validator.  The validator will helpfully
>> indicate exactly what areas of the spec would need to be updated in
>> order to accommodate RDFa.  The next step would be to take a look at
>> those sections.  If the update is obvious and straightforward, perhaps
>> nothing more is required.  But if not, researching into the options
>> and making recommendations may help.
>
> Tasks including this one.

Excellent.  Well, all except for the downturn thing, but you know what I mean.

In order to prevent any misunderstandings: it is not for me to assign
work.  In fact, nobody here is in such a position.  People simply note
things that need to be done, and do the ones that interest them, at
the pace at which they are able.

And communicate copiously.  If you need help in vetting, I am given to
understand that there is a small pocket of RDF enthusiasm in the W3C.
:-P

> Shelley

- Sam Ruby

attached mail follows:



Ian Hickson wrote:
> On Sun, 18 Jan 2009, Shelley Powers wrote:
>   
>>> The more use cases there are, the better informed the results will be.
>>>       
>> The point isn't to provide use cases. The point is to highlight a 
>> serious problem with this working group--there is a mindset of what the 
>> future of HTML will look like, and the holders of the mindset brook no 
>> challenge, tolerate no disagreement, and continually move to quash any 
>> possibility of asserting perhaps even the faintest difference of 
>> opinion.
>>     
>
> I'm certainly sad that this is the impression I have given. I'd like to 
> clarify for everyone's sake that this mailing list is definitely open to 
> any proposals, any opinions, any disagreement. The only thing I ask is 
> that people use rational debate, back up their opinions with logical 
> arguments, present research to justify their claims, and derive proposals 
> from user needs.
>
>   
I've been especially critical of you, which isn't fair. At the same 
time, as you have said yourself, you are a "benevolent dictator", which 
seems to me to not be the best strategy for an inclusive HTML for the 
future.

I know I'm not comfortable with the concept. But I'm also late to this 
group, and shouldn't disrupt if the strategy works.
>   
>> Regardless, I got the point in the comment. That, combined with this 
>> email from Ian, tells us that it doesn't matter how our arguments run, 
>> the logic of our debate, the rightness of our cause--he is the final 
>> arbiter, and he does not want RDFa.
>>     
>
> For the record, I am as open to us including a feature like RDFa as I am 
> to us including a feature like MathML, SVG, or indeed anything else. While 
> I may present a devil's advocate position to stimulate critical 
> consideration of proposals, this does not mean that my mind is made up. If 
> my mind was made up, I wouldn't be asking for use cases, and I wouldn't 
> be planning to investigate the issue further in April.
>
>
>   
There is a fine difference between being the devil's advocate, and the 
devil's front door made of thick oak, with heavy brass fittings.

How does one know if one has provided a use case in a format that is 
more likely to meet a successful outcome, than not. Is the criteria 
documented somewhere? It's difficult to provide use cases with the 
twenty questions approach.

What are the criteria by which a possible solution to a problem is 
judged? Is there a consistent set of questions asked? Tests made? A 
certain number of implementations? Again, is this documented somewhere?

>> I am not paid by Google, or Mozilla, or IBM to continue throwing away my 
>> time, arguing for naught.
>>     
>
> It may be worth pointing out that, many of our most active participants 
> are volunteers, not paid by anyone to participate. Indeed I myself spent 
> many years contributing to the standards community while unemployed or 
> while a student. I am sorry you feel that you need to be compensated for 
> your participation in the standards community, and wish you the best of 
> luck in finding a suitable employer.
>
>   
The point I was trying to make, and forgive me if the my writing was too 
subtle, is that it's not the fact that the work will time, but whether 
the time will be well spent.

Operating in the dark and tossing use cases in hopes they stick against 
the wall, without understanding criteria is not a particularly good use 
of time. However, having specific tasks that meet a given goal, and 
knowing that the goal is stable, and not a moving target, goes a long 
way to ensuring that the time spent has value.

Knowing that one can, with diligence, ensure that the best result occurs 
is a good use of time.

Spitting into the wind, at the whim and whimsy of a benevolent dictator, 
is not a good use of time.


> As far as Google goes, we have no corporate opinion either way on the 
> topic of RDFa in HTML5. We do, however, encourage the continued practice 
> of basing decisions on data rather than hopes.
>
>   

Bully for Google.

Shelley

attached mail follows:




On Jan 18, 2009, at 8:43 AM, Shelley Powers wrote:

> Take you guys seriously...OK, yeah.
>
> I don't doubt that the work will be challenging, or problematical.  
> I'm not denying Henri's claim. And I didn't claim to be the one who  
> would necessarily come up with the solutions, either, but that I  
> would help in those instances that I could.
>
> What I did express in the later emails, is what others have  
> expressed who have asked about RDFa in HTML5: are we wasting our  
> time even trying? That it seems like a decision has already been  
> made, and we're spinning our wheels even attempting to find  
> solutions. There's a difference between not being willing to  
> negotiate, compromise, work the problem, and just spitting into the  
> wind for no good.

Based on past experience, I would say that you are not wasting your  
time. Evidence-based arguments, explication of use cases, solutions to  
technical problems, persuading third parties, and getting  
implementation traction (for example in popular JavaScript libraries,  
major browser engines, popular authoring/publishing software) will all  
affect how a feature is seen.

As past examples, allowing XML-like self-closing tag syntax for void  
elements in text/html, and ability to include SVG inline in text/html,  
are both features that were highly controversial and at times opposed  
by the editor and others. Nontheless we seem to be on track to have  
both of these in the spec. Note that in the case of SVG especially,  
the path from initial proposal to rough consensus to actual  
integration with the spec was a long one. In fact, integration in the  
spec is not yet fully complete due to some disputes about the details  
of the syntax. Another example is the "headers" attribute, and the  
more general issue of header association in tables. Though the  
"headers" attribute was controversial and once opposed by the editor,  
it is now in the spec.

I believe that most of us here, while we may have our biases and  
preconceptions, will evaluate concrete technical arguments in good  
faith, and are prepared to change our minds. The fact is that people  
have changed positions in the past, Ian included. So nothing should be  
assumed to be a done deal, especially at this early stage of exploring  
metadata embedding and RDFa.


>>> However, the debate ended as soon as Ian re-asserted his authority.
>>
>> Ian just gave an indication of when he's going to work on this  
>> again. That doesn't mean that research into e.g. DOM consistency  
>> can't happen meanwhile. It also doesn't mean that debate needs to  
>> stop.
>>
>>
> No, Ian's listing of tasks pretty much precluded any input into the  
> decision making process other than his own. I never see "we" when  
> Ian writes, I only see "I".

Ian intends to make an evaluation based on evidence and arguments  
presented. Presenting such evidence and arguments is input into the  
decision making process. That's how other changes to the spec that  
went against Ian's initial gut instinct happened. Indeed it is  
possible for Ian to be overruled if he is clearly blocking the  
consensus of the group(*), but so far that has not been necessary,  
even on controversial issues.

I encourage you to provide input into the process, and not to get too  
frustrated if the process is not quick. Nor by the fact that some may  
initially (or even finally, when all is said and done) disagree with  
you.

Regards,
Maciej


* - The HTML WG can take a vote which is binding at least in the W3C  
context or remove Ian as editor; and the WHATWG oversight group can  
remove Ian as editor or pressure him by virtue of having the authority  
to remove him.


attached mail follows:



Shelley Powers ha scritto:
>
>
> The point I'm making is that you set a precedent, and a good one I 
> think: giving precedence to "not invented here". In other words, to 
> not re-invent new ways of doing something, but to look for established 
> processes, models, et al already in place, implemented, vetted, etc, 
> that solve specific problems. Now that you have accepted a use case, 
> Martin's, and we've established that RDFa solves the problem 
> associated with the use case, the issue then becomes *is there another 
> data model already as vetted, documented, implemented that would 
> better solve the problem*.
>

RDF in a separate XML-syntax file, perhaps. Just because that use case 
raised a privacy concern on informations to keep private anyway, and 
that's not a problem solvable at the document level with metadata; 
instead, keeping relevant metadata in a separate file would help a 
better access control. Also, a separate file would have the relevant 
informations ready for use, while embedding them with other content 
would force a load and parsing of the other content in search of 
relevant metadata (possible, of course, and not much of a problem, but 
not as clean and efficient).

Moreover, it should be verified whether social-network service providers 
agree with such a requirement: I might avail of a compliant 
implementation to easily migrate from one service to another and leave 
the former, in which case why should a company open its inner 
infrastructure and database and invest resources for the benefit of a 
competitor accessing its data and consuming its bandwidth to catch its 
customers? (this is not the same interoperability issue for mail clients 
supporting different address book formats, minor vendors had to do that 
to improve their businness - and they didn't need to access a 
competitor's infrastructure).

Perhaps, that might work if personal infos and relationships were 
handled by an external service on the same lines of an OpenID service 
allowing an automated identification by other services; but this would 
reduce social networks to be a kind of front-end for such a centralized 
management (and service providers might not like that). Also, in this 
case anonimity should be ensured (for instance, I might have met you in 
two different networks, but knew your identity in only one of them, and 
you might wish that no one knew you're the person behind the other 
nickname; this is possible taking different informations in different 
databases and with different access rights, and should be replicable 
when merging such infos -- on the other hand, if you knew my identity, 
you should be allowed to "fill in the blanks" somehow).

Shelley Powers ha scritto:
> Anne van Kesteren wrote:
>> On Sun, 18 Jan 2009 17:15:34 +0100, Shelley Powers 
>> <shelleyp@burningbird.net> wrote:
>>> And regardless of the fact that I jumped to conclusions about WhatWG 
>>> membership, I do not believe I was inaccurate with the earlier part 
>>> of this email. Sam started a new thread in the discussion about the 
>>> issues of namespace and how, perhaps we could find a way to work the 
>>> issues through with RDFa. My god, I use RDFa in my pages, and they 
>>> load fine with any browser, including IE. I have to believe its 
>>> incorporation into HTML5 is not the daunting effort that others make 
>>> it seem to be.'
>>
>> You ask us to take you seriously and consider your feedback, it would 
>> be nice if you took what e.g. Henri wrote seriously as well. 
>> Integrating a new feature in HTML is not a simple task, even if the 
>> new feature loads and renders fine in Internet Explorer.
>>
> Take you guys seriously...OK, yeah.
>
> I don't doubt that the work will be challenging, or problematical. I'm 
> not denying Henri's claim. And I didn't claim to be the one who would 
> necessarily come up with the solutions, either, but that I would help 
> in those instances that I could. 

It seems that you'd expect RDFa to be specced out before solving related 
problems (so to push their solution). I don't think that's the right 
path to follow, instead known issues must be solved before making a 
decision, so that the specification can tell exactly what developers 
must implement, eventually pointing out (after/while implementing) newer 
(hopefully minor) issues to be solved by refining the spec (which is a 
different task than specifying something known to be, let's say, "buggy" 
or uncertain).


Everything, as always, IMHO

WBR, Alex


 
 
 --
 Caselle da 1GB, trasmetti allegati fino a 3GB e in piu' IMAP, POP3 e SMTP autenticato? GRATIS solo con Email.it http://www.email.it/f
 
 Sponsor:
 Blu American Express: gratuita a vita! 
 Clicca qui: http://adv.email.it/cgi-bin/foclick.cgi?mid=8613&d=4-2

attached mail follows:



Calogero Alex Baldacchino wrote:
> It seems that you'd expect RDFa to be specced out before solving related
> problems (so to push their solution). I don't think that's the right path to
> follow, instead known issues must be solved before making a decision, so
> that the specification can tell exactly what developers must implement

I think that an help in defining of the requirements around
structured data, RDFa, metadata copy&paste, semantic links [1],etc
could came from the W3C document "Use Cases and Requirements
for Ontology and API for Media Object 1.0" [2]

Take the requirements listed from "r01" to "r13" and replace
the term "media objects" with "structured/linked data".

[1] http://lists.w3.org/Archives/Public/public-html/2009Jan/0082.html
[2] http://www.w3.org/TR/2009/WD-media-annot-reqs-20090119/#req-r01
-- 
Giovanni Gentili

attached mail follows:



Shelley Powers ha scritto:
>
>
> The point I'm making is that you set a precedent, and a good one I 
> think: giving precedence to "not invented here". In other words, to 
> not re-invent new ways of doing something, but to look for established 
> processes, models, et al already in place, implemented, vetted, etc, 
> that solve specific problems. Now that you have accepted a use case, 
> Martin's, and we've established that RDFa solves the problem 
> associated with the use case, the issue then becomes *is there another 
> data model already as vetted, documented, implemented that would 
> better solve the problem*.
>

RDF in a separate XML-syntax file, perhaps. Just because that use case 
raised a privacy concern on informations to keep private anyway, and 
that's not a problem solvable at the document level with metadata; 
instead, keeping relevant metadata in a separate file would help a 
better access control. Also, a separate file would have the relevant 
informations ready for use, while embedding them with other content 
would force a load and parsing of the other content in search of 
relevant metadata (possible, of course, and not much of a problem, but 
not as clean and efficient).

Moreover, it should be verified whether social-network service providers 
agree with such a requirement: I might avail of a compliant 
implementation to easily migrate from one service to another and leave 
the former, in which case why should a company open its inner 
infrastructure and database and invest resources for the benefit of a 
competitor accessing its data and consuming its bandwidth to catch its 
customers? (this is not the same interoperability issue for mail clients 
supporting different address book formats, minor vendors had to do that 
to improve their businness - and they didn't need to access a 
competitor's infrastructure).

Perhaps, that might work if personal infos and relationships were 
handled by an external service on the same lines of an OpenID service 
allowing an automated identification by other services; but this would 
reduce social networks to be a kind of front-end for such a centralized 
management (and service providers might not like that). Also, in this 
case anonimity should be ensured (for instance, I might have met you in 
two different networks, but knew your identity in only one of them, and 
you might wish that no one knew you're the person behind the other 
nickname; this is possible taking different informations in different 
databases and with different access rights, and should be replicable 
when merging such infos -- on the other hand, if you knew my identity, 
you should be allowed to "fill in the blanks" somehow).

Shelley Powers ha scritto:
> Anne van Kesteren wrote:
>> On Sun, 18 Jan 2009 17:15:34 +0100, Shelley Powers 
>> <shelleyp@burningbird.net> wrote:
>>> And regardless of the fact that I jumped to conclusions about WhatWG 
>>> membership, I do not believe I was inaccurate with the earlier part 
>>> of this email. Sam started a new thread in the discussion about the 
>>> issues of namespace and how, perhaps we could find a way to work the 
>>> issues through with RDFa. My god, I use RDFa in my pages, and they 
>>> load fine with any browser, including IE. I have to believe its 
>>> incorporation into HTML5 is not the daunting effort that others make 
>>> it seem to be.'
>>
>> You ask us to take you seriously and consider your feedback, it would 
>> be nice if you took what e.g. Henri wrote seriously as well. 
>> Integrating a new feature in HTML is not a simple task, even if the 
>> new feature loads and renders fine in Internet Explorer.
>>
> Take you guys seriously...OK, yeah.
>
> I don't doubt that the work will be challenging, or problematical. I'm 
> not denying Henri's claim. And I didn't claim to be the one who would 
> necessarily come up with the solutions, either, but that I would help 
> in those instances that I could. 

It seems that you'd expect RDFa to be specced out before solving related 
problems (so to push their solution). I don't think that's the right 
path to follow, instead known issues must be solved before making a 
decision, so that the specification can tell exactly what developers 
must implement, eventually pointing out (after/while implementing) newer 
(hopefully minor) issues to be solved by refining the spec (which is a 
different task than specifying something known to be, let's say, "buggy" 
or uncertain).


Everything, as always, IMHO

WBR, Alex


 
 
 --
 Caselle da 1GB, trasmetti allegati fino a 3GB e in piu' IMAP, POP3 e SMTP autenticato? GRATIS solo con Email.it http://www.email.it/f
 
 Sponsor:
 Blu American Express: gratuita a vita!
 Clicca qui: http://adv.email.it/cgi-bin/foclick.cgi?mid=8614&d=4-2

attached mail follows:



> Now wait a second, you're changing the parameters of the requirements.
> Before, the criteria was based on the DOM. Now you're saying that the
> browsers actually have to do with something with it.

[Put "almost" in front of most words in the following.]

The consistent DOM criteria is necessary but not not sufficient.

Browsers doing something with it is a step towards sufficient.

Without DOM consistency (or at least an agreed workaround), it almost
doesn't matter how great RFDa is, because it isn't compatible.  Once
you have that consistency, then the questions can move on to the next
step.

That next step boils down to "Why bother?"  Needing DOM integration of
the information is a reason to bother.  Browsers doing something with
it is a reason to bother.  Those aren't the only reasons to bother,
but they are likely reasons, so people have asked about them.  If you
have other reasons, go ahead and offer those as well.  (But "existing
W3C standard" probably isn't strong enough.)

-jJ

attached mail follows:



RDFa should sink or swim on its own merits, and if RDFa requires
drastic changes to HTML, it is probably broken. Let the compelling
benefits of RDFa pave the way to implementations, and then standardize
based on experience with those.

RDFa should not be blessed by HTML, and the HTML spec should adopt a
similar stance to all new features. For example, I would be very
surprised to see Web Sockets fail on its own, since the benefits seem
clear. But I could be wrong, and it should face a survival test.

-- 

Robert Sayre

"I would have written a shorter letter, but I did not have the time."

attached mail follows:



Dan Brickley wrote:

> ... I guess the fact that @property is supposed to be CURIE-only  
> isn't a
> problem with parsers since this can be understood as a CURIE with  
> no (or
> empty) substitution token.

Actually, most RDFa parsers will break if full URIs are used in RDFa  
attributes: in RDFa all CURIEs need a prefix which is a string of  
zero or more alphanumeric characters, dashes and hyphens followed by  
a colon (and yes, the empty string is allowed - but it is permanently  
bound to <http://www.w3.org/1999/xhtml/vocab#>). The proposed  
recommendation (IIRC, that's the current status) for CURIEs *does*  
actually allow for unprefixed CURIES, but RDFa enforces extra  
conditions. (As it was published before the CURIE spec, which is a  
spin-off of RDFa.)

A suggestion I've heard for using full URIs is:

	<html xmlns:http="http:">
	  <title property="http://purl.org/dc/terms/title">Foo</title>
	</html>

Which should theoretically work according to the reference algorithm  
in the RDFa syntax document, however it does (I believe) break the  
XML Namespaces spec. (Though that wouldn't be a problem if an  
alternative, non-xmlns, syntax were adopted for CURIE prefix  
binding.) It wouldn't surprise me if a few RDFa parsers had issues  
with this caused by the front-end XML parser they use.

So RDFa, as it is currently defined, does need a CURIE binding  
mechanism. XML namespaces are used for XHTML+RDFa 1.0, but given that  
namespaces don't work in HTML, an alternative mechanism for defining  
them is expected, and for consistency would probably be allowed in  
XHTML too - albeit in a future version of XHTML+RDFa, as 1.0 is  
already finalised. (I don't speak for the RDFa task force as I am not  
a member, but I would be surprised if many of them disagreed with me  
strongly on this.)

Back to when I said "most RDFa parsers will break if full URIs are  
used in RDFa attributes". The Perl library RDF::RDFa::Parser doesn't,  
so if you want to do any testing with full URIs, it can be found on  
CPAN. Full URIs are a pain to type though - I certainly prefer using  
CURIEs.

-- 
Toby A Inkster
<mailto:mail@tobyinkster.co.uk>
<http://tobyinkster.co.uk>




attached mail follows:



Toby A Inkster wrote:
> So RDFa, as it is currently defined, does need a CURIE binding
> mechanism. XML namespaces are used for XHTML+RDFa 1.0, but given that
> namespaces don't work in HTML, an alternative mechanism for defining
> them is expected, and for consistency would probably be allowed in XHTML
> too - albeit in a future version of XHTML+RDFa, as 1.0 is already
> finalised. (I don't speak for the RDFa task force as I am not a member,
> but I would be surprised if many of them disagreed with me strongly on
> this.)

Speaking as an RDFa Task Force member - we're currently looking at an
alternative prefix binding mechanism, so that this:

xmlns:foaf="http://xmlns.com/foaf/0.1/"

could also be declared like this in non-XML family languages:

prefix="foaf=http://xmlns.com/foaf/0.1/"

The thought is that this prefix binding mechanism would be available in
both XML and non-XML family languages.

The reason that we used xmlns: was because our charter was to
specifically create a mechanism for RDF in XHTML markup. The XML folks
would have berated us if we created a new namespace declaration
mechanism without using an attribute that already existed for exactly
that purpose.

That being said, we're now being berated by the WHATWG list for doing
the Right Thing per our charter... sometimes you just can't win :)

I don't think that the RDFa Task Force is as rigid in their positions as
some on this list are claiming... we do understand the issues, are
working to resolve issues or educate where possible and desire an open
dialog with WHATWG.

-- manu

-- 
Manu Sporny
President/CEO - Digital Bazaar, Inc.
blog: Bitmunk 3.1 Website Launch
http://blog.digitalbazaar.com/2009/01/16/bitmunk-3-1-website-launch

attached mail follows:



Manu Sporny <msporny@digitalbazaar.com>, 2009-01-18 19:18 -0500:

> Speaking as an RDFa Task Force member - we're currently looking at an
> alternative prefix binding mechanism, so that this:
> 
> xmlns:foaf="http://xmlns.com/foaf/0.1/"
> 
> could also be declared like this in non-XML family languages:
> 
> prefix="foaf=http://xmlns.com/foaf/0.1/"

Is there a draft spec proposal for that available yet? Or maybe a
URL for an archived mailing-list discussion about it?

  --Mike

-- 
Michael(tm) Smith
http://people.w3.org/mike/

attached mail follows:



"Michael(tm) Smith" <mike@w3.org>, 2009-01-19 17:40 +0900:

> Manu Sporny <msporny@digitalbazaar.com>, 2009-01-18 19:18 -0500:
>
> > prefix="foaf=http://xmlns.com/foaf/0.1/"
> 
> URL for an archived mailing-list discussion about it?

OK, I found this:

  http://lists.w3.org/Archives/Public/public-rdf-in-xhtml-tf/2009Jan/thread.html#msg74
  http://lists.w3.org/Archives/Public/public-rdf-in-xhtml-tf/2009Jan/0074.html


-- 
Michael(tm) Smith
http://people.w3.org/mike/

attached mail follows:



Michael(tm) Smith wrote:
> "Michael(tm) Smith" <mike@w3.org>, 2009-01-19 17:40 +0900:
> 
>> Manu Sporny <msporny@digitalbazaar.com>, 2009-01-18 19:18 -0500:
>>
>>> prefix="foaf=http://xmlns.com/foaf/0.1/"
>> URL for an archived mailing-list discussion about it?
> 
> OK, I found this:
> 
>   http://lists.w3.org/Archives/Public/public-rdf-in-xhtml-tf/2009Jan/thread.html#msg74
>   http://lists.w3.org/Archives/Public/public-rdf-in-xhtml-tf/2009Jan/0074.html

I believe that the thread started here, @prefix is a small part of the
conversation:

http://lists.w3.org/Archives/Public/public-rdf-in-xhtml-tf/2008Sep/0001.html

... it's fairly involved, and a good bit has changed since the
discussion back in September. The goal, though, is to provide a non-XML
mechanism for declaring CURIE prefixes.

-- manu

-- 
Manu Sporny
President/CEO - Digital Bazaar, Inc.
blog: Bitmunk 3.1 Website Launch
http://blog.digitalbazaar.com/2009/01/16/bitmunk-3-1-website-launch

attached mail follows:



On Jan 19, 2009, at 02:18, Manu Sporny wrote:

> Toby A Inkster wrote:
>> So RDFa, as it is currently defined, does need a CURIE binding
>> mechanism. XML namespaces are used for XHTML+RDFa 1.0, but given that
>> namespaces don't work in HTML, an alternative mechanism for defining
>> them is expected, and for consistency would probably be allowed in  
>> XHTML
>> too - albeit in a future version of XHTML+RDFa, as 1.0 is already
>> finalised. (I don't speak for the RDFa task force as I am not a  
>> member,
>> but I would be surprised if many of them disagreed with me strongly  
>> on
>> this.)
>
> Speaking as an RDFa Task Force member - we're currently looking at an
> alternative prefix binding mechanism, so that this:
>
> xmlns:foaf="http://xmlns.com/foaf/0.1/"
>
> could also be declared like this in non-XML family languages:
>
> prefix="foaf=http://xmlns.com/foaf/0.1/"
>
> The thought is that this prefix binding mechanism would be available  
> in
> both XML and non-XML family languages.

Considering recent messages in this thread, using full URIs and  
refraining from declaring 'http' as a namespace prefix in XHTML would  
be more backwards compatible than minting a new attribute called  
'prefix'. (I haven't verified the test results about using full URIs  
myself.)

Even though switching over to 'prefix' in both HTML and XHTML would  
address the DOM Consistency concern, using them for RDF-like URI  
mapping would as opposed to XML names would remove the issue of having  
to pass around compound values and putting them on the same layer on  
the layer cake would remove most objections related to qnames-in- 
content, some usual problem with Namespaces in XML would remain:
  * Brittleness under copy-paste due to prefixes potentially being  
declared far away from the use of the prefix in source.
  * Various confusions about the prefix being significant.
  * The problem of generating nice prefixes algorithmically without  
maintaining a massive table of a known RDF vocabularies.
  * Negative savings in syntax length when I given prefix is only used  
a couple of times in a file.

> The reason that we used xmlns: was because our charter was to
> specifically create a mechanism for RDF in XHTML markup. The XML folks
> would have berated us if we created a new namespace declaration
> mechanism without using an attribute that already existed for exactly
> that purpose.

The easy way to avoid accusations of inventing another declaration  
mechanism is not to have a declaration mechanism.

URIs already have namespacing built into their structure. You seem to  
be taking as a given that there needs to be an indirection mechanism  
for declaring common URI prefixes. As far as I can tell, an  
indirection mechanism isn't a hard requirement flowing from the RDF  
data model. After all, N-Triples don't have such a mechanism.

> That being said, we're now being berated by the WHATWG list for doing
> the Right Thing per our charter... sometimes you just can't win :)

Groups have a say on what goes into their charter, so it's not like a  
group is powerlessly following a charter forced upon it entirely from  
the outside. :-)

> I don't think that the RDFa Task Force is as rigid in their  
> positions as
> some on this list are claiming... we do understand the issues, are
> working to resolve issues or educate where possible and desire an open
> dialog with WHATWG.

Great!

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/



attached mail follows:



Just a couple of clarifications - not trying to convince anybody of
anything, just setting the record straight.

Henri Sivonen wrote:
> Even though switching over to 'prefix' in both HTML and XHTML would
> address the DOM Consistency concern, using them for RDF-like URI mapping
> would as opposed to XML names would remove the issue of having to pass
> around compound values and putting them on the same layer on the layer
> cake would remove most objections related to qnames-in-content, some
> usual problem with Namespaces in XML would remain:
>
>  * Brittleness under copy-paste due to prefixes potentially being
> declared far away from the use of the prefix in source.
>  * Various confusions about the prefix being significant.

There does not seem to be agreement or data to demonstrate just how
significant these "issues" are... to some they're minor, to others
major. I'm not saying it isn't an issue. It certainly is an issue, but
one that was identified as having little impact. RDFa, by design, does
not generate a triple unless it is fairly clear that the author intended
to create one. Therefore, if prefix mappings are not specified, no
triples are generated. In other words, no bad data is created as a
result of a careless cut/paste operation.

The author will notice the lack of triple generation when checking the
page using a triple debugging tool such as Fuzzbot (assuming that they
care).

>  * The problem of generating nice prefixes algorithmically without
> maintaining a massive table of a known RDF vocabularies.

This is a best-practices issue and one that is a fairly easy problem to
solve with a wiki. Here's an example of one solution to your issue:

http://rdfa.info/wiki/best-practice-standard-prefix-names

>  * Negative savings in syntax length when I given prefix is only used a
> couple of times in a file.

The cost of specifying the prefix for foaf, when foaf is only specified
once in a document, is:

len("xmlns:foaf='http://xmlns.com/foaf/0.1/'")
   + len("foaf:")
   - len("http://xmlns.com/foaf/0.1/") == 18 characters

The cost of specifying the prefix for foaf, when foaf is used two times
in a document is:

len("xmlns:foaf='http://xmlns.com/foaf/0.1/'")
   + len("foaf:")
   - len("http://xmlns.com/foaf/0.1/")*2 == -8 characters

So, in general, your setup cost is re-couped if you have more than 1
instance of the prefix in a document... which was one of the stronger
reasons for providing a mechanism for specifying prefixes in RDFa.

>> The reason that we used xmlns: was because our charter was to
>> specifically create a mechanism for RDF in XHTML markup. The XML folks
>> would have berated us if we created a new namespace declaration
>> mechanism without using an attribute that already existed for exactly
>> that purpose.
> 
> The easy way to avoid accusations of inventing another declaration
> mechanism is not to have a declaration mechanism.
> 
> URIs already have namespacing built into their structure. You seem to be
> taking as a given that there needs to be an indirection mechanism for
> declaring common URI prefixes. As far as I can tell, an indirection
> mechanism isn't a hard requirement flowing from the RDF data model.

We did not take the @prefix requirement as a given, it was a requirement
flowing from the web authoring community (the ones that still code HTML
and HTML templates by hand), the use cases, as well as the RDF
community. I would expect the HTML5 LC or CR comments to reflect the
same requirements if WHATWG were to adopt RDFa without support for CURIEs.

> After all, N-Triples don't have such a mechanism.

You are correct - N-Triples do not... however, Turtle, Notation 3, and
RDF/XML do specify a prefixing mechanism. Each do so because it was
deemed useful by the people and workgroups that created each one of
those specifications.

-- manu

-- 
Manu Sporny
President/CEO - Digital Bazaar, Inc.
blog: Bitmunk 3.1 Website Launch
http://blog.digitalbazaar.com/2009/01/16/bitmunk-3-1-website-launch

attached mail follows:



If this was discussed already, sorry. There has been so much RDF/meta
data discussion that I'm far from on top of it..

I'd like some way to add meta data to a page that could be integrated
with the UA's copy/paste commands.

For example, if I copy a sentence from Wikipedia and paste it in some
word processor, it would be great if the word processor offered to
automatically create a bibliographic entry.

If I copy the name of one of my Facebook "friends" and paste it into
my OS address book, it would be cool if the contact information was
imported automatically. Or maybe I pasted it in my webmail's address
book feature, and the same import operation happened..

If I select an E-mail in my webmail and copy it, it would be awesome
if my desktop mail client would just import the full E-mail with
complete headers and different parts if I just switch to the mail
client app and paste.

To make such use cases possible I suppose what we need is
a) some way to embed standardised interchangeable meta data in HTML
(so that users can copy from regular web pages)
b) some support in the UA for figuring out what meta data applies to a
selection and, say, place three alternative formats on the clipboard:
   1) text/plain
   2) text/html
   3) application/metasomething+xml
c) support in other applications for detecting the third format on the
clipboard, parsing and using it. For example, a web application might
use the HTML5 clipboard data API to detect the meta data, parse it
with the UA's XML parser, and figure out if it was data it could make
use of.
Most applications would use *both* the regular text (plain or HTML)
format and the meta data.

Would anyone use this?

I think that actually some of the functionality we would enable here
would be so compelling that users would request it. If, for example,
Wikipedia -> OpenOffice pasting created automatic bibliography entries
users would start asking why Encyclopedia Britannica -> Microsoft Word
did not. If Myspace.com let you copy a selected contact and paste in
some webmail or OS address book, Facebook users would start several
Facebook groups trying to get it "working" there.

-- 
Hallvord R. M. Steen

attached mail follows:



Hallvord R M Steen wrote:
> I'd like some way to add meta data to a page that could be integrated
> with the UA's copy/paste commands.

These use cases are a good start, but the problem is that you've begun 
with the assumption that copy and paste would be a part of the solution.

> For example, if I copy a sentence from Wikipedia and paste it in some
> word processor, it would be great if the word processor offered to
> automatically create a bibliographic entry.

Do you mean a bibliographic entry that references the source web site, 
and included information such as the URL, title, publication date and 
author names?  That could be a useful feature, even if it could only 
obtain the URL and title easily.

Often, when writing an article that quotes several websites, it's a time 
consuming process to copy and paste the quote, then the page or article 
title and then the URL to link to it.  An editor with a Paste as 
Quotation feature which helped automate that would be useful.

HTML5 already contains elements that can be used to help obtain this 
information, such as the <title>, <article> and it's associated heading 
<h1> to <h6> and <time>.  Obtaining author names might be a little more 
difficult, though perhaps hCard might help.

> If I copy the name of one of my Facebook "friends" and paste it into
> my OS address book, it would be cool if the contact information was
> imported automatically. Or maybe I pasted it in my webmail's address
> book feature, and the same import operation happened..

I believe this problem is adequately addressed by the hCard microformat 
and various browser extensions that are available for some browsers, 
like Firefox.  The solution doesn't need to involve a copy and paste 
operation.  It just needs a way to select contact info on the page and 
export it to an address book.  There are even web services that will 
parse an HTML page and output a vCard file that can be imported directly 
into address book programs.

> If I select an E-mail in my webmail and copy it, it would be awesome
> if my desktop mail client would just import the full E-mail with
> complete headers and different parts if I just switch to the mail
> client app and paste.

Couldn't this be solved by the web mail server providing an export 
feature which let the user download the email as an .eml file and open 
it with their mail client?  Again, I don't believe the solution to this 
requires a copy and paste operation.  However, I'm not sure what problem 
you're trying to solve.  Why would a user want to do this?  Why can't 
users who want to access their email using a mail client use POP or IMAP?

-- 
Lachlan Hunt - Opera Software
http://lachy.id.au/
http://www.opera.com/

attached mail follows:



>> I'd like some way to add meta data to a page that could be integrated
>> with the UA's copy/paste commands.
>
> These use cases are a good start, but the problem is that you've begun with
> the assumption that copy and paste would be a part of the solution.

That's not a bug, it's a feature :)

Ian said a while ago that coming up with end-user friendly UI ideas
for the RDF stuff was harder than doing the technical work - implying
that if there are no good UI ideas, browser vendors would not find a
nice way to let users use metadata, and many of the use cases for
embedding it in HTML would not really be feasible.

Thus, this *is* a UI proposal, aiming to show that an operation nearly
*all* users are familiar with could be enhanced with richer ways to
embed meta data in HTML.

>> For example, if I copy a sentence from Wikipedia and paste it in some
>> word processor, it would be great if the word processor offered to
>> automatically create a bibliographic entry.
>
> Do you mean a bibliographic entry that references the source web site, and
> included information such as the URL, title, publication date and author
> names?

Exactly.

>  That could be a useful feature, even if it could only obtain the URL
> and title easily.
>
> Often, when writing an article that quotes several websites, it's a time
> consuming process to copy and paste the quote, then the page or article
> title and then the URL to link to it.  An editor with a Paste as Quotation
> feature which helped automate that would be useful.

It would be great. I hate the clumsy back-and-forward switching to
copy/paste all those bits of information ;-p

> HTML5 already contains elements that can be used to help obtain this
> information, such as the <title>, <article> and it's associated heading <h1>
> to <h6> and <time>.  Obtaining author names might be a little more
> difficult, though perhaps hCard might help.

Indeed. And it's not an either-or counter-suggestion to my proposal,
UAs could fall back to extracting such data if more structured meta
data is not available.

>> If I copy the name of one of my Facebook "friends" and paste it into
>> my OS address book, it would be cool if the contact information was
>> imported automatically. Or maybe I pasted it in my webmail's address
>> book feature, and the same import operation happened..
>
> I believe this problem is adequately addressed by the hCard microformat and
> various browser extensions that are available for some browsers, like
> Firefox.  The solution doesn't need to involve a copy and paste operation.
>  It just needs a way to select contact info on the page and export it to an
> address book.

This is way more complicated for most users. Your last sentence IMO is
not an appropriate way to use the word "just", seeing that you need to
find and invoke an "export" command, handle files, find and invoke an
"import" command and clear out the duplicated entries.. This is
impossible for several users I can think of, and even for techies like
us doing so repeatedly will eventually be a chore (even if we CAN, it
doesn't mean that's the way we SHOULD be working).

Besides, it doesn't really address the "copy ONE contact's
information" use case well.

Also, should any program that wants to support copy-and-paste of
contact information have to support text/html parsing and look for
class="" values? I guess that would be quite some work for the rather
limited functionality microformats gives you. It would be better with
a microformat-aware UA generating a common meta data interchange
format for the clipboard, and from there there it seems a small step
to allow web page authors to embed richer meta data the UA can use to
generate the clipboard meta data, right there in their HTML.

>> If I select an E-mail in my webmail and copy it, it would be awesome
>> if my desktop mail client would just import the full E-mail with
>> complete headers and different parts if I just switch to the mail
>> client app and paste.
>
> Couldn't this be solved by the web mail server providing an export feature
> which let the user download the email as an .eml file and open it with their
> mail client?

Of course, that or POP/IMAP access is the way things currently work.

> Again, I don't believe the solution to this requires a copy
> and paste operation.

..but I think it would be more intuitive and user friendly if
something like that worked. (Or drag-and-drop an E-mail from the
webmail to the desktop client/file system/other webmail, which is
basically the same thing).

> However, I'm not sure what problem you're trying to
> solve.  Why would a user want to do this?  Why can't users who want to
> access their email using a mail client use POP or IMAP?

Granted this use case is a bit more far-fetched (but I know people who
copy E-mails from their Outlook and paste in Windows Explorer! - for
"backing up" or archiving a message they want to keep).

-- 
Hallvord R. M. Steen

attached mail follows:



Hallvord R M Steen ha scritto:
>   
>> HTML5 already contains elements that can be used to help obtain this
>> information, such as the <title>, <article> and it's associated heading <h1>
>> to <h6> and <time>.  Obtaining author names might be a little more
>> difficult, though perhaps hCard might help.
>>     
>
> Indeed. And it's not an either-or counter-suggestion to my proposal,
> UAs could fall back to extracting such data if more structured meta
> data is not available.
>
>   

I think that's a counter-suggestion, instead. If UAs can gather enough 
informations from existing markup, they don't need to support further 
metadata processing; if authors can put enough informations in a page 
within existing markup (or markup being introduced in current 
specification), they don't need to learn and use additional metadata to 
repeat the same informations. It seems that any additional 
metadata-related markup would add complexity to UAs (requiring support) 
but not advantages (with respect to existing solutions) in this case.

Therefore, the question moves to the format to use to move such infos to 
the clipboard, which is a different concern than embedding metadata in a 
page. Also, different use cases should lead to different formats (with 
different kind of informations taken apart in different clipboard 
entries, or binded in a sort of multipart envelop to be serialized in 
just one entry), because a generic format, addressing a lot of use 
cases, could seem overengineered to developers dealing with a specific 
use case, thus a specific format could gain support in other 
applications more easily --- third parties developers could find easier 
and more consistent to get access to the right infos in the right 
format, either by looking for a specific entry (if supported by the OS), 
or by parsing a few headers in a multipart entry looking for an offset 
associated with a mime-type (which would work without requiring support 
by OS's, but an OS could provide facilities to directly access to a 
proper section anyway; however, any support for multiple kinds of infos 
should be in scope for the OS clipbord API and/or the UA, not for a 
specific application requiring specific data - and, given the above, 
that should not be in scope for HTML5).

>>> If I copy the name of one of my Facebook "friends" and paste it into
>>> my OS address book, it would be cool if the contact information was
>>> imported automatically. Or maybe I pasted it in my webmail's address
>>> book feature, and the same import operation happened..
>>>       
>> I believe this problem is adequately addressed by the hCard microformat and
>> various browser extensions that are available for some browsers, like
>> Firefox.  The solution doesn't need to involve a copy and paste operation.
>>  It just needs a way to select contact info on the page and export it to an
>> address book.
>>     
>
> This is way more complicated for most users. Your last sentence IMO is
> not an appropriate way to use the word "just", seeing that you need to
> find and invoke an "export" command, handle files, find and invoke an
> "import" command and clear out the duplicated entries.. This is
> impossible for several users I can think of, and even for techies like
> us doing so repeatedly will eventually be a chore (even if we CAN, it
> doesn't mean that's the way we SHOULD be working).
>   

It can be improved, but it's the _best_ way to do that, and should be 
replicated in the "copy-and-paste" architecture you're proposing. 
Please, consider a basic usability principle says users should be able 
to understand what's going on basing on previous experience (that is, an 
interface have to be predictable); but users aren't confident with 
copying and pasting something different than text (in general), thus a 
UA should distinguish among a bare "copy" option, and more specific 
actions (such as "copy as quotation", "copy contact info", and so on), 
and related paste options (as needed), so that users can understand and 
choose what they want to do.

On the other hand, the same should happen in a recipient application, 
especially if providing support for different kinds of info; if either a 
UA or a recipient application (or both) provided a simple copy and a 
simple paste option (or fewer options than supported, basing on metadata 
or common markup) it could be confusing for users, nor should 
applications use metadata to choose what to do, because the user could 
just want to copy and paste some text (or do something else, but he 
knows what, so he must be free to choose it).

That is, what you're proposing is mainly addressed by moving around 
import/export features to put them into a context menu and making them 
work on a selection of text (not eliminating and substituting them with 
a "simpler" copy-paste architecture), then requiring support by other 
applications and eventually by the operative system, which is definetly 
out-of-scope for any web-related standards (we can constrain web-related 
applications to improve their interoperability with respect to 
web-related features, not generic client-only applications and/or 
operative systems to create a "brand-new" interaction and 
interoperability - and UA implementors wouldn't be happy to implement 
something they know to be incompatible with existing platforms).

> Besides, it doesn't really address the "copy ONE contact's
> information" use case well.
>
>   

Assuming social-networks service provider wanted to support it, I think 
the best way to accomplish this case is to take metadata modelling a 
contact info in a separate, non-html file, so to provide a better 
control on sensible data and enforce privacy, and to expose it as a 
linked resource, accessible with proper rights (and modified server-side 
according to lower- or heigher-level rights). For instance, a nickname 
in a page could be part of an anchor pointing to a homepage, with an 
associated context menu linking to exportable metadata. This would work 
in about every UAs: a compliant one could recognize the metadata format 
while fetching it (eg through its headers, or sniffing its content), 
copy it to the clipboard in an appropriate manner, and notify it to the 
user, or activate a proper plugin associated with a proper mime-type (or 
just ask the user what to do); a non-compliant one could just show it as 
plain text, that users could copy and paste as serialized metadata in an 
application supporting such a format, which could fit the purpose very 
well and thus be largely supported as a contact info interchange format 
(a plugin associated to a mime-type would work in this case too, as well 
as telling the user what to do - save locally, open with an external 
application, convert to something else if possible, and so on).

>   
>>> If I select an E-mail in my webmail and copy it, it would be awesome
>>> if my desktop mail client would just import the full E-mail with
>>> complete headers and different parts if I just switch to the mail
>>> client app and paste.
>>>       
>> Couldn't this be solved by the web mail server providing an export feature
>> which let the user download the email as an .eml file and open it with their
>> mail client?
>>     
>
> Of course, that or POP/IMAP access is the way things currently work.
>
>   
>> Again, I don't believe the solution to this requires a copy
>> and paste operation.
>>     
>
> ..but I think it would be more intuitive and user friendly if
> something like that worked. (Or drag-and-drop an E-mail from the
> webmail to the desktop client/file system/other webmail, which is
> basically the same thing).
>
>   

I think that drag-and-drop would work better in this case, and without 
necessarily requiring a clipboard mechanism (that is, differently from a 
copy-and-paste operation).

For instance, a webmail interface could provide a link (named ".eml 
version" or "get a copy" or "save locally" or the alike) to an eml file 
(that is, to the mail entry in a server database dynamically extracted 
and served with proper headers when queried -- this is yet done, 
somehow, by some webmails). Given that, a user would have two choices:

1) Just follow the link, so that his UA would recognize a non-html 
document and,

* open it through a plugin;
* open it through an external program;
* ask the user what to do, whithin the option to save the file locally;

this is consistent with other similar operations users are confident 
with (such as opening a pdf file), and doesn't require any particular 
further support, neither from the UA, nor from the OS, nor from the 
recipient application. Of course, the user could also right click the 
link and select "save as" (or "save target as", or whatever else it is 
labeled).

2) Drag-and-drop the link to the desktop, or to another application, so 
that either a symlink (or an "internet shortcut" or whatever else it is 
called) or a .eml file (to avoid authentication issues) is created (on 
the desktop, or in a temporary folder, as needed). This can work quite 
fine when dragging to the desktop (it should create a symlink in most 
platforms), and require a little more support to create a .eml file 
(this is yet possible on some platforms always asking what to do with a 
dragged 'object') and/or to improve direct drag-and-drop between 
applications (actually it may not work or produce the same effect as 
copying and pasting the resource address).

In the latter case (direct drag-and-drop), recipient applications could 
recognize the dragged text as a URI and try and open it the same way 
they open a local file and follow a symbolic link, thus reducing 
requirements for OS support; authentication issues could be solved by 
supporting http authentication (or a form-based challange) either in a 
library used by recipient applications, or even in a system library 
(freely implemented by an OS, without any explicit requirement), which 
could be an improvement for generic resources location and access 
(specially when dealing with symlinks), therefore possibly useful in 
other contexts than just enhancing the interoperability between a 
browser and other applications, and therefore more likely to be implemented.

The overall mechanism (normal link to a resource + normal/little 
improved drag-and-drop of the link) should work very fine in most 
platforms and fall back gracefully in other ones, both because in part 
it works fine as is, and in part it would require, globally, fewer work 
to support it than to support a copy-and-paste mechanism based on 
metadata (or an effort which could be useful in different contexts than 
just a rich copy-and-paste between applications, whereas it is 
out-of-scope for html5, and will remain such until some experimentations 
will have been made on some platforms by some UA and non-UA applications).

On the other hand, a copy-and-paste mechanism (working as if saving a 
.eml file, or just putting metadata into the clipboard) would reduce or 
eliminate authentication issues (though constraining a strong OS 
support, which is a hard goal for a web standard), but I think that's 
less usable, all things considered. Let's consider the case of a message 
containing a big attachment, such as a pdf file, which is never 
immediately downloaded by any webmails and IMAP clients: a UA could hang 
over while downloading it as the result of a copy operation, blocking 
the following paste (if not the whole OS clipboard, if locked -- not 
doing so could cause a wrong paste by a user not caring that his browser 
is still working and trying to paste something immediately hence, as 
usually possible). I think that an immediate drag-and-drop, with the 
recipient application (eventually a window manager) handling the 
download is a better solution (after all, that's what usually happens 
when opening a big file from a slow source).

WBR, Alex
 
 
 --
 Caselle da 1GB, trasmetti allegati fino a 3GB e in piu' IMAP, POP3 e SMTP autenticato? GRATIS solo con Email.it http://www.email.it/f
 
 Sponsor:
 Blu American Express: gratuita a vita!
 Clicca qui: http://adv.email.it/cgi-bin/foclick.cgi?mid=8614&d=4-2

attached mail follows:



2009/1/20 Jamie Rumbelow <jamie@jamierumbelow.net>:
> I think that the already available solution to your problem are Microformats
> - you are essentially embedding metadata, semantically in HTML.

Of course, but I think your comment misses half of the proposed
solution.. namely what format the UA puts the information on the
clipboard in.

If you say microformats is the solution, I assume you mean UAs should
put HTML fragments with microformat-type attributes and values (mainly
class) on the clipboard as text/html, and applications that were
targetted by a paste operation should have HTML parsers and implement
support for specific microformats.
Which is why you added:

> Beside this, the applicability is rather specific - every application would
> need built in support and every website would have to markup the data in a
> specific way to support the application's format.
>
> This could get far too confusing and complicated...

It would not necessarily need support from the website - the UA could
have some logic to create associated meta data (URL, title, possibly
author from <META> tags though that wouldn't be very reliable) for the
bibliographic stuff if the page did not contain more specific meta
data for this purpose.

With Facebook I could write a Facebook application to generate the
meta data format - Facebook would not really need to support this.

With any other website I could add a User JavaScript or Greasemonkey
script that was aware of that site's markup and could extract the
information in a site-specific way and make it available to the UA as
HTML-embedded meta data..

-- 
Hallvord R. M. Steen

attached mail follows:



Hallvord R M Steen wrote:
> 2009/1/20 Jamie Rumbelow <jamie@jamierumbelow.net>:
>> I think that the already available solution to your problem are Microformats
>> - you are essentially embedding metadata, semantically in HTML.
> 
> Of course, but I think your comment misses half of the proposed
> solution.. namely what format the UA puts the information on the
> clipboard in.

Determining how one application passes information via the clipboard to 
another application seems very much out of scope of HTML.  If there was 
such a method available, then we could investigate how to obtain the 
relevant semantics from the document.  But we can't do that until there 
is some clipboard format available for this purpose that other 
applications can understand.

I doubt that it would be possible to create some generic format that 
would be suitable for such a wide range of use cases.  For an address 
book application, the most sensible approach would be to add a vCard 
format (text/directory;profile=vcard) to the clipboard.  Given that many 
address books already support the vCard file format, it's not such a 
stretch to believe that they could read the data in that format from the 
clipboard.  The problem is getting them to support an import from 
clipboard feature.

However, other use cases, like pasting a quote into a word processor 
complete with bibliographic information, would need an entirely 
different format.

-- 
Lachlan Hunt - Opera Software
http://lachy.id.au/
http://www.opera.com/

attached mail follows:



>> Of course, but I think your comment misses half of the proposed
>> solution.. namely what format the UA puts the information on the
>> clipboard in.
>
> Determining how one application passes information via the clipboard to
> another application seems very much out of scope of HTML.

If we keep considering clipboard support "out of scope" it means web
applications will continue to SUCK at copy/paste support. Copy&paste /
drag&drop is a UI workhorse the Web and its applications generally
can't take much advantage of now. We should do something about that
(where "we" is not necessarily the WHATWG/HTML5 WG, it might also and
possibly more likely be a WebApps WG task - I don't care where it's
done as long as it is done.).

>  If there was such
> a method available, then we could investigate how to obtain the relevant
> semantics from the document.  But we can't do that until there is some
> clipboard format available for this purpose that other applications can
> understand.
>
> I doubt that it would be possible to create some generic format that would
> be suitable for such a wide range of use cases.

That, of course, is what the RDF people claim to be doing. Whether it
makes sense and would get used I have no idea, but implementing some
rudimentary support for putting some RDF-markup on the clipboard and
retrieving it would let the Web have a go at figuring out if it IS
usable for information exchange, and shouldn't take too much work if
the generic clipboard API is in place. That's why I like this idea -
from my naturally browser-vendor-centric perspective :-)

> For an address book
> application, the most sensible approach would be to add a vCard format
> (text/directory;profile=vcard) to the clipboard.

I assume you'll answer the "where should the UA find the structured
information in order to place it on the clipboard" question with
"vCard microformat".

-- 
Hallvord R. M. Steen

attached mail follows:



I spent some time last week on the phone with Ian Hickson (Editor of the
HTML5 spec) and some time today with Henri Sivonen (Mozilla Foundation's
representative for the HTMLWG and lead developer on the HTML5 validation
suite).

The goal of the phone calls were to help clear up some misconceptions
that I believed had been brewing in both work groups for a while. What
follows is a summary of the discussions. I don't necessarily agree with
anything below, but am passing it on so that there is a better
understanding of what the WHATWG is requesting and the major issues that
they see with RDFa as it stands right now.

The discussion with both Ian and Henri was very pleasant, friendly, and
rational - both were able to clearly outline the current issues that
they saw with RDFa and proposed a number of fairly straight-forward ways
of moving the RDFa in HTML5 discussion forward. I trust that both of
them will correct any mistakes that I make in conveying the gist of what
they said.

Discussion with Ian Hickson
---------------------------

Ian's primary message was that the RDFa proponents have not clearly
outlined a list of use cases and why such use cases should be supported
in HTML5. He is concerned that without this basic information, which is
required for most new HTML5 features, it becomes difficult for him to
argue that there is an empirical basis for placing RDFa into HTML5.

I have started to gather the list of use cases that started RDFa as well
as the list of use cases that have sprung up around RDFa over the past
several months (add more if you know of cases in the wild):

http://www.rdfa.info/wiki/rdfa-use-cases

Some of Ian's other issues included:

- He was concerned that a generalized solution such as RDF can always be
solved with a much more specific vocabulary and mark-up solution (such
as HTML5 or Microformats) and that the more specific mechanism for
semantics expression should be favored and supported. For example, there
is an <article> tag in HTML5, which is favored because it expresses
semantics and authors would probably be more prone to using it because
they can associate CSS with the element, where they cannot do the same
with the RDFa attributes. The same would apply for the <audio> and
<video> elements in HTML5. I believe that would prefer to focus on
allowing a markup mechanism that supported RDF as well as other
semantics expression mechanisms - IF, it can be shown that it is
desirable for authors to embed semantics in web pages.

- If a decision had to be made today, Ian believes that it would go to a
vote which would be dangerous for both WHATWG and RDFa because the
outcome would effectively be random. Nobody really knows what the 300+
members of WHATWG, ~60 of which vote on a regular basis, think about
RDFa. It would be safe to assume that a significant portion do not know
enough about RDFa to vote on it.

- He re-iterated that he does not have anything major against RDF or the
idea of semantics expression in HTML5. He does have certain issues with
RDFa, but would like to see the use cases that we worked from in order
to better understand the problem domain. He seemed sincere on working
towards a solution of some kind.

The discussion with Henri Sivonen will be in the next e-mail.

-- manu

-- 
Manu Sporny
President/CEO - Digital Bazaar, Inc.
blog: Bitmunk 3.1 Website Launch
http://blog.digitalbazaar.com/2009/01/16/bitmunk-3-1-website-launch

attached mail follows:



Discussion with Henri Sivonen
-----------------------------

Henri had two major disagreements with RDFa as it stands right now.

CURIEs
------

Henri stated that the use of xmlns: will break the DOM Consistency
Principle between XHTML and HTML. I let him know about @prefix, which
would address the DOM Consistency Principle issue, but would still not
be good enough in his mind. He is most concerned with HTML authors
cutting and pasting snippets of RDFa that they believe to contain
triples, but when pasted, fail to generate any because the prefix
mappings were declared at the top of a page.

He was also concerned that authors would get frustrated when their
cut/paste RDFa did not produce the same triples that they saw on website
X... or worse, they wouldn't check it and would see no benefit by
embedding RDFa in their website.

To address this issue, he (and Dan Brickley) believe that allowing
@typeof, @property, @datatype and @rel/@rev to accept URIs would address
this issue with the people in WHATWG that don't like CURIEs. They
believe CURIEs are too fragile as implemented and it would be better to
specify all URIs fully instead of providing a prefixing mechanism.

Non-expert usage and RDFa Education
-----------------------------------

Education about RDFa would also be an issue with the majority of web
authors who don't care about web semantics and just want to get their
page operational.

He was concerned that authors who once used rel="license" are now being
asked to embed more complex metadata (such as Creative Commons licenses)
without truly grasping the subtleties of doing so (attaching them to the
wrong subject). This is an issue because bad semantic markup doesn't
generate the same sort of jarring page display issue that bad HTML
markup does.

The problem is with less-than-guru web authors who don't necessarily
care about web semantics and thus generate bad semantic data out of
ignorance. The common mis-use of @rev was cited as one possible outcome
- mis-used so badly that it is commonly not trusted by search engines.

Vocabulary Scalability
----------------------

He also did not believe in follow-your-nose to be a very useful concept
and noted that even if we continue down the @prefix and
Microformats-like RDFa markup routes, with special tokens/reserved words
specified in a separate file, it could cause a scalability issue when
vocabularies find themselves quite popular. The example of the W3C
serving up many, many gigabytes of the same HTML4.01 DTD every day was
cited as an example of what happens when your "vocabulary" becomes popular.

He was concerned that by requiring parsers to load references from a
remote file that one would either put the burden on web authors to stash
those files on their web servers or put the burden on vocabulary authors
to ensure that their vocabulary document can be transmitted millions of
times a month.

He cited that using something of the form of a pseudo-namespaced
"foaf-foo" where each token was specified in a spec somewhere, but there
was no way to follow your nose to it, or validate against it, would
solve the "failure-due-to-popularity" issue.

Henri was certainly sympathetic to embedding semantics in HTML for
everyone that needed the functionality (not just the 80% that
Microformats addresses) in HTML. He believes that removing CURIEs would
go a long way towards addressing his concern with the way RDFa is
currently implemented.

-- manu

-- 
Manu Sporny
President/CEO - Digital Bazaar, Inc.
blog: Bitmunk 3.1 Website Launch
http://blog.digitalbazaar.com/2009/01/16/bitmunk-3-1-website-launch


attached mail follows:




Manu, Henri,

I appreciate the effort of this discussion, though I disagree with much
of Henri's points (as I have in the past). More importantly, I think a
number of these suggestions would do significant damage to the effort of
embedding semantics in HTML, and to at least one important web design
principle. And *most* importantly: the time for finding compromise on
issues of personal taste has come and gone.

I'm bothered by this desire to redesign based on little evidence. The
idea of specifically *not* allowing follow-your-nose flies in the face
of much of W3C's work and the recent TAG publication on the
self-describing web. High load on a W3C web server (due to poor
implementations) is not evidence enough to undo a major design principle
of web architecture.

(I can certainly agree with issuing some implementation guidelines that
say "don't de-reference unless you need to.")

On the issue of cut-and-paste: Creative Commons is, to my knowledge, the
biggest publisher of RDFa, and we haven't had much trouble getting users
to copy and paste proper RDFa. It's also been no problem getting folks
to add more complex ideas, like attribution name and URL (in fact, many
are pressing us to add more to our vocabulary, and we're being very
careful to do that only after serious consideration.)

The use case where someone copies and pastes partial HTML+RDFa from
someone's existing web site and gets upset doesn't ring true: the same
"problem" occurs in a much worse way with CSS, and no one seems to be
too upset about that. In addition, in a lot of cases where it *would*
make sense to copy and paste a chunk of HTML from one site to another
(Creative Commons, widgets, etc...), the prefixes are declared in the
same block of markup anyways.

It appears it mostly comes down to:

> Henri was certainly sympathetic to embedding semantics in HTML for
> everyone that needed the functionality (not just the 80% that
> Microformats addresses) in HTML. He believes that removing CURIEs would
> go a long way towards addressing his concern with the way RDFa is
> currently implemented.

Removing CURIEs is not an option at this point, given the existing
standard, the existing deployment (by folks including Yahoo),
backwards-incompatibility, and the lack of evidence for needing such a
change at this point.

If we were only a few months into designing RDFa with no implementations
or deployments, this discussion would make sense, as would some attempt
at finding a compromise based on personal taste.

But at this point, one has to present significant evidence of harm to
undo what otherwise seems to be working just fine.

-Ben

PS: Note that I do agree on DOM consistency, and I suspect @prefix will
fix that issue, as Manu mentioned. I've mentioned this to Henri in prior
conversations, I believe.

attached mail follows:




Ian,

In [1] you asked, quite rightly, for 'problem statements' re RDFa. I've
pointed out two (IMHO important) ones at [1] which you *might* have
overlooked. I'd be happy to learn from you if you think these are
'acceptable':

1. Service and product provider can't include the meaning of the things they
publish in HTML. For example, how do you find out where the price of a book
is located in, say, a page from Amazon? Now, people that want to use this
data are forced to perform *screen scraping*, that is, there is a need for
publisher-push rather than consumer-pull semantics.

2. People doing data mash-ups need to learn a plethora of APIs/formats while
all they would likely want is *one data model* + and a bunch of vocabularies
covering the domain.

Cheers,
      Michael

[1] 
http://realtech.burningbird.net/semantic-web/semantic-markup/stop-justifying
-rdfa

-- 
Dr. Michael Hausenblas
DERI - Digital Enterprise Research Institute
National University of Ireland, Lower Dangan,
Galway, Ireland, Europe
Tel. +353 91 495730
http://sw-app.org/about.html
http://webofdata.wordpress.com/


> From: Jeremy Carroll <jeremy@topquadrant.com>
> Date: Fri, 13 Feb 2009 12:47:55 -0800
> To: 'Manu Sporny' <msporny@digitalbazaar.com>, 'Ian Hickson' <ian@hixie.ch>,
> 'RDFa mailing list' <public-rdf-in-xhtml-tf@w3.org>
> Cc: 'Sam Ruby' <rubys@intertwingly.net>, 'Dan Brickley' <danbri@danbri.org>,
> 'Michael Bolger' <michael@michaelbolger.net>, <public-rdfa@w3.org>, 'Tim
> Berners-Lee' <timbl@w3.org>, 'Dan Connolly' <connolly@w3.org>
> Subject: RE: RDFa and Web Directions North 2009
> Resent-From: <public-rdfa@w3.org>
> Resent-Date: Fri, 13 Feb 2009 20:48:39 +0000
> 
> 
>> Ian Hickson wrote:
>>> To reiterate: I have approached and been approached by a number of people
>>> in the RDF and RDFa communities, and I have repeatedly asked for people to
>>> list problems and use cases that are of relevance in the context of RDFa
>>> and RDF, with those problem descriptions not mentioning RDFa or other
>>> technical solutions. So far we have seen *very few* of these.
>> 
>> We have been gathering a complete list of Use Cases for the HTML5+RDFa
>> discussion here:
>> 
>> http://rdfa.info/wiki/rdfa-use-cases
>> 
>> There are 18 use cases so far, not including the ones from the W3C. If
>> you have one that is not listed on that page, please add it to the wiki,
>> or notify the public-rdfa@w3.org mailing list so that we can add it to
>> the wiki.
>> 
> 
> 17 actually, numbered from 2 to 18
> 
> 2, 6, 15  : mentions RDF in the problem description (I think irredeemably -
> i.e. these are just too techy to be meaningful use cases)
> 
> 11, 12, 13, 14, 16: while being RDFa specific on the surface could easily be
> reworded not to be.
> I believe such rewording would be beneficial.
> 
> So I make it less than 10 that meet Ian's not unreasonable requirements, which
> could be increased to about 14.
> 
> Jeremy
> 
> 
> 
> Snapshot of wiki numbering:
> Contents
> 
>     * 1 Introduction
>     * 2 Resource List Management Tool for Undergraduate Students
>     * 3 Yahoo! SearchMonkey
>     * 4 Creative Commons Rights Expression Language
>     * 5 Bitmunk - An Open, Digital Media Commerce Standard
>     * 6 Fuzzbot Semantic Processor
>     * 7 Basic Structured Blogging
>     * 8 Publishing an Event
>     * 9 Content Management Metadata
>     * 10 Self-Contained HTML Fragments
>     * 11 Web Clipboard
>     * 12 Semantic Wiki
>     * 13 Augmented Browsing for Scientists
>     * 14 Advanced Data Structures
>     * 15 Publishing a RDF Vocabulary
>     * 16 Extending an XML language by flexible metadata
>     * 17 General Mechanism for Assigning ISO Codes to Object Data
>     * 18 Enhancing a User-Agent's copy/paste operations with meta data
> 
> 
> 
> 


attached mail follows:



On Fri, 13 Feb 2009, Kjetil Kjernsmo wrote:
> 
> I can't speak for the RDFa community, but the reason you can't see a lot 
> of problem descriptions separate from technical solution is probably 
> that the community feels that RDF is a well established technology, and 
> so the focus is on showing how it is used rather than abstract 
> speculation on how it could be used.

It's certainly well-established in certain circles, but it's unfortunately 
the case that technologies have to rejustify themselves each time they 
enter new areas. RDFa isn't currently that well established as a general 
authoring language, and most authors haven't interacted with RDF knowingly 
at all.

RDF is not unique in this regard, by the way. HTML itself has had to 
reprove itself many times; currently there are many developers who are 
trying to decide what language to use to develop their next application, 
and HTML is a new contender in that race. Ten years ago, few people 
outside of the cutting-edge browser space would have thought to make an 
application in HTML, but after making its case to developers, it is now a 
seriously considered option.


> As for RDF use cases, please see e.g. 
> http://www.w3.org/2001/sw/sweo/public/UseCases/

There is a significant difference between case studies (examples of actual 
usage) and problem descriptions (examples of actual problems that might 
lead or might have led to usage). To be blunt, the existence of something 
using a technology is not an indication that the technology was a good 
solution. It can, however, lead to very useful experience: do any of the 
case studies listed above have frank evaluations of whether Semantic Web 
technologies have been successful? Most interesting would be reports from 
failed experiment -- the Semantic Web, like any technology, is not going 
to be right for everything; to what has it been found to _not_ be well 
suited? (The existence of reports showing failure increases the 
credibility of reports showing success.)


On Fri, 13 Feb 2009, Michael Hausenblas wrote:
> 
> In [1] you asked, quite rightly, for 'problem statements' re RDFa. I've 
> pointed out two (IMHO important) ones at [1] which you *might* have 
> overlooked. I'd be happy to learn from you if you think these are 
> 'acceptable':
> 
> 1. Service and product provider can't include the meaning of the things 
> they publish in HTML. For example, how do you find out where the price 
> of a book is located in, say, a page from Amazon? Now, people that want 
> to use this data are forced to perform *screen scraping*, that is, there 
> is a need for publisher-push rather than consumer-pull semantics.
>
> 2. People doing data mash-ups need to learn a plethora of APIs/formats 
> while all they would likely want is *one data model* + and a bunch of 
> vocabularies covering the domain.
> 
> [1] http://realtech.burningbird.net/semantic-web/semantic-markup/stop-justifying-rdfa

I hadn't seen your comment (though I had noted some of the other comments 
from that blog entry), but I have now added it to my list, thanks.


It should be noted that there are pretty simple solutions to both of the 
above, though. For example, for case 1 Amazon could just say "anything 
with class=price indicates the price for the item described by the nearest 
ancestor block with class=item" or some such, or they could expose the 
information in a much simpler way by having a "&format=json" mode for 
their pages that is purely machine-readable data. Or they could do what 
they in fact do do, which is expose this using a dedicated API:

   http://docs.amazonwebservices.com/AWSEcommerceService/2006-05-17/ApiReference/ItemLookupOperation.html

What we find is that in fact RDFa would not solve their problem here, 
since they apparently feel they need (for whatever reason) to track 
per-developer usage of this information. Thus, they require a unique URI 
to be used for each developer obtaining the information, and would 
presumably therefore not _want_ to expose it on their main product page.


With the case 2, I don't see how forcing all data into one data model 
actually helps anybody. If you want to merge file system metadata, then 
you want a tree structure. If you want to merge family history data, you 
want a directed graph. If you want to perform a scripted operation on a 
set of binary files, then some a script object and a dictionary mapping 
filenames to binary blobs is probably most useful.

The difficulty with dealing with data from multiple sources is rarely the 
data format (a problem not solved by RDF anyway) or the data model, it's 
usually with the semantics of the vocabularies involved. For example, 
merging MP3/ID3 data (dedicated vocabulary with dedicated format embedded 
in MP3 files) with an iTunes library data dump (dedicated vocabulary with 
XML format) would not be easier if they were both expressed as RDF using 
different vocabularies. If anything, frankly, the problem would get 
harder. One is reminded of jwz's infamous quip about regular expressions.

This isn't to say that RDF doesn't have its uses, of course it does. The 
question is what are they, and do they justify adding syntax to HTML.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

attached mail follows:



Ian Hickson wrote:
> To reiterate: I have approached and been approached by a number of people 
> in the RDF and RDFa communities, and I have repeatedly asked for people to 
> list problems and use cases that are of relevance in the context of RDFa 
> and RDF, with those problem descriptions not mentioning RDFa or other 
> technical solutions. So far we have seen *very few* of these.
>
>   
Ian,

I am quite relieved by your response.

Here is an attempt at a use-case:

When writing HTML (by hand or indirectly via a program) I want to 
isolate at describe what the content is about in terms of people, 
places, and other real-world things. I want to isolate "Napoleon" from a 
paragraph or heading, and state that the aforementioned entity is:  is 
of type "Person" and he is associated with another entity "France".

The use-case above is like taking a highlighter and making notes while 
reading about "Napoleon". This is what we all do when studying, but when 
we were kids, we never actually shared that part of our endeavors since 
it was typically the route to competitive advantage i.e., being top 
student in the class.

What I state above is antithetical to the essence of the World Wide Web, 
as vital infrastructure harnessing collective intelligence.

RDFa is about the ability to share what never used to be shared. It 
provides a simple HTML friendly mechanism that enables Web Users or 
Developers to describe things using the Entity-Attribute-Value approach 
(or Subject, Predicate, Object) without the tedium associated with 
RDF/XML (one of the other methods of making statements for the 
underlying graph model that is RDF).



-- 


Regards,

Kingsley Idehen	      Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO 
OpenLink Software     Web: http://www.openlinksw.com





attached mail follows:




Hi Ian,

This is approximately the 5th time we get to this point in the debate,
so let me attempt a different response to see if we can un-wedge
ourselves :)

> | The system uses RDFa to speed up user interaction when editing 
> | structured data. Instead of communicating with a remote server, the 
> | WSWYIG editor uses a direct manipulation based on RDFa and updates the 
> | server with the generated RDF graph only when the interaction finishes.
> 
> It's not clear here what makes RDF any more interesting to the solution 
> than, say, JSON, or SQL, or XML.

If we were only trying to solve the "how do I find better music online?"
problem, we probably would come up with a much more domain-specific
approach than RDFa.

If we were only trying to solve the "how do I express copyright
licensing on videos" problem, ditto.

And so on with each individual use case taken *in isolation*.

But when you take all of these use cases together, it become clear that,
if we generally had a way to express structured data within existing
HTML pages, using a universal data parser, with the ability to mix and
match properties such as licensing, bit rate, title, etc... then these
individual problems would be much easier to solve based on this one
underlying technology.

So, can we look at the use cases as a whole?

> The SearchMonkey problem description, while not really phrased as a 
> problem, is much closer to the kind of thing I'm looking for in order to 
> evaluate the proposals here.

Yes, because SearchMonkey is a very particular use case that inherently
leverages the multi-purpose capabilities of RDFa.

> The ccREL use case description doesn't describe the problem. It just 
> describes what ccREL is. It would be helpful if the problem was actually 
> explained, e.g. "Authors need a way to make sure that their content reuses 
> other content only in the manner allowed by that other content".

>From the Background Section (Section #2) of ccREL:

"""
Simple programs should thus be able to answer questions like:

  * Under what license has a copyright holder released her
    work, and what are the associated permissions and
    restrictions?

  * Can I redistribute this work for commercial purposes?

  * Can I distribute a modified version of this work?

  * How should I assign credit to the original author?
"""


-Ben

attached mail follows:



-cc: TimBL, Michael Bolger

Ian Hickson wrote:
> This would be much better material to have on this page:
>    http://rdfa.info/wiki/Rdfa-use-cases
> ...if that page is intended to show what problems RDFa solves.

That is the eventual goal - still organizing the page, which has been
updated to reflect the conversation yesterday and today (6 new use cases):

http://rdfa.info/wiki/rdfa-use-cases

The use cases will be modified to remove RDF/RDFa in the problem
descriptions. We should probably have a section for each
problem/use-case that states "This is how it is solved in RDFa" and
"This is how it is solved using current web technologies". There should
probably be a pro/con section for each approach as well, just to point
out the high-level difficulties with each approach.

Another Q/A page has been started concerning questions surrounding RDFa,
including common red-herrings and repeated discussions (with answers)
that can be avoided when discussing what RDFa can and cannot do:

http://rdfa.info/wiki/common-arguments-against-rdfa

Help in filling out both pages would be greatly appreciated.

-- manu

-- 
Manu Sporny
President/CEO - Digital Bazaar, Inc.
twitter: http://twitter.com/manusporny

attached mail follows:



On Feb 14, 2009, at 01:57, Mark Birbeck wrote:

> You seem to be implying that there is a fundamental impediment to
> creating an RDFa parser using the tools available in an HTML DOM. You
> base this assertion on Henri's document, but all his script shows is
> that objects in an HTML DOM don't have namespace information
> available.
>
> That's no surprise.
>
> My response is that this is irrelevant.

  1) Content consumer software should work both with HTML (text/html)  
and XHTML (application/xhtml+xml) if it works with one of them.

  2) For sane *software* architecture, code above the HTML/XML parsing  
layer should be able to run its dispatch code without any conditional  
branches on the HTMLness or XMLness of the origin of the data it is  
operating on. This applies to native browser code, JavaScript code  
running in a browser and non-browser (X)HTML consumers. (Even easy- 
looking tiny variations add up.)

  3) The point above is not about abstract XML architecture. It is an  
actual way of implementing software including (but not limited to)  
Gecko, WebKit, Presto (as far as can be guessed without seeing the  
code) and Validator.nu. Furthermore, the dominant design (http://en.wikipedia.org/wiki/Dominant_Design 
) of HTML5 parsers for non-browser applications is that they expose an  
XML API so that the application-level code is written as if working  
with an XML parser parsing an equivalent XHTML5 file.

  4) The qname is an artifact of the Namespaces in XML layer in XML  
and should not be significant to the application. The correct way to  
do namespace-wise correct dispatch is to dispatch on the  
[namespace,local] pair. If you are inspecting the qname of an  
attribute or element for any reason other than round-tripping  
serialization, you are Doing it Wrong.

  5) Given the points above, you should also do dispatch on the  
[namespace,local] pair on the HTML side.

  6) All features going into HTML5 should be robust and sane under  
scripting even if the people proposing the feature where interested in  
read-only use case is outside browsers. This includes keeping script- 
generated DOMs serializable.

  7) If, in order to satisfy point #2 above, your feature requires  
using getAttribute (without NS) on getting but setAttributeNS (with  
NS) on setting (to keep the XML DOM serializable!), your feature isn't  
satisfying point #6.

  8) So far, experience shows that even violations all of the above  
points that look small--such as lang vs. xml:lang--are more hurtful  
than people imagine at first. Examples:
   a) Browsers need to inspect two attributes instead of one to  
discover the language.
   b) To abstract problem a) away in non-browser applications in high- 
performance (in terms of CPU instructions executed per application- 
made query for an attribute) manner, the static RAM footprint of the  
Validator.nu HTML Parser is bloated by pointer size times 2328!
   c) The lang & xml:lang part of the HTML5 spec has had the highest  
incidence of validator bugs per spec sentence. (Bugs are bad and  
costly.)
Hence, all violations all the above points should be taken very  
seriously even if in isolation on their face the violations seemed  
ridiculously small to be indignant about. Violations for xml:lang  
legacy are somewhat excusable. Introducing new violations isn't.

  9) If you are defining something in terms all of the namespace  
mapping context, but you can't use DOM Level 3 lookupPrefix() to  
implement it (without violationg point #2), you are Doing it Wrong.

10) Browsers aren't the only kind of Web content consumer software.  
What you are specifying should work with XML API environments other  
than the browser flavor of DOM.

11) SAX2--arguable the most correct and complete XML API there is-- 
when run in the Namespace-aware mode (i.e. the correct mode  
considering contemporary XML architecture) doesn't expose the  
namespace declarations as attributes. Therefore, a SAX2-based RDFa-in- 
XHTML consumer needs to use the non-attribute abstraction  
(startPrefixMapping()) for gathering the namespace mapping context.  
However, the same application-level code (see point #2) wouldn't work  
with an HTML5 parser that implements mapping from text/html to SAX2 as  
defined today in the HTML 5 draft and as sufficient for all the HTML5  
features drafted so far.

12) XOM--arguable the most correct of the well-known XML tree APIs for  
Java--doesn't expose the namespace declarations as attributes.  
Therefore, a XOM-based RDFa-in-XHTML consumer needs to use the non- 
attribute abstraction for using the namespace mapping context.  
However, the same application-level code (see point #2) wouldn't work  
with an HTML5 parser that implements mapping from text/html to XOM as  
defined today in the HTML 5 draft and as sufficient for all the HTML5  
features drafted so far. (XOM even disallows including attributes  
names xmlns:foo in the tree.)

13) If points 9 through 12 were addressed by changing HTML5 parsers to  
expose attributes called xmlns:foo as namespace mapping context, the  
change HTML5 to enable RDFa would be notably more complex than just  
adding a few attributes.

> An RDFa parser needs to be able to 'spot' whether an attribute name
> begins 'xmlns:', but for that we don't need namespace support -- it's
> just string matching, no different to detecting an attribute like
> @data-length [1].

getAttributesNS(null, "data-length") works consistently in text/html  
and application/xhtml+xml.

>> And I wrote that "HTML parsing rules differ in visible ways from  
>> XHTML.
>> Ways that affect the specific names of attributes chose[sic] in  
>> RDFa."
>
> But the attributes in RDFa are not prefixed -- @about, @resource,
> @datatype and @content are new attributes, whilst @rel, @rev, @href
> and @src already exist -- so I don't see in what way the names were
> 'chosen' in a way that was influenced by XHTML.

Thank you for not prefixing the attribute names. However, you did to  
make the attribute values sensitive to the namespace mapping context.

>> A list of the parsers alluded to above would be helpful as an  
>> existence
>> proof for the above assertion.
>
> I think you have this the wrong way round.
>
> The parsing algorithm for RDFa refers to attributes and elements,
> navigated by recursively traversing the hierarchy. It's therefore
> applicable to anything that has such a hierarchical structure, and
> that allows attribute values to be retrieved. Both HTML and XHTML DOMs
> fit this description.

But do they fit the description with the exact same above-parser code?  
(See my point #2 above.)

> So I'd like to see a proof that shows that this simple architecture
> makes it impossible to create an RDFa parser on top of an HTML DOM.
> Henri has not provided a proof of anything other than that an HTML DOM
> doesn't support namespaces, yet for some reason this 'non-proof' gets
> circulated as fact.

It is not circulated as proof that you can't implement an RDFa parser  
on top of an HTML DOM. It is circulated as proof that you can't  
implement an RDFa parser that a) works without conditional branches on  
HTMLness/XMLness and b) without violating Namespace-wise correct  
coding practices on c) *both* HTML and XML parser output.

>> Your recent statement that "I can assure you that the parsing rules  
>> were
>> very explicitly written in such a way that the only thing they  
>> require to do
>> their work is a hierarchy of nodes, and the ability to obtain the  
>> value of
>> an attribute.", while technically true, tends to obscure more than  
>> reveal
>> when it comes to these differences.
>
> Again...what differences? I'm still confused as to what it is that
> we're being different to.
>
> Just in case what you are getting at is that there is somehow a
> difference between parsing RDFa in XHTML and parsing RDFa in HTML, I
> can only say again that there isn't -- there is only one parsing
> algorithm in RDFa.

See my points 9 through 12 above.

Do the existing RDFa parsers run different code (i.e. taking different  
branches) above the HTML and XML parsers?

Obviously, you can make an RDFa parser for text/html if the API the  
parser exposes violates the Infoset or differs from browser behavior  
and you run different code for expanding CURIEs in the text/html and  
application/xhtml+xml cases or you run Namespace-wise bogus code for  
the XML case.

>> Actually, I say differences.  I only have an existence proof for one
>> difference at the moment.  Is there more?  Beats me.  Hence my  
>> assertion
>> that a definitive list would be helpful.
>
> As I said, the "existence proof" of which you speak (Henri's one),
> proves only that namespace properties do not exist in an HTML DOM,
> whilst they do in an XHTML DOM.
>
> That's very different from being an "existence proof" that there are
> two (or more) algorithms for parsing RDFa in a DOM, since RDFa does
> not require namespaces per se.

Again, points 9 through 12 above.

> The only reason I entered this debate was to clarify the single point
> that you made, propagating Henri's false claim -- that since the HTML
> DOM does not provide namespace information, it is therefore not
> possible (or 'more difficult') to create an RDFa parser.

If you violate point #2, you make things more difficult. By how much?  
See point #8.

This problem can be addressed by using absolute URIs instead of CURIEs  
and phasing out CURIEs by declaring xmlns:http="http:" on the XML side  
during the transition. (If that makes the predicates annoyingly long,  
what you have is a fundamental problem with the idea of using URIs as  
identifiers as opposed to using them for application-level addressing  
on the Internet. In that case, you should address that problem  
directly on the level of the RDF model instead of trying to push the  
annoyance around syntactically.)

If you wish to get new features added to HTML5 and the proposed syntax  
depends on element or attribute names that contain the colon  
(xmlns:foo in this case), you are just asking for trouble because the  
colon is special in XML but not in text/html (and if you ask making it  
special in text/html, too, you are asking more than just adding a few  
attributes).

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/



attached mail follows:



-cc: A whole load of folks.

Ian, It's going to take me multiple days to respond to this e-mail. So
this is part 1 of N.

Ian Hickson wrote:
> ...the process needs to be:
> 
>  1. Find problems.
>  2. Propose solutions that solve one or more of those problems.
>  3. Evaluate the solutions against each problem.
>  4. If a solution is found that addresses many of the problems, adopt it.

I don't think that we're in dis-agreement on the process. The
disagreement, I believe, is whether or not RDFa meets the criteria for
step #4. Here's what I think we have so far.

RDFaTF/SWDWG/XHTML2:

1. Agrees on process listed above (in general).
2. Understands the details of the use cases that RDFa is meant to
   address.
3. Believes that RDFa is currently the best solution for addressing all
   of the use cases.

WHATWG/HTML5:

1. Agrees on the process listed above (in general).
2. Does not have enough data on the use cases that RDFa is meant to
   address.
3. Cannot know if RDFa is currently the best solution for addressing all
   of the use cases until they are detailed.

Are those fair statements?

Documenting use cases will help us move this forward.

> That is, the use cases have to be used both at the start of the process 
> _and_ at the end of the process. Otherwise, we risk ending up with 
> something that doesn't actually solve any of the use cases we were 
> attempting to solve.

Personally, I have no objection to this as it is a good exercise. Others
might think differently - certainly not speaking on behalf of anybody
but myself.

>> Using the same principle, we also future-proof our work. At CC, we're 
>> not sure what other fantastic media will appear next. 3D video? Full 
>> virtual reality? Who knows. But when those come out, with their custom 
>> attributes to describe properties we don't even know about yet, we'll 
>> still be able to use the same RDF and RDFa to express their licensing 
>> terms, and the same parser to pick things up.
> 
> Personally I prefer to address today's problems today and tomorrow's 
> problems tomorrow, so that as we meet new problems, they are addressed 
> with surgical precision, rather than trying to come up with systems that 
> can solve everything forever. But again, to each his own.
> 
> This line of argumentation (that we should design systems that solve all 
> future needs, whether forseeable or not) is also not convincing to me.

Hrm, I believe that both you and Ben are in strong agreement on this key
point.

Ben was alluding to the design principle which states that the semantic
web should be extensible and that nobody should be in control of the use
of a vocabulary. It's an extensibility argument - the same kinda thing
that data-* is meant to solve in HTML5.

> On Fri, 13 Feb 2009, Ben Adida wrote:
>> [...] we're not asking browsers to implement any specific features other 
>> than make those attributes officially available in the DOM.
> 
> You presumably do want some user agents some where at some time to do 
> something with these triples, otherwise what's the point? Whether this is 
> through extensions, or through browsers in ten years when the state of the 
> art is at the point where something useful can be done with any RDFa, or 
> through search engines processing RDFa data, there has to be _some_ user 
> agent somewhere that uses this data, otherwise what's the point?

Ben was addressing the argument that this is a very large burden on the
HTML5 spec implementers. We -could- be asking for all HTML5-compatible
user agents to provide triples to the page/plugins through a browser
API. Ben was stating that while that would be great, we're not asking
for that. We're asking for the minimal amount of work, which is:

"Don't do anything with the DOM/stream to remove RDFa attributes. Make
sure other applications that can access the DOM/stream can see the RDFa
attributes. Not spewing validation errors in XHTML5 would be good".

> I agree that if it is the case that there are problems that are best 
> solved through RDFa, that it would make sense to use RDFa as is and that 
> not using it would be silly.

The RDFaTF/SWDWG/XHTML2 is asserting that there are a set of problems,
described by use cases, that are best solved by using RDFa. See below
for the conclusion of this statement.

> Of course, it may be that there are no such problems, or that such 
> problems aren't compelling enough to need to solve them in HTML5, or that 
> all these problems that are solved through RDFa are in fact a subset of 
> the problems that can all be solved using a common feature. In these 
> cases, reusing RDFa wouldn't make sense -- we'd want to (respectively) not 
> use anything, not use anything yet, or use something else from which one 
> could obtain triples as well as other things.

You are asserting this because to date, you have not seen a set of
detailed use cases in a format that cuts to the heart of each problem
and explains how RDFa solves the problem. Is that a fair statement?

> On Sat, 14 Feb 2009, Kjetil Kjernsmo wrote:
>> On Saturday 14 February 2009, you wrote:
>>> Please don't take these questions as personal attacks. I honestly am 
>>> trying to find out how RDF and RDFa are to work in HTML5, to see if 
>>> they make sense to add.
>> Sure! Skepticism is sound, but you have be aware that the questions you 
>> raise has all been discussed at length elsewhere, and sometimes all this 
>> advocacy seems to be a waste of time, time that would be better spent 
>> actually writing code (and stick to XHTML for the web page needs) to 
>> prove the case by actual running code. Thus, I will be very brief.
> 
> The problem is that every time I ask these questions, I get that reply -- 
> we've answered these questions long ago, so the answers will be brief. 
> Unfortunately this doesn't really end up answering my questions.

I believe that you get these short answers because almost everybody
expects you to be well versed in RDF and the semantic web. At least, the
ones that know RDF/RDFa/semantic web stuff deeply have the mistaken
notion that other web experts know this stuff deeply enough to have a
debate on the issue... which is a really bad assumption to make (which
most of us are guilty of on this mailing list).

Kjetil, I don't mean to single you out - this comment below is one that
I've heard time and time again from various communities online
(Microformats, RDF and RDFa at times). So the response is to the
community in general, not you.

I'm speaking for myself - don't know what others think about this...

> On Sat, 14 Feb 2009, Kjetil Kjernsmo wrote:
>> Sure! Skepticism is sound, but you have be aware that the
>> questions you raise has all been discussed at length elsewhere

Where? Can anyone on here just point me to a page that has all of this
information in an easy-to-digest form? I don't have time to troll
through mailing lists discussing esoteric knowledge representation issues.

>> sometimes all this advocacy seems to be a waste of time.

What advocacy? When have we been even mildly successful at advocating
RDF and RDFa? Where have we provided tools for people to advocate RDFa?
Where are the products of our advocacy? The websites that explain what
problems RDF and RDFa will solve in simple terms that a web developer
can understand?

Why can't we just point Ian to a website and say, "Here - this explains
a load of the questions that you have in very easy to grasp concepts."

How many of you have seen this yet?

http://www.crisisofcredit.com/

It summarizes the US credit crisis, a very complex issue, in under 10
minutes, without using scary jargon to explain what is happening. We
have /nothing/ like this to explain RDF and RDFa, nor any of the issues
that we are covering in this discussion. RDF and RDFa isn't that
complicated, but we have managed to make it very difficult for the
uninitiated to pick up and use.

It's very telling that we keep having to create new wiki pages to
address Ian's questions - we're doing a terrible job at advocacy.

Yes, these issues have all been discussed at length elsewhere... and I
have no idea what went on in those discussions nor do I have the time to
find the conversation... I wouldn't even know where to look, and believe
me, I have tried!

If you find yourself using the "this discussion has already happened
elsewhere" argument, please either provide a link to the discussion or
better yet, create a page on the RDFa wiki that explains the reasoning
behind the discussion in detail with a reference to the original discussion.

We need to start creating some long-term assets to help people advocate
RDFa... pointing people to mailing list discussions won't get us much
further than we are now.

-- manu

-- 
Manu Sporny
President/CEO - Digital Bazaar, Inc.
blog: Scaling Past 100,000 Concurrent Web Service Requests
http://blog.digitalbazaar.com/2008/09/30/scaling-webservices-part-1

attached mail follows:



Ian Hickson wrote:
> (Also helpful for my own edification would be a reply to the e-mail I sent 
> earlier this week:
>    http://lists.w3.org/Archives/Public/public-rdf-in-xhtml-tf/2009Feb/0121.html
> ...which unfortunately appears to have killed its thread.)

I'll answer some parts of it, but first a point about what I believe is
off topic.

Some very advanced use cases of RDF, e.g. the triplestore across web
pages with reasoning, have been used to argue that RDF is bad. This is a
bit like arguing against <a href=> because, imagine if someone built a
giant search engine that had this awesome pagerank algorithm and then
started displaying ads alongside search results and then allowed folks
to embed those ads on their pages and paid those folks a cut of the ads
and then... wait a minute have you thought of clickjacking? No? Then
clearly HTML anchors are evil.

Let's stick to the simple use cases for now. Any enabling technology, if
it's useful to users in finding information, will also be useful to
spammers. To paraphrase Bruce Schneier, stop being surprised when the
bad guys hijack the good guys' infrastructure. Someone like Google will
figure out a way to use the structure of the data network to make sense
of it all. Maybe semantic pagerank, who knows.

Now, I'll focus on the CC use case and your specific points regarding CC.

> ...the process needs to be:
> 
>  1. Find problems.
>  2. Propose solutions that solve one or more of those problems.
>  3. Evaluate the solutions against each problem.
>  4. If a solution is found that addresses many of the problems, adopt it.

I believe that is what we've done, with the added item (1.5) find
existing technologies that solve a large chunk of the problem, i.e. RDF.

> I don't know that I've ever heard a _good_ example!

I thought you said SearchMonkey was a good example? It's certainly a
great example for Creative Commons.

Do *you* have to believe that an example is good before HTML5 considers
it? Is it not enough for big publishers like Yahoo to tell you that the
example is good?

> IMHO, the syntax and data model is the easy part. If you had trouble 
> getting adoption of your vocabulary with a trivial dedicated syntax, I 
> don't think you're likely to have any more luck now that your vocabulary 
> comes with a general-purpose data model and half a dozen different 
> syntaxes. But your mileage may vary, I guess.

Syntax and data model are easy only if you ignore extensibility. When we
came up with CC, it was only for images and music. But we knew we would
eventually delve into science, and we would need to CC-license
scientific publications, datasets, describe datasets transfer
agreements, etc...

We knew we needed extensibility from the start, so that our tools built
today can still work in some fashion tomorrow, and so that markup
written today can still function in tomorrow's tools to some degree.

We've also expanded in ways we didn't initially predict, with
cc:attributionName and cc:attributionURL: when you click from a web page
to its Creative Commons license link, we parse the referring URL for
RDFa and display, in the deed, to whom you should give credit.

Had we hacked up a one-time data-model and syntax, we wouldn't have been
able to add this feature without also asking uninterested users to
change their markup, or forcing tools to parse two different syntaxes.

> This line of argumentation (that small problems should share solutions so 
> as to leverage each others' work) is not convincing to me.

I'm not arguing for this as a general philosophy for every design
process, but it's a little bit disconcerting that you think small
solutions should never leverage each other. Isn't it an important aspect
of standards work? Isn't that what inventing a new VIDEO tag is for, to
ensure that folks who want to embed video (each for different purposes)
can each do so more easily?

It seems you draw a hard line at generalizing applications that use
different vocabularies. It's okay to generalize embedding video, but
it's not okay to generalize embedded metadata. That line is, in my
opinion, artificial. Maybe it's just because you haven't worked as much
with interoperable structured data? Certainly, I haven't worked much
with embedded video, so I don't know the reasoning behind a new VIDEO tag.

> Personally I prefer to address today's problems today and tomorrow's 
> problems tomorrow, so that as we meet new problems, they are addressed 
> with surgical precision, rather than trying to come up with systems that 
> can solve everything forever. But again, to each his own.
> 
> This line of argumentation (that we should design systems that solve all 
> future needs, whether forseeable or not) is also not convincing to me.

You're arguing based on a false dichotomy of extremes: why so black and
white? We don't expect to solve every problem from the start. With RDF
and RDFa, we have a solution that makes it *easier* to combine our work
with that of others, and *easier* to evolve our own solution over time.

So things are easier, though of course not automatic.

And since we're reusing a lot of existing technology (RDF), and we're
collaborating with Manu, Mark, and others to define RDFa, it costs us a
lot less than to make up our own approach. The test cases alone, built
by Manu and Michael and which CC contributed nothing to, are worth our
use of RDFa.

> You presumably do want some user agents some where at some time to do 
> something with these triples, otherwise what's the point? Whether this is 
> through extensions, or through browsers in ten years when the state of the 
> art is at the point where something useful can be done with any RDFa, or 
> through search engines processing RDFa data, there has to be _some_ user 
> agent somewhere that uses this data, otherwise what's the point?

I would like to hear your comments on the parallel I've drawn to the
@rel attribute. Browsers don't need to do anything with it except make
it available in the DOM. Google can use it to tweak its search
algorithm. But surely, you're not trying to explore every possible
Google implementation detail for rel="canonical" to spec HTML5? It's
pretty obvious that specifying @rel enables the kind of application that
Google is developing.

The same applies to RDFa. Making it available enables SearchMonkey,
ccREL, and other applications, because the data is now structured and
thus more useful for ... well whatever someone wants to do with
structured data just like like Google chooses what to do with @rel long
after the spec's ink has dried.

Do you agree with this comparison?

-Ben


attached mail follows:



Ian Hickson wrote:
>>> My understanding was that people wanted RDF data to be persisted 
>>> across multiple sessions, which would lead to bad data "poisoning the 
>>> well" in a way that no other feature in Web browsers has yet had to 
>>> deal with.
>> Some people do, some don't. I think we should assume that the RDF triple 
>> store may be more akin to the browser cache (can be cleared on a whim) 
>> than to a traditional database (clearing the data is bad).
> 
> If we allow any persistence without some solution to the trust/spam 
> problem, the store will quickly become useless (in the same way that the 
> various features to open a new window quickly became useless once sites 
> found ways to use them for doing popup ads).

I agree with you that we will need to find solutions to the trust/spam
problem, not only for RDFa, but for the general Web as well. There are
some ideas bouncing around, but we need to put them on the wiki. I have
started a page for this purpose:

http://rdfa.info/wiki/security-and-trust

The two examples that are up there have obvious flaws, but are meant to
be a starting point for the security/persistence conversation. Everyone
should feel free to rip those examples apart and improve upon them.

> This is one example of what I meant by having to evaluate each use case, 
> by the way. If we decide that "RDFa" means "a per-tab triple store with a 
> lifetime equal to the page and that is not affected by cross-origin 
> iframes", then that wouldn't address the "collect lots of data and then 
> query it" use case, despite still being "RDFa". 

I think it's going to be very difficult to find agreement on what RDFa
is and isn't at the application layer.

> It is IMHO important for 
> the RDFa community to agree on exactly which uses cases are the ones that 
> are intended to be addressed, so that we can make sure that what we come 
> up with actually does address exactly those cases. 

The list of use cases being non-exhaustive and constantly evolving, of
course.

> Is there documentation 
> anywhere on what the existing RDFa specification is attempting to solve 
> along these lines?

The current version of RDFa is based on the scenarios defined in this
document:

http://www.w3.org/TR/xhtml-rdfa-scenarios

> e.g. what is the storage semantic for the current RDFa 
> specification? 

Right now, the how and when to do persistence and trust is left to the
language and application that utilizes RDFa.

http://rdfa.info/wiki/developer-faq#Does_RDFa_define_a_storage_model_or_persistence_layer.2FAPI.3F

> Does it have persistence? 

The current RDFa spec doesn't mention anything about a persistence layer:

http://rdfa.info/wiki/developer-faq#Does_RDFa_define_a_storage_model_or_persistence_layer.2FAPI.3F

> How does it deal with cross-origin data load?

The current RDFa spec does not address cross-origin data load:

http://rdfa.info/wiki/Developer-faq#How_does_RDFa_deal_with_cross-origin_data_load.3F

> If we're just partitioning data stores on a per-origin basis, then there's 
> no need for signatures, even, we can just use the existing origin data. 
> The question is whether that is enough.

No, we would still want signatures even if we were partitioning data
stores. For example, if you wanted to verify a digital contract or
digital statement of work of any kind, per-origin verification isn't
good enough.

> (This still doesn't address the problem of sites like wikipedia or blogs 
> that accept input from multiple users, though.)

This is a stab at addressing that issue:

http://rdfa.info/wiki/security-and-trust#Signature_attributes

> There needs to be some mechanism for determining what's in the white 
> lists. 

Could you elaborate, please? Do you mean the format of the white lists?
Or the type of data the white lists store? Something else?

> (Black lists wouldn't work since an attacker could just come up 
> with an infinite number of alternative site names.)

Noted, correction has been made to (removed mention of blacklists):

http://rdfa.info/wiki/developer-faq#How_does_one_prevent_bad_triples_from_corrupting_a_local_triple_store.3F

> I don't really understand how the digital signature mechanism would work.

Does this help at all?

http://rdfa.info/wiki/security-and-trust#Signature_attributes

> In SSL, the user selects a single site, and the browser can verify that 
> that site is who the user thinks it is. 

In general, yes - but there are known attacks against this security
model (MITM, plug-in-based trusted certificate poisoning,
DNS/certificate hijacking), so it's not perfect.

> It doesn't prevent hostile sites 
> that the user intended to go to from interacting with the user. How would 
> digital signatures help here? Attackers can sign stuff just like anyone 
> else can, no?

http://rdfa.info/wiki/Developer-faq#Hackers_can_digitally_sign_triples_too.2C_what.27s_to_stop_hostile_sites_from_interacting_with_the_person_browsing.3F

-- manu

-- 
Manu Sporny
President/CEO - Digital Bazaar, Inc.
blog: Scaling Past 100,000 Concurrent Web Service Requests
http://blog.digitalbazaar.com/2008/09/30/scaling-webservices-part-1

attached mail follows:



On 19/2/09 03:43, Manu Sporny wrote:

> http://rdfa.info/wiki/developer-faq#How_does_one_prevent_bad_triples_from_corrupting_a_local_triple_store.3F

I've just tweaked this to add:

"Also note that the phrase "triple store" is somewhat dated. Practically 
all RDF storage systems (since RDFCore in 2004 and a few years before) 
have effectively been "quad stores". While the core RDF specs are 
described in terms of triples, database systems for managing triples 
have almost always kept track of the source (or "provenance") of each 
piece of data. For this reason when RDF's data access / query language, 
SPARQL, was created, it included within the language a mechanism for 
querying this extra information. This ability to explicitly represent 
(and query) the source of each RDF data graph gives some extra machinery 
for dealing with trust. We might, for example, pose a SPARQL query that 
was targetted only at graphs tagged as trusted. RDF stores are no longer 
a simplistic melting pot in which data from multiple sources gets 
indecipherably tangled."

For a practical account from someone who added provenance, ie. quads, to 
an early RDF triple storage system that didn't (at the time) keep track 
of it, see http://www.ibm.com/developerworks/xml/library/x-rdfprov.html 
by Edd Dumbill.

Excerpting here:

"""21 Jul 2003
When you start aggregating data from around the Web, keeping track of 
where it came from is vital. In this article, Edd Dumbill looks into the 
contexts feature of the Redland Resource Description Format (RDF) 
application framework and creates an RDF Site Summary (RSS) 1.0 
aggregator as a demonstration.

In Listing 6 of my second article on FOAF (see Resources), I 
demonstrated FOAFbot, a community support agent I wrote that aggregates 
people's FOAF files and answers questions about them. FOAFbot has the 
ability to record who said what about whom. When asked what my name was, 
FOAFbot responded:
edd@xml.com's name is 'Edd Dumbill',
according to Dave Beckett, Edd Dumbill,
Jo Walsh, Kip Hampton, Matt Biddulph,
Dan Brickley, and anonymous source Anon47

The idea behind FOAFbot is that if you can verify that a fact is 
recorded by several different people (whom you trust), you are more 
likely to believe it to be true.
Here's another use for tracking provenance of such metadata. One of the 
major abuses of search engines early on in their history was meta tag 
spamming. Web sites would put false metadata into their pages to boost 
their search engine ranking. For this reason, search engines stopped 
paying attention to meta tags because they were most likely lies. 
Instead, search engines such as Google found other more sophisticated 
metrics to rank page relevance.
Looking toward the future of the Web, it will become vital to avoid 
abuses such as meta tag spamming. Tim Berners-Lee's vision for a 
Semantic Web (see Resources) aims for a Web where most data is 
machine-readable, in order to automate much of the information 
processing currently done by humans.
The potential difficulties of metadata abuse are even larger on the 
Semantic Web: A Web site would no longer be restricted to making claims 
only about itself. It could also make claims about other sites. It would 
be possible, for instance, for one bookstore to make false claims about 
the prices offered by a competitor.
I won't go into detail on the various security and trust mechanisms that 
will prevent this sort of semantic vandalism, but I will focus on the 
foundation that will make them possible: tracking provenance.[...]"""

Edd's article also mentions "(Incidentally, I owe a debt of gratitude to 
Dave Beckett, the creator of Redland. When I was writing FOAFbot last 
year, Redland did not have support for contexts, so I ended up 
implementing them in a very roundabout fashion. In response to my 
requests, Dave added support for contexts into his toolkit.)" ... worth 
repeating here, as it shows the way RDF toolkits have matured over the 
years in response to just the kind of practical concern Ian (and Edd) 
raises, around spam, trust and aggregation.

Hope this helps with the use cases...

cheers,

Dan

attached mail follows:



Manu Sporny wrote:
> I agree with you that we will need to find solutions to the trust/spam
> problem, not only for RDFa, but for the general Web as well.

Disclaimer, since this was not very clear: This was a personal statement
of opinion and was in no way made on behalf of the RDFa in XHTML Task
Force.

Covering security and persistence issues for RDFa is not in the RDF in
XHTML Task Force charter. I was attempting to document a placeholder
page and get /something/ regarding security and persistence as it
relates to web semantics on the rdfa.info/wiki site.

-- manu

-- 
Manu Sporny
President/CEO - Digital Bazaar, Inc.
blog: Scaling Past 100,000 Concurrent Web Service Requests
http://blog.digitalbazaar.com/2008/09/30/scaling-webservices-part-1

attached mail follows:




Hey folks,

Yahoo has launched even more RDFa coolness: embed RDFa on your site to
describe your flash games and videos, and they show up embedded in Yahoo
search results *for everyone*, *by default*.

http://ysearchblog.com/2009/03/12/embed-videos-games-and-docs-with-searchmonkey-2/

Yahoo provides detailed explanations of how to mark up your content with
RDFa:

http://developer.search.yahoo.com/help/objects/games
http://developer.search.yahoo.com/help/objects/video

Note in particular that, if you want your flash games featured directly
on the Yahoo search results page, your one and only path is RDFa. (For
videos and such, you can use microformats or the de-facto facebook
connect microformat.)

I *really* like how they're guiding folks to mark up RDFa. We should
take this as an example for our FOAF, media, etc... RDFa guidelines.

-Ben

PS: the only thing that's a bit unfortunate is that they didn't reuse
Digital Bazaar's media vocabulary. I hope we can find a way to create
equivalences at some point... that's the goal of RDF, after all.


attached mail follows:



Ben Adida wrote:
> Yahoo has launched even more RDFa coolness: embed RDFa on your site to
> describe your flash games and videos, and they show up embedded in Yahoo
> search results *for everyone*, *by default*.

Overall, this is great news. Very nice to see Yahoo! adopting RDFa this
deeply into their search service... do have some gripes about
SearchMonkey Vocabularies, however...

> PS: the only thing that's a bit unfortunate is that they didn't reuse
> Digital Bazaar's media vocabulary. I hope we can find a way to create
> equivalences at some point... that's the goal of RDF, after all.

I've had a bit of time to look at Yahoo's published vocabularies and I'm
quite concerned by them and Yahoo!s general direction with vocabulary
design.

Here's a list of issues that I was able to find... there are
many more issues that I found than are outlined here. It would be good
to talk with whoever designed their vocabularies. You can find an
overview of Yahoo!s vocabularies here:

http://developer.yahoo.com/searchmonkey/smguide/profile_vocab.html

Issues specific to Yahoo's Media vocabulary:

Vocabulary is not machine-readable, not validate-able
-----------------------------------------------------

Yahoo's searchmonkey media vocabulary defined here:

http://search.yahoo.com/searchmonkey/media/

is not machine-readable. There are no RDF ranges, subClassOf, comments,
or types specified. New RDF vocabularies, especially ones from large
companies like Yahoo, should be machine readable otherwise it's going to
be nearly impossible to validate against them.

Monolithic Vocabulary Design
----------------------------

Rather than break Media out into multiple different vocabularies, Yahoo
has shoved audio, video, text, photos, thumbnails, re-invented sets, and
shoved them into one monolithic vocabulary which will surely get more
and more bloated as the years increase.

Rather than create a nice vocabulary stack (like what we've been doing
for the past several years):

+--------------+
|Music Ontology|
+--------------+-------+
|     Audio    | Video |
+--------------+-------+
|        Media         |
+----------------------+

They've instead created a mega vocabulary that doesn't seem to be backed
up by any usage data... or rather, it certainly isn't backed up by the
data we collected on the subjects of audio and video. Perhaps I'm
missing some sort of grand architecture, but when you have media:Article
and media:Text (neither of which subclass each other), then it shows
that not a great deal of design work went into your vocabularies.

Confounding Media with Media Format
-----------------------------------

Yahoo defines the following properties in media:

* media:bitrate
* media:channels
* media:duration
* media:fileSize
* media:framerate
* media:height
* media:samplingrate
* media:type
* media:width

Most of these are quite specific to web-based media formats and have
nothing to do with media in the physical world (not the Web). Many of
these can't be used to describe media:Text or media:Article. These
attributes really have nothing to do with media and should be separated
out into a different media format vocabulary.

* media:views

This one has more to do with social news sites than media.

Specification of medium using both class and property
-----------------------------------------------------

Yahoo defines both this:
	
media:Image, media:Audio, media:Video

and this:

media:medium - The type of object: image | audio | video | document |
executable.

What's the point of having both a 'medium' property and classes that
define the medium? media:medium shouldn't exist at all - use one or the
other, not both. Using both is confusing and will inevitably lead to
more pain for Yahoo down the line when you have to look at not only
@typeof information, but also medium information.

Naming conflict, right off of the bat
-------------------------------------

Yahoo has defined the following prefixes: commerce, media

These conflict directly with ones that we've already created, which
isn't that big of a deal - in fact, it shows that RDFa is resilient even
in these scenarios. However, it also means that almost all of the
solutions that have been proposed for addressing the "cut-paste
fragility" issue that the WHATWG has raised are now much more difficult
to implement correctly. Which commerce and which media vocabularies do
we resolve to?

I'm afraid that since Yahoo is the 300lb gorilla in the room, there will
be no place for good vocabulary designs.

These vocabularies will hurt RDFa adoption in the long run
----------------------------------------------------------

My real fear is that while Yahoo adopting RDFa will help in the short
term, these badly designed vocabularies will hurt RDFa adoption in the
long run.

The worst-case scenario is seeing wide adoption of Yahoo's media
vocabulary as it currently stands, which will eventually come under much
harsher and less constructive criticism than I've outlined above.

As I stated earlier, there are many more issues with what Yahoo has done
with their SearchMonkey vocabularies that should be fixed for the
benefit of this community. We are more than glad to help them work
through the issues, as long as Yahoo is willing to have an open dialog
with the RDF vocabulary creation community.

-- manu

-- 
Manu Sporny
President/CEO - Digital Bazaar, Inc.
blog: Absorbing Costs Considered Harmful
http://blog.digitalbazaar.com/2009/02/27/absorbing-costs-harmful


attached mail follows:



Re: <http://krijnhoetmer.nl/irc-logs/whatwg/20090407#l-675>:

> # # [18:31] * jgraham thought that GRDDL was basically a way of using an XSLT stylesheet to transform some HTML into some RDF
> # # [18:31] <gsnedders> Yeah, basically
> # # [18:32] <jgraham> And that the author had to put the link to the sty;esheet in @profile
> # # [18:32] <Hixie> no
> # # [18:32] <Hixie> they have to put a link to a page that has a link to the xslt 

James is right: the HTML page can carry the transformation link, no 
indirection through an additional profile is needed (only the presence 
of profile="http://www.w3.org/2003/g/data-view" as opt-in). See, for 
instance, <http://www.w3.org/2004/01/rdxh/spec#grddl-xhtml>).

(I'm having a deja vu, Ian :-)

BR, Julian

attached mail follows:



Elias Torres wrote:
> I'm sure you have seen this but I haven't seen it here...
> 
> http://laughingmeme.org/2009/04/03/url-shortening-hinting/
> 
> Basically Kellan is using rel="alternate shorter" instead of
> rev="canonical" which is more inline with RDFa. I don't keep up with
> HTML5 but it seems like @rev is deprecated. Good discussion on the post.

I hope @rev comes back into HTML5. Deprecating it is based on Ian
noticing that it wasn't used correctly to date. I'm sure that's true,
but it misses the point: if no tools make use of @rev, then of course
the markup will be wrong often. But with RDFa parsers like SearchMonkey
and others, there is now an incentive to write things correctly.

I think if the url-shortening folks choose rev=canonical, publishers
will figure out just fine, because the first thing they would do is
check that it works (with a bookmarklet, a web service that checks for
you, etc...) and correct their markup if it doesn't.

The idea that markup gets dirty because it's not human-visible is
true... but I suspect it's not the root cause. The root cause is that
there's no feedback mechanism to reinforce good markup. You can provide
a feedback mechanism using SearchMonkey or otherwise.

-Ben
Received on Friday, 8 May 2009 21:26:25 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Wednesday, 7 November 2012 14:18:24 GMT