Attached is a MIME digest of e-mails that were used as the source of microdata use cases, scenarios, and requirements. This collection excludes a dozen or so private e-mails.

In addition, feedback was collected from the following Web pages:

   http://realtech.burningbird.net/semantic-web/semantic-markup/stop-justifying-rdfa
   http://realtech.burningbird.net/semantic-web/semantic-markup/stop-justifying-rdfa?page=1
   http://rdfa.info/wiki/Rdfa-use-cases and related pages
   http://developer.yahoo.com/searchmonkey/ and related pages

...as well as IRC discussions and a number of private conversations.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
attached mail follows:
Summary: I believe that there are use cases for RDFa - and that they are precisely the sort of thing that Yahoo, Google, Ask, and their ilk are not going to be interested in, since they are based on solving problems that those search engines do not efficiently solve, such as (among others) using private data or dealing with trustworthy data to answer very specific questions automatically.

If Ian needs to understand the Semantic Web Industry and why people have invested in the RDFa proposal, then it is important to identify the right questions, and having him alone identify the sub-questions when he doesn't understand the issue isn't going to help him make a well-informed decision.

Some of Ian's questions are discussed here. I cut the mail "short" since I think it is already too long for many people, which means that the debate will simply pass without their reading or input.

On Wed, 31 Dec 2008 20:46:01 +1100, Ian Hickson <ian@hixie.ch> wrote:

> One of the outstanding issues for HTML5 is the question of whether HTML5
> should solve the problem that RDFa solves, e.g. by embedding RDFa ...
> Before I can determine whether we should solve this problem, and before I
> can evaluate proposals for solving this problem, I need to learn what the
> problem is.
>
> Earlier this year, there was a thread on RDFa on the WHATWG list. Very
> little of the thread focused on describing the problem. This e-mail is an
> attempt to work out what the problem is based on that feedback, on
> discussions at the recent TPAC, and on other research I have done.
>
>
> On Mon, 25 Aug 2008, Manu Sporny wrote:
>> Ian Hickson wrote:
>> > I have no idea what problem RDFa is trying to solve. I have no idea
>> > what the requirements are.
>>
>> Web browsers currently do not understand the meaning behind human
>> statements or concepts on a web page.
>> If web browsers could understand
>> that a particular page was describing a piece of music, a movie, an
>> event, a person or a product, the browser could then help the user find
>> more information about the particular item in question. It would help
>> automate the browsing experience. Not only would the browsing experience
>> be improved, but search engine indexing quality would be better due to a
>> spider's ability to understand the data on the page with more accuracy.
>
> Let's see if I can rephrase that in terms of requirements.
>
> * Web browsers should be able to help users find information related to
>   the items that the page they are looking at discusses.
>
> * Search engines should be able to determine the contents of pages with
>   more accuracy than today.
>
> Is that right?
>
> Are those the only requirements/problems that RDFa is attempting to
> address? If not, what other requirements are there?

I don't think so. I think there are some other requirements:

A standard way to include arbitrary data in a web page and extract it for machine processing, without having to pre-coordinate their data models.

Since many people use RDF as an interchange, storage and processing format for this kind of data (because it provides for automated mapping of data from one schema to many others, without requiring anyone to touch the original schemata or agree in advance how they should be created), I believe there is a requirement for a method that allows third parties to include RDF data in, and extract it from information encoded within an HTML page.

>> The Microformats community has done a remarkable job of working on the
>> web semantics problem, creating several different methods of expressing
>> common human concepts (contact information (hCard), events (hCalendar),
>> and audio recordings (hAudio)).
>
> Right; with Microformats, each Microformat has its own problem space and
> thus each one can be evaluated separately.
> It is much harder to evaluate
> something when the problem space is as generic as it appears RDFa's is.

The point is that there is a very large set of very small problem spaces relevant to a small group at a time. Like RDF itself, RDFa is meeting the problem of allowing these people to share machine-processable data without previously coordinating their approach.

>> The results of the first set of Microformats efforts were some pretty
>> cool applications, like the following one demonstrating how a web
>> browser could forward event information from your PC web browser to your
>> phone via Bluetooth:
>>
>> http://www.youtube.com/watch?v=azoNnLoJi-4
>
> It's a technically very interesting application. What has the adoption
> rate been like? How does it compare to other solutions to the problem,
> like CalDav, iCal, or Microsoft Exchange? Do people publish calendar
> events much? There are a lot of Web-based calendar systems, like MobileMe
> or WebCalendar. Do people expose data on their Web page that can be used
> to import calendar data to these systems?

In some cases this data is indeed exposed to Webpages. However, anecdotal evidence (which unfortunately is all that is available when trying to study the enormous collections of data in private intranets) suggests that this is significantly more valuable when it can be done within a restricted access website.

...

>> In short, RDFa addresses the problem of a lack of a standardized
>> semantics expression mechanism in HTML family languages.
>
> A standardized semantics expression mechanism is a solution. The lack of
> a solution isn't a problem description. What's the problem that a
> standardized semantics expression mechanism solves?

There are many many small problems involving encoding arbitrary data in pages - apparently at least enough to convince you that the data-* attributes are worth incorporating.
There are many cases where being able to extract that data with a simple toolkit from someone else's content, or using someone else's toolkit without having to tell them about your data model, solves a local problem. The data-* attributes, because they do not represent a formal model that can be manipulated, are insufficient to enable sharing of tools which can extract arbitrary modelled data. RDF, in particular, also provides established ways of merging existing data encoded in different existing schemata.

There are many cases where people build their own dataset and queries to solve a local problem. As an example, Opera is not interested in asking Google to index data related to internal developer documents, and use it to produce further documentation we need. However, we do automatically extract various kinds of data from internal documents and re-use it. While Opera does not in fact use the RDF toolstack for that process, there are many other large companies and organisations who do, and who would benefit from being able to use RDFa in that process.

>> RDFa not only enables the use cases described in the videos listed
>> above, but all use cases that struggle with enabling web browsers and
>> web spiders to understand the context of the current page.
>
> It would be helpful if we could list these use cases clearly and in
> detail so that we could evaluate the solutions proposed against them.
>
> Here's a list of the use cases and requirements so far in this e-mail:
>
> * Web browsers should be able to help users find information related to
>   the items that the page they are looking at discusses.
>
> * Search engines should be able to determine the contents of pages with
>   more accuracy than today.
>
> * Exposing calendar events so that users can add those events to their
>   calendaring systems.
>
> * Exposing music samples on a page so that a user can listen to all the
>   samples.
>
> * Getting data out of poorly written Web pages, so that the user can find
>   more information about the page's contents.
>
> * Finding more information about a movie when looking at a page about the
>   movie, when the page contains detailed data about the movie.
>
> Can we list some more use cases?
>
>
> Here are some other questions that I would like the answers to so that I
> can better understand what is being proposed here:
>
> Does it make sense to solve all these problems with the same syntax?

That depends on the answers to your next two questions. Moreover, that is not actually a very good question in this case. I think the judgement call should be whether a syntax that allows people to solve the identified problem set consistently is sufficiently valuable (measured in terms of the advantages weighed against the disadvantages) to justify being part of HTML5.

> What are the disadvantages of doing so?

I am not sure.

> What are the advantages?

Many people will be able to use standard tools which are part of their existing infrastructure to manipulate important data. They will be able to store that data in a visible form, in web pages. They will also be able to present the data easily in a form that does not force them to lose important semantics.

People will be able to build toolkits that allow for processing of data from webpages without knowing, a priori, the data model used for that information.

> What is the
> opportunity cost of encouraging everyone to expose data in the same way?

I don't know. I don't see much of an opportunity cost.

> What is the cost of having different data use specialised formats?
If the data model, or a part of it, is not explicit as in RDF but is implicit in code made to treat it (as is the case with using scripts to process things stored in arbitrarily named data-* attributes, and is also the case in using undocumented or semi-documented XML formats), it requires people to understand the code as well as the data model in order to use the data. In a corporate situation where hundreds or tens of thousands of people are required to work with the same data, this makes the data model very fragile. Such considerations also apply to larger communities, for example those dealing with complex scientific information.

> Do publishers actually want to use a common data format?

It would appear so - even in cases where they don't want to publish their data in such an easy-to-use format for commercial reasons.

> How have past efforts in creating data formats fared?

Some have been pretty successful. Dublin Core is a general format for labelling content that is widely used. MARC records have been very successful.

> Are enough data providers actually willing to expose their data in a
> machine readable manner for this to be truly useful?

To make this truly useful it doesn't need to be exposed to the public. It would appear that organisations are prepared to make large investments in RDF data whether they expose them or not (and some very large ones do expose data), which suggests that this data is truly useful.

> If data providers
> will be willing to expose their data as RDFa, why are they not already
> exposing their data in machine-readable form today?
>
> - For example, why doesn't Amazon expose a CSV file of your usage
>   history, or an Atom feed of the comments for each product, or an
>   hProduct annotated form of their product data? (Or do they? And if so,
>   do we know if users use this data?)

Why would they need to?

> - As another example, why doesn't Craigslist like their data being
>   reused in mashups?
>   Would they be willing to allow their users to reuse
>   their data in these new and exciting ways, or would they go out of
>   their way to prevent the data from being accessible as soon as a
>   critical mass of users started using it?

This is a key question. Why *should* a data provider be required to offer their product (data) for other people to use, in order to demonstrate that the data is useful? Google, a large provider of data, insists on certain conditions being met before it makes its services available, and that seems perfectly reasonable to me.

Whether Craigslist actively attempts to make their data easier to aggregate, or actively avoids facilitating that process, strikes me as irrelevant to the question of whether there is value in enabling them to do so. Because large organisations specialising in gathering people's data, from Flickr to Google and Facebook to Government taxation departments are not the only consumers and producers of data that determine value for users.

It would seem important that the Web easily enable small-time users of data to efficiently communicate with one another, without the need to have one of the giants as an intermediary. When libraries in the Dominican Republic want to share data, and librarians in Léon want to use that data, it seems that the Web should facilitate that without resorting to intermediaries like Amazon or Yahoo! and since we already have the technology to do so in a way that enables very powerful data models to be used without requiring coordination, it seems odd that you don't even understand how this could be valuable.

> What will the licensing situation be like for this data? Will the
> licenses allow for the reuse being proposed to solve the problems and
> use cases listed above?

In some cases yes, and in some cases no. In other words, making such data available does not distort natural market conditions one way or another.

> How are Web browsers going to expose user interfaces to answer user
> questions?
I am glad to see that you think user interface behaviour is in fact important to the process of specifying HTML (I had been under the impression that you believed the spec should not touch on it). There are various query systems already available in browsers, from the search engine in Opera that lets you do a free-text search on pages stored in your history to Tabulator - a substantial RDF browser available as a Widget for Opera or as an extension to Firefox, that allows for a variety of pre-configured questions as well as free-form questions.

> Can only previously configured, hard-coded questions be asked,
> or will Web browsers be able to answer arbitrary free-form questions from
> users using the data exposed by RDFa?

Both of these are possible. The value of RDFa is that it actually supports the possibility of asking free-form questions by using a data model that is sufficiently well specified to enable constructions of tools that are not dependent on being preconfigured to recognise the exact type of data being queried (unlike, say, microformats, which require an intermediate agreement to enable people to extract the data, and don't provide for merging data of different types for rich queries).

> How are Web browsers that expose this data going to handle data that is
> not exposed in the same format? For example, if a site exposes data in
> JSON or CSV format rather than RDFa, will that data be available to the
> user in the same way?

Who cares? But for those who do, this is up to Web browsers. They can choose to implement transformations between some particular CSV data and RDFa. The difficulty here (and therefore illustration of the value of RDFa) is that CSV data has important details of the meaning of the data only available out of band in looking at how the data is recorded, while RDF allows for automating the process of merging data originally encoded in different RDFa vocabularies.

...

> What is the expected strategy to fight spam in these systems?
> Is it
> expected that user agents will just collect data in the background? If
> so, how are user agents expected to distinguish between pages that have
> reliable data and pages that expose data that is misleading or wrong?

Aggregating data in real-time is relatively expensive, so is a strategy more suited to dealing with asking new questions. Typical systems so far have aggregated data in the background to deal with known queries (one example is Google, which crawls pages in advance, anticipating searches that match terms against the content of those pages), and use live querying for cases where the result cannot reliably be stored (e.g. airline reservation systems like TravelJungle or LastMinute which determine price and availability based on constantly changing data).

Different use cases will imply different strategies for fighting spam. Some obvious ones are to rely on trusted sites and secured and signed data, to use reputation managers, to follow the "shape" of data over time so that anomalies can be highlighted and checked more carefully (in the manner of Bayesian filters for email). Some use cases don't care much about spam, or are not very interesting to spammers. Some use cases are private data anyway.

> - Systems like Yahoo! Search and Live Search expend extraordinary
>   amounts of resources on spam fighting technology; such technology
>   would not be accessible to Web browsers unless they interacted with
>   anti-spam services much like browsers today interact with
>   anti-phishing services.

Actually, at least Opera already incorporates anti-spam technology in its mail client. Where browsers are the primary consumers of data there is nothing at all to suggest that they cannot incorporate anti-spam technology directly. (Indeed, the POWDER specification is designed in part to make that easy - and it is exactly the sort of data that might sometimes be usefully encoded in RDFa since it is based on an RDF model).
>   Yet anti-phishing services have been controversial, since they involve
>   exposing the user's browsing history to third parties; anti-spam
>   services would be a significantly greater problem due to the vastly
>   greater level of spamming compared to phishing. What is the solution
>   proposed to tackle this problem?

It is not clear that this problem is any different in the context of RDFa to the general problem already faced by the Web. In general, the solutions proposed are the same as those already used on the Web, and of course those in development.

> - Even with a mechanism to distinguish trusted sites from spammy sites,
>   how would Web browsers deal with trusted sites that have been subject
>   to spamming attacks? This is common, for instance, on blogs or wikis.

Right. But that doesn't mean we question whether browsers should enable blogs or wikis. Why would RDFa data be different enough to make this question relevant?

> These are not rhetorical questions, and I don't know the answers to them.

Some of them seem to be poorly phrased, although if you don't understand why people have been working on this technology and why they think it would be valuable to have it available in HTML I guess that is almost inevitable.

> We need detailed answers to all those questions before we can really
> evaluate the various proposals that have been made here.

No, we apparently need you to personally understand the Semantic Web Industry. Determining answers to the questions which are important is probably helpful, but also helpful is explaining when your questions are irrelevant because they are based on a lack of understanding. This is not intended as a slight, but to clarify the process required to have something as large as the "Semantic Web" (capital letters, implying the whole W3C activity, the industry based around RDF, and so on) evaluated for potential inclusion in the HTML5 specification.
I presume the same would apply if the "Web Services" people came and asked to have all of their things included in HTML, and offered a specification that could be used to achieve their desires.

...

[not clear what the context was here, so citing as it was]

>> > I don't think more metadata is going to improve search engines. In
>> > practice, metadata is so highly gamed that it cannot be relied upon.
>> > In fact, search engines probably already "understand" pages with far
>> > more accuracy than most authors will ever be able to express.
>>
>> You are correct, more erroneous metadata is not going to improve search
>> engines. More /accurate/ metadata, however, IS going to improve search
>> engines. Nobody is going to argue that the system could not be gamed. I
>> can guarantee that it will be gamed.
>>
>> However, that's the reality that we have to live with when introducing
>> any new web-based technology. It will be mis-used, abused and corrupted.
>> The question is, will it do more good than harm? In the case of RDFa
>> /and/ Microformats, we do think it will do more good than harm.
>
> For search engines, I am not convinced. Google's experience is that
> natural language processing of the actual information seen by the actual
> end user is far, far more reliable than any source of metadata. Thus from
> Google's perspective, investing in RDFa seems like a poorer investment
> than investing in natural language processing.

Indeed. But Google is something of an edge case, since they can afford to run a huge organisation with massive computer power and many engineers to address a problem where a "near-enough" solution brings them the users who are in turn the product they sell to advertisers. There are many other use cases where a small group of people want a way to reliably search trusted data.
From global virtual library systems to a single website, there are many others who find that processing structured data is more efficient for their needs than doing free-text analysis of web pages (something that they effectively contract out to Google, Ask, Yahoo! and their many competitors who specialise in it). Some of these are the people who have decided that investing in RDFa is a far more valuable exercise than trying to out-invest Google in natural language processing.

This email is already too long for most people to get through it :( I believe that this discussion is going to last for some time (I cannot imagine why, given the HTML timeline, it would need to be resolved before June), so there will be time for others to discuss more fully the many points Ian raises as ones he would like to understand.

cheers

Chaals

-- 
Charles McCathieNevile  Opera Software, Standards Group
    je parle français -- hablo español -- jeg lærer norsk
http://my.opera.com/chaals       Try Opera: http://www.opera.com
attached mail follows:
On Jan 1, 2009, at 06:41, Charles McCathieNevile wrote:

> There are many cases where people build their own dataset and
> queries to solve a local problem. As an example, Opera is not
> interested in asking Google to index data related to internal
> developer documents, and use it to produce further documentation we
> need. However, we do automatically extract various kinds of data
> from internal documents and re-use it. While Opera does not in fact
> use the RDF toolstack for that process, there are many other large
> companies and organisations who do, and who would benefit from being
> able to use RDFa in that process.

If the data production and consumption are both under the control of one entity (Opera in this case), why does the solution need to be engineered for spontaneous integration of decentralized data sources?

Do the savings of using off-the-shelf tools outweigh the cost they impose by not being quite right for any specific purpose? Presumably the Opera-specific processing is more significant than generic parsing. Or is it?

It seems that RDFa is motivated by private data and by interchange at the same time. This suggests multiple bilateral access control agreements instead of a Web-like system where data is made available for GETting without prior agreement between the parties.

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
attached mail follows:
On Wed, Dec 31, 2008 at 10:41 PM, Charles McCathieNevile <chaals@opera.com> wrote:
> A standard way to include arbitrary data in a web page and extract it for
> machine processing, without having to pre-coordinate their data models.

This isn't a requirement (or in other words, a problem), it's a solution. What are the problems that need to be solved, and for which having a standard way to include arbitrary data in a web page and have it easily extractable would be helpful? (Note: I think there certainly *are* problems that *would* find this helpful, I'm just trying to lead your argument into the right direction.)

(As well, since the discussion is about RDFa specifically, not data-markup in general, what are the problems that need RDFa *specifically* as a solution, as compared to the myriad other ways to embed data?)

> Since many people use RDF as an interchange, storage and processing format
> for this kind of data (because it provides for automated mapping of data
> from one schema to many others, without requiring anyone to touch the
> original schemata or agree in advance how they should be created), I believe
> there is a requirement for a method that allows third parties to include RDF
> data in, and extract it from information encoded within an HTML page.

Solutions for this already exist; embedded N3 in a <script> tag, just to name something that Ian already mentioned, allows you to mash RDF data into a page in a machine-extractable way, and brings in any of the specific ancillary benefits of RDF.

>>> The Microformats community has done a remarkable job of working on the
>>> web semantics problem, creating several different methods of expressing
>>> common human concepts (contact information (hCard), events (hCalendar),
>>> and audio recordings (hAudio)).
>>
>> Right; with Microformats, each Microformat has its own problem space and
>> thus each one can be evaluated separately.
>> It is much harder to evaluate
>> something when the problem space is as generic as it appears RDFa's is.
>
> The point is that there is a very large set of very small problem spaces
> relevant to a small group at a time. Like RDF itself, RDFa is meeting the
> problem of allowing these people to share machine-processable data without
> previously coordinating their approach.

Not quite correct. Again, the problem of embedded shareable data in a web page has been solved multiple times. The specific problem of sharing *RDF* data (due to needing/wanting the specific benefits RDF can offer) has also been solved. What are the precise problems that require *RDFa* as a solution?

(I won't belabor this point, though it could be brought up several times more in your email. This is and was the primary point of contention between RDFa supporters and those of us who aren't convinced it belongs in the HTML5 spec. It is the major thrust of much of Ian's email; he's trying to help you (RDFa supporters in general, that is) find exactly what the problem is that RDFa specifically is trying to solve.)

> Moreover, that is not actually a very good question in this case. I think
> the judgement call should be whether a syntax that allows people to solve
> the identified problem set consistently is sufficiently valuable (measured
> in terms of the advantages weighed against the disadvantages) to justify
> being part of HTML5.

Well, there are many things that would offer more advantages than disadvantages by themselves. We can't possibly include all of them in the spec; you can think about this as including a hidden large disadvantage of 'will grow the size of the spec and the amount of work implementors have to do'. Thus the advantages must generally be significantly larger than the disadvantages; this is why the best argument for including something in the spec is often "there are already widespread hacks to accomplish this".
<video>, for example, was included based on pretty much precisely that argument. Of course, that just means that we've identified a problem that is significant enough to be solved in the spec. There is still significant work involved in ensuring that we identify a solution that actually hits the problem squarely; the existing hacks are usually inadequate, not through any true fault of their own, but merely because they had not considered the problem broadly enough, or lacked enough eyes to find rough edges and missing spots.

>> What are the advantages?
>
> Many people will be able to use standard tools which are part of their
> existing infrastructure to manipulate important data. They will be able to
> store that data in a visible form, in web pages. They will also be able to
> present the data easily in a form that does not force them to lose important
> semantics.
>
> People will be able to build toolkits that allow for processing of data from
> webpages without knowing, a priori, the data model used for that
> information.

Part of the point of Ian's email is that this is not a problem that is solved by RDFa, it's a problem that's solved by *any* sufficient data format. Many solutions currently exist which don't require any addition to the spec.

>> What is the
>> opportunity cost of encouraging everyone to expose data in the same way?
>
> I don't know. I don't see much of an opportunity cost.

There is no perfect data model, or perfect representation method. Every group of data is different, has different ideal representations, and incurs some degree of cost when forced into an existing data model (that is, one not tailored to the data's specs). This must thus be considered.

>> - As another example, why doesn't Craigslist like their data being
>>   reused in mashups?
>>   Would they be willing to allow their users to reuse
>>   their data in these new and exciting ways, or would they go out of
>>   their way to prevent the data from being accessible as soon as a
>>   critical mass of users started using it?
>
> This is a key question. Why *should* a data provider be required to offer
> their product (data) for other people to use, in order to demonstrate that
> the data is useful? Google, a large provider of data, insists on certain
> conditions being met before it makes its services available, and that seems
> perfectly reasonable to me.
>
> Whether Craigslist actively attempts to make their data easier to aggregate,
> or actively avoids facilitating that process, strikes me as irrelevant to
> the question of whether there is value in enabling them to do so. Because
> large organisations specialising in gathering people's data, from Flickr to
> Google and Facebook to Government taxation departments are not the only
> consumers and producers of data that determine value for users.
>
> It would seem important that the Web easily enable small-time users of data
> to efficiently communicate with one another, without the need to have one of
> the giants as an intermediary. When libraries in the Dominican Republic want
> to share data, and librarians in Léon want to use that data, it seems that
> the Web should facilitate that without resorting to intermediaries like
> Amazon or Yahoo! and since we already have the technology to do so in a way
> that enables very powerful data models to be used without requiring
> coordination, it seems odd that you don't even understand how this could be
> valuable.

This is precisely a key question because of many of the arguments that RDFa supporters have brought up (specifically, in the last flurry of emails to the group on this subject), that having RDFa will allow web users to query their browsers, which can then seek out structured data to answer their questions.
If large websites are not willing to provide their data to the web-at-large in a structured format, though, then all the data formats in the world won't accomplish the goal.

In this email, though, you are largely arguing for smaller, more personal use cases. Most of the questions are still valid, however.

Problem: Librarians across the world want to share data. What are the requirements here? How does RDFa meet those requirements? Are there other solutions which meet those requirements better? Are existing solutions adequate if deployed consistently (thus negating the need for a new technology)?

Specifically, small-time users seem (to me, at least) to need RDFa as a solution the least. They can negotiate a shared data format themselves, or at least present an API that can be engineered against by others. RDF itself may be a useful tool here, if it allows reuse of existing tools and thus simplifies the process of sharing and consuming the data, but RDFa specifically is a solution for embedding this data within a web page and allowing browsers to digest it as they encounter it. This is not an appropriate solution for the sharing of catalog data between libraries; it *may* be a solution for the average web user to have their browser grab the embedded information on a page for a specific book and query for reviews on the product across the web.

This, though, then once again brings up the traditional questions. Is RDFa the best solution for this? Are there existing solutions to this? Ian specifically mentioned simply Googling for the book title; this is indeed often quite adequate for a web user. Does the use of RDFa and the active involvement of the browser in this process offer enough of a benefit above just typing a phrase into the search bar to justify inclusion into the spec? If you believe so, can you explain precisely why?
>> Can only previously configured, hard-coded questions be asked, >> or will Web browsers be able to answer arbitrary free-form questions from >> users using the data exposed by RDFa? > > Both of these are possible. The value of RDFa is that it actually supports > the possibility of asking free-form questions by using a data model that is > sufficiently well specified to enable constructions of tools that are not > dependent on being preconfigured to recognise the exact type of data being > queried (unlike, say, microformats, which require an intermediate agreement > to enable people to extract the data, and don't provide for merging data of > different types for rich queries). This is not a benefit of RDFa. It *may* be a benefit of RDF. What does RDFa bring to the table that other solutions do not? What does it take away? > Aggregating data in real-time is relatively expensive, so is a strategy more > suited to dealing with asking new questions. Typical systems so far have > aggregated data in the background to deal with known queries (one example is > Google, which crawls pages in advance, anticipating searches that match > terms against the content of those pages), Google is a large company, and can indeed invest resources into trawling and recording such data. This is explicitly not an option for the smaller uses you seem to be highlighting in this email, though. RDFa is specifically a (very) distributed data storage system. Can it address these sorts of problems, if the small-time users simply can't trawl the entire web for matching information? When the info is relatively contained (such that finding and reading the pages it exists on is feasible), is trawling the pages for RDFa data the best solution? Are there other solutions which would work better (such as providing an API for hitting a database)? Are there existing solutions which work adequately? > and use live querying for cases > where the result cannot reliably be stored (e.g. 
airline reservation systems > like TravelJungle or LastMinute which determine price and availability based > on constantly changing data). Similarly, would these sites work by trawling reservation sites for RDFa data? As well, what if the reservation sites aren't interested in providing the data in a machine-readable format (for example, if they want users to go directly to their sites)? Would it be better for these types of sites to hit an API provided by the reservation sites directly? Would it be better for the discount sites to trawl with custom algorithms that don't require the cooperation of the reservation sites? Within the space of page-embedded data, are there better solutions, or existing adequate solutions? >> - Systems like Yahoo! Search and Live Search expend extraordinary >> amounts of resources on spam fighting technology; such technology >> would not be accessible to Web browsers unless they interacted with >> anti-spam services much like browsers today interact with >> anti-phishing services. > > Actually, at least Opera already incorporates anti-spam technology in its > mail client. Where browsers are the primary consumers of data there is > nothing at all to suggest that they cannot incorporate anti-spam technology > directly. (Indeed, the POWDER specification is designed in part to make that > easy - and it is exactly the sort of data that might sometimes be usefully > encoded in RDFa since it is based on an RDF model). Fighting email spam is a different problem from fighting black-hat SEO spamming. The attack surfaces presented by RDFa are much closer to the latter than the former. >> - Even with a mechanism to distinguish trusted sites from spammy sites, >> how would Web browsers deal with trusted sites that have been subject >> to spamming attacks? This is common, for instance, on blogs or wikis. > > Right. But that doesn't mean we question whether browsers should enable > blogs or wikis. 
Why would RDFa data be different enough to make this > question relevant? Users are interacting with blogs/wikis on a human level, and thus can exercise their own (admittedly poor in practice) judgement. This is a different problem from the browser automatically parsing data on a page and removing the spam. > I presume the same would apply if the "Web Services" people came and asked > to have all of their things included in HTML, and offered a specification > that could be used to achieve their desires. It would be the case that they would be subject to the same questions as the RDFa spec is, yes. > ... > > [not clear what the context was here, so citing as it was] >>> >>> > I don't think more metadata is going to improve search engines. In >>> > practice, metadata is so highly gamed that it cannot be relied upon. >>> > In fact, search engines probably already "understand" pages with far >>> > more accuracy than most authors will ever be able to express. >>> >>> You are correct, more erroneous metadata is not going to improve search >>> engines. More /accurate/ metadata, however, IS going to improve search >>> engines. Nobody is going to argue that the system could not be gamed. I >>> can guarantee that it will be gamed. >>> >>> However, that's the reality that we have to live with when introducing >>> any new web-based technology. It will be mis-used, abused and corrupted. >>> The question is, will it do more good than harm? In the case of RDFa >>> /and/ Microformats, we do think it will do more good than harm. >> >> For search engines, I am not convinced. Google's experience is that >> natural language processing of the actual information seen by the actual >> end user is far, far more reliable than any source of metadata. Thus from >> Google's perspective, investing in RDFa seems like a poorer investment >> than investing in natural language processing. > > Indeed. 
But Google is something of an edge case, since they can afford to > run a huge organisation with massive computer power and many engineers to > address a problem where a "near-enough" solution brings them the users who > are in turn the product they sell to advertisers. There are many other use > cases where a small group of people want a way to reliably search trusted > data. > > From global virtual library systems to a single website, there are many > others who find that processing structured data is more efficient for their > needs than doing free-text analysis of web pages (something that they > effectively contract out to Google, Ask, Yahoo! and their many competitors > who specialise in it). Some of these are the people who have decided that > investing in RDFa is a far more valuable exercise than trying to out-invest > Google in natural language processing. "Processing structured data" is something that can be done without RDFa. The reason for the resistance to RDFa from this working group so far is the lack of sufficient significant problems that are best solved by RDFa specifically. As well, the use cases for in-the-small data interchange and in-the-large data interchange are significantly different. Again, RDFa is a very distributed data storage format; you don't see the entire 'database' until you've trawled all the pages which include it. This is why there is such a focus on whether RDFa is a decent solution for search engines - they *see* the web better than anyone else, and thus appear to be able to utilize such a distributed data format more effectively than anyone else. However, Ian is pointing out that those same search engines (at least Google, though I expect Yahoo, etc. feel the same) believe that natural-language processing is a far more effective method of gathering information. 
It is less prone to gaming (natural language being naturally unstructured, it's harder to emit spam data that has the same statistical characteristics), and allows for extracting far more data automatically than any one user would ever think to include.

> This email is already too long for most people to get through it :( I
> believe that this discussion is going to last for some time (I cannot
> imagine why, given the HTML timeline, it would need to be resolved before
> June), so there will be time for others to discuss more fully the many
> points Ian raises as ones he would like to understand.

The HTML timeline is partially a joke (2023 is the date for 'full compliance'; there isn't a single browser yet that has fully implemented *html4* ^_^). We still would like things resolved with all due speed; the faster they hit the spec, the faster they'll be integrated into browsers.

Conclusion
==========

There is significant confusion (or at least lack of distinction) in your email (and generally in the arguments from RDFa supporters in my experience) between RDFa and RDF, RDF and the general concept of data interchange formats, distributed and centralized data storage, in-the-small data interchange and in-the-large data interchange, and personal use (ie web users) and organization use (ie search engines). Each of these individually confuses the argument; when brought together as they typically are, they render many arguments completely useless.

Separating RDFa from RDF
------------------------

The bonuses/maluses of RDF itself are completely irrelevant to this discussion. This is because there already exist several methods in active use for embedding RDF in a web page. In other words, whatever problem requires you to embed RDF in a webpage has been *solved*, and without any necessity of cooperation from the html language itself. RDFa is specifically a proposal to embed structured data in a web page using attributes on elements.
*This* is the solution we need to find problems for if we want RDFa merged into the spec.

Separating RDF from general data interchange formats
----------------------------------------------------

Many of the problems that can be solved by using a common data interchange format don't require specifically what RDF brings to the table. As noted earlier in this email, every collection of data has its own shape, and its own particular 'ideal' representation. RDF forces a particular method of representation. This has its bonuses and maluses, but they are *completely separate* from the bonuses/maluses of generically using a data interchange format. Libraries don't need RDF to exchange data, they just need *some* agreement on data representation. What problems are specifically solved by RDF and its specific representation being favored in the spec over a more general method of data representation?

Separating distributed and centralized data storage
---------------------------------------------------

RDFa is a distributed data storage format - a single page includes only a fraction of the relevant data. The opposite possibility is centralized data storage - a single entity holding the data in a particular place (such as a database on their servers). The latter is very common, simple, and natural. To get at the data, you just run queries against the single database. This does require the entity with the data to produce an API to run queries against, but the same is required for use of a distributed data format (the company in charge of the site has to specifically code to expose that data in the given format). Both storage methods, though, allow sharing of data and enable all manner of useful web services. What problems are specifically solved by a distributed data strategy which are solved worse or not at all by a centralized data strategy?

Separating in-the-small and in-the-large data interchange
---------------------------------------------------------

In-the-small data interchange involves a small number of entities who can trust each other and generally receive a direct benefit from structuring and sharing their data. In-the-large data interchange involves a large number of disparate entities who *can't* trust each other and won't generally receive direct benefit for structuring their data. What problems are shared by these two situations? Which are best solved by RDFa? Are there existing solutions to these problems that are adequate?

If RDFa is intended to be for one or the other of these situations, it would be convenient for advocates to agree which it is, so that we can then focus the discussion on that. As it is we are getting into useless arguments where someone is talking about one situation, and then someone else brings up a "Yes, but..." involving the other situation.

Separating personal consumption from corporate consumption
----------------------------------------------------------

It has already been noted that existing search engines have found metadata to be generally unreliable, and instead rely on natural-language processing to extract information from pages. Can RDFa offer better solutions to the problems of search engines than they currently employ?

Personal use is an entirely different issue. RDFa is often touted as making it easy for users to look up information about data on the page. It has also been noted, though, that simply highlighting some text (say, a song title) and selecting "Search Google for the text '...'" (specific text is from my machine; your experience may vary) does essentially the same thing, and possibly offers much more. As well, new features such as IE8's accelerators offer even more advanced functionality when you need it, such as allowing you to search IMDB.com specifically for your highlighted text, using IMDB's own search form.
Are there significant problems left in this space? Does RDFa solve them? Are they better solved by other solutions? ~TJ
attached mail follows:
Tab Atkins Jr. wrote: > ... > Solutions for this already exist; embedded N3 in a <script> tag, just > to name something that Ian already mentioned, allows you to mash RDF > data into a page in a machine-extractable way, and brings in any of > the specific ancillary benefits of RDF. > ... Well, it'll require an N3 parser where previously none was needed. Also, it separates the metadata from the text, a situation most people want to avoid. This may work, but as far as I can tell, the use of <script> for "data blocks" is an afterthought -- for instance, it's described in a section about, well, Scripting. So, is anybody using this successfully in practice? > ... > Not quite correct. Again, the problem of embedded shareable data in a > web page has been solved multiple times. The specific problem of > sharing *RDF* data (due to needing/wanting the specific benefits RDF > can offer) has also been solved. What are the precise problems that > require *RDFa* as a solution? > ... Could you elaborate a bit on these solutions? My understanding was that RDFa has been produced in order to address problems with other approaches, such as using <meta> elements, eRDF, or microformats. If there is a *successful* alternative to RDFa that does not require new attributes, please let us know :-). > ... > Well, there are many things that would offer more advantages than > disadvantages by themselves. We can't possibly include all of them in > the spec; you can think about this as including a hidden large > disadvantage of 'will grow the size of the spec and the amount of work > implementors have to do'. Thus the advantages must generally be > significantly larger than the disadvantages; this is why the best > argument for including something in the spec is often "there are > already widespread hacks to accomplish this". <video>, for example, > was included based on pretty much precisely that argument. > ... 
Reminder: RDFa is one of the things the (W3C) Working Group's Charter mentions as candidate for inclusion (either by a generic extensibility mechanism, or otherwise by extending the language): "The HTML WG is encouraged to provide a mechanism to permit independently developed vocabularies such as Internationalization Tag Set (ITS), Ruby, and RDFa to be mixed into HTML documents." <http://www.w3.org/2007/03/HTML-WG-charter.html#other> > ... Best regards, Julian
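
For readers following the <script> "data block" argument, here is a rough sketch of what the extraction side might look like. It only shows that an ordinary HTML parser can pull the block out of a page; interpreting the N3 inside it still needs the separate N3 parser discussed above. The page content, the example.org URI, and the class name DataBlockExtractor are made up for illustration; only Python's standard html.parser module is assumed.

```python
from html.parser import HTMLParser


class DataBlockExtractor(HTMLParser):
    """Collect the raw text of <script> elements with a given type attribute."""

    def __init__(self, wanted_type):
        super().__init__()
        self.wanted_type = wanted_type
        self.blocks = []      # one string per matching <script> element
        self._inside = False  # True while inside a matching <script>

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == self.wanted_type:
            self._inside = True
            self.blocks.append("")

    def handle_data(self, data):
        # Script content is delivered as plain text, not parsed as markup.
        if self._inside:
            self.blocks[-1] += data

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False


# Hypothetical page: the URI and the triple are invented for this sketch.
PAGE = """<p>Some article text.</p>
<script type="text/n3">
<http://example.org/book/1> <http://purl.org/dc/terms/title> "Example Title" .
</script>"""

extractor = DataBlockExtractor("text/n3")
extractor.feed(PAGE)
extractor.close()

# The block is still just a string here; an N3 parser would be the next step.
print(extractor.blocks[0].strip())
```

Note that the metadata ends up in a separate block from the visible text it describes, which is the separation objected to above.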
attached mail follows:
On Fri, Jan 2, 2009 at 11:55 AM, Julian Reschke <julian.reschke@gmx.de> wrote: > Tab Atkins Jr. wrote: >> >> ... >> Solutions for this already exist; embedded N3 in a <script> tag, just >> to name something that Ian already mentioned, allows you to mash RDF >> data into a page in a machine-extractable way, and brings in any of >> the specific ancillary benefits of RDF. >> ... > > Well, it'll require an N3 parser where previously none was needed. RDFa requires an RDFa parser as well, and in general *any* metadata requires a parser, so this point is moot. The only metadata that doesn't require a parser is no metadata at all. > Also, it > separates the metadata from the text, a situation most people want to avoid. That sounds like a requirement, but it's one that already presumes that metadata is useful to embed in webpages. It has not yet been established that there is a problem worth solving that metadata would address at all. (Clarifying this was the primary purpose of Ian's mail, and my first mail in this thread.) > This may work, but as far as I can tell, the use of <script> for "data > blocks" is an afterthought -- for instance, it's described in a section > about, well, Scripting. > > So, is anybody using this successfully in practice? I have no idea. The point is, though, that it *is* an existing possibility that requires no further effort from this working group or browser developers. As such, if it solves the problem (whatever it is, since that hasn't yet been well-established) sufficiently, we can leave it alone. It is in the best interests of everybody if a solution can be found without any changes to the language, because it means browser uptake is quick (immediate and retroactive, to be precise ^_^). 
We have to ensure that the problem isn't already solved by the language first, and only after that can we evaluate whether the language is the correct place to solve the problem, and only after *that* can we start discussing how to actually go about solving the problem in the language. Too much of this discussion is jumping straight to step 3, so Ian, I, and others are trying to focus it on step 1. >> ... >> Not quite correct. Again, the problem of embedded shareable data in a >> web page has been solved multiple times. The specific problem of >> sharing *RDF* data (due to needing/wanting the specific benefits RDF >> can offer) has also been solved. What are the precise problems that >> require *RDFa* as a solution? >> ... > > Could you elaborate a bit on these solutions? Microformats, embedded data in <script> blocks, embedded XML, custom attributes, other miscellaneous uses of @class and related attributes, and simply putting the data in natural language. These solutions already exist, and in several cases are easier to use than RDFa. Do they have specific failings that RDFa addresses? Are these failings significant enough to warrant extending the language to solve them? Do we *understand* the failings (assuming they exist and are significant) well enough to be confident we can solve them correctly in the language, rather than waiting for the community to solve them themselves and then simply reifying their solutions? > My understanding was that RDFa has been produced in order to address > problems with other approaches, such as using <meta> elements, eRDF, or > microformats. > > If there is a *successful* alternative to RDFa that does not require new > attributes, please let us know :-). The most successful alternative is nothing at all. ^_^ We can extract copious data from web pages reliably without metadata, either using our human senses (in personal use) or natural-language-based processing (in search engine use). 
It has not yet been established that sufficient and significant enough problems *exist* to justify a solution, let alone one that requires an addition to html. That is what Ian is specifically looking for. Unfortunately, you really do need to justify metadata anew; you can't just point at Microformats or something similar and say "we're doing the same things as those guys!". They exist currently because they can fit their solutions into the language as it is; there is no further need to justify them in this group. Modifying the language, though, is an explicit admission that this is a problem worth solving and worth solving in a particular way, and so requires significant justification. >> ... >> Well, there are many things that would offer more advantages than >> disadvantages by themselves. We can't possibly include all of them in >> the spec; you can think about this as including a hidden large >> disadvantage of 'will grow the size of the spec and the amount of work >> implementors have to do'. Thus the advantages must generally be >> significantly larger than the disadvantages; this is why the best >> argument for including something in the spec is often "there are >> already widespread hacks to accomplish this". <video>, for example, >> was included based on pretty much precisely that argument. >> ... > > Reminder: RDFa is one of the things the (W3C) Working Group's Charter > mentions as candidate for inclusion (either by a generic extensibility > mechanism, or otherwise by extending the language): > > "The HTML WG is encouraged to provide a mechanism to permit independently > developed vocabularies such as Internationalization Tag Set (ITS), Ruby, and > RDFa to be mixed into HTML documents." > <http://www.w3.org/2007/03/HTML-WG-charter.html#other> As a note, this isn't the W3C's HTML WG. The WHATWG is independent from the W3C. ~TJ
attached mail follows:
Tab Atkins Jr. wrote: > ... >> Well, it'll require an N3 parser where previously none was needed. > > RDFa requires an RDFa parser as well, and in general *any* metadata > requires a parser, so this point is moot. The only metadata that > doesn't require a parser is no metadata at all. With RDFa, most of the parsing is done by HTML. So I would call it an "RDFa processor". And yes, that doesn't change the fact that code needs to be written. But it affects the type of the code that needs to be written. > ... > I have no idea. The point is, though, that it *is* an existing > possibility that requires no further effort from this working group or > browser developers. As such, if it solves the problem (whatever it > is, since that hasn't yet been well-established) sufficiently, we can > leave it alone. It is in the best interests of everybody if a > solution can be found without any changes to the language, because it > means browser uptake is quick (immediate and retroactive, to be > precise ^_^). > ... Well, there are lots of conditionals in this statement :-) > We have to ensure that the problem isn't already solved by the > language first, and only after that can we evaluate whether the > language is the correct place to solve the problem, and only after > *that* can we start discussing how to actually go about solving the > problem in the language. Too much of this discussion is jumping > straight to step 3, so Ian, I, and others are trying to focus it on > step 1. I would say this is because the research and design in this area totally predates HTML5. Are you seriously suggesting that all of that needs to start from scratch? >>> ... >>> Not quite correct. Again, the problem of embedded shareable data in a >>> web page has been solved multiple times. The specific problem of >>> sharing *RDF* data (due to needing/wanting the specific benefits RDF >>> can offer) has also been solved. What are the precise problems that >>> require *RDFa* as a solution? >>> ... 
>> Could you elaborate a bit on these solutions? > > Microformats, embedded data in <script> blocks, embedded XML, custom > attributes, other miscellaneous uses of @class and related attributes, > and simply putting the data in natural language. > ... - Microformats: how do they solve sharing RDF data? - embedded data in <script>: see discussion above - embedded XML: embedded in where? - custom attributes: wow, that sounds like RDFa - @class and friends: that sounds like eRDF, which the way it is currently specified is broken in HTML5 (@profile) - natural language: hey great, please elaborate :-) > ... >> My understanding was that RDFa has been produced in order to address >> problems with other approaches, such as using <meta> elements, eRDF, or >> microformats. >> >> If there is a *successful* alternative to RDFa that does not require new >> attributes, please let us know :-). > > The most successful alternative is nothing at all. ^_^ We can > extract copious data from web pages reliably without metadata, either > using our human senses (in personal use) or natural-language-based > processing (in search engine use). It has not yet been established > that sufficient and significant enough problems *exist* to justify a > solution, let alone one that requires an addition to html. That is > what Ian is specifically looking for. That's what you and Ian claim. Many disagree. > Unfortunately, you really do need to justify metadata anew; you can't > just point at Microformats or something similar and say "we're doing > the same things as those guys!". They exist currently because they > can fit their solutions into the language as it is; there is no > further need to justify them in this group. Modifying the language, > though, is an explicit admission that this is a problem worth solving > and worth solving in a particular way, and so requires significant > justification. Disagreed. 
The very existence of Microformats proves that people want to augment their content with metadata that is machine-readable. Some of the shortcomings of Microformats are caused by the way they are retrofitted into HTML. So it's totally natural to discuss whether a better solution can be reached by adding new stuff to the language. >>> ... >> Reminder: RDFa is one of the things the (W3C) Working Group's Charter >> mentions as candidate for inclusion (either by a generic extensibility >> mechanism, or otherwise by extending the language): >> >> "The HTML WG is encouraged to provide a mechanism to permit independently >> developed vocabularies such as Internationalization Tag Set (ITS), Ruby, and >> RDFa to be mixed into HTML documents." >> <http://www.w3.org/2007/03/HTML-WG-charter.html#other> > > As a note, this isn't the W3C's HTML WG. The WHATWG is independent > from the W3C. > ... Sounds like we need to restart the thread on the HTML WG's mailing list then. Best regards, Julian
attached mail follows:
On 3/1/09 14:02, Julian Reschke wrote: > Tab Atkins Jr. wrote: >> ... >>> Well, it'll require an N3 parser where previously none was needed. >> >> RDFa requires an RDFa parser as well, and in general *any* metadata >> requires a parser, so this point is moot. The only metadata that >> doesn't require a parser is no metadata at all. > > With RDFa, most of the parsing is done by HTML. So I would call it an > "RDFa processor". And yes, that doesn't change the fact that code needs > to be written. But it affects the type of the code that needs to be > written. Somewhat of an aside, but for the curious - here is an RDFa parser/processor app: http://code.google.com/p/rdfquery/wiki/Introduction example: http://rdfquery.googlecode.com/svn/trunk/demos/markup/markup.html js: http://rdfquery.googlecode.com/svn/trunk/jquery.rdfa.js [...] >> The most successful alternative is nothing at all. ^_^ We can >> extract copious data from web pages reliably without metadata, either >> using our human senses (in personal use) or natural-language-based >> processing (in search engine use). It has not yet been established >> that sufficient and significant enough problems *exist* to justify a >> solution, let alone one that requires an addition to html. That is >> what Ian is specifically looking for. > > That's what you and Ian claim. Many disagree. My main problem with the natural language processing option is that it feels too close to waiting for Artificial Intelligence. I'd rather add 6 attributes to HTML and get on with life. But perhaps a more practical concern is that it unfairly biases things towards popular languages - lucky English, lucky Spanish, etc., and those that lend themselves more to NLP analysis. The Web is for everyone, and people shouldn't be forced to read and write English to enjoy the latest advances in Web automation. Since HTML5 is going through W3C, such considerations need to be taken pretty seriously. >> As a note, this isn't the W3C's HTML WG. 
The WHATWG is independent >> from the W3C. But the WHATWG HTML5 *work* is no longer entirely independent of W3C; the two organizations embarked on a major joint venture. It seems reasonable for members of the WHATWG world to take W3C-oriented considerations seriously, regardless of mailing list. cheers, Dan -- http://danbri.org/
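
To make the "parser versus processor" distinction above a little more concrete, here is a minimal sketch in which the HTML parsing is left to Python's standard html.parser and the remaining code only walks attributes. It handles a small subset of the RDFa attributes (about, property, content) and is in no way a conforming RDFa processor; the class name PropertyCollector, the sample markup, and the example.org URI are assumptions made for this illustration.

```python
from html.parser import HTMLParser

# Elements that never have an end tag, so they are not pushed on the stack.
VOID = {"meta", "link", "img", "br", "hr", "input"}


class PropertyCollector(HTMLParser):
    """Collect (subject, property, value) tuples from about/property/content."""

    def __init__(self):
        super().__init__()
        self.triples = []
        self._stack = []  # one [subject, pending property, text] per open element

    def _current_subject(self):
        return self._stack[-1][0] if self._stack else ""

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        # The subject is set by @about and inherited by descendants.
        subject = a.get("about", self._current_subject())
        prop = a.get("property")
        if prop is not None and "content" in a:
            # @content overrides the element's text as the value.
            self.triples.append((subject, prop, a["content"]))
            prop = None
        if tag not in VOID:
            self._stack.append([subject, prop, ""])

    def handle_data(self, data):
        # Text contributes to every open element still waiting for a value.
        for frame in self._stack:
            if frame[1] is not None:
                frame[2] += data

    def handle_endtag(self, tag):
        if tag in VOID or not self._stack:
            return
        subject, prop, text = self._stack.pop()
        if prop is not None:
            self.triples.append((subject, prop, text.strip()))


# Hypothetical markup, assumed to be well-formed.
PAGE = """<div about="http://example.org/book/1">
  <h1 property="dc:title">Example Title</h1>
  <meta property="dc:date" content="2009-01-03">
</div>"""

collector = PropertyCollector()
collector.feed(PAGE)
collector.close()
for triple in collector.triples:
    print(triple)
```

Code still has to be written, as noted above, but it is attribute-walking code layered on an existing HTML parser rather than a parser for a second syntax.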
attached mail follows:
Also sprach Dan Brickley:
> My main problem with the natural language processing option is that it
> feels too close to waiting for Artificial Intelligence. I'd rather add 6
> attributes to HTML and get on with life.
:-)
Personally, I think the 'class' attribute may still be a more
compelling option in a less-is-more way. It already exists and can
easily be used for styling purposes. Styling is bait for authors to
disclose semantics.
Cheers,
-h&kon
Håkon Wium Lie CTO °þe®ª
howcome@opera.com http://people.opera.com/howcome
attached mail follows:
On 3/1/09 16:54, Håkon Wium Lie wrote: > Also sprach Dan Brickley: > > > My main problem with the natural language processing option is that it > > feels too close to waiting for Artificial Intelligence. I'd rather add 6 > > attributes to HTML and get on with life. > > :-) Another thought re NLP. RDFa (and similar, ...) are formats that can be used for writing down the conclusions of NLP analysis. For example here see the BBC's recent Muddy Boots experiment, using DBPedia (Wikipedia in RDF) data to drive autoclassification / named entity recognition. So here we can agree with Ian and others that text analysis has much to offer, and still use RDFa (or other semantic markup - i'll sidestep that debate for now) as a notation for marking up the words with a machine-friendly indicator of their NLP-guessed meaning. http://www.bbc.co.uk/blogs/journalismlabs/2008/12/muddy_boots.html > Personally, I think the 'class' attribute may still be a more > compelling option in a less-is-more way. It already exists and can > easily be used for styling purposes. Styling is bait for authors to > disclose semantics. I'm sure there's mileage to be had there. I'm somehow incapable of writing XSLT so GRDDL hasn't really charmed me, but 'class' certainly corresponds to a lot of meaningful markup. Naturally enough it is stronger at tagging bits of information with a category than at defining relationships amongst the things defined when they're scattered around the page. But that's no reason to dismiss it entirely. Did you see the RDF-EASE draft, http://buzzword.org.uk/2008/rdf-ease/spec? From which comes: "Ten second sales pitch: CSS is an external file that specifies how your document should look; RDF-EASE is an external file that specifies what your document means." RDF-EASE uses CSS-based syntax. 
More discussion here,
http://lists.w3.org/Archives/Public/semantic-web/2008Dec/0148.html
including question of whether it ought to be expressed using css3-namespace,
http://lists.w3.org/Archives/Public/semantic-web/2008Dec/0175.html

cheers,

Dan

--
http://danbri.org/
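
As a small illustration of the 'class' approach discussed in this exchange, the sketch below collects the text of elements carrying agreed class names. It shows the strength mentioned above (tagging a piece of text with a category) and also the limit: nothing in the output says which author belongs to which title, unless consumers agree on some further convention. The class names "title" and "author", the sample markup, and the class ClassTextCollector are hypothetical; only Python's standard html.parser is assumed.

```python
from html.parser import HTMLParser


class ClassTextCollector(HTMLParser):
    """Collect (class name, text) pairs for a set of agreed class names."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.found = []   # (class name, text) pairs, in document order of closing
        self._open = []   # one [matched class names, text] per open element

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        matched = [c for c in classes if c in self.wanted]
        self._open.append([matched, ""])

    def handle_data(self, data):
        for frame in self._open:
            if frame[0]:
                frame[1] += data

    def handle_endtag(self, tag):
        if not self._open:
            return
        names, text = self._open.pop()
        for name in names:
            self.found.append((name, text.strip()))


# Hypothetical markup; assumed well-formed and free of void elements
# such as <br>, which this simple stack does not account for.
PAGE = """<p>I am reading <cite class="title">Example Title</cite>
by <span class="author">A. N. Author</span>.</p>"""

collector = ClassTextCollector(["title", "author"])
collector.feed(PAGE)
collector.close()
for pair in collector.found:
    print(pair)
```

The same class names remain available to style sheets, which is the "bait for authors" argument made earlier in the thread.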
attached mail follows:
I've tried to follow the whole discussion despite its length, and my conclusion is: you're asking the wrong question.

People against RDFa in HTML5 are asking "why do you need RDFa?", and supporters of the proposal are actually describing the benefits of RDFa itself. The right question is: why do you need RDFa *inside HTML5*?

My personal answer to this question is: there is no need for RDFa inside HTML5. There are other markup languages which support RDFa natively (XHTML, for example).

You may say that this helps to divide the web into two sides, users of HTML5 and users of XHTML2. Actually, the web is already divided into two big groups:

- Web of data
- Web of interaction

The web of data means all the pages whose primary objective is to provide some information, either user-readable or machine-readable, to the users, while the web of interaction includes web applications, whose primary purpose is to provide additional services to the users. These two groups have very different requirements (GMail doesn't need RDFa in application code, while Wikipedia doesn't need a progress element), so a specific markup language may suit each kind of web site better.

Moreover, this distinction is not a requirement, just a piece of advice: you can put metadata inside HTML5 using Microformats and you can put interactivity inside XHTML2 using XMLEvents.

Summing up: if you, as an author, feel an absolute need for metadata, because delivering content to the users is your primary goal, then switch from HTML5 to something else, and leave HTML5 to web applications focused on user interaction.

Giovanni
attached mail follows:
On Sun, 04 Jan 2009 02:54:18 +1100, Håkon Wium Lie <howcome@opera.com> wrote: > Also sprach Dan Brickley: > > > My main problem with the natural language processing option is that it > > feels too close to waiting for Artificial Intelligence. I'd rather > > add 6 attributes to HTML and get on with life. ... > Personally, I think the 'class' attribute may still be a more > compelling option in a less-is-more way. It already exists and can > easily be used for styling purposes. Styling is bait for authors to > disclose semantics. I agree that this is a clear first step - and microformats were developed by paving a cowpath from authors who had done this on their own initiative. I think the reason for adding the RDFa attributes is that there are cases where the semantic richness offered by class is insufficient. The relevant cases are where people are already dealing in rich formalised semantics, not those where it is a battle to get people to provide any semantics at all. I think there is a clear benefit in drawing these people to HTML5 rather than suggesting they go off into some different Web. I used the pattern of adding semantics through class, a decade or so ago, and in some cases it met my needs perfectly, but in others was insufficient to enable re-use of the data directly from pages, and forced me to adopt external systems for managing my data which in turn implied an increased cost in management because I had to keep the data model clear although I did not have a simple formalism to specify it at the time. cheers Chaals -- Charles McCathieNevile Opera Software, Standards Group je parle français -- hablo español -- jeg lærer norsk http://my.opera.com/chaals Try Opera: http://www.opera.com
attached mail follows:
Dan Brickley wrote: > On 3/1/09 14:02, Julian Reschke wrote: >> Tab Atkins Jr. wrote: >>> The most successful alternative is nothing at all. ^_^ We can >>> extract copious data from web pages reliably without metadata, either >>> using our human senses (in personal use) or natural-language-based >>> processing (in search engine use). It has not yet been established >>> that sufficient and significant enough problems *exist* to justify a >>> solution, let alone one that requires an addition to html. That is >>> what Ian is specifically looking for. >> >> That's what you and Ian claim. Many disagree. > > My main problem with the natural language processing option is that it > feels too close to waiting for Artificial Intelligence. I'd rather add > 6 attributes to HTML and get on with life. > > But perhaps a more practical concern is that it unfairly biases things > towards popular languages - lucky English, lucky Spanish, etc., and > those that lend themselves more to NLP analysis. *The Web is for > everyone*, and people shouldn't be forced to read and write English to > enjoy the latest advances in *Web automation*. Since HTML5 is going > through W3C, such considerations need to be taken pretty seriously. >

My concern is: is RDFa really suitable for everyone and for Web automation? My own answer, at first glance, is no. RDF(a) can perhaps nicely address very niche needs, where determining how much the data can be trusted is not a problem; but in general, misuse AND deliberate abuse may harm automation heavily, since an automaton is unlikely to be able to tell whether metadata expresses the real meaning of a web page or not (without a certain degree of AI).
If an external mechanism is needed to determine the trust level of metadata, that is, to establish whether the results of an automation are good or bad, such a mechanism may involve human beings at some stage, thus breaking automation (this is somewhat similar to the problem of defining an "oracle machine" described by Turing, according to whom such a machine isn't an automaton). On the other hand, a very custom model designed for very custom needs (and not requiring wide support) may be less prone to abuse, since one is unlikely to find someone willing to cheat himself. Thus, having third parties agree on a certain model and related APIs, and implement those APIs on their own side, might be more reliable in some cases (in any case, the third parties should agree that their respective metadata are reliable and find a way to verify that they really are).

Dan Brickley wrote: > On 3/1/09 16:54, Håkon Wium Lie wrote: >> Also sprach Dan Brickley: >> >> > My main problem with the natural language processing option is >> that it >> > feels too close to waiting for Artificial Intelligence. I'd >> rather add 6 >> > attributes to HTML and get on with life. >> >> :-) > > Another thought re NLP. RDFa (and similar, ...) are formats that can > be used for writing down the conclusions of NLP analysis. For example > here see the BBC's recent Muddy Boots experiment, using DBPedia > (Wikipedia in RDF) data to drive autoclassification / named entity > recognition. So here we can agree with Ian and others that text > analysis has much to offer, and still use RDFa (or other semantic > markup - i'll sidestep that debate for now) as a notation for marking > up the words with a machine-friendly indicator of their NLP-guessed > meaning. > > http://www.bbc.co.uk/blogs/journalismlabs/2008/12/muddy_boots.html > >> Personally, I think the 'class' attribute may still be a more >> compelling option in a less-is-more way. It already exists and can >> easily be used for styling purposes.
Styling is bait for authors to >> disclose semantics. > > I'm sure there's mileage to be had there. I'm somehow incapable of > writing XSLT so GRDDL hasn't really charmed me, but 'class' certainly > corresponds to a lot of meaningful markup. Naturally enough it is > stronger at tagging bits of information with a category than at > defining relationships amongst the things defined when they're > scattered around the page. But that's no reason to dismiss it entirely. > > Did you see the RDF-EASE draft, > http://buzzword.org.uk/2008/rdf-ease/spec? From which comes: "Ten > second sales pitch: CSS is an external file that specifies how your > document should look; *RDF-EASE is an external file that specifies > what your document means.*" > > RDF-EASE uses CSS-based syntax. More discussion here, > http://lists.w3.org/Archives/Public/semantic-web/2008Dec/0148.html > including question of whether it ought to be expressed using > css3-namespace, > http://lists.w3.org/Archives/Public/semantic-web/2008Dec/0175.html > > cheers, > > Dan > > -- > http://danbri.org/ >

My question is: how often can I trust that such a file specifies what your document really means, without evaluating its content? I'd distinguish two cases (without claiming to make a complete classification):

- The semantics described by the metadata is used for server-side computations: there's no need to evaluate the content (since I'm trusting you when navigating your site, and you are unlikely to be purposely deceiving yourself), nor to have client-side support for such metadata (by the UA). This is the case of a centralised database. For instance, a *pedia page may send queries to the server, which processes them and sends the results back to the user.
- The UA must understand the metadata and automatically gather information mashed up in a page from several sources: each source must be actively evaluated and trusted (a bot can't do that). This is the case of a decentralised database.
For instance, it's easy to think of a spamming advertiser who apparently puts honest content into your pages (which may take reliable content from dbpedia), whereas he uses fake metadata to cheat my browser and send me irrelevant information (or information I'm not interested in) when I ask for related content [1], perhaps without you even guessing what's going on (and you may be losing visitors because of that).

For obvious reasons, a trust evaluation mechanism can't be as easy as getting/creating a signature to be used in a secure connection, because someone must actively evaluate at least two things:

- the metadata really reflects a resource's content, and
- the metadata is properly used with respect to the external schema used to model the data (otherwise, no relationship would be reliable; however, this might be a minor concern from a certain angle, since misused metadata might be less harmful than deliberately abused metadata).

The result can be very expensive (like certifying a driver or an application for a certain platform), or lead to a choice to skip any evaluation and instead simply trust any third party. Both solutions may work, perhaps, for niche/limited cases, but I don't think they can be a good basis for a "global", general-purpose automation.

[1] That's not the same as using the @rel attribute without any relationship with other metadata: a UA may just provide a link somehow described as pointing to a related resource with respect to the surrounding content, so that I can choose to follow such a link or not; if the @rel attribute is used by an automated mechanism in response to a query and with respect to other metadata, the UA must decide on its own whether a link is worth following or not, and I don't think there is any easy way to take automated decisions involving trust.

Best regards, Alex
attached mail follows:
On Jan 3, 2009, at 17:05, Dan Brickley wrote: > But perhaps a more practical concern is that it unfairly biases > things towards popular languages - lucky English, lucky Spanish, > etc., and those that lend themselves more to NLP analysis. The Web > is for everyone, and people shouldn't be forced to read and write > English to enjoy the latest advances in Web automation. Some languages are higher in the pecking order than others when software development is prioritized, and RDFa cannot level the playing field here. Suppose there's a use case that can be satisfactorily addressed by applying NLP heuristics to content for the top-tier languages. Even if there were an RDF mechanism for addressing the same use case without relying on natural language, software aimed for serving the top-tier languages would still do the NLP thing for the use case. Thus, the development of the parallel RDF-based solution would be borne by the communities using the other languages. If the other languages can't get the users of the top-tier languages to use the same technical solution, they are still at a disadvantage even if an alternative technology stack is theoretically possible, because most software development effort goes into what makes sense for the top-tier languages without the results being applicable also for the other languages. Instead of bearing the cost of developing a totally alternative technology stack for the other languages without benefiting from any spillover from the effort done for the top-tier languages, it makes more sense to invest the effort into building upon the reusable parts already developed for the top-tier languages. (Quick case study about language-sensitive technology adoption and markets: When movable type was developed, a *subset* of the alphabet used for German--the native language of printing press suppliers--was adopted for Finnish. 
Today, hundreds of years later, digital font availability for Finnish is better than font availability for languages of comparable installed base that adopted *extensions* for the alphabet used for German or that used a totally different script. That is, NIH *still* hasn't caught up with the first-mover advantage as far as type goes.) -- Henri Sivonen hsivonen@iki.fi http://hsivonen.iki.fi/
attached mail follows:
On Mon, 05 Jan 2009 00:17:39 +1100, Henri Sivonen <hsivonen@iki.fi> wrote: > On Jan 3, 2009, at 17:05, Dan Brickley wrote: > >> But perhaps a more practical concern is that it unfairly biases things >> towards popular languages - lucky English, lucky Spanish, etc., and >> those that lend themselves more to NLP analysis. The Web is for >> everyone, and people shouldn't be forced to read and write English to >> enjoy the latest advances in Web automation. > > Some languages are higher in the pecking order than others when software > development is prioritized, and RDFa cannot level the playing field here. > > Suppose there's a use case that can be satisfactorily addressed by > applying NLP heuristics to content for the top-tier languages. Even if > there were an RDF mechanism for addressing the same use case without > relying on natural language, software aimed for serving the top-tier > languages would still do the NLP thing for the use case.

No. There is no reason for most developers to prefer one over the other under the circumstances described. Clearly Google has an investment in text-harvesting in a bunch of languages. Equally clearly its competitors who are more successful in various languages (Yandex, Baidu, etc) have an investment in the technology they use. But when developing a new indexing process, there is no a priori reason to favour NLP over some other technique that is also satisfactory, and if you happen to be interested in a global market, it makes sense to develop a system that can be more easily adapted, other things being equal.

... > Instead of bearing the cost of developing a totally alternative > technology stack for the other languages without benefiting from any > spillover from the effort done for the top-tier languages, it makes more > sense to invest the effort into building upon the reusable parts already > developed for the top-tier languages.
Except that it turns out that the re-usable parts of most search engines, for the general developer, are pretty limited. Whereas the re-usable parts of the RDF stack are numerous, available for many different platforms, from GPL open source to bespoke commercial closed-source and everything between. All this does not necessarily establish the case for using RDF in HTML, it is just meant to demonstrate that this particular case *against* doesn't seem to be established, to me. cheers Chaals -- Charles McCathieNevile Opera Software, Standards Group je parle français -- hablo español -- jeg lærer norsk http://my.opera.com/chaals Try Opera: http://www.opera.com
attached mail follows:
Charles McCathieNevile wrote: >>> The results of the first set of Microformats efforts were some pretty >>> cool applications, like the following one demonstrating how a web >>> browser could forward event information from your PC web browser to >>> your >>> phone via Bluetooth: >>> >>> http://www.youtube.com/watch?v=azoNnLoJi-4 >> >> It's a technically very interesting application. What has the adoption >> rate been like? How does it compare to other solutions to the problem, >> like CalDav, iCal, or Microsoft Exchange? Do people publish calendar >> events much? There are a lot of Web-based calendar systems, like >> MobileMe >> or WebCalendar. Do people expose data on their Web page that can be used >> to import calendar data to these systems? > > In some cases this data is indeed exposed to Webpages. However, > anecdotal evidence (which unfortunately is all that is available when > trying to study the enormous collections of data in private intranets) > suggests that this is significantly more valuable when it can be done > within a restricted access website. > > ... >>> In short, RDFa addresses the problem of a lack of a standardized >>> semantics expression mechanism in HTML family languages. >> >> A standardized semantics expression mechanism is a solution. The lack >> of a solution isn't a problem description. What's the problem that a >> standardized semantics expression mechanism solves? > > There are many many small problems involving encoding arbitrary data > in pages - apparently at least enough to convince you that the data-* > attributes are worth incorporating. > > There are many cases where being able to extract that data with a > simple toolkit from someone else's content, or using someone else's > toolkit without having to tell them about your data model, solves a > local problem.
The data-* attributes, because they do not represent a > formal model that can be manipulated, are insufficient to enable > sharing of tools which can extract arbitrary modelled data. >

That's because the data-* attributes are meant to create custom models for custom use cases not (necessarily) involving interchange and (let me say) "agnostic extraction" of data. However, data-* attributes might be used to "emulate" support for RDFa attributes, so that each RDFa attribute might be mapped to, let's say, a "data-rdfa-<attribute>" one and vice versa (I don't think "data-rdfa-about" vs "about" would make a great difference, at least in a test phase, since it wouldn't be much different from "rdfa:about", which might be used to embed RDFa attributes in some XML language (e.g. an "external" markup embedded in an XHTML document through the extension mechanism)).

There seem to be several problems which may be addressed by RDFa (beside other, more custom models) for organization-wide internal use and intranet publication, without the explicit requirement of external interchange. When both HTML5-specific features and RDFa attributes are felt to be necessary, it shouldn't be too difficult to create a custom parser, conforming to the RDFa spec and making use of data-* attributes, to be plugged into a browser supporting HTML5 (and data-*), first for internal testing and then exposed to the community, so that html5+rdfa can be tested on a wider scale (especially once similar parsers are provided for all the main browsers). Widespread adoption would then demonstrate an effective need to merge RDFa into the HTML5 spec (or to standardize an approach based on data-* attributes).

That is, since RDFa can be "emulated" somehow in HTML5 and tested without changing the current specification, perhaps there isn't a strong need for an early adoption of the former, and instead an "emulated" merger might be tested first within the current timeline.
>> What is the cost of having different data use specialised formats? > > If the data model, or a part of it, is not explicit as in RDF but is > implicit in code made to treat it (as is the case with using scripts > to process things stored in arbitrarily named data-* attributes, and > is also the case in using undocumented or semi-documented XML formats, > it requires people to understand the code as well as the data model in > order to use the data. In a corporate situation where hundreds or tens > of thousands of people are required to work with the same data, this > makes the data model very fragile. > I'm not sure RDF(a) solves such a problem. AIUI, RDFa just binds (xml) properties and attributes (in the form of curies) to RDF concepts, modelling a certain kind of relationships, whereas it relies on external schemata to define such properties. Any undocumented or semi-documented XML formats may lead to misuses and, thus, to unreliably modelled data, and it is not clear to me how just creating an explicit relationship between properties is enough to ensure that a property really represents a subject and not a predicate or an object (in its wrongly documented schema), if the problem is the correct definition of the properties themselves. Perhaps it is enough to parse them, and perhaps it can "inspire" a better definition of the external schemata (if the RDFa "vision" of data as triples is suitable for the effective data to model), but if the problem is the right understanding of "what represents what" because of a lack in documentations, I think that's something RDF/RDFa can't solve. I think the same applies to data-* attributes, because _they_ describe data (and data semantics) in a custom model and thus _they_ need to be documented for others to be able to manipulate them; the use of a custom script rather than a built-in parser does not change much from this point of view. 
> [not clear what the context was here, so citing as it was] >>> > I don't think more metadata is going to improve search engines. In >>> > practice, metadata is so highly gamed that it cannot be relied upon. >>> > In fact, search engines probably already "understand" pages with far >>> > more accuracy than most authors will ever be able to express. >>> >>> You are correct, more erroneous metadata is not going to improve search >>> engines. More /accurate/ metadata, however, IS going to improve search >>> engines. Nobody is going to argue that the system could not be gamed. I >>> can guarantee that it will be gamed. >>> >>> However, that's the reality that we have to live with when introducing >>> any new web-based technology. It will be mis-used, abused and >>> corrupted. >>> The question is, will it do more good than harm? In the case of RDFa >>> /and/ Microformats, we do think it will do more good than harm. >> >> For search engines, I am not convinced. Google's experience is that >> natural language processing of the actual information seen by the actual >> end user is far, far more reliable than any source of metadata. Thus >> from >> Google's perspective, investing in RDFa seems like a poorer investment >> than investing in natural language processing. > > Indeed. But Google is something of an edge case, since they can afford > to run a huge organisation with massive computer power and many > engineers to address a problem where a "near-enough" solution brings > them the users who are in turn the product they sell to advertisers. > There are many other use cases where a small group of people want a > way to reliably search trusted data. >

I think the point with general purpose search engines is another one: natural language processing, while expensive, gives a far more accurate solution than RDFa and/or any other kind of metadata can bring to a problem which requires that data never needs to be trusted (and, instead, that a data processor be able to determine the data's level of trust without any external aid). Since there is no "direct" relationship between the semantics expressed by RDFa and the real semantics of a web page's content, relying on RDFa metadata would lead to widespread cheating, as happened when the keywords meta tag was introduced. Thus, a trust chain/evaluation mechanism (such as the use of signatures) would be needed, and so a general purpose search engine relying on RDFa would seem to work more like a search directory, where human beings analyse content to classify pages, giving a more accurate result, but also a smaller and very slowly growing database of classified sites (since obviously there will always be far more sites not caring about metadata and/or about making their metadata trusted than sites using trusted RDFa metadata).

(The same reasoning may apply to a local search made by a browser in its local history: results are reliable as far as the expressed semantics is reliable, that is, as far as its source is reasonably trusted, which may not be true in general. In general, misuse and deliberate abuse would be the most common case without a trust evaluation mechanism, which, in turn, would restrict the number of pages where the presence of RDF(a) metadata is really helpful.)
My concern is that any data model requiring any level of trust to achieve well-working interoperability may address only very small (and niche) use cases, and even if a lot of such niche use cases might be grouped into a whole category consistently addressed by RDFa (perhaps beside other models), the result might not be a significant enough use case to fit the actual specification guidelines (which are somewhat hostile to (xml) extensibility, as far as I've understood them), though they might be changed when and if really needed.

Best regards, Alex
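The "data-rdfa-<attribute>" emulation proposed in the message above can be sketched roughly as follows. This is only an illustration under stated assumptions, not a conforming RDFa processor: the `data-rdfa-` prefix is the one suggested in the message, while the class name `EmulatedRdfaCollector`, the helper `collect`, and the choice of which RDFa attribute names to recognise are hypothetical. It only shows that a script running on top of an HTML5 parser could map the prefixed attributes back to their RDFa names; it does not build triples or resolve CURIEs.

```python
from html.parser import HTMLParser

# Prefix suggested in the message above; treated here as an assumption.
PREFIX = "data-rdfa-"
# RDFa attribute names this sketch recognises (an illustrative subset).
RDFA_ATTRS = {"about", "property", "content", "rel", "rev",
              "resource", "typeof", "datatype"}


class EmulatedRdfaCollector(HTMLParser):
    """Collects data-rdfa-* attributes and reports them under RDFa names."""

    def __init__(self):
        super().__init__()
        self.found = []  # list of (tag, {rdfa_name: value}) in document order

    def handle_starttag(self, tag, attrs):
        mapped = {}
        for name, value in attrs:
            if name.startswith(PREFIX):
                rdfa_name = name[len(PREFIX):]
                # Ignore prefixed attributes that are not known RDFa names.
                if rdfa_name in RDFA_ATTRS:
                    mapped[rdfa_name] = value
        if mapped:
            self.found.append((tag, mapped))


def collect(html):
    """Return the emulated RDFa attributes found in an HTML string."""
    parser = EmulatedRdfaCollector()
    parser.feed(html)
    parser.close()
    return parser.found
```

An ordinary data-* attribute such as `data-other` is left alone, which is also where the clash risk lies: nothing stops a private script from having already used a name like `data-rdfa-about` for something else.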
attached mail follows:
On Sun, 04 Jan 2009 03:51:53 +1100, Calogero Alex Baldacchino <alex.baldacchino@email.it> wrote: > Charles McCathieNevile wrote: > ... it shouldn't be too difficult to create a custom parser, conforming > to RDFa spec and availing of data-* attributes... > > That is, since RDFa can be "emulated" somehow in HTML5 and tested > without changing current specification, perhaps there isn't a strong > need for an early adoption of the former, and instead an "emulated" > mergence might be tested first within current timeline.

In principle this is possible. But the data-* attributes are designed for private usage, and introducing a public usage means creating a risk of clashes that pollute RDFa data gathered this way. In other words, this is indeed feasible, but one would expect it to show that the data generated was unreliable (unless privately nobody is interested in basic terms like about). Such results have been used to suggest that poorly implemented features should be dropped, but this hypothetical case suggests to me that the argument is wrong, and that if in the face of reasons why the data would be bad people use them, one might expect better usage by formalising the status of such features and getting decent implementations.

>>> What is the cost of having different data use specialised formats? >> >> If the data model, or a part of it, is not explicit as in RDF but is >> implicit in code made to treat it (as is the case with using scripts to >> process things stored in arbitrarily named data-* attributes, and is >> also the case in using undocumented or semi-documented XML formats, it >> requires people to understand the code as well as the data model in >> order to use the data. In a corporate situation where hundreds or tens >> of thousands of people are required to work with the same data, this >> makes the data model very fragile. >> > > I'm not sure RDF(a) solves such a problem.
AIUI, RDFa just binds (xml) > properties and attributes (in the form of curies) to RDF concepts, > modelling a certain kind of relationships, whereas it relies on external > schemata to define such properties. Any undocumented or semi-documented > XML formats may lead to misuses and, thus, to unreliably modelled data, ... > I think the same applies to data-* attributes, because _they_ describe > data (and data semantics) in a custom model and thus _they_ need to be > documented for others to be able to manipulate them; the use of a custom > script rather than a built-in parser does not change much from this > point of view. RDFa binds data to RDF. RDF provides a well-known schema language with machine-processable definition of vocabularies, and how to merge information between them. In other words, if you get the underlying model for your data right enough, people will be able to use it without needing to know what you do. Naturally not everyone will get their data model right, and naturally not all information will be reliable anyway. However, it would seem to me that making it harder to merge the data in the first place does not assist in determining whether it is useful. On the other hand, certain forms of RDF data such as POWDER, FOAF, Dublin Core and the like have been very carefully modelled, and are relatively well-known and re-used in other data models. Making it easy to parse this data and merge it, according to the existing well-developed models seems valuable. >> Ian wrote: >>> For search engines, I am not convinced. Google's experience is that >>> natural language processing of the actual information seen by the >>> actual end user is far, far more reliable than any source of metadata. >>> Thus from Google's perspective, investing in RDFa seems like a poorer >>> investment than investing in natural language processing. >> >> Indeed. 
But Google is something of an edge case, since they can afford >> to run a huge organisation with massive computer power and many >> engineers to address a problem where a "near-enough" solution brings >> them the users who are in turn the product they sell to advertisers. >> There are many other use cases where a small group of people want a way >> to reliably search trusted data. >> > > I think the point with general purpose search engines is another one: > natural language processing, whereas being expensive, grants a far more > accurate solution than RDFa and/or any other kind of metadata can bring > to a problem requiring data must never need to be trusted (and, instead, > a data processor must be able to determine data's level of trust without > any external aid).

No, I don't think so. Google searches based on analysis of the open web are *not* generally more reliable than faceted searches over a reliable dataset, and in some instances are less reliable. The point is that only a few people can afford to invest in being a general-purpose search engine, whereas many can afford to run a metadata-based search system over a chosen dataset, that responds to their needs (and doesn't require either publishing their data, or paying Google to index it).

> Since there is no "direct" relationship between the semantics expressed > by RDFa and the real semantics of a web page content, relying on RDFa > metadata would lead to widespread cheats, as it was when the keywords > meta tag was introduced.

Sure. There would also be many many cases of organisations using decent metadata, as with existing approaches. My point was that I don't expect Google to naively trust metadata it finds on the open web, and in the general case probably not even to look at it. However, Google is not the measure of the Web, it is a company that sells advertising based on information it has gleaned about users by offering them services.
So the fact that some things on the Web are not directly beneficial to Google isn't that important. I do not see how the presence of explicit metadata threatens google any more than the presence of plain text (which can also be misleading). > Thus, a trust chain/evaluation mechanism (such as the use of signatures) > would be needed, Indeed such a thing is needed for a general purpose search engine. But there are many cases where an alternative is fine. For example, T-mobile publish POWDER data about web pages. Opera doesn't need to believe all the POWDER data it finds on the Web in order to improve its offerings based on T-mobile's data, if we can decide how to read that specific data. Which can be done by deciding that we trust a particular set of URIs more than others. No signature necessary, beyond the already ubiquitous TLS and the idea that we trust people we have a relationship with and whose domains we know. > My concern is that any data model requiring any level of trust to > achieve a good-working interoperability may address very small (and > niche) use cases, and even if a lot of such niche use cases might be > grouped in a whole category consistently addressed by RDFa (perhaps > beside other models), the result might not be an enough significant use > case fitting actual specification guidelines (which are somehow hostile > to (xml) extensibility, as far as I've understood them) -- though they > might be changed when and if really needed. A concern of mine is that it is unclear what the required level of usefulness is. The "google highlight" element (once called m but I think it changed its name again) is currently in the spec, the longdesc attribute currently isn't. I presume these facts boil down to judgement calls by the editor while the spec is still an early draft, but it is not easy to understand what information would determine whether something is "sufficiently important". 
Which makes it hard to determine whether it is worth the considerable investment of discussing in this group, or easier to just go through the W3C process of objecting later on. cheers Chaals -- Charles McCathieNevile Opera Software, Standards Group je parle français -- hablo español -- jeg lærer norsk http://my.opera.com/chaals Try Opera: http://www.opera.com
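
The trust model described in the message above (believe metadata only from sources one already has a relationship with, fetched over TLS, with no signatures involved) can be sketched as a simple allow-list check. This is only an illustration under stated assumptions, not how Opera or any POWDER processor actually works: the host name, the shape of the statement dictionaries, and both function names are hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical allow-list: hosts whose metadata we have decided to believe.
TRUSTED_HOSTS = {"metadata.example-operator.test"}


def is_trusted(source_url):
    """Accept a source only if it was fetched over HTTPS from a known host."""
    parts = urlsplit(source_url)
    return parts.scheme == "https" and parts.hostname in TRUSTED_HOSTS


def filter_statements(statements):
    """Keep only statements whose 'source' URL passes the trust check.

    Each statement is assumed (for this sketch) to be a dict with a
    'source' key naming the document the metadata was read from.
    """
    return [s for s in statements if is_trusted(s["source"])]
```

Metadata found anywhere else on the open Web is simply ignored rather than evaluated, which is the point being made: no general trust mechanism is needed for this narrower case.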
attached mail follows:
Calogero Alex Baldacchino wrote: > ... > This is why I was thinking about somewhat "data-rdfa-about", > "data-rdfa-property", "data-rdfa-content" and so on, so that, for the > purposes of an RDFa processor working on top of HTML5 UAs (perhaps in a > test phase, if needed at all, of course), an element dataset would give > access to "rdfa-about", instead of just "about", that is using the > prefix "rdfa-" as acting as a namespace prefix in xml (hence, as if > there were "rdfa:about" instead of "data-rdfa-about" in the markup). > ... That clashed with the documented purpose of data-*. *If* we want to support RDFa, why not add the attributes the way they are already named??? > ... > However, AIUI, actual xml serialization (xhtml5) allows the use of > namespaces and prefixed attributes, thus couldn't a proper namespace be > introduced for RDFa attributes, so they can be used, if needed, in > xhtml5 documents? I think such might be a valuable choice, because it > seems to me RDFa attributes can be used to address such cases where > metadata must stay as close as possible to correspondent data, but a > mistake in a piece of markup may trigger the adoption agency or foster > parenting algorithms, eventually causing a separation between metadata > and content, thus possibly breaking reliability of gathered > informations. From this perspective, a parser stopping on the very first > error might give a quicker feedback than one rearranging misnested > elements as far as it is reasonably possible (not affecting, and instead > improving, content presentation and users' "direct" experience, but > possibly causing side-effects with metadata). > ... That would make RDFa as used in XHTML 1.* and RDFa used in HTML 5 incompatible. What for? > ... BR, Julian
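
For concreteness, the "data-rdfa-*" idea quoted above could be prototyped outside the browser roughly as follows. This is a minimal sketch using Python's standard html.parser module; the prefix constant and the helper names are hypothetical, and the code strips the whole "data-rdfa-" prefix rather than reproducing the DOM dataset API.

```python
from html.parser import HTMLParser

# Hypothetical prefix taken from the proposal quoted above.
PREFIX = "data-rdfa-"


class DataRdfaCollector(HTMLParser):
    """Collect data-rdfa-* attributes, one dict per element that has any."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # Report "data-rdfa-about" as "about", and so on; other
        # attributes (class, id, plain data-*) are ignored.
        rdfa = {
            name[len(PREFIX):]: value
            for name, value in attrs
            if name.startswith(PREFIX)
        }
        if rdfa:
            self.found.append((tag, rdfa))


def collect(markup):
    """Return a list of (tag, {rdfa-name: value}) pairs found in markup."""
    parser = DataRdfaCollector()
    parser.feed(markup)
    parser.close()
    return parser.found
```

The same collector would work on the unprefixed attribute names by setting the prefix to an empty string and filtering on a fixed set of names, which is essentially the alternative argued for in the reply above.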
attached mail follows:
On Fri, Jan 9, 2009 at 5:46 AM, Julian Reschke <julian.reschke@gmx.de> wrote: > Calogero Alex Baldacchino wrote: >> >> ... >> This is why I was thinking about somewhat "data-rdfa-about", >> "data-rdfa-property", "data-rdfa-content" and so on, so that, for the >> purposes of an RDFa processor working on top of HTML5 UAs (perhaps in a test >> phase, if needed at all, of course), an element dataset would give access to >> "rdfa-about", instead of just "about", that is using the prefix "rdfa-" as >> acting as a namespace prefix in xml (hence, as if there were "rdfa:about" >> instead of "data-rdfa-about" in the markup). >> ... > > That clashed with the documented purpose of data-*. > > *If* we want to support RDFa, why not add the attributes the way they are > already named??? Because the issue is that we don't yet know if we want to support RDFa. That's the whole point of this thread. Nobody's given a useful problem statement yet, so we can't evaluate whether there's a problem we need to solve, or how we should solve it. Alex's suggestion, while officially against spec, has the benefit of allowing RDFa supporters to sort out their use cases through experience. That's the back door into the spec, after all; you don't have to do as much work to formulate a problem statement if you can point to large amounts of people hacking around a current lack, as that's a pretty strong indicator that there *is* a problem needing to be solved. As an added benefit, the fact that there's already multiple independent attempts at a solution gives us a wide pool of experience to draw from in formulating the actual spec, so as to make the use as easy as possible for authors. (An example that comes to mind in this regard is rounded corners. Usually you have to break semantics and put in junk elements to get rounded corners on a flexible box. 
This became so common that the question of whether or not rounded corners were significant enough to be added in CSS answered itself - people are trying hard to hack the support in, so it's clearly something they want, and thus it's worthwhile to spec a method (the border-radius property) to give them it. It solves a problem that authors, through their actions, made extremely clear, and it does so in a way that is enormously simpler 99% of the time. Win-win.) ~Tj
attached mail follows:
Tab Atkins Jr. wrote:
>> *If* we want to support RDFa, why not add the attributes the way they are
>> already named???
>
> Because the issue is that we don't yet know if we want to support
> RDFa. That's the whole point of this thread. Nobody's given a useful
> problem statement yet, so we can't evaluate whether there's a problem
> we need to solve, or how we should solve it.

For the record: I disagree with that. I have the impression that no matter how many problems are presented, the answer is going to be: "not that stone -- fetch me another stone".

> Alex's suggestion, while officially against spec, has the benefit of
> allowing RDFa supporters to sort out their use cases through
> experience. That's the back door into the spec, after all; you don't

If something that is against the spec is acceptable, then it's *much* easier to just use the already defined attributes. Better to break the spec by using new attributes than by abusing existing ones.

> ...

BR, Julian
attached mail follows:
Julian Reschke wrote: >> Because the issue is that we don't yet know if we want to support >> RDFa. That's the whole point of this thread. Nobody's given a useful >> problem statement yet, so we can't evaluate whether there's a problem >> we need to solve, or how we should solve it. > > For the record: I disagree with that. I have the impression that no > matter how many problems are presented, the answer is going to be: "not > that stone -- fetch me another stone". For the record: I completely agree with Julian. This is why I haven't jumped into this thread yet again. The key piece of evidence here is SearchMonkey, a product by Yahoo that specifically uses RDFa. Even its microformat support funnels everything to an RDF-like metadata approach. With thousands of application developers and some concrete examples that specifically use RDFa (the Creative Commons application being one of them), the message from many on this list remains "not good enough." I'm not sure where the bar is, but it seems far from objective. -Ben
attached mail follows:
On Fri, Jan 9, 2009 at 1:48 PM, Ben Adida <ben@adida.net> wrote: > Julian Reschke wrote: >>> Because the issue is that we don't yet know if we want to support >>> RDFa. That's the whole point of this thread. Nobody's given a useful >>> problem statement yet, so we can't evaluate whether there's a problem >>> we need to solve, or how we should solve it. >> >> For the record: I disagree with that. I have the impression that no >> matter how many problems are presented, the answer is going to be: "not >> that stone -- fetch me another stone". > > For the record: I completely agree with Julian. This is why I haven't > jumped into this thread yet again. > > The key piece of evidence here is SearchMonkey, a product by Yahoo that > specifically uses RDFa. Even its microformat support funnels everything > to an RDF-like metadata approach. With thousands of application > developers and some concrete examples that specifically use RDFa (the > Creative Commons application being one of them), the message from many > on this list remains "not good enough." > > I'm not sure where the bar is, but it seems far from objective. Actually, SearchMonkey is an excellent use case, and provides a problem statement. Problem ======= Site owners want a way to provide enhanced search results to the engines, so that an entry in the search results page is more than just a bare link and snippet of text, and provides additional resources for users straight on the search page without them having to click into the page and discover those resources themselves. For example (taken directly from the SearchMonkey docs), yelp.com may want to provide additional information on restaurants they have reviews for, pushing info on price, rating, and phone number directly into the search results, along with links straight to their reviews or photos of the restaurant. Different sites will have vastly different needs and requirements in this regard, preventing natural discovery by crawlers from being effective. 
(SearchMonkey itself relies on the user registering an add-in on their Yahoo account, so spammers can't exploit this - the user has to proactively decide they want additional information from a site to show up in their results, then they click a link and the rest is automagical.) That really wasn't hard. I'd never seen SearchMonkey before (it's possible it was mentioned, but I know that it was never explicitly described), but it's a really sweet app that helps both authors and users. That's a check mark in my book. ~TJ
attached mail follows:
Tab Atkins Jr. wrote: > Actually, SearchMonkey is an excellent use case, and provides a > problem statement. I'm surprised, but very happily so, that you agree. My confusion stems from the fact that Ian clearly mentioned SearchMonkey in his email a few days ago, then proceeded to say it wasn't a good use case. -Ben
attached mail follows:
On Fri, Jan 9, 2009 at 2:17 PM, Ben Adida <ben@adida.net> wrote: > Tab Atkins Jr. wrote: >> Actually, SearchMonkey is an excellent use case, and provides a >> problem statement. > > I'm surprised, but very happily so, that you agree. > > My confusion stems from the fact that Ian clearly mentioned SearchMonkey > in his email a few days ago, then proceeded to say it wasn't a good use > case. I apologize; looking back into my archives, it appears there was an entire subthread specifically about SearchMonkey! Also, Ian did indeed mention it in his first email in this thread. He actually gave it more attention than any other single use-case, though. I'll quote the relevant part: > On Tue, 26 Aug 2008, Ben Adida wrote: > > > > Here's one example. This is not the only way that RDFa can be helpful, > > but it should help make things more concrete: > > > > http://developer.yahoo.com/searchmonkey/ > > > > Using semantic markup in HTML (microformats and, soon, RDFa), you, as a > > publisher, can choose to surface more relevant information straight into > > Yahoo search results. > > This doesn't seem to require RDFa or any generic data syntax at all. Since > the system is site-specific anyway (you have to list the URLs you wish to > act against), the same kind of mechanism could be done by just extracting > the data straight out of the page. This would have the advantage of > working with any Web page without requiring the page to be written using a > particular syntax. > > However, if SearchMonkey is an example of a use case, then we should > determine the requirements for this feature. It seems, based on reading > the documentation, that it basically boils down to: > > * Pages should be able to expose nested lists of name-value pairs on a > page-by-page basis. > > * It should be possible to define globally-unique names, but the syntax > should be optimised for a set of predefined vocabularies. > > * Adding this data to a page should be easy. 
> > * The syntax for adding this data should encourage the data to remain > accurate when the page is changed. > > * The syntax should be resilient to intentional copy-and-paste authoring: > people copying data into the page from a page that already has data > should not have to know about any declarations far from the data. > > * The syntax should be resilient to unintentional copy-and-paste > authoring: people copying markup from the page who do not know about > these features should not inadvertently mark up their page with > inapplicable data. > > Are there any other requirements that we can derive from SearchMonkey? I agree with Ian in that SearchMonkey is not *necessarily* speaking in favor of RDFa; that may be what caused you to think he was dismissing it. In truth, Ian is merely trying to take current examples of RDFa use and distill them into their essence. (To grab my previous example, it is similar to seeing what all the various rounded-corners hacks were doing, without necessarily implying that the final solution will be anything like them. It's important to distill the actual problems that users are solving from the details of particular solutions they are using.) Like I said, I think SearchMonkey sounds absolutely awesome, and genuinely useful on a level I haven't yet seen any apps of similar nature reach. I'm exclusively a Google user, but that's something I'd love to have ported over. It's similar in nature to IE8's Accelerators, in that it's an opt-in application for users that reduces clicks to get to information they actively decide they want. However, Ian has a point in his first paragraph. SearchMonkey does *not* do auto-discovery; it relies entirely on site owners telling it precisely what data to extract, where it's allowed to extract it from, and how to present it. It is likely that this can be done entirely within the confines of current html, and the fact that SearchMonkey can use Microformats suggests that this is true. 
A possible approach is a site-owner producing an ad-hoc microformat (little m) that the crawler can match against pages and index the information of, and then offer to the SearchMonkey application for presentation as the developer wills. This would require specified parsing rules for such things (which, as mentioned in an earlier email, the big-m Microformats community is working on). The question is, would this be sufficient? Are other approaches easier for authors? RDFa, as noted, already has a specified parsing model. Does this make it easier for authors to design data templates? Easier to communicate templates to a crawler? Easier to deploy in a site? Easier to parse for a crawler? SearchMonkey makes mention of developers producing SearchMonkey apps without the explicit permission of site owners. This use would almost certainly be better served with a looser data discovery model than RDFa, so that a site owner doesn't have to explicitly comply in order for others to extract useful data from their pages. How important is this? These are precisely the sort of questions I think Ian wants and needs asked. SearchMonkey is an awesome app; do we need to do anything to support it and similar apps? *Can* anything we do support it, or is it best served by solutions that ignore us completely? Yes, SearchMonkey operates on metadata, and the problem space doesn't allow natural-language processing to stand in for it; it is not clear, though, that a strict markup approach is best for authors or users. Nevertheless, it is an excellent use-case to distill requirements from so we *can* determine if a spec-based solution is desirable. ~TJ
attached mail follows:
Tab Atkins Jr. wrote: > However, Ian has a point in his first paragraph. SearchMonkey does > *not* do auto-discovery; it relies entirely on site owners telling it > precisely what data to extract, where it's allowed to extract it from, > and how to present it. That's incorrect. You can build a SearchMonkey infobar that is set to function on all URLs (just use "*" in your URL field.) For example, the Creative Commons SearchMonkey application: http://gallery.search.yahoo.com/application?smid=kVf.s (currently broken because of a recent change in the SearchMonkey PHP API that we need to address, so here's a photo: http://www.flickr.com/photos/ysearchblog/2869419185/ ) By adding the CC RDFa markup to your page, it will show up with the infobar in Yahoo searches. So site-specific microformats are clearly less powerful. And vocabulary-specific microformats, while useful, are also not as useful here (consider a SearchMonkey application that picks up CC-licensed items, be they video, audio, books, scientific data, etc... Different microformats = development hell.) Have you read the RDFa Primer? http://www.w3.org/TR/xhtml-rdfa-primer/ It describes (pre-SearchMonkey) the kind of applications that can be built with RDFa. SearchMonkey is an ideal example, but it's by no means the only one. -Ben
attached mail follows:
On Fri, Jan 9, 2009 at 3:22 PM, Ben Adida <ben@adida.net> wrote: > Tab Atkins Jr. wrote: >> However, Ian has a point in his first paragraph. SearchMonkey does >> *not* do auto-discovery; it relies entirely on site owners telling it >> precisely what data to extract, where it's allowed to extract it from, >> and how to present it. > > That's incorrect. > > You can build a SearchMonkey infobar that is set to function on all URLs > (just use "*" in your URL field.) > > For example, the Creative Commons SearchMonkey application: > > http://gallery.search.yahoo.com/application?smid=kVf.s > > (currently broken because of a recent change in the SearchMonkey PHP API > that we need to address, so here's a photo: > > http://www.flickr.com/photos/ysearchblog/2869419185/ > ) > > By adding the CC RDFa markup to your page, it will show up with the > infobar in Yahoo searches. Ah, hadn't considered a net-wide SearchMonkey script. Interesting. This brings up different issues, however. Something I see immediately: Say I'm a scammer. I know that the CC SearchMonkey app is in wide use (pretend, here). I start putting CC-RDF data in spam blog comments, with my own spammy stuff in the relevant fields. Now people don't even have to click on the blog link in the search results and read my obviously spammy comment to be introduced to my offers for discount Viagra! They'll just see a little CC bar, click on it to have it open in-place, and there I am. I could even hide my link in legitimate license data, so that people only hit my malicious site when they click the link to see more information about the license. Issues like these make wide-scale auto-trusted use of metadata difficult. It also makes me more reluctant to want it in the spec yet. I'd rather see the community work out these problems first. It may be that there's a relatively simple solution. It may be that the crawlers can reliably distinguish between ham and spam CC data. 
But then, it may be that there *is* no good solution enabling us to use this approach, and this kind of metadata on arbitrary sites just can't be trusted. I, personally, don't know the answer to this yet. I suspect that you don't, either; if the arbitrary-site CC infobar works at all, it's because few people *use* CC RDF yet, and so it's still limited to a community with implicit trust. > So site-specific microformats are clearly less powerful. And > vocabulary-specific microformats, while useful, are also not as useful > here (consider a SearchMonkey application that picks up CC-licensed > items, be they video, audio, books, scientific data, etc... Different > microformats = development hell.) Indeed, they are less powerful. As I explored above, though, too much power can be damning. It may be that the site-specific little-m microformat (or something equivalent, allowing a developer to extract metadata through actively targeting site structure) is powerful enough to be useful, but weak enough to *remain* useful in the face of abuse. (Also, I know CC is sort of the darling of the RDFa community, but there's significant enough debate over in-band vs out-of-band licensing info, etc. that detracts from the core issues we're trying to discuss here that it's probably not the best example to use.) > Have you read the RDFa Primer? > http://www.w3.org/TR/xhtml-rdfa-primer/ > > It describes (pre-SearchMonkey) the kind of applications that can be > built with RDFa. SearchMonkey is an ideal example, but it's by no means > the only one. Yup; I was an active participant in this discussion when it started last August. The example applications discussed in the paper, unfortunately, are precisely the kind where trusting metadata is likely a *bad* idea. For example, finding reviews of shows produced by friends of Alice, using foaf and hreview, is rife with opportunity for spamming. 
SearchMonkey seems to avoid this for the most part; when designing applications for particular URLs, at least, you are relying on relatively trustworthy data, not arbitrary data scattered across the web. Perhaps something similar has application within trusted networks, but in that case it comprises a completely different use case than what SearchMonkey hits, with possibly different requirements. ~TJ
attached mail follows:
On Fri, Jan 9, 2009 at 5:13 PM, Ben Adida <ben@adida.net> wrote: > Tab Atkins Jr. wrote: >> This brings up different issues, however. > > Is inherent resistance to spam a condition (even a consideration) for > HTML5? If so, where is the concern around <title>, which is clearly > featured in search engine results? Well, it's something that we probably want to keep in mind, because it's so relevant for the success of any such proposal. I wouldn't want to lend support to a feature that turned out to be immediately useless due to spam. Lot of wasted effort on the WG's, Ian's, and possibly browser developer's part. To answer your specific question, <title> is under the control of the site author, and search engines already have elaborate methods to tell a spammy site from a hammy one, thus downranking them. On the other hand, the hypothetical attack scenario I outlined was about metadata that could be added to the page by external parties. If we were today discussing adding <title> to HTML5 to help search engines provide a short summary of a page, and part of the proposal might allow blog commenters to change the title of pages on a whim, I'd certainly be equally concerned. ^_^ ~TJ
attached mail follows:
Tab Atkins Jr. wrote: > To answer your specific question, <title> is under the control of the > site author, and search engines already have elaborate methods to tell > a spammy site from a hammy one, thus downranking them. And RDFa is also entirely under the control of the site author. > On the other hand, the hypothetical attack scenario I outlined was > about metadata that could be added to the page by external parties. I thought your attack concerned both author markup and commenter markup. But it seems we agree on author markup: no additional risk there. So on to commenter markup. Most blogging software already white-lists the HTML elements and attributes they allow, otherwise they are easily hacked with XSS. This means that, by default, most blogging software will strip RDFa from comments, which is exactly the right approach, since comments should not have authority over the structured data of the page. -Ben
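
The white-listing behaviour described above can be illustrated with a small sketch. It is not taken from any particular blogging package: the ALLOWED table and the function names are hypothetical, and the code is deliberately incomplete as an XSS defence (for example, it does not validate href schemes). It only shows why RDFa attributes such as about, property, or rel disappear from comments when they are not on the list.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist: element name -> attribute names allowed on it.
ALLOWED = {
    "a": {"href"},
    "em": set(),
    "strong": set(),
    "p": set(),
}


class CommentSanitizer(HTMLParser):
    """Re-emit only whitelisted elements and attributes; keep text content."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED:
            return  # drop the tag itself; its text content is still kept
        kept = []
        for name, value in attrs:
            if name in ALLOWED[tag]:
                safe = escape(value or "", quote=True)
                kept.append(' {}="{}"'.format(name, safe))
        self.out.append("<{}{}>".format(tag, "".join(kept)))

    def handle_endtag(self, tag):
        if tag in ALLOWED:
            self.out.append("</{}>".format(tag))

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def sanitize(comment):
    """Return the comment with non-whitelisted markup removed."""
    sanitizer = CommentSanitizer()
    sanitizer.feed(comment)
    sanitizer.close()
    return "".join(sanitizer.out)
```

Because the list names what is allowed rather than what is forbidden, any newly introduced attribute is stripped by default until the software's maintainers opt in.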
attached mail follows:
Ian Hickson wrote: > We have to make sure that whatever we specify in HTML5 actually is going > to be useful for the purpose it is intended for. If a feature intended for > wide-scale automated data extraction is especially susceptible to spamming > attacks, then it is unlikely to be useful for wide-scale automated data > extraction. It's no more susceptible to spam than existing HTML, as per my previous response. > Nobody is suggesting that user agents derive any behavior from <title>, so > it doesn't matter if <title> is spammed or not. And RDFa does not mandate any specific behavior, only the ability to express structure. The power lies in products like SearchMonkey that make use of this structure with innovative applications. Can one imagine tools that make poor use of this structured data so that they incentivize spam? Absolutely. Is this the bar for HTML5? If bad or poorly conceived applications can be imagined, then it's not in the standard? > It is less likely for a user to intentionally visit a > spammy page than for a user to visit a page that happens to contain spammy > content embedded within it (e.g. in blog comments). You've done plenty of web security work, and I suspect you know well that spammy RDFa is the least in a large set of problems that come with accepting arbitrary markup in blog comments. This is a strawman. > However, browsers don't do this kind of processing -- > indeed, this kind of processing appears to be exactly what RDFa proponents > are trying to enable (though to what end, I'm still trying to find out, > since nobody has actually replied to all the questions I asked yet [1]). While client-side processing is indeed an important use case (Ubiquity, Fuzzbot, etc...), it's not the only one. SearchMonkey, which you continue to ignore, is an important use case. Before I invest significant time in responding to your barrage of questions, I'm looking for a hint of objective evaluation on your end. 
I thought I saw an opportunity for productive discussion based on common ground with SearchMonkey, but this has led again into a new and close-to-bogus reason for blocking consideration of RDFa. > Note that search engines aren't the problem here Actually, we were discussing SearchMonkey, so I think it's very much the context for this sub-thread. You continue to ignore SearchMonkey, for reasons which, as I've pointed out in a response earlier today, are factually incorrect. -Ben
attached mail follows:
Ben Adida wrote:
> Ian Hickson wrote:
>
>> We have to make sure that whatever we specify in HTML5 actually is going
>> to be useful for the purpose it is intended for. If a feature intended for
>> wide-scale automated data extraction is especially susceptible to spamming
>> attacks, then it is unlikely to be useful for wide-scale automated data
>> extraction.
>>
>
> It's no more susceptible to spam than existing HTML, as per my previous
> response.
>
>

Perhaps this is why general purpose search engines do not rely (entirely) on metadata and markup semantics to classify content, nor does Yahoo with SearchMonkey. The SearchMonkey documentation points out that metadata never affects page ranks, nor are its semantics interpreted for any purpose; metadata only affects the additional information presented to the user, at the user's request, and only if the user chose to get information of a certain kind (gathered by a certain data service). Spammy metadata can therefore be considered contained in this case: it might corrupt SearchMonkey's additional data, but not the user's overall experience with the search engine. From this point of view, SearchMonkey is a wide-range but small-scale use case (with respect to each tool and each site the user might enable), because the user can easily choose which sources to trust (e.g. which data services to use, or which sites to consult for additional information), and in any case he can get enough information without metadata.

On the other hand, a client UA implementing a feature entirely based on metadata couldn't easily contain abused metadata and bring valid information to the user's attention, nor could the average user easily tell trusted and spammy sites apart, because he wouldn't understand the problem (and a site with spammy metadata might still contain information users were interested in previously, or in a different context). In SearchMonkey, by contrast, the average user would notice that something doesn't work in the enhanced results, but he'd still get the basic information he was looking for. Thus there are different requirements to be taken into account for different scenarios (SearchMonkey and a client UA are two such scenarios).

Moreover, SearchMonkey is a kind of centralised service based on distributed metadata. By default it doesn't need collaboration from any other UA (that is, it doesn't need support for metadata in other software), although it allows custom data services to extract metadata autonomously, always for the purposes of SearchMonkey. It only requires that web sites adhering to the project (or just willing to provide additional information) embed some kind of metadata solely to make it available to SearchMonkey services, or at least that authors create appropriate metadata and send it to Yahoo (in the form of dataRSS embedded in an Atom document). That is, SearchMonkey seems to me a clear example of a use case for metadata that does not require any changes to the html5 spec, since every kind of supported metadata is used by SearchMonkey as if it were custom, private metadata; whatever happens to such metadata client-side, even if it is simply stripped by a browser, doesn't really matter. Furthermore, SearchMonkey supports several kinds of metadata: not only RDFa, but also eRDF, microformats and dataRSS external to the document.
So, why should SearchMonkey be the reason to introduce explicit support for RDFa and not also for eRDF, which doesn't require new attributes, but just a parser? One might think one solution is better than the other, and this might be true in theory, but what really counts is what people find easier to use, and this might be determined by experience with SearchMonkey (that is, let's see what people use more often, then decide what's more needed).

Moreover, RDFa was designed for xhtml, so it can't be introduced into the html serialization just by defining a few new attributes: a processor would or might need some knowledge of /namespaces/, so the whole "family" of *xmlns* attributes (with and without prefixes) would have to be specified for use with the html serialization, unless an alternative mechanism, similar to the one chosen for eRDF, were defined, and that might result in a new, hybrid mechanism (stitching together pieces from eRDF and RDFa). But if we introduce xmlns and xmlns:<prefix> into the html serialization, why not also prefixed attributes? That is, can RDFa be introduced into the html serialization "as is", without resorting to the whole of xml extensibility? This should be taken into account as well, because just adding new attributes to the language might work fine for xml-serialized documents, but might not for html-serialized ones. This means RDFa support might be more difficult than it may seem at first glance, whereas it might not be needed for custom and/or small-scale use cases (and I think SearchMonkey is one such case).

>> Nobody is suggesting that user agents derive any behavior from <title>, so
>> it doesn't matter if <title> is spammed or not.
>>
>
> And RDFa does not mandate any specific behavior, only the ability to
> express structure. The power lies in products like SearchMonkey that
> make use of this structure with innovative applications.
>
> Can one imagine tools that make poor use of this structured data so that
> they incentivize spam?
Absolutely. Is this the bar for HTML5? If bad or
> poorly conceived applications can be imagined, then it's not in the
> standard?
>
>

I think the right question is whether there are effective countermeasures to contain bad uses, so that the possible damage is less significant than the advantages of good uses. When a feature in the standard is thought to be a possible security (or privacy) issue, countermeasures are proposed. Since spam is a likely immediate issue for abused metadata, especially in wide-scale automated data extraction, we should also think about possible countermeasures to be specced along with the RDFa attributes.

WBR, Alex
attached mail follows:
On 10/1/09 00:37, Ian Hickson wrote:
> On Fri, 9 Jan 2009, Ben Adida wrote:
>> Is inherent resistance to spam a condition (even a consideration) for
>> HTML5?
>
> We have to make sure that whatever we specify in HTML5 actually is going
> to be useful for the purpose it is intended for. If a feature intended for
> wide-scale automated data extraction is especially susceptible to spamming
> attacks, then it is unlikely to be useful for wide-scale automated data
> extraction.

I've been looking at such concerns a bit for RDFa. One issue (shared with HTML in general, I think) is user-supplied content, e.g. blog comments and 'rel=nofollow' scenarios. Is there any way in HTML5 to indicate that a whole chunk of a Web page is from an (in some to-be-defined sense) untrusted source?

I see http://www.whatwg.org/specs/web-apps/current-work/#link-type-nofollow

"The nofollow keyword indicates that the link is not endorsed by the original author or publisher of the page, or that the link to the referenced document was included primarily because of a commercial relationship between people affiliated with the two pages."

While I'm unsure whether the "commercial relationship" clause quite captures what's needed, the basic idea seems sound. Is there any provision (or plan) for applying this notion to entire blocks of markup, rather than just to simple hyperlinks? This would be rather useful for distinguishing embedded metadata that comes from the page author from that included from blog comments or similar.

Thanks for any pointers,

cheers,

Dan

--
http://danbri.org/
attached mail follows:
Ben Adida wrote:
> Tab Atkins Jr. wrote:
>
>> Actually, SearchMonkey is an excellent use case, and provides a
>> problem statement.
>>
>
> I'm surprised, but very happily so, that you agree.
>
> My confusion stems from the fact that Ian clearly mentioned SearchMonkey
> in his email a few days ago, then proceeded to say it wasn't a good use
> case.
>
> -Ben
>
>
It seems to me that this is a very custom use case. It does require
metadata to be embedded in a large number of pages, but even that is
optional, because search results don't rely only on metadata: the
server uses metadata as an optional source of information, and no
collaboration is required from other kinds of UA (excluding, at most,
some custom data services). By contrast, a search engine using the mark
element to highlight a keyword would need the client UA to understand
and style it properly; I would not expect that to work in IE6, for
instance, because IE browsers treat unknown elements as if their
content were misplaced.
That is, Yahoo might develop its own data model and work fine with
sites implementing it. Perhaps RDF(a) was chosen because they think RDF
is a natural way to model data that are scattered across a web page
(and re-mapping microformats onto RDF might make for an easier
implementation). In any case, the only UA that needs to understand RDFa
here is SearchMonkey itself, so a client browser might just drop RDFa
attributes without breaking SearchMonkey's functionality. At least,
this is my first impression.
Furthermore, it's a very recent (yet potentially interesting)
application, so why not wait and see how it grows before choosing to
include one format or another in the html5 specification (or to include
all of them because they are equally widespread)? We could see whether
the opt-in mechanism effectively prevents spam (e.g. spammers might
model data on widely used vocabularies and data services, and find a
way to make such data show up in searches when users ask for
additional information, for instance through an ad within a page of an
accomplice author, or by exploiting errors in authors' selection of
URLs to be crawled for metadata, or the like), or simply which model
becomes the most used among RDFa, eRDF, Microformats, Atom embedding
dataRSS and whatever else Yahoo might decide to support.
Moreover, it seems that some xml processing is needed to create a
custom data service, so it might be natural to use xhtml (possibly
along with namespaces and prefixed attributes) to provide metadata to
such a data service, which could rely on an existing xml parser instead
of implementing one from scratch (and an html parser might not support
namespaces to the point of exposing them through DOM interfaces, as I
understand html serialization). The use of prefixed RDFa attributes, or
perhaps even unprefixed ones, within an xml-serialized document
shouldn't require formalization in the html5 spec, as long as there is
no strict requirement for UAs to support RDF processing, which is the
case for SearchMonkey and its related data services.
WBR, Alex
attached mail follows:
On Sat, 10 Jan 2009 06:41:10 +1100, Julian Reschke
<julian.reschke@gmx.de> wrote:
> Tab Atkins Jr. wrote:
>>> *If* we want to support RDFa, why not add the attributes the way they
>>> are already named???
>> Because the issue is that we don't yet know if we want to support
>> RDFa. That's the whole point of this thread. Nobody's given a useful
>> problem statement yet, so we can't evaluate whether there's a problem
>> we need to solve, or how we should solve it.
>
> For the record: I disagree with that. I have the impression that no
> matter how many problems are presented, the answer is going to be: "not
> that stone -- fetch me another stone".
There does appear to be some of this. I have no idea if that is just an
impression or the truth. Hence my continued following of the thread.
>> Alex's suggestion, while officially against spec, has the benefit of
>> allowing RDFa supporters to sort out their use cases through
>> experience. That's the back door into the spec, after all; you don't
>
> If something that is against the spec is acceptable, then it's *much*
> easier to just use the already defined attributes. Better breaking the
> spec by using new attributes than abusing existing ones.
Indeed. If the data-* attributes had some reserved values, then one
might expect people to invest in them on the scale that they have
typically made RDF investments. But then there would be no need to
change the attribute names at all (nor, for that matter, to put much
effort into other attribute names following the design pattern; it just
becomes another approach to namespaces with another centralisation
process required).
The question is what would convince the editors of the spec that there
is in fact a use case for RDF in HTML, which is what has led to the
request to include RDFa (a form of RDF carefully designed to fit into
HTML).
cheers
Chaals
--
Charles McCathieNevile Opera Software, Standards Group
je parle français -- hablo español -- jeg lærer norsk
http://my.opera.com/chaals Try Opera: http://www.opera.com
attached mail follows:
Julian Reschke wrote:
> Calogero Alex Baldacchino wrote:
>> ...
>> This is why I was thinking about somewhat "data-rdfa-about",
>> "data-rdfa-property", "data-rdfa-content" and so on, so that, for the
>> purposes of an RDFa processor working on top of HTML5 UAs (perhaps in
>> a test phase, if needed at all, of course), an element dataset would
>> give access to "rdfa-about", instead of just "about", that is using
>> the prefix "rdfa-" as acting as a namespace prefix in xml (hence, as
>> if there were "rdfa:about" instead of "data-rdfa-about" in the markup).
>> ...
>
> That clashed with the documented purpose of data-*.
Hmm, I'm not sure there is a clash, since I was suggesting a *custom*
and essentially *private* mechanism to experiment with RDFa in
conjunction with HTML serialization, for the *small-scale* needs of some
organizations willing to embed RDFa metadata in text/html documents, and
to exchange them with each other by using a convention likely avoiding
name clashes with other private metadata. Since I think it's unlikely to
find data-rdfa-* used with different semantics in the very same page,
and in a small-scale scenario involving a few *selected* sources for
RDFa-modelled information, it should be likely to know in advance that
someone else is using the same conventions. Such a modelled document
might be used in conjunction with an external RDFa processor, thus
avoiding any direct support in a browser.
However, such a convention might be enough "clash-free" to work on a
wider scale, thus it might become widespread and provide an evidence
that the web /needs/, or at least /has chosen/ to use RDFa as (one of)
the most common way to embed metadata in a document, and such might be
enough to add a native support for the whole range of "RDFa" attributes,
eventually along with support for earlier experimental ones (such as
"data-rdfa-*" and "rdfa:*" ones, for backward compatibility). And
actually I can't see much of a problem if a private-born feature became
the base of a widespread and widely accepted convention (I'm not saying
the spec should name data-rdfa-* as a means to implement RDFa; instead I
think that, if a general agreement on if and how RDFa must be spec'ed
out and implemented can't be found, such an experiment might be proposed
to the semantic web industry and wait for the results - given a lack in
support might prevent any interested party to use RDFa and HTML5
altogether).
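[Editorial sketch] The convention described above, where a processor
working on top of an HTML5 UA sees "about" in place of
"data-rdfa-about", could look roughly like the following. This is only
an illustration under stated assumptions: it uses Python's standard
html.parser (a lenient tokenizer, not an HTML5 tree builder), the
"data-rdfa-" prefix is the private convention proposed in this mail
rather than anything defined by HTML5 or RDFa, and the names
DataRdfaCollector and collect are hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical private prefix, as proposed in the mail above.
PREFIX = "data-rdfa-"


class DataRdfaCollector(HTMLParser):
    """Collect data-rdfa-* attributes and expose them under plain RDFa names."""

    def __init__(self):
        super().__init__()
        # One (tag, {rdfa_name: value}) entry per annotated element.
        self.found = []

    def handle_starttag(self, tag, attrs):
        # Strip the prefix so "data-rdfa-about" is reported as "about".
        rdfa = {
            name[len(PREFIX):]: value
            for name, value in attrs
            if name.startswith(PREFIX)
        }
        if rdfa:
            self.found.append((tag, rdfa))


def collect(markup):
    """Return the prefixed attributes found in markup, in document order."""
    parser = DataRdfaCollector()
    parser.feed(markup)
    parser.close()
    return parser.found


sample = (
    '<p data-rdfa-about="http://example.org/event" class="note">'
    '<span data-rdfa-property="cal:summary">one last summer Barbecue</span>'
    "</p>"
)
print(collect(sample))
```

An external RDFa processor could then be fed the renamed attributes, so
no native browser support would be involved; turning them into triples
is left out of the sketch.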
>
> *If* we want to support RDFa, why not add the attributes the way they
> are already named???
>
For instance, to find out by experiment whether it is worth changing the
"if we want" into "we do want", without requiring an early
implementation and specification, and without relying on whether and
what a certain browser vendor might want to experiment with differently
from others (such a convention would only require support for HTML5
datasets and a script or a plugin capable of handling them as
representing RDFa metadata). The point here is that, after introducing
data-* attributes as a means to support custom attributes, browser
vendors might decide to drop support for other kinds of custom
attributes in html serialization (that is, for attributes that are
neither part of the language nor data-* ones). Therefore, if they (or
any of them) decided not to support RDFa attributes until they were
introduced in a specification, there might be no means to experiment
with them (in general, that is cross-browser) without resorting either
to data-* or to "rdfa:*" (the latter in xhtml).
Anyway, /in general/ what should a browser do with RDFa metadata, on a
*wide scale*, other than classifying a portion of the open web (e.g. in
its local history), possibly allowing users to select trusted sources?
Actually, I don't think this would bring enough benefits for *average*
users, compared to the risk of getting a lot of spam metadata from
/heterogeneous/ sources. I really don't expect average users to
understand how to filter sites based on metadata reliability (and just
in order to use a metadata-based query interface, because a site
with wrong metadata might still contain useful information); instead
they might just try to use a query interface the same way they use a
default search bar, get wrong results (once spam metadata became
widespread) and decide the mechanism doesn't work well (possibly
complaining about it). Some kind of antispam filter might help, but I
think that working out whether metadata are reliable, that is whether
they really correspond to a web page's content, is a hard problem for
a bot to solve without a good degree of Artificial Intelligence
(filtering emails by looking for suspicious patterns is far easier than
implementing a filter capable of /understanding/ metadata,
/understanding/ natural language and comparing /semantics/).
As well, I don't expect the great majority of web pages to contain
"valid" metadata: most people would not care about them, and a
potentially growing number might copy&paste code containing metadata
from other sites as a kind of template, then edit the content and ignore
the metadata, thus breaking reliability. I do think wide-scale use of
metadata coming from heterogeneous sources can be more harmful than
useful. *If* we do agree that small-scale needs is the main context
where RDFa can bring benefits, perhaps a custom mechanism and external
plugins are all we need; otherwise, it should be proved that /misused/
and /abused/ metadata can be filtered out *easily* and *automatically*,
without requiring average users to understand the problem, nor affecting
the overall efficiency. IMHO.
>> ...
>> However, AIUI, actual xml serialization (xhtml5) allows the use of
>> namespaces and prefixed attributes, thus couldn't a proper namespace
>> be introduced for RDFa attributes, so they can be used, if needed, in
>> xhtml5 documents? I think such might be a valuable choice, because it
>> seems to me RDFa attributes can be used to address such cases where
>> metadata must stay as close as possible to correspondent data, but a
>> mistake in a piece of markup may trigger the adoption agency or
>> foster parenting algorithms, eventually causing a separation between
>> metadata and content, thus possibly breaking reliability of gathered
>> informations. From this perspective, a parser stopping on the very
>> first error might give a quicker feedback than one rearranging
>> misnested elements as far as it is reasonably possible (not
>> affecting, and instead improving, content presentation and users'
>> "direct" experience, but possibly causing side-effects with metadata).
>> ...
>
> That would make RDFa as used in XHTML 1.* and RDFa used in HTML 5
> incompatible. What for?
>
> > ...
>
> BR, Julian
Because I'm not sure RDFa can work fine with HTML serialization. To
clarify that, let me take and modify an example from W3C Recommendation
(without pretending it to be a good example to build a good worst-case
scenario, but just to give an idea):
[...]
<p>
I'm holding
<span property="cal:summary">
one last summer Barbecue
</span>, to meet friends and have a party before the end of holidays
on
<span property="cal:dtstart" content="2007-09-16T16:00:00-05:00"
datatype="xsd:dateTime">
September 16th at 4pm
</span>.
</p>
[...]
Now let's consider it written as:
[...]
<p>
I'm holding
<span property="cal:summary">
one last summer Barbecue
<!-- now the </span> close tag is missing here -->,
to meet friends and have a party before the end of holidays
on
<span property="cal:dtstart" content="2007-09-16T16:00:00-05:00"
datatype="xsd:dateTime">
September 16th at 4pm
</span>.
</p>
[...]
The above would result in a parse error as an xml-serialized document,
since the document isn't well formed. As part of an html-serialized
document, instead, the above fragment would be processed anyway,
improving users' experience (compared with a page that stops rendering
at a missing close tag), but potentially causing metadata to be bound
imprecisely to the data, and thus potentially harming automated data
extraction (for some purposes). Therefore, using such metadata only
inside xml-serialized pages might give quick feedback on such a
problem as soon as the author checked the page's appearance (which I
think would be the very first check, just as I think hardly anyone
would check the _whole_ range of possible queries people might make
over a document to look for errors).
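[Editorial sketch] The difference between the two behaviours can be
reproduced with Python's standard library. This is an illustration
under assumptions, not the HTML5 parsing algorithm: ElementTree stands
in for a strict xml parser, and html.parser is only a lenient tokenizer
that never rejects markup. The fragment is a shortened copy of the
example above, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

BROKEN = (
    "<p>I'm holding "
    '<span property="cal:summary">one last summer Barbecue, '  # </span> missing
    "to meet friends and have a party before the end of holidays on "
    '<span property="cal:dtstart" content="2007-09-16T16:00:00-05:00" '
    'datatype="xsd:dateTime">September 16th at 4pm</span>.</p>'
)
# Same fragment with the close tag restored after "Barbecue".
FIXED = BROKEN.replace("Barbecue, ", "Barbecue</span>, ", 1)


def xml_accepts(markup):
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class SpanDepth(HTMLParser):
    """Track how deeply <span> start tags nest in the token stream."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.max_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self.depth += 1
            self.max_depth = max(self.max_depth, self.depth)

    def handle_endtag(self, tag):
        if tag == "span" and self.depth > 0:
            self.depth -= 1


def max_span_depth(markup):
    """Feed markup to the lenient parser and report the deepest span nesting."""
    tracker = SpanDepth()
    tracker.feed(markup)
    tracker.close()
    return tracker.max_depth


print(xml_accepts(BROKEN), xml_accepts(FIXED))        # strict parser verdicts
print(max_span_depth(BROKEN), max_span_depth(FIXED))  # lenient nesting depths
```

The strict parser rejects the broken fragment outright, while the
lenient one carries on and sees the cal:dtstart span nested inside the
still-open cal:summary span, which is the imprecise binding the mail
warns about.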
*If* this is meaningful, supporting RDFa attributes as "rdfa:*" might
ensure that xml serialization is preferred by people who really need to
use this kind of metadata (while leaving a chance to experiment with
RDFa in html serialization, because no one can be prevented from using
data-<prefix>-* for this purpose together with a proper script or
plugin), whereas introducing "about", "property", "content", "datatype"
and so on directly in the html namespace, as attributes shared by all
elements, would make the choice between one serialization and the other
indifferent, thus opening the door to every side effect html
serialization may cause.
As a side note, it seems that people from the W3C are considering
extensibility as the way to introduce RDFa attributes into
xml-serialized html documents, and they also have some doubts about
whether to allow the use of RDFa attributes within html serialization:
"The HTML WG is encouraged to provide a mechanism to permit
independently developed vocabularies such as Internationalization Tag
Set (ITS), Ruby, and RDFa to be mixed into HTML documents. /Whether this
occurs through the extensibility mechanism of XML, *whether it is also
allowed in the classic HTML serialization*, and whether it uses the DTD
and Schema modularization techniques/, is for the HTML WG to determine."
(from <http://www.w3.org/2007/03/HTML-WG-charter#deliverables>)
WBR, Alex
attached mail follows:
Calogero Alex Baldacchino wrote: > That is, choosing a proper level of integration for RDF(a) support into > a web browser might divide success from failure. I don't know what's the > best possible level, but I guess the deepest may be the worst, thus > starting from an external support through out plugins, or scripts to be > embedded in a webbapp, and working on top of other feature might work > fine and lead to a better, native support by all vendors, yet limited to > an API for custom applications There seems to be a bit of confusion over what RDFa can and can't do as well as the current state of the art. We have created an RDFa Firefox plugin called Fuzzbot (for Windows, Linux and Mac OS X) that is a very rough demonstration of how an browser-based RDFa processor might operate. If you're new to RDFa, you can use it to edit and debug RDFa pages in order to get a better sense of how RDFa works. There is a primer[1] to the semantic web and an RDFa basics[2] tutorial on YouTube for the completely un-initiated. The rdfa.info wiki[3] has further information. ---------------- (sent to public-rdfa@w3.org earlier this week): We've just released a new version of Fuzzbot[4], this time with packages for all major platforms, which we're going to be using at the upcoming RDFa workshop at the Web Directions North 2009 conference[5]. Fuzzbot uses librdfa as the RDFa processing back-end and can display triples extracted from webpages via the Firefox UI. It is currently most useful when debugging RDFa web page triples. We use it to ensure that the RDFa web pages that we are editing are generating the expected triples - it is part of our suite of Firefox web development plug-ins. 
There are three versions of the Firefox XPI:
Windows XP/Vista (i386)
http://rdfa.digitalbazaar.com/fuzzbot/download/fuzzbot-windows.xpi
Mac OS X (i386)
http://rdfa.digitalbazaar.com/fuzzbot/download/fuzzbot-macosx-i386.xpi
Linux (i386) - you must have xulrunner-1.9 installed
http://rdfa.digitalbazaar.com/fuzzbot/download/fuzzbot-linux.xpi
There is also very preliminary support for the Audio RDF and Video RDF
vocabularies, demos of which can be found on YouTube[6][7].
To try it out on the Audio RDF vocab, install the plugin, then click on
the Fuzzbot icon at the bottom of the Firefox window (in the status
bar):
http://bitmunk.com/media/6566872
There should be a number of triples that show up in the frame at the
bottom of the screen as well as a music note icon that shows up in the
Firefox 3 AwesomeBar. To try out the Video RDF vocab, do the same at
this URL:
http://rdfa.digitalbazaar.com/fuzzbot/demo/video.html
Please report any installation or run-time issues (such as the plug-in
not working on your platform) to me, or on the librdfa bugs page:
http://rdfa.digitalbazaar.com/librdfa/trac
-- manu
[1] http://www.youtube.com/watch?v=OGg8A2zfWKg
[2] http://www.youtube.com/watch?v=ldl0m-5zLz4
[3] http://rdfa.info/wiki
[4] http://rdfa.digitalbazaar.com/fuzzbot/
[5] http://north.webdirections.org/
[6] http://www.youtube.com/watch?v=oPWNgZ4peuI
[7] http://www.youtube.com/watch?v=PVGD9HQloDI
--
Manu Sporny
President/CEO - Digital Bazaar, Inc.
blog: Fibers are the Future: Scaling Past 100K Concurrent Requests
http://blog.digitalbazaar.com/2008/10/21/scaling-webservices-part-2
attached mail follows:
Manu Sporny wrote:
> Calogero Alex Baldacchino wrote:
>
>> That is, choosing a proper level of integration for RDF(a) support into
>> a web browser might divide success from failure. I don't know what's the
>> best possible level, but I guess the deepest may be the worst, thus
>> starting from an external support through out plugins, or scripts to be
>> embedded in a webbapp, and working on top of other feature might work
>> fine and lead to a better, native support by all vendors, yet limited to
>> an API for custom applications
>>
>
> There seems to be a bit of confusion over what RDFa can and can't do as
> well as the current state of the art. We have created an RDFa Firefox
> plugin called Fuzzbot (for Windows, Linux and Mac OS X) that is a very
> rough demonstration of how an browser-based RDFa processor might
> operate. If you're new to RDFa, you can use it to edit and debug RDFa
> pages in order to get a better sense of how RDFa works.
>
>
The concern is about every kind of metadata with respect to their
possible uses; but, while it's been stated that Microformats (for
instance) don't require any particular support by UAs (thus they're
backward compatible), RDFa would be a completely new feature, thus the
html5 specification should say what UAs are expected to do with such new
attributes.
Shall UAs just "accept" them and expose an API to extract triples, so
that a web application can build a query mechanism upon such an API?
This might work fine, and fulfill small-scale scenarios, such as
organization-wide data modelling and interchange, as suggested by
Charles McCathieNevile; this can also be accomplished by an external plugin.
Shall UAs (browsers) also provide an interface to view bare triples (as
does Fuzzbot), as a kind of debugging tool? As above.
Shall UAs (browsers) also provide metadata-based features, such as a
query interface to look for content in a local history? This is a wider
scale application, and also a use case where problems may arise. From
this angle, metadata can't be assumed to be reliable a priori (instead,
their reliability is uncertain), nor can users be assumed capable of
understanding the problem and filtering out wrong/misused/abused
metadata (in general). This is the scenario where spammy metadata may
become an issue.
For instance, some code like,
<div typeof="foaf:Person">
<p property="foaf:name" content="Manu Sporny">We sell
<a href="http://www.cheatingcarseller.com"
rel="foaf:homepage">cars</a>
</p>
</div>
would produce the following triples,
_:bnode0 rdf:type http://xmlns.com/foaf/0.1/Person
_:bnode0 foaf:homepage http://www.cheatingcarseller.com
_:bnode0 foaf:name Manu Sporny
(this is exactly what Fuzzbot outputs)
thus, a metadata-based search feature might output a link to a
"metadata-spammy" site when queried for "Manu Sporny". That is, cheating
a metadata-based bot by means of fake metadata can be very easy.
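[Editorial sketch] To make the risk concrete, here is roughly how a
naive metadata-based lookup would be fooled by the three triples above.
Everything beyond those triples is an assumption made for illustration:
the fourth "source page" field, the SPAM_PAGE URL, the homepages_for
function and the trusted-source filter are hypothetical, and no real
browser or search feature is claimed to work this way.

```python
# Hypothetical page on which the misleading markup was found.
SPAM_PAGE = "http://spam.example/page.html"

# The triples from the mail, each tagged with the page it was harvested from.
TRIPLES = [
    ("_:bnode0", "rdf:type", "http://xmlns.com/foaf/0.1/Person", SPAM_PAGE),
    ("_:bnode0", "foaf:homepage", "http://www.cheatingcarseller.com", SPAM_PAGE),
    ("_:bnode0", "foaf:name", "Manu Sporny", SPAM_PAGE),
]


def homepages_for(name, triples, trusted_sources=None):
    """Return homepages of subjects called `name`.

    If trusted_sources is given, only triples harvested from those pages
    are considered; otherwise every triple is taken at face value.
    """
    usable = [
        t for t in triples
        if trusted_sources is None or t[3] in trusted_sources
    ]
    subjects = {s for s, p, o, _ in usable if p == "foaf:name" and o == name}
    return [o for s, p, o, _ in usable if p == "foaf:homepage" and s in subjects]


# Taking all metadata at face value returns the car seller's site.
print(homepages_for("Manu Sporny", TRIPLES))
# Restricting to a (hypothetical) list of trusted pages returns nothing.
print(homepages_for("Manu Sporny", TRIPLES, trusted_sources={"http://example.org/"}))
```

The filter only moves the problem: someone still has to decide which
sources are trusted, which is the burden on average users discussed
earlier in this thread.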
Metadata-based features (and this is true of most xml-related
technologies, such as RDF/RDFa) work fine if properly used.
Unfortunately, "things must be used properly to work fine" is not the
basic principle of the web (and this is especially true for html and
related technologies), which instead has always been about "people will
mess everything up, but UAs will work fine anyway", that is "robustness
above all, as far as possible". As far as html serialization is
concerned, in particular, I'd consider some code like,
<p typeof="cal:Vevent">
I'm holding
<span property="cal:summary">
one last summer Barbecue
<!-- /span -->, to meet friends and have a party before the end of
holidays
on
<span property="cal:dtstart" content="2007-09-16T16:00:00-05:00"
datatype="xsd:dateTime">
September 16th at 4pm
</span>.
</p>
(taken from <http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014/> and
purposely modified)
which is rendered properly, but produces,
_:bnode1 rdf:type http://www.w3.org/2002/12/cal/icaltzd#Vevent
_:bnode1 cal:dtstart 2007-09-16T16:00:00-05:00
_:bnode1 cal:summary one last summer Barbecue , to meet friends
and have a party before the end of holidays on <span
xmlns:cal="http://www.w3.org/2002/12/cal/icaltzd#"
xmlns:foaf="http://xmlns.com/foaf/0.1/"
xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
datatype="xsd:dateTime" datatype="xsd:dateTime"
content="2007-09-16T16:00:00-05:00" property="cal:dtstart">September
16th at 4pm</span>
(taken from Fuzzbot keeping namespace declarations in the root element;
without xmlns:* attributes all triples are lost)
which is not the desired result. Perhaps it might work better as an xml
feature on a "strict" xml parser (aborting with an error because of a
missing end tag), even considering RDFa relies on namespaces (thus,
adding RDFa attributes to HTML5 spec would require some features from
xml extensibility to be added to html serialization). But RDFa in an
XHTML document might look like "rdfa:about", "rdfa:property",
"rdfa:content", and so on, that is as an external module, thus not
requiring any changes to the spec.
WBR, Alex
attached mail follows:
On 09.01.2009, at 01:54, Calogero Alex Baldacchino wrote:
>
> This is why I was thinking about somewhat "data-rdfa-about",
> "data-rdfa-property", "data-rdfa-content" and so on, so that, for the
> purposes of an RDFa processor working on top of HTML5 UAs
One can also use <link rel="alternate" href="description.rdf">. I don't
see why RDF metadata must be in the HTML document. It could be in a
separated file, maybe embedded in RSS/Atom feeds (RSS1.0 is pretty
close already).
Websites that have a lot of useful data to share usually keep it in a
database, and this allows them to easily generate RDF as separate
documents without risk of getting out of sync with the HTML version.
IMHO even RDFa metadata is invisible, and errors in RDFa wouldn't be
much easier to spot than errors in external RDF files, e.g.:
<section typeof="atom:Entry" xmlns:foaf="http://xmlns.com/foaf/1.0/"
xmlns:atom="http://purl.org/atom/ns#">
<address rel="atom:author">
On <time property="atom:published" content="2009-01-10" >10 Jan
2009</time>,
<a property="foaf:name" rel="foaf:page"
href="http://joe.example.com">Joe Bloggs</a> wrote:
</address>
--
regards, Kornel
attached mail follows:
Kornel Lesiński wrote:
> On 09.01.2009, at 01:54, Calogero Alex Baldacchino wrote:
>>
>> This is why I was thinking about somewhat "data-rdfa-about",
>> "data-rdfa-property", "data-rdfa-content" and so on, so that, for the
>> purposes of an RDFa processor working on top of HTML5 UAs
>
> One can also use <link rel="alternate" href="description.rdf">. I
> don't see why RDF metadata must be in the HTML document. It could be
> in a separated file, maybe embedded in RSS/Atom feeds (RSS1.0 is
> pretty close already).
>
> Websites that have a lot of useful data to share usually keep it in a
> database, and this allows them to easily generate RDF as separate
> documents without risk of getting out of sync with the HTML version.
>
In principle, I agree (also, Atom 1.0 embedding RDFa as dataRSS is the
basis of SearchMonkey). But if people feel the need to embed metadata in
their documents and to use them as a distributed database, well, let's
give them a chance to do so. :-P
eRDF might be a working compromise, because it doesn't need any changes
to the spec; RDFa covers a wider range of RDF semantics, but requires
new attributes and also namespaces (a sort of hybrid between them might
avoid the need to bring namespaces, i.e. xmlns:* attributes, into html
serialization). My suggestion was meant as a means to test RDFa in HTML
documents without changing the spec (perhaps in conjunction with
data-xmlns-*, data-xmlns-prefixes="rdfa foaf <whatever>" to "emulate"
namespaces; an ugly hack, I know, but it would at least avoid changes to
html serialization during a test phase), even if I think that xml
serialization should work better for such rdf metadata.
WBR, Alex
attached mail follows:
On 11/1/09 02:51, Calogero Alex Baldacchino wrote:
> eRDF might be a working compromise, because it doesn't need any changes
> to the spec
It's not possible to author conforming HTML5 that functions as eRDF
since eRDF requires a 'profile' attribute, but HTML5 has removed the
attribute.
http://research.talis.com/2005/erdf/wiki/Main/RdfInHtml
> ; RDFa covers a wider range of RDF semantics, but requires
> new attributes and also namespaces (a sort of hybrid between them might
> avoid the need to bring namespaces - xmlns:* attributes - into html
> serialization).
To avoid xmlns:* attributes, one could drop CURIEs in the text/html
serialization and use markup like:
<div>
<div about="http://dbpedia.org/resource/Albert_Einstein">
...
</div>
</div>
instead of
<div xmlns:db="http://dbpedia.org/">
<div about="[db:resource/Albert_Einstein]">
...
</div>
</div>
There's no data loss.
> My suggestion was meant as a mean to test RDFa in HTML
> documents without changing the spec (perhaps in conjunction with
> data-xmlns-*, data-xmlns-prefixes="rdfa foaf <whatever>" to "emulate"
> namespaces - an ugly hack, I know, but at least would avoid changes to
> html serialization, at least in a test phase) -- even if I think that
> xml serialization should work better for such rdf metadata.
I really can't see anybody violating the spec in that way rather than
violating the spec by just adding the RDFa attributes outright,
especially given that there are already people publishing these
attributes in text/html so the "namespace" has already been polluted and
we already have services like SearchMonkey not only using these
attributes but promoting them.
It may therefore already be problematic for a future version of HTML to
use these attributes as extension points without breaking existing
sites. The "test" is already in progress, for better or worse. HTML5
conformance checkers don't have to bless this test, of course, any more
than CSS validators have to give the all clear to vendor-specific
properties.
Moreover, the damage done by immediately breaking the principle that
data-* should be for private use only and turning it into a distributed
extension point may be worse than the alternatives.
--
Benjamin Hawkes-Lewis
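[Editorial sketch] The "no data loss" claim in the message above rests
on the fact that a CURIE is just a prefix mapping plus a reference, so
the two spellings name the same URI. A minimal illustration follows;
the function name expand_curie is hypothetical, and the sketch handles
only CURIEs and bracketed safe CURIEs, not the full RDFa processing
rules.

```python
def expand_curie(value, prefixes):
    """Expand a CURIE, or a safe CURIE written as [prefix:reference].

    `prefixes` maps each prefix to its URI, as xmlns:* declarations
    would. A KeyError is raised when the value has no known prefix.
    """
    # Strip the square brackets of a safe CURIE, if present.
    if value.startswith("[") and value.endswith("]"):
        value = value[1:-1]
    prefix, sep, reference = value.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError("no mapping for prefix in %r" % value)
    # Expansion is plain concatenation of the mapped URI and the reference.
    return prefixes[prefix] + reference


# The mapping declared by xmlns:db in the example markup.
PREFIXES = {"db": "http://dbpedia.org/"}
print(expand_curie("[db:resource/Albert_Einstein]", PREFIXES))
```

Under this mapping the bracketed form expands to the same URI that the
first markup example writes out in full.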
attached mail follows:
Benjamin Hawkes-Lewis wrote:
> On 11/1/09 02:51, Calogero Alex Baldacchino wrote:
>> eRDF might be a working compromise, because it doesn't need any changes
>> to the spec
>
> It's not possible to author conforming HTML5 that functions as eRDF
> since eRDF requires a 'profile' attribute, but HTML5 has removed the
> attribute.
>
I hadn't noticed that before, thanks for the info :-) However, it's
actually the same for RDFa attributes, because they're not in the spec
either. From this point of view, introducing six new attributes or
restoring an older one is not very different, thus (again) why RDFa and
not eRDF? Or why not both? Or why not also RDFa embedded in Atom
embedded, in turn, in HTML (like SVG or MathML)? It seems to me, for
instance, that at this stage SearchMonkey might be a reason to consider
all of them.
>
> ; RDFa covers a wider range of RDF semantics, but requires
>> new attributes and also namespaces (a sort of hybrid between them might
>> avoid the need to bring namespaces - xmlns:* attributes - into html
>> serialization).
>
> To avoid xmlns:* attributes, one could drop CURIEs in the text/html
> serialization and use markup like:
>
> <div>
> <div about="http://dbpedia.org/resource/Albert_Einstein">
> ...
> </div>
> </div>
>
> instead of
>
> <div xmlns:db="http://dbpedia.org/">
> <div about="[db:resource/Albert_Einstein]">
> ...
> </div>
> </div>
>
> There's no data loss.
>
Well, that's an option, of course, but it's *not* RDFa as specified by
the W3C; for instance, @property is specified as accepting _only_ CURIEs
(whereas @about can also accept URIs; eRDF allows CURIEs too, though in
a different format from the one specified for RDFa and used for XML in
general). That is, to do that not one but _two_ specifications would
need to change: the current HTML5 (which is a draft, thus not a problem)
and RDFa (which is now a Recommendation, so might that be more
difficult? Should a different specification be derived?), unless we want
it to be just an unofficial, yet widely accepted, convention. And I
think one unofficial convention is worth as much as another (any
processor conforming to standard RDFa would need deep changes to cope
with that; it doesn't work in Fuzzbot when CURIEs are expected, for
instance). I'm the first to say that my suggestion was an ugly hack, but
at least it would have worked and been conformant without changing
anything.
>> My suggestion was meant as a mean to test RDFa in HTML
>> documents without changing the spec (perhaps in conjunction with
>> data-xmlns-*, data-xmlns-prefixes="rdfa foaf <whatever>" to "emulate"
>> namespaces - an ugly hack, I know, but at least would avoid changes to
>> html serialization, at least in a test phase) -- even if I think that
>> xml serialization should work better for such rdf metadata.
>
> I really can't see anybody violating the spec in that way rather than
> violating the spec by just adding the RDFa attributes outright, --
Indeed, current specs are violated, and I was just considering a way to
use RDFa without such violations before deciding whether it's worth
adding to the spec, no more (and I don't want to push that hack anymore;
I'm just trying to point out my aim).
> --especially given that there are already people publishing these
> attributes in text/html so the "namespace" has already been polluted
> and we already have services like SearchMonkey not only using these
> attributes but promoting them.
It seems to me that SearchMonkey doesn't promote RDFa more than it
promotes Microformats, eRDF and dataRSS (RDFa embedded in external Atom
feeds). It's also a very recent feature, and I really can't guess which
kind of RDF serialization is going to "win the battle" (that is,
choosing one over the others *might* be premature right now, as might
introducing all of them).
> It may therefore already be problematic for a future version of HTML > to use these attributes as extension points without breaking existing > sites. The "test" is already in progress, for better or worse. HTML5 > conformance checkers don't have to bless this test, of course, any > more than CSS validators have to give the all clear to vendor-specific > properties. It's the same with every possible existing custom (non-standard) attributes and elements out there, since there is no standard for them, and instead data-* has been created; it's also the same for accesskey, actually, since it's not in current spec (whereas it was in HTML4). After all, support for unknown attributes/elements has never been a standard "de jure", but more of a quirk, and there are no grants it will work fine in the future (as well as actually it doesn't work consistently for unknown elements cross-browsers -- there are strong differences between IE and other browsers with this respect). Moreover, the use of such attributes /for the purposes of SearchMonkey/ is a very, very custom use case, since they're used just for server-side computations, thus no collaboration is required by other UAs; if browsers just ignored and dropped such attributes (as they do with unknown, proprietary CSS extensions), no page would be broken, whereas SearchMonkey would work as fine. Problems might arise if they were used in different contexts (e.g. as CSS selectors - but dropping unknown CSS rules is allowed by CSS spec), but who cares of them might just run a regex tool to map them to a new, standard-compliant version (given that, for instance, "data-rdfa-about", "rdfa:about" and "about" are in a 1-to-1 correspondence, thus such might be done very easily by UAs as a quirk). From this point of view, SearchMonkey might use its own custom dataset and model without any changes to its functionalities (AIUI, the basic format for RDF metadata in SearchMonkey is dataRSS). 
Since there are standards for embedding RDF into (x)html documents, it just makes sense for Yahoo to support them all. > > Moreover, the damage done by immediately breaking the principle that > data-* should be for private use only and turning it into a > distributed extension point may be worse than the alternatives. > > -- > Benjamin Hawkes-Lewis I really don't see the problem if a *custom* convention became widely accepted and reused by other people (given that my idea started from a mail by Charles McCathieNevile presenting small-scale scenarios, such as organizations' internal use and external interchange with other selected organizations, as a main context for RDFa - and I've never said the HTML specification should even mention it; I was thinking of it just as an unofficial convention to experiment with in such scenarios). I really can't see, right now, why it should be different, for instance, from the case of a freely reusable widget using a custom data model based on private data-* attributes inserted by people in thousands of websites (the widget with its relative metadata, I mean), then liked by other people and reused in different contexts (the same data model based on data-*, now), unless we agree this should be avoided, but I can't guess how to prevent people from reusing a "private-only" data model they happened to like (unless it resulted in a copyright infringement, but I'm not sure this may happen because of the mere use of the same name for some "variables" processed by a similar script, yet different in source code -- given that copyright is evaluated at source code level, not per the resulting functionalities, as far as I know). WBR, Alex
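The exchange above about replacing CURIEs with full URIs can be made concrete with a minimal Python sketch. It assumes the bracketed "safe CURIE" form and the `db` prefix binding from Benjamin's example; the function name `expand_safe_curie` and the explicit prefix map are hypothetical and are not taken from any RDFa processor.

```python
from typing import Mapping


def expand_safe_curie(value: str, prefixes: Mapping[str, str]) -> str:
    """Expand a safe CURIE such as "[db:resource/x]" into a full URI.

    Values that are not wrapped in square brackets are treated as plain
    URIs and returned unchanged.
    """
    if not (value.startswith("[") and value.endswith("]")):
        return value
    # Split "prefix:reference" at the first colon only.
    prefix, sep, reference = value[1:-1].partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"cannot expand CURIE: {value!r}")
    # The expansion is plain string concatenation of binding and reference.
    return prefixes[prefix] + reference


# Prefix binding taken from the xmlns:db declaration in the example markup.
PREFIXES = {"db": "http://dbpedia.org/"}
print(expand_safe_curie("[db:resource/Albert_Einstein]", PREFIXES))
```

Because the expansion is pure concatenation, the full-URI markup carries the same information as the prefixed form, which is the sense in which there is "no data loss". The sketch says nothing about @property, which the reply notes is specified to accept only CURIEs.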
attached mail follows:
On 11/1/09 16:52, Calogero Alex Baldacchino wrote: > Well, that's a chance, of course, but that's *not* RDFa as specified by > W3C; for instance, @property is specified as accepting _only_ CURIEs Good point; I hadn't spotted that. > It's the same with every possible existing custom (non-standard) > attributes and elements out there, since there is no standard for them, > and instead data-* has been created; Emphatically, data-* has been created for private use data encoding (basically for scripting purposes), not as a replacement for the existing practices of adding new elements and attributes to HTML without going through W3C/WHATWG. Existing custom attributes intended for use by scripts (e.g. "action" in Gmail and Yahoo! Mail) have a direct migration path open for them (i.e. to "data-action" or an HTML5-native feature). Proprietary attributes intended for use by user agents (e.g. "autocomplete"), on the other hand, must be adopted by HTML5 if they are not to remain non-conforming. > it's also the same for accesskey, > actually, since it's not in current spec (whereas it was in HTML4). I suspect the behavior for "accesskey" will ultimately be defined by the spec, whether or not it is made conforming. > After all, support for unknown attributes/elements has never been a > standard "de jure", but more of a quirk Depends what you mean by "support" I guess. > I really don't see the problem if a *custom* convention became widely > accepted and reused by other people Then I think you don't agree with the fundamental design principle of the "data-*" attribute. The theory is that extensions to HTML benefit from going through a community process like WHATWG or W3C, and blessing extension points encourages people to circumvent that process, with the result that browsers have to support poorly designed features in order to have an interoperable web. 
> I really can't get, right now, why it should be different, for instance, > from the case of a freely reusable widget using a custom data model > based on private data-* attributes inserted by people in thousands of > websites (the widget with relitive metadata, I mean), then liked by > other people and reused in different contexts (the same data model based > on data-*, now) Reuse of "data-*" by DHTML widgets would not impose any additional requirements on user agents, so it would be fine from the perspective elaborated above. It wouldn't change the language by the back door. -- Benjamin Hawkes-Lewis
attached mail follows:
Benjamin Hawkes-Lewis wrote: > On 11/1/09 16:52, Calogero Alex Baldacchino wrote: > >> Well, that's a chance, of course, but that's *not* RDFa as specified by >> W3C; for instance, @property is specified as accepting _only_ CURIEs > > Good point; I hadn't spotted that. > >> It's the same with every possible existing custom (non-standard) >> attributes and elements out there, since there is no standard for them, >> and instead data-* has been created; > > Emphatically, data-* has been created for private use data encoding > (basically for scripting purposes) not as a replacement for the existing > practices of adding new elements and attributes to HTML without going > through W3C/WHATWG. It should perhaps set alarm bells ringing that almost every time data-* attributes come up, people suggest using them to publish data to the web at large rather than as internal scripting hooks. Since the restrictions on data-* are not machine checkable, even the majority of "standards aware" authors are unlikely to heed them. Therefore the net effect of the restriction will be to prevent conscientious standards bodies from using data-* attributes in their specifications. It is quite possible that popular technologies will arise from sources other than such standards organisations, and so use of data-* for more than just private scripting may be inevitable. It is also possible that features that start off as private scripting hooks will evolve into data publishing features. This again would lead to the natural breaking of the restriction on data-* attributes. (I know I have said this before but I forget whether I posted it or just discussed it on IRC.)
attached mail follows:
James Graham wrote: > It should, perhaps set alarm bells ringing that almost every time data-* > attributes come up, people suggest using them to publish data to the web > at large rather than as internal scripting hooks. Since the restrictions > on data-* are not machine checkable, even the majority of "standards > aware" authors are unlikely to heed them. Therefore the net effect of > the restriction will be to prevent conscientious standards bodies from > using data-* attributes in their specifications. It is quite possible > that popular technologies will arise from sources other than such > standards organisations and so use of data-* for more than just private > scripting may be inevitable. > > It is also possible that features that start off as private scripting > hooks will evolve into data publishing features. This again would lead > to the natural breaking of the restriction of data-* attributes. > > (I know I have said this before but I forget whether I posted it or just > discussed it on IRC.) Agreed. So what does this tell us about the point of view that distributed extensibility should not be supported by HTML5? Best regards, Julian
attached mail follows:
Benjamin Hawkes-Lewis wrote: > >> After all, support for unknown attributes/elements has never been a >> standard "de jure", but more of a quirk > > Depends what you mean by "support" I guess. > I just mean that, as far as I know, there is no official standard requiring UAs to support (parse and expose through the DOM) attributes and elements which are not part of the HTML language but are found in text/html documents. Usually, browsers support them for robustness' sake and/or backward compatibility with existing pages, but they might do so with significant differences (in practice this happens for unknown elements but not for unknown attributes, yet one shouldn't assume that such common behavior won't change in the future, or that it will be adopted by newer vendors, even if that might be a quite safe assumption). Thus any hack to the language /for custom purposes and script elaboration/ should be done by means of existing attributes/elements instead of creating new ones (I mean, "data-rdfa-about" might be a bit safer than just "about", taking a conservative approach based on the assumption "I know what happens today, not what will happen tomorrow") -- before data-* this was possible through the class attribute; now data-* can also be used for custom hacks. >> I really don't see the problem if a *custom* convention became widely >> accepted and reused by other people > > Then you I think you don't agree with the fundamental design principle > of the "data-*" attribute. The theory is that extensions to HTML > benefit from going through a community process like WHATWG or W3C, and > blessing extension points encourages people to circumvent that > process, with the result that browsers have to support poorly designed > features in order to have an interoperable web. 
> Yet it is *possible* to use data-* attributes to define a proper *private* convention by choosing names carefully in order to avoid clashes with other private conventions (for instance, a widget might need metadata to be put within the host page, and a careful choice of data-* names might avoid clashes with other metadata needed by other widgets or by the page itself). More people might find a certain convention useful and reusable enough for their purposes (because of non-clashing names), and the result would be a clearer "cow path" that community "cowboys" might follow to catch the free problem running away from standards. The *only* difference with "data-rdfa-*" here would be that a higher number of authors/developers would have to agree on such a convention from the beginning, but only if they were interested in exchanging the same metadata with each other for their respective *custom* uses (through a custom script or plugin, either developed independently or shared). From this point of view, the only difference between "data-rdfa-about" and "about" - as used for the purposes of SearchMonkey - is that the former is immediately conforming to the HTML5 spec and, thus, surely exposed through the DOM by every possible HTML5-compliant UA, as happens for the classes used by Microformats. I never had in mind any requirements for UAs that don't come from a clearly traced "cow path", the same way there is no requirement for UAs not involved in SearchMonkey to support any kind of metadata - for the purposes of SearchMonkey itself. 
Unless one thinks that everyone facing a problem not solved (at all or enough for his purposes) by an official standard should either create a private hack disregarding any possible hacks for similar problems he might have happened to find on the web, or start a new community process, possibly without knowing whether other people are facing the same problem or a similar one, I really can't understand why a *custom*, *born-private* convention (possibly born within a group of authors/developers) that then becomes widely accepted should be a problem, as long as it is based on existing, standard features, doesn't require any additional support, and results in a possible cowpath to be standardized later as needed. And I really don't understand why class="xyz" is a good hack whereas "data-some-thing" is not, assuming both are designed for and used by "cows opening a path" ( :-P ) >> I really can't get, right now, why it should be different, for instance, >> from the case of a freely reusable widget using a custom data model >> based on private data-* attributes inserted by people in thousands of >> websites (the widget with relitive metadata, I mean), then liked by >> other people and reused in different contexts (the same data model based >> on data-*, now) > > Reuse of "data-*" by DHTML widgets would not impose any additional > requirements on user agents, so it would be fine from the perspective > elaborated above. It wouldn't change the language by the back door. Really? Is it so much different from the case of the pattern attribute (which addresses, at the UA and language level, a problem earlier solved by scripts -- e.g. getting elements by their ids)? I don't think it's very different. 
From this perspective, if data-* attributes had existed before the pattern attribute, someone might have used them to declare a regex then used by a script implementing generic checking, and that might have been a good reason to add the pattern attribute to form inputs, requiring UAs to check the input value against its associated regular expression (a solution which also works for UAs not supporting scripts, for instance). I guess closing a language to every kind of "back-door change" may conflict with the principle of paving a cowpath. I also guess that, if microformats experience (or the "realworld semantics" they claim to be based on) had suggested the need to add a new element/attribute to the language, a new element/attribute would have been added. WBR, Alex
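The hypothetical scenario described above, a script checking form inputs against a regex carried in a private data-* attribute, might look roughly like the following Python sketch. The attribute name `data-pattern` and the helper `failing_inputs` are assumptions made for illustration only; the sketch uses the standard-library `html.parser` and matches the whole value, in the spirit of the `pattern` attribute.

```python
import re
from html.parser import HTMLParser


class PatternChecker(HTMLParser):
    """Collect the names of <input> elements whose value fails its regex."""

    def __init__(self):
        super().__init__()
        self.failures = []

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        # "data-pattern" is a hypothetical private attribute, not a standard one.
        regex = attributes.get("data-pattern")
        if regex is None:
            return
        value = attributes.get("value") or ""
        # The regex must match the entire value, not just a substring.
        if re.fullmatch(regex, value) is None:
            self.failures.append(attributes.get("name"))


def failing_inputs(markup):
    """Return the names of inputs whose value does not match data-pattern."""
    checker = PatternChecker()
    checker.feed(markup)
    checker.close()
    return checker.failures


FORM = (
    '<input name="zip" data-pattern="[0-9]{5}" value="1234">'
    '<input name="ok" data-pattern="[0-9]{5}" value="12345">'
)
print(failing_inputs(FORM))
```

A script of this kind imposes nothing on user agents beyond exposing the attribute, which is the distinction Benjamin draws in his reply between scripted uses of data-* and attributes that carry user agent behaviour.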
attached mail follows:
On 12/1/09 20:26, Calogero Alex Baldacchino wrote: > I just mean that, as far as I know, there is no official standard > requiring UAs to support (parse and expose through the DOM) attributes > and elements which are not part of the HTML language but are found in > text/html documents. Perhaps, but then prior to HTML5, much of what practical user agents must do with HTML has not been required by any official standard. ;) RFC 2854 does say that "Due to the long and distributed development of HTML, current practice on the Internet includes a wide variety of HTML variants. Implementors of text/html interpreters must be prepared to be 'bug-compatible' with popular browsers in order to work with many HTML documents available the Internet." http://tools.ietf.org/html/rfc2854 HTML 4.01 does recommend that "[i]f a user agent encounters an element it does not recognize, it should try to render the element's content" and "[i]f a user agent encounters an attribute it does not recognize, it should ignore the entire attribute specification (i.e., the attribute and its value)". http://www.w3.org/TR/html401/appendix/notes.html#h-B.3.2 Clearly these suggestions are incompatible with respect to attributes; AFAIK all popular UAs insert unrecognized attributes into the DOM and plenty of web content depends on that behaviour. >> Reuse of "data-*" by DHTML widgets would not impose any additional >> requirements on user agents, so it would be fine from the perspective >> elaborated above. It wouldn't change the language by the back door. > > Really? Is it so much different from the case of the pattern attribute > (which addresses, at the UA and language level, a problem earlier solved > by scripts -- e.g. getting elements by their ids)? I don't think it's > very different. 
From this perspective, if data-* attributes existed > before the pattern attribute, someone might have used them to declare a > regex then used by a script implementing a generic checking, and such > might have been a good reason to add the pattern attribute to form > inputs, requiring UAs to contrast the input value to its relative > regular expression (a solution wich also works for UAs not supporting > scripts, for instance). Just like proprietary elements/attributes introduced with user agent behaviours (marquee, autocomplete, canvas), scripted uses of "data-*" might suggest new features to be added to HTML, which would then become requirements for UAs. But unlike proprietary elements/attributes introduced with user agent behaviors, scripted uses of "data-*" do not impose new processing requirements on UAs. Therefore, unlike proprietary elements/attributes introduced with user agent behaviors, scripted uses of "data-*" impose _no_ design constraints on new features. Establishing user agent behaviours with "data-*" attributes, on the other hand, imposes almost as many design constraints as establishing them with proprietary elements and attributes. (There's just less pollution of the primary HTML "namespace".) If no RDFa was in deployment, you could argue it would be less wrong (from this perspective) to abuse "data-*" than introduce new attributes. But to the extent that these attributes are already in use in text/html and standardized within the "http://www.w3.org/1999/xhtml" namespace, processing requirements are effectively already being imposed on user agents (such as not introducing conflicting treatment of the "about" attribute). All that adding user agent behaviours with "data-rdfa*" attributes would do at this point is add _more_ requirements, without rescuing the polluted attributes. 
> I also guess that, > if microformats experience (or the "realworld semantics" they claim to > be based on) had suggested the need to add a new element/attribute to > the language, a new element/attribute would have been added. I'm not really sure what you mean. (It's watching the microformats community struggle with the problem of encoding machine data equivalents, for things like dates and telephone number types and measurements, that persuaded me HTML5 should include a generic machine data attribute, because it seems likely to me that the problem will be recurrent.) -- Benjamin Hawkes-Lewis
attached mail follows:
Benjamin Hawkes-Lewis wrote: > On 12/1/09 20:26, Calogero Alex Baldacchino wrote: >> I just mean that, as far as I know, there is no official standard >> requiring UAs to support (parse and expose through the DOM) attributes >> and elements which are not part of the HTML language but are found in >> text/html documents. > > Perhaps, but then prior to HTML5, much of what practical user agents > must do with HTML has not been required by any official standard. ;) > > RFC 2854 does say that "Due to the long and distributed development of > HTML, current practice on the Internet includes a wide variety of HTML > variants. Implementors of text/html interpreters must be prepared to > be 'bug-compatible' with popular browsers in order to work with many > HTML documents available the Internet." > > http://tools.ietf.org/html/rfc2854 > > HTML 4.01 does recommend that "[i]f a user agent encounters an element > it does not recognize, it should try to render the element's content" > and "[i]f a user agent encounters an attribute it does not recognize, > it should ignore the entire attribute specification (i.e., the > attribute and its value)". > > http://www.w3.org/TR/html401/appendix/notes.html#h-B.3.2 > > Clearly these suggestions are incompatible with respect to attributes; > AFAIK all popular UAs insert unrecognized attributes into the DOM and > plenty of web content depends on that behaviour. > Very, very true. HTML 4.01 also says the recommended behaviours are meant "to facilitate experimentation and interoperability between implementations of various versions of HTML", whereas the "specification does not define how conforming user agents handle general error conditions, including how user agents behave when they encounter elements, attributes, attribute values, or entities not specified in this document", and since "user agents may vary in how they handle error conditions, authors and users must not rely on specific error recovery behavior". 
I just think the last sentence defines a best practice everyone should follow instead of relying on a common quirk supporting invalid markup. However, beside something being a good or bad practice, there will always be authors doing whatever they please, therefore it is quite safe to assume UAs will always expose invalid/unrecognized attributes (that's unavoidable, given the need for backward compatibility). > > Just like proprietary elements/attributes introduced with user agent > behaviours (marquee, autocomplete, canvas), scripted uses of "data-*" > might suggest new features to be added to HTML, which would then > become requirements for UAs. > > But unlike proprietary elements/attributes introduced with user agent > behaviors, scripted uses of "data-*" do not impose new processing > requirements on UAs. > > Therefore, unlike proprietary elements/attributes introduced with user > agent behaviors, scripted uses of "data-*" impose _no_ design > constraints on new features. > > Establishing user agent behaviours with "data-*" attributes, on the > other hand, imposes almost as many design constraints as establishing > them with proprietary elements and attributes. (There's just less > pollution of the primary HTML "namespace".) > > If no RDFa was in deployment, you could argue it would be less wrong > (from this perspective) to abuse "data-*" than introduce new attributes. Oh, well, I don't want to argue about that. For me the idea to use "data-rdfa-*" can rest in peace, since in practice it's not different from using RDFa attributes as they are, at least as far as they're handled by scripts, either client- or server-side. 
However I think that, * actually it seems not to be enough clear what UAs not involved in a particular project should do with RDFa attributes, beside exposing their content for the purpose of a script elaboration, whereas a precise behaviour should be defined, as well as an eventual class of UAs clearly identified as not required to support it, and eventual caveats on possible problems and relative solutions, before introducing any new elements/attributes in a formal specification; * actual deployment might be harmed by the use of xml namespaces in html serialization. Also, I see design suggestions more than impositions. If a new (and proprietary/private) attribute/element/convention is convincingly useful/needed, it is supported by other UAs and introduced in a specification, otherwise, if a not enough significant number of pages would be broken, it might even be redefined for use with a different semantics. And a possible process involving data-* attributes would/could be experiment privately => extend the scale involving other people finding it useful for their needs => get it in the primary namespace of an official specification (discarding the "data-" part and any other useless parts of the experimental name), so that existing pages may still work with their custom scripts or easily migrate to the new standard (and benefit of the new default support) by running a simple regex. > > But to the extent that these attributes are already in use in > text/html and standardized within the "http://www.w3.org/1999/xhtml" > namespace, processing requirements are effectively already being > imposed on user agents (such as not introducing conflicting treatment > of the "about" attribute). All that adding user agent behaviours with > "data-rdfa*" attributes would do at this point is add _more_ > requirements, without rescuing the polluted attributes. 
> As far as html serialization is concerned, introducing xml namespaces (and, thus, xml extensibility - as a whole or partly) might be worse than breaking current experimentation. Since XHTML, nearly all W3C production has converged towards XML, suggesting a direction the web didn't embrace completely, and instead raising objections to xml features felt as useless or unwanted by a good number of people, namespaces and extensibility among them; hence the need to evolve html serialization to address new demands without forcing a migration towards xml. Therefore, introducing pieces of xml inside text/html documents may be problematic; of course, other surrogate mechanisms might be defined to indicate a namespace for the sole purposes of RDFa, but this would raise consistency issues between html and xhtml (as reported by Henri Sivonen), perhaps solvable by specifying a double mechanism as working for xhtml (the html-specific one, and the "classic" xml one), but such a choice might add complexity to UAs and be confusing for authors. As far as XHTML is concerned, I disagree with the introduction of RDFa attributes into the basic namespace, and I wouldn't encourage the same in the HTML5 spec. In the first place, I think there is a possible conflict with respect to the "content" attribute semantics, because it now requires different processing when used as an RDFa attribute and as a <meta> attribute associated with an "http-equiv" or a "name" value (for instance). 
In second place, it might be confusing for authors and lead to the misconception that every xhtml 1.x processor is also capable to process rdfa metadata (this is a limit of namespace + dtd/schema based modularization, because one can define the structure of a document, but not "orthogonal" behaviours requiring a specific support, not covered by the basic document model - such as collecting rdf triples declared by rdfa attributes, or calling a plugin and embedding its output - however, defining a proper namespace, maybe including its creation date somehow, may suggest what to expect from UAs). In third place, creating a different namespace would have resulted in a far easier introduction of RDFa attributes into other xml languages without having to change the language to host them (by the way, the xhtml namespace and a related prefix can be used, but this require a more specific support due to the "content" attribute issue, especially by UAs not supporting DTDs or schemata - that is, what should happen if an element were declared with both xhtml:name or xhtml:http-equiv, xhtml:content and xhtml:datatype, in an xml document accepting any attributes from external namespaces? of course, this is solvable, but rdfa:content, rdfa:datatype and so on would make things easier, or at least _cleaner_ and less confusing for authors having to understand that an XML and RDF processor can/must support the xhtml namespace and its _whole_ semantics, not just dom-related structures, but limited to RDFa attributes, so that no <meta> or <object> or <link> can be used hoping their semantics is supported, despite the support for the xhtml namespace...). Also there might have been fewer attributes, each one with a different semantic (assuming someone might not find useful to have a link with rel="stylesheet" representing a triple, for instance). Of course, this is my opinion. 
>> I also guess that, >> if microformats experience (or the "realworld semantics" they claim to >> be based on) had suggested the need to add a new element/attribute to >> the language, a new element/attribute would have been added. > > I'm not really sure what you mean. > > (It's watching the microformats community struggle with the problem of > encoding machine data equivalents, for things like dates and telephone > number types and measurements, that persuaded me HTML5 should include > a generic machine data attribute, because it seems likely to me that > the problem will be recurrent.) > > -- > Benjamin Hawkes-Lewis If there were general agreement, a new element/attribute would be introduced as a result of a "bottom up" process (starting from experimentation) integrated with a "top down" community evaluation - for specific purposes, not generic machine exposure, I mean. (I'm not sure a generic machine data attribute - in general, not just referring to rdfa - would solve that, because each new occurrence of the problem might require a "brand new" datatype that only newer, updated UAs would understand (older ones would just parse the attribute and provide it as a string for further processing by a script, at most, but this might not be much better than using a data-* attribute for private script consumption); therefore, that wouldn't necessarily be different from creating a new appropriate attribute/element as needed and providing that new feature in newer, compliant UAs.) WBR, Alex
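The "simple regex" migration mentioned in this message, dropping the "data-" part of an experimental attribute name once a standard one exists, can be sketched in Python as follows. This is a rough illustration under stated assumptions: the `data-rdfa-` prefix follows the naming used in the thread, the function `migrate_attributes` is hypothetical, and a regex over raw markup is deliberately crude (it would also rewrite matching text outside of tags).

```python
import re

# Hypothetical convention from the thread: data-rdfa-<name> maps 1-to-1
# to the plain attribute <name> (for example data-rdfa-about -> about).
_DATA_RDFA_ATTR = re.compile(r"(\s)data-rdfa-([a-z]+)(\s*=)")


def migrate_attributes(markup: str) -> str:
    """Rename data-rdfa-* attributes to their unprefixed counterparts."""
    # Keep the leading whitespace and the "=", drop only the prefix.
    return _DATA_RDFA_ATTR.sub(r"\1\2\3", markup)


OLD = '<div data-rdfa-about="http://dbpedia.org/resource/Albert_Einstein">'
print(migrate_attributes(OLD))
```

A real migration would more safely walk a parsed document than rewrite text, but the 1-to-1 name correspondence is what makes even the crude version workable.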
attached mail follows:
On 4/2/09 03:15, Calogero Alex Baldacchino wrote: > For what concerns XHTML, I disagree with the introduction of RDFa > attribute into the basic namespace, and I wouldn't encourage the same in > HTML5 spec. In first place, I think there is a possible conflict with > respect to the "content" attribute semantics, because it now requires a > different processing when used as an RDFa attribute and as a <meta> > attribute associated to an "http-equiv" or a "name" value (for instance). What conflict? 1. Attributes in XHTML can be distinguished by the elements they apply to as well as their name (e.g. the "name" attribute). 2. In XHTML+RDFa, "content" actually means the same thing on "meta" as on any other element in XHTML, which is presumably why they reused that attribute rather than introducing a new (better-named?) one: http://www.w3.org/TR/rdfa-syntax/#rdfa-attributes > In second place, it might be confusing for authors and lead to the > misconception that every xhtml 1.x processor is also capable to process > rdfa metadata (this is a limit of namespace + dtd/schema based > modularization, because one can define the structure of a document, but > not "orthogonal" behaviours requiring a specific support, not covered by > the basic document model - such as collecting rdf triples declared by > rdfa attributes, or calling a plugin and embedding its output - however, > defining a proper namespace, maybe including its creation date somehow, > may suggest what to expect from UAs). There's no way to query a user agent about support for the specifications associated with a particular namespace, and namespaces are an unreliable guide to what user agents actually support, so I don't buy this concern. Existing XHTML 1.x user agents don't always implement all the features of XHTML 1.x (e.g. exposing "longdesc" and "cite" to the user). 
HTML5 is introducing new elements and attributes into the same namespace, and authors would be wrong to assume that any XHTML-supporting browser will know what to do with them beyond inserting them into the DOM. XHTML modularization means you can't count on an XHTML user agent to implement any particular feature in the XHTML namespace. A more reliable guide to what user agents support is looking at the list of supported features (as opposed to namespaces or modules or any other proxy) in their documentation. > In third place, creating a different namespace would have resulted in a > far easier introduction of RDFa attributes into other xml languages > without having to change the language to host them (by the way, the > xhtml namespace and a related prefix can be used, but this require a > more specific support due to the "content" attribute issue, especially > by UAs not supporting DTDs or schemata - that is, what should happen if > an element were declared with both xhtml:name or xhtml:http-equiv, > xhtml:content and xhtml:datatype, in an xml document accepting any > attributes from external namespaces? I cannot understand how RDFa attributes in a different namespace would be easier to reuse either in another language or a XML document where the host is not XHTML. "content" and "datatype" mean the same on all elements, so your particular example seems like a non-problem to me - at least from the perspective of RDFa, which doesn't define processing for "name" or "http-equiv". In so far as there is a problem, it's already a problem with bog-standard XHTML. How should <myml:bar xhtml:name="foo" xhtml:http-equiv="baz" xhtml:content="quux"> be processed? 
> of course, this is solvable, but > rdfa:content, rdfa:datatype and so on would make things easier, or at > least _cleaner_ and less confusing for authors having to understand that > an XML and RDF processor can/must support the xhtml namespace and its > _whole_ semantics, not just dom-related structures, but limited to RDFa > attributes, so that no <meta> or <object> or <link> can be used hoping > their semantics is supported, despite the support for the xhtml > namespace...). An "XML and RDF processor" doesn't have to support XHTML or RDFa - XML and RDF are independent specifications. A conforming XHTML+RDFa "user agent MUST support all of the features required in this specification. A conforming user agent must also support the User Agent conformance requirements as defined in XHTML Modularization [XHTMLMOD] section on 'XHTML Family User Agent Conformance'." http://www.w3.org/TR/rdfa-syntax/#uaconf Those further requirements can be read at: http://www.w3.org/TR/xhtml-modularization/conformance.html#s_conform_user_agent An XHTML+RDFa conforming user agent does not have to implement "meta", "object", or "link", and as I explained above, authors cannot assume support for particular features based on namespaces. > Also there might have been fewer attributes, each one > with a different semantic (assuming someone might not find useful to > have a link with rel="stylesheet" representing a triple, for instance). I don't follow. link with rel="stylesheet" _does_ represent information expressible as a triple; why would it be useful to pretend otherwise? And how would doing so make for fewer attributes? > If there were a general agreement, a new element/attribute would be > introduced as a result of a "bottom up" process (starting from > experimentations) integrated with a "top down" community evaluation - > for specific purposes, not generic machine exposure, I mean.
There is no general agreement to that AFAICT, and I don't think using unstandardized elements or attributes or using data-* for public use would be good approaches to extending HTML: the former blocks potential extension points (e.g. "canvas") and the latter pointlessly introduces the risk that a private use might be confused with a public one. > (I'm not sure a generic machine data attribute - in general, not just > referring to rdfa - would solve that, because each new occurrence of the > problem might require a "brand new" datatype that only newer, updated > UAs would understand (older ones would just parse the attribute and > provide it as a string for further elaboration by a script, at most, but > this might not be much better than using a data-* attribute for private > script consumption), therefore, that wouldn't be necessarily different > than creating a new appropriate attribute/element as needed and > providing such new feature in newer, compliant UAs). It would be very different in practice, because (like new "class" names), new "content" values wouldn't need to go through the W3C/WHATWG standards process. That has a cost of course. You might end up with a worse design, especially if you don't go through a community like microformats. But that cost arguably isn't so bad when you're talking about embedding arbitrary data rather than features like "canvas" or "datagrid" that require new parsing, DOM APIs, and user interface from popular user agents. This cost appears to be acceptable in the case of microformat "class" names, for example. Now, you could already embed data with a bad design using HTML5's other extension mechanisms (e.g. "script"). It's just that microformats choose to abuse other attributes ("title") instead, partly because they allow you to wrap some human-readable content with its machine-readable equivalent (i.e. it's a more "markup-like" way of doing things).
My feeling is that the cost of bad designs for embedded data is (1) unavoidable and (2) less than the benefits of avoiding misuse of other (X)HTML features for embedding data. -- Benjamin Hawkes-Lewis
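The point above, that a generic XML processor only inserts namespaced attributes into the tree and attaches no XHTML or RDFa meaning to them, can be illustrated with a minimal sketch. It assumes Python's standard xml.etree.ElementTree as the "generic processor"; the myml namespace URI is hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
MYML_NS = "http://example.org/myml"  # hypothetical host-language namespace

# The element from the message above, with namespace declarations added.
doc = (
    f'<myml:bar xmlns:myml="{MYML_NS}" xmlns:xhtml="{XHTML_NS}" '
    'xhtml:name="foo" xhtml:http-equiv="baz" xhtml:content="quux"/>'
)

elem = ET.fromstring(doc)

# The parser exposes the attributes as plain namespace-qualified strings.
# Nothing here interprets them as <meta>-style or RDFa-style data.
attrs = dict(elem.attrib)
for qualified_name, value in attrs.items():
    print(qualified_name, "=", value)
```

Any further processing (meta semantics, triple extraction) would have to come from software that specifically implements it, which is why the namespace alone says little about what a user agent supports.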
attached mail follows:
On Jan 11, 2009, at 18:52, Calogero Alex Baldacchino wrote: > However, actually it's the same for RDFa attributes, because they're > not in the spec. From this point of view, introducing six new > attributes, or resorting to an older one is not very different, thus > (again) why RDFa and not eRDF? eRDF is very different in not relying on attributes whose qname contains the substring "xmlns". -- Henri Sivonen hsivonen@iki.fi http://hsivonen.iki.fi/
attached mail follows:
On Fri, 09 Jan 2009 12:54:08 +1100, Calogero Alex Baldacchino <alex.baldacchino@email.it> wrote: > I admit I'm not very expert in RDF use, thus I have a few questions. > Specifically, maybe I can guess the advantages when using the same > (carefully modelled, and well-known) vocabulary/ies; but when two > organizations develop their own vocabularies, similar yet different, to > model the same kind of informations, is merging of data enough? Can a > processor give more than a collection of triples, to be then interpreted > basing on knowledge on the used vocabulary/ies? RDF consists of several parts. One of the key parts explains how to make an RDF vocabulary self-describing in terms of other vocabularies. > I mean, I assume my tools can extract RDF(a) data from whatever > document, but my query interface is based on my own vocabulary: when I > merge informations from an external vocabulary, do I need to translate > one vocabulary to the other (or at least to modify the query backend, so > that certain curies are recognized as representing the same concepts - > e.g. to tell my software that 'foaf:name' and 'ex:someone' are > equivalent, for my purposes)? If so, merging data might be the minor > part of the work I need to do, with respect to non-RDF(a) metadata (that > is, I'd have tools to extract and merge data anyway, and once I > translated external metadata to my format, I could use my own tools to > merge data), specially if the same model is used both by mine and an > external organization (therefore requiring an easier translation). If a vocabulary is described, then you can do an automated translation from one RDF vocabulary to another by using your original query based in your original vocabulary. This is one of the strengths of RDF. 
> Thus, I'm thinking the most valuable benefit of using RDF/RDFa is the > sureness that both parties are using the very same data model, despite > the possible use of different vocabularies -- it seems to me that the > concept of triples consisting of a subject, a predicate and an object is > somehow similar to a many-to-many association in a database, whereas one > might prefer a one-to-many approach - though, the former might be a > natural choice to model data which are usually sparse, as in a document > prose. I don't see the analogy, but yes, I think the big benefit is being able to ensure that you know the data model without knowing the vocabulary a priori - since this is sufficient to automate the process of merging data into your model. cheers Chaals -- Charles McCathieNevile Opera Software, Standards Group je parle français -- hablo español -- jeg lærer norsk http://my.opera.com/chaals Try Opera: http://www.opera.com
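As a rough illustration of the automated translation discussed in this message, here is a minimal sketch in plain Python. It assumes the vocabulary description takes the form of an owl:equivalentProperty statement linking 'ex:someone' to 'foaf:name'; the thread does not say which mechanism is meant, so that choice, the sample data, and the function names are all assumptions.

```python
EQUIVALENT = "owl:equivalentProperty"

# Merged triples from two sources, plus one statement describing the
# custom vocabulary in terms of a well-known one (assumed mechanism).
triples = [
    ("ex:someone", EQUIVALENT, "foaf:name"),
    ("_:a", "foaf:name", "Alice"),   # data published with FOAF
    ("_:b", "ex:someone", "Bob"),    # data published with the custom vocabulary
]

def equivalents(prop, triples):
    """Return prop plus every property declared equivalent to it, transitively."""
    found = {prop}
    changed = True
    while changed:
        changed = False
        for s, p, o in triples:
            if p != EQUIVALENT:
                continue
            if s in found and o not in found:
                found.add(o)
                changed = True
            elif o in found and s not in found:
                found.add(s)
                changed = True
    return found

def query(prop, triples):
    """Answer a query written against one vocabulary using all equivalent terms."""
    props = equivalents(prop, triples)
    return sorted((s, o) for s, p, o in triples if p in props)

names = query("foaf:name", triples)
print(names)
```

The query is still written against the original vocabulary ('foaf:name'); the mapping is picked up from the data itself rather than hard-coded in the query backend. If no such description exists, the mapping has to be supplied by hand, which is the case raised in the follow-up.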
attached mail follows:
Charles McCathieNevile wrote: > On Fri, 09 Jan 2009 12:54:08 +1100, Calogero Alex Baldacchino > <alex.baldacchino@email.it> wrote: > >> I admit I'm not very expert in RDF use, thus I have a few questions. >> Specifically, maybe I can guess the advantages when using the same >> (carefully modelled, and well-known) vocabulary/ies; but when two >> organizations develop their own vocabularies, similar yet different, >> to model the same kind of informations, is merging of data enough? >> Can a processor give more than a collection of triples, to be then >> interpreted basing on knowledge on the used vocabulary/ies? > > RDF consists of several parts. One of the key parts explains how to > make an RDF vocabulary self-describing in terms of other vocabularies. > >> I mean, I assume my tools can extract RDF(a) data from whatever >> document, but my query interface is based on my own vocabulary: when >> I merge informations from an external vocabulary, do I need to >> translate one vocabulary to the other (or at least to modify the >> query backend, so that certain curies are recognized as representing >> the same concepts - e.g. to tell my software that 'foaf:name' and >> 'ex:someone' are equivalent, for my purposes)? If so, merging data >> might be the minor part of the work I need to do, with respect to >> non-RDF(a) metadata (that is, I'd have tools to extract and merge >> data anyway, and once I translated external metadata to my format, I >> could use my own tools to merge data), specially if the same model is >> used both by mine and an external organization (therefore requiring >> an easier translation). > > If a vocabulary is described, then you can do an automated translation > from one RDF vocabulary to another by using your original query based > in your original vocabulary. This is one of the strengths of RDF. > Certainly, this is a strong benefit.
However, when comparing different vocabularies in depth to their basic description (if any), I guess there may be a chance to find vocabularies which are not described in terms of each other, or of a third common vocabulary, thus a translation might be needed anyway. This might be true for small-time users developing a vocabulary for internal use before starting an external partnership, or regardless of the partnership (sometimes, small-time users may find it easier/faster to "reinvent the wheel" and modify it to address evolving problems; potentially someone might be unable to afford an extensive investigation to find an existing vocabulary fulfilling his requirements, or to develop a new one in conjunction with a partner having similar but slightly different needs, and thus potentially leading to a longer process to mediate respective needs. In such a case, I wouldn't expect that such a person will look for existing, more generic vocabularies which can describe the new one in order to ensure the widest possible interchange of data - that is, until a requirement for interchange arises, designing a vocabulary for that might be an overengineered task, and once the requirement was met, addressing it with a translation or with a description in terms of a vocabulary known to be involved (each time the problem recurs) might be easier/faster than engineering a good description once and for all). Anyway, let's assume we're going to deal with well-described vocabularies. Is the automated translation a task of a parser/processor creating a graph of triples, or a task of a query backend? And what are the requirements for a UA, from this perspective? Must it just parse the triples and create a graph or also take care of a vocabulary description? Must it be a complete query backend? Must it also provide a query interface? How much basic or advanced must the interface be?
I think we should answer questions like this, and try and figure out possible problems arising with each answer and possible related solutions, because the concern here should be what UAs must do with RDF embedded in a non-RDF (and non-XML) document. >> Thus, I'm thinking the most valuable benefit of using RDF/RDFa is >> the sureness that both parties are using the very same data model, >> despite the possible use of different vocabularies -- it seems to me >> that the concept of triples consisting of a subject, a predicate and >> an object is somehow similar to a many-to-many association in a >> database, whereas one might prefer a one-to-many approach - though, >> the former might be a natural choice to model data which are usually >> sparse, as in a document prose. > > I don't see the analogy, but yes, I think the big benefit is being > able to ensure that you know the data model without knowing the > vocabulary a priori - since this is sufficient to automate the process > of merging data into your model. > I understand the benefit with respect to well-known and/or well-described vocabularies, but I wonder if an average small-time user would produce a well-described or a very-custom vocabulary. In the latter case, a good knowledge of a foreign vocabulary would be needed before querying it and I guess the translation can't be automated, but requires an understanding level which might be close to the one needed to translate from a (more or less) different model. In this case, the benefit of an automated merging of data from similar models might be lost in front of a non-automated translation which might be as difficult as translating from different models (but with a sufficient verbal documentation - that is with a natural language description, which should be easier to produce than a code-level description), given that translated data should be easy to merge.
I'm pushing this concept because I think it should be clear what scenario is more likely to happen, to avoid introducing features perfectly designed for the same people who can develop a "perfect" vocabulary with a "perfect" generic description, whom I suppose to be the same people who can afford to develop a generic toolkit on their own, or to adjust an existing one (thus, they might be pleased with a basic support and a basic API), but not for most small-time users, who might develop a custom vocabulary the same way they develop a custom model, thus needing more custom tools (again, a basic support and a basic API might satisfy their needs, more than a complete backend working fine with well-described vocabularies but not with completely unknown ones, thus requiring a custom development anyway). Assuming this is true, there should be evidence that the same people who'd produce a "bad" vocabulary do not prefer a completely custom model, because, if they were the great majority, we would risk investing resources (on the UAs side, if we made it a general requirement) to help people who may be pleased with the help, but not really need it (because they're not small-time users maybe, and can do it on their own without too much effort -- this doesn't mean that their requirements are less significant or less worth taking into account, but in general UA developers might not be very happy to invest their resources to implement something which is or appears overengineered with respect to the real needs "in the wild", thus we should carefully establish how strong the need to support RDFa is and accurately define support requirements for UAs). > cheers > > Chaals > WBR, Alex
attached mail follows:
The use cases for RDFa are pretty much the same as those for Microformats. For example, if a person's name and contact details are marked up on a web page using hCard, the user-agent can offer to, say, add the person to your address book, or add them as a friend on a social networking site, or add a reminder about that person's birthday to your calendar. If an event is marked up on a web page using hCalendar, then the user-agent could offer to add it to a calendar, or provide the user with a map of its location, or add it to a timeline that the user is building for their school history project. Providing rich semantics for the information on a web page allows the user-agent to know what's on a page, and step in and perform helpful tasks for the user. So why RDFa and not Microformats? Firstly, RDFa provides a single unified parsing algorithm that Microformats do not. Separate parsers need to be created for hCalendar, hReview, hCard, etc, as each Microformat has its own unique parsing quirks. For example, hCard has N-optimisation and ORG-optimisation which aren't found in hCalendar. With RDFa, a single algorithm is used to parse everything: contacts, events, places, cars, songs, whatever. Secondly, as the result of having one single parsing algorithm, decentralised development is possible. If I want a way of marking up my iguana collection semantically, I can develop that vocabulary without having to go through a central authority. Because URIs are used to identify vocabulary terms, I can be sure that my vocabulary won't clash with other people's vocabularies. It can be argued that going through a community to develop vocabularies is beneficial, as it allows the vocabulary to be built by "many minds" - RDFa does not prevent this, it just gives people alternatives to community development. Lastly, there are a lot of parsing ambiguities for many Microformats. One area which is especially fraught is that of scoping.
The editors of many current draft Microformats[1] would like to allow page authors to embed licensing data - e.g. to say that a particular recipe for a pie is licensed under a Creative Commons licence. However, it has been noted that the current rel=license Microformat cannot be re-used within these drafts, because virtually all existing rel=license implementations will just assume that the license applies to the whole page rather than just part of it. RDFa has strong and unambiguous rules for scoping - a license, for example, could apply to a section of the page, or one particular image. RDFa was largely born of looking at Microformats, looking at what was successful about them, considering problems with them, and finding ways to resolve those problems. ____ 1. It has been discussed in hAudio, figure, hRecipe and others. -- Toby A Inkster <mailto:mail@tobyinkster.co.uk> <http://tobyinkster.co.uk>
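The scoping argument above can be made concrete with a small sketch. It is a deliberately simplified approximation, not the RDFa processing algorithm: it assumes only that an "about" attribute sets the subject for an element and its descendants, and that any element carrying both "rel" and "href" yields one triple. The class name, function name, and example URLs are hypothetical.

```python
from html.parser import HTMLParser

# Elements that never have an end tag, so they must not be pushed on the stack.
VOID = {"img", "br", "hr", "meta", "link", "input"}

class ScopedLinkParser(HTMLParser):
    """Collect (subject, rel, href) triples, scoping subjects by 'about'."""

    def __init__(self, base):
        super().__init__()
        self.subjects = [base]   # stack of subjects currently in scope
        self.triples = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        # 'about' on this element overrides the inherited subject.
        subject = a.get("about", self.subjects[-1])
        if "rel" in a and "href" in a:
            self.triples.append((subject, a["rel"], a["href"]))
        if tag not in VOID:
            self.subjects.append(subject)

    def handle_endtag(self, tag):
        if tag not in VOID and len(self.subjects) > 1:
            self.subjects.pop()

def extract(html, base):
    parser = ScopedLinkParser(base)
    parser.feed(html)
    return parser.triples

page = """
<div>
  <a rel="license" href="http://example.org/page-licence">page licence</a>
  <div about="#pie-recipe">
    <a rel="license" href="http://example.org/recipe-licence">recipe licence</a>
  </div>
</div>
"""

triples = extract(page, "http://example.org/page")
for t in triples:
    print(t)
```

The same generic walk handles both the page-level and the recipe-level licence; a consumer that only understands the rel=license convention would attribute both to the page.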
attached mail follows:
On 2009-01-01 15:24, Toby A Inkster wrote: > The use cases for RDFa are pretty much the same as those for Microformats. Right, but microformats can be used without any changes to the HTML language, whereas RDFa requires such changes. If they fulfill the same use cases, then there's not much point in adding RDFa. > For example, if a person's name and contact details are marked up on a > web page using hCard, the user-agent can offer to, say, add the person > to your address book, or add them as a friend on a social networking > site, or add a reminder about that person's birthday to your calendar. > > If an event is marked up on a web page using hCalendar, then the > user-agent could offer to add it to a calendar, or provide the user with > a map of its location, or add it to a timeline that the user is building > for their school history project. > > Providing rich semantics for the information on a web page allows the > user-agent to know what's on a page, and step in and perform helpful > tasks for the user. > > So why RDFa and not Microformats? > > Firstly, RDFa provides a single unified parsing algorithm that > Microformats do not. Separate parsers need to be created for hCalendar, > hReview, hCard, etc, as each Microformat has its own unique parsing > quirks. For example, hCard has N-optimisation and ORG-optimisation which > aren't found in hCalendar. With RDFa, a single algorithm is used to > parse everything: contacts, events, places, cars, songs, whatever. This is not necessarily beneficial. If you have separate parsing algorithms, you can code in shortcuts for common use-cases and thus optimise the authoring experience. Also, as has been pointed out before in the distributed extensibility debate, parsing is a very small part of doing useful things with content. > Secondly, as the result of having one single parsing algorithm, > decentralised development is possible. 
If I want a way of marking up my > iguana collection semantically, I can develop that vocabulary without > having to go through a central authority. You can develop vocabularies without going through a central authority already, via class or id, and many people already do. > Because URIs are used to > identify vocabulary terms, I can be sure that my vocabulary won't clash > with other people's vocabularies. Again, you can do this with class, by putting your domain name in the class attribute. It also depends on how much of an issue you think clashes will be with an iguana collection-- I would suggest that due to the specialised nature of the markup, clashes would be quite unlikely. > It can be argued that going through a > community to develop vocabularies is beneficial, as it allows the > vocabulary to be built by "many minds" - RDFa does not prevent this, it > just gives people alternatives to community development. RDFa does not give anything over what the class attribute does in terms of community vs individual development, so this doesn't really speak in RDFa's favour. > Lastly, there are a lot of parsing ambiguities for many Microformats. > One area which is especially fraught is that of scoping. The editors of > many current draft Microformats[1] would like to allow page authors to > embed licensing data - e.g. to say that a particular recipe for a pie is > licensed under a Creative Commons licence. However, it has been noted > that the current rel=license Microformat can not be re-used within these > drafts, because virtually all existing rel=license implementations will > just assume that the license applies to the whole page rather than just > part of it. RDFa has strong and unambiguous rules for scoping - a > license, for example, could apply to a section of the page, or one > particular image. Are there other cases where this granularity of scoping would be genuinely helpful? 
If not, it would seem better to work out a solution for scoping licence information instead of bringing in a whole new vocabulary to solve it. What would you do with scoped copyright information, anyway? I can see images being an issue, but ideally information about a resource should be kept in that resource, and as such the licence should be embedded in the image rather than given by a Web page. In the case of particular sections having particular licences, is there any practical use of marking up different sections with different licences over just doing that with text? > RDFa was largely borne of looking at Microformats, looking at what was > successful about them, considering problems with them, and finding ways > to resolve those problems. Andi
attached mail follows:
On Fri, 02 Jan 2009 05:43:05 +1100, Andi Sidwell <andi@takkaria.org> wrote: > On 2009-01-01 15:24, Toby A Inkster wrote: >> The use cases for RDFa are pretty much the same as those for >> Microformats. > > Right, but microformats can be used without any changes to the HTML > language, whereas RDFa requires such changes. If they fulfill the same > use cases, then there's not much point in adding RDFa. ... >> So why RDFa and not Microformats? (I think the question should be why RDFa is needed *as well as* µformats) >> Firstly, RDFa provides a single unified parsing algorithm that >> Microformats do not. ... > This is not necessarily beneficial. If you have separate parsing > algorithms, you can code in shortcuts for common use-cases and thus > optimise the authoring experience. On the other hand, you cannot parse information until you know how it is encoded, and information encoded in RDFa can be parsed without knowing more. And not only can you optimise your parsing for a given algorithm, you can also do for a known vocabulary - or you can optimise the post-parsing treatment. > Also, as has been pointed out before in the distributed extensibility > debate, parsing is a very small part of doing useful things with content. Yes. However many of the use cases that I think justify the inclusion of RDFa are already very small on their own, and valuable when several vocabularies are combined. So being able to do off-the-shelf parsing is valuable, compared to working out how to parse a combination of formats together. >> Secondly, as the result of having one single parsing algorithm, >> decentralised development is possible. If I want a way of marking up my >> iguana collection semantically, I can develop that vocabulary without >> having to go through a central authority. > > You can develop vocabularies without going through a central authority > already, via class or id, and many people already do. 
> >> Because URIs are used to >> identify vocabulary terms, I can be sure that my vocabulary won't clash >> with other people's vocabularies. > > Again, you can do this with class, by putting your domain name in the > class attribute. It also depends on how much of an issue you think > clashes will be with an iguana collection-- I would suggest that due to > the specialised nature of the markup, clashes would be quite unlikely. It depends how many people work on iguana collections - or Old Norse and Anglo Saxon text, which was the use case that got me involved in the Web in the very early 90s. It turns out that people don't, in the µformats world, use unambiguous names, especially when they are privately developing their own information. By contrast, those who come from an RDF world do this by habit. >> It can be argued that going through a >> community to develop vocabularies is beneficial, as it allows the >> vocabulary to be built by "many minds" - RDFa does not prevent this, it >> just gives people alternatives to community development. > > RDFa does not give anything over what the class attribute does in terms > of community vs individual development, so this doesn't really speak in > RDFa's favour. In principle no, but in real world usage the class attribute is considered something that is primarily local, whereas RDFa is generally used by people who have a broader outlook on the desirable permanence and re-usability of their data. >> Lastly, there are a lot of parsing ambiguities for many Microformats. >> One area which is especially fraught is that of scoping. The editors of >> many current draft Microformats[1] would like to allow page authors to >> embed licensing data - e.g. to say that a particular recipe for a pie is >> licensed under a Creative Commons licence. 
However, it has been noted >> that the current rel=license Microformat can not be re-used within these >> drafts, because virtually all existing rel=license implementations will >> just assume that the license applies to the whole page rather than just >> part of it. RDFa has strong and unambiguous rules for scoping - a >> license, for example, could apply to a section of the page, or one >> particular image. > > Are there other cases where this granularity of scoping would be > genuinely helpful? If not, it would seem better to work out a solution > for scoping licence information... Yes. Being able to describe accessibility of various parts of content, or point to potential replacement content for particular use cases, benefits enormously from such scoping (this is why people who do industrial-scale accessibility often use RDF as their infrastructure). ARIA has already taken the approach of looking for a special-purpose way to do this, which significantly bloats HTML but at least allows important users to satisfy their need to be able to produce content with certain information included. Government and large enterprises produce content that needs to be maintained, and being able to include production, cataloguing, and similar metadata directly, scoped to the document, would be helpful. As a trivial example, it would be useful to me in working to improve the Web content we produce at Opera to have a nice mechanism for identifying the original source of various parts of a page. > What would you do with scoped copyright information, anyway? I can see > images being an issue, but ideally information about a resource should > be kept in that resource, and as such the licence should be embedded in > the image rather than given by a Web page. In the case of particular > sections having particular licences, is there any practical use of > marking up different sections with different licences over just doing > that with text? Mash-ups.
If they have a use-case, and I think it is widely accepted that they do, then it would seem obvious that being able to identify the source of each part, and any conditions that vary between different sources, is a use case. cheers Chaals -- Charles McCathieNevile Opera Software, Standards Group je parle français -- hablo español -- jeg lærer norsk http://my.opera.com/chaals Try Opera: http://www.opera.com
attached mail follows:
On Fri, Jan 2, 2009 at 12:12 AM, Charles McCathieNevile <chaals@opera.com> wrote: > On Fri, 02 Jan 2009 05:43:05 +1100, Andi Sidwell <andi@takkaria.org> wrote: > >> On 2009-01-01 15:24, Toby A Inkster wrote: >>> >>> The use cases for RDFa are pretty much the same as those for >>> Microformats. >> >> Right, but microformats can be used without any changes to the HTML >> language, whereas RDFa requires such changes. If they fulfill the same use >> cases, then there's not much point in adding RDFa. > > ... Why the non-response? This is precisely the point of contention. Things aren't added to the spec on a whim. Things get added when it is demonstrated that authors will significantly benefit from the inclusion of the feature in the language. Microformats (used as an example only) use only features already in the language, and thus do not need any spec support. If they already solve the problem adequately, then there is no need to go further. >>> So why RDFa and not Microformats? > > (I think the question should be why RDFa is needed *as well as* µformats) This is correct. Microformats exist already. They solve current problems. Are there further problems that Microformats don't address which can be solved well by RDFa? Are these problems significant enough to authors to be worth addressing in the spec, or can we wait and let the community work out its own solutions further before we make a move? We generally want to wait until a given item is truly established before speccing it, so that we can work with existing use-cases and solve known problems. To do otherwise risks us inventing use-cases that don't commonly exist in reality, solving non-problems while leaving gaping holes that will cause authors problems down the line. For an example (used several times, but that's because it's a really good example), consider <video>. Flash-based video players are already extremely common. 
We know how people use them, we know what authors generally expect from them, and we know what problems exist with how they are currently implemented and used. We also feel that extending the language would allow us to solve these problems, and help authors significantly. Thus, <video>. Microformats are the metadata equivalent of Flash-based video players. They are hacks used to allow authors to accomplish something not explicitly accounted for in the language. Are there significant problems with this approach? Is metadata embedding used widely enough to justify extending the language for it, or are the current hacks (Microformats, in this case) enough? Are current metadata embedding practices mature enough that we can be relatively sure we're solving actual problems with our extension? These are all questions that must be asked of any extension to the language. >>> Firstly, RDFa provides a single unified parsing algorithm that >>> Microformats do not. ... > >> This is not necessarily beneficial. If you have separate parsing >> algorithms, you can code in shortcuts for common use-cases and thus optimise >> the authoring experience. > > On the other hand, you cannot parse information until you know how it is > encoded, and information encoded in RDFa can be parsed without knowing more. > > And not only can you optimise your parsing for a given algorithm, you can > also do for a known vocabulary - or you can optimise the post-parsing > treatment. What is the benefit to authors of having an easily machine-parsed format? (Note: this is completely separate from the question of the benefits of metadata at all.) Are they greater than the benefits of a format that is harder to parse, but easier for authors to write? > >> Also, as has been pointed out before in the distributed extensibility >> debate, parsing is a very small part of doing useful things with content. > > Yes. 
However many of the use cases that I think justify the inclusion of > RDFa are already very small on their own, and valuable when several > vocabularies are combined. So being able to do off-the-shelf parsing is > valuable, compared to working out how to parse a combination of formats > together. Can you provide these use-cases? The discussion has an astonishing dearth of use-cases by which we can evaluate the effectiveness of proposals. >>> Secondly, as the result of having one single parsing algorithm, >>> decentralised development is possible. If I want a way of marking up my >>> iguana collection semantically, I can develop that vocabulary without >>> having to go through a central authority. >> >> You can develop vocabularies without going through a central authority >> already, via class or id, and many people already do. >> >>> Because URIs are used to >>> identify vocabulary terms, I can be sure that my vocabulary won't clash >>> with other people's vocabularies. >> >> Again, you can do this with class, by putting your domain name in the >> class attribute. It also depends on how much of an issue you think clashes >> will be with an iguana collection-- I would suggest that due to the >> specialised nature of the markup, clashes would be quite unlikely. > > It depends how many people work on iguana collections - or Old Norse and > Anglo Saxon text, which was the use case that got me involved in the Web in > the very early 90s. It turns out that people don't, in the µformats world, > use unambiguous names, especially when they are privately developing their > own information. By contrast, those who come from an RDF world do this by > habit. Is this a problem that needs to be solved in the spec, or is it one that can be solved socially? More importantly, is it a problem that needs to be solved at all? Is there any indication that use of ambiguous names produces significant problems for authors? 
>>> It can be argued that going through a >>> community to develop vocabularies is beneficial, as it allows the >>> vocabulary to be built by "many minds" - RDFa does not prevent this, it >>> just gives people alternatives to community development. >> >> RDFa does not give anything over what the class attribute does in terms of >> community vs individual development, so this doesn't really speak in RDFa's >> favour. > > In principle no, but in real world usage the class attribute is considered > something that is primarily local, whereas RDFa is generally used by people > who have a broader outlook on the desirable permanence and re-usability of > their data. Can we extract a requirement from this, then? >>> Lastly, there are a lot of parsing ambiguities for many Microformats. >>> One area which is especially fraught is that of scoping. The editors of >>> many current draft Microformats[1] would like to allow page authors to >>> embed licensing data - e.g. to say that a particular recipe for a pie is >>> licensed under a Creative Commons licence. However, it has been noted >>> that the current rel=license Microformat can not be re-used within these >>> drafts, because virtually all existing rel=license implementations will >>> just assume that the license applies to the whole page rather than just >>> part of it. RDFa has strong and unambiguous rules for scoping - a >>> license, for example, could apply to a section of the page, or one >>> particular image. >> >> Are there other cases where this granularity of scoping would be genuinely >> helpful? If not, it would seem better to work out a solution for scoping >> licence information... > > Yes. > > Being able to describe accessibility of various parts of content, or point > to potential replacement content for particular use cases, benefits > enormously from such scoping (this is why people who do industrial-scale > accessibility often use RDF as their infrastructure). 
ARIA has already taken > the approach of looking for a special-purpose way to do this, which > significantly bloats HTML but at least allows important users to satisfy > their needs to be able t produce content with certain information included. > > Government and large enterprises produce content that needs to be > maintained, and being able to include production, cataloguing, and similar > metadata directly, scoped to the document, would be helpful. As a trivial > example, it would be useful to me in working to improve the Web content we > produce at Opera to have a nice mechanism for identifying the original > source of various parts of a page. Can we distill this into use-cases, then? You, as an author, want to be able to specify the original source of a piece of content. What's the practical use of this? Does it require an embedded, machine-readable vocabulary to function? Are existing solutions adequate (frex, footnotes)? >> What would you do with scoped copyright information, anyway? I can see >> images being an issue, but ideally information about a resource should be >> kept in that resource, and as such the licence should be embedded in the >> image rather than given by a Web page. In the case of particular sections >> having particular licences, is there any practical use of marking up >> different sections with different licences over just doing that with text? > > Mash-ups. If they have a use-case, and I think it is widely accepted that > they do, then it would seem obvious that being able to identify the source > of each part, and any conditions that vary between different sources, is a > use case. Not quite. Specifically, is there any practical use for marking up various sections of a site with licensing information specific to that section *in an embedded, machine-readable manner*? Are the existing solutions adequate (frex, simply putting a separate copyright notice on each section, or noting the various copyrights on a licensing page)? 
(Note: I responded to your email rather than the OP because it presented better points to respond to.) ~TJ
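The scoping rule quoted in the message above (a licence applying to one section of a page rather than the whole page) can be illustrated with a rough sketch. This is not the RDFa processing algorithm: it assumes only that an `about` attribute sets the subject for everything nested inside it, ignores relative-URI resolution, and expects every element to have an explicit end tag. The class name, function name, sample markup, and example.org URIs are all hypothetical.

```python
from html.parser import HTMLParser


class LicenseScoper(HTMLParser):
    """Attach each rel=license link to the nearest enclosing `about` subject."""

    def __init__(self, page_uri):
        super().__init__()
        self.subjects = [page_uri]  # stack of the subject in effect per open element
        self.licenses = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # An element without `about` inherits the subject of its parent.
        subject = attrs.get("about") or self.subjects[-1]
        self.subjects.append(subject)
        rel_tokens = (attrs.get("rel") or "").split()
        if "license" in rel_tokens and attrs.get("href"):
            self.licenses.append((subject, attrs["href"]))

    def handle_endtag(self, tag):
        # Leave the scope of the element that just closed.
        if len(self.subjects) > 1:
            self.subjects.pop()


def scoped_licenses(html, page_uri):
    parser = LicenseScoper(page_uri)
    parser.feed(html)
    parser.close()
    return parser.licenses


# Hypothetical page: two recipes with their own licences, plus a page-wide one.
SAMPLE = (
    '<div about="#pie"><p>Apple pie</p>'
    '<a rel="license" href="http://creativecommons.org/licenses/by/3.0/">CC BY</a></div>'
    '<div about="#stew"><a rel="license" href="http://example.org/terms">terms</a></div>'
    '<a rel="license" href="http://example.org/site-terms">site terms</a>'
)

for subject, license_uri in scoped_licenses(SAMPLE, "http://example.org/recipes"):
    print(subject, "->", license_uri)
```

A consumer that only understands page-level rel=license would report all three links as licences for the whole page; the sketch shows the extra bookkeeping (a subject stack) that section-level scoping requires of a parser.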
attached mail follows:
On Fri, Jan 2, 2009 at 12:02 PM, Julian Reschke <julian.reschke@gmx.de> wrote: > Tab Atkins Jr. wrote: >>>> >>>> Right, but microformats can be used without any changes to the HTML >>>> language, whereas RDFa requires such changes. If they fulfill the same >>>> use >>>> cases, then there's not much point in adding RDFa. >>> >>> ... >> >> Why the non-response? This is precisely the point of contention. >> Things aren't added to the spec on a whim. Things get added when it >> is demonstrated that authors will significantly benefit from the >> inclusion of the feature in the language. Microformats (used as an >> example only) use only features already in the language, and thus do >> not need any spec support. If they already solve the problem >> adequately, then there is no need to go further. >> ... > > I think the supporters of RDFa (me included) claim that Microformats only > address a subset of the problem solved by RDFa. The next step, then, is to list these problems, establish that they truly aren't solved by existing solutions (not just Microformats), establish that solving them would be of significant benefit to authors, and finally that solving them within HTML is the most appropriate course of action. >>>>> So why RDFa and not Microformats? >>> >>> (I think the question should be why RDFa is needed *as well as* µformats) >> >> This is correct. Microformats exist already. They solve current >> problems. Are there further problems that Microformats don't address >> which can be solved well by RDFa? Are these problems significant >> enough to authors to be worth addressing in the spec, or can we wait >> and let the community work out its own solutions further before we >> make a move? We generally want to wait until a given item is truly >> established before speccing it, so that we can work with existing > > Oh really? That's news to me. > > If this is principle we agree on that we really should start cutting lots of > things from the spec. 
It is a general principle, though not a necessary one. As Ian noted in his earlier email, speccing a solution too early runs the risk of solving the wrong problem, and then poisoning that area of the solution space entirely. If we wait for authors to develop their own hacks around features missing in the language, we can be sure that we're solving a problem authors want solved, and we have some measure of implementation experience already (even if just in author-deployed JavaScript) that we can learn from. Other groups use this principle as well - browser vendors prefix their early versions of new CSS properties, for example, so that authors using these early versions don't poison the space and prevent problems from being addressed that would 'break' uses of the property. Most of the additions in HTML5 are designed on this principle. For example, the <video> element and the additional values for <input> are drawn directly from JavaScript and Flash-based solutions currently in use, with the intent to make them easier for authors to use. Others, such as the additional sectioning elements and the new header-parsing algorithm, were meant to embrace and bless well-established authoring practices (splitting your content into header, footer, and content <div>s, or building documents from smaller fragments which use <h1> and such with clear intent to create a . Finally, some additions (the Workers spec, the SQL spec) have little in the way of current-practice analogues (though much of those things are presaged in Gears, frex) because they are designed to specifically address a current lack and enable future uses. These, though, still solve well-defined problems and bring benefits which significantly outweigh their downsides. RDFa isn't a well-established authoring pattern needing to be blessed and made explicit. That means it's either a simplification of existing widespread hacks (Microformats?) 
intended to make authors' lives easier, or it's intended to fill a gaping hole that can be established to be of significant benefit to authors to have filled. Either way, one needs some justification. Frex, it's possible that the language *could* be extended to make Microformat-type things easier to use for authors. We'd need to establish that Microformats (or some other embedded metadata) really are commonly used, though, and that the proposed simplification is really significant enough (existing validation and video libraries, for example, are much more complex to use than <input type="email"> or <video>, and can impose significant extra bandwidth costs which are undesirable). It's also possible that embedded metadata support *is* a gaping hole that needs to be filled. We'd still need to (a) establish the problem clearly (so we can evaluate possible solutions) and (b) decide that RDFa is a good solution to the problem as stated before we add it into the language. ~TJ
attached mail follows:
On Sat, 03 Jan 2009 04:52:35 +1100, Tab Atkins Jr. <jackalmage@gmail.com> wrote: > On Fri, Jan 2, 2009 at 12:12 AM, Charles McCathieNevile > <chaals@opera.com> wrote: >> On Fri, 02 Jan 2009 05:43:05 +1100, Andi Sidwell <andi@takkaria.org> >> wrote: >> >>> On 2009-01-01 15:24, Toby A Inkster wrote: >>>> >>>> The use cases for RDFa are pretty much the same as those for >>>> Microformats. >>> >>> Right, but microformats can be used without any changes to the HTML >>> language, whereas RDFa requires such changes. If they fulfill the >>> same use >>> cases, then there's not much point in adding RDFa. >> >> ... > > Why the non-response? Because the response comes in the next paragraph, to the first question that was worth asking. >>>> So why RDFa and not Microformats? >> >> (I think the question should be why RDFa is needed *as well as* >> µformats) > > This is correct. Microformats exist already. They solve current > problems. (Elsewhere in this thread you wrote [[[ It has not yet been established that there is a problem worth solving that metadata would address at all. ]]] Do you consider that µformats do not encode metadata? Otherwise, I am not sure how to reconcile these statements. In any case I would greatly appreciate clarification of what you think microformats do, since I do believe that microformats are very explicitly directed to allowing the encoding of metadata, and therefore it is not clear that we are discussing from similar premises). > Are there further problems that Microformats don't address > which can be solved well by RDFa? Are these problems significant > enough to authors to be worth addressing in the spec, or can we wait > and let the community work out its own solutions further before we > make a move? In my opinion, yes there are further problems µformats don't solve (that RDFa does), yes they are significant, and the community has come up with a way to solve them - RDFa. > Microformats are the metadata equivalent of Flash-based video players. 
> They are hacks used to allow authors to accomplish something not > explicitly accounted for in the language. Are there significant > problems with this approach? Yes. The problems are that they rely on precoordination on a per-vocabulary basis before you can do anything useful with the data. In practical usage they rely on choosing attribute names that hopefully don't clash with anything - in other words, trying to solve the problem of disambiguation that namespaces solve, but by choosing names that are weird enough not to clash or by circumscribing the problem spaces that can be addressed to the extent that you can expect no clashes. (This is hardly news, by the way). > Is metadata embedding used widely enough > to justify extending the language for it, or are the current hacks > (Microformats, in this case) enough? Are current metadata embedding > practices mature enough that we can be relatively sure we're solving > actual problems with our extension? Current metadata embedding is done using µformats, and it's pretty clear that they are not sufficient. A large body of work uses RDF data models (Dublin Core, IMS, LOM, FOAF, POWDER are all large-scale formats. The people who are testing RDF engines with hundreds of millions of triples and more are doing it with real data, not stuff generated for the experiment). It is also clear that people would like to develop further small-scale formats, and that µformats through its requirement for community consultation is effectively too heavyweight for the purposes of many developers. > These are all questions that must > be asked of any extension to the language. > >>>> Firstly, RDFa provides a single unified parsing algorithm that >>>> Microformats do not. ... >> >>> This is not necessarily beneficial. If you have separate parsing >>> algorithms, you can code in shortcuts for common use-cases and thus >>> optimise the authoring experience. 
>> >> On the other hand, you cannot parse information until you know how it is >> encoded, and information encoded in RDFa can be parsed without knowing >> more. >> >> And not only can you optimise your parsing for a given algorithm, you >> can also do for a known vocabulary - or you can optimise the >> post-parsing treatment. > > What is the benefit to authors of having an easily machine-parsed > format? Assuming that the format is sufficiently easy to write, and to generate, I am not sure what isn't obvious about the answer to the question. (In case I am somehow very clever, and others aren't, the benefit is that it is easy to machine parse and use the information). > Are they greater than the benefits of a > format that is harder to parse, but easier for authors to write? For a certain set of authors, yes the benefits are greater. >>> Also, as has been pointed out before in the distributed extensibility >>> debate, parsing is a very small part of doing useful things with >>> content. >> >> Yes. However many of the use cases that I think justify the inclusion of >> RDFa are already very small on their own, and valuable when several >> vocabularies are combined. So being able to do off-the-shelf parsing is >> valuable, compared to working out how to parse a combination of formats >> together. > > Can you provide these use-cases? The discussion has an astonishing > dearth of use-cases by which we can evaluate the effectiveness of > proposals. The small-scale use cases are difficult to provide, since they are based on the fact that people do something quickly because they need it. One set of potential use cases is all the microformats that haven't been blessed by the µformats community as formally agreed "standards" - writing them in RDFa is sufficient to have them be usable. Another use case is noting the source of data in mashups. 
This enables information to be carried about the licensing, the date at which the data was mashed (or smushed, to use the older terminology from the Semantic Web), and so on. Another (the second time I have noted it in two emails) is to provide information useful for improving the accessibility of Web content. The set of use cases that led to the development of GRDDL are also use cases for RDFa - since RDFa allows a direct extraction to RDF without having to develop a new parser for each data model, authors can simplify the way they extract data by using RDFa to encode it, saving themselves the bother of explaining how to extract it. This time saving means that they can afford to develop a smaller, more specialised vocabulary. > Is there any indication that use of > ambiguous names produces significant problems for authors? Not that I am aware of, although I think the question is poorly considered so I haven't given it much thought. There is plenty of evidence (for example the attempts to use Dublin Core within existing HTML mechanisms) that it causes problems for data consumers. >>>> It can be argued that going through a >>>> community to develop vocabularies is beneficial, as it allows the >>>> vocabulary to be built by "many minds" - RDFa does not prevent this, >>>> it >>>> just gives people alternatives to community development. >>> >>> RDFa does not give anything over what the class attribute does in >>> terms of >>> community vs individual development, so this doesn't really speak in >>> RDFa's >>> favour. >> >> In principle no, but in real world usage the class attribute is >> considered something that is primarily local, whereas RDFa is generally >> used by people who have a broader outlook on the desirable permanence >> and re-usability of their data. > > Can we extract a requirement from this, then? 
A poor formulation (I hope that those who are better at very detailed requirements can help improve my phrasing) could be: Provide an easy mechanism to encode new data in a way that can be machine-extracted without requiring any explanation of the data model. >>>> Lastly, there are a lot of parsing ambiguities for many Microformats. >>>> One area which is especially fraught is that of scoping. The editors >>>> of >>>> many current draft Microformats[1] would like to allow page authors to >>>> embed licensing data - e.g. to say that a particular recipe for a pie >>>> is >>>> licensed under a Creative Commons licence. However, it has been noted >>>> that the current rel=license Microformat can not be re-used within >>>> these >>>> drafts, because virtually all existing rel=license implementations >>>> will >>>> just assume that the license applies to the whole page rather than >>>> just >>>> part of it. RDFa has strong and unambiguous rules for scoping - a >>>> license, for example, could apply to a section of the page, or one >>>> particular image. >>> >>> Are there other cases where this granularity of scoping would be >>> genuinely >>> helpful? If not, it would seem better to work out a solution for >>> scoping >>> licence information... >> >> Yes. >> >> Being able to describe accessibility of various parts of content, or >> point >> to potential replacement content for particular use cases, benefits >> enormously from such scoping (this is why people who do industrial-scale >> accessibility often use RDF as their infrastructure). ARIA has already >> taken >> the approach of looking for a special-purpose way to do this, which >> significantly bloats HTML but at least allows important users to satisfy >> their needs to be able t produce content with certain information >> included. 
>> >> Government and large enterprises produce content that needs to be >> maintained, and being able to include production, cataloguing, and >> similar >> metadata directly, scoped to the document, would be helpful. As a >> trivial >> example, it would be useful to me in working to improve the Web content >> we >> produce at Opera to have a nice mechanism for identifying the original >> source of various parts of a page. > > Can we distill this into use-cases, then? Sure. It just takes a small amount of thinking. How many use cases would you think will be sufficient to demonstrate that this would be important. Or do you measure it by how many people each use case applies to? (It is far easier to justify the cost of developing use cases where there is more clarity about the goals for those use cases - and it enables people to decide whether to develop their own, or go find the people who are doing this and ask them to provide the information). > You, as an author, want to > be able to specify the original source of a piece of content. What's > the practical use of this? Does it require an embedded, > machine-readable vocabulary to function? Are existing solutions > adequate (frex, footnotes)? ... > Not quite. Specifically, is there any practical use for marking up > various sections of a site with licensing information specific to that > section *in an embedded, machine-readable manner*? Are the existing > solutions adequate (frex, simply putting a separate copyright notice > on each section, or noting the various copyrights on a licensing > page)? Let me treat these as the same question since I don't think they introduce anything usefully different between them. I will add to that Henri's questions about my use case for this already published elsewhere in this thread. A practical use case is in an organisation where different people are responsible for different parts of content. 
Instead of having to look up, myself, who is responsible for each piece, and what rights are associated with it, I can automate the process. (This is one of the value propositions offered by content management systems. I hope we can agree that these are sufficiently widely used to a priori assume a use case, but if not please say so). This means that instead of manually checking many pages for things like accessibility or being up to date, and then having to find which part of the page was produced by which part of the organisation (which is what I do at Opera) I can simply have this information trawled and presented as I please by a program (which many large organisations do, or partially do). Another example is that certain W3C pages (the list of specifications produced by W3C, for example, and various lists of translations) are produced from RDF data that is scraped from each page through a customised and thus fragile scraping mechanism. Being able to use RDFa would free authors of the draconian constraints on the source-code formatting of specifications, and merely require them to use the right attributes, in order to maintain this data. An example of how this data can be re-used is that it is possible to determine many of the people who have translated W3C specifications or other documents - and thus to search for people who are familiar with a given technology at least at some level, and happen to speak one or more languages of interest. This is at least as important to me in looking for potential people to recruit as any free-text search I can do - and has the benefit that while I don't have the resources to develop large-scale free-text searching, I do have the resources to develop simple queries based on a standardised data model and an encoding of it. Alternatively I could use the same information to seed a reputation manager, so I can determine which of the many emails I have no time to read in WHAT-WG might be more than usually valuable. 
cheers Chaals -- Charles McCathieNevile Opera Software, Standards Group je parle français -- hablo español -- jeg lærer norsk http://my.opera.com/chaals Try Opera: http://www.opera.com
attached mail follows:
On Jan 1, 2009, at 17:24, Toby A Inkster wrote: > So why RDFa and not Microformats? There's a possibility that this is a false dichotomy and both are bad. > Firstly, RDFa provides a single unified parsing algorithm that > Microformats do not. Separate parsers need to be created for > hCalendar, hReview, hCard, etc, as each Microformat has its own > unique parsing quirks. For example, hCard has N-optimisation and ORG- > optimisation which aren't found in hCalendar. With RDFa, a single > algorithm is used to parse everything: contacts, events, places, > cars, songs, whatever. More to the point, Microformats not only require per-format processing but the processing required for each Microformat isn't specified at all. That's bad. RDFa, on the other hand, uses CURIEs, which is bad. (More generally, I think using URIs as identifiers instead of using them for above-TCP- layer protocol addressing is bad, but relying on the namespace mapping context is even worse.) Have there been any attempts to remove the badness of Microformats without introducing the badness of RDFa in the process? That is, have there been attempts of defining unified parsing while retaining the feel of Microformats without relying on the namespace mapping context from the layer below? If not, why not? I'm assuming that people in the Microformat community have clue. Yet, on the face of it, viewed from outside the community, their formats seem to have a big problem. Why hasn't the community fixed it? Is it a non-problem after all in practice? > Lastly, there are a lot of parsing ambiguities for many > Microformats. One area which is especially fraught is that of > scoping. The editors of many current draft Microformats[1] would > like to allow page authors to embed licensing data - e.g. to say > that a particular recipe for a pie is licensed under a Creative > Commons licence. 
However, it has been noted that the current > rel=license Microformat can not be re-used within these drafts, > because virtually all existing rel=license implementations will just > assume that the license applies to the whole page rather than just > part of it. RDFa has strong and unambiguous rules for scoping - a > license, for example, could apply to a section of the page, or one > particular image. Is the problem in the case of recipes that the provider of the page navigation around the recipe is unwilling to license the navigation bits under the same license as the content proper? In the case of images, why should a program inferring something about licensing trust assertions made in a different HTTP resource (possibly even from a different Origin)? -- Henri Sivonen hsivonen@iki.fi http://hsivonen.iki.fi/
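The concern raised above about CURIEs relying on the namespace mapping context can be made concrete with a minimal sketch. It assumes a CURIE of the form `prefix:reference` and a dictionary standing in for whatever prefix declarations are in scope; the function name and the second vocabulary URI are hypothetical, and real RDFa processing involves more than this one step.

```python
def expand_curie(curie, prefix_map):
    """Expand 'prefix:reference' using the prefix mappings currently in scope."""
    prefix, sep, reference = curie.partition(":")
    if not sep or prefix not in prefix_map:
        raise KeyError(f"no mapping in scope for prefix {prefix!r}")
    return prefix_map[prefix] + reference


# Two hypothetical documents that bind the same prefix differently.
doc_a = {"dc": "http://purl.org/dc/elements/1.1/"}
doc_b = {"dc": "http://example.org/iguana-vocab#"}

print(expand_curie("dc:title", doc_a))
print(expand_curie("dc:title", doc_b))
```

The same attribute value, `dc:title`, names two different terms depending on declarations made elsewhere in the document, and a fragment copied without its declarations cannot be expanded at all. That dependency is what the message objects to.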
attached mail follows:
On 2/1/09 10:38, Henri Sivonen wrote: > More to the point, Microformats not only require per-format processing > but the processing required for each Microformat isn't specified at all. > That's bad. Some do have processing specified (at least to some degree): http://microformats.org/wiki/hcard-parsing For the rest, this seems like something fixable, so I'm not sure how this is more to the point? > That is, have > there been attempts of defining unified parsing while retaining the feel > of Microformats without relying on the namespace mapping context from > the layer below? I suppose - * http://microformats.org/wiki/design-patterns (reusable microformat components) * http://microformats.org/wiki/parsing-brainstorming (attempt to actually specify precise parsing rules for all microformats) * http://microformats.org/discuss/mail/microformats-discuss/2008-August/012435.html (proposal for specifying generic mapping of microformats to RDF - I think there's been more detailed work by various parties in this regard, but I'm not sure where best to link to) - are approaching this problem from three different angles. > Why hasn't the community fixed it? I think the microformats community moves slowly, for better or worse, even when it agrees that there's a problem to solve. For example, progress on the problems with the abbr-design-pattern has been snail-like while losing the community an important user (the BBC), although admittedly the problems are basically intractable in HTML4/XHTML1. I'm not sure how far the community as a whole does or doesn't view the lack of unified parsing as one of its bigger problems; I'm no spokesman though. > Is it a non-problem after all in practice? It's an additional barrier to creating and using (especially new) microformats or other extractable patterns. 
The microformats community isn't there to support the creation of new extractable patterns outside the microformats community, which is where an iguana database pattern would likely need to be. It could of course be the RDFa curie is worse than the disease. An advantage of RDFa that is not related to curies and for which the three approaches towards unified extraction mentioned above are not a substitute is that RDFa provides a generic way to include hidden machine-friendly equivalents to human-readable information in the form of the (not especially well-named) "content" attribute. http://www.w3.org/TR/rdfa-syntax/#rdfa-attributes In general, this is something microformats rightly try to avoid: http://microformats.org/wiki/principles But sometimes it's unavoidable: http://microformats.org/wiki/machine-data http://microformats.org/wiki/value-excerption-pattern-issues I do not believe that HTML5 as currently specified would remove the need to employ similar hacks as are mentioned on those pages, although it will remove the need in many cases (e.g. for datetimes within a given range), which is an improvement. > Is the problem in the case of recipes that the provider of the page > navigation around the recipe is unwilling to license the navigation bits > under the same license as the content proper? I thought Toby's example was that each recipe on the page needed a different licence, rather than a distinction between the main content area and the navigation. > In the case of images, why should a program inferring something about > licensing trust assertions made in a different HTTP resource (possibly > even from a different Origin)? Why should it trust assertions made in the same resource? For example, presumably you could download an image, change its licencing metadata, and host it at your own Origin? Admittedly, that's a little more work than just hotlinking. -- Benjamin Hawkes-Lewis
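The role of the "content" attribute described above (a hidden machine-friendly value that stands in for the human-readable text) can be sketched as follows. This is an illustration under simplifying assumptions, not an RDFa parser: it looks only at `property` and `content`, ignores subjects, datatypes, and prefix resolution, and assumes that elements carrying `property` contain no nested elements. The class name, helper name, and sample markup are hypothetical.

```python
from html.parser import HTMLParser


class PropertyExtractor(HTMLParser):
    """Collect (property, value) pairs, preferring `content` over visible text."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._pending = None  # [property, text fragments] while reading an element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("property")
        if prop is None:
            return
        if attrs.get("content") is not None:
            # The machine-readable value overrides whatever text is displayed.
            self.pairs.append((prop, attrs["content"]))
        else:
            self._pending = [prop, []]

    def handle_data(self, data):
        if self._pending is not None:
            self._pending[1].append(data)

    def handle_endtag(self, tag):
        # Simplification: the first end tag closes the pending property.
        if self._pending is not None:
            prop, parts = self._pending
            self.pairs.append((prop, "".join(parts).strip()))
            self._pending = None


def extract(html):
    parser = PropertyExtractor()
    parser.feed(html)
    parser.close()
    return parser.pairs


SAMPLE = (
    '<p><span property="dc:title">Iguana care</span>, published '
    '<span property="dc:date" content="2009-01-02">on the second of January</span></p>'
)

for prop, value in extract(SAMPLE):
    print(prop, "=", value)
```

The reader sees "on the second of January" while a consumer receives `2009-01-02`, which is the kind of hidden equivalent that the machine-data and value-excerption pages linked above try to achieve with hacks.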
attached mail follows:
On Jan 2, 2009, at 14:01, Benjamin Hawkes-Lewis wrote: > On 2/1/09 10:38, Henri Sivonen wrote: >> More to the point, Microformats not only require per-format >> processing >> but the processing required for each Microformat isn't specified at >> all. >> That's bad. > > Some do have processing specified (at least to some degree): > > http://microformats.org/wiki/hcard-parsing That's still not a proper parsing spec. Do all microformat consumers with significant market share do it that way? > For the rest, this seems like something fixable, so I'm not sure how > this is more to the point? HTML parsing is fixable, too, but actually fixing it is something that didn't happen until the fixing effort was taken to the spec level. > * http://microformats.org/wiki/parsing-brainstorming (attempt to > actually specify precise parsing rules for all microformats) This one I hadn't seen before. It's clearly a step into a more spec- like direction. > It could of course be the RDFa curie is worse than the disease. I suspect that is the case. >> Is the problem in the case of recipes that the provider of the page >> navigation around the recipe is unwilling to license the navigation >> bits >> under the same license as the content proper? > > I thought Toby's example was that each recipe on the page needed a > different licence, rather than a distinction between the main > content area and the navigation. Oh. That can be solved by giving each recipe its own URI & HTML page and scraping those pages instead of summary pages that might contain multiple recipes. >> In the case of images, why should a program inferring something about >> licensing trust assertions made in a different HTTP resource >> (possibly >> even from a different Origin)? > > Why should it trust assertions made in the same resource? > > For example, presumably you could download an image, change its > licencing metadata, and host it at your own Origin? Admittedly, > that's a little more work than just hotlinking. 
Good point. That's a problem if you are examining a previously unknown and untrusted site that might have all its content copied from somewhere else. Trusting the origin of the data for its licensing does help, though, if you are browsing a site you believe to be reputable and clueful and want to automate the license discovery part only. -- Henri Sivonen hsivonen@iki.fi http://hsivonen.iki.fi/
attached mail follows:
On Mon, 05 Jan 2009 01:21:33 +1100, Henri Sivonen <hsivonen@iki.fi> wrote: > On Jan 2, 2009, at 14:01, Benjamin Hawkes-Lewis wrote: >> On 2/1/09 10:38, Henri Sivonen wrote: >>> Is the problem in the case of recipes that the provider of the page >>> navigation around the recipe is unwilling to license the navigation >>> bits under the same license as the content proper? >> >> I thought Toby's example was that each recipe on the page needed a >> different licence, rather than a distinction between the main content >> area and the navigation. > > Oh. That can be solved by giving each recipe its own URI & HTML page and > scraping those pages instead of summary pages that might contain > multiple recipes. Sure. In which case the problem becomes "doing mashups where data needs to have different metadata associated is impossible", so the requirement is "enable mashups to carry different metadata about bits of the content that are from different sources". A use case for this: There are mapping organisations and data producers and people who take photos, and each may apply different policies. Being able to keep that policy information helps people making further mashups avoid violating a policy. For example, if GreatMaps.com has a public domain policy on their maps, CoolFotos.org has a policy that you can use data other than images for non-commercial purposes, and Johan Ichikawa has a photo there of my brother's café, which he has licensed as "must pay money", then it would be reasonable for me to copy the map and put it in a brochure for the café, but not to copy the data and photo from CoolFotos. On the other hand, if I am producing a non-commercial guide to cafés in Melbourne, I can add the map and the location of the café photo, but not the photo itself. Another use case: My wife wants to publish her papers online. She includes an abstract of each one in a page, but because they are under different copyright rules, she needs to clarify what the rules are. 
A harvester such as the Open Access project can actually collect and index some of them with no problem, but may not be allowed to index others. Meanwhile, a human finds it more useful to see the abstracts on a page than have to guess from a bunch of titles whether to look at each abstract. cheers Chaals -- Charles McCathieNevile Opera Software, Standards Group je parle français -- hablo español -- jeg lærer norsk http://my.opera.com/chaals Try Opera: http://www.opera.com
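The café example above can be sketched in a few lines of Python. This is only an illustration of the requirement ("carry different metadata about bits of the content that are from different sources"), not a proposal for any syntax: the `Item` class, the licence labels, and the `PERMITTED` table are all hypothetical names invented for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    source: str   # where the piece of content came from
    kind: str     # "map", "data" or "photo"
    licence: str  # hypothetical licence label carried with the item


# Hypothetical policy table: the uses each licence label permits without payment.
PERMITTED = {
    "public-domain": {"commercial", "non-commercial"},
    "non-commercial-only": {"non-commercial"},
    "must-pay": set(),
}


def usable(items, use):
    """Return the items whose licence permits the intended use."""
    return [item for item in items if use in PERMITTED.get(item.licence, set())]


# The mashup from the e-mail: a map, some location data, and a photo.
items = [
    Item("GreatMaps.com", "map", "public-domain"),
    Item("CoolFotos.org", "data", "non-commercial-only"),
    Item("CoolFotos.org", "photo", "must-pay"),
]

brochure = [item.kind for item in usable(items, "commercial")]
guide = [item.kind for item in usable(items, "non-commercial")]
```

Here `brochure` keeps only the map, while `guide` keeps the map and the location data but not the photo, which matches the outcome described in the e-mail. The point is simply that the filtering is only possible if each piece keeps its own policy information through the mashup.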
attached mail follows:
Charles McCathieNevile ha scritto: > On Mon, 05 Jan 2009 01:21:33 +1100, Henri Sivonen <hsivonen@iki.fi> > wrote: >> On Jan 2, 2009, at 14:01, Benjamin Hawkes-Lewis wrote: >>> On 2/1/09 10:38, Henri Sivonen wrote: > >>>> Is the problem in the case of recipes that the provider of the page >>>> navigation around the recipe is unwilling to license the navigation >>>> bits under the same license as the content proper? >>> >>> I thought Toby's example was that each recipe on the page needed a >>> different licence, rather than a distinction between the main >>> content area and the navigation. >> >> Oh. That can be solved by giving each recipe its own URI & HTML page >> and scraping those pages instead of summary pages that might contain >> multiple recipes. > > Sure. In which case the problem becomes "doing mashups where data > needs to have different metadata associated is impossible", so the > requirement is "enable mashups to carry different metadata about bits > of the content that are from different sources. > > A use case for this: > > There are mapping organisations and data producers and people who take > photos, and each may place different policies. Being able to keep that > policy information helps people with further mashups avoiding > violating a policy. > > For example, if GreatMaps.com has a public domain policy on their > maps, CoolFotos.org has a policy that you can use data other than > images for non-commercial purposes, and Johan Ichikawa has a photo > there of my brother's café, which he has licensed as "must pay money", > then it would be reasonable for me to copy the map and put it in a > brochure for the café, but not to copy the data and photo from > CoolFotos. On the other hand, if I am producing a non-commercial guide > to cafés in Melbourne, I can add the map and the location of the cafe > photo, but not the photo itself. 
> It seems a scenario where a human should carefully evaluate each licence and perhaps put a careful and human readable prose into the mashed-up page, or a link to such a prose. Metadata may or may not be accurate (e.g. may be misplaced and not contain the whole license, or refer to a wrong kind of license, different from the one stated in the prose), but the whole prose (and perhaps only that) is legally binding for sure (I'm not aware of any international law recognizing metadata and/or machine-processable/machine-friendly extracted content as a valid legal agreement/notice - in your example, Johan Ichikawa might put the "must pay money" license in a span containing a metadata reference to a creative commons license, but only the "must pay money" license is surely valid as a legal notice, as far as I can tell). > Another use case: > My wife wants to publish her papers online. She includes an abstract > of each one in a page, but because they are under different copyright > rules, she needs to clarify what the rules are. A harvester such as > the Open Access project can actually collect and index some of them > with no problem, but may not be allowed to index others. Meanwhile, a > human finds it more useful to see the abstracts on a page than have to > guess from a bunch of titles whether to look at each abstract. > > I'm not strongly for one solution or the other in this case (an actual choice may depend on several considerations, such as harvesters reputation, or the need to use metadata anyway for private purposes), but this case might be addressed by embedding each abstract in an iframe, so that human users would get all of them in a single page, while a harvester would need to navigate each page to index/copy it, and a proper metadata might be put into each page, or each page might have a different rule to restrict access (e.g. 
through a robots.txt file, or the Access-Control semantics, or any kind of white- or blacklists), especially to prevent a malicious harvester (that is, one deliberately ignoring metadata and licenses) from accessing certain content. WBR, Alex
attached mail follows:
Calogero Alex Baldacchino wrote: > My concern is: is RDFa really suitable for everyone and for Web > automation? My own answer, at first glance, is no. That's because > RDF(a) > can perhaps address nicely very niche needs, where determining how > much > data can be trusted is not a problem, but in general misuses AND > deliberate abuses may harm automation heavily If your agent isn't going to trust the data gleaned from RDFa, then why should it trust the data gleaned from the web page's natural language? If the page has been authored by a reprobate that cannot be trusted to put honest and correct data in a few RDFa attributes, why should we trust their prose text? An oft-quoted answer is that the prose text is "visible" whereas the RDFa is somehow "invisible". Apart from the fact that UIs which make use of data pulled in from RDFa will make this data visible, there is also the fact that RDFa, unlike an external RDF/XML file, or some metadata embedded in a <script> block, makes use of as much visible data as possible: visible links, visible text, etc. <p>My name is <span property="foaf:name" about="#me">Toby Inkster</span>.</p> If you can't trust someone to correctly mark up what their name is, then why trust them to mark up what deserves <em>phasis? Why believe the <address> they provide? What if the instance they marked up with <dfn> is not really the defining one? What if a <var> is really a constant? -- Toby A Inkster <mailto:mail@tobyinkster.co.uk> <http://tobyinkster.co.uk>
attached mail follows:
Toby A Inkster ha scritto: > Calogero Alex Baldacchino wrote: > >> My concern is: is RDFa really suitable for everyone and for Web >> automation? My own answer, at first glance, is no. That's because RDF(a) >> can perhaps address nicely very niche needs, where determining how much >> data can be trusted is not a problem, but in general misuses AND >> deliberate abuses may harm automation heavily > > If your agent isn't going to trust the data gleaned from RDFa, then > why should it trust the data gleaned from the web page's natural > language? If the page has been authored by a reprobate that cannot be > trusted to put honest and correct data in a few RDFa attributes, why > should we trust their prose text? > If you sell computers but your site talks about cars I'll never buy a notebook from you; thus you're not cheating me, but yourself and damaging your business. But if you believe cars are searched more often than computers (just an example), one may use false metadata to cheat any UAs relying on metadata instead of prose, and take me on a store selling computers instead of cars. Reliability of metadata (with respect to the described data) is an issue separated from reliability of content: it's not up to any UA to understand AND filter content basing on the author being trusted to be saing the truth (such would be a form of censorship), but if I ask the UA to bring me a page talking about horses, I don't want it to bring me a page talking about v.i.a.g.r.a. (that's spam), thus it is up to any UA relying on metadata to understand AND filter them basing on their reliability. > An oft-quoted answer is that the prose text is "visible" whereas the > RDFa is somehow "invisible". 
Apart from the fact that UIs which make > use of data pulled in from RDFa will make this data visible, there is > also the fact that RDFa, unlike an external RDF/XML file, or some > metadata embedded in a <script> block, makes use of as much visible > data as possible: visible links, visible text, etc. > > <p>My name is <span property="foaf:name" > about="#me">Toby Inkster</span>.</p> > > If you can't trust someone to correctly mark up what their name is, > then why trust them to mark up what deserves <em>phasis? Why believe > the <address> they provide? What if the instance they marked up with > <dfn> is not really the defining one? What if a <var> is really a > constant? > I don't really need a proper markup to understand a name is a name, a variable is a variable, a definition is a definition, and so on; you can use plain text and I'll understand your content the same way. If one makes a mistake when combining a <dfn> with an anchor, the result may be a broken link, perhaps making me look for a better site. If one's misusing <var> or <em>, the worst possible consequence is a bad presentation, and a bad presentation can be an attempt to cheat a UA (as when people puts a lot of keywords in a page and style them with the same color as the background to cheat search engines), but such is only if it is a deliberate choice, not a misuse (and I'm concerning mainly on abuses) -- anyway, it is easier to cheat a UA by the mean of false metadata than cheating a human person by the mean of wrong markup. If some markup is like, <p>We sell <a href="www.cheatingcarseller.com" property="foaf:name" content="Toby Inkster">cars</a></p> in any advertisement, I'll notice it's about cars and I'll choice whether to follow it or not, basing on my interest at the moment, but if I query "Toby Inkster" to a semantic UA blindly relying on metadata, I might get a page of a cars webstore instead of your homepage (for instance). 
Furthermore, I started my replies from one of Charles McCathieNevile's mails, which explicitly talked about trusted data and (mainly) small use cases, not wide-scale web automation. If there's no agreement about what kind of needs are best addressed by RDFa, maybe I have to agree with people saying that the technology must grow and become more mature (or, at least, better understood) before it is merged into the HTML5 specification (and 2023 is far enough away to accomplish such a goal :-) ). And I repeat my suggestion to map RDFa attributes to data-rdfa-* attributes and build RDFa processor plugins for the most common browsers, to test HTML5 and RDFa convergence on a wider scale before having browsers natively support RDFa in HTML5 documents (for the purpose of a test - but not only - I don't think "data-rdfa-property" vs "rdfa:property" vs "property" would be much of a problem). I'm not saying RDFa is a bad thing, or that it is useless; I just don't think any kind of markup can perfectly fit the semantics of "random" content for the purposes of a "global", wide-scale and automatic classification of content. Best regards, Alex
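As a rough illustration of the data-rdfa-* suggestion above, the following Python sketch renames attributes in both directions. It rests on assumptions not stated in the e-mail: the `RDFA_ONLY` tuple is a hypothetical choice of which attribute names to move (attributes HTML already defines, such as rel, href and content, are left alone), and the function names are invented for this example.

```python
# Attribute names treated as RDFa-specific in this sketch (an assumption).
RDFA_ONLY = ("about", "property", "typeof", "datatype", "resource")
PREFIX = "data-rdfa-"


def to_data_attrs(attrs):
    """Rename RDFa-specific attributes to data-rdfa-* equivalents."""
    return {
        (PREFIX + name if name in RDFA_ONLY else name): value
        for name, value in attrs.items()
    }


def from_data_attrs(attrs):
    """Reverse mapping, as a browser plugin might apply before RDFa processing."""
    return {
        (name[len(PREFIX):] if name.startswith(PREFIX) else name): value
        for name, value in attrs.items()
    }


original = {"property": "foaf:name", "about": "#me", "class": "author"}
mapped = to_data_attrs(original)
restored = from_data_attrs(mapped)
```

`mapped` carries `data-rdfa-property` and `data-rdfa-about` and leaves `class` untouched, and `restored` is equal to `original`, so the mapping loses nothing for the attributes it covers.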
attached mail follows:
Dan Brickley wrote: > While I'm unsure about the "commercial relationship" clause quite > capturing what's needed, the basic idea seems sound. Is there any > provision (or plans) for applying this notion to entire blocks of > markup, rather than just to simple hyperlinks? This would be rather > useful for distinguishing embedded metadata that comes from the page > author from that included from blog comments or similar. While that might be useful for natural language processing, for RDFa it is actually completely unneeded. The syntax of RDFa allows for blocks of markup to be made "invisible" by making an ancestor node into an XMLLiteral. For example, a comment might be marked up as: <section typeof="atom:Entry" xmlns:foaf="http://xmlns.com/foaf/0.1/" xmlns:atom="http://bblfish.net/work/atom-owl/2006-06-06/#"> <address rel="atom:author"> On <time property="atom:published" content="2009-01-10" >10 Jan 2009</time>, <a property="foaf:name" rel="foaf:page" href="http://joe.example.com">Joe Bloggs</a> wrote: </address> <div rel="atom:content"> <blockquote property="atom:xhtml"> <!-- The comment goes here. --> </blockquote> </div> </section> The RDFa processing instructions say that as the blockquote doesn't have an explicit datatype set, it is to be treated entirely as a string literal (if it doesn't have any child elements) or an XML literal (if it does), and that parsers must not look inside it for triples. Thus spammers can't use the comment form for stuffing triples into the page. It should be noted in this case that RDFa also allows natural language parsers to be made more useful. By looking at the RDFa which marks up the author's name and website, they may be able to determine that the comment has been written by someone other than the page's main author, and thus not afford it the same level of trust granted to the rest of the page. So the natural language processing can benefit from RDFa. -- Toby A Inkster <mailto:mail@tobyinkster.co.uk> <http://tobyinkster.co.uk>
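The rule described above (an element that yields an XML literal is not searched for further triples) can be illustrated with a toy Python walker. This is a deliberately simplified model, not a conforming RDFa processor: it only records property values, it ignores subjects, rel/rev and CURIE resolution, and the function name `collect_properties` is invented for this sketch.

```python
import xml.etree.ElementTree as ET


def collect_properties(element, found=None):
    """Collect @property values, skipping the insides of XML literals."""
    if found is None:
        found = []
    prop = element.get("property")
    if prop is not None:
        found.append(prop)
        is_xml_literal = (
            element.get("content") is None
            and element.get("datatype") is None
            and len(element) > 0  # the element has child elements
        )
        if is_xml_literal:
            # The whole subtree is the literal value; do not look inside it.
            return found
    for child in element:
        collect_properties(child, found)
    return found


doc = """<div>
  <span property="foaf:name">Joe Bloggs</span>
  <blockquote property="atom:xhtml">
    <p>Nice post! <span property="foaf:name">Injected Name</span></p>
  </blockquote>
</div>"""

properties = collect_properties(ET.fromstring(doc))
```

`properties` contains the author's `foaf:name` and the `atom:xhtml` literal itself, but not the `foaf:name` that a commenter placed inside the blockquote.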
attached mail follows:
Toby A Inkster wrote: > > It should be noted in this case that RDFa also allows natural language > parsers to be made more useful. By looking at the RDFa which marks up > the author's name and website, they may be able to determine that the > comment has been written by someone other than the page's main author, > and thus not afford it the same level of trust granted to the rest of > the page. So the natural language processing can benefit from RDFa. > That's true only if one can assume metadata are trustworthy, but they are only if they can be kept under strict control, that is, in a small-scale application. On a wider scale, one needs to make the opposite assumption, because it would or might be more common to find fake metadata with "honest" content (the prose of an advertisement does not lie, but the related metadata can say it's a different thing in order to cheat a metadata-based UA), either because a site author can be a party to the spammer, or because authors can mess up metadata (yeah, they can mess up html code too, but that's either not a problem, because a UA can present the content anyway, or it is a problem but it might damage the author more than it may harm the user). If metadata are created/used for external consumption, they can simply be ignored by authors, who instead may just copy&paste code or reuse templates in different contexts, without being able to set proper metadata for the new content. Thus UAs can't rely on metadata /in general/, while they might /in some/ small-scale scenarios. WBR, Alex
attached mail follows:
Hi Steven, (cc www-archive, libby) Re the alumni/people page scenario, I asked on the whatwg list about whether html5 is attempting any particular mechanism for saying which bits of a page are 'comments' or untrusted. But it seems from Toby's reply that RDFa is quite handy here. I've been thinking about how one might use the hypertext path from http://www.w3.org/ to /People and ..etc/Alumni to indicate that they have the same creator/publisher. 1st idea - use a custom relation like 'alumniPage' 2nd idea - generalise that - 'staffInfoPage', 'aboutOrg page' 3rd idea - generalise further - use RDF to state that those pages have a dc:creator / foaf:maker which is the organization W3C 4th idea - use POWDER to claim that all pages matching some URI prefix have these properties I think 4. is probably the way to go, but haven't dug into current state of POWDER. The others would cause needless proliferation of properties and clutter each hyperlink with additional link-typing annotations. This would allow some Org (companies, nonprofits, whatever) to say in RDF on their homepage "all HTML pages whose URI matches http://eg.example.com/aboutus/*html" are pages whose foaf:maker is the organization whose homepage is http://eg.example.com/ and whose name is "E.G. Org.". The point of this being that we need a way of picking out those pages (and pieces of pages) whose provenance/source is the main publisher, versus other things on the site (or in the page) that might be user supplied. On w3.org, the msgid: proxy that includes all of lists.w3.org into www.w3.org is a good use case; but also various W3C-linked people, WG/IG members etc., have write access to bits of the site. In parallel to this I'm still exploring the xmldsig route. Here is a test (linked by wot:assurance from foaf.rdf) signing of my foaf file: http://danbri.org/foaf.rdf.sigdata ... although done with a random generated key that I didn't write the java code to manage properly. 
Use case for that is: how do we know whether to believe the foaf:tipjar property claim in http://danbri.org/foaf.rdf and buy danbri a book? Hope this makes some sense! So I think next step is to check out POWDER. http://www.w3.org/TR/2008/WD-powder-primer-20081114/ I think they're using GRDDL due to the need to include quoted fragments of full RDF within each site 'label', something that's ugly to do in pure RDF (we tried in the earlier WCL design)... cheers, Dan -------- Original Message -------- Subject: Re: [whatwg] Trying to work out the problems solved by RDFa Date: Sat, 10 Jan 2009 13:51:26 +0000 From: Toby A Inkster <mail@tobyinkster.co.uk> To: whatwg@lists.whatwg.org Dan Brickley wrote: > While I'm unsure about the "commercial relationship" clause quite > capturing what's needed, the basic idea seems sound. Is there any > provision (or plans) for applying this notion to entire blocks of > markup, rather than just to simple hyperlinks? This would be rather > useful for distinguishing embedded metadata that comes from the page > author from that included from blog comments or similar. While that might be useful for natural language processing, for RDFa it is actually completely unneeded. The syntax of RDFa allows for blocks of markup to be made "invisible" by making an ancestor node into an XMLLiteral. For example, a comment might be marked up as: <section typeof="atom:Entry" xmlns:foaf="http://xmlns.com/foaf/0.1/" xmlns:atom="http://bblfish.net/work/atom-owl/2006-06-06/#"> <address rel="atom:author"> On <time property="atom:published" content="2009-01-10" >10 Jan 2009</time>, <a property="foaf:name" rel="foaf:page" href="http://joe.example.com">Joe Bloggs</a> wrote: </address> <div rel="atom:content"> <blockquote property="atom:xhtml"> <!-- The comment goes here. 
--> </blockquote> </div> </section> The RDFa processing instructions say that as the blockquote doesn't have an explicit datatype set, it is to be treated entirely as a string literal (if it doesn't have any child elements) or an XML literal (if it does), and that parsers must not look inside it for triples. Thus spammers can't use the comment form for stuffing triples into the page. It should be noted in this case that RDFa also allows natural language parsers to be made more useful. By looking at the RDFa which marks up the author's name and website, they may be able to determine that the comment has been written by someone other than the page's main author, and thus not afford it the same level of trust granted to the rest of the page. So the natural language processing can benefit from RDFa. -- Toby A Inkster <mailto:mail@tobyinkster.co.uk> <http://tobyinkster.co.uk>
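Dan's fourth idea in the message above (claiming that all pages matching some URI pattern share a foaf:maker) can be sketched without any POWDER machinery. The Python below is not POWDER syntax and uses no POWDER library; it is a minimal illustration of pattern-based provenance claims, with the `CLAIMS` list and `maker_for` function invented for this example and shell-style wildcards assumed for the pattern.

```python
from fnmatch import fnmatchcase

# Hypothetical claims published by a site: (URI pattern, name of the maker).
CLAIMS = [
    ("http://eg.example.com/aboutus/*html", "E.G. Org."),
]


def maker_for(url, claims=CLAIMS):
    """Return the claimed maker for a URL, or None if no pattern matches."""
    for pattern, maker in claims:
        if fnmatchcase(url, pattern):
            return maker
    return None


official = maker_for("http://eg.example.com/aboutus/team.html")
user_supplied = maker_for("http://eg.example.com/comments/42.html")
```

`official` is "E.G. Org." and `user_supplied` is None, which is the distinction Dan is after: pages covered by the publisher's own claim versus everything else on the site.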
attached mail follows:
Henri Sivonen wrote: > eRDF is very different in not relying on attributes whose qname > contains the substring "xmlns". eRDF is very different in that it is incredibly annoying to use in real world scenarios (i.e. not hypothetical "Hello World" examples). Calogero Alex Baldacchino wrote: > I guess closing a language to every kind of "back-door changes" may be > in contrast with the principle of paving a cawpath. I also guess that, > if microformats experience (or the "realworld semantics" they claim to > be based on) had suggested the need to add a new element/attribute to > the language, a new element/attribute would have been added. But Microformats experience *does* suggest that new attributes are needed for semantics. Look at the debate around accessibility within Microformats which has been going on for ages. Because of the Microformats process of working *within* existing HTML standards it has not been solved, and I can't see a solution reaching consensus in the foreseeable future. HTML5's <time> goes part of the way to solving this, but it doesn't address the whole problem like RDFa's "content" attribute does. Another reason the Microformat experience suggests new attributes are needed for semantics is the overloading of an attribute (class) previously mainly used for private convention so that it is now used for public consumption. Yes, in real life, there are pages that use class="vcard" for things other than encoding hCard. (They mostly use it for linking to VCF files.) Incredibly, I've even come across pages that use class="vcard" for non-hCard uses, *and* hCard - yes, on the same page! As the Microformat/POSHformat space becomes more crowded, accidental collisions in class names become ever more likely. 
The Microformats community hasn't added any new attributes for Microformats, because that was one of the guiding principles when the community was established: however, that does not mean it hasn't shown that new attributes are needed for encoding rich semantics in HTML. On the contrary, I think it's proved that they are. -- Toby A Inkster <mailto:mail@tobyinkster.co.uk> <http://tobyinkster.co.uk>
attached mail follows:
On 2009-01-12 23:15, Toby A Inkster wrote: > Henri Sivonen wrote: > >> eRDF is very different in not relying on attributes whose qname >> contains the substring "xmlns". > > > eRDF is very different in that it is incredibly annoying to use in real > world scenarios (i.e. not hypothetical "Hello World" examples). > > Calogero Alex Baldacchino wrote: > >> I guess closing a language to every kind of "back-door changes" may be >> in contrast with the principle of paving a cawpath. I also guess that, >> if microformats experience (or the "realworld semantics" they claim to >> be based on) had suggested the need to add a new element/attribute to >> the language, a new element/attribute would have been added. > > But Microformats experience *does* suggest that new attributes are > needed for semantics. Look at the debate around accessibility within > Microformats which has been going on for ages. Because of the > Microformats process of working *within* existing HTML standards it has > not been solved, and I can't see a solution reaching consensus in the > foreseeable future. HTML5's <time> goes part of the way to solving this, > but it doesn't address the whole problem like RDFa's "content" attribute > does. Right, so some microformats brought to attention a need which HTML5 could easily solve by adding <time>. Why does this mean that RDFa should be added? > Another reason the Microformat experience suggests new attributes are > needed for semantics is the overloading of an attribute (class) > previously mainly used for private convention so that it is now used for > public consumption. But HTML4 itself says that class can be used "for general purpose processing by user agents", so this seems to be a weird argument. If we introduced RDFa and it got used, would you argue you need something more than RDFa, because it is being used for what it is specced for? > Yes, in real life, there are pages that use > class="vcard" for things other than encoding hCard. 
(They mostly use it > for linking to VCF files.) Incredibly, I've even come across pages that > use class="vcard" for non-hCard uses, *and* hCard - yes, on the same > page! As the Microformat/POSHformat space becomes more crowded, > accidental collisions in class names become ever more likely. Right, but is it much of an issue? If you have a hCard extractor, the user can see easily that it's not useful data. And if it doesn't follow any of the other rules for an hCard, then the UA can safely ignore it (e.g. it has no fields). In practice, this kind of collision seems fairly non-problematic. > The Microformats community hasn't added any new attributes for > Microformats, because that was one of the guiding principles when the > community was established: however, that does not mean it hasn't shown > that new attributes are needed for encoding rich semantics in HTML. On > the contrary, I think it's proved that they are. Given that the only example of the microformats process needing an addition to the HTML language has been <time>, I'm not sure that's a conclusive proof. Andi
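Andi's point that an extractor can ignore a class="vcard" element with no fields can be sketched as follows. This is a minimal Python illustration, not a real hCard parser: it assumes well-formed markup, treats the presence of an "fn" descendant as the only test, and the helper names are invented for this example.

```python
import xml.etree.ElementTree as ET


def class_tokens(element):
    """Return the set of class names on an element."""
    return set((element.get("class") or "").split())


def hcard_names(root):
    """Return the fn text of each class="vcard" element that has one."""
    names = []
    for element in root.iter():
        if "vcard" not in class_tokens(element):
            continue
        for descendant in element.iter():
            if descendant is not element and "fn" in class_tokens(descendant):
                names.append("".join(descendant.itertext()).strip())
                break
        # A "vcard" element with no fn field is silently ignored.
    return names


doc = """<body>
  <a class="vcard" href="contact.vcf">Download my vCard</a>
  <div class="vcard"><span class="fn">Joe Bloggs</span></div>
</body>"""

names = hcard_names(ET.fromstring(doc))
```

The link to a VCF file collides on the class name but has no fields, so only the genuine hCard contributes to `names`.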
attached mail follows:
Toby A Inkster wrote: > > Another reason the Microformat experience suggests new attributes are > needed for semantics is the overloading of an attribute (class) > previously mainly used for private convention so that it is now used > for public consumption. Maybe this is true, but, personally, I prefer this approach to the addition of new features/attributes/elements to an official specification without a clear support requirement for UAs besides just parsing. A similar (if not stronger) argument may be raised against the reuse of the content attribute in the context of RDFa, which I think has caused a significant change with respect to its original semantics (now it should be shared by every element, originally it was a <meta> specific attribute; now it should be part of an RDF _triple_, in origin it was - and is still - part of a _pair_ when used in conjunction with the "name" attribute, and constitutes a pragma directive in conjunction with the "http-equiv" attribute, which is somehow closer to an XML processing instruction than to an RDF triple - the same applies to a <link> with rel="stylesheet", for instance). > Yes, in real life, there are pages that use class="vcard" for things > other than encoding hCard. (They mostly use it for linking to VCF > files.) Incredibly, I've even come across pages that use class="vcard" > for non-hCard uses, *and* hCard - yes, on the same page! As the > Microformat/POSHformat space becomes more crowded, accidental > collisions in class names become ever more likely. > Indeed, that's a possible source of trouble. I think the same would happen if people misused prefixes, e.g. if after merging some content from different documents they got a different namespace bound to a previously declared prefix in a scope where both namespaces are involved (in an xhtml document). 
Also, a custom script may distinguish between different uses of "vcard" by means of a further, private classname, or by enveloping elements in containers (divs) with proper ids, which may be a good solution in some cases, and not in other ones; a more generic parser, being specialized by design, has a chance to recognize a correct structure for a given format and to discard wrong information, which may work fine in some cases, but not in others. As always, each choice has its own downsides, and what counts is the cost/benefit ratio; it seems that any solution not requiring to be supported has the lowest costs for UA implementors. I do not doubt xml extensibility (which effectively is the base of curies) has its own benefits, it's flexible and suitable for a quick development of custom solutions, but it's also got its own downsides, such as leading to a possible heavy fragmentation, being difficult to understand and use for many people (who are usually fooled by the concept of namespaces) and thus potentially causing misuses and errors. It doesn't seem that xml extensibility brought more benefits than costs, and a proof may lie in the majority of the web not having followed the envisioned xml-alike evolution. Anyway, I'm not strongly against RDFa in HTML, instead, I can be quite neutral (I'd live with it); I'm not convinced it is worth adding it to the spec at this stage, until it is possible to establish what UAs must do with it besides parsing (and how to deal with namespaces while parsing). Also, I'm not fully convinced by the need to embed metadata in a page and keep them in sync with that page. For instance, it requires that every page reporting the same information must duplicate the same metadata structure, and this doesn't guarantee that that information, in the first place, is in sync with the real world (some pages might be out-of-date, others might be up-to-date). 
Instead, a separate file containing metadata to be linked when appropriate might solve both problems: it doesn't require duplicates and can have some form of versioning to keep track of changes and to present updated machine-friendly information to help users visiting an outdated page (assuming users can trust those metadata). Of course, this solution has its own downsides too. WBR, Alex
attached mail follows:
1. THE WAR OF THE WORLDS

The Semantic Web is based on the concept of being able to express, in a
standard format, relations between different data and different
entities (including real-world entities) [1]. Today this is mainly
based on RDF.

The traditional web focuses on web pages, ranging from the interchange
of static documents to dynamic applications [2]. Today this is mainly
based on HTML.

It is clear that there is a missing link between these two worlds,
because most of the time the data is either included in documents in
ways that are unstructured and non-standardized, or managed by
applications in a proprietary/closed manner.

The risk is that the two worlds (semantic web and traditional web),
instead of collaborating, will end up competing.

Today we are witnessing a lot of discussions (not only on this list) in
which visionary supporters of the "unspecified wonders" of the
"semantic-web-that-will-be" are opposed by pragmatic supporters of
tradition, who believe in the "unlimited evolutionism" of full-text
search and consider the RDF-izing of the world a "titanic & impossible"
enterprise :-).

But does this conflict have any reason to exist? We need to leave
behind, on both sides, all preconceived positions. I believe the two
worlds are being developed as separate realities, and this is a
concrete problem that we have to resolve. Today we have the opportunity
to do so with HTML5.

2. LOWER THE BARRIER

It is clear that publishing simple web documents and applications is
easier than structuring information in a semantic manner, but we must
find ways to make this possible in a unified framework:

    documents + applications + semantics = HTML5

If we want the promises of the semantic web to become a reality, we
must lower the entry level for generic users.

HTML5 certainly must not solve problems that we cannot foresee today.
But there is clearly a problem that HTML5 faces today: there is no
widespread use of semantic tools because the barrier to using them is
too high for users. This is the main reason why the semantic web is
being developed as a world in itself, still mainly academic.

Just as the original HTML enabled users to easily publish hypertext
documents, today HTML5 must allow users to easily semantify their data,
documents & applications.

At the moment, a user who wants to create or use semantically
structured information finds that browsers do not natively offer any
way to do that. The user is forced to move through a "jungle" of tools
(without a GUI or with poor usability), plugins and languages that are
not widespread standards. This is exactly the situation faced by a user
who tried to create hypertext in 1990.

3. LINKS AND BEYOND

Just as the power of the traditional web is in "hypertextual links"
among documents identified by URLs, the power of the semantic web is in
"semantic-links" between documents/data/entities identified by
URLs/URIs.

We must give users an easy way to create these semantic-links, as
simple as creating classic hyperlinks.

Semantic-links could be collected by search engines (machines) to
enhance their functionality, and could be used in other automatic
processing. But, first of all, they can represent a big value for the
browser's user (human) if we find in HTML5 a standard way to visualize
and interact with them.

We could define a "semantic-link" as a connection to "semantically
structured information" (embedded or in an external resource) that is
presented to the user in a fashion similar (but not identical) to
classic hyperlinks. A semantic-link could be considered a sort of
"semantic annotation" that enhances the main content delivered to the
user and enables further interactions with "linked data".

For this we absolutely need a "common minimum standard", although
nothing will prevent the continued development of additional or
alternative ways of visualization/interaction (via plugins, proprietary
implementations in browsers, new language versions).

4. OVERVIEW OF SCENARIO'S USE CASES

With respect to use cases, we should certainly consider all the use
cases developed for RDFa [3], but also those developed by the "Semantic
Web Activity" [4]; others could be derived from each of the
microformats [5] or from the scenarios described by Adrian Holovaty in
the article "A fundamental way newspaper sites need to change" [6].

For example, it would be interesting to have a standard for
a) structuring,
b) visualizing normally in the page (via CSS), and
c) interacting with/manipulating via the browser
the data present in "Wikipedia's Infobox" [7].

Another example could be a standard for the visualization of "access
doors" to semantically structured information "hidden" in the pages and
of the "possible user's actions" (see "IE8 Activities" [8]).

Other interesting issues, in terms of user interface, are raised by
Alex Faaborg in the article "User interface of microformat detection"
[9]; by the fact that we need something more user-friendly and
standardized than "bookmarklets" [10]; by the fact that structured
information can improve features in scenarios raised by projects like
Ubiquity [11]; and, last but not least, by some evaluations recently
presented by Ian Hickson on the WHATWG list [12].

5. TWO REAL PROBLEMS

I think it's good, first of all, to abstract from the single use cases
depicted above and find a solution to two fundamental problems that lie
at the root of the use cases, two problems that have no solution in the
current version of HTML:

I) User agents must allow users to see that there are "semantic-links"
(connections to semantically structured information) in an HTML
document/application. Consequently, user agents must allow users to
"follow" the semantic-link (access/interact with the linked data,
embedded or external), and this involves primarily the ability to:
a) view the information
b) select the information
c) copy the information to the clipboard
d) drag and drop the information
e) send that information to another web application (or to OS
   applications) selected by the user.

II) User agents must allow users to "semantically annotate" an existing
HTML document (insert a semantic-link and linked data), and this
involves primarily the ability to:
a) edit the document to insert semantically structured information
   (starting from the existing text or from information already
   structured in the edited portion of the page)
b) send the result of the editing to another web application (or to OS
   applications) selected by the user.

Solving the first problem will give *all* users the possibility to
access the semantic web in a normal browser (a target impossible to
achieve simply through microformats & plugins and without an effective
standard incorporated in HTML).

Solving the second problem will give *all* (interested) users the
possibility to access the semantic potential at a personal level (for
example, building an archive of personal semantic annotations) and at a
social level (for example, contributing to a collective effort to
"semantify" originally unstructured web resources).

6. SEARCHING POSSIBLE SOLUTIONS

The first solution that we can think of is a new attribute @semantic
(don't focus on its name) used like this:

    <A href=".." semantic=".." class="..">
    <DIV semantic=".." class="..">

In @semantic we can have:
a) the URL of a resource that semantically describes the content (in
   RDF, RDFa, JSON, CSV), like this:

    semantic="http://www.foo.com/desc.rdf"

b) direct semantically structured information, in the manner of @style,
   probably something like this (thinking of RDFa):

    semantic="property: ..; about: ..;"

Furthermore, in the hypothesis of some sort of "Cascading Semantics"
(see for example cRDF [13]), we can also think of creating a new
element SEMANTIC like this:

    <SEMANTIC Type=".."> ...</SEMANTIC>

to embed semantically structured information, in several formats, in a
CSS-like manner.

Naturally we need further investigation on *all points*.

But we probably need some new properties/elements, because not all the
problems presented here are solvable simply through a generic extension
mechanism [14] that makes it possible to insert RDFa in HTML.

A generic extension mechanism remains desirable for other reasons
(MathML, SVG, etc.), but we also need a very different thing, set in
the heart of HTML, that makes it possible to bridge the gap between the
two worlds of the semantic web and the traditional web... to make them
become one.

[1] http://www.w3.org/2001/sw/
[2] http://dev.w3.org/html5/spec/Overview.html#scope
[3] http://www.w3.org/TR/xhtml-rdfa-scenarios/
[4] http://www.w3.org/2001/sw/sweo/public/UseCases/
[5] http://microformats.org/wiki/Main_Page
[6] http://www.holovaty.com/blog/archive/2006/09/06/0307
[7] http://en.wikipedia.org/wiki/Help:Infobox
[8] http://blogs.msdn.com/ie/archive/2008/03/06/activities-and-webslices-in-internet-explorer-8.aspx
[9] http://blog.mozilla.com/faaborg/2007/02/04/microformats-part-4-the-user-interface-of-microformat-detection/
[10] http://en.wikipedia.org/wiki/Bookmarklet
[11] http://ubiquity.mozilla.com/
[12] http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2008-December/018023.html
[13] http://www.xanthir.com/rdfa-vs-crdf.php
[14] http://www.w3.org/html/wg/tracker/issues/41

--
Giovanni Gentili - giovanni.gentili@gmail.com
attached mail follows:
Tom Morris wrote:
> You can do that already with HTML 4 and XHTML 1.x using GRDDL. GRDDL
> no longer works in HTML 5 as the profile attribute has been removed.
> (We get some nonsense about GRDDL still working but just not
> 'requiring' profile. This is nonsense. It's a bit like saying that
> you've taken the wheels off the car but it still works because you can
> turn the engine on.)

I think much of this nonsense has arisen because GRDDL effectively uses
the profile attribute for two purposes, and that has gotten people
confused. By taking care of one of the purposes in HTML5, it has been
assumed that GRDDL will thus work in HTML5; tick it off the list; done;
taken care of.

GRDDL uses the profile attribute firstly as a flag which says "this
page has some GRDDL transformations linked from it". HTML5 has said
that all pages may have GRDDL transformations linked from them, thus
this flag is not needed in HTML5. Fair enough, you can say "this works
in HTML5 without requiring a profile" and that will work. It introduces
incompatibilities between how GRDDL is processed in HTML5 and how it is
processed in earlier versions of (X)HTML, which is annoying
(particularly as XHTML does not require a DOCTYPE, so there is no easy
way of differentiating between XHTML5 and earlier versions of XHTML)
but still doable.

But GRDDL uses the profile attribute in another manner: a GRDDL agent
is supposed to loop through all the page's profiles, perform an HTTP
request for each one, and use the data it finds in them. If you say
"this works in HTML5 without requiring a profile" then that is clearly
nonsense. To loop through the profiles, there have to be some profiles!

rel="profile" does work as a substitute here, but again it introduces
inconsistencies between HTML5 and previous versions of (X)HTML. This
syntax change for linking to profiles seems to be an entirely
gratuitous one: if profiles are going to be allowed, then why change
the syntax for them?
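[Editorial sketch, not part of the original message.] The profile loop described above can be illustrated with a few lines of Python. This is only a rough sketch under stated assumptions: it uses the standard library's html.parser, it only gathers the profile URIs (the HTTP request per profile is left out), and the class name, function name and example.org URIs are hypothetical. It shows why an agent that wants to stay compatible would have to look in two places, the HTML 4 profile attribute on <head> and rel="profile" links.

```python
from html.parser import HTMLParser


class ProfileCollector(HTMLParser):
    """Collect profile URIs from both the HTML 4 and the rel="profile" syntax."""

    def __init__(self):
        super().__init__()
        self.profiles = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # HTML 4 / XHTML 1.x: <head profile="uri1 uri2"> (space-separated list).
        if tag == "head" and attributes.get("profile"):
            self.profiles.extend(attributes["profile"].split())
        # Substitute syntax discussed above: <link rel="profile" href="uri">.
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if tag in ("link", "a") and "profile" in rel_tokens and attributes.get("href"):
            self.profiles.append(attributes["href"])


def collect_profiles(markup):
    """Return the profile URIs an agent would then have to dereference."""
    collector = ProfileCollector()
    collector.feed(markup)
    collector.close()
    return collector.profiles
```

A page with neither syntax yields an empty list, which is the "no wheels" case: there is nothing for the agent to loop over.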
As an analogy, there are many very good arguments both ways for
dropping or retaining the <b> and <i> elements, but to suggest renaming
them instead to <bold> and <italic> would be silly - even if the
unabbreviated names would be clearer, the headaches caused by the
syntax change would be massive. Similarly, rel="profile" does seem like
a slightly nicer syntax than the profile attribute, but that small
advantage comes at a cost of breaking compatibility with existing tools
that use profiles.
--
Toby A Inkster
<mailto:mail@tobyinkster.co.uk>
<http://tobyinkster.co.uk>
attached mail follows:
Calogero Alex Baldacchino wrote:
> The concern is about every kind of metadata with respect to their
> possible uses; but, while it's been stated that Microformats (for
> instance) don't require any particular support by UAs (thus they're
> backward compatible), RDFa would be a completely new feature, thus
> html5 specification should say what UAs are expected to do with such
> new attributes.
RDFa doesn't require any special support beyond the special support
that is required for Microformats. i.e. nothing. User agents are free
to ignore the RDFa attributes. In that sense, RDFa already "works" in
pretty much every existing browser, even going back to dinosaurs like
Mosaic and UdiWWW.
Agents are of course free to offer more than that. Look at what they
do with Microformats: Firefox for instance offers an API to handle
Microformats embedded on a page; Internet Explorer offers its "Web
Slices" feature.
> For what concerns html serialization, in particular, I'd consider
> some code like [...] which is rendered properly
Is it though? Try adding the following CSS:
span[property="cal:summary"] { font-weight: bold; }
And you'll see that CSS doesn't cope with a missing ending tag in
that situation either.
If you miss out a non-optional end tag, then funny things will happen
- RDFa isn't immune to that problem, but neither is the DOM model,
CSS, microformats, or anything else that relies on knowing where
elements end. A better comparison would be a missing </p> tag, which
is actually allowed in HTML, and HTML-aware RDFa processors can
generally handle just fine.
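[Editorial sketch, not part of the original message.] The effect of a missing end tag on anything that needs to know where elements end can be shown with a deliberately naive Python model. It is an assumption-laden illustration, not the HTML5 tree-building algorithm and not any real RDFa processor: it keeps a simple stack of open elements (void elements are not handled) and attributes every piece of text to each open element that carries a property attribute. The class name, function name and sample sentence are hypothetical; only the cal:summary property name comes from the message above.

```python
from html.parser import HTMLParser


class PropertyText(HTMLParser):
    """Collect the text inside each element that carries a property attribute."""

    def __init__(self):
        super().__init__()
        self.open_elements = []  # stack of (tag, property-or-None)
        self.values = {}         # property name -> collected text

    def handle_starttag(self, tag, attrs):
        self.open_elements.append((tag, dict(attrs).get("property")))

    def handle_endtag(self, tag):
        # Pop up to and including the nearest matching open element, which
        # implicitly closes anything left open inside it.
        for index in range(len(self.open_elements) - 1, -1, -1):
            if self.open_elements[index][0] == tag:
                del self.open_elements[index:]
                break

    def handle_data(self, data):
        for _, prop in self.open_elements:
            if prop:
                self.values[prop] = self.values.get(prop, "") + data


def property_values(markup):
    parser = PropertyText()
    parser.feed(markup)
    parser.close()
    return parser.values
```

With the </span> present, the summary is just the marked-up words; with it missing, the summary silently grows to cover the rest of the paragraph, which is the same region a CSS attribute selector would end up styling.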
> considering RDFa relies on namespaces (thus,
> adding RDFa attributes to HTML5 spec would require some features from
> xml extensibility to be added to html serialization).
RDFa *does not* rely on XML namespaces. RDFa relies on eight
attributes: about, rel, rev, property, datatype, content, resource
and typeof. It also relies on a CURIE prefix binding mechanism. In
XHTML and SVG, RDFa happens to use XML namespaces as this mechanism,
because they already existed and they were convenient. In non-XML
markup languages, the route to define CURIE prefixes is still to be
decided, though discussions tend to be leaning towards something like:
<html prefix="dc=http://purl.org/dc/terms/
              foaf=http://xmlns.com/foaf/0.1/">
<address rel="foaf:maker" rev="foaf:made">This document was made by
<a href="http://joe.example.com" typeof="foaf:Person"
rel="foaf:homepage" property="foaf:name">Joe Bloggs</a>.</address>
</html>
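[Editorial sketch, not part of the original message.] To make the prefix binding idea concrete, here is a minimal Python sketch of how a consumer might read a prefix attribute of the shape shown above and expand a CURIE such as foaf:name. The attribute syntax was still undecided at the time of writing, so everything here is an assumption: the function names are hypothetical and the parsing rules (whitespace-separated prefix=URI pairs) are simply read off the example.

```python
def parse_prefix_attribute(value):
    """Parse 'dc=http://... foaf=http://...' into a prefix -> URI dict."""
    mapping = {}
    for pair in value.split():
        prefix, separator, uri = pair.partition("=")
        if not separator or not prefix or not uri:
            raise ValueError(f"malformed prefix binding: {pair!r}")
        mapping[prefix] = uri
    return mapping


def expand_curie(curie, mapping):
    """Expand 'prefix:reference' to a full URI using the given bindings."""
    prefix, separator, reference = curie.partition(":")
    if not separator or prefix not in mapping:
        raise KeyError(f"no binding for prefix in {curie!r}")
    return mapping[prefix] + reference
```

Nothing in this sketch depends on XML namespaces; the bindings could equally come from xmlns declarations in XHTML, which is the point being made above.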
This discussion seems to be about "should/can RDFa work in HTML5?"
when in fact, RDFa already can and does work in HTML5 - there are
approaching a dozen interoperable implementations of RDFa, the
majority of which seem to handle non-XHTML HTML. Assuming that people
see value in RDFa, and assuming that the same people see value in
using HTML5, then these people will use RDFa in HTML5. The question
we should be discussing is not "should it work?" (because it already
does), but rather, "should it validate?"
--
Toby A Inkster
<mailto:mail@tobyinkster.co.uk>
<http://tobyinkster.co.uk>
attached mail follows:
Toby A Inkster wrote:
> Calogero Alex Baldacchino wrote:
>
>> The concern is about every kind of metadata with respect to their
>> possible uses; but, while it's been stated that Microformats (for
>> instance) don't require any particular support by UAs (thus they're
>> backward compatible), RDFa would be a completely new feature, thus html5
>> specification should say what UAs are expected to do with such new
>> attributes.
>
> RDFa doesn't require any special support beyond the special support that
> is required for Microformats. i.e. nothing. User agents are free to
> ignore the RDFa attributes. In that sense, RDFa already "works" in
> pretty much every existing browser, even going back to dinosaurs like
> Mosaic and UdiWWW.
>
> Agents are of course free to offer more than that. Look at what they do
> with Microformats: Firefox for instance offers an API to handle
> Microformats embedded on a page; Internet Explorer offers its "Web
> Slices" feature.
>

If it is true that RDFa can work today with no ill effect in downlevel
user agents, what's currently blocking its implementation? Concern for
validation?

It seems to me that many HTML extensions are implemented first and
specified later[1], so perhaps it would be in the interests of RDFa
proponents to get some implementations out there and get RDFa adopted,
at which point it will hopefully seem a much more useful proposition
for inclusion in HTML5.

In the short term the RDFa community can presumably provide a
specialized "HTML5 + RDFa" validator for adopters to use until RDFa is
incorporated into the core spec and tools.

It would seem that it's much easier to get into the spec when your
feature is proven to be useful by real-world adoption.

[1] canvas, keygen, frames and script are examples of this phenomenon.
attached mail follows:
Martin Atkins wrote:
> ...
> If it is true that RDFa can work today with no ill-effect in downlevel
> user-agents, what's currently blocking its implementation? Concern for
> validation?
>
> It seems to me that many HTML extensions are implemented first and
> specified later[1], so perhaps it would be in the interests of RDFa
> proponents to get some implementations out there and get RDFa adopted,
> at which point it will hopefully seem a much more useful proposition for
> inclusion in HTML5.
>
> In the short term the RDFa community can presumably provide a
> specialized "HTML5 + RDFa" validator for adopters to use until RDFa is
> incorporated into the core spec and tools.
>
> It would seem that it's much easier to get into the spec when your
> feature is proven to be useful by real-world adoption.
> ...

What he said.

Although I *do* believe that in the end we'll want RDFa-in-HTML5,
what's really important right now is *not* RDFa-in-HTML5 but
RDFa-in-HTML4. Define that, make it a success, and the rest will be
simple.

Best regards, Julian
attached mail follows:
Toby A Inkster wrote:
> Calogero Alex Baldacchino wrote:
>
>> The concern is about every kind of metadata with respect to their
>> possible uses; but, while it's been stated that Microformats (for
>> instance) don't require any particular support by UAs (thus they're
>> backward compatible), RDFa would be a completely new feature, thus html5
>> specification should say what UAs are expected to do with such new
>> attributes.
>
> RDFa doesn't require any special support beyond the special support
> that is required for Microformats. i.e. nothing. User agents are free
> to ignore the RDFa attributes. In that sense, RDFa already "works" in
> pretty much every existing browser, even going back to dinosaurs like
> Mosaic and UdiWWW.
>
> Agents are of course free to offer more than that. Look at what they
> do with Microformats: Firefox for instance offers an API to handle
> Microformats embedded on a page; Internet Explorer offers its "Web
> Slices" feature.
>
Well, at the beginning of this thread, the possible need to interchange
RDF metadata and merge triples from different vocabularies was suggested
as a use case for the RDFa serialization of RDF, and this would hint at
a requirement to support an RDFa processor in every conforming UA.
This also opens the question of what else might be needed besides
collecting triples. Is an API for building custom query applications
enough, or should browsers provide some query feature themselves? Are
there possible problems involved (such as spam through fake metadata in
cached ads), and possible solutions to prevent or moderate them?
If, on the other hand, browsers need not do anything special with RDFa
attributes, and their main use is instead for script, plugin or
server-side computations, or for optional ("free") support by UAs, then
they would be no different from any other kind of custom attribute,
data-* included (so should the validation requirement be "let's accept
every attribute"?). The only difference would be the /intended use/,
which may matter but is something only a human can understand and no
validator can check. From this point of view, validating RDFa
attributes, any other attributes, or just html5 attributes and custom
data-* ones would be the same, and validation would be no more of a
concern than it is for proprietary CSS extensions.
>> For what concerns html serialization, in particular, I'd consider
>> some code like [...] which is rendered properly
>
>
> Is it though? Try adding the following CSS:
>
> span[property="cal:summary"] { font-weight: bold; }
>
> And you'll see that CSS doesn't cope with a missing ending tag in that
> situation either.
>
> If you miss out a non-optional end tag, then funny things will happen
> - RDFa isn't immune to that problem, but neither is the DOM model,
> CSS, microformats, or anything else that relies on knowing where
> elements end. A better comparison would be a missing </p> tag, which
> is actually allowed in HTML, and HTML-aware RDFa processors can
> generally handle just fine.
That's definitely *not* the same issue. As I replied in a previous
mail, people *do not* need proper styling to understand prose; they just
need to understand the language the prose is written in, and their
/brains/ will cope with the rest. The above example thus results in
acceptable graceful degradation: it may or may not be the wanted
presentation, depending on where the closing </span> was meant to be
(it wouldn't be the right presentation in this case), but it is not too
harmful anyway. Bots based on metadata, instead, *do need* reliable
metadata to work properly, unless they're made smart enough to debug
the code they're fed (should Artificial Intelligence be a requirement? -
no sarcasm here).
If broken or wrong presentation caused by a missing end tag had ever
been an issue, the html serialization would have been deprecated in
favour of the xml one. If something really "problematic" happened,
authors would notice it on their very first test by opening the page in
a browser, whereas an extensive and complete debugging of triples might
be a hard problem in a large document. In contrast, any break in
metadata semantics caused by the html serialization can only be a
severe issue for a metadata-based bot: it needs accurate metadata,
whereas a not-very-accurate presentation is not a great concern for
human beings in most cases. And if no particular presentation is
attached to those spans, and they're used just to add semantics through
metadata (as happens when embedding RDF through RDFa attributes), a
side effect may arise. From this point of view, the html serialization
may be more prone to side effects than the xml serialization, which
stops on the errors that might in turn cause side effects in the
metadata. That is, since RDFa semantics is more reliable in a more
well-formed document, the xml serialization might help to debug some
errors; such strictness is not a requirement for content presentation,
where finding more or fewer emboldened words is better for users than
finding a page which is not rendered at all. Hence the differences
between xhtml and html.
But if it is, or will be, agreed that inaccurate metadata are reliable
enough, or that uncertain reliability is not an issue for wide-scale
semantic web applications, well, I really don't know what to say other
than that I have a different opinion.
However, that was just the first example I was able to come up with to
give an idea; better examples can surely be found. What if, for
instance, foster parenting or the adoption agency algorithm caused
metadata to be moved far from (part of) their corresponding data? Style
is inherited, but a wrong triple is a wrong triple (from this
perspective, a parse error /might/ highlight some misplaced metadata
more quickly than a raw debugging of triples).
My point is that the html serialization is robust enough with respect
to presentational issues in most cases (the same holds for non-screen
media), but this might not be true for RDFa-modelled metadata, which
require greater "well-formedness" than content presentation to be
reliable enough. RDFa was conceived, in the first place, to allow RDF
to be serialized into xml documents without the validation problems
that may arise from direct use of xml-serialized RDF, and as an
alternative to RELAX NG (since strict xml parsers, as used for xhtml,
are more widespread); this is in the first chapter of the RDFa
specification, "1. Motivation".
That is, RDFa was born first and foremost as an xml-related feature, so
I think the concern about whether it can work equally well in another
kind of document (not whether it can work at all, but whether it works
better in some documents than in others) is legitimate. Of course, the
same concern may apply to eRDF as well as to other kinds of metadata.
>
>> considering RDFa relies on namespaces (thus,
>> adding RDFa attributes to HTML5 spec would require some features from
>> xml extensibility to be added to html serialization).
>
>
> RDFa *does not* rely on XML namespaces. RDFa relies on eight
> attributes: about, rel, rev, property, datatype, content, resource and
> typeof. It also relies on a CURIE prefix binding mechanism. In XHTML
> and SVG, RDFa happens to use XML namespaces as this mechanism, because
> they already existed and they were convenient. In non-XML markup
> languages, the route to define CURIE prefixes is still to be decided,
> though discussions tend to be leaning towards something like:
>
> <html prefix="dc=http://purl.org/dc/terms/
> foaf=http://xmlns.com/foaf/0.1/">
> <address rel="foaf:maker" rev="foaf:made">This document was made by <a
> href="http://joe.example.com" typeof="foaf:Person" rel="foaf:homepage"
> property="foaf:name">Joe Bloggs</a>.</address>
> </html>
>
Well, yes, that's a possible solution to be considered. Anyway, it
would require (at least) another new attribute to be specced, with
possible new concerns. For instance, a missing space between prefix/URI
pairs might prevent the attribute from being parsed correctly (whereas
typing errors in space-separated curies, which are shorter than
absolute URIs, are easier to spot in hand-written code, but this is a
subtlety), so a separate attribute for each URI might be more robust
(for instance something like xmlns-* or just ns-* on the <html> tag,
similar to xmlns:* but not clashing with the xml namespace mechanism,
along the same lines as data-* but with a different "scope"). Also,
something like the eRDF use of <link> elements to declare namespaces
(or prefix mappings for curies, to be more consistent with RDFa
conventions) inside the head element might work, because an html
document is likely to present such declarations once at the beginning.
However, each solution would have its own pros and cons, while xml
namespaces fit the purpose perfectly, not least because (one of) their
main uses is to represent prefixed attribute or element names taken
from an RDF vocabulary, which is in turn an XML 'format', and to embed
them in another kind of document; that is, to represent something
coming from a different namespace.
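[Editorial sketch, not part of the original message.] The robustness concern above can be illustrated in Python. Both syntaxes are hypothetical (neither the single prefix attribute nor the ns-* attributes were specified anywhere), and the function and class names are made up for the illustration. The sketch shows a naive parse of the single-attribute form silently producing a wrong binding when one space is missing, and a collector for one-attribute-per-prefix declarations, where each binding is delimited by the HTML attribute syntax itself.

```python
from html.parser import HTMLParser


def naive_prefix_parse(value):
    """Naive parse of the single-attribute syntax: 'p1=uri1 p2=uri2'."""
    return dict(pair.split("=", 1) for pair in value.split())


class NsAttributeCollector(HTMLParser):
    """Collect hypothetical ns-* attributes from the <html> start tag."""

    def __init__(self):
        super().__init__()
        self.prefixes = {}

    def handle_starttag(self, tag, attrs):
        if tag == "html":
            for name, value in attrs:
                if name.startswith("ns-") and value:
                    self.prefixes[name[len("ns-"):]] = value


def collect_ns_attributes(markup):
    collector = NsAttributeCollector()
    collector.feed(markup)
    collector.close()
    return collector.prefixes
```

One of the "cons" of the per-attribute form shows up immediately: HTML attribute names are case-insensitive (html.parser lowercases them), so prefixes declared this way could not be case-sensitive.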
> This discussion seems to be about "should/can RDFa work in HTML5?"
> when in fact, RDFa already can and does work in HTML5 - there are
> approaching a dozen interoperable implementations of RDFa, the
> majority of which seem to handle non-XHTML HTML. Assuming that people
> see value in RDFa, and assuming that the same people see value in
> using HTML5, then these people will use RDFa in HTML5. The question we
> should be discussing is not "should it work?" (because it already
> does), but rather, "should it validate?"
>
There must also be people who see value in eRDF, at least enough of
them for eRDF to be supported by SearchMonkey. These people surely see
value in using eRDF within html documents, since eRDF was conceived to
work with HTML "natively", that is, without any need to change HTML
(by introducing new attributes or using unrecognized ones);
nevertheless, eRDF can't be valid HTML5 because of the "profile"
attribute, which has been dropped. Should eRDF validate instead? Should
we prefer eRDF to RDFa or vice versa? Should we treat them in the very
same way? Or should we just wait and see which one works better for
people, to avoid specifying too early something that might later prove
less useful than originally thought, for instance because most people
decided to use something else?
WBR, Alex
attached mail follows:
Ian Hickson wrote:
>
>> The question we should be discussing is not "should it work?" (because
>> it already does), but rather, "should it validate?"
>
> No, the question is "what problem are we solving?". Talking about RDFa,
> RDF, eRDF, Microformats, and so forth doesn't answer this question.
>
> The question "should it validate" is the question "do we want to solve the
> problem and is this the right solution", which is a question we can't
> answer without actually knowing what the problem is.
>
> So far, all I really know is that the problem is apparently obvious.
>

My understanding of the use-case, based on discussions so far, is:

- Allow authors to embed annotations in HTML documents such that RDF
triples can be unambiguously extracted from human-readable data without
duplicating the data, and thus ensuring that the machine-readable data
and the human-readable data remain in sync.

The disconnect you're facing is that the proposers of RDFa consider the
ability to encode RDF triples to be a goal, while you consider RDF
triples to be a solution to an (as-yet-undetermined) higher-level
problem. They take RDF as a given, while you do not.

They have already solved some problems with RDF and wish only to adapt
this generalized solution to work in HTML, while you wish to re-solve
all of these problems from the ground up.

Would you agree with this analysis? If this is accurate, then it's
difficult to see how continued discussion on this topic can be
productive.
attached mail follows:
Ian Hickson wrote:
>
>> They have already solved some problems with RDF and wish only to adapt
>> this generalized solution to work in HTML, while you wish to re-solve
>> all of these problems from the ground up.
>
> I don't necessarily wish to resolve the problems -- if they have existing
> good solutions, I'm all in favour of reusing them. I just want to know
> what those problems are that we're solving, so that we can make sure that
> the solutions we're adopting are in fact solving the problems we want to
> solve. It would be irresponsible to add features without knowing why.
>
I would assume that our resident proponents are already satisfied that
their higher-level problems have been solved, and this is why they're
frustrated that you won't just let them map their existing solutions
into HTML all in one fell swoop.
I'm not sure I'd put myself into the "RDF proponent" bucket, but I do
know one use-case of RDF that I've encountered frequently so I'll post
it as a starting point.
The FOAF schema for RDF[0] addresses the problem of making personal
profile data machine-readable along with some of the relationships
between people. From the outside looking in, it seems that the goal they
set themselves was to make machine-readable the sort of information you
find on a social networking site.
One problem this can solve is that an agent can, given a URL that
represents a person, extract some basic profile information such as the
person's name along with references to other people that person knows.
This can further be applied to allow a user who provides his own URL
(for example, by signing in via OpenID) to bootstrap his account from
existing published data rather than having to re-enter it.
Google Social Graph API[1] apparently makes use of FOAF (when serialized
as XML) as one of the sources of data so that given a URL that
represents a person it can return a list of URLs that represent friends
of that person.
The Google Profiles application[2] makes use of the output of the Social
Graph API to suggest URLs that a user might want to list on his profile
page, so the user only needs to fill in a couple of URLs by hand.
So, to distill that into a list of requirements:
- Allow software agents to extract profile information for a person as
often exposed on social networking sites from a page that "represents"
that person.
There are a number of existing solutions for this:
* FOAF in RDF serialized as XML, Turtle, RDFa, eRDF, etc
* The vCard format
* The hCard microformat
* The PortableContacts protocol[3]
* Natural Language Processing of HTML documents
- Allow software agents to determine who a person lists as their friends
given a page that "represents" that person.
Again, there are competing solutions:
* FOAF in RDF serialized as XML, Turtle, RDFa, eRDF, etc
* The XFN microformat[4]
* The PortableContacts protocol[3]
* Natural Language Processing of HTML documents
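To make the second requirement concrete, here is a minimal sketch of an agent reading friend links from XFN-style markup with nothing but Python's standard library. The sample page, the example.com URLs, and the choice of which rel values count as "friends" are all assumptions made for illustration; a real consumer would follow the XFN profile and fetch the page from the person's URL.

```python
from html.parser import HTMLParser

# XFN relationship values treated as "friends" in this sketch. This
# subset is an assumption for the example, not a definition from XFN.
FRIEND_RELS = {"friend", "acquaintance", "contact"}


class XFNFriendExtractor(HTMLParser):
    """Collects href values of <a> elements whose rel lists a friend value."""

    def __init__(self):
        super().__init__()
        self.friends = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        # rel is a space-separated list; a bare attribute has value None.
        rels = set((attr_map.get("rel") or "").split())
        href = attr_map.get("href")
        if href and rels & FRIEND_RELS:
            self.friends.append(href)


def extract_friends(html_text):
    """Returns the friend URLs found in html_text, in document order."""
    parser = XFNFriendExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.friends


# Hypothetical profile page; rel="me" marks the author's own page and is
# deliberately not counted as a friend.
SAMPLE_PAGE = """
<p>People I know:
<a href="http://alice.example.com/" rel="friend met">Alice</a>,
<a href="http://bob.example.com/" rel="contact">Bob</a>, and my
<a href="http://blog.example.com/" rel="me">other blog</a>.</p>
"""

friends = extract_friends(SAMPLE_PAGE)
```

Note that the links double as the human-readable text, which is the property the later "no duplication" requirement asks for.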
-----------------------------------------------
Assuming that the above is a convincing problem domain, now let's add in
the following requirement:
- Allow the above to be encoded without duplicating the data in both
machine-readable and human-readable forms.
Now our solution list is reduced to (assuming we consider both
requirements together):
* FOAF in RDF serialized as RDFa or eRDF
* The hCard microformat + the XFN microformat
* Natural Language Processing of HTML documents
All three of the above options address the use-cases as I stated them --
the Social Graph API apparently uses all three if you're willing to
consider a MySpace-specific "screen-scraper" as Natural Language
Processing -- so what would be the advantages of the first solution?
* Existing RDF-based systems can use an off-the-shelf RDFa or eRDF
parser and get the same data model (RDF triples of FOAF predicates) that
they were already getting from the XML and Turtle RDF serializations,
reducing the amount of additional work that must be done to consume this
format.
* FOAF has an extensive vocabulary that's based on fields that have
been observed on social networking sites, while hCard is built on vCard
which has a more constrained scope intended for the sort of entries
you'd expect to find in an "address book".
* FOAF has been adopted -- usually in the RDF-XML serialization -- by
some number of social networking sites (e.g. LiveJournal) so they are
presumably already somewhat familiar with the FOAF vocabulary and may
therefore be able to adopt it more easily in the RDFa or eRDF
serializations.
Though there are of course also some disadvantages:
* Some sites are already publishing XFN and/or hCard so consuming
software would need to continue to support these in addition to
FOAF-in-HTML-somehow, which is more work than supporting only XFN and
hCard. (In other words, "XFN/hCard already work today")
* RDFa requires extensions to the HTML language, while XFN, hCard and
NLP do not.
* Many existing FOAF parsers are not actually RDF parsers but are
rather using stock XML parsers and assuming a particular tree layout, so
they would not be able to reuse any code in processing triples from RDFa
or eRDF.
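The last disadvantage can be illustrated with a small sketch. The function below is a hypothetical stand-in for the kind of FOAF consumer described: it uses a stock XML parser and assumes one particular nesting of elements. The two sample documents are invented for this example; both describe the same two people and the same "knows" relationship, but only the first matches the assumed layout.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def friend_names_by_tree_layout(rdf_xml):
    """Reads friend names by assuming one fixed nesting of FOAF elements.

    This is tree matching, not RDF parsing: it never builds triples.
    """
    root = ET.fromstring(rdf_xml)
    path = "foaf:Person/foaf:knows/foaf:Person/foaf:name"
    return [el.text for el in root.findall(path, NS)]


# Layout the function expects: the friend is nested inside foaf:knows.
NESTED = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person>
    <foaf:name>Joe Bloggs</foaf:name>
    <foaf:knows>
      <foaf:Person><foaf:name>Alice Example</foaf:name></foaf:Person>
    </foaf:knows>
  </foaf:Person>
</rdf:RDF>"""

# Same statements, different layout: the friend is a sibling node that
# foaf:knows points to through rdf:nodeID.
FLAT = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person>
    <foaf:name>Joe Bloggs</foaf:name>
    <foaf:knows rdf:nodeID="alice"/>
  </foaf:Person>
  <foaf:Person rdf:nodeID="alice">
    <foaf:name>Alice Example</foaf:name>
  </foaf:Person>
</rdf:RDF>"""

nested_result = friend_names_by_tree_layout(NESTED)
flat_result = friend_names_by_tree_layout(FLAT)
```

The tree-matching code finds the friend in the first document and nothing in the second, and it has no code path at all that could be pointed at triples coming out of an RDFa or eRDF parser.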
-------------------------------------
Is this the sort of thing you're looking for, Ian?
Much of the above section could be applied to any other RDF vocabulary
with a bit of search and replace, but I'll leave that to others since
FOAF is the only RDF vocabulary with which I have any experience.
(and if I've misrepresented any of the facts about FOAF or RDF I'm happy
to be corrected. I'm writing this only in an attempt to move the
discussion forward; I'm currently neutral on whether RDFa should be
adopted into HTML5.)
[0]http://www.foaf-project.org/
[1]http://code.google.com/apis/socialgraph/
[2]http://www.google.com/support/accounts/bin/answer.py?answer=97703&hl=en
[3]http://portablecontacts.net/
[4]http://www.gmpg.org/xfn/
attached mail follows:
On Sun, 11 Jan 2009, Martin Atkins wrote:
>
> One problem this can solve is that an agent can, given a URL that
> represents a person, extract some basic profile information such as the
> person's name along with references to other people that person knows.
> This can further be applied to allow a user who provides his own URL
> (for example, by signing in via OpenID) to bootstrap his account from
> existing published data rather than having to re-enter it.
>
> So, to distill that into a list of requirements:
>
> - Allow software agents to extract profile information for a person as often
> exposed on social networking sites from a page that "represents" that person.
>
> - Allow software agents to determine who a person lists as their friends
> given a page that "represents" that person.
>
> - Allow the above to be encoded without duplicating the data in both
> machine-readable and human-readable forms.
>
> Is this the sort of thing you're looking for, Ian?
Yes, the above is perfect. (I cut out the bits that weren't really "the
problem" from the quote above -- the above is what I'm looking for.)
The most critical part is "allow a user who provides his own URL to
bootstrap his account from existing published data rather than having to
re-enter it".
The one thing I would add would be a scenario that one would like to be
able to play out, so that we can see if our solution would enable that
scenario. For example:
"I have an account on social networking site A. I go to a new social
networking site B. I want to be able to automatically add all my
friends from site A to site B."
There are presumably other requirements, e.g. "site B must not ask the
user for the user's credentials for site A" (since that would train
people to be susceptible to phishing attacks). Also, "site A must not
publish the data in a manner that allows unrelated users to obtain
privacy-sensitive data about the user", for example we don't want to let
other users determine relationships that the user has intentionally kept
secret [1].
It's important that we have these scenarios so that we can check if the
solutions we consider are actually able to solve these problems, these
scenarios, within the constraints and requirements we have.
[1] http://w2spconf.com/2008/papers/s3p2.pdf
--
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
attached mail follows:
On Jan 11, 2009, at 14:01, Toby A Inkster wrote:
> RDFa *does not* rely on XML namespaces. RDFa relies on eight
> attributes: about, rel, rev, property, datatype, content, resource
> and typeof. It also relies on a CURIE prefix binding mechanism. In
> XHTML and SVG, RDFa happens to use XML namespaces as this mechanism,
> because they already existed and they were convenient.
Convenience is debatable. In any case, it is rather disingenuous to say
that RDFa doesn't rely on XML Namespaces when all that has been defined
so far relies on attributes whose qname contains the substring "xmlns".
> In non-XML markup languages, the route to define CURIE prefixes is
> still to be decided, though discussions tend to be leaning towards
> something like:
>
> <html prefix="dc=http://purl.org/dc/terms/ foaf=http://xmlns.com/foaf/0.1/">
> <address rel="foaf:maker" rev="foaf:made">This document was made by
> <a href="http://joe.example.com" typeof="foaf:Person"
> rel="foaf:homepage" property="foaf:name">Joe Bloggs</a>.</address>
> </html>
Unless this syntax were also used for XHTML, the above would be in
violation of the DOM Consistency Design Principle of the W3C HTML WG.
> This discussion seems to be about "should/can RDFa work in HTML5?"
> when in fact, RDFa already can and does work in HTML5 - there are
> approaching a dozen interoperable implementations of RDFa, the
> majority of which seem to handle non-XHTML HTML.
Those implementations violate the software implementation reuse
principle that motivates the DOM Consistency Design Principle. (The
software reuse principle being that the same code path be used for both
HTML and XHTML on layers higher than the parser.)
The prefix mapping mechanism of CURIEs has been designed with disregard
towards this software reuse principle (in use in Gecko, WebKit and, I
gather, Presto), which should have been known to anyone working on
Web-related specs long before "DOM Consistency" was written into the
Design Principles of the HTML WG.
--
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
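For readers unfamiliar with CURIE prefix binding, the following is a minimal sketch of how a consumer might expand a CURIE such as foaf:maker using the tentative prefix="..." attribute syntax quoted above. That syntax was only under discussion at the time, and the function names here are hypothetical; this is not any specification's algorithm. The point of interest for the DOM Consistency argument is that the mapping comes from an ordinary attribute value, so the same code could run over a DOM built from either text/html or application/xhtml+xml.

```python
def parse_prefix_attribute(value):
    """Parses 'name=URI name=URI ...' into a prefix-to-URI mapping.

    Assumes the whitespace-separated syntax from the quoted example.
    Tokens without a name, '=' or URI are ignored.
    """
    mapping = {}
    for token in value.split():
        name, sep, uri = token.partition("=")
        if sep and name and uri:
            mapping[name] = uri
    return mapping


def expand_curie(curie, prefixes):
    """Expands 'prefix:reference' to a full URI.

    Returns None when there is no colon or the prefix is not bound.
    """
    prefix, sep, reference = curie.partition(":")
    if not sep or prefix not in prefixes:
        return None
    return prefixes[prefix] + reference


prefixes = parse_prefix_attribute(
    "dc=http://purl.org/dc/terms/ foaf=http://xmlns.com/foaf/0.1/"
)
maker = expand_curie("foaf:maker", prefixes)
unbound = expand_curie("vcard:fn", prefixes)
```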
attached mail follows:
Martin Atkins wrote:
> * Some sites are already publishing XFN and/or hCard so consuming
> software would need to continue to support these in addition to
> FOAF-in-HTML-somehow, which is more work than supporting only XFN and
> hCard.
Mitigating this, though, is GRDDL, which allows the hCard+XFN to be
parsed using a subset of FOAF (e.g. http://weborganics.co.uk/hFoaF/) and
thus merged with FOAF available as RDF/XML, RDFa, etc.
--
Toby A Inkster
<mailto:mail@tobyinkster.co.uk>
<http://tobyinkster.co.uk>
attached mail follows:
Martin Atkins wrote:
> One problem this can solve is that an agent can, given a URL that
> represents a person, extract some basic profile information such as the
> person's name along with references to other people that person knows.
> This can further be applied to allow a user who provides his own URL
> (for example, by signing in via OpenID) to bootstrap his account from
> existing published data rather than having to re-enter it.
>
> So, to distill that into a list of requirements:
>
> - Allow software agents to extract profile information for a person as often
> exposed on social networking sites from a page that "represents" that person.
>
> - Allow software agents to determine who a person lists as their friends
> given a page that "represents" that person.
>
> - Allow the above to be encoded without duplicating the data in both
> machine-readable and human-readable forms.
>
> Is this the sort of thing you're looking for, Ian?
>
> Much of the above section could be applied to any other RDF vocabulary
> with a bit of search and replace, but I'll leave that to others since
> FOAF is the only RDF vocabulary with which I have any experience.
Why must we restrict the use case to a single vocabulary
or analyze all the possible vocabularies?
I think it would be better to "generalize" the problem
and find a unique solution for human/machine.
I tried to set this out here...
http://lists.w3.org/Archives/Public/public-html/2009Jan/0082.html
...where the fundamental problem is described in this way:
- User agents must allow users to see that there are "semantic-links"
(connections to semantically structured information) in an HTML
document/application. Consequently, user agents must allow users to
"follow" the semantic-link (access/interact with the linked data,
embedded or external), and this involves primarily the ability to:
a) view the information
b) select the information
c) copy the information to the clipboard
d) drag and drop the information
e) send that information to another web application (or to OS
applications) selected by the user.
--
Giovanni Gentili
attached mail follows:
Giovanni Gentili wrote:
> Why must we restrict the use case to a single vocabulary
> or analyze all the possible vocabularies?
>
> I think it would be better to "generalize" the problem
> and find a unique solution for human/machine.
The issue when trying to abstract problems is that you can end up doing
"architecture astronautics"; you concentrate on making generic ways to
build solutions to weakly constrained problems without any attention to
the details of those problems that make them unique. The solutions that
are so produced often have the theoretical capacity to solve broad
classes of problem, but are often found to be poor at solving any
specific individual problem.
By looking at actual use cases we can hope to retain enough detail in
the requirements that we satisfy at least some use cases well, rather
than wasting our time building huge follies that serve no practical
purpose to anyone.
attached mail follows:
James Graham:
> The issue when trying to abstract problems is that you can end up doing
> "architecture astronautics"; you concentrate on making generic ways to build
> solutions to weakly constrained problems without any attention to the
> details of those problems that make them unique.
I think the right level, like in my proposal,
is well below "astronautics"
but not so low as "single vocabularies".
--
Giovanni Gentili
attached mail follows:
Giovanni Gentili wrote:
> James Graham:
>> The issue when trying to abstract problems is that you can end up doing
>> "architecture astronautics"; you concentrate on making generic ways to build
>> solutions to weakly constrained problems without any attention to the
>> details of those problems that make them unique.
>
> I think the right level, like in my proposal,
> is well below "astronautics"
> but not so low as "single vocabularies".
>
I rather disagree. How we interact with information depends
fundamentally on the type of information. If the information is a set of
geographical coordinates, for example, the set of useful interactions is
rather different from that for a bibliographic entry. Trying to pretend
that the two problems are just interchangeable instances of the same
"semantically structured information" problem is likely to hide the
important distinctions between the two problem domains.
attached mail follows:
Per discussion with Ian, I am posting a link to my take on the RDFa
discussion to this list.
http://realtech.burningbird.net/semantic-web/semantic-markup/stop-justifying-rdfa
Thank you
Shelley Powers
attached mail follows:
The debate about RDFa highlights a disconnect in the decision making related to HTML5. The purpose behind RDFa is to provide a way to embed complex information into a web document, in such a way that a machine can extract this information and combine it with other data extracted from other web pages. It is not a way to document private data, or data that is meant to be used by some JavaScript-based application. The sole purpose of the data is for external extraction and combination. An earlier email between Martin Atkins and Ian Hickson had the following: "On Sun, 11 Jan 2009, Martin Atkins wrote: > > One problem this can solve is that an agent can, given a URL that > represents a person, extract some basic profile information such as the > person's name along with references to other people that person knows. > This can further be applied to allow a user who provides his own URL > (for example, by signing in via OpenID) to bootstrap his account from > existing published data rather than having to re-enter it. > > So, to distill that into a list of requirements: > > - Allow software agents to extract profile information for a person as often > exposed on social networking sites from a page that "represents" that person. > > - Allow software agents to determine who a person lists as their friends > given a page that "represents" that person. > > - Allow the above to be encoded without duplicating the data in both > machine-readable and human-readable forms. > > Is this the sort of thing you're looking for, Ian? Yes, the above is perfect. (I cut out the bits that weren't really "the problem" from the quote above -- the above is what I'm looking for.) The most critical part is "allow a user who provides his own URL to bootstrap his account from existing published data rather than having to re-enter it". The one thing I would add would be a scenario that one would like to be able to play out, so that we can see if our solution would enable that scenario. 
For example:
"I have an account on social networking site A. I go to a new social
networking site B. I want to be able to automatically add all my
friends from site A to site B."
There are presumably other requirements, e.g. "site B must not ask the
user for the user's credentials for site A" (since that would train
people to be susceptible to phishing attacks). Also, "site A must not
publish the data in a manner that allows unrelated users to obtain
privacy-sensitive data about the user", for example we don't want to let
other users determine relationships that the user has intentionally kept
secret [1].
It's important that we have these scenarios so that we can check if the
solutions we consider are actually able to solve these problems, these
scenarios, within the constraints and requirements we have."
It would seem that Ian agrees both that a) there is a need to provide a
way to document complex information in a consistent, machine-readable
form, and that b) the purpose of this data is external consumption
rather than internal use. Where the disconnect comes in is that he
believes that RDF, and its web page serialization technique, RDFa, are
only one of a set of possible solutions. Yet at the same time, he
references how the MathML and SVG people provide sufficient use cases to
justify the inclusion of both of these into HTML5.
But what is MathML? What does it solve? A way to include mathematical
formulas in a document in a formatted manner. What is SVG? A way to
embed vector graphics into a web page, in such a way that the individual
elements described by the graphics can become part of the overall DOM.
So, why accept that we have to use MathML in order to solve the problems
of formatting mathematical formulas? Why not start from scratch, and
devise a new approach?
So, why accept that we have to use SVG in order to solve the problems of
vector graphics? Why not start from scratch, and devise a new approach?
Come to think of it, I think we should also question the use of the
canvas element. After all, if the problem set is that we need the
ability to animate graphics in a web page using a non-proprietary
technology, then wouldn't something like SVG work for this purpose?
Isn't the canvas element redundant? But then, perhaps we should start
over from the beginning and just create a new graphics capability from
scratch, and reject both canvas and SVG.
We don't reject MathML, though. Neither do we reject SVG or canvas. Or
any other of a number of entities being included in HTML5, including
SQL. Why? Because they have a history of use, extensive documentation as
to purpose and behavior, and there are a considerable number of
implementations that support the specifications. It doesn't make sense
to start from scratch. It makes more sense to make use of what already
works.
I have to ask, then: why do we isolate RDF and RDFa for special
handling? If we can accept that SQL is a natural database query
mechanism, and SVG is a natural for vector graphics, and the canvas
element is the proper choice for script-enabled bitmaps, and
MathML...well, you get the picture -- if we can accept these mature,
well documented representatives of each of their genres as the de facto
implementation, enough to incorporate each into HTML5, why then do we
demand that RDF and its web page serialization technique, RDFa, must
"prove" themselves, when we don't demand the same from other external
objects and specifications?
To do so is not consistent. To continue to do so demonstrates that
perhaps other issues are at play in regards to RDF/RDFa.
Martin provided a use case that Ian acknowledges is justified. Ipso
facto, we do not need to continue providing use cases for this type of
requirement.
We have established that the requirement/need/desire to incorporate data
into a web page that is consistently machine readable, which can be
consistently extracted, and consistently combined with data from other
documents using automated processes is a legitimate need. RDF was
designed specifically for this purpose, is a mature specification with
extensive documentation, and one can find many different implementations
of its use. The use of RDF for FOAF is just one of many uses; RSS 1.0
was another, as are a version of RDF embedded within photos and CC
licensing. These are all based on the same model.
In other words, if we accept that SVG is the de facto implementation of
vector graphics (as compared to something such as, say, VML), and we
accept the same for MathML, the canvas element, SQL, and so on, then to
not accept RDF as the de facto implementation for the purpose for which
it was designed is to single out RDF/RDFa for "special handling" within
the group. It is to demand more from it than has been demanded from any
other element included in HTML5.
In particular, as has been documented elsewhere, very little is needed
to support RDFa within HTML5. The requirements are much less than those
for the canvas element, SVG, MathML, and even SQL. So the task, itself,
is not daunting. Not as daunting as, say, the alt attribute.
This then returns us to my earlier supposition:
To not support RDF/RDFa as the de facto implementation of complex,
structured data is not consistent. To continue to do so demonstrates
that perhaps other issues are at play in regards to RDF/RDFa.
Such inconsistencies are not in the best interest when developing a new
specification meant for widespread use on the web.
If, as I believe, the inconsistency reflects an underlying bias against the concept behind RDF, which is that true web semantics is based on structured data, not natural language processing, or not exclusively based on natural language processing, then I believe it's important to highlight such bias, and deal with it accordingly. Shelley
attached mail follows:
On Sat, Jan 17, 2009 at 11:55 AM, Shelley Powers
<shelleyp@burningbird.net> wrote:
> The debate about RDFa highlights a disconnect in the decision making related
> to HTML5.
Perhaps. Or perhaps not. I am far from an apologist for Hixie (nor,
for that matter, am I a strong advocate for RDF), but I offer the
following question and observation.
> The purpose behind RDFa is to provide a way to embed complex information
> into a web document, in such a way that a machine can extract this
> information and combine it with other data extracted from other web pages.
> It is not a way to document private data, or data that is meant to be used
> by some JavaScript-based application. The sole purpose of the data is for
> external extraction and combination.
So, I take it that it isn't essential that RDFa information be
included in the DOM? This is not rhetorical: I honestly don't know
the answer to this question.
> So, why accept that we have to use MathML in order to solve the problems of
> formatting mathematical formula? Why not start from scratch, and devise a
> new approach?
Ian explored (and answered) that here:
http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2008-April/014372.html
Key to Ian's decision was the importance of DOM integration for this
vocabulary. If DOM integration is essential for RDFa, then perhaps
the same principles apply. If not, perhaps some other principles may
apply.
- Sam Ruby
attached mail follows:
On 17/1/09 19:27, Sam Ruby wrote: > On Sat, Jan 17, 2009 at 11:55 AM, Shelley Powers > <shelleyp@burningbird.net> wrote: >> The debate about RDFa highlights a disconnect in the decision making related >> to HTML5. > > Perhaps. Or perhaps not. I am far from an apologist for Hixie, (nor > for that matter and I a strong advocate for RDF), but I offer the > following question and observation. > >> The purpose behind RDFa is to provide a way to embed complex information >> into a web document, in such a way that a machine can extract this >> information and combine it with other data extracted from other web pages. >> It is not a way to document private data, or data that is meant to be used >> by some JavaScript-based application. The sole purpose of the data is for >> external extraction and combination. > > So, I take it that it isn't essential that RDFa information be > included in the DOM? This is not rhetorical: I honestly don't know > the answer to this question. Good question. I for one expect RDFa to be accessible to Javascript. http://code.google.com/p/rdfquery/wiki/Introduction -> http://rdfquery.googlecode.com/svn/trunk/demos/markup/markup.html is a nice example of code that does something useful in this way. cheers, Dan -- http://danbri.org/
attached mail follows:
Dan Brickley wrote: > On 17/1/09 19:27, Sam Ruby wrote: >> On Sat, Jan 17, 2009 at 11:55 AM, Shelley Powers >> <shelleyp@burningbird.net> wrote: >>> The debate about RDFa highlights a disconnect in the decision making >>> related >>> to HTML5. >> >> Perhaps. Or perhaps not. I am far from an apologist for Hixie, (nor >> for that matter and I a strong advocate for RDF), but I offer the >> following question and observation. >> >>> The purpose behind RDFa is to provide a way to embed complex >>> information >>> into a web document, in such a way that a machine can extract this >>> information and combine it with other data extracted from other web >>> pages. >>> It is not a way to document private data, or data that is meant to >>> be used >>> by some JavaScript-based application. The sole purpose of the data >>> is for >>> external extraction and combination. >> >> So, I take it that it isn't essential that RDFa information be >> included in the DOM? This is not rhetorical: I honestly don't know >> the answer to this question. > > Good question. I for one expect RDFa to be accessible to Javascript. > > http://code.google.com/p/rdfquery/wiki/Introduction -> > http://rdfquery.googlecode.com/svn/trunk/demos/markup/markup.html is a > nice example of code that does something useful in this way. > > cheers, > > Dan > I agree, and appreciate Dan for pointing out a specific instance of use. Apologies for not making the assertion explicit. Shelley > -- > http://danbri.org/ >
attached mail follows:
On Sat, Jan 17, 2009 at 1:33 PM, Dan Brickley <danbri@danbri.org> wrote: > On 17/1/09 19:27, Sam Ruby wrote: >> >> On Sat, Jan 17, 2009 at 11:55 AM, Shelley Powers >> <shelleyp@burningbird.net> wrote: >>> >>> The debate about RDFa highlights a disconnect in the decision making >>> related >>> to HTML5. >> >> Perhaps. Or perhaps not. I am far from an apologist for Hixie, (nor >> for that matter and I a strong advocate for RDF), but I offer the >> following question and observation. >> >>> The purpose behind RDFa is to provide a way to embed complex information >>> into a web document, in such a way that a machine can extract this >>> information and combine it with other data extracted from other web >>> pages. >>> It is not a way to document private data, or data that is meant to be >>> used >>> by some JavaScript-based application. The sole purpose of the data is for >>> external extraction and combination. >> >> So, I take it that it isn't essential that RDFa information be >> included in the DOM? This is not rhetorical: I honestly don't know >> the answer to this question. > > Good question. I for one expect RDFa to be accessible to Javascript. > > http://code.google.com/p/rdfquery/wiki/Introduction -> > http://rdfquery.googlecode.com/svn/trunk/demos/markup/markup.html is a nice > example of code that does something useful in this way. The fact that this works anywhere at all today implies that little, if any, changes to browsers is required in order to support this. Is that a fair statement? I've not taken a look at the code, but have taken a quick glance at the output using IE8.0.7000.0 beta, Safari 3.2.1/Windows, Chrome 1.0.154.43, Opera 9.63, and Firefox 3.0.5. The page is different (as in less functional) under IE8 and Safari. Is there something that they need to do which is not already covered in the HTML5 specification in order to support this? - Sam Ruby
attached mail follows:
Sam Ruby wrote: > On Sat, Jan 17, 2009 at 1:33 PM, Dan Brickley <danbri@danbri.org> wrote: > >> On 17/1/09 19:27, Sam Ruby wrote: >> >>> On Sat, Jan 17, 2009 at 11:55 AM, Shelley Powers >>> <shelleyp@burningbird.net> wrote: >>> >>>> The debate about RDFa highlights a disconnect in the decision making >>>> related >>>> to HTML5. >>>> >>> Perhaps. Or perhaps not. I am far from an apologist for Hixie, (nor >>> for that matter and I a strong advocate for RDF), but I offer the >>> following question and observation. >>> >>> >>>> The purpose behind RDFa is to provide a way to embed complex information >>>> into a web document, in such a way that a machine can extract this >>>> information and combine it with other data extracted from other web >>>> pages. >>>> It is not a way to document private data, or data that is meant to be >>>> used >>>> by some JavaScript-based application. The sole purpose of the data is for >>>> external extraction and combination. >>>> >>> So, I take it that it isn't essential that RDFa information be >>> included in the DOM? This is not rhetorical: I honestly don't know >>> the answer to this question. >>> >> Good question. I for one expect RDFa to be accessible to Javascript. >> >> http://code.google.com/p/rdfquery/wiki/Introduction -> >> http://rdfquery.googlecode.com/svn/trunk/demos/markup/markup.html is a nice >> example of code that does something useful in this way. >> > > The fact that this works anywhere at all today implies that little, if > any, changes to browsers is required in order to support this. Is > that a fair statement? > > I've not taken a look at the code, but have taken a quick glance at > the output using IE8.0.7000.0 beta, Safari 3.2.1/Windows, Chrome > 1.0.154.43, Opera 9.63, and Firefox 3.0.5. > > The page is different (as in less functional) under IE8 and Safari. > Is there something that they need to do which is not already covered > in the HTML5 specification in order to support this? 
>
I would think we would have to go through the code to see why this
specific instance of client-side access of the RDFa isn't working. The
debugger I'm using with IE8 shows the problem is occurring in the jQuery
code, not necessarily anything specific to the RDFa plugin.
I know other JavaScript libraries that work with RDFa work, at least
with Safari. For instance:
http://www.w3.org/2006/07/SWD/RDFa/impl/js/
Since this library was vetted for IE7, I would assume it would work for
IE8, too.
Of course, the RDFa attributes aren't incorporated into HTML5, which
means their use would result in an invalid document. And of course, if
they were incorporated, the issue of namespaces for them would have to
be addressed, as namespaces were for MathML and SVG.
Shelley
> - Sam Ruby
>
>
attached mail follows:
On Jan 17, 2009, at 20:33, Dan Brickley wrote:
> Good question. I for one expect RDFa to be accessible to Javascript.
>
> http://code.google.com/p/rdfquery/wiki/Introduction ->
> http://rdfquery.googlecode.com/svn/trunk/demos/markup/markup.html
> is a nice example of code that does something useful in this way.
Does this code run the same way on both DOMs parsed from text/html and
application/xhtml+xml in existing browsers without at any point
branching on a condition that is a DOM difference between
text/html-originated and application/xhtml+xml-originated DOMs?
--
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
attached mail follows:
Henri Sivonen wrote:
> On Jan 17, 2009, at 20:33, Dan Brickley wrote:
>
>> Good question. I for one expect RDFa to be accessible to Javascript.
>>
>> http://code.google.com/p/rdfquery/wiki/Introduction ->
>> http://rdfquery.googlecode.com/svn/trunk/demos/markup/markup.html is
>> a nice example of code that does something useful in this way.
>
>
> Does this code run the same way on both DOMs parsed from text/html and
> application/xhtml+xml in existing browsers without at any point
> branching on a condition that is a DOM difference between
> text/html-originated and application/xhtml+xml-originated DOMs?
>
I don't want to specifically look at just the one case, since it is not
working in Safari and IE8, and is too complex to debug right at this
moment.
Generally, though, RDFa is based on reusing a set of attributes already
existing in HTML5, and adding a few more. I would assume no differences
in the DOM based on XHTML or HTML. The one issue that would occur has to
do with the values assigned, not the syntax.
I put together a very crude demonstration of JavaScript access of a
specific RDFa attribute, about. It's temporary, but if you go to my main
web page, http://realtech.burningbird.net, and look in the sidebar for
the click me text, it will traverse each div element looking for an
"about" attribute, and then pop up an alert with the value of the
attribute. I would use console rather than alert, but I don't believe
all browsers support console, yet.
Access the page using Firefox, which is served the page as XHTML. Access
it with IE8, which gets the page as HTML. You can tell the difference
because my graphics are based on inline SVG, and will only show if the
page is served as XHTML.
So, yes, with my quick, crude demonstration, DOM access is the same in
both environments.
Shelley
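The traversal described in this message can be approximated offline. The sketch below is not Shelley's demonstration (that was browser JavaScript, which is not reproduced in this archive); it is a rough Python equivalent that walks markup, looks at each div, and records the value of any "about" attribute instead of showing an alert. The sample markup and example.org URLs are assumptions.

```python
from html.parser import HTMLParser


class AboutCollector(HTMLParser):
    """Records the value of the about attribute on every <div> that has one."""

    def __init__(self):
        super().__init__()
        self.subjects = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        about = dict(attrs).get("about")
        if about is not None:
            self.subjects.append(about)


# Hypothetical page fragment; only two of the three divs carry "about",
# and the <p about=...> is skipped because only div elements are checked.
SAMPLE = """
<div about="http://example.org/post/1"><p>First entry</p></div>
<div class="sidebar"><p about="http://example.org/ignored">Not a div</p></div>
<div about="http://example.org/post/2"><p>Second entry</p></div>
"""

collector = AboutCollector()
collector.feed(SAMPLE)
collector.close()
subjects = collector.subjects
```

As the reply that follows in the thread points out, a plain attribute like about is the easy case; the contested behaviour concerns attributes named xmlns:foo.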
attached mail follows:
On Jan 17, 2009, at 22:35, Shelley Powers wrote: > Generally, though, RDFa is based on reusing a set of attributes > already existing in HTML5, and adding a few more. Also, RDFa uses CURIEs which in turn use the XML namespace mapping context. > I would assume no differences in the DOM based on XHTML or HTML. The assumption is incorrect. Please compare http://hsivonen.iki.fi/test/moz/xmlns-dom.html and http://hsivonen.iki.fi/test/moz/xmlns-dom.xhtml Same bytes, different media type. > I put together a very crude demonstration of JavaScript access of a > specific RDFa attribute, about. It's temporary, but if you go to my > main web page,http://realtech.burningbird.net, and look in the > sidebar for the click me text, it will traverse each div element > looking for an "about" attribute, and then pop up an alert with the > value of the attribute. I would use console rather than alert, but I > don't believe all browsers support console, yet. This misses the point, because the inconsistency is with attributes named xmlns:foo. -- Henri Sivonen hsivonen@iki.fi http://hsivonen.iki.fi/
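[Editorial illustration, not part of the original mail.] The "same bytes, different media type" point can be approximated outside a browser. The sketch below feeds one string to a namespace-aware XML parser and to a namespace-unaware HTML tokenizer from the Python standard library. It is an analogy only: html.parser is not a browser's text/html parser, and the markup and the example.org namespace URI are hypothetical.

```python
from html.parser import HTMLParser
from xml.dom.minidom import parseString

# Hypothetical markup; the namespace URI is a placeholder.
MARKUP = '<div xmlns:foo="http://example.org/ns#">x</div>'

# XML view: the declaration becomes an attribute node in the xmlns
# namespace, with the prefix "foo" as its local name.
attr = parseString(MARKUP).documentElement.getAttributeNode("xmlns:foo")
xml_view = (attr.name, attr.localName, attr.namespaceURI)


class AttrCollector(HTMLParser):
    """Collect (name, value) pairs for every start tag seen."""

    def __init__(self):
        super().__init__()
        self.attrs = []

    def handle_starttag(self, tag, attrs):
        self.attrs.extend(attrs)


# HTML-style view: the tokenizer has no namespace concept, so the
# attribute is just a name that happens to contain a colon.
collector = AttrCollector()
collector.feed(MARKUP)
collector.close()
html_view = collector.attrs

if __name__ == "__main__":
    print("XML view: ", xml_view)
    print("HTML view:", html_view)
```

Code that branches on the local name or the namespace URI therefore sees different things depending on which parser produced the tree, which is the inconsistency the two test pages above are meant to show.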
attached mail follows:
> > The assumption is incorrect. > > Please compare > http://hsivonen.iki.fi/test/moz/xmlns-dom.html > and > http://hsivonen.iki.fi/test/moz/xmlns-dom.xhtml > > Same bytes, different media type. > >> I put together a very crude demonstration of JavaScript access of a >> specific RDFa attribute, about. It's temporary, but if you go to my >> main web page,http://realtech.burningbird.net, and look in the >> sidebar for the click me text, it will traverse each div element >> looking for an "about" attribute, and then pop up an alert with the >> value of the attribute. I would use console rather than alert, but I >> don't believe all browsers support console, yet. > > This misses the point, because the inconsistency is with attributes > named xmlns:foo. > And I also said that we would have to address the issue of namespaces, which actually may require additional effort. I said that the addition of RDFa would mean the addition of some attributes, and we would have to deal with namespace issues. Just like the HTML5 working group is having to deal with namespaces with MathML and SVG. And probably the next dozen or so innovations that come along. That is the price for not having distributed extensibility. One works the issues. I assume the same could be said of many of the newer additions to HTML5. Are you then saying that this will be a showstopper, and there will never be either a workaround or compromise? Shelley
attached mail follows:
On Jan 18, 2009, at 01:32, Shelley Powers wrote: > Are you then saying that this will be a showstopper, and there will > never be either a workaround or compromise? Are the RDFa TF open to compromises that involve changing the XHTML side of RDFa not to use attributes whose qualified names contain a colon, so as to achieve DOM Consistency by changing RDFa instead of changing parsing? -- Henri Sivonen hsivonen@iki.fi http://hsivonen.iki.fi/
attached mail follows:
On 18/1/09 19:34, Henri Sivonen wrote: > On Jan 18, 2009, at 01:32, Shelley Powers wrote: > >> Are you then saying that this will be a showstopper, and there will >> never be either a workaround or compromise? > > > Are the RDFa TF open to compromises that involve changing the XHTML side > of RDFa not to use attribute whose qualified name has a colon in them to > achieve DOM Consistency by changing RDFa instead of changing parsing? I don't believe the RDFa TF are in a position to singlehandedly rescind a W3C Recommendation, ie. http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014/. What they presumably could do is propose new work items within W3C, which I'd guess would be more likely to be accepted if it had the active enthusiasm of the core HTML5 team. Am cc:'ing TimBL here who might have something more to add. Do you have an alternative design in mind, for expressing the namespace mappings? cheers, Dan -- http://danbri.org/
attached mail follows:
On Jan 18, 2009, at 20:48, Dan Brickley wrote: > On 18/1/09 19:34, Henri Sivonen wrote: >> On Jan 18, 2009, at 01:32, Shelley Powers wrote: >> >>> Are you then saying that this will be a showstopper, and there will >>> never be either a workaround or compromise? >> >> >> Are the RDFa TF open to compromises that involve changing the XHTML >> side >> of RDFa not to use attribute whose qualified name has a colon in >> them to >> achieve DOM Consistency by changing RDFa instead of changing parsing? > > I don't believe the RDFa TF are in a position to singlehandedly > rescind a W3C Recommendation, ie. http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014/ > . > > What they presumably could do is propose new work items within W3C, > which I'd guess would be more likely to be accepted if it had the > active enthusiasm of the core HTML5 team. Am cc:'ing TimBL here who > might have something more to add. > > Do you have an alternative design in mind, for expressing the > namespace mappings? The simplest thing is not to have mappings but to put the corresponding absolute URI wherever RDFa uses a CURIE. -- Henri Sivonen hsivonen@iki.fi http://hsivonen.iki.fi/
attached mail follows:
On 18/1/09 20:07, Henri Sivonen wrote: > On Jan 18, 2009, at 20:48, Dan Brickley wrote: > >> On 18/1/09 19:34, Henri Sivonen wrote: >>> On Jan 18, 2009, at 01:32, Shelley Powers wrote: >>> >>>> Are you then saying that this will be a showstopper, and there will >>>> never be either a workaround or compromise? >>> >>> >>> Are the RDFa TF open to compromises that involve changing the XHTML side >>> of RDFa not to use attribute whose qualified name has a colon in them to >>> achieve DOM Consistency by changing RDFa instead of changing parsing? >> >> I don't believe the RDFa TF are in a position to singlehandedly >> rescind a W3C Recommendation, ie. >> http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014/. >> >> What they presumably could do is propose new work items within W3C, >> which I'd guess would be more likely to be accepted if it had the >> active enthusiasm of the core HTML5 team. Am cc:'ing TimBL here who >> might have something more to add. >> >> Do you have an alternative design in mind, for expressing the >> namespace mappings? > > The simplest thing is not to have mappings but to put the corresponding > absolute URI wherever RDFa uses a CURIE. So this would be a kind of "interoperability profile" of RDFa, where certain features approved of by REC-rdfa-syntax-20081014 wouldn't be used in some hypothetical HTML5 RDFa. If people can control their urge to use namespace abbreviations, and stick to URIs directly, would this make your DOM-oriented concerns go away? cheers, Dan -- http://danbri.org/
attached mail follows:
Dan Brickley wrote: > On 18/1/09 20:07, Henri Sivonen wrote: >> On Jan 18, 2009, at 20:48, Dan Brickley wrote: >> >>> On 18/1/09 19:34, Henri Sivonen wrote: >>>> On Jan 18, 2009, at 01:32, Shelley Powers wrote: >>>> >>>>> Are you then saying that this will be a showstopper, and there will >>>>> never be either a workaround or compromise? >>>> >>>> >>>> Are the RDFa TF open to compromises that involve changing the XHTML >>>> side >>>> of RDFa not to use attribute whose qualified name has a colon in >>>> them to >>>> achieve DOM Consistency by changing RDFa instead of changing parsing? >>> >>> I don't believe the RDFa TF are in a position to singlehandedly >>> rescind a W3C Recommendation, ie. >>> http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014/. >>> >>> What they presumably could do is propose new work items within W3C, >>> which I'd guess would be more likely to be accepted if it had the >>> active enthusiasm of the core HTML5 team. Am cc:'ing TimBL here who >>> might have something more to add. >>> >>> Do you have an alternative design in mind, for expressing the >>> namespace mappings? >> >> The simplest thing is not to have mappings but to put the corresponding >> absolute URI wherever RDFa uses a CURIE. > > So this would be a kind of "interoperability profile" of RDFa, where > certain features approved of by REC-rdfa-syntax-20081014 wouldn't be > used in some hypothetical HTML5 RDFa. > > If people can control their urge to use namespace abbreviations, and > stick to URIs directly, would this make your DOM-oriented concerns go > away? Took five minutes to make this change in my template. Ran through validator.nu. Results: Doesn't like the content-type. Didn't like profile on head. Having to remove the profile attribute in my head element limits usability, but I'm not going to throw myself on the sword for this one. Doesn't like property, doesn't like about. These are the RDFa attributes I'm using. The RDF extractor doesn't care that I used the URIs directly. 
Didn't seem to mind SVG, but a value of "none" is a valid value for preserveAspectRatio. Shelley > > cheers, > > Dan > > -- > http://danbri.org/ >
attached mail follows:
On 18/1/09 21:04, Shelley Powers wrote: > Dan Brickley wrote: >> On 18/1/09 20:07, Henri Sivonen wrote: >>> On Jan 18, 2009, at 20:48, Dan Brickley wrote: >>> >>>> On 18/1/09 19:34, Henri Sivonen wrote: >>>>> On Jan 18, 2009, at 01:32, Shelley Powers wrote: >>>>> >>>>>> Are you then saying that this will be a showstopper, and there will >>>>>> never be either a workaround or compromise? >>>>> >>>>> >>>>> Are the RDFa TF open to compromises that involve changing the XHTML >>>>> side >>>>> of RDFa not to use attribute whose qualified name has a colon in >>>>> them to >>>>> achieve DOM Consistency by changing RDFa instead of changing parsing? >>>> >>>> I don't believe the RDFa TF are in a position to singlehandedly >>>> rescind a W3C Recommendation, ie. >>>> http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014/. >>>> >>>> What they presumably could do is propose new work items within W3C, >>>> which I'd guess would be more likely to be accepted if it had the >>>> active enthusiasm of the core HTML5 team. Am cc:'ing TimBL here who >>>> might have something more to add. >>>> >>>> Do you have an alternative design in mind, for expressing the >>>> namespace mappings? >>> >>> The simplest thing is not to have mappings but to put the corresponding >>> absolute URI wherever RDFa uses a CURIE. >> >> So this would be a kind of "interoperability profile" of RDFa, where >> certain features approved of by REC-rdfa-syntax-20081014 wouldn't be >> used in some hypothetical HTML5 RDFa. >> >> If people can control their urge to use namespace abbreviations, and >> stick to URIs directly, would this make your DOM-oriented concerns go >> away? > > Took five minutes to make this change in my template. Ran through > validator.nu. Results: > > Doesn't like the content-type. Didn't like profile on head. Having to > remove the profile attribute in my head element limits usability, but > I'm not going to throw myself on the sword for this one. > > Doesn't like property, doesn't like about. 
These are the RDFa attributes > I'm using. The RDF extractor doesn't care that I used the URIs directly. This sounds encouraging. Thanks for taking the time to try the experiment, Shelley. But ... to be clear, are you putting full URIs in the @property attribute too? In http://www.w3.org/TR/rdfa-syntax/#s_curieprocessing it says '@property, @datatype and @typeof support only CURIE values.' (Can you post an example?) Reading ... """Many of the attributes that hold URIs are also able to carry 'compact URIs' or CURIEs. A CURIE is a convenient way to represent a long URI, by replacing a leading section of the URI with a substitution token. It's possible for authors to define a number of substitution tokens as they see fit; the full URI is obtained by locating the mapping defined by a token from a list of in-scope tokens, and then simply concatenating the second part of the CURIE onto the mapped value.""" ... I guess the fact that @property is supposed to be CURIE-only isn't a problem with parsers since this can be understood as a CURIE with no (or empty) substitution token. cheers, Dan -- http://danbri.org/
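[Editorial illustration, not part of the original mail.] The CURIE expansion quoted above (look up the substitution token, then concatenate the remainder onto the mapped value) can be sketched as a small function. This is a minimal sketch, not the algorithm from the RDFa Recommendation: the function name expand_curie, the splitting on the first colon, and the error handling are assumptions made for illustration, and the prefix table is supplied by hand rather than read from xmlns: attributes.

```python
def expand_curie(curie, mappings):
    """Expand 'prefix:reference' using a prefix -> URI-start table.

    Hypothetical helper for illustration only: it splits on the first
    colon and raises KeyError when no mapping is in scope.
    """
    prefix, colon, reference = curie.partition(":")
    if not colon or prefix not in mappings:
        raise KeyError("no in-scope mapping for %r" % curie)
    # "simply concatenating the second part of the CURIE onto the mapped value"
    return mappings[prefix] + reference


DC = {"dc": "http://purl.org/dc/elements/1.1/"}
FULL = "http://purl.org/dc/elements/1.1/title"

if __name__ == "__main__":
    print(expand_curie("dc:title", DC))
    # A full URI read naively as a CURIE has the "prefix" http, so it only
    # resolves if something maps that prefix back onto "http:".
    print(expand_curie(FULL, {"http": "http:"}))
```

Under this naive reading, a full URI placed in a CURIE-only attribute is not a CURIE with an empty token: its token is "http". That may be why the xmlns:http mapping discussed elsewhere in this thread makes such values resolve in some parsers.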
attached mail follows:
Dan Brickley wrote: > On 18/1/09 21:04, Shelley Powers wrote: >> Dan Brickley wrote: >>> On 18/1/09 20:07, Henri Sivonen wrote: >>>> On Jan 18, 2009, at 20:48, Dan Brickley wrote: >>>> >>>>> On 18/1/09 19:34, Henri Sivonen wrote: >>>>>> On Jan 18, 2009, at 01:32, Shelley Powers wrote: >>>>>> >>>>>>> Are you then saying that this will be a showstopper, and there will >>>>>>> never be either a workaround or compromise? >>>>>> >>>>>> >>>>>> Are the RDFa TF open to compromises that involve changing the XHTML >>>>>> side >>>>>> of RDFa not to use attribute whose qualified name has a colon in >>>>>> them to >>>>>> achieve DOM Consistency by changing RDFa instead of changing >>>>>> parsing? >>>>> >>>>> I don't believe the RDFa TF are in a position to singlehandedly >>>>> rescind a W3C Recommendation, ie. >>>>> http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014/. >>>>> >>>>> What they presumably could do is propose new work items within W3C, >>>>> which I'd guess would be more likely to be accepted if it had the >>>>> active enthusiasm of the core HTML5 team. Am cc:'ing TimBL here who >>>>> might have something more to add. >>>>> >>>>> Do you have an alternative design in mind, for expressing the >>>>> namespace mappings? >>>> >>>> The simplest thing is not to have mappings but to put the >>>> corresponding >>>> absolute URI wherever RDFa uses a CURIE. >>> >>> So this would be a kind of "interoperability profile" of RDFa, where >>> certain features approved of by REC-rdfa-syntax-20081014 wouldn't be >>> used in some hypothetical HTML5 RDFa. >>> >>> If people can control their urge to use namespace abbreviations, and >>> stick to URIs directly, would this make your DOM-oriented concerns go >>> away? >> >> Took five minutes to make this change in my template. Ran through >> validator.nu. Results: >> >> Doesn't like the content-type. Didn't like profile on head. 
Having to >> remove the profile attribute in my head element limits usability, but >> I'm not going to throw myself on the sword for this one. >> >> Doesn't like property, doesn't like about. These are the RDFa attributes >> I'm using. The RDF extractor doesn't care that I used the URIs directly. > > This sounds encouraging. Thanks for taking the time to try the > experiment, Shelley. But ... to be clear, are you putting full URIs > in the @property attribute too? In > http://www.w3.org/TR/rdfa-syntax/#s_curieprocessing it says > '@property, @datatype and @typeof support only CURIE values.' > > (Can you post an example?) > > Reading ... > """Many of the attributes that hold URIs are also able to carry > 'compact URIs' or CURIEs. A CURIE is a convenient way to represent a > long URI, by replacing a leading section of the URI with a > substitution token. It's possible for authors to define a number of > substitution tokens as they see fit; the full URI is obtained by > locating the mapping defined by a token from a list of in-scope > tokens, and then simply concatenating the second part of the CURIE > onto the mapped value.""" > > ... I guess the fact that @property is supposed to be CURIE-only isn't > a problem with parsers since this can be understood as a CURIE with no > (or empty) substitution token. I apologize for wasting this group's time. I misunderstood the RDFa documentation myself, and am using full URIs within the property attribute, too. I guess when I validated my RDFa only page (http://missourigreen.burningbird.net) with the W3C validator, and it gave me a valid result for RDFa, I assumed I was doing it correctly. Oddly enough, the RDF extractor worked with my erroneous use, too. This presents a dilemma, as I don't now know how to represent the RDFa that will work either with the way the standard is now, or some nebulous future time in an acceptable format for HTML5. 
I'm embarrassed that I wasted the group's time, especially when I obviously don't have the abilities to contribute to the group, or to participate. I'll refrain from responding to any future email. Shelley > > cheers, > > Dan > > -- > http://danbri.org/ >
attached mail follows:
Dan Brickley wrote: > On 19/1/09 15:42, Henri Sivonen wrote: > >>> I've been making some ill-documented tests in >>> http://svn.foaf-project.org/foaftown/2009/rdfa/tests/ ... trying to >>> find middle ground between current RDFa parser behaviour and something >>> that can work in HTML5. >> >> Thanks. > > (current svn mime types now documented in > http://svn.foaf-project.org/foaftown/2009/rdfa/tests/mime.sh) > >>> It does seem that the RDFa tools should (although they don't all >>> currently) require the ' xmlns:http="http:" ' hack. In other words I >>> was over-optimistic in thinking this was legal RDFa without the >>> xmlns:http hack. But that's so clearly a hack that I can imagine an >>> errata being possible. >> >> Do they 'work', though, without it being 'legal'? > > I started trying to answer that in this email, but it quickly became > long and tangled. I tried t6.html (html5-ish, served as text/html, > using xmlns:http='http' hack) and t7.html (as t6.html minus the xmlns > part). The results of trying in 6 different RDFa parsers are roughly: > > * more than half can swallow this "verbose form" of RDFa, so long as > the xmlns: is present. > * none of the parsers are happy if the xmlns:http is removed > * I didn't test a version using XHTML boilerplate or mimetype yet I do have a site you can use for testing with the XHTML mimetype, but you might have to give me advice on what to change. It's at http://missourigreen.burningbird.net. I've been adding and modifying based on the emails, but at the moment am using the RDFa DOCTYPE, as this seems to work with my ARC2 library, which doesn't like the namespace workaround. This site doesn't use SVG, so it allows testing on just the RDFa stuff. Shelley
attached mail follows:
On Jan 18, 2009, at 21:45, Dan Brickley wrote: > If people can control their urge to use namespace abbreviations, and > stick to URIs directly, would this make your DOM-oriented concerns > go away? Yes, it would make my DOM Consistency concern go away if the urge were thus controlled for both HTML and XHTML. -- Henri Sivonen hsivonen@iki.fi http://hsivonen.iki.fi/
attached mail follows:
On Sun, Jan 18, 2009 at 1:34 PM, Henri Sivonen <hsivonen@iki.fi> wrote: > On Jan 18, 2009, at 01:32, Shelley Powers wrote: > >> Are you then saying that this will be a showstopper, and there will never >> be either a workaround or compromise? > > Are the RDFa TF open to compromises that involve changing the XHTML side of > RDFa not to use attribute whose qualified name has a colon in them to > achieve DOM Consistency by changing RDFa instead of changing parsing? Just so that we have all of the data available to make an informed decision, do we have examples of how it would "break the web" if attributes which started with the characters "xmlns:" (and *only* those attributes) were placed into the DOM exactly as they would be when those bytes are processed as XHTML? Notes: I am *not* suggesting anything just yet, other than the gathering of this data. I also recognize that this would require a parsing change by browser vendors, which is also a cost that needs to be factored in. But right now, I am interested in how it would affect the web if this were done. > -- > Henri Sivonen > hsivonen@iki.fi > http://hsivonen.iki.fi/ - Sam Ruby
attached mail follows:
On Sat, Jan 17, 2009 at 5:51 PM, Henri Sivonen <hsivonen@iki.fi> wrote: > On Jan 17, 2009, at 22:35, Shelley Powers wrote: > >> Generally, though, RDFa is based on reusing a set of attributes already >> existing in HTML5, and adding a few more. > > Also, RDFa uses CURIEs which in turn use the XML namespace mapping context. > >> I would assume no differences in the DOM based on XHTML or HTML. > > The assumption is incorrect. > > Please compare > http://hsivonen.iki.fi/test/moz/xmlns-dom.html > and > http://hsivonen.iki.fi/test/moz/xmlns-dom.xhtml > > Same bytes, different media type. The W3C Recommendation for DOM also describes a readonly attribute on Attr named 'name'. Discuss. >> I put together a very crude demonstration of JavaScript access of a >> specific RDFa attribute, about. It's temporary, but if you go to my main web >> page,http://realtech.burningbird.net, and look in the sidebar for the click >> me text, it will traverse each div element looking for an "about" attribute, >> and then pop up an alert with the value of the attribute. I would use >> console rather than alert, but I don't believe all browsers support console, >> yet. > > This misses the point, because the inconsistency is with attributes named > xmlns:foo. There is a similar inconsistency in how xml:lang is handled. Discuss. > -- > Henri Sivonen > hsivonen@iki.fi > http://hsivonen.iki.fi/ - Sam Ruby
attached mail follows:
On Jan 18, 2009, at 02:02, Sam Ruby wrote: > On Sat, Jan 17, 2009 at 5:51 PM, Henri Sivonen <hsivonen@iki.fi> > wrote: >> On Jan 17, 2009, at 22:35, Shelley Powers wrote: >> >>> Generally, though, RDFa is based on reusing a set of attributes >>> already >>> existing in HTML5, and adding a few more. >> >> Also, RDFa uses CURIEs which in turn use the XML namespace mapping >> context. >> >>> I would assume no differences in the DOM based on XHTML or HTML. >> >> The assumption is incorrect. >> >> Please compare >> http://hsivonen.iki.fi/test/moz/xmlns-dom.html >> and >> http://hsivonen.iki.fi/test/moz/xmlns-dom.xhtml >> >> Same bytes, different media type. > > The W3C Recommendation for DOM also describes a readonly attribute on > Attr named 'name'. Discuss. I have added this to the test cases. In the DOM API, you can use the namespace-unaware DOM Level 1 view to make both cases look the same upon getting a parser-inserted value. (This is, of course, totally against namespace-aware programming practices, and in non-browser apps, the API might not even expose qnames or higher-level technologies like RELAX NG or XPath can't trigger on them.) But it's too early to declare victory. Surely we want also scripted setters that mutate the DOM into a state that could have been the result of a parse. Now we have tentatively seen that DOM Level 1 APIs seem to do what we want. So let's try using setAttribute(): http://hsivonen.iki.fi/test/moz/xmlns-dom-setter.html The result looks the same as the HTML case earlier: http://hsivonen.iki.fi/test/moz/xmlns-dom.html But now, the XHTML side using the setter: http://hsivonen.iki.fi/test/moz/xmlns-dom-setter.xhtml ...gives a result that is different from the parser-inserted attribute XHTML: http://hsivonen.iki.fi/test/moz/xmlns-dom.xhtml Furthermore, the resulting DOM is no longer serializable as XML 1.0. 
So let's move to a less intuitive case and use the namespace-aware Level 2 setter while assuming the use of the namespace-unaware Level 1 getter: http://hsivonen.iki.fi/test/moz/xmlns-dom-setter-ns.xhtml Looks good compared to the parser-inserted XHTML case: http://hsivonen.iki.fi/test/moz/xmlns-dom.xhtml But now, the HTML side is broken: http://hsivonen.iki.fi/test/moz/xmlns-dom-setter-ns.html vs. http://hsivonen.iki.fi/test/moz/xmlns-dom.html >>> I put together a very crude demonstration of JavaScript access of a >>> specific RDFa attribute, about. It's temporary, but if you go to >>> my main web >>> page,http://realtech.burningbird.net, and look in the sidebar for >>> the click >>> me text, it will traverse each div element looking for an "about" >>> attribute, >>> and then pop up an alert with the value of the attribute. I would >>> use >>> console rather than alert, but I don't believe all browsers >>> support console, >>> yet. >> >> This misses the point, because the inconsistency is with attributes >> named >> xmlns:foo. > > There is a similar inconsistency in how xml:lang is handled. Discuss. The xml:lang DOM inconsistency has led to a situation where the xml:lang/lang area in Validator.nu has the highest incidence of validator bugs per spec sentence of all areas of HTML5. You've reported at least one of those bugs. The amount of developer time needed to get it right was ridiculously high. fantasai recently wrote: “Unless you're working on a CSS layout engine yourself, the level of detail, complex interactions with the rest of CSS, and design and implementation constraints we need to deal with here are more complicated than you can imagine.” (Source: http://fantasai.inkedblade.net/weblog/2009/layout-is-expensive/) From my experience with Validator.nu (that doesn't even have a DOM!)
I think I can say: Unless you're working on a software product whose code reuse between HTML and XHTML depends on the DOM Consistency Design principle, the badness caused by violations of the DOM Consistency Design principle is more complicated than you can imagine. (Where 'you' is not you, Sam, but the generic English you.) xml:lang was introduced by people who were designing for an XML universe when it seemed that would be the way the world would go, so they can be forgiven, and the WHATWG can clean up the mess. Likewise, the syntax that the SVG WG chose made sense given that they were designing for an XML world. It can be accepted as legacy, and HTML5 parser writers can spend time optimizing the conditional camel casing. RDFa, on the other hand, was created by people who fully expected it to be served as text/html, even though they called it something like XHTML 1.1 plus RDFa instead of calling it HTML5. Furthermore, when they saw they wanted to have RDFa in HTML5, too, instead of addressing HTML issues then, they just continued pushing towards REC. It easily looks like this was done so that RDFa could be presented as a done deal that HTML5 needs to deal with instead of something whose details are negotiable. Creating a new mess that would have been easily avoidable is not similarly forgivable. Also, it sets a very bad precedent if we allow other groups to keep us on the treadmill by injecting new HTML-hostile features and expecting us to spend cycles to sort them out by "working the issues". -- Henri Sivonen hsivonen@iki.fi http://hsivonen.iki.fi/
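[Editorial illustration, not part of the original mail.] The XML half of the setter comparison in the mail above can be reproduced outside a browser. The sketch below uses Python's xml.dom.minidom, which offers both the DOM Level 1 setter (setAttribute) and the namespace-aware Level 2 setter (setAttributeNS). It is a sketch under stated assumptions: minidom is not a browser DOM, the text/html half of the comparison cannot be shown this way, and the example.org namespace URI and the helper name view are hypothetical.

```python
from xml.dom.minidom import parseString

XMLNS_NS = "http://www.w3.org/2000/xmlns/"  # namespace of xmlns:* attributes
FOO = "http://example.org/ns#"              # hypothetical namespace URI

# 1. Attribute inserted by the (namespace-aware) XML parser.
parsed = parseString('<div xmlns:foo="%s"/>' % FOO).documentElement

# 2. Attribute added by script with the namespace-unaware Level 1 setter.
level1 = parseString("<div/>").documentElement
level1.setAttribute("xmlns:foo", FOO)

# 3. Attribute added by script with the namespace-aware Level 2 setter.
level2 = parseString("<div/>").documentElement
level2.setAttributeNS(XMLNS_NS, "xmlns:foo", FOO)


def view(element):
    """Return (Level 1 getter result, namespace URI of the attribute node)."""
    node = element.getAttributeNode("xmlns:foo")
    return (element.getAttribute("xmlns:foo"), node.namespaceURI)


if __name__ == "__main__":
    for label, element in (("parsed", parsed),
                           ("setAttribute", level1),
                           ("setAttributeNS", level2)):
        print(label, view(element))
```

In this sketch the Level 1 getter returns the same value in all three cases, but only the Level 2 setter yields an attribute node in the same namespace as the parser-inserted one. A script that wants a tree indistinguishable from a parsed XML tree has to use the namespace-aware setter, which is the call the mail reports as giving the wrong result on the text/html side.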
attached mail follows:
Ian Hickson wrote: > On Sat, 17 Jan 2009, Sam Ruby wrote: > >> Shelley Powers wrote: >> >>> So, why accept that we have to use MathML in order to solve the >>> problems of formatting mathematical formula? Why not start from >>> scratch, and devise a new approach? >>> >> Ian explored (and answered) that here: >> >> http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2008-April/014372.html >> >> Key to Ian's decision was the importance of DOM integration for this >> vocabulary. If DOM integration is essential for RDFa, then perhaps the >> same principles apply. If not, perhaps some other principles may apply. >> > > Sam's point here bears repeating, because there seems to be an impression > that we took on SVG and MathML without any consideration, while RDF is > getting an unfair reception. > > On the contrary, SVG and MathML got the same reception. For MathML, for > instance, a number of options were very seriously considered, most notably > LaTeX. For SVG, we considered a variety of options including VML. > > I would encourage people to read the e-mail Sam cited: > > http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2008-April/014372.html > > It's long, but the start of it is a summary of what was considered and > shows that the same process derived from use cases was used for SVG and > MathML as is being used on this thread here. > > I'm not doubting the effort that went into getting MathML and SVG accepted. I've followed the effort associated with SVG since the beginning. I'm not sure if the same procedure was also applied to the canvas object, as well as the SQL query capability. Will assume so. The point I'm making is that you set a precedent, and a good one I think: giving precedence to "not invented here". In other words, to not re-invent new ways of doing something, but to look for established processes, models, et al already in place, implemented, vetted, etc, that solve specific problems. 
Now that you have accepted a use case, Martin's, and we've established that RDFa solves the problem associated with the use case, the issue then becomes is there another data model already as vetted, documented, implemented that would better solve the problem. I propose that RDFa is the best solution to the use case Martin supplied, and we've shown how it is not a disruptive solution to HTML5. The fact that it is based on RDF, a mature, well documented, widely used model with many different implementations is a perk. Shelley
attached mail follows:
On Jan 17, 2009, at 21:38, Shelley Powers wrote: > I'm not doubting the effort that went into getting MathML and SVG > accepted. I've followed the effort associated with SVG since the > beginning. > > I'm not sure if the same procedure was also applied to the canvas > object, as well as the SQL query capability. Will assume so. Note that SVG, MathML and SQL have had different popularity trajectories in top four browser engines than RDF. SVG is going up. At the time it was included in HTML5 (only to be commented out shortly thereafter), three of the top browser engines implemented SVG for retained-mode vector graphics and their SVG support was actively being improved. (One of the top four engines implemented VML, though.) At the time MathML was included in HTML5, it was supported by Gecko with renewed investment into it as part of the Cairo migration. Also, Opera added some MathML features at that time. Thus, two of the top four engines had active MathML development going on. Further, one of the major MathML implementations is an ActiveX control for IE. When SQL was included in HTML5, Apple (in WebKit) and Google (in Gears) had decided to use SQLite for this functionality. Even though Firefox doesn't have a Web-exposed database, Firefox also already ships with embedded SQLite. At that point it would have been futile for HTML5 to go against the flow of implementations. The story of RDF is very different. Of the top four engines, only Gecko has RDF functionality. It was implemented at a time when RDF was a young W3C REC and stuff that were W3C RECs were implemented less critically than nowadays. Unlike SVG and MathML, the RDF code isn't actively developed (see hg logs). Moreover, the general direction seems to be away from using RDF data sources in Firefox internally. Meanwhile, the feed example you gave--RSS 1.0--shows how the feed spec community knowingly moved away from RDF with RSS 2.0 and Atom. 
Furthermore, RSS 1.0 usually isn't parsed into an RDF graph but is treated as XML instead. If RSS 1.0 is evidence, it's evidence *against* RDF. > The point I'm making is that you set a precedent, and a good one I > think: giving precedence to "not invented here". In other words, to > not re-invent new ways of doing something, but to look for > established processes, models, et al already in place, implemented, > vetted, etc, that solve specific problems. Now that you have > accepted a use case, Martin's, and we've established that RDFa > solves the problem associated with the use case, the issue then > becomes is there another data model already as vetted, documented, > implemented that would better solve the problem. Clearly, RDFa wasn't properly vetted--as far as the desire to deploy it in text/html goes--when the outcome was that it ended up using markup that doesn't parse into the DOM the same way in HTML and XML. -- Henri Sivonen hsivonen@iki.fi http://hsivonen.iki.fi/
attached mail follows:
Henri Sivonen wrote: > On Jan 17, 2009, at 21:38, Shelley Powers wrote: > >> I'm not doubting the effort that went into getting MathML and SVG >> accepted. I've followed the effort associated with SVG since the >> beginning. >> >> I'm not sure if the same procedure was also applied to the canvas >> object, as well as the SQL query capability. Will assume so. > > Note that SVG, MathML and SQL have had different popularity > trajectories in top four browser engines than RDF. > > SVG is going up. At the time it was included in HTML5 (only to be > commented out shortly thereafter), three of the top browser engines > implemented SVG for retained-mode vector graphics and their SVG > support was actively being improved. (One of the top four engines > implemented VML, though.) > > At the time MathML was included in HTML5, it was supported by Gecko > with renewed investment into it as part of the Cairo migration. Also, > Opera added some MathML features at that time. Thus, two of the top > four engines had active MathML development going on. Further, one of > the major MathML implementations is an ActiveX control for IE. > > When SQL was included in HTML5, Apple (in WebKit) and Google (in > Gears) had decided to use SQLite for this functionality. Even though > Firefox doesn't have a Web-exposed database, Firefox also already > ships with embedded SQLite. At that point it would have been futile > for HTML5 to go against the flow of implementations. > > The story of RDF is very different. Of the top four engines, only > Gecko has RDF functionality. It was implemented at a time when RDF was > a young W3C REC and stuff that were W3C RECs were implemented less > critically than nowadays. Unlike SVG and MathML, the RDF code isn't > actively developed (see hg logs). Moreover, the general direction > seems to be away from using RDF data sources in Firefox internally. > Now wait a second, you're changing the parameters of the requirements. Before, the criteria was based on the DOM. 
Now you're saying that the browsers actually have to do something with it. Who is to say what the browsers will do with RDF in the future? In addition, is that the criteria for pages on the web -- that every element in them has to result in different behaviors in browsers, only? What about other user agents? That seems to me to be looking for RDFa-sized holes and then throwing them into the criteria, specifically to trip up RDF, and hence, RDFa. > Meanwhile, the feed example you gave--RSS 1.0--shows how the feed spec > community knowingly moved away from RDF with RSS 2.0 and Atom. > Furthermore, RSS 1.0 usually isn't parsed into an RDF graph but is > treated as XML instead. If RSS 1.0 is evidence, it's evidence > *against* RDF. > >> The point I'm making is that you set a precedent, and a good one I >> think: giving precedence to "not invented here". In other words, to >> not re-invent new ways of doing something, but to look for >> established processes, models, et al already in place, implemented, >> vetted, etc, that solve specific problems. Now that you have accepted >> a use case, Martin's, and we've established that RDFa solves the >> problem associated with the use case, the issue then becomes is there >> another data model already as vetted, documented, implemented that >> would better solve the problem. > > Clearly, RDFa wasn't properly vetted--as far as the desire to deploy > it in text/html goes--when the outcome was that it ended up using > markup that doesn't parse into the DOM the same way in HTML and XML. > SVG and MathML were both created as XML, and hence were not vetted for text/html, either. And yet, here they are. Well, here they'll be, eventually. Come to that -- I don't think the creators of SQL actually ever expected that someday SQL queries would be initiated from HTML pages. Shelley
attached mail follows:
On Jan 17, 2009, at 22:43, Shelley Powers wrote: > Henri Sivonen wrote: >> On Jan 17, 2009, at 21:38, Shelley Powers wrote: >> >>> I'm not doubting the effort that went into getting MathML and SVG >>> accepted. I've followed the effort associated with SVG since the >>> beginning. >>> >>> I'm not sure if the same procedure was also applied to the canvas >>> object, as well as the SQL query capability. Will assume so. >> >> Note that SVG, MathML and SQL have had different popularity >> trajectories in top four browser engines than RDF. >> >> SVG is going up. At the time it was included in HTML5 (only to be >> commented out shortly thereafter), three of the top browser engines >> implemented SVG for retained-mode vector graphics and their SVG >> support was actively being improved. (One of the top four engines >> implemented VML, though.) >> >> At the time MathML was included in HTML5, it was supported by Gecko >> with renewed investment into it as part of the Cairo migration. >> Also, Opera added some MathML features at that time. Thus, two of >> the top four engines had active MathML development going on. >> Further, one of the major MathML implementations is an ActiveX >> control for IE. >> >> When SQL was included in HTML5, Apple (in WebKit) and Google (in >> Gears) had decided to use SQLite for this functionality. Even >> though Firefox doesn't have a Web-exposed database, Firefox also >> already ships with embedded SQLite. At that point it would have >> been futile for HTML5 to go against the flow of implementations. >> >> The story of RDF is very different. Of the top four engines, only >> Gecko has RDF functionality. It was implemented at a time when RDF >> was a young W3C REC and stuff that were W3C RECs were implemented >> less critically than nowadays. Unlike SVG and MathML, the RDF code >> isn't actively developed (see hg logs). Moreover, the general >> direction seems to be away from using RDF data sources in Firefox >> internally. 
>> > > Now wait a second, you're changing the parameters of the requirements. I'm explaining how SVG, MathML and SQL are different from RDF(a) in a way that's very relevant to the practice of including stuff in the spec. > Before, the criteria was based on the DOM. Now you're saying that > the browsers actually have to do with something with it. > > Who is to say what the browsers will do with RDF in the future? http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2008-August/016045.html is a message where one of the editors of RDFa mentions RDFa together with "client-side tools like Ubiquity". That Ubiquity is a Firefox extension rather than part of the core feature set is an implementation detail. I read this as envisioning browser-sensitivity to RDFa. > In addition, is that the criteria for pages on the web -- that every > element in them has to result in different behaviors in browsers, > only? No. However, most of the time, when people publish HTML, they do it to elicit browser behavior when a user loads the HTML document in a browser. >> Meanwhile, the feed example you gave--RSS 1.0--shows how the feed >> spec community knowingly moved away from RDF with RSS 2.0 and Atom. >> Furthermore, RSS 1.0 usually isn't parsed into an RDF graph but is >> treated as XML instead. If RSS 1.0 is evidence, it's evidence >> *against* RDF. >> >>> The point I'm making is that you set a precedent, and a good one I >>> think: giving precedence to "not invented here". In other words, >>> to not re-invent new ways of doing something, but to look for >>> established processes, models, et al already in place, >>> implemented, vetted, etc, that solve specific problems. Now that >>> you have accepted a use case, Martin's, and we've established that >>> RDFa solves the problem associated with the use case, the issue >>> then becomes is there another data model already as vetted, >>> documented, implemented that would better solve the problem. 
>> >> Clearly, RDFa wasn't properly vetted--as far as the desire to >> deploy it in text/html goes--when the outcome was that it ended up >> using markup that doesn't parse into the DOM the same way in HTML >> and XML. >> > SVG and MathML were both created as XML, and hence were not vetted > for text/html, either. And yet, here they are. Well, here they'll > be, eventually. Actually, the creators of MathML had the good sense and foresight to avoid name collisions with HTML even after Namespaces theoretically gave them permission not to care. Unlike the creators of RDFa, the creators of SVG weren't pushing for inclusion in HTML5 or saying that it's OK to serve their XML as text/html--quite the contrary. And the integration would have been nicer if the SVG WG had had the same prudence as the Math WG. > Come to that -- I don't think the creators of SQL actually ever > expected that someday SQL queries would be initiated from HTML pages. I don't see the creators of SQL asking for the inclusion of their stuff in HTML after building on another spec that is well-known to be trouble with HTML (Namespaces in XML in the RDFa case). -- Henri Sivonen hsivonen@iki.fi http://hsivonen.iki.fi/
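Editorial illustration of the "doesn't parse into the DOM the same way in HTML and XML" point above. This is a minimal sketch, not taken from the thread: the sample markup and the helper names (`AttrCollector`, `html_view`, `xml_view`) are assumptions made for the example, and Python's `html.parser` is only a stand-in for a text/html parser (it is a tokenizer, not an HTML5 tree builder). The sketch feeds the same RDFa-style snippet to an HTML-style parser and to a namespace-aware XML parser and compares the attributes each one reports.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Hypothetical RDFa-style snippet: a prefix declared with xmlns:dc and
# then used inside an attribute value.
MARKUP = (
    '<div xmlns:dc="http://purl.org/dc/elements/1.1/">'
    '<span property="dc:title">Example</span></div>'
)


class AttrCollector(HTMLParser):
    """Records the attributes seen on each start tag, HTML-style."""

    def __init__(self):
        super().__init__()
        self.attrs_by_tag = {}

    def handle_starttag(self, tag, attrs):
        self.attrs_by_tag[tag] = dict(attrs)


def html_view(markup):
    """Attributes as an HTML-style tokenizer reports them."""
    collector = AttrCollector()
    collector.feed(markup)
    collector.close()
    return collector.attrs_by_tag


def xml_view(markup):
    """Attributes as a namespace-aware XML parser reports them."""
    root = ET.fromstring(markup)
    return {element.tag: dict(element.attrib) for element in root.iter()}


html_attrs = html_view(MARKUP)
xml_attrs = xml_view(MARKUP)

# HTML-style parsing: xmlns:dc is just an attribute whose name contains a colon.
print("HTML div attributes:", html_attrs["div"])
# XML parsing: xmlns:dc is consumed as a namespace declaration, so
# ElementTree does not list it among the element's attributes.
print("XML div attributes: ", xml_attrs["div"])
# Either way, the prefix inside property="dc:title" is an opaque string.
print("span attributes:    ", html_attrs["span"], xml_attrs["span"])
```

Note that ElementTree hiding the declaration entirely is a detail of that library; a full XML DOM would still expose the attribute, but as a namespace declaration rather than as a plain attribute named `xmlns:dc`. In both cases a script that resolves the `dc:` prefix has to use different code paths depending on which parser built the tree, which is the consistency concern raised in the message.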
attached mail follows:
On 18/1/09 00:24, Henri Sivonen wrote: > No. However, most of the time, when people publish HTML, they do it to > elicit browser behavior when a user loads the HTML document in a browser. Most users of the Web barely know what a browser is, let alone HTML. They're just putting information online; perhaps into a closed site (eg. facebook), perhaps into a public-facing site (eg. a blog), or perhaps into 1:1, group or IM messaging (eg. webmail). HTML figures in all these scenarios. Browsers or HTML rendering code too, of course. But I don't think we can jump from that to claims about user intent, any more than their use of the Internet signifies an intent to have their information chopped up into packets and transmitted according to the rules of TCP/IP. The reason for my pedantry here is not to be argumentative, but just to suggest that this (otherwise very natural) thinking leads us to forget about the other major consumers of HTML - search engines. Having their stuff found and linked by others is often a big part of the motivation for putting stuff online. HTML parsing is involved, impact on the needs and interests of mainstream users is involved; but it's not clear whether all/any/many users 'do it to elicit search engine behaviour when indexing the HTML document'. Aren't search engines equally important consumers of HTML? Perhaps they're more simple-minded in their behaviour than a full UI browser. But from the user side, there's only slightly more value in being readable without being findable than vice-versa... cheers, Dan -- http://danbri.org/
attached mail follows:
On Saturday 2009-01-17 22:25 +0200, Henri Sivonen wrote: > The story of RDF is very different. Of the top four engines, only Gecko > has RDF functionality. It was implemented at a time when RDF was a young > W3C REC and stuff that were W3C RECs were implemented less critically > than nowadays. Actually, the implementation was well underway *before* RDF was a W3C REC, done by a team led by one of the designers of RDF. In other words, it was in Gecko because there were RDF advocates at Netscape (although advocating, I think, a somewhat different RDF than the current RDF recommendations). Compare the dates on: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222/ http://www.w3.org/TR/1999/PR-rdf-schema-19990303/ http://bonsai.mozilla.org/cvsquery.cgi?treeid=default&module=all&branch=HEAD&branchtype=match&dir=mozilla%2Frdf&file=&filetype=match&who=&whotype=match&sortby=Date&hours=2&date=explicit&mindate=1998-01-01&maxdate=1999-01-01&cvsroot=%2Fcvsroot -David -- L. David Baron http://dbaron.org/ Mozilla Corporation http://www.mozilla.com/
attached mail follows:
On 17/1/09 23:30, L. David Baron wrote: > On Saturday 2009-01-17 22:25 +0200, Henri Sivonen wrote: >> The story of RDF is very different. Of the top four engines, only Gecko >> has RDF functionality. It was implemented at a time when RDF was a young >> W3C REC and stuff that were W3C RECs were implemented less critically >> than nowadays. > > Actually, the implementation was well underway *before* RDF was a > W3C REC, done by a team led by one of the designers of RDF. In > other words, it was in Gecko because there were RDF advocates at > Netscape (although advocating, I think, a somewhat different RDF > than the current RDF recommendations). Yes, Netscape had this stuff when it was still called MCF. W3C's RDF took ideas from several input activities, including MCF, Microsoft XML-Data, PICS, and requirements from the Dublin Core community. But it looks more like MCF than the others. MCF was originally proposed by R.V.Guha at Apple; it followed him from Apple to Netscape in 1997, and when the Mozilla sources were later thrown over the wall, there was a lot of MCF in there. MCF White Paper, 1996 http://www.guha.com/mcf/wp.html spec, http://www.guha.com/mcf/mcf_spec.html While this was at Apple, there was a product/viewer called HotSauce / Project X, and some early grassroots adoption of MCF as a text format for publishing website summaries. http://web.archive.org/web/19961224042753/http://hotsauce.apple.com/ http://downlode.org/Etext/MCF/macworld_online.html It was at this stage that dialog started with the Library scene and Dublin Core folk, about how it related to their notion of catalogue records, and to the evolving PICS labelling system, format and protocol being built at W3C. eg. http://www.ssrc.hku.hk/tb-issues/TidBITS-355.html#lnk3 http://web.archive.org/web/19980215092626/http://www.ariadne.ac.uk/issue7/mcf/ The MCF/RSS relationship is a whole other story, eg. 
see http://www.scripting.com/midas/mcf.html http://www.scripting.com/frontier/siteMap.mcf http://web.archive.org/web/19990222114619/http://www.xspace.net/hotsauce/sites.html Then the thing moved to Netscape. Tim Bray helped Guha XMLize the spec, which was submitted to W3C in 1997, where it joined the existing efforts to extend PICS to include text labels and more structure - http://www.w3.org/TR/NOTE-pics-ng-metadata http://www.daml.org/committee/minutes/2000-12-07-RDF-design-rationale.ppt http://searchenginewatch.com/2165291 So the June 97 spec was http://www.w3.org/TR/NOTE-MCF-XML/ .. you can see from the figures that the technology was very RDF-shaped, http://www.w3.org/TR/NOTE-MCF-XML/#sec2. Also a tutorial at http://www.w3.org/TR/NOTE-MCF-XML/MCF-tutorial.html Netscape press release accompanying June 13 1997 submission - http://web.archive.org/web/20010308150737/http://cgi.netscape.com/newsref/pr/newsrelease432.html Less than 4 months later, this came out as a W3C Working Draft called "RDF": http://www.w3.org/TR/WD-rdf-syntax-971002/ ... in a shape that didn't really change much subsequently. RDF wasn't the same design exactly as MCF but the ancestry is clear enough. And getting back to the original point, yeah Mozilla had MCF sitemaps code in there. Revisiting http://www.prnewswire.com/cgi-bin/stories.pl?ACCT=104&STORY=/www/story/9-8-97/312711&EDATE= http://www.irt.org/articles/js086/ and the like, it's clear that RDF was very much a child of the 1st browser wars. In retrospect the direction it took within Mozilla didn't do anyone much good. The earliest MCF apps were about public data on the public Web, feeds, sitemaps and so on. But eventually the ambition to be a complete information hub led to MCF/RDF being used for pretty much everything *inside* Mozilla. And I don't think that turned out very well. http://www.mozilla.org/rdf/doc/api.html etc. 
The RDF vocabularies it used were poorly or never documented (I have some guilt there) and when Netscape went away, the incentive to connect to public data on the Web seemed to drop (no more tie-ins with the 'what's related' annotation server, 'dmoz' etc.). RDF drifted from being a Web data format to be consumed *by* the browser, into an engineering tool to be used in the construction *of* the browser, ie. as a datasource abstraction within Mozilla APIs. While I can certainly see the value of having a unified view of mail, news, sitemaps, and so on, the Moz code at the time wasn't really in a position to match up to the language in the press releases. Not making any particular point here beyond connecting up to the MCF heritage... cheers, Dan -- http://danbri.org/
attached mail follows:
On Sat, Jan 17, 2009 at 2:38 PM, Shelley Powers <shelleyp@burningbird.net> wrote: > > I propose that RDFa is the best solution to the use case Martin supplied, > and we've shown how it is not a disruptive solution to HTML5. Others may differ, but my read is that the case is a strong one. But I will caution you that a little patience is in order. SVG is not a done deal yet. I've been involved in a number of standards efforts, and I've never seen a case of "proposed on a Saturday morning, decided on a Saturday afternoon". One demo is not conclusive. Now you mention that there exists a number of libraries. I think that's important. Very important. Possibly conclusive. But back to expectations. I've seen references elsewhere to Ian being booked through the end of this quarter. I may have misheard, but in any case, my point is the same: if this is awaiting something from Ian, it will be prioritized and dealt with accordingly. If, however, some of the legwork is done for Ian, this may help accelerate the effort. Even little things may help a lot. I know what I'm about to say may be unpopular, but I'll say it anyway: take a few good examples of RDFa and run them through Henri's validator. The validator will helpfully indicate exactly what areas of the spec would need to be updated in order to accommodate RDFa. The next step would be to take a look at those sections. If the update is obvious and straightforward, perhaps nothing more is required. But if not, researching into the options and making recommendations may help. - Sam Ruby
attached mail follows:
Sam Ruby wrote: > On Sat, Jan 17, 2009 at 2:38 PM, Shelley Powers > <shelleyp@burningbird.net> wrote: > >> I propose that RDFa is the best solution to the use case Martin supplied, >> and we've shown how it is not a disruptive solution to HTML5. >> > > Others may differ, but my read is that the case is a strong one. But > I will caution you that a little patience is in order. SVG is not a > done deal yet. I've been involved in a number of standards efforts, > and I've never seen a case of "proposed on a Saturday morning, decided > on a Saturday afternoon". One demo is not conclusive. Now you > mention that there exists a number of libraries. I think that's > important. Very important. Possibly conclusive. > I am patient. Look at me? I make extensive use of both SVG and RDF -- that is the mark of a patient woman. > But back to expectations. I've seen references elsewhere to Ian being > booked through the end of this quarter. I may have misheard, but in > any case, my point is the same: if this is awaiting something from > Ian, it will be prioritized and dealt with accordingly. If, however, > some of the legwork is done for Ian, this may help accelerate the > effort. > First of all, whatever happens has to happen with at least vetting by the RDF/RDFa folks, if not their active help. This is my way of saying, I'd be willing to do much of the legwork, but I want to make sure I don't represent RDFa incorrectly. Secondly, my finances have been caught up in the current downturn, and my first priority has to be on the hourly work and odd jobs I'm getting to keep afloat. Which means that I can't always guarantee 20+ hours a week on a task, nor can I travel. Anywhere. But if both are acceptable conditions, I'm willing to help with tasks. > Even little things may help a lot. I know what I'm about to say may > be unpopular, but I'll say it anyway: take a few good examples of RDFa > and run them through Henri's validator.
The validator will helpfully > indicate exactly what areas of the spec would need to be updated in > order to accommodate RDFa. The next step would be to take a look at > those sections. If the update is obvious and straightforward, perhaps > nothing more is required. But if not, researching into the options > and making recommendations may help. > > Tasks including this one. Shelley > - Sam Ruby > >
attached mail follows:
On Sat, Jan 17, 2009 at 3:51 PM, Shelley Powers <shelleyp@burningbird.net> wrote: > Sam Ruby wrote: >> >> On Sat, Jan 17, 2009 at 2:38 PM, Shelley Powers >> <shelleyp@burningbird.net> wrote: >> >>> >>> I propose that RDFa is the best solution to the use case Martin supplied, >>> and we've shown how it is not a disruptive solution to HTML5. >>> >> >> Others may differ, but my read is that the case is a strong one. But >> I will caution you that a little patience is in order. SVG is not a >> done deal yet. I've been involved in a number of standards efforts, >> and I've never seen a case of "proposed on a Saturday morning, decided >> on a Saturday afternoon". One demo is not conclusive. Now you >> mention that there exists a number of libraries. I think that's >> important. Very important. Possibly conclusive. >> > > I am patient. Look at me? I make extensive use of both SVG and RDF -- that > is the mark of a patient woman. >> >> But back to expectations. I've seen references elsewhere to Ian being >> booked through the end of this quarter. I may have misheard, but in >> any case, my point is the same: if this is awaiting something from >> Ian, it will be prioritized and dealt with accordingly. If, however, >> some of the legwork is done for Ian, this may help accelerate the >> effort. >> > > First of all, whatever happens has to happen with either vetting by the > RDF/RDFa folks, if not their active help. This is my way of saying, I'd be > willing to do much of the legwork, but I want to make I don't represent RDFa > incorrectly. > > Secondly, my finances have been caught up in the current downturn, and my > first priority has to be on the hourly work and odd jobs I'm getting to keep > afloat. Which means that I can't always guarantee 20+ hours a week on a > task, nor can I travel. Anywhere. > > But if both are acceptable conditions, I'm willing to help with tasks. I don't see any of that as being a problem. >> Even little things may help a lot. 
I know what I'm about to say may >> be unpopular, but I'll say it anyway: take a few good examples of RDFa >> and run them through Henri's validator. The validator will helpfully >> indicate exactly what areas of the spec would need to be updated in >> order to accommodate RDFa. The next step would be to take a look at >> those sections. If the update is obvious and straightforward, perhaps >> nothing more is required. But if not, researching into the options >> and making recommendations may help. > > Tasks including this one. Excellent. Well, all except for the downturn thing, but you know what I mean. In order to prevent any misunderstandings: it is not for me to assign work. In fact, nobody here is in such a position. People simply note things that need to be done, and do the ones that interest them, at the pace at which they are able. And communicate copiously. If you need help in vetting, I am given to understand that there is a small pocket of RDF enthusiasm in the W3C. :-P > Shelley - Sam Ruby
attached mail follows:
Ian Hickson wrote: > On Sun, 18 Jan 2009, Shelley Powers wrote: > >>> The more use cases there are, the better informed the results will be. >>> >> The point isn't to provide use cases. The point is to highlight a >> serious problem with this working group--there is a mindset of what the >> future of HTML will look like, and the holders of the mindset brook no >> challenge, tolerate no disagreement, and continually move to quash any >> possibility of asserting perhaps even the faintest difference of >> opinion. >> > > I'm certainly sad that this is the impression I have given. I'd like to > clarify for everyone's sake that this mailing list is definitely open to > any proposals, any opinions, any disagreement. The only thing I ask is > that people use rational debate, back up their opinions with logical > arguments, present research to justify their claims, and derive proposals > from user needs. > > I've been especially critical of you, which isn't fair. At the same time, as you have said yourself, you are a "benevolent dictator", which seems to me to not be the best strategy for an inclusive HTML for the future. I know I'm not comfortable with the concept. But I'm also late to this group, and shouldn't disrupt if the strategy works. > >> Regardless, I got the point in the comment. That, combined with this >> email from Ian, tells us that it doesn't matter how our arguments run, >> the logic of our debate, the rightness of our cause--he is the final >> arbiter, and he does not want RDFa. >> > > For the record, I am as open to us including a feature like RDFa as I am > to us including a feature like MathML, SVG, or indeed anything else. While > I may present a devil's advocate position to stimulate critical > consideration of proposals, this does not mean that my mind is made up. If > my mind was made up, I wouldn't be asking for use cases, and I wouldn't > be planning to investigate the issue further in April. 
> > > There is a fine difference between being the devil's advocate, and the devil's front door made of thick oak, with heavy brass fittings. How does one know if one has provided a use case in a format that is more likely to meet a successful outcome, than not? Are the criteria documented somewhere? It's difficult to provide use cases with the twenty questions approach. What are the criteria by which a possible solution to a problem is judged? Is there a consistent set of questions asked? Tests made? A certain number of implementations? Again, is this documented somewhere? >> I am not paid by Google, or Mozilla, or IBM to continue throwing away my >> time, arguing for naught. >> > > It may be worth pointing out that, many of our most active participants > are volunteers, not paid by anyone to participate. Indeed I myself spent > many years contributing to the standards community while unemployed or > while a student. I am sorry you feel that you need to be compensated for > your participation in the standards community, and wish you the best of > luck in finding a suitable employer. > > The point I was trying to make, and forgive me if my writing was too subtle, is that it's not the fact that the work will take time, but whether the time will be well spent. Operating in the dark and tossing use cases in hopes they stick against the wall, without understanding criteria is not a particularly good use of time. However, having specific tasks that meet a given goal, and knowing that the goal is stable, and not a moving target, goes a long way to ensuring that the time spent has value. Knowing that one can, with diligence, ensure that the best result occurs is a good use of time. Spitting into the wind, at the whim and whimsy of a benevolent dictator, is not a good use of time. > As far as Google goes, we have no corporate opinion either way on the > topic of RDFa in HTML5. We do, however, encourage the continued practice > of basing decisions on data rather than hopes.
> > Bully for Google. Shelley
attached mail follows:
On Jan 18, 2009, at 8:43 AM, Shelley Powers wrote: > Take you guys seriously...OK, yeah. > > I don't doubt that the work will be challenging, or problematical. > I'm not denying Henri's claim. And I didn't claim to be the one who > would necessarily come up with the solutions, either, but that I > would help in those instances that I could. > > What I did express in the later emails, is what others have > expressed who have asked about RDFa in HTML5: are we wasting our > time even trying? That it seems like a decision has already been > made, and we're spinning our wheels even attempting to find > solutions. There's a difference between not being willing to > negotiate, compromise, work the problem, and just spitting into the > wind for no good. Based on past experience, I would say that you are not wasting your time. Evidence-based arguments, explication of use cases, solutions to technical problems, persuading third parties, and getting implementation traction (for example in popular JavaScript libraries, major browser engines, popular authoring/publishing software) will all affect how a feature is seen. As past examples, allowing XML-like self-closing tag syntax for void elements in text/html, and ability to include SVG inline in text/html, are both features that were highly controversial and at times opposed by the editor and others. Nonetheless we seem to be on track to have both of these in the spec. Note that in the case of SVG especially, the path from initial proposal to rough consensus to actual integration with the spec was a long one. In fact, integration in the spec is not yet fully complete due to some disputes about the details of the syntax. Another example is the "headers" attribute, and the more general issue of header association in tables. Though the "headers" attribute was controversial and once opposed by the editor, it is now in the spec.
I believe that most of us here, while we may have our biases and preconceptions, will evaluate concrete technical arguments in good faith, and are prepared to change our minds. The fact is that people have changed positions in the past, Ian included. So nothing should be assumed to be a done deal, especially at this early stage of exploring metadata embedding and RDFa. >>> However, the debate ended as soon as Ian re-asserted his authority. >> >> Ian just gave an indication of when he's going to work on this >> again. That doesn't mean that research into e.g. DOM consistency >> can't happen meanwhile. It also doesn't mean that debate needs to >> stop. >> >> > No, Ian's listing of tasks pretty much precluded any input into the > decision making process other than his own. I never see "we" when > Ian writes, I only see "I". Ian intends to make an evaluation based on evidence and arguments presented. Presenting such evidence and arguments is input into the decision making process. That's how other changes to the spec that went against Ian's initial gut instinct happened. Indeed it is possible for Ian to be overruled if he is clearly blocking the consensus of the group(*), but so far that has not been necessary, even on controversial issues. I encourage you to provide input into the process, and not to get too frustrated if the process is not quick. Nor by the fact that some may initially (or even finally, when all is said and done) disagree with you. Regards, Maciej * - The HTML WG can take a vote which is binding at least in the W3C context or remove Ian as editor; and the WHATWG oversight group can remove Ian as editor or pressure him by virtue of having the authority to remove him.
attached mail follows:
Shelley Powers wrote: > > > The point I'm making is that you set a precedent, and a good one I > think: giving precedence to "not invented here". In other words, to > not re-invent new ways of doing something, but to look for established > processes, models, et al already in place, implemented, vetted, etc, > that solve specific problems. Now that you have accepted a use case, > Martin's, and we've established that RDFa solves the problem > associated with the use case, the issue then becomes *is there another > data model already as vetted, documented, implemented that would > better solve the problem*. > RDF in a separate XML-syntax file, perhaps. Just because that use case raised a privacy concern about information to be kept private anyway, and that's not a problem solvable at the document level with metadata; instead, keeping relevant metadata in a separate file would allow better access control. Also, a separate file would have the relevant information ready for use, while embedding it with other content would force a load and parsing of the other content in search of relevant metadata (possible, of course, and not much of a problem, but not as clean and efficient). Moreover, it should be verified whether social-network service providers agree with such a requirement: I might avail myself of a compliant implementation to easily migrate from one service to another and leave the former, in which case why should a company open its inner infrastructure and database and invest resources for the benefit of a competitor accessing its data and consuming its bandwidth to catch its customers? (this is not the same interoperability issue as for mail clients supporting different address book formats: minor vendors had to do that to improve their business - and they didn't need to access a competitor's infrastructure).
Perhaps, that might work if personal information and relationships were handled by an external service along the same lines as an OpenID service allowing an automated identification by other services; but this would reduce social networks to a kind of front-end for such a centralized management (and service providers might not like that). Also, in this case anonymity should be ensured (for instance, I might have met you in two different networks, but knew your identity in only one of them, and you might wish that no one knew you're the person behind the other nickname; this is possible by keeping different information in different databases and with different access rights, and should be replicable when merging such information -- on the other hand, if you knew my identity, you should be allowed to "fill in the blanks" somehow). Shelley Powers wrote: > Anne van Kesteren wrote: >> On Sun, 18 Jan 2009 17:15:34 +0100, Shelley Powers >> <shelleyp@burningbird.net> wrote: >>> And regardless of the fact that I jumped to conclusions about WhatWG >>> membership, I do not believe I was inaccurate with the earlier part >>> of this email. Sam started a new thread in the discussion about the >>> issues of namespace and how, perhaps we could find a way to work the >>> issues through with RDFa. My god, I use RDFa in my pages, and they >>> load fine with any browser, including IE. I have to believe its >>> incorporation into HTML5 is not the daunting effort that others make >>> it seem to be.' >> >> You ask us to take you seriously and consider your feedback, it would >> be nice if you took what e.g. Henri wrote seriously as well. >> Integrating a new feature in HTML is not a simple task, even if the >> new feature loads and renders fine in Internet Explorer. >> > Take you guys seriously...OK, yeah. > > I don't doubt that the work will be challenging, or problematical. I'm > not denying Henri's claim.
And I didn't claim to be the one who would > necessarily come up with the solutions, either, but that I would help > in those instances that I could. It seems that you'd expect RDFa to be specced out before solving related problems (so to push their solution). I don't think that's the right path to follow, instead known issues must be solved before making a decision, so that the specification can tell exactly what developers must implement, eventually pointing out (after/while implementing) newer (hopefully minor) issues to be solved by refining the spec (which is a different task than specifying something known to be, let's say, "buggy" or uncertain). Everything, as always, IMHO WBR, Alex -- Caselle da 1GB, trasmetti allegati fino a 3GB e in piu' IMAP, POP3 e SMTP autenticato? GRATIS solo con Email.it http://www.email.it/f Sponsor: Blu American Express: gratuita a vita! Clicca qui: http://adv.email.it/cgi-bin/foclick.cgi?mid=8613&d=4-2
attached mail follows:
Calogero Alex Baldacchino wrote:
> It seems that you'd expect RDFa to be specced out before solving related
> problems (so to push their solution). I don't think that's the right path to
> follow, instead known issues must be solved before making a decision, so
> that the specification can tell exactly what developers must implement

I think that some help in defining the requirements around structured data, RDFa, metadata copy&paste, semantic links [1], etc. could come from the W3C document "Use Cases and Requirements for Ontology and API for Media Object 1.0" [2].

Take the requirements listed from "r01" to "r13" and replace the term "media objects" with "structured/linked data".

[1] http://lists.w3.org/Archives/Public/public-html/2009Jan/0082.html
[2] http://www.w3.org/TR/2009/WD-media-annot-reqs-20090119/#req-r01

-- 
Giovanni Gentili
attached mail follows:
> Now wait a second, you're changing the parameters of the requirements.
> Before, the criteria was based on the DOM. Now you're saying that the
> browsers actually have to do with something with it.

[Put "almost" in front of most words in the following.]

The consistent DOM criterion is necessary but not sufficient. Browsers doing something with it is a step towards sufficient.

Without DOM consistency (or at least an agreed workaround), it almost doesn't matter how great RDFa is, because it isn't compatible. Once you have that consistency, then the questions can move on to the next step.

That next step boils down to "Why bother?" Needing DOM integration of the information is a reason to bother. Browsers doing something with it is a reason to bother. Those aren't the only reasons to bother, but they are likely reasons, so people have asked about them. If you have other reasons, go ahead and offer those as well. (But "existing W3C standard" probably isn't strong enough.)

-jJ
attached mail follows:
RDFa should sink or swim on its own merits, and if RDFa requires drastic changes to HTML, it is probably broken. Let the compelling benefits of RDFa pave the way to implementations, and then standardize based on experience with those. RDFa should not be blessed by HTML, and the HTML spec should adopt a similar stance to all new features. For example, I would be very surprised to see Web Sockets fail on its own, since the benefits seem clear. But I could be wrong, and it should face a survival test. -- Robert Sayre "I would have written a shorter letter, but I did not have the time."
attached mail follows:
Dan Brickley wrote:
> ... I guess the fact that @property is supposed to be CURIE-only isn't a
> problem with parsers since this can be understood as a CURIE with no (or
> empty) substitution token.

Actually, most RDFa parsers will break if full URIs are used in RDFa attributes: in RDFa all CURIEs need a prefix which is a string of zero or more alphanumeric characters, dashes and hyphens followed by a colon (and yes, the empty string is allowed - but it is permanently bound to <http://www.w3.org/1999/xhtml/vocab#>). The proposed recommendation (IIRC, that's the current status) for CURIEs *does* actually allow for unprefixed CURIEs, but RDFa enforces extra conditions. (As it was published before the CURIE spec, which is a spin-off of RDFa.)

A suggestion I've heard for using full URIs is:

  <html xmlns:http="http:">
  <title property="http://purl.org/dc/terms/title">Foo</title>
  </html>

Which should theoretically work according to the reference algorithm in the RDFa syntax document, however it does (I believe) break the XML Namespaces spec. (Though that wouldn't be a problem if an alternative, non-xmlns, syntax were adopted for CURIE prefix binding.) It wouldn't surprise me if a few RDFa parsers had issues with this caused by the front-end XML parser they use.

So RDFa, as it is currently defined, does need a CURIE binding mechanism. XML namespaces are used for XHTML+RDFa 1.0, but given that namespaces don't work in HTML, an alternative mechanism for defining them is expected, and for consistency would probably be allowed in XHTML too - albeit in a future version of XHTML+RDFa, as 1.0 is already finalised. (I don't speak for the RDFa task force as I am not a member, but I would be surprised if many of them disagreed with me strongly on this.)

Back to when I said "most RDFa parsers will break if full URIs are used in RDFa attributes". The Perl library RDF::RDFa::Parser doesn't, so if you want to do any testing with full URIs, it can be found on CPAN.
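
To make the prefix rules described above concrete, here is a minimal Python sketch of CURIE expansion. The function name expand_curie and the prefixes dictionary are assumptions made up for this illustration; they are not taken from RDF::RDFa::Parser or any other library, and the reference algorithm in the RDFa syntax document covers more cases than this does.

```python
# Hypothetical sketch only: expands "prefix:reference" the way the mail
# describes RDFa 1.0 doing it. Names here are illustrative assumptions.

# The empty prefix is described above as permanently bound to this vocabulary.
XHTML_VOCAB = "http://www.w3.org/1999/xhtml/vocab#"


def expand_curie(curie, prefixes):
    """Return the full URI for a CURIE, or None if it cannot be expanded.

    `prefixes` maps a prefix string (without the colon) to a URI string,
    standing in for whatever binding mechanism (e.g. xmlns:*) is in use.
    """
    prefix, colon, reference = curie.partition(":")
    if not colon:
        # No colon at all: an unprefixed CURIE, which RDFa (per the mail)
        # does not accept even though the CURIE spec allows it.
        return None
    if prefix == "":
        # ":reference" uses the fixed empty-prefix binding.
        return XHTML_VOCAB + reference
    base = prefixes.get(prefix)
    if base is None:
        # A full URI such as "http://..." lands here, because "http" is
        # read as a prefix that nobody has bound.
        return None
    return base + reference
```

With a binding of "dc" to "http://purl.org/dc/terms/", the value "dc:title" expands as expected, while the full URI "http://purl.org/dc/terms/title" is rejected because "http" is treated as an unbound prefix. Passing {"http": "http:"} mimics the xmlns:http="http:" suggestion and yields the original URI back.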
Full URIs are a pain to type though - I certainly prefer using CURIEs. -- Toby A Inkster <mailto: