- From: Calogero Alex Baldacchino <alex.baldacchino@email.it>
- Date: Sat, 10 Jan 2009 05:35:27 +0100
Ben Adida wrote:
> Ian Hickson wrote:
>
>> We have to make sure that whatever we specify in HTML5 actually is going
>> to be useful for the purpose it is intended for. If a feature intended for
>> wide-scale automated data extraction is especially susceptible to spamming
>> attacks, then it is unlikely to be useful for wide-scale automated data
>> extraction.
>
> It's no more susceptible to spam than existing HTML, as per my previous
> response.

Perhaps this is why general-purpose search engines do not rely (entirely) on metadata and markup semantics to classify content, and neither does Yahoo with SearchMonkey. The SearchMonkey documentation points out that metadata never affects page rank, and that its semantics are not interpreted for any purpose. Metadata only affects the additional information presented to the user, at the user's request, and only if the user chose to get information of a certain kind (gathered by a certain data service). Spammy metadata can therefore be considered contained in this case: it might corrupt SearchMonkey's additional data, but not the user's overall experience with the search engine. From this point of view, SearchMonkey is a wide-range but small-scale use case (with respect to each tool and each site the user might enable), because the user can easily choose which sources to trust (e.g. which data services to use, or which sites to consult for additional information), and in any case he can get enough information without metadata.

On the other hand, a client UA implementing a feature based entirely on metadata could not easily contain abused metadata and bring only valid information to the user's attention. Nor could the average user easily tell trusted and spammy sites apart, because he wouldn't understand the problem (and a site with spammy metadata might still contain information the user was interested in previously, or in a different context). In SearchMonkey, by contrast, the average user would notice that something is wrong with the enhanced results, but he would still get the basic information he was looking for. Thus different scenarios have different requirements to be taken into account, and SearchMonkey and a client UA are two such scenarios.

Moreover, SearchMonkey is a kind of centralised service based on distributed metadata. By default it doesn't need collaboration from any other UA (that is, it doesn't need support for metadata in other software), although it does allow custom data services to extract metadata autonomously, always for the purposes of SearchMonkey. It only requires that web sites adhering to the project (or just willing to provide additional information) embed some kind of metadata for the sole purpose of making it available to SearchMonkey services, or at least that authors create appropriate metadata and send it to Yahoo (in the form of dataRSS embedded in an Atom document).

That is, SearchMonkey seems to me a clear example of a use case for metadata that does not require any changes to the html5 spec, since every kind of supported metadata is used by SearchMonkey as if it were custom, private metadata; whatever happens to such metadata client-side, even if it is just stripped by a browser, doesn't really matter. Furthermore, SearchMonkey supports several kinds of metadata: not only RDFa, but also eRDF, microformats and dataRSS external to the document.
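
As a rough illustration of how two of the syntaxes listed above look to a consumer working on the html serialization, the Python sketch below feeds an RDFa-style and an eRDF-style fragment to the standard library's `html.parser`. The sample markup, the `PrefixCollector` class and the `extract` helper are hypothetical names made up for this example, and the prefix handling is deliberately simplified (document-wide rather than scoped per element). It is not a conforming RDFa or eRDF processor, only a sketch of where the prefix-to-URI mapping comes from in each case.

```python
from html.parser import HTMLParser

# Hypothetical sample fragments; only the Dublin Core namespace URI is real.
RDFA_SAMPLE = (
    '<div xmlns:dc="http://purl.org/dc/elements/1.1/">'
    '<span property="dc:title">Example</span></div>'
)
ERDF_SAMPLE = (
    '<link rel="schema.dc" href="http://purl.org/dc/elements/1.1/">'
    '<span class="dc-title">Example</span>'
)


class PrefixCollector(HTMLParser):
    """Collects prefix declarations and prefixed property names.

    Simplification (an assumption of this sketch): prefixes are treated
    as document-wide, not scoped to the declaring element.
    """

    def __init__(self):
        super().__init__()
        self.prefixes = {}    # prefix -> namespace URI
        self.properties = []  # (prefix, local name) pairs

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        for name, value in attrs.items():
            # In text/html, "xmlns:dc" is just an attribute whose name
            # happens to contain a colon; the consumer must interpret it.
            if name.startswith("xmlns:") and value:
                self.prefixes[name[len("xmlns:"):]] = value
        # eRDF-style declaration: <link rel="schema.PREFIX" href="URI">
        rel = attrs.get("rel") or ""
        if rel.startswith("schema.") and attrs.get("href"):
            self.prefixes[rel[len("schema."):]] = attrs["href"]
        # RDFa-style property: property="PREFIX:LOCAL"
        prop = attrs.get("property") or ""
        if ":" in prop:
            self.properties.append(tuple(prop.split(":", 1)))
        # eRDF-style property: class="PREFIX-LOCAL"
        for token in (attrs.get("class") or "").split():
            if "-" in token:
                self.properties.append(tuple(token.split("-", 1)))

    def resolved(self):
        # Expand each prefixed name whose prefix was declared somewhere.
        return [
            self.prefixes[prefix] + local
            for prefix, local in self.properties
            if prefix in self.prefixes
        ]


def extract(markup):
    collector = PrefixCollector()
    collector.feed(markup)
    collector.close()
    return collector.resolved()


print(extract(RDFA_SAMPLE))
print(extract(ERDF_SAMPLE))
```

Both fragments expand to the same property URI. The difference is that the eRDF-style fragment uses only attributes html already has (`rel`, `href`, `class`), while the RDFa-style one depends on `xmlns:dc` and `property`, which a consumer of the html serialization has to interpret by itself.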

So, why should SearchMonkey be the reason to introduce explicit support for RDFa and not also for eRDF, which doesn't require new attributes, but just a parser? One might think one solution is better than the other, and this might be true in theory, but what really counts is what people find easier to use, and this might be determined by experience with SearchMonkey (that is, let's see what people use more often, then decide what's more needed).

Moreover, RDFa was designed for xhtml, so it can't be introduced into the html serialization just by defining a few new attributes: a processor would, or might, need some knowledge of /namespaces/. The whole "family" of *xmlns* attributes (with and without prefixes) would therefore have to be specified for use with the html serialization, unless an alternative mechanism, similar to the one chosen for eRDF, were defined; and that might result in a new, hybrid mechanism (stitching together pieces from eRDF and RDFa). But if we introduce xmlns and xmlns:<prefix> into the html serialization, why not also prefixed attributes? That is, can RDFa be introduced into the html serialization "as is", without resorting to the whole of xml extensibility? This should be taken into account as well, because just adding new attributes to the language might work fine for xml-serialized documents, but might not for html-serialized ones. This means RDFa support might be more difficult than it seems at first glance, whereas it might not be needed for custom and/or small-scale use cases (and I think SearchMonkey is one such case).

>> Nobody is suggesting that user agents derive any behavior from <title>, so
>> it doesn't matter if <title> is spammed or not.
>
> And RDFa does not mandate any specific behavior, only the ability to
> express structure. The power lies in products like SearchMonkey that
> make use of this structure with innovative applications.
>
> Can one imagine tools that make poor use of this structured data so that
> they incentivize spam? Absolutely. Is this the bar for HTML5? If bad or
> poorly conceived applications can be imagined, then it's not in the
> standard?

I think the right question is whether there are effective countermeasures to contain bad uses and make the possible damage less significant than the advantages of good uses. When a feature in the standard is thought to be a possible security (or privacy) issue, countermeasures are proposed. Since spam is a possible immediate issue for abused metadata, especially in wide-scale automated data extraction, we should also think about possible countermeasures to be specified along with the RDFa attributes.

WBR, Alex
Received on Friday, 9 January 2009 20:35:27 UTC