- From: Tab Atkins Jr. <jackalmage@gmail.com>
- Date: Fri, 14 Nov 2008 10:59:53 -0600
On Fri, Nov 14, 2008 at 10:44 AM, Pentasis <pentasis at lavabit.com> wrote:

> >>>If we wish to communicate that level of semantics, yes. It may not be
> useful to us. If you *really* need some metadata/semantics, @class probably
> can't convey it with enough granularity. Check out the big discussion from
> a few months ago about ccRel and RDFa.
>
> Not yet maybe, but we could at least try to keep options open for the
> future.

Of course, but I don't think having <small> in the language closes any
options.

> >>Second: Suppose I want to collect all copyright notices from 1000
> websites (don't ask me why, I just want to), how am I to do this when they
> are marked up in <small>s? I will definitely end up with a lot of text that
> has nothing to do with copyrights (and probably miss a lot of copyright
> notices as they are marked up differently). Whereas if they were marked up in
> (for example) <span class="copyright"> I could retrieve it all based on the
> class-name.
>
> >>>That would be a wonderful perfect world. I'd like the copyright date as
> well, so I can retrieve only things copyrighted in the last ten years.
> Assuming that metadata will exist is a fool's errand. The fact is that if
> you are searching for copyright notices, the most efficient way is likely to
> just search for the string "copyright" and the (c) symbol. That'll net you
> copyright notices with a high accuracy, and some training on real data can
> yield further rules to improve the data-mining accuracy.
>
> You say it yourself, only in a perfect world where all websites in the
> world would be written in the same language would your "solution" work.
> Unfortunately I would miss out on all the Chinese copyright stuff.

Of course. But would you expect Chinese speakers to use class="copyright"
on their pages anyway?

> But another example (based on "siemens"): wouldn't it be nice if I could
> tell Google I am looking for a person named "Siemens" so it would ignore the
> "brand"-name?

Certainly. But at this point you're expecting authors to mark up their
pages with metadata every time they mention someone's name. The use of <b>
doesn't prevent this, but your use-case certainly requires quite a lot more.

> >>>While we're hoping for copyright notices to be marked up as <span
> class="copyright">, though, why not wish for <small class="copyright">? If
> you're going to be providing metadata, it works the same. Is it that you
> believe people won't provide a special class for copyrights if the <small>
> tag already gives them the preferred display? Do you believe that everyone
> will automatically use class="copyright" to mark up their copyright
> notices? What if they use class="copyright-notice"? Or class="license"?
> Or any of a million other distinct possibilities that would destroy any
> naive attempt to datamine based on a particular class name?
>
> Well, that would have to be defined in the standard, wouldn't it? I'm not
> saying -again- it should be defined NOW, but at least leave the door open.
> I have no problems with using small over span, neither one is correct as
> far as I can see, in this context. Using "copyright" instead of "license" or
> "copyright-notice" would have to be defined somewhere, either in the
> standard or in an externally maintained "document" that is accepted as "best
> practice" or "standards related".

Okay, then we have no issue with <small>. There has been some discussion,
btw, about standardizing a set of normative class names. You should be able
to turn something up about it.

> PS: I find it very difficult to respond to rich-text/html messages as they
> seriously mess up the indentation. Sorry therefore if this message is unclear
> as original message and reply are mixed up.

No problem; it was clear enough. The only richtext I use is quote levels,
and with the conversation context nearby anyway, it's not difficult to
puzzle out when it occasionally messes up.

~TJ
Received on Friday, 14 November 2008 08:59:53 UTC
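
The two data-mining approaches contrasted in the message (selecting elements by a class name versus searching the text for the string "copyright" and the (c) symbol) can be compared with a small sketch. This is only an illustration under stated assumptions, not anything proposed in the thread: the names COPYRIGHT_RE, TextChunks, find_by_class and find_by_text are hypothetical, the sample markup is invented, and the pattern is a naive English-only heuristic of the kind the message says would need further training on real data.

```python
import re
from html.parser import HTMLParser

# Assumed heuristic: the word "copyright", the (c) sign, or a literal "(c)".
COPYRIGHT_RE = re.compile(r"copyright|©|\(c\)", re.IGNORECASE)


class TextChunks(HTMLParser):
    """Collect each run of text with the class names of all open ancestors."""

    def __init__(self):
        super().__init__()
        self.stack = []   # (tag, [class names]) for currently open elements
        self.chunks = []  # (text, set of ancestor class names)

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        self.stack.append((tag, classes))

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; this also discards
        # unclosed void elements such as <br> that were pushed after it.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text:
            classes = {c for _, cs in self.stack for c in cs}
            self.chunks.append((text, classes))


def _chunks(html):
    parser = TextChunks()
    parser.feed(html)
    parser.close()
    return parser.chunks


def find_by_class(html, class_name="copyright"):
    """Text inside any element carrying exactly this class name."""
    return [text for text, classes in _chunks(html) if class_name in classes]


def find_by_text(html):
    """Text that matches the naive copyright pattern, whatever the markup."""
    return [text for text, _ in _chunks(html) if COPYRIGHT_RE.search(text)]


# Invented sample markup, for illustration only.
SAMPLE = """
<p>Welcome to the <b>Siemens</b> fan page.</p>
<small>Copyright 2008 Example Corp.</small>
<span class="copyright-notice">(c) 2007 Someone</span>
<span class="copyright">All rights reserved, 2006</span>
<small>Terms may change without notice.</small>
"""

if __name__ == "__main__":
    print(find_by_class(SAMPLE))
    print(find_by_text(SAMPLE))
```

On this sample the class-based lookup misses the notices marked up with <small> or with class="copyright-notice", while the text search misses the notice that never uses the word or the symbol. Both are heuristics, and neither depends on whether the author chose <small> or <span>.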