- From: Elliotte Rusty Harold <elharo@metalab.unc.edu>
- Date: Mon, 19 Aug 2002 18:12:30 -0400
- To: www-tag@w3.org, www-style@w3.org
At 2:46 PM -0700 8/19/02, Tantek Çelik wrote:

> You actually expect a UA to parse the English tag name "headline" and then
> conclude it is a header, and then make similar conclusions for all other
> valid XML tag names?

Actually, no. I expect it to look at the layout of the page and notice certain characteristics that strongly suggest certain things are headlines. I expect this will be done using some form of adaptive algorithm rather than the deterministic ones we're accustomed to.

> This is because unambiguously parsing English and assigning meaning to
> English words is a solved problem right?
>
> Please do some homework on the state of AI and Natural Language Processing
> before making such ridiculous assertions.

This isn't just natural language processing, though. The mere fact that something is bigger and bold is a huge clue, and it's not the only one.

> And never mind the fact that 90%+ folks in the world don't speak English.
> Add "i18n" reading to your homework as well.

To the extent that other cultures use different visual metaphors, you'd need to rerun the adaptive algorithms on native-language sources. Then again, perhaps you could simply use the language itself as one more clue about what else is significant.

> This is because computer vision is a solved problem right? Again, more AI
> reading would help here, as I don't think you understand where the state of
> the art is, nor how far it has to go.

Actually, it's much easier than that. You don't need computer vision, because the information is already in the computer. It's even easier than OCR: the computer has far more accurate information about what it is displaying on its screen, such as the size and weight of the text, and it can rely on that.

This isn't an easy problem by any means, but it's not nearly as hard as a lot of people think it is, and I strongly suspect it's much easier than changing the behavior of millions of web publishers who are hardwired by evolution to like WYSIWYG.
--
Elliotte Rusty Harold | elharo@metalab.unc.edu | Writer/Programmer
XML in a Nutshell, 2nd Edition (O'Reilly, 2002)
http://www.cafeconleche.org/books/xian2/
http://www.amazon.com/exec/obidos/ISBN%3D0596002920/cafeaulaitA/
Read Cafe au Lait for Java News: http://www.cafeaulait.org/
Read Cafe con Leche for XML News: http://www.cafeconleche.org/
Received on Monday, 19 August 2002 18:20:53 UTC