- From: Martin Duerst <duerst@w3.org>
- Date: Tue, 05 Aug 2003 09:34:21 -0400
- To: Brian McBride <bwm@hplb.hpl.hp.com>
- Cc: rdf core <w3c-rdfcore-wg@w3.org>, i18n <w3c-i18n-ig@w3.org>
Hello Brian, At 15:16 03/08/04 +0100, Brian McBride wrote: >I'm still at the point of looking for a use case to demonstrate that >markup integrity is a real problem. For some people, it is important. For others, it may not be important. For the RDF Core WG, the graph is obviously very important, and the triples. If somebody created a new language to serialize RDF, and this new language would mess up graphs, I guess you would not be happy. If this currently happened with RDF/XML, or if some XML group changed XML so that it could happen, I guess you would not be happy. So I guess you should be able to understand that other people will not be happy at all if their markup is arbitrarily changed. It's not necessarily the people in the I18N WG who are most concerned with markup integrity (although I think we actually are). But assume some third party wants to use RDF to scrape metadata from XML documents, and this third party is concerned about markup integrity, either because s/he is just convinced that markup is crucial, or because of concerns for various round trip scenarios. After all, any user can scrape plain text literals (with language information), put them into RDF, and get them back unchanged. Do you (the RDF Core WG) or we (the I18N WG) have a detailled use case for this? Or do we all just agree, even without ever really talking about it, that it would be a very bad idea if plain literals suddenly got changed, e.g. if RDF suddenly upper-cased all plain literals? So let's assume that people in the XML community are concerned in a similar way about markup integrity as people in the RDF community are concerned about triple and graph integrity. So a person who is concerned about markup integrity does some scraping or something similar. They are faced with the following alternatives: 1) Preserve the markup, ignore the language information 2) Change the markup, squeeze in an additional element to attach language information. 4) Put the language information somewhere else 3) does not work because then language information is lost for purposes such as glyph disambiguation and text-to-speech. So the user is faced with the question: Do I preserve markup, or do I preserve language information? Seen from an I18N viewpoint, if we get to this point, we already have lost. From our experience, we know that the users unfortunately in most cases will just take the easy way out, even if they don't explicitly weight the alternatives. That means choosing 1), and thus loosing language information, the wrong thing from an i18n perspective. The fact that the users not only have to change the markup, but that they have to think about how to change it (which element to use) and that this may depend on circumstances (e.g. <div> vs. <span> in a very simple HTML case), which may significantly complicate the extraction logic, doesn't at all help pushing people towards conserving language information. Bad for i18n. There is a fourth alternative that users may take in some case, which is to strip all the markup so that they can maybe use some language info (or maybe not). Of course, loosing markup is also bad for i18n. >You suggested that your issue has to do with multiple users doing the same >thing differently and I asked you to refine the use case we have been >discussing to better illustrate your issue. > >I don't see how this use case illustrates a problem with markup integrity; >rather it assumes that problem. Yes, to some extent, we have to assume it as a problem because we know that others see it as a problem. Hope this helps. Regards, Martin.
Received on Tuesday, 5 August 2003 09:49:12 UTC