- From: Martin Duerst <duerst@w3.org>
- Date: Fri, 01 Aug 2003 15:10:58 -0400
- To: w3c-rdfcore-wg@w3.org
- Cc: w3c-i18n-ig@w3.org
Dear RDF Core WG,

Below please find the relevant excerpt from the I18N WG Core Task Force teleconference, where we were glad to welcome your chair. The Core Task Force of the I18N WG operates in Member-Confidential mode; the excerpt below was approved for publication in a public forum by all participants.

Regards, Martin.

RDF: Major issue about language information for XML literals. Should we state a formal objection?

Discussion with Brian McBride, chair of the RDF Core WG. Discussion focused on the working document at http://www.w3.org/International/2003/07/rdf-literal-issues.html. No conclusions were reached or decisions made. We will continue this discussion next week.

IRC log, Zakim noise removed, may not completely reflect discussion (time is US EDT, -04:00):

>>>>>>>>
13:24 <Zakim> On the phone I see Ishida, Martin, fyergeau, Tex, bwm
13:24 >MJDuerst< Topic is RDF
13:25 <bwm> http://www.w3.org/International/2003/07/rdf-literal-issues.html
13:26 >MJDuerst< document prepared by Richard (with help from Martin)
13:26 >MJDuerst< RDF attributions based on Mail by Brian; may not be correct
13:27 >MJDuerst< Brian: not entirely comfortable with framing
13:28 >MJDuerst< Brian: thanks for invitation
13:28 >MJDuerst< spent a lot of time thinking about the meeting
13:28 >MJDuerst< try to be straightforward with each other
13:28 >MJDuerst< concern that relationship got strained between rdf and i18n
13:30 >MJDuerst< try to improve relationship
13:30 >MJDuerst< try to involve more constituency in i18n to improve communication
13:31 >MJDuerst< try to make sure you take an informed decision
13:31 >MJDuerst< rdf core running very late, original plan 8 months, now already 2 years, somewhat exhausted
13:31 >MJDuerst< if you can give us leeway, we would appreciate it.
13:32 >MJDuerst< have already tried to understand Martin's issues, but have decided to move on
13:32 >MJDuerst< WEBOnt is also waiting for us
13:32 >MJDuerst< they decided to go to CR recently
13:33 >MJDuerst< i18n goals seem to shift a bit, makes things difficult to understand
13:35 >MJDuerst< Martin: general issue of literal equivalence is an important issue, but was missed during last call
13:35 >MJDuerst< by i18n, so we don't want to do a formal objection on this
13:36 >MJDuerst< Richard: hope that we can improve relationships in this meeting and from now on
13:36 >MJDuerst< started this document with a few more points, but removed the non-key points
13:37 >MJDuerst< Brian: One of the nightmares is that in RDF, everything is connected
13:38 >MJDuerst< Richard's document seems to suggest there are alternatives; I was understanding that there would
13:39 >MJDuerst< be a decision of whether to have a formal objection, any 'can't live with' things.
13:39 >MJDuerst< Richard: explain how to read the document. tried to keep it simple. describes how we see things.
13:39 >MJDuerst< various possibilities, and alternatives.
13:40 >MJDuerst< Top question is whether XML literals have language information in some way, seems everybody agrees that
13:40 >MJDuerst< some kind of information is needed.
13:40 >MJDuerst< Brian: we can save some time if we go to number 3 straight away.
13:41 >MJDuerst< Richard: Brian's document outlined three possibilities in his mail.
13:41 >MJDuerst< in this doc, listed as choice 1, 2, and 3.
13:41 >MJDuerst< 3 is current RDF solution.
13:42 <bwm> bwm notes request to discuss current design, not alternative designs
13:43 >MJDuerst< Richard: first major problem: if we have to add a dummy tag to put on language information, how can
13:43 >MJDuerst< the original information be reconstructed?
13:44 >MJDuerst< Brian: why do you care?
13:44 >MJDuerst< Richard: if you want to see the RDF literal as it was originally, you would care.
13:44 >MJDuerst< Brian: we can always create hypothetical scenarios
13:45 >MJDuerst< Brian: assumes that you want to take data from elsewhere and put it into graph; we start at the graph
13:45 >MJDuerst< Brian: it has been suggested that primary purpose of XML Literals is to represent text
13:46 >MJDuerst< Tex: text is where we have most i18n problems
13:47 >MJDuerst< Martin: I18N has tendency (and job) to look at where the i18n problems are
13:49 >MJDuerst< Brian: we have a notion of design centre
13:50 >MJDuerst< Brian: we may just have agreed that text is important for XML Literals, but other uses should be possible too
13:50 >MJDuerst< Richard: can you explain the term 'text'
13:51 >MJDuerst< Brian: sometimes, we hear 'text with markup', sometimes 'text', sometimes 'sequence of characters'
13:52 >MJDuerst< typical example is title, where you may start with just characters, but then for some cases realize you may need some other things
13:52 >MJDuerst< Markup is a sequence of characters that describes text
13:52 >MJDuerst< different markup languages could describe the same text
13:53 >MJDuerst< text would not just be 'foo', but 'foo in bold', or so
13:54 >MJDuerst< Francois: may be a disconnect in that text can be viewed at different levels; plain text, markup, rich text
13:54 >MJDuerst< rich text,... may be not necessarily considered as necessary, but from i18n point,
13:54 >MJDuerst< additional markup may be necessary for things like bidi or ruby.
13:55 >MJDuerst< Brian: I think I understand Francois; foo in bold may not have been the best example, maybe 'foo backwards'
13:55 >MJDuerst< may be better example.
13:55 >MJDuerst< Brian: I think we agree these are important, and we should not mix them up.
13:56 >MJDuerst< We don't want to confuse string "<br/>" and markup "<br/>", important to keep them apart.
13:56 >MJDuerst< Richard: come back to our concerns
13:56 <bwm> <b>chat</b>
13:57 >MJDuerst< this may come from another document
13:57 >MJDuerst< we knew at the source that it was french
13:58 <bwm> <b xml:lang="fr">chat</b>
13:59 <bwm> note: not a real use case
13:59 >MJDuerst< Martin: we don't have the real use case, but I think it's pretty plausible that such data is scraped off the Web
14:00 <Zakim> -fyergeau
14:00 <bwm> scenario: web scraper - pulling title elements out of html documents and creating RDF metastore
14:00 >MJDuerst< Richard: our concern is that you want to recover the original text, but you wouldn't know whether
14:01 >MJDuerst< the <b> is original or not
14:01 >MJDuerst< Brian: why is it important
14:01 >MJDuerst< Tex: Important to maintain the semantics of the text
14:02 >MJDuerst< Martin: language is important, but how important is exact markup
14:02 >MJDuerst< Tex: screen-scraper may not pull out exactly the markup with <b>,....
14:05 >MJDuerst< Richard: there may be scenarios where we don't have exactly the same
14:05 >MJDuerst< Tex: information should be the same, but exact markup may be different
14:06 >MJDuerst< Richard: worrying about assuming that such scenarios are not valid
14:07 >MJDuerst< Brian: interesting, because these scenarios probably haven't been considered
14:07 >MJDuerst< 'same markup' may mean same according to canonicalization, may include other context,...
14:08 >MJDuerst< Richard: to make sure you can store text (characters with additional information), may be difficult to get all the information
14:09 >MJDuerst< Brian: disagree that the notion of text is in jeopardy
14:09 >MJDuerst< Brian: RDF centric view is that if you want to represent info, you have to put it into the graph.
14:10 >MJDuerst< layering issue: we are talking about XHTML, e.g. with things like <b>.
14:10 >MJDuerst< difficult to model this in XML or RDF, needs to be modeled on XHTML layer
14:11 >MJDuerst< Tex: some applications may not care whether text is bold or not, if it needs to be represented, then
14:11 >MJDuerst< it needs to get into the graph.
14:12 >MJDuerst< whitespace may also need to be represented carefully, because otherwise it may get lost.
14:12 >MJDuerst< this may be recognized too late
14:13 >MJDuerst< Richard: moving on to the next point
14:13 <r12a> chat
14:14 >MJDuerst< how to apply language info to 'chat'? what markup to use? <span> is (X)HTML specific
14:14 >MJDuerst< XMLspec uses <phrase>
14:14 >MJDuerst< different markup in different contexts
14:15 >MJDuerst< some markup languages don't have a <span> equivalent
14:15 >MJDuerst< Brian: in a sense, you are right, there is an issue there
14:16 >MJDuerst< interpreting the meaning of <span> or <phrase>: RDF doesn't do that.
14:16 >MJDuerst< if you have an element, you should be fine, because you can represent the info
14:17 >MJDuerst< Richard: If application is looking at different kinds of documents, how is it deciding which element
14:17 >MJDuerst< to use as a <dummy>?
14:18 >MJDuerst< Brian: Application has to know which kind of markup it is dealing with
14:18 >MJDuerst< if that matters for application
14:20 >MJDuerst< Martin: scraping of some XMLspec docs and some (X)HTML documents at W3C doesn't seem like a far-fetched thing
14:20 >MJDuerst< Brian: application could just decide to use its own wrapper
14:21 >MJDuerst< Richard: would that be the wrapper solution of point 2?
14:21 >MJDuerst< Brian: similar to 2., but done by application
14:22 >MJDuerst< Richard: what about 'lang' attribute in HTML documents?
14:23 >MJDuerst< Martin: there seems to be an unvoiced agreement that xml:lang would be used to indicate the language
14:24 <r12a> <b lang="fr">chat</br>
14:24 <bwm> or we could build triple structure
14:26 >MJDuerst< Brian: seems natural to use xml:lang, but seems inappropriate to have the spec say that this is the only one
14:26 >MJDuerst< Trying to clarify scenario: describe all the individual words in all W3C docs with their language
14:27 >MJDuerst< One way is to put in the language specifically into the markup.
14:27 >MJDuerst< Other way is to put the language into the triple structure.
14:27 >MJDuerst< Richard: is this selection 1?
14:29 <bwm> <eg:foo rdf:parseType="Literal"><word xml:lang='en'>chat</word></...
14:29 <bwm> <eg:text>chat</eg:text>
14:30 <bwm> <eg:lang>en</eg:lang>
14:32 >MJDuerst< Martin: if we knew that there is a better way to deal with language, we should have done it, but now it may be too late.
14:32 <bwm> brian falls off chair
14:33 <r12a> * r12a would like to raise a process note asap
14:36 >MJDuerst< we will continue this discussion next Tuesday
>>>>>>>>

Other issues:
- Unification of text types
- XML Literals denote sequences of octets
(we did not discuss these issues)
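
Illustration (not part of the teleconference): the concern raised at 13:43 and 14:00, that a dummy element added only to carry language information cannot later be told apart from markup that was in the source, can be sketched roughly as follows. This is a minimal sketch using Python's standard xml.etree.ElementTree; the helper names and the blank-node label "_:w1" are hypothetical, and the choice of <b> as the wrapper simply mirrors the example in the log. The last part shows the alternative mentioned at 14:27, keeping the language in the triple structure, with the eg:text and eg:lang names taken from the 14:29 and 14:30 entries.

```python
import xml.etree.ElementTree as ET

# Clark-notation name that ElementTree uses for the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def wrap_with_lang(text, lang, tag):
    """Wrap plain text in a dummy element that carries xml:lang (hypothetical helper)."""
    elem = ET.Element(tag)
    elem.set(XML_LANG, lang)
    elem.text = text
    return ET.tostring(elem, encoding="unicode")


def tag_existing_markup(markup, lang):
    """Add xml:lang to the root element of markup that already exists (hypothetical helper)."""
    elem = ET.fromstring(markup)
    elem.set(XML_LANG, lang)
    return ET.tostring(elem, encoding="unicode")


# Case 1: the source had only the characters "chat"; <b> is a dummy wrapper.
from_plain_text = wrap_with_lang("chat", "fr", tag="b")

# Case 2: the source really contained <b>chat</b>.
from_markup = tag_existing_markup("<b>chat</b>", "fr")

# Both cases serialize to the same literal, so a consumer of the graph
# cannot tell whether the <b> was in the original.
indistinguishable = from_plain_text == from_markup

# Alternative: keep the language in the triple structure instead of the markup.
# "_:w1" is a made-up blank-node label used only for this sketch.
triples = [
    ("_:w1", "eg:text", "chat"),
    ("_:w1", "eg:lang", "fr"),
]
```

In the second form the literal stays exactly as it was found, at the cost of an extra node and property per piece of text; which trade-off is preferable is the question the log leaves open.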
Received on Friday, 1 August 2003 15:11:35 UTC