Meeting between Brian McBride and I18N WG Core Task Force from Martin Duerst on 2003-08-01 (w3c-rdfcore-wg@w3.org from August 2003)

From: Martin Duerst <duerst@w3.org>
Date: Fri, 01 Aug 2003 15:10:58 -0400
To: w3c-rdfcore-wg@w3.org
Cc: w3c-i18n-ig@w3.org
Message-Id: <4.2.0.58.J.20030801150619.06bb9ef0@localhost>
Dear RDF Core WG,

Below please find the relevant excerpt from the I18N
I18N WG Core Task Force teleconference, where we were
glad to welcome your chair.

The Core Task Force of the I18N WG operates in Member-
Confidential mode; the excerpt below was approved for
publication in a public forum by all participants.


Regards,    Martin.


RDF:
   Major issue about language information for XML literals.
   Should we state a formal objection?

Discussion with Brian McBride, chair of the RDF Core WG.
Discussion focused around the working document at
http://www.w3.org/International/2003/07/rdf-literal-issues.html.

No conclusions were reached or decisions made. We will
continue this discussion next week.

IRI log, Zakim noise removed, may not completely reflect
discussion (time is US EDT, -04:00):

 >>>>>>>>
13:24 <Zakim> On the phone I see Ishida, Martin, fyergeau, Tex, bwm
13:24 >MJDuerst< Topic is RDF
13:25 <bwm> http://www.w3.org/International/2003/07/rdf-literal-issues.html
13:26 >MJDuerst< document prepared by Richard (with help from Martin)
13:26 >MJDuerst< RDF attributions based on Mail by Brian; may not be correct
13:27 >MJDuerst< Brian: not entirely comfortable with framing
13:28 >MJDuerst< Brian: thanks for invitatio
13:28 >MJDuerst< spent a lot of time thinking about the meeting
13:28 >MJDuerst< try to be straightforward with each other
13:28 >MJDuerst< concern that relationship got strained between rdf and i18n
13:30 >MJDuerst< try to improve relationship
13:30 >MJDuerst< try to involve more constituency in i18n to improve
communication
13:31 >MJDuerst< try to make sure you take an informed decision
13:31 >MJDuerst< rdf core running very late, original plan 8month, now
already 2 years, somewhat exhausted
13:31 >MJDuerst< if you can give us leeway, we would appreciate it.
13:32 >MJDuerst< have already tried to understand Martin's issues, but have
decided to move on
13:32 >MJDuerst< WEBOnt is also waiting for us
13:32 >MJDuerst< they decided to go to CR recently
13:33 >MJDuerst< i18n goals seem to shift a bit, makes things difficult to
understand
13:35 >MJDuerst< Martin: general issue of literal equivalence is an
important issue, but was missed during last call
13:35 >MJDuerst< by i18n, so we don't want to do a formal objection on this
13:36 >MJDuerst< Richard: hope that we can improve relationships in this
meeting and from now on
13:36 >MJDuerst< started this document with a few more points, but removed
the non-key points
13:37 >MJDuerst< Brian: One of the nightmares is that in RDF, everything is
connected
13:38 >MJDuerst< Richard's document seems to suggest there are
alternatives; I was understanding that there would
13:39 >MJDuerst< be a decision of whether to have a formal objection, any
'can't live with' things.
13:39 >MJDuerst< Richard: explain how to read the document. tried to keep
it simple. describes how we see things.
13:39 >MJDuerst< various possibilities, and alternatives.
13:40 >MJDuerst< Top question is whether XML literals have language
information in some way, seems everybody agrees that
13:40 >MJDuerst< some kind of information is needed.
13:40 >MJDuerst< Brian: we can save some time if we go to number 3 straight
away.
13:41 >MJDuerst< Richard: Brian's documen outlined three possibilities in
his mail.
13:41 >MJDuerst< in this doc, listed as choice 1, 2, and 3.
13:41 >MJDuerst< 3 is current RDF solution.
13:42 <bwm> bwm notes request to discuss current design, not alternative
designs
13:43 >MJDuerst< Richard: first major problem: if we have to add a dummy
tag to put on language information, how can
13:43 >MJDuerst< the original information be reconstructed?
13:44 >MJDuerst< Brian: why do you care?
13:44 >MJDuerst< Richard: if you want to see the RDF literal as it was
originally, you would care.
13:44 >MJDuerst< Brian: we can always create hypothetical scenarios
13:45 >MJDuerst< Brian: assumes that you want to take data from elsewhere
and put it into graph; we start at the graph
13:45 >MJDuerst< Brian: it has been suggested that primary puprose of XML
Literals is to represent text
13:46 >MJDuerst< Tex: text is where we have most i18n problems
13:47 >MJDuerst< Martin: I18N has tendency (and job) to look at where the
i18n problems are
13:49 >MJDuerst< Brian: we have a notion of design centre
13:50 >MJDuerst< Brian: we may just have agreed that text is important for
XML Literals, but other uses should be possible too
13:50 >MJDuerst< Richard: can you explain the term 'text'
13:51 >MJDuerst< Brian: sometimes, we hear 'text with markup', sometimes
'text', sometimes 'sequence of characters'
13:52 >MJDuerst< typical example is title, where you may start with just
characters, but then for some cases realize you may need some other things
13:52 >MJDuerst< Markup is a sequence of characters that describes text
13:52 >MJDuerst< different markup languages could describe the same text
13:53 >MJDuerst< text would not just be 'foo', but 'foo in bold', or so
13:54 >MJDuerst< Francois: may be a disconnect in that text can be viewed
at different levels; plain text, markup, rich text
13:54 >MJDuerst< rich text,... may be not necessarily considered as
necessary, but from i18n point,
13:54 >MJDuerst< additional markup may be necessary for things like bidi or
ruby.
13:55 >MJDuerst< Brian: I think I understand Francois; foo in bold may not
have been the best example, maybe 'foo backwards'
13:55 >MJDuerst< may be better example.
13:55 >MJDuerst< Brian: I think we agree these are important, and we should
not mix them up.
13:56 >MJDuerst< We don't want to confuse string "<br/>" and markup
"<br/>", important to keep them appart.
13:56 >MJDuerst< Richard: come back to our concerns
13:56 <bwm> <b>chat</b>
13:57 >MJDuerst< this may come from another document
13:57 >MJDuerst< we knew at the source that it was french
13:58 <bwm> <b xml:lang="fr">chat</b>
13:59 <bwm> note: note a real use case
13:59 >MJDuerst< Martin: we don't have the real use case, but I think it's
pretty plausible that such data is scraped off the Web
14:00 <Zakim> -fyergeau
14:00 <bwm> scenario: web scraper - pulling title elements out of html
documents and creating RDF metastore
14:00 >MJDuerst< Richard: our concern is that you want to recover the
original text, but you wouldn't know whether
14:01 >MJDuerst< the <b> is original or not
14:01 >MJDuerst< Brian: why is it important
14:01 >MJDuerst< Tex: Important to maintain the semantics of the text
14:02 >MJDuerst< Martin: languag is important, but how important is exact
markup
14:02 >MJDuerst< Tex: screen-scraper may not pull out exactly the markup
with <b>,....
14:05 >MJDuerst< Richard: there may be scenarios where we don't have
exactly the same
14:05 >MJDuerst< Tex: information should be the same, but exact markup may
be different
14:06 >MJDuerst< Richard: worrying about assuming that such scenarios are
not valid
14:07 >MJDuerst< Brian: interesting, because these scenarios probably
haven't been considered
14:07 >MJDuerst< 'same markup' may mean same according to canonicalization,
may include other context,...
14:08 >MJDuerst< Richard: to make sure you can store text (characters with
additional information), may be difficult to get all the information
14:09 >MJDuerst< Brian: disagree that the notion of text is in jeopardy
14:09 >MJDuerst< Brian: RDF centric view is that if you want to represent
info, you have to put it into the graph.
14:10 >MJDuerst< layering issue: we are talking about XHTML, e.g. with
things like <b>.
14:10 >MJDuerst< difficult to model this in XML or RDF, needs to be modeled
on XHTML layer
14:11 >MJDuerst< Tex: some applications may not care whether text is bold
or not, if it needs to be represented, then
14:11 >MJDuerst< it need to get into the graph.
14:12 >MJDuerst< whitespace may also needed to be represented carefully,
because otherwise it may get lost.
14:12 >MJDuerst< this may be recognized too late
14:13 >MJDuerst< Richard: moving on to the next point
14:13 <r12a> chat
14:14 >MJDuerst< how to apply language info to 'chat'? what markup to use?
<span> is (X)HTML specific
14:14 >MJDuerst< XMLspec uses <phrase>
14:14 >MJDuerst< different markup in different contexts
14:15 >MJDuerst< some markup languages don't have a <span> equivalent
14:15 >MJDuerst< Brian: in a sense, you are right, there is an issue there
14:16 >MJDuerst< interpreting the meaning of <span> or <phrase>: RDF
doesn't do that.
14:16 >MJDuerst< if you have an element, you should be fine, because you
can represent the info
14:17 >MJDuerst< Richard: If application is looking at different kinds of
documents, how is it deciding which element
14:17 >MJDuerst< to use as a <dummy>?
14:18 >MJDuerst< Brian: Application has to know which kind of markup it is
dealing with
14:18 >MJDuerst< if that matters for application
14:20 >MJDuerst< Martin: scraping of some XMLspec docs and some (X)HTML
documents at W3C doesn't seem like a far-fetched thing
14:20 >MJDuerst< Brian: application could just decide to use it's own
wrapper
14:21 >MJDuerst< Richard: would that be the wrapper solution of point 2?
14:21 >MJDuerst< Brian: similar to 2., but done by application
14:22 >MJDuerst< Richard: what about 'lang' attribute in HTML documents?
14:23 >MJDuerst< Martin: there seems to be an unvoiced agreement that
xml:lang would be used to indicate the language
14:24 <r12a> <b lang="fr">chat</br>
14:24 <bwm> or we could build triple structure
14:26 >MJDuerst< Brian: seems natural to use xml:lang, but seems
inappropriate to have the spec say that this is the only one
14:26 >MJDuerst< Trying to clarify scenario: describe all the individual
words in all W3C docs with their language
14:27 >MJDuerst< One way is to put in the language specificially into the
markup.
14:27 >MJDuerst< Other way is to put the language into the triple structure.
14:27 >MJDuerst< Richard: is this selection 1?
14:29 <bwm> <eg:foo rdf:parseType="Literal"><word
xml:lang='en'>chat</word></...
14:29 <bwm> <eg:text>chat</eg:text>
14:30 <bwm> <eg:lang>en</eg:lang>
14:32 >MJDuerst< Martin: if we knew that there is a better way to deal with
language, we should have done it, but now it may be too late.
14:32 <bwm> brian falls of chair
14:33 <r12a> * r12a would like to raise a process note asap
14:36 >MJDuerst< we will continue this discussion next Tuesday
 >>>>>>>>


   Other issues:
   - Unification of text types
   - XML Literals denote sequences of octets
(we did not discuss these issues)
Received on Friday, 1 August 2003 15:11:35 UTC