- From: Martin Duerst <duerst@w3.org>
- Date: Thu, 31 Jul 2003 14:25:14 -0400
- To: Brian McBride <bwm@hplb.hpl.hp.com>, rdf core <w3c-rdfcore-wg@w3.org>, i18n <w3c-i18n-ig@w3.org>
Hello Brian, Many thanks for writing this up. Some comments below: At 18:05 03/07/31 +0100, Brian McBride wrote: >This is a use case concerning xml literals which we identified during our >discussions this week, and a little analysis of it. The use case may need >some refining to fully capture i18n concerns. > >Consider an application which is building an RDF store of metadata >about web pages. It crawls the web extracting title information from web >pages and storing then represents this data as RDF. And this application may scrape data from other things than (HTML) Web pages. And two independent applications may scrape the same data. >Lets say it is searching for <title> elements, which may contain arbritary >markup. Trying for example: > > <title><em>title</em></title> >But I note that XHTML 1.1 does not allow markup in titles! Yes. As I mentioned in one of my mails, we are working with the HTML WG to fix this in XHTML 2.0. An application may also scrape <h1> elements. <h1> can contain markup, and although often misused just to control font size, its semantics are close to <title>. >How does the application represent this in RDF? > >Since you can't use markup in a title element, use a plain literal :) > >But lets assume we are far sighted and assume that markup will be allowed >in titles in the future. > >Well, in that case, you could use an rdf:XMLLiteral and include a span >element to hold the lang tag. > >Objection: But then you couldn't use that literal with XHTML 1.1. > >Response: Record that information separately in the graph e.g. > > <rdf:Description rdf:about="..."> > <ex:title> > <rdf:Description> > <rdf:value rdf:parseType="Literal">chat</rdf:value> > <ex:lang>en</ex:lang> > </rdf:Description> > </... > > >Objection: You've changed the title. You can't recover the exact markup >that was there in the first place because you can't tell whether the span >was added by the crawler or was there in the first place. Yes. This is called markup integrity (or violation of markup integrity). >Response: Most of the time, you won't care. If you do care, you can >record the extra information in the graph. We can always record any kind of information in the graph. But the problem there is that different people/applications will do this differently. So now we got from the problem of not knowing what kind of <dummy> markup to use (or what has been used) to the problem of not knowing what kind of graph structure to use (or what has been used). [or even worse, not to know what kind of <dummy> markup or graph structure has been used]: For example, in a recent discussion, Tim was proposing to do something like: <rdf:Description rdf:about="..."> <ex:title> <rdf:Description> <⟨en rdf:parseType="Literal">chat</rdf:value> </rdf:Description> </... (i.e. making languages properties). There is also Patrick's proposal about reification (see http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2003Jul/0069.html). This is quite contrary to what we need; xml:lang was defined in XML, and not elsewhere, so that a wide range of applications could make use of it in an uniform way. The handling of language information in RDF M&S was also designed with the goal of allowing applications to make use of it in a uniform way. >Objection: <span> is html specific. you might want to use the literal in >another context. > >Response: Really need to refine the use case here, but in general if you >are not prepared to commit to a specific markup language, you can use the >graph to represent the underlying structure. Same problems as above, I guess. Regards, Martin.
Received on Thursday, 31 July 2003 14:39:04 UTC