Re: RDF Use Case: scraping metadata from the web

Disclaimer: I speak here only for myself, not for the WG.

----- Original Message ----- 
From: "ext Martin Duerst" <duerst@w3.org>
To: "Patrick Stickler" <patrick.stickler@nokia.com>; "Brian McBride" <bwm@hplb.hpl.hp.com>
Cc: "rdf core" <w3c-rdfcore-wg@w3.org>; "i18n" <w3c-i18n-ig@w3.org>
Sent: 07 August, 2003 23:43
Subject: Re: RDF Use Case: scraping metadata from the web


> >If RDF were to provide one solution, then that would simply
> >be inconsistent with another solution provided for some other
> >context. You seem to be big on having consistent treatment,
> >so it puzzles me that you would seek so specialized a solution
> >by RDF specifically.
> 
> RDF has a custom solution for plain literals.

Well, that's not surprising, given that there is
no standard that addresses language qualification of
arbitrary unicode strings.

> The same solution would work for XML Literals,

No, it would *not*, if XML Literals are modelled using
typed literals.

This has been sufficiently stated more than enough times.

> and was in the last call document. 

Yes. And was found to be a significant problem.

And when significant problems are found in last call
drafts, they are addressed.

> There is a difference
> between asking to keep some consistency in the current
> (up to Last Call) design, and bringing in new consistency
> requirements at this late stage.

I appreciate your desire for consistency. I share it. But
I think we have more than sufficiently justified the minor
lack of consistency in this case.

You're free to continue to disagree, but I will not exert
further energy re-re-re-stating our justifications.

> I18N doesn't want to require RDF to solve all XML problems.
> But we obviously care about a good solution for
> language information.

So do we, but not at the expense of the *primary* purpose
for RDF, which *does* after all take precidence over such
markup issues.

Again, do not take this comment to imply that language
markup issues are not important. They are. But they are
not the *primary* purpose for RDF and *do* not take
precidence over the primary requirements that must be
addressed by RDF.

This entire discussion feels to me like the tail wagging
the dog. Nothing against tails, but that's not right.

> >Let's not try to treat the symptoms rather than find a cure.
> 
> If we had a cure, that would be fine. Currently, we
> don't. And we don't want the patient to feel pain
> (or loose a limb), so we prefer treating the symptoms
> to doing nothing. I'm sure if you were the patient,
> you would want to do the same.

Not if the side effects of the treatment were as bad
as the illness and not if efforts put towards treating
the symptoms distract from work towards finding a cure.

> >Let the XML folks tell XML users how to deal with this
> >problem in a *general* way when dealing with XML fragments
> >irregardless of the language of encapsulation.
> >
> >E.g., have someone dust off the XML Fragment Interchange [1]
> >spec, make sure it does the right things, and then tell
> >folks use it *everywhere* they deal with XML fragments, including
> >with RDF.
> 
> We wouldn't mind if that happened. When we met originally
> with RDF Core, we were told about the problem that is
> now solved by xml:lang=''. We carefully did our homework
> and solved this problem. 

Actually, we ended up hard-coding an implicit xml:lang=''
for every property element having a typed literal value,
which was IMO the right way to handle it. The existence of
a mechanism such as xml:lang='' is still quite valuable,
and efforts to create it were certainly not wasted. I'm
sure the XML community will make good use of it.

The point is that we *don't* want xml:lang to apply to
XML literals. XML literals are XML fragments that have
intepretation *independent from* the RDF/XML instance and
the semantics of the RDF/XML instance not affect those
literals. It's not enough to *allow* folks to use xml:lang=''
with XML literals. We want and need to *ensure* that they
do. The problem of language qualification for XML literals
only exists for cases of XML fragments with mixed content 
where some form of markup must surround the mixed content 
to bear the language qualification. RDF should not be used 
for that encapsulation. 

If users are extracting XML fragments from complete documents,
then they are responsible for capturing all semantically
relevant contextual information *in* the fragment, and can
use something like XML Fragment to do so. If they are creating 
new XML fragments to be inserted into other, yet to be created
documents, then they should insert all semantically relevant
information about the fragment *in* the fragment, either
using markup language specific mechanisms such as <span>
or XML Fragment with an unspecified root element.

In the end, it is neither the business of RDF to care, nor
require, nor affect how such language qualification is made
to XML fragments that are encoded in RDF/XML as XML literals.

The only mechanisms expressed in the RDF/XML that apply
to XML literals are *syntactic* mechanisms, such as namespace
declarations, etc. and not *semantic* mechanisms such as
xml:lang.

The fact that RDF does allow xml:lang to infect plain literals
is IMO a legacy inherited bug, and asking for us to extend
that bug to XML literals is hardly motivating.

> If we had been told 1.5 years
> ago that RDF needs a complete solution for XML fragments,
> then we could have tried to contribute to that solution.

Does *RDF Core* need to tell the I18N that a complete
solution for language qualification of XML fragments is 
needed?!

Surely the I18N is, and has been, aware of this glaring
problem for far longer than that, and has been working
on such a solution. If not, then, well... no comment.

> Now it's too late for this, I guess.

(a) It's too late to expect RDF to provide any complete solution

(b) It's not too late to work on a complete solution for all
    XML users (and I hope you folks do)

(c) When a complete solution is available for all XML users,
    RDF will **automatically* support it -- because it does 
    not interfere with the semantics of XML literals

> >RDF is not going to be able to solve this general XML problem.
> >Certainly not at this point, given the fact that we should have
> >been finished with all this stuff well over a *year* ago!
> >
> >Can we *please* stop spinning our wheels on this and move one?
> 
> I would definitely like to. Of course, one easy solution
> would be to move back to the Last Call stage, 

That is *hardly* "easy". If you haven't come to appreciate
that by now, then these lengthy discussions have not been
very effective.

The inclusion of lang tags for XML literals as defined
in the last call specs was found to cause substantial
problems both for the MT and for clarity and usability.
I'm quite confident to bet that the WG will *not* be
putting them back in.

> or choose
> a solution that is closer to Last Call than the one we
> currently have.

We tried to do that. I expended numerous hours towards that end
(even though I did not myself consider there to be any problem).
We could not see any option that was superior to the present
solution as specified in the latest editors drafts, nor
were many WG members confortable with changes that might not be
sufficiently understood at this late stage. *No* significant
technical problem with the present design was recognized by
the WG. It would be nice to keep on tinkering with this
issue, and I hope that work continues in this area, but the
present WG has provided a solid, well understood, and tested
solution, and we are more than out of time. 

I'm sorry you consider RDF to be suboptimal in this matter.

I, and others, do not agree (many of which are more than
sufficiently qualified in the area of multilingual markup
issues). This matter can be noted as a possible area of
study for a future WG. I see no point in addressing it 
further within the context of the present RDF Core WG.

Patrick

Received on Friday, 8 August 2003 09:10:52 UTC