Re: varieties of datatyped tagged literals

On Thu, 2011-09-08 at 08:37 -0400, Lee Feigenbaum wrote:
> On 9/8/2011 8:01 AM, Sandro Hawke wrote:
> > On Wed, 2011-09-07 at 13:54 -0700, Gavin Carothers wrote:
> >> On Wed, Sep 7, 2011 at 12:06 PM, Sandro Hawke<sandro@w3.org>  wrote:
> >>> On Wed, 2011-09-07 at 19:30 +0100, Andy Seaborne wrote:
> >>>>
> >>>> On 07/09/11 17:42, Pierre-Antoine Champin wrote:
> >>>>> Following today's discussion, let me rephrase the rationale of each
> >>>>> "family" of solutions:
> >>>>
> >>>> Thanks. Pat gives the details; this is good to discuss the general
> >>>> intent of each approach.
> >>>>
> >>>>>
> >>>>> 1. Don't change anything: literals will have *either* a datatype or a
> >>>>> language tag.
> >>>>>
> >>>>> In the following options, we unify literals by ensuring that every
> >>>>> literal has a datatype.
> >>>>>
> >>>>> 2. The language tag is still "outside" the (lexical/value) mechanism of
> >>>>> the datatype; the various sub-options differ in how this
> >>>>> extra-information is introduced in the system.
> >>>>>
> >>>>> In the following options, we unify literals even more by making
> >>>>> language-tagged literals a special case of datatyped literal.
> >>>>>
> >>>>> 3. The language tag is attached to the datatype.
> >>>>>
> >>>>> 4. The language tag is attached to the lexical form.
> >>>>
> >>>> A RDF 1.0 literal has three parts:
> >>>>      (lexical form, language tag, datatype)
> >>>>
> >>>> with lang and datatype being optional.
> >>>>
> >>>> Options 2, 3 and 4 remove the optionality on datatype.
> >>>>
> >>>> Option 2 still has optional language tag; there is a single datatype for
> >>>> lang-tag literals.
> >>>>
> >>>> Option 3 removes the lang slot and encodes it into the URI.
> >>>> (or requires a dereference).
> >>>>
> >>>> Option 4 removes the lang slot and encodes it into the lexical form.
> >>>>
> >>>> For 3 vs 4, if you emphasise datatypes more than lexical forms, you like
> >>>> 3 and conversely, if you emphasise lexical forms, 4 is preferable to 3.
> >>>>
> >>>> Options 3 and 4 reduce the dimensionality to 2 by encoding.
> >>>>
> >>>> All options make language tags "special" in some way.  Option 2 does it
> >>>> bypassing L2V; options 3 and 4 rely on micro-parsing (further parsing a
> >>>> string).
> >>>
> >>> Very, very nicely put.   I dislike 2 because it doesn't get us down to
> >>> two elements.
> >>
> >> We have three elements today, so we don't get two in the future... meh.
> >
> > Not sure I agree.  In some sense the datatype is already two elements,
> > since many people think of it as a namespace and an entry in that
> > namespace.
> 
> In what way do people use current datatypes like that?

I just meant conceptually; I don't know if it would appear as a
"use".   I imagine most people think of
http://www.w3.org/2001/XMLSchema#int as xs:int, a combination of xs (or
xsd, short for XML Schema Datatypes), and "int".   Of course, APIs also
do this separation, via constructs like XS.int or XS["int"].    It's
two elements in the language; language tags will still be two elements,
like perhaps LANG["en-FR"].
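
To make the "namespace plus entry" idea concrete, here is a rough
sketch in Python.  The Namespace class is a made-up helper, not any
particular library's API, and the base URI used for LANG is purely
hypothetical; nothing here is a proposal for what the real URI should
be.

```python
class Namespace:
    """Hypothetical helper: joins a base URI and a local name."""

    def __init__(self, base):
        self.base = base

    def __getitem__(self, local):
        # Supports the XS["int"] style.
        return self.base + local

    def __getattr__(self, local):
        # Supports the XS.int style.  Only reached for names that are
        # not ordinary attributes; dunder lookups are refused so that
        # copying and similar protocols behave normally.
        if local.startswith("__"):
            raise AttributeError(local)
        return self.base + local


XS = Namespace("http://www.w3.org/2001/XMLSchema#")
# Assumption: an invented base URI, used only for illustration.
LANG = Namespace("http://example.org/lang/")

int_type = XS.int            # same URI as XS["int"]
tagged_type = LANG["en-FR"]  # a datatype URI that carries the tag
```

Either way the caller works with two pieces, a namespace and a local
name, even though the result is a single URI.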

> >  Option 3 adds more complexity to the datatypes, true, but
> > it seems to me the complexity is only there for people who need it,
> > instead of being in the way of people who don't need it.
> 
> I think the sample code for checking if a literal is a string shows that 
> the complexity comes through almost no matter what.

It's hard to be crisp about it, but it seems to me that the string test
example code shows the complexity in the right place.   There's no
single conceptual entity in RDF of "a string with or without a language
tag", which I think is fine.  So if you want to check if something is an
xs:string or a language-tagged string, you have to check for both
possibilities.   It's like if I wanted to see if something was a date or
an integer; I'd need an "or" expression.
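
As a rough sketch of what I mean, assuming option 3 and an invented
base URI for the language-tag datatypes (LANG_BASE below is
hypothetical, as are the Literal type and the function names), the
test could look like this in Python:

```python
from typing import NamedTuple, Optional

XS_STRING = "http://www.w3.org/2001/XMLSchema#string"
# Assumption: invented base URI for language-tag datatypes.
LANG_BASE = "http://example.org/lang/"


class Literal(NamedTuple):
    """Two-element literal as under option 3: no separate lang slot."""
    lexical_form: str
    datatype: str


def is_lang_tagged(lit: Literal) -> bool:
    # The language tag is encoded in the datatype URI.
    return lit.datatype.startswith(LANG_BASE)


def language_of(lit: Literal) -> Optional[str]:
    # Micro-parse the datatype URI to recover the tag, if there is one.
    if is_lang_tagged(lit):
        return lit.datatype[len(LANG_BASE):]
    return None


def is_string_like(lit: Literal) -> bool:
    # Two distinct kinds of thing, so the check is an "or".
    return lit.datatype == XS_STRING or is_lang_tagged(lit)
```

Code that only cares about xs:string keeps the one-line comparison;
only code that wants both kinds pays for the second check.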

   -- Sandro

> Lee
> 
> >>>   I prefer 3 over 4 because I think datatype URIs are a
> >>> better place to do the encoding than data values -- URIs are already
> >>> full of delimiters and parameters understood by different components.
> >>
> >> http://www.w3.org/DesignIssues/Axioms.html#opaque
> >>
> >> The only thing you can use an identifier for is to refer to an object.
> >> When you are not dereferencing, you should not look at the contents of
> >> the URI string to gain other information.
> >>
> >> Recommending the use of non opaque URIs seems like a backwards step.
> >
> > TimBL wrote that many years ago in response to the trend of people and
> > software making unwarranted assumptions about the structure of URLs.  In
> > this case, we're talking about a warranted assumption -- a standard,
> > even, so the situation is different.   It's more like a namespace, or
> > the .well-known/genid thing.
> >
> > I'm fairly confident Tim prefers option 3 here, but he's traveling for
> > the next few weeks, so I'm not sure I can get a solid answer from him.
> > If his opinion on this is likely to change anyone's mind, I'm happy to
> > try to get his attention (or you can email him directly, of course).
> >
> >     -- Sandro
> >
> >
> >> --Gavin
> >>
> >>> Forcing the data values to also be parsed doesn't feel right, although I
> >>> concede it does work.
> >>>
> >>>      -- Sandro
> >>>
> >>>
> >>>>        Andy
> >>>>
> >>>>
> >>>
> >>>
> >>>
> >>>
> >>
> >
> >
> >
> >
> 

Received on Thursday, 8 September 2011 14:22:19 UTC