Re: Datatyped tagged literals: a case for option 4 vs option 2d

On 09/26/2011 03:04 PM, Steve Harris wrote:
> On 26 Sep 2011, at 10:51, Jan Wielemaker wrote:
>
>> On 09/26/2011 11:28 AM, Richard Cyganiak wrote:
>>> You understate the issues.
>>>
>>> Every existing application that uses the Literal.getLexicalForm()
>>> call of some API to get at the xxx part of xxx@lll would have to
>>> be changed, because the lexical form of xxx@lll is now xxx@lll.
>>>
>>> That's a complete non-starter.
>>
>> I fully agree. Also note that APIs for (notably in-core) RDF stores
>> can now typically work on a single shared representation of the
>> literal. If we add a tag to the literal many of the operations will
>> have to create a copy without the tag. I'm not saying this cannot
>> be solved, but I fear it will be neither natural nor pretty, especially for
>> existing stores that did not anticipate this in their design
>> phase.
>>
>> I must admit that I'm only following this from the sideline. As an
>> implementor I'm starting to get worried about some wild ideas
>> though. The solution I still like best is that foo@tag is the same
>> as "foo"^^langbase:tag, where langbase is some to be decided prefix
>> for language identifiers.  Any implementation should be fairly
>> comfortable with that (typically it will just simplify things).
>
> Broadly I agree, but how would you ask the query "is ?x a string (of
> some kind)"?

I know datatypes are organized in a hierarchy (see
http://www.w3.org/TR/xmlschema-2/#built-in-datatypes), but I'm not aware
that this hierarchy can be queried in RDF. If we had a predicate, say
rdf:subDataTypeOf, we could make every lang:tag datatype a subtype of
rdf:LangString. Ideally, this would allow for

 DATATYPE(?x) rdf:subDataTypeOf rdf:LangString

I am afraid that DATATYPE(?x) cannot be used in SPARQL graph
patterns, since it is a function that is only valid inside expressions
such as FILTER, but I think this can be fixed if there is a good
use case.
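
To make the subtype idea concrete, here is a rough Python sketch of
the check a store could perform internally. Everything in it is an
assumption for illustration: the LANG_BASE namespace is made up (the
real prefix is still to be decided), rdf:LangString is spelled as in
this mail, and is_sub_datatype_of is a hypothetical helper, not an
existing API.

```python
# Hypothetical namespace for language datatypes; the real prefix is undecided.
LANG_BASE = "http://example.org/lang/"
# rdf:LangString as proposed in this thread (an assumption, not a standard term here).
RDF_LANG_STRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#LangString"
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


def super_datatypes(datatype):
    """Yield the datatype itself, then its assumed supertypes."""
    yield datatype
    # Assumed rule: every non-empty lang:tag datatype is a subtype
    # of rdf:LangString.
    if datatype.startswith(LANG_BASE) and len(datatype) > len(LANG_BASE):
        yield RDF_LANG_STRING


def is_sub_datatype_of(datatype, ancestor):
    """Reflexive subtype test over the tiny hierarchy assumed above."""
    return ancestor in super_datatypes(datatype)
```

With this, "is ?x a language-tagged string" becomes
is_sub_datatype_of(datatype_of_x, RDF_LANG_STRING), and no regex
matching on the URI is exposed to the user.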

We can easily ask whether a string is in English:

 FILTER( DATATYPE(?x) = lang:en )

Note that using a mechanism such as Jena's Property Functions,
we can do things like this:

 DATATYPE(?x) lang:langMatches lang:en

Of course, the current SPARQL constructs for language matching should
be mapped to datatype matches.
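
As a sketch of such a mapping, the Python below applies the basic
filtering scheme of RFC 4647 (the scheme SPARQL's langMatches is
defined by) to language datatype URIs instead of bare tags. The
LANG_BASE namespace and the function names are hypothetical and only
serve the illustration.

```python
# Hypothetical namespace for language datatypes; the real prefix is undecided.
LANG_BASE = "http://example.org/lang/"


def tag_of(datatype):
    """Return the language tag encoded in a lang: datatype URI, or None."""
    if datatype.startswith(LANG_BASE):
        return datatype[len(LANG_BASE):] or None
    return None


def lang_matches(tag, lang_range):
    """RFC 4647 basic filtering, case-insensitive."""
    tag, lang_range = tag.lower(), lang_range.lower()
    if lang_range == "*":
        return tag != ""
    # A range matches the identical tag or any tag it prefixes at a "-".
    return tag == lang_range or tag.startswith(lang_range + "-")


def datatype_lang_matches(datatype, range_datatype):
    """langMatches lifted to datatype URIs; False for non-language datatypes."""
    tag, lang_range = tag_of(datatype), tag_of(range_datatype)
    if tag is None or lang_range is None:
        return False
    return lang_matches(tag, lang_range)
```

Here datatype_lang_matches(LANG_BASE + "en-GB", LANG_BASE + "en")
holds, while the reverse direction does not.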

> It's a pretty common use case, and currently it's a pain, but I think
> this would make it even harder. I'm not keen on any solution which
> requires reasoning over language tag URIs, or regex matching.

I don't want to see regex matching either, but I see reasoning over
language tag URIs as a serious option.

>> I understand things get complicated if we want to attach semantics
>> to these datatypes, so I'd propose not to do that. Most likely
>> others will make an attempt.
>
> Not me!

Having proper URIs for languages and having this connected to the
datamodel surely opens many possibilities and is (IMHO) much more in
line with the RDF spirit than having just a language identifier such as
"en" or "en-GB". The only meaningful operation on these strings is
langMatches :-(

>
> - Steve

Received on Monday, 26 September 2011 13:45:39 UTC