Re: rdf accommodation from Paul Tyson on 2008-08-27 (public-xml-processing-model-comments@w3.org from August 2008)

From: Paul Tyson <phtyson@sbcglobal.net>
Date: Tue, 26 Aug 2008 19:25:09 -0500
To: public-xml-processing-model-comments@w3.org
Message-ID: <48B49EE5.5080706@sbcglobal.net>

Norman Walsh wrote:
> Paul Tyson <phtyson@sbcglobal.net> writes:
> 
> 
>>
>>1. Eliminate the requirement that says port-to-port data flow must be
>>XML.  Instead, use some phrase that means "serialized data instance",
>>or simply "data stream", a serialization of some data format specified
>>by W3C.  (Actually, why shouldn't this just say "behave as if ...",
>>instead of specifying the data stream?)
> 
> 
> How would this expand its scope or power? Do you have in mind a
> specific example of a use case that would be possible if the XML
> constraint was relaxed but is not possible without relaxing it?
>

No, I don't.  I can only say that more options are better than fewer
when the extra options cost little.  And although it is never impossible,
it is usually burdensome to serialize and deserialize logic statements
to and from XML.  This just raises the barrier to semantic processing
with xproc.

> 
>>2. Provide a type-checking mechanism on input ports to report a
>>dynamic error when a port receives data it can't handle (instead of
>>just XD0001 non-xml).  This could default to XML.
> 
> 
> Allowing non-XML data would certainly introduce new interoperability
> issues.
> 

Very few.  Associate an "allowed-types" setting with each input port
(default to "text/xml").  Associate a "types" setting with each output
port.  Then you could statically check for both pipeline sanity and
implementation capability.
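
A rough sketch of what that static check might look like, written in
Python only for illustration.  Everything here is hypothetical: the
names OutputPort, InputPort, check_connection and check_pipeline, and
the "text/n3" type string, are assumptions and not part of any XProc
draft.

```python
from dataclasses import dataclass

# Assumed default: a port that declares nothing carries XML.
DEFAULT_TYPES = frozenset({"text/xml"})


@dataclass(frozen=True)
class OutputPort:
    name: str
    types: frozenset = DEFAULT_TYPES  # the proposed "types" setting


@dataclass(frozen=True)
class InputPort:
    name: str
    allowed_types: frozenset = DEFAULT_TYPES  # the proposed "allowed-types" setting


def check_connection(out, inp):
    """Return the types the output may emit that the input does not allow."""
    return sorted(out.types - inp.allowed_types)


def check_pipeline(connections):
    """Statically check every (output, input) pair; return error messages."""
    errors = []
    for out, inp in connections:
        for media_type in check_connection(out, inp):
            errors.append(f"{out.name} -> {inp.name}: {media_type} not allowed")
    return errors
```

An empty list from check_pipeline would mean every connection is sane;
anything else could be reported before the pipeline runs, rather than
as a dynamic error.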

Yes, this allows people to write xproc scripts that use flow types that 
aren't supported in all implementations.  But if you limit the 
capabilities of the language, implementors will add non-standard 
features that render the xproc scripts non-portable anyway.  Better to 
provide a standard optional-feature list that implementors can implement 
and xproc writers can write for.  Maybe a "type-implemented" boolean 
function.
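
To make the "type-implemented" idea concrete, here is a minimal sketch
in Python.  It is an assumption about how such a function could behave,
not a proposal for its actual signature; IMPLEMENTED_TYPES,
type_implemented and choose_flow_type are hypothetical names, and the
type strings are examples only.

```python
# Hypothetical list of flow types one particular processor supports.
IMPLEMENTED_TYPES = frozenset({"text/xml", "text/n3"})


def type_implemented(media_type, implemented=IMPLEMENTED_TYPES):
    """Return True if this processor can pass the given type between ports."""
    return media_type.strip().lower() in implemented


def choose_flow_type(preferred, fallback="text/xml"):
    """Use the preferred type when it is implemented, else fall back to XML."""
    return preferred if type_implemented(preferred) else fallback
```

A script author could then branch on the result and stay portable, for
example by falling back to an XML serialization when the preferred type
is not available.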

> 
>>With these changes, xproc would be equipped to handle semantic
>>processing of rdf, owl, or any other type of w3c data that has a
>>non-xml syntax.
> 
> 
> RDF and OWL both *have* XML syntaxes, so there's nothing about XProc
> that's unable to handle them now. Surely the decision to send N3 or
> RDF/XML through a particular pipe is an implementation detail that the
> user doesn't care about.
> 

Nor should the specification care about it! (Other than for type checking.)

> 
>>While the current draft addresses a large body of current mainstream
>>XML processing, it fails to meet the growing need for combined
>>syntactic and semantic processing.  I don't know of any other WG that
>>aims to meet this need.  XProc has laid the groundwork for everything
>>required in a pipeline language, so any separate effort for semantic
>>processing would be largely redundant.
> 
> 
> I don't think the WG will consider expanding the scope in V1, though I
> suppose it's a possibility in some future version. However, it will
> have to be motivated by use cases that are prevented by the current
> constraints.

My personal experience when learning a pipeline framework was that I 
quickly dreamed up new applications that were beyond the capabilities of 
the framework, and were probably not among the use cases considered by 
the framework designers.  But I wouldn't have thought of these 
applications unless I first learned the framework.  For an enabling 
technology like this it is impossible to enumerate, _a priori_, all the 
problems it will solve.

I think you have a genie in a bottle here, and I'd like to see him come
out in V1 rather than later.

> 
> I remain convinced that some RDF steps would be (will be!) valuable in
> XProc, but aside from p:sparql, haven't heard any specific
> suggestions.

See my "rdf processing steps" submission to this list.

Thanks for your consideration,
--Paul
Received on Wednesday, 27 August 2008 00:23:25 UTC