Re: ISSUE-26: We don't need any RDFS vocabulary for error triples! from Mark Birbeck on 2010-07-21 (public-rdfa-wg@w3.org from July 2010)

From: Mark Birbeck <mark.birbeck@webbackplane.com>
Date: Thu, 22 Jul 2010 00:59:08 +0100
To: Ivan Herman <ivan@w3.org>
Cc: benjamin.adrian@dfki.de, RDFa WG <public-rdfa-wg@w3.org>
Message-ID: <AANLkTinoiBeISP0ciQBAfvrQ6dm9J0r0FVgmZvEc0zia@mail.gmail.com>

Hi Ivan,

I had an action item relating to the error-reporting mechanism, and I
think the idea was that I put some more meat on my EARL comments.
However, the more I look at this, the more I think we're gold-plating
(to use a software development expression), and that RDFa Core is
really not the place to be defining an error-reporting schema.

(I have no problem with a simple error-reporting event or callback
function in the RDFa API, but that's about it.)

I realise that you want some error-reporting mechanism for Distiller,
but that seems to me to be something that could be wrapped around your
parser. The parser itself doesn't need to generate error triples; it
only needs to tell your wrapper that errors have occurred, hopefully
using the mechanism from the API. Your *wrapper* can then send error
triples over the wire.
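
To make that split concrete, here is a minimal sketch in Python. It is
only an illustration under stated assumptions: parse_rdfa, distill,
ParseError and the ex:usesProfile property are hypothetical names, not
part of any RDFa API draft, and the triples are plain string tuples
rather than a real RDF library's terms.

```python
from dataclasses import dataclass
from typing import Callable, List, Set, Tuple

Triple = Tuple[str, str, str]


@dataclass
class ParseError:
    kind: str     # e.g. "ProfileReferenceError"
    context: str  # e.g. the @profile URI that could not be fetched


def parse_rdfa(profiles: List[str], reachable: Set[str],
               on_error: Callable[[ParseError], None]) -> List[Triple]:
    """Hypothetical stand-in for a parser: it never emits error triples,
    it only reports problems through the on_error callback."""
    triples: List[Triple] = []
    for uri in profiles:
        if uri in reachable:
            triples.append(("_:doc", "ex:usesProfile", uri))
        else:
            on_error(ParseError("ProfileReferenceError", uri))
    return triples


def distill(profiles: List[str],
            reachable: Set[str]) -> Tuple[List[Triple], List[Triple]]:
    """Hypothetical wrapper (e.g. a remote distiller): it collects the
    callback errors and builds the processor-graph triples itself."""
    errors: List[ParseError] = []
    default_graph = parse_rdfa(profiles, reachable, errors.append)
    processor_graph: List[Triple] = []
    for i, err in enumerate(errors):
        node = f"_:e{i}"  # one blank node per reported error
        processor_graph.append((node, "rdf:type", f"rdfa:{err.kind}"))
        processor_graph.append((node, "rdfa:context", err.context))
    return default_graph, processor_graph
```

In this arrangement only distill() knows about error triples; a caller
that just wants an event or an exception can pass its own callback to
parse_rdfa() and never see a processor graph at all.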

I know that you and Manu feel that the error triples themselves
should be standardised, and that could still be done; an error schema
could still be defined by the W3C, but I believe that should be done
outside of RDFa Core.

So, my concerns are:

* this group isn't well equipped to deal with creating an
error-reporting schema;

* a lot of people might have a lot to say on such a schema (rightly
so), which could end up unnecessarily slowing down the progress of
RDFa Core;

* this whole thing has already taken a lot of our mailing-list and
telecon time, and as we try to come up with a schema, I believe it
will continue to do so;

* all of which is a real shame, given that it's not immediately clear
where error-reporting sits in our goals and deliverables anyway.

In short, I feel we're losing time on this without a clear benefit for
all the work that is being done.

Of course, I also realise that there have been some WG resolutions on
this, and my action item was not to oppose the creation of a schema,
but to say why I believe that schema should be based on EARL. But
since I believe that such a schema is a lot of work to get right
(whatever it's based on), and is overkill for RDFa Core, then I don't
see the point in putting a lot of effort into creating an EARL-based
schema.

Regards,

Mark


On Wed, Jul 21, 2010 at 2:14 PM, Ivan Herman <ivan@w3.org> wrote:
>
> On Jul 21, 2010, at 14:23 , Benjamin Adrian wrote:
>
>> Hi Ivan,
>>
>> My main concern about an RDFS vocabulary for error triples about RDFa Parser errors is
>> that (if we want to do a good job) it has to be extensible for all kinds of RDF parser errors.
>
> You mean RDF/XML or Turtle parser errors, right?
>
> This was indeed the problem of Toby and, acknowledging this issue, we have slightly changed the requirements in terms of not *requiring* the processor graph mechanism to be used. What we say is that *if* an implementation uses it, then this is the way to use it.
>
> The comparison is also a little bit misleading. As Shane pointed out, there is a difference between RDFa and the others, namely that there are situations (I think it was the un-referencability of a @profile file that triggered this) where a large part of the triples are not generated. I always felt that a service should inform the caller somehow if this occurs.
>
> (Yes, I can see the possible answer that if, say, somebody mistypes the rdf: namespace declaration in an RDF/XML file then the file does not generate any triples at all... I.e., there is at least some analogy.)
>
>> And this is a really big issue and means a lot of communication effort!
>>
>>> What you propose, if my understanding is correct, is to have an error vocabulary in XML and return that to the caller as an XML Literal. If I am an RDF application and use a remote RDFa service to extract RDF from an RDFa file, that means that I would have to include in my application an XML parser (even if it is a simple one) just to understand the error message, whereas if I get the results in the form of triples then, well, I use whatever I use for my application already. I just do not believe that would be acceptable.
>>>
>> I don't see the real use case here. I never wrote any program logic on error messages or info messages.
>> I just use grep to see what went wrong. Inside an application logic I also catch exceptions and check what type they are.
>
> Hm. Grep may not be an option if I run a (remote) distiller on a (remote) HTML file that refers to a (remote) @profile file... And, in the case of a remote service, there is no such thing as an exception.
>
> So... what would you expect a remote service like a distiller to return if a @profile file is not reachable? Just return whatever triples that are generated and leave it at that?
>
> (I do not mean to be provocative: this is a genuine question of what you as a user would expect...)
>
>
>>> I am less qualified on the API level but... the user of an API surely has to be prepared to handle RDF graphs. That should include the handling of a processor graph. If the order of the statements is a problem for the API user then we have much bigger problems on our hands because the order of RDF triples extracted from the RDFa content is also random! I would hope that is not a real issue...
>>>
>>>
>> Well it's not a problem, but an issue the application developer has to be aware of.
>> The ordering of RDF triples is random. That means using the RDFTripleIterator
>> for complex queries is nearly impossible on a large dataset.
>>
>> It also means that filtering error triples with triple-events is very difficult and error-prone.
>
> I do not understand that. In rough RDFLib parlance
>
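> from rdflib import Namespace, RDF
> RDFA = Namespace("http://www.w3.org/ns/rdfa#")  # rdfa: namespace URI assumed
> # processor_graph is assumed to be an rdflib Graph holding the error triples
> for s in processor_graph.subjects(RDF.type, RDFA.ProfileReferenceError):
>     for c in processor_graph.objects(s, RDFA.context):
>         print(c)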
>
> will print out all the @profile values that cannot be dereferenced. I do not see that to be any more difficult than managing triples in general...
>
>> Multiple randomly ordered RDF triples about errors are completely useless when using the
>> event based mechanism.
>>
>> That's why I recommend describing each error in a single RDF triple.
>> Other solutions might look like:
>> - error events contain a property group instead of a single triple.
>
> We have not specified the error mechanism on the API. But isn't it possible to ask for a property group using the rdf:type of the error? I would then get hold of the subjects for errors and then I can get hold of the error descriptions for each of those. I really do not see why this is much more complicated than processing RDFa triples in general.
>
> Note that, I presume, the order in a property group is not fixed either...
>
>> - the returned triple order of the processor graph is fixed
>
> I do not think this is feasible. What this means is that an implementation cannot use an underlying triple store or environment to generate and store error triples. That seems to be prohibitive to me...
>
> I guess the real issue that does come and did come up in the past is whether an error mechanism is necessary at all. We seemed to have a working group consensus on this, and I begin to wonder whether this is still true...
>
> Ivan
>
>
>>
>> Cheers,
>>
>> Benjamin
>>> On Jul 19, 2010, at 13:38 , Benjamin Adrian wrote:
>>>
>>>
>>>> Hi,
>>>>
>>>> I say we don't need any RDFS vocabulary for error triples!
>>>>
>>>> Read why:
>>>>
>>>> The spec says:
>>>>
>>>> "SAX-based processors or processors that utilize function or method callbacks
>>>> to report the generation of triples are classified as event-based RDFa Processors."
>>>>
>>>> That means the callback function is called for every generated RDF triple.
>>>> Parsing error triples with these callbacks can be extremely difficult when
>>>> the generated triples inside the processor graph are unordered
>>>> (as may occur -- it's RDF, not XML!).
>>>>
>>>> So searching the stream for triples with patterns like:
>>>> rdf:type rdfa:ProfileReferenceError
>>>>
>>>>
>>>> is nice when the generated triples' ordering is like this:
>>>>
>>>>      _:1 a rdfa:ProfileReferenceError .
>>>>      _:1 dc:description "The @profile value could not be dereferenced" .
>>>>      _:1 dc:date "2010-06-30T13:40"^^xsd:dateTime .
>>>>
>>>> But what if they are generated like this?
>>>>      _:1 dc:date "2010-06-30T13:40"^^xsd:dateTime .
>>>>      _:1 dc:description "The @profile value could not be dereferenced" .
>>>>      _:1 a rdfa:ProfileReferenceError .
>>>>
>>>>
>>>> Then you have to buffer and search the whole stream, which means you would be better off using the
>>>> model-based approaches to error reporting.
>>>>
>>>> --> NEITHER EARL NOR ANOTHER RDFS VOCABULARY: it should be really simple.
>>>>
>>>> I don't think that the intention of EARL matches the use case of our error vocabulary.
>>>> The RDF vocabulary used must be as simple as possible.
>>>> That means it should use as few properties as possible.
>>>> Nobody will ever reason on an error graph. So why not
>>>> summarize all information about a single error in a single triple describing a stack trace?
>>>>
>>>> [] dc:description "ProfileReferenceError: The @profile value <http://www.example.org/profile> could not be dereferenced.\n
>>>>                             Line <http://www.example.org>: 564\n
>>>>                             HTTP GET: ....\n
>>>>                             HTTP RESPONSE ".
>>>>
>>>> If you say, well, a string is not enough, try an XMLLiteral:
>>>>
>>>> [] dc:description "<ProfileReferenceError>: The @profile value <http://www.example.org/profile> could not be dereferenced.
>>>>                            <POSITION>
>>>>                            <URL>http://www.example.org</URL>
>>>>                            <LINE>564</LINE>
>>>>                            </POSITION>
>>>>                            <REQUEST>GET: ....</REQUEST>
>>>>                            <RESPONSE>HTTP RESPONSE ...</RESPONSE>
>>>>                            </ProfileReferenceError>".
>>>>
>>>>
>>>> That's it :) I'm fine with XML or plain Literals as objects for error triples.
>>>>
>>>> Best regards,
>>>>
>>>> Benjamin
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> __________________________________________
>>>> Benjamin Adrian
>>>> Email :
>>>> benjamin.adrian@dfki.de
>>>>
>>>> WWW :
>>>> http://www.dfki.uni-kl.de/~adrian/
>>>>
>>>> Tel.: +49631 20575 145
>>>> __________________________________________
>>>> Deutsches Forschungszentrum für Künstliche Intelligenz GmbH
>>>> Firmensitz: Trippstadter Straße 122, D-67663 Kaiserslautern
>>>> Geschäftsführung:
>>>> Prof. Dr. Dr. h.c. mult. Wolfgang Wahlster (Vorsitzender) Dr. Walter Olthoff
>>>> Vorsitzender des Aufsichtsrats:
>>>> Prof. Dr. h.c. Hans A. Aukes
>>>> Amtsgericht Kaiserslautern, HRB 2313
>>>> __________________________________________
>>>>
>>>>
>>>
>>> ----
>>> Ivan Herman, W3C Semantic Web Activity Lead
>>> Home: http://www.w3.org/People/Ivan/
>>> mobile: +31-641044153
>>> PGP Key: http://www.ivan-herman.net/pgpkey.html
>>> FOAF: http://www.ivan-herman.net/foaf.rdf
Received on Wednesday, 21 July 2010 23:59:46 UTC