Re: Fixing IO typeclasses

All right, here is a second attempt, taking into account the feedback
I got. Thanks for that, by the way. A few remarks:

* the goal here was *never* to do streaming/non-blocking IO
* these are abstractions over the existing implementations (Jena,
Sesame, N3.js, jsonld-js, custom)
* we *only* need to accommodate sync and async IO
* this addresses the issues related to
  * @base directive
  * @prefix directives
  * handling of relative graphs without violating the RDF model, i.e.
only full IRIs in the model itself. So relativization happens in the
serialized form only.
* I decoupled the interfaces where I could because, depending on the
implementation, you may only have access to a String as input (or you
have some optimizations for Strings), so I wanted to avoid an
unnecessary round-trip through an InputStream

Anyway, here it is:


/** The result of parsing an RDF Graph */
final case class Parsed[Rdf <: RDF](
  graph: Rdf#Graph,
  prefixes: Map[String, Rdf#URI]
)

/** An RDF reader.
  *
  * All functions return results in the context `M`. `S` is a phantom
  * type for the RDF syntax.
  */
trait RDFReader[Rdf <: RDF, M[_], +S] {

  /** reads an RDF Graph from a [[java.io.InputStream]] and a base IRI */
  def read(inputStream: InputStream, base: String): M[Parsed[Rdf]]

  /** reads an RDF Graph from a [[java.io.Reader]] and a base IRI */
  def read(reader: Reader, base: String): M[Parsed[Rdf]]

}

/** An RDF reader for [[java.lang.String]]s.
  *
  * All functions return results in the context `M`. `S` is a phantom
  * type for the RDF syntax.
  */
trait RDFStringReader[Rdf <: RDF, M[_], +S] {

  /** reads an RDF Graph from a [[java.lang.String]] and a base IRI */
  def read(input: String, base: String): M[Parsed[Rdf]]

}

/** An RDF writer.
  *
  * All functions return results in the context `M`. `S` is a phantom
  * type for the RDF syntax.
  */
trait RDFWriter[Rdf <: RDF, M[_], +S] {

  /** writes `graph` to a [[java.io.OutputStream]]
    *
    * @param graph
    * @param outputStream
    * @param optBase if set and supported, used in a `@base`-like directive
    * @param prefixes used in `@prefix`-like directives
    */
  def write(
    graph: Rdf#Graph,
    outputStream: OutputStream,
    optBase: Option[String],
    prefixes: Map[String, Rdf#URI]
  ): M[Unit]

  /** writes `graph` to a [[java.io.OutputStream]] where IRIs are
    * relative to the provided `base`.
    *
    * Note: the result cannot be reliably parsed without knowing the
    * base.
    *
    * @param graph
    * @param outputStream
    * @param base all IRIs are relative to it
    * @param prefixes used in `@prefix`-like directives
    */
  def writeAsRelative(
    graph: Rdf#Graph,
    outputStream: OutputStream,
    base: String,
    prefixes: Map[String, Rdf#URI]
  ): M[Unit]

}
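
To illustrate the note on `writeAsRelative` (the output cannot be
reliably parsed without knowing the base), here is a minimal sketch.
`RelativeIriSketch` and its two helpers are hypothetical names, not
part of banana-rdf, and the sketch relies on `java.net.URI`, which
handles URIs rather than full IRIs and only relativizes when the base
is a path prefix of the target.

```scala
import java.net.URI

// Hypothetical helper, not part of banana-rdf: shows what "IRIs
// relative to the provided base" means at the syntax level.
object RelativeIriSketch {

  /** Rewrites `iri` relative to `base` when `base` is a path prefix of
    * it; otherwise returns `iri` unchanged.
    */
  def relativize(base: String, iri: String): String =
    URI.create(base).relativize(URI.create(iri)).toString

  /** What a parser does on the way back in: resolve a reference
    * against whatever base it was given.
    */
  def resolve(base: String, ref: String): String =
    URI.create(base).resolve(ref).toString

}
```

Relativizing `http://example.org/a/b#c` against `http://example.org/a/`
yields `b#c`. Resolving `b#c` against the same base restores the
original IRI, while resolving it against any other base silently
produces a different one.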


/** An RDF formatter returning [[java.lang.String]]s.
  *
  * All functions return results in the context `M`. `S` is a phantom
  * type for the RDF syntax.
  */
trait RDFStringWriter[Rdf <: RDF, M[_], +S] {

  /** returns `graph` as a [[java.lang.String]]
    *
    * @param graph
    * @param optBase if set and supported, used in a `@base`-like directive
    * @param prefixes used in `@prefix`-like directives
    */
  def asString(
    graph: Rdf#Graph,
    optBase: Option[String],
    prefixes: Map[String, Rdf#URI]
  ): M[String]

  /** returns `graph` as a [[java.lang.String]] where IRIs are relative
    * to the provided `base`
    *
    * @param graph
    * @param base all IRIs are relative to it
    * @param prefixes used in `@prefix`-like directives
    */
  def asString(
    graph: Rdf#Graph,
    base: String,
    prefixes: Map[String, Rdf#URI]
  ): M[String]

}
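
To make the role of `M` concrete, here is a minimal sketch of how the
same parsing logic could be exposed synchronously (`M = Try`) and
asynchronously (`M = Future`). Everything in it is an assumption made
for illustration: `StringReader` is a simplified mirror of
`RDFStringReader` that takes the graph type as a plain parameter `G`
instead of `Rdf#Graph`, and `ToyGraph`, `ToySyntax` and the
line-based "s p o" syntax are made up so that the block compiles
without banana-rdf.

```scala
import scala.concurrent.{ExecutionContext, Future}
import scala.util.Try

object ReaderContextSketch {

  // Toy graph standing in for Rdf#Graph: a set of (s, p, o) strings.
  type ToyGraph = Set[(String, String, String)]

  // Simplified mirror of RDFStringReader: results live in the context
  // `M`, and `S` is a phantom type for the syntax.
  trait StringReader[G, M[_], +S] {
    def read(input: String, base: String): M[G]
  }

  // Phantom type naming the made-up syntax used in this sketch.
  sealed trait ToySyntax

  // Shared logic: one "s p o" triple per non-empty line. Terms without
  // a scheme are naively prefixed with `base` (not real IRI resolution).
  private def parse(input: String, base: String): ToyGraph = {
    def absolute(term: String): String =
      if (term.contains(":")) term else base + term
    input.split("\n").map(_.trim).filter(_.nonEmpty).map { line =>
      line.split("\\s+") match {
        case Array(s, p, o) => (absolute(s), absolute(p), absolute(o))
        case _ => throw new IllegalArgumentException(s"not a triple: $line")
      }
    }.toSet
  }

  // Synchronous instance: M = Try, so errors are captured as values.
  val syncReader: StringReader[ToyGraph, Try, ToySyntax] =
    new StringReader[ToyGraph, Try, ToySyntax] {
      def read(input: String, base: String): Try[ToyGraph] =
        Try(parse(input, base))
    }

  // Asynchronous instance: M = Future, same logic, different context.
  def asyncReader(implicit ec: ExecutionContext): StringReader[ToyGraph, Future, ToySyntax] =
    new StringReader[ToyGraph, Future, ToySyntax] {
      def read(input: String, base: String): Future[ToyGraph] =
        Future(parse(input, base))
    }

}
```

With `rel:` as the base, `syncReader.read("a b c", "rel:")` succeeds
with a single triple whose terms all carry the `rel:` scheme, and
malformed input comes back as a `Failure` instead of a thrown
exception. The caller picks the context by picking the instance; the
signature of `read` stays the same.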



Alexandre




On Thu, Jan 29, 2015 at 3:23 PM, Alexandre Bertails
<alexandre@bertails.org> wrote:
> On Thu, Jan 29, 2015 at 3:05 PM, henry.story@bblfish.net
> <henry.story@bblfish.net> wrote:
>>
>> On 29 Jan 2015, at 20:57, Anton Kulaga <antonkulaga@gmail.com> wrote:
>>
>> Hi, Alexandre!
>>
>> I think we lack support of streaming API for readers. All readers return
>> RDFGraph but in many cases it is not what is needed. For instance:
>> 1) In case of large files I would prefer to parse them in a nonblocking way,
>> so I need a stream.
>>
>>
>> +1
>
> Sure, but this is not what RDFReader is trying to solve. Streaming
> parsers would live in an entirely different typeclass. This is not
> what I am trying to address here.
>
> `M` already lets you implement the synchronous and asynchronous
> parsers, and handle errors.
>
>>
>> 2) In case of getting data from websockets (in a string format) it is an
>> overhead to wrap each websocket message into a graph
>>
>>
>> +1
>>
>>
>> Writers are also not clear for me.
>> Why do you provide "base:String" in ordinary "write" method? base:String
>> makes sense only for relative graphs (for which you already have  def
>> writeAsRelative )
>
> This addresses the use-case of the `@base` directive for syntaxes that
> support it. We could decide to ignore it entirely.
>
> Anton was mentioning that this information might be in his graphs
> already. The banana-rdf model doesn't support that for now. Nor does
> it plan anything for prefixes. It could actually, but I am not sure that
> all RDF implementations would actually support it (N3.js doesn't
> support it at the moment, neither does the current Plantain), so it's
> hard to write a contract that binds the Rdf#Graph and an RDFWriter
> when prefixes/base are involved. Suggestions welcome.
>
>> Also I think it would be better to have the base be a URI, rather than a
>> string.
>
> That would require the user to build one, hence having an RDFOps
> somewhere. In practice, the implementations do not require that. And
> an error on the `base` can already be handled in the `M` context. But
> if there is more demand for a `base: Rdf#URI` and if nobody else
> prefers keeping the String, then I will certainly not fight for it :-)
>
>>
>> Can we not also have a readAsRelative, and end up with a relative graph?
>> That’s what we have had until now, and it can be quite useful.
>
> The relative graph is a really ugly hack. It is not really handled by
> implementations (especially Sesame), and will probably never be,
> without the kind of hacks we have today. The proposal here is to use a
> well-defined `rel:` base URI instead. The rationale is in the
> documentation for the package.
>
> Alexandre
>
>>
>>
>> 2015-01-29 21:42 GMT+02:00 Alexandre Bertails <alexandre@bertails.org>:
>>>
>>> Hey guys,
>>>
>>> It is time to fix our reader and writer typeclasses. We have a few
>>> outstanding issues on Github, and I have spent some time thinking about
>>> how to move forward. So please find here my proposal.
>>>
>>> Please review, comment, ask questions, I am planning to start hacking
>>> on those as soon as possible. I want this stuff to be ready way before
>>> my Scaladays talk, along with the tutorial I am planning for
>>> banana-rdf.
>>>
>>> Alexandre
>>>
>>>
>>>
>>> * rdf/common/src/main/scala/org/w3/banana/io/package.scala
>>>
>>> ```
>>> package org.w3.banana
>>>
>>> /** Types for IO operations on RDF objects.
>>>   *
>>>   * RDF syntaxes like Turtle or JSON-LD do allow for the use of
>>>   * relative IRIs but there are no relative IRIs in the RDF model
>>>   * itself. So instead of failing on a relative IRI without a known
>>>   * base, RDF parsers would usually *silently* pick a default base IRI
>>>   * on their own.
>>>   *
>>>   * In `banana-rdf`, we standardize the behaviour by **always** using
>>>   * `rel:` as the default base IRI for IO operations when no base
>>>   * was provided. RDF applications can then look at the scheme and
>>>   * know what happened.
>>>   */
>>> package object io {
>>>
>>>   /** `rel:`, the special base IRI used when no base is provided. */
>>>   final val DEFAULT_BASE: String = "rel:"
>>>
>>> }
>>> ```
>>>
>>> * rdf/common/src/main/scala/org/w3/banana/io/RDFReader.scala
>>>
>>> ```
>>> package org.w3.banana
>>> package io
>>>
>>> import java.io._
>>>
>>> /** An RDF reader.
>>>   *
>>>   * All functions return results in the context `M`. `S` is a phantom
>>>   * type for the RDF syntax.
>>>   */
>>> trait RDFReader[Rdf <: RDF, M[_], +S] {
>>>
>>>   /** reads an RDF Graph from a [[java.io.InputStream]] and a base IRI */
>>>   def read(is: InputStream, base: String): M[Rdf#Graph]
>>>
>>>   /** reads an RDF Graph from a [[java.io.Reader]] and a base IRI */
>>>   def read(reader: Reader, base: String): M[Rdf#Graph]
>>>
>>>   /** reads an RDF Graph from a [[java.io.InputStream]] using the `rel:`
>>>     * base IRI
>>>     */
>>>   final def read(is: InputStream): M[Rdf#Graph] = read(is, DEFAULT_BASE)
>>>
>>>   /** reads an RDF Graph from a [[java.io.Reader]] using the `rel:` base
>>>     * IRI
>>>     */
>>>   final def read(reader: Reader): M[Rdf#Graph] = read(reader, DEFAULT_BASE)
>>>
>>> }
>>> ```
>>>
>>> * rdf/common/src/main/scala/org/w3/banana/io/RDFWriter.scala
>>>
>>> ```
>>> package org.w3.banana
>>> package io
>>>
>>> import java.io.OutputStream
>>>
>>> /** An RDF writer.
>>>   *
>>>   * All functions return results in the context `M`. `S` is a phantom
>>>   * type for the RDF syntax.
>>>   */
>>> trait RDFWriter[Rdf <: RDF, M[_], +S] {
>>>
>>>   /** writes `graph` in a [[java.io.OutputStream]] with the provided
>>>     * `base`
>>>     */
>>>   def write(graph: Rdf#Graph, os: OutputStream, base: String): M[Unit]
>>>
>>>   /** writes `graph` in a [[java.io.OutputStream]] where IRIs are
>>>     * relative to the provided `base`
>>>     */
>>>   def writeAsRelative(graph: Rdf#Graph, os: OutputStream, base: String): M[Unit]
>>>
>>>   /** returns `graph` as a [[java.lang.String]] */
>>>   def asString(graph: Rdf#Graph, base: String): M[String]
>>>
>>>   /** returns `graph` as a [[java.lang.String]] where IRIs are relative
>>>     * to the provided `base`
>>>     */
>>>   def asRelativeString(graph: Rdf#Graph, base: String): M[String]
>>>
>>> }
>>> ```
>>>
>>
>>
>>
>> --
>> Best regards,
>> Anton Kulaga
>>
>>
>> Social Web Architect
>> http://bblfish.net/
>>

Received on Saturday, 31 January 2015 20:34:21 UTC