Re: Use machine-readable standardized data formats / Use non-proprietary data formats

Hi All,

I am getting really confused about what data will be considered the
target of our BPs.

I am trying to understand how we could explain to publishers whether they
are or are not our audience.

If we have BPs that guide publishers to provide metadata about structure,
license, etc., to provide version information, and a lot of other BPs, why
do we have to explain what data is ruled out or ruled in? Why do we have to
forbid some publishers from following our BPs?

I think that some types of data would be better published if they conformed
to even a few of our BPs. Maybe other types could follow more BPs, depending
on their "nature".

Sorry about my confusion. I understand and agree that we have to think about
a perfect world of data. But I also think that we have to think about the
current state of this world. A maturity model and some roadmaps might be
more useful to clarify these different cases.

Best Regards,

On Friday, 14 August 2015, Annette Greiner wrote:

> The notion of context is what I was trying to get to by talking about a
> set of things that are not intended to be meaningful on their own. As you
> say, 31 degrees C is not meaningful out of context. In a dataset with other
> temperature readings and metadata, it has meaning. So I think we have some
> overlap there. But you are thinking about it a little differently and end
> up with more stuff in scope than me. I don’t think that legislation is
> something this group should be considering as data. (That gets into
> structured documents, which I view as a slippery slope into
> everything-land.) If we ruled out everything but collections whose pieces
> *lack* meaning when used alone, I think legislation is ruled out, as are
> the rest of the things that make me worry about boiling the ocean. A
> section of a law conveys a lot of meaning. What I don’t quite understand
> from your message is what you are thinking of that would be ruled out of
> scope by a rule that says anything put into some sensible perspective by
> having context is in scope. Doesn’t context put everything into perspective?
> After thinking about graph data a bit, I’m liking the tabular notion more.
> Since graph data are basically matrices, and matrices are really a form of
> table, that’s not so difficult to rule in after all. If you define JSON as
> tabular, we are probably in agreement. I’m just not sure that the word
> tabular would be interpreted by most readers as including key-value stores,
> but we could clarify that in a sentence. Or we could just rule in a short
> list of forms, like tabular and key-value. Legislation is clearly ruled out
> if we go with tabular data. To be clear, I’m thinking of stuff that *is in*
> a “tabular” form, not just anything that could be represented that way,
> because anything can. The same for graph data, as Erik points out. I don’t
> think we should rule in everything that *could be* expressed by a graph
> representation, but I would rule in anything that *is*. So, if you want to
> make a matrix of your relationships with your cousins and publish those on
> the web, we have some guidance for you, but as for the photos you took at
> the family picnic, you’re on your own.
> -Annette
> --
> Annette Greiner
> NERSC Data and Analytics Services
> Lawrence Berkeley National Laboratory
> 510-495-2935
> On Aug 14, 2015, at 11:55 AM, Makx Dekkers wrote:
> >
> > Erik wrote:
> >
> >> one person's model/reality is another person's data. trying to understand
> >> where to draw the line is a futile attempt with a long history of trying
> >> and failing.
> >
> > So maybe the reason we have never managed to decide what we mean by 'data'
> > is because it is not possible to define it and therefore our attempts have
> > been futile. Good point.
> >
> > Maybe we need to look at it from a different angle. Here is what I think
> > could maybe be a way forward.
> >
> > Someone mentioned the word 'context' in another thread, and maybe that is
> > what we need to look at.
> >
> > One way of looking at context is how DCAT defines 'dataset': "A collection
> > of data, published or curated by a single agent, and available for access or
> > download in one or more formats". So not individual observations, sentences,
> > numbers, but data items that belong together in some sort of 'collection'.
> >
> > My proposal would be not to try to define limits related to what the data
> > *is* or how it can be used but just to consider the context in which the
> > data exists or is embedded. If the context puts the data in some sensible
> > perspective, it's in scope; if it is just bits and pieces without a clear
> > context, it's out of scope.
> >
> > Here are two examples that I imagined:
> >
> > 1. meteorological information
> >
> > * 31 degrees Celsius is just a temperature;
> > * The fact that 31 degrees Celsius was the maximum temperature today in the
> > village where I am is a piece of information.
> > My assumption is that this level is not what we want to be concerned with in
> > this group.
> >
> > I think that we start getting interested if there is a collection of those
> > pieces of information, for example a list of today's maximum temperatures
> > across the whole province or country, or in a bigger context, when this is
> > part of the list of all maximum temperatures across the country for all days
> > of the year. As far as I understand, such lists are what DCAT would call
> > 'datasets'.
> >
> > 2. legal information
> >
> > * A single sentence is just that;
> > * A legal article with some sentences is a piece of information.
> > Again, not the kinds of things that we're concerned with.
> >
> > As soon as the articles are embedded in a complete legal act with
> > definitions and references, then it becomes again "a collection of data,
> > published or curated by a single agent, and available for access or download
> > in one or more formats" (a dataset) and therefore of interest to us.
> >
> >
> > Happy to hear people's views on this.
> >
> > Makx.
> >
> >
> >
> >


Received on Saturday, 15 August 2015 00:50:58 UTC