Re: The Agent proposal in bib.schema.org is controversial

Having used open source reconciliation tools like OpenRefine to clean up
and enhance DC data, I would like to echo Jeff Y's concern about just using
schema:Thing. Although one could limit the number of possibilities by
searching for schema:Thing with a predicate of schema:author,
schema:creator, etc., this is not really a viable alternative for cleaning up
data (since no tools make use of it).

Corey, with regard to some of those ~70 million: you would be surprised at
how creative some of the MARC records are :)

Also, a lot of those come from other parts of the MARC record. A good
example is the 260 field, which holds the publisher information. One cannot
assume that the publisher is always an Organization, and there is no
subfield code to indicate whether the $b is a Person or an Organization.
I would argue, though, that one can assume that the publisher is some
sort of Agent.
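
To illustrate the fallback I have in mind, here is a minimal Python sketch.
The tag table and the function name infer_type are my own illustrative
assumptions, not part of any existing tool. It only relies on the 100/700
fields being personal names and the 110/710 fields being corporate names,
while a 260 $b carries no type at all.

```python
# Hypothetical lookup from MARC name tags to schema.org types.
# 100/700 are personal-name fields, 110/710 are corporate-name fields.
# Anything not listed (for example 260 $b, the publisher name) has no
# reliable type, so the caller decides what the fallback should be.
TYPE_BY_TAG = {
    "100": "schema:Person",
    "700": "schema:Person",
    "110": "schema:Organization",
    "710": "schema:Organization",
}


def infer_type(tag, fallback="schema:Thing"):
    """Return the type implied by a MARC tag, or the fallback if unknown."""
    return TYPE_BY_TAG.get(tag, fallback)


# A 100 field is a Person; a 260 $b publisher falls back to schema:Thing,
# or to an Agent class if one is available.
print(infer_type("100"))
print(infer_type("260"))
print(infer_type("260", fallback="http://bibliograph.net/Agent"))
```

Making the fallback a parameter keeps the choice between schema:Thing and
some Agent class out of the mapping itself, so the same data could be
republished either way.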

Jeff Mixter

On Mon, Aug 10, 2015 at 3:10 PM, Corey A Harper <corey.harper@nyu.edu>
wrote:

> Dear all,
>
> My $0.02: I also think that schema:Thing is the best option at this time,
> and don't think we should push too much on Agent given what I consider
> relatively limited usefulness. I understand Jeff's point about the dangers
> of "not sorting these out", but I also think that we can store and manage
> data with whatever specificity we want, and I'm not sure those dangers apply
> to data as published downstream to consumers on the Web.
>
> I'm also _very_ interested in knowing more about the 70 million+ "mystery
> agents" Richard and Jeff have been referencing. Are these just 1xx and 7xx
> data points of unknown type because they haven't matched a known
> entity with a known type? Can't we at least infer more about their type from
> their MARC field? Can we see some example instance (bib) data where these
> show up?
>
> Best,
> -Corey
>
> On Mon, Aug 10, 2015 at 1:55 PM, Dan Scott <denials@gmail.com> wrote:
>
>> FWIW, the Bibliographic Ontology (bibo) also uses foaf:Agent.
>>
>> But I concur with the developing dissenting opinion on the github issue
>> that, if we have nothing specific to say about the nature of the entity
>> because we lack the information, it's better to simply avoid the compromise
>> of Agent. We might make ourselves feel a bit better about the dismal state
>> of our bibliographic data through an abstract class like Agent, but in the
>> end it doesn't really add any data to the data we're trying to express.
>>
>> Using schema:Thing seems like an acceptable fallback in the meantime,
>> and allows the data expressed by the target links to be refined to either
>> Person or Organization at some point in the future when the effort occurs.
>>
>>
>> On Mon, 10 Aug 2015 at 11:29 Young,Jeff (OR) <jyoung@oclc.org> wrote:
>>
>>> I made an argument that the problem is broader than bib records:
>>>
>>>
>>>
>>> https://github.com/schemaorg/schemaorg/issues/700#issuecomment-129078302
>>>
>>>
>>>
>>> Limiting to our situation, though, Richard cites the count from WorldCat
>>> at 72 million “agents” (people and organizations excluded):
>>>
>>>
>>>
>>> https://github.com/schemaorg/schemaorg/issues/700#issuecomment-129227478
>>>
>>>
>>>
>>> These all have Linked Data identifiers, but they are only mechanized
>>> placeholders in need of exposure, reconciliation, and enrichment.
>>>
>>>
>>>
>>> The danger of not sorting these out is that naïve automated “entity
>>> matching” processes resort to string matching on name as an “else
>>> condition” and the resulting mix up manifests itself in the Linked Data.
>>>
>>>
>>>
>>> I suggested Google Custom Search as a possible tool to help with
>>> discovery and possibly lead to an interface where they could be reconciled:
>>>
>>>
>>>
>>> https://github.com/schemaorg/schemaorg/issues/700#issuecomment-129239474
>>>
>>>
>>>
>>> Jeff
>>>
>>>
>>>
>>> *From:* LeVan,Ralph
>>> *Sent:* Monday, August 10, 2015 10:33 AM
>>> *To:* Young,Jeff (OR); Richard Wallis; public-schemabibex@w3.org
>>>
>>>
>>> *Subject:* RE: The Agent proposal in bib.schema.org is controversial
>>>
>>>
>>>
>>> One of the arguments against Agent was that if you didn’t know what kind
>>> of object a thing was, then you just shouldn’t say.   All the properties of
>>> Agent seem to come from Thing.  I’d propose that we just use Thing.
>>>
>>>
>>>
>>> My guess is that the need for Agent comes mostly from our need to
>>> convert existing bib records into RDF and some of our crappy old bib
>>> records don’t reliably distinguish the type of agent involved.  Rather than
>>> be caught out in a lie about whether the agent is a Person or Organization,
>>> we’d rather say less.  This is a problem peculiar to our situation and not
>>> a broad problem of the internet community.  It’s also a short-term
>>> problem.  Selling ‘Agent’ to a community that doesn’t need it is going to
>>> be an uphill battle.
>>>
>>>
>>>
>>> What’s wrong with dropping all the way back to Thing when we don’t know
>>> the type of the agent?
>>>
>>>
>>>
>>> Ralph
>>>
>>>
>>>
>>> *From:* Young,Jeff (OR) [mailto:jyoung@oclc.org]
>>> *Sent:* Monday, August 10, 2015 10:04 AM
>>> *To:* Richard Wallis; public-schemabibex@w3.org
>>> *Subject:* RE: The Agent proposal in bib.schema.org is controversial
>>>
>>>
>>>
>>> One option would be for us to use foaf:Agent. Presumably search engines
>>> would ignore it, but that’s their prerogative.
>>>
>>>
>>>
>>> Another option would be to preserve http://bibliograph.net/Agent, with
>>> a comment that it wasn’t accepted by the broader community, but remains
>>> useful in our limited domain. (Terms that have been adopted should be
>>> deprecated.)
>>>
>>>
>>>
>>> Jeff
>>>
>>>
>>>
>>>
>>>
>>> *From:* Richard Wallis [mailto:richard.wallis@dataliberate.com]
>>> *Sent:* Monday, August 10, 2015 8:18 AM
>>> *To:* public-schemabibex@w3.org
>>> *Subject:* The Agent proposal in bib.schema.org is controversial
>>>
>>>
>>>
>>> You may have noticed, if you followed the recent announcement of
>>> Schema.org v2.1
>>> <https://lists.w3.org/Archives/Public/public-schemabibex/2015Aug/0000.html>,
>>> which includes bib.schema.org, that one of our proposals did not make
>>> it in.  That proposal was the Agent type that we proposed as a super-type
>>> for Person and Organization.
>>>
>>>
>>>
>>> Agent has been a theme of discussion in the community well before we
>>> approached the issue.  You can follow the recent debate in the related
>>> schemaorg GitHub issue comment trail:
>>> https://github.com/schemaorg/schemaorg/issues/700
>>>
>>>
>>>
>>> In the bibliographic world Agent is a well understood, some would say
>>> obvious, approach.  When applied to the wider domains that Schema.org
>>> embraces however, it raises many concerns and issues. Especially because,
>>> as proposed, it would introduce a new direct sub-type of Thing with
>>> ramifications that could cascade across many areas of the  vocabulary.
>>>
>>>
>>>
>>> In my personal opinion the gap between the two opposing views on this is
>>> significant and the best way forward would be to consider possible
>>> pragmatic approaches to how we represent our data in Schema.org without
>>> losing the ability to describe our resources effectively to the wider
>>> world.
>>>
>>>
>>>
>>> In simple terms, if we identify an author, creator, publisher, or even
>>> copyright holder as a Person or an Organization there is not a problem.
>>> The difficulty occurs when we know from the relationships in the data that
>>> they are either a Person or an Organization but cannot identify which.
>>>
>>>
>>>
>>> One suggested way forward for such a circumstance would be to define
>>> them as a schema:Thing.  To me this feels a little too vague.  A follow-on
>>> option was to suggest a 'personOrOrganization' boolean property to indicate
>>> this circumstance.  This is a little more appealing, but I think it still
>>> needs some work.
>>>
>>>
>>>
>>> What are others' thoughts on this?
>>>
>>>
>>>
>>> Do we believe that the proposed Agent type is the *only* way forward?
>>> Are there potential pragmatic options like the one I describe above that we
>>> could shape, that would be acceptable? Is this requirement to specifically
>>> describe agents too detailed, something we can pass over so that we can
>>> move on to other things?
>>>
>>>
>>>
>>> ~Richard.
>>>
>>>
>>>
>>>
>>>
>>>
>>> Richard Wallis
>>>
>>> Founder, Data Liberate
>>>
>>> http://dataliberate.com
>>>
>>> Linkedin: http://www.linkedin.com/in/richardwallis
>>>
>>> Twitter: @rjw
>>>
>>
>
>
> --
> Corey A Harper
> Metadata Services Librarian
> New York University Libraries
> 20 Cooper Square, 3rd Floor
> New York, NY 10003-7112
> 212.998.2479
> corey.harper@nyu.edu
>



-- 
Jeff Mixter
jeffmixter@gmail.com
440-773-9079

Received on Monday, 10 August 2015 19:32:41 UTC