Re: Itemprop for person from Antoine Isaac on 2012-11-28 (public-schemabibex@w3.org from November 2012)

From: Antoine Isaac <aisaac@few.vu.nl>
Date: Wed, 28 Nov 2012 23:33:52 +0100
To: <public-schemabibex@w3.org>
Message-ID: <50B69150.8010905@few.vu.nl>

Hi Karen,

Thanks for the confirmation about the weirdness of subjects vs. creators, I wasn't really expecting that one before the thread ;-)

On using personas or strings vs. real persons: I was perhaps wording it too strictly. Of course any mark-up is better than waiting forever for perfect data. And this fits the schema.org approach, where it is stated explicitly (somewhere) that simple strings can still be used for attributes that would ideally be used with resources.
Nonetheless, I find it useful if we can still come up with recommendations for the scenarios we're after.
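
To make the two options concrete, here is a minimal sketch (in Python, generating microdata) of an author marked up as a plain string and as an embedded Person resource. The function names and the example.org URL are made up for illustration; only the itemprop/itemscope/itemtype attributes and the schema.org Person type come from the real vocabulary.

```python
from html import escape


def author_as_string(name):
    """Mark up the author as a plain text value (the "poor" option)."""
    return '<span itemprop="author">{}</span>'.format(escape(name))


def author_as_person(name, url):
    """Mark up the author as an embedded schema.org Person item.

    `url` stands in for whatever identifier a site has for the person;
    the value used below is a placeholder, not a real identifier.
    """
    return (
        '<span itemprop="author" itemscope itemtype="http://schema.org/Person">'
        '<a itemprop="url" href="{url}"><span itemprop="name">{name}</span></a>'
        '</span>'
    ).format(url=escape(url, quote=True), name=escape(name))


print(author_as_string("Mark Twain"))
print(author_as_person("Mark Twain", "http://example.org/person/mark-twain"))
```

Both outputs carry the same itemprop="author", so a consumer that only wants the name can treat them alike, while the second form leaves room for an identifier once one exists.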

By the way, I have to admit you're right on Amazon. I'd feel more comfortable with having mark-up that is "poor" (strings) rather than mark-up that splits the reality into too many hairs (personas). But a brief search on Amazon gives quite different results for Samuel Clemens and Mark Twain. Not counting the one below with two authors ;-)
http://www.amazon.com/Best-Quotations-Mark-Twain-ebook/dp/B003U4W790/

Antoine


>
>
> On 11/27/12 8:05 AM, Antoine Isaac wrote:
>> Hi Karen,
>>
>> Thanks for the explanations! So, is there a different rule for subject
>> indexing of books (one persona should be used) than for description of
>> creators (different persona can be used)? If yes, it's a bit strange.
>
> Oh, yes, it is strange. Of course, it's libraries! Here's the entry for Clemens:
>
> HEADING: Clemens, Samuel Langhorne, 1835-1910
> 000 01553cz a2200241n 450
> 001 1718977
> 005 20121024153045.0
> 008 931008n| azannabbn |a aaa
> 010 __ |a n 93099439 |z n 88274847
> 035 __ |a (DLC)n 93099439
> 100 1_ |a Clemens, Samuel Langhorne, |d 1835-1910
> 400 1_ |a Klemens, Sami︠u︡ėl, |d 1835-1910
> 400 1_ |a Klemens, Seĭmeul Lenghorn, |d 1835-1910
> 400 1_ |a Kʻo-lan-man-ssu, Sai-mi-erh Lang-en, |d 1835-1910
> 500 1_ |w nnnc |a Twain, Mark, |d 1835-1910
> 663 __ |a Works by this author are identified by the name used in the item. For a listing of other names used by this author, search also under |b Twain, Mark, 1835-1910
> 667 __ |a SUBJECT USAGE: This heading is not valid for use as a subject. Works about this person are entered under Twain, Mark, 1835-1910
>
> There's also a code in the Leader that says whether it can be used as a subject. So all subject headings are under Twain.
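
As a rough illustration of how the record above routes subject access from Clemens to Twain, here is a small Python sketch that reads three of the displayed field lines. It assumes the plain-text display format shown in this message (tag, indicators, then |-delimited subfields) and looks only at the wording of the 667 note. The function names are invented for the example, and a real system would presumably rely on the coded value mentioned above rather than on note text.

```python
import re

# Three field lines copied from the authority record display above.
RECORD = """\
100 1_ |a Clemens, Samuel Langhorne, |d 1835-1910
500 1_ |w nnnc |a Twain, Mark, |d 1835-1910
667 __ |a SUBJECT USAGE: This heading is not valid for use as a subject. Works about this person are entered under Twain, Mark, 1835-1910
"""


def parse_fields(text):
    """Return a list of (tag, {subfield_code: value}) pairs."""
    fields = []
    for line in text.splitlines():
        if not line.strip():
            continue
        tag, _indicators, rest = line.split(" ", 2)
        subfields = {}
        for chunk in rest.split("|")[1:]:
            subfields[chunk[0]] = chunk[1:].strip()
        fields.append((tag, subfields))
    return fields


def heading(subfields):
    """Join the name (|a) and dates (|d) into one heading string."""
    parts = (subfields.get("a"), subfields.get("d"))
    return " ".join(p for p in parts if p)


def subject_heading(fields):
    """Return the heading to use for works about this person."""
    established = next(heading(sf) for tag, sf in fields if tag == "100")
    for tag, sf in fields:
        note = sf.get("a", "")
        if tag == "667" and "not valid for use as a subject" in note:
            match = re.search(r"entered under (.+)$", note)
            if match:
                return match.group(1).strip()
    return established


print(subject_heading(parse_fields(RECORD)))
```

Works by the author would still be listed under the 100 heading (the name used in the item); only the subject lookup is redirected.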
>
> Then in the LC Classification there is an area for literature, and in that, presuming that both Twain and Clemens had written novels (not the case, but pretend) then they would be given the same classification number. One would hope that the class number would also translate to Twain, aligned with the subject heading access, but I don't know that there is any guarantee.
>
> Before the 1970s and AACR, authors also were given the name of the real person, and everything by Twain in US libraries was under "Clemens", meaning that users often didn't find what they were looking for. The use of pseudonyms as entry points has to do with what the user comes to the library knowing, not some concept of what is "correct." And for subject access, what is considered the most common name is used.
>
> Note that none of this really translates to anything that we could think of as universal identifiers. As I said in my note to Richard, most libraries are still using text strings. Moving from text strings to identifiers is a future goal, perhaps, but I see schema.org in the near-future of libraries as being web page markup of the textual data that libraries have. That's step 1. Later steps may include identifiers and inclusion of a wide range of identifiers, but I think that will only happen with some serious changes in library systems and library data.
>
>>
>> Anyway, my point in 2 is indeed about (somewhat) controlling value: if
>> library mark-up tries to reproduce persona patterns that are really
>> specific to library-land and not relevant for more general mark-up
>> consumption cases, then library mark-up risks being not very popular...
>
> I think this isn't so much a question about library markup as about bibliographic markup. Publishers and Amazon use the name of the person that is on the package or product. They may or may not have information about other names or name forms that have been used. And it isn't clear to me that bookstores will be terribly interested in the "real world person" since their job is selling the product with a particular name on it and their systems may never get linked into the semantic web (who knows?). I agree about all of the desires for linking, but I'm not sure that's the role of schema.org. It can be used that way, but I see schema.org as being a recipient of links that have many more uses, not as the main motivator for creating links. It's the tail wagging the dog, and in terms of identifiers and links we should concentrate on the dog, and the tail will follow. Meanwhile, a quick entry into schema.org has some value.
>
> kc
>
>>
>> Cheers,
>>
>> Antoine
>>
>>
>>>
>>>
>>> On 11/27/12 1:09 AM, Antoine Isaac wrote:
>>>
>>>> 2. We can ask libraries to munch their various persona into "real"
>>>> person records, before exporting the data using schema:person for
>>>> representing real persons.
>>>
>>> For subject access (subject headings and classification) US libraries
>>> use only one "persona" where more than one has been used. It is not
>>> necessarily the "real" person -- it is the "best known" -- so that in
>>> subject classification all works by or about Twain and Clemens are
>>> under Twain. You can see which is used in the authority record because
>>> "Clemens" is marked "not for use for subject access."
>>>
>>> That said, I still don't understand the necessity. Remember that
>>> schema.org will mark up the library data that is there, as it is used
>>> by libraries on their web pages. It's not about exported data, it's
>>> about marking up what you show your users. This already means that
>>> libraries in Russia will have author Лев Никола́евич Толсто́й, and in
>>> the English-speaking world we will have author Leo Tolstoy (or some
>>> variant on that). These are the same real person, but I don't think
>>> that's the point -- the point is that schema.org allows you to mark up
>>> your data, it doesn't control the value space, and the value space is
>>> going to be messy.
>>>
>>> kc
>>>
>>>>
>>>> While 2 may first sound harsh on our community, I believe we can
>>>> call it "reasonable" when library data goes out and meets more general
>>>> scenarios. Also, I believe that current initiatives (e.g., every project
>>>> that seeks to align name authority with DBpedia) are working towards
>>>> making this possible, even though it will be sometimes bumpy.
>>>>
>>>> An important decision criterion would of course be the usage scenarios
>>>> there are, and their accompanying information needs, either voiced by
>>>> users or extrapolated by the search engines serving these users. In
>>>> other words, what is the take of schema.org on the topic? And how
>>>> are leading commercial sites treating personas? (I'm offline while
>>>> writing this so can't check, alas).
>>>> I suspect anything libraries come up with on this issue will have little
>>>> weight compared to the search patterns established by Amazon and the
>>>> like.
>>>>
>>>>
>>>> Note that for other kinds of entities like places I'm slightly less sure.
>>>> Maybe there are some applications that would benefit from not trying to
>>>> geo-localize places that are not on this Earth.
>>>>
>>>> Best,
>>>>
>>>> Antoine
>>>>
>>>>
>>>>
>>>>> I'd argue for Fictional Things.
>>>>>
>>>>> On 14 Nov 2012, at 18:27, Kevin Ford wrote:
>>>>>
>>>>>> we need to make a strong case for enabling the possibility of
>>>>>> fictional medical studies or postal addresses, among other things.
>>>>>
>>>>> Fictional place names certainly exist -- as in Winnie the Pooh:
>>>>> https://en.wikipedia.org/wiki/File:100acre.gif
>>>>>
>>>>> Or even styled as (fictional) postal addresses:
>>>>> 4 Privet Drive
>>>>>
>>>>> For a Fictional Medical Study, the backstory of the games Portal and
>>>>> Portal 2 comes to mind. In fact, game backstories would often provide
>>>>> examples, e.g. the studies at locations such as these
>>>>> http://en.wikipedia.org/wiki/Locations_of_Half-Life
>>>>>
>>>>> Or I suppose this might do:
>>>>>
>>>>> https://www.facebook.com/pages/QASE-fictional-BioMedical-Research-Facility/169647519762414
>>>>>
>>>>>
>>>>>
>>>>> (maybe http://www.facebook.com/r.php?fbpage_id=169647519762414&r=111 but
>>>>> that gives a loginwall):
>>>>>
>>>>> Pardon the excessive examples. It's the end of the day.
>>>>>
>>>>> :) -Jodi
>>>>
>>>>
>>>>
>>>
>>
>>
>>
>
Received on Wednesday, 28 November 2012 22:34:23 UTC