minutes POIWG teleconference 08 September 2011

Hi all,

For last week's meeting we had a special presentation from Henning Schulzrinne.  The slides are available here:
	http://www.cs.columbia.edu/~hgs/papers/2011/poi-urn.pptx

and the minutes are available at:
	http://www.w3.org/2011/09/08-poiwg-minutes

and as text below.

-Matt

--
   [1]W3C

      [1] http://www.w3.org/

                               - DRAFT -

            Points of Interest Working Group Teleconference

08 Sep 2011

   See also: [2]IRC log

      [2] http://www.w3.org/2011/09/08-poiwg-irc

Attendees

   Present
   Regrets

   Scribe
          Matt

Contents

     * [3]Topics
         1. [4]Identifying and Categorizing POIs, presented by Henning
            Schulzrinne
     * [5]Summary of Action Items
     _________________________________________________________

   <trackbot> Date: 08 September 2011

   <robman> hey matt

   <robman> great thanks...you?

   <robman> sure...lets talk later

   <cperey> 848.aaaa is Christine

   <cperey> called BuildAR

   <scribe> Scribe: Matt

Identifying and Categorizing POIs, presented by Henning Schulzrinne

   -> [6]http://lists.w3.org/Archives/Member/member-poiwg/2011Sep/att-0005/poi-urn.pdf Slides

      [6] http://lists.w3.org/Archives/Member/member-poiwg/2011Sep/att-0005/poi-urn.pdf

   Henning: pulled some use cases out of the draft.
   ... The ones I pulled out seemed to be about finding categories of
   things.
   ... How can we divide the millions of POIs into manageable
   categories for searching?
   ... Some problems we won't be solving:
   ... Properties of POIs vs category, e.g. "restaurant that takes
   credit cards" isn't a category.

   <cperey> restaurant a favorite category :-)

   Henning: The distinction between what is or isn't a category is
   somewhat arbitrary. You can claim that "Italian Restaurant" could be
   a category, or it could be a cuisine attribute of a restaurant.
   ... It's fuzzy, and one has to make pragmatic choices about what
   users typically expect.
   ... The two characteristics I see defining categories are: 1. ?? and
   2. that categories are not interchangeable
   ... A "gas station" and a "restaurant" are not interchangeable, but
   even this gets tricky, e.g. "synagogue" and "Christian Church".
   ... North American Industry Classification System (NAICS) is one
   standard, the only one I've noticed. Based on Census.
   ... Very much comes out of industry, designed for classical
   manufacturing type industries, i.e. "this establishment produces
   cutlery"
   ... It struggles today with identifying services.
   ... While it's a fine example of categories, it isn't what we'd want
   to use though. Restaurants for instance have just two
   classifications: full-service and limited service.
   ... That may be okay, but is somewhat limiting, and not really what
   I'd expect users to care about.
   ... Great from a statistical perspective though.
   ... Many of the things you'd want to look up aren't in the system at
   all.
   ... I tried some things that are common from GPS POIs and they don't
   appear at all in NAICS, e.g. ATMs, wifi hotspots, monuments, etc.
   ... One alternative is to say we've got Google, just use free-text.
   That works, and is probably better than many alternatives, but free
   text is also hard to translate into other languages, and the same
   service has many names.
   ... e.g. ATM, cash machine, automated teller machine, etc
   ... Then there are also things like distinguishing between a diner,
   a café, coffee shop, etc. McDonalds calls itself a restaurant, but
   most of us would think of it as another term.
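   [[Illustration of the free-text problem described above: a minimal
   sketch, assuming a hand-maintained synonym table. The table, the
   function name and the label "atm" are hypothetical and were not
   part of the presentation.]]

```python
from typing import Optional

# Hypothetical synonym table: several free-text names for one service
# all resolve to a single protocol-style label.
SYNONYMS = {
    "atm": "atm",
    "cash machine": "atm",
    "automated teller machine": "atm",
}

def to_label(free_text: str) -> Optional[str]:
    """Normalise case and whitespace, then look the phrase up."""
    key = " ".join(free_text.lower().split())
    return SYNONYMS.get(key)

print(to_label("Cash  Machine"))
print(to_label("bank"))
```

   A phrase missing from the table returns None, which is the gap
   that free-text search leaves to the search engine.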

   <cperey> McDonalds =? Restaurant?

   Henning: So you might get lots of things that you wouldn't expect
   showing up on a list.
   ... There are also properties, such as "public" or "university", of
   a library.
   ... and hierarchy is missing too: French vs French Restaurant.
   ... Another option is map overlay labels. These are usually used to
   label topographical features and not services, like restaurants,
   ATMs.
   ... GPS POIs are much more consumer applicable.
   ... As far as I can tell, there isn't any standardization for POI
   labels, at best informally standardized. I haven't confirmed with
   every vendor though.
   ... Some of the categories are a bit odd too, sometimes very broad:
   all government services labeled as one, even if it's everything from
   libraries to prisons, but sometimes libraries are separate etc. It's
   inconsistent and not clear if it's official or just made up.
   ... And lastly, the Yellow Pages labels.
   ... I've heard every region does their own thing and the labels are
   often picked in a way that businesses would appear in as many places
   as possible, rather than where users might expect them.
   ... Yellow Pages wouldn't contain bathrooms and ATMs or other things
   without a phone number.
   ... Coming from another direction, we had a similar problem when we
   started redesigning the emergency calling system in the US.
   ... One of the big problems is that every country pretty much has,
   for historical reasons, used a different digit pattern.
   ... Confusing scenario where in some cases a number is used
   generically for all services, i.e. 911 (excepting poison control),
   but other countries have different numbers for police, fire,
   ambulance, etc. So, there was no hope of a standard for a number
   identifier.
   ... We started on something different, RFC 5031. It's a URN for
   services. urn:service:sos
   ... They're extensible via IANA.

   -> [7]http://tools.ietf.org/html/rfc5031 RFC 5031

      [7] http://tools.ietf.org/html/rfc5031
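   [[Illustration of the "urn:service:sos" form mentioned above: a
   minimal sketch of splitting a service URN into its dot-separated
   labels. The pattern is a simplified reading of the RFC 5031 syntax
   (it does not enforce label length limits), and the function name is
   hypothetical.]]

```python
import re

# Simplified label: letters and digits, with hyphens allowed inside.
_LABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?"
_SERVICE_URN = re.compile(
    rf"^urn:service:({_LABEL}(?:\.{_LABEL})*)$", re.IGNORECASE
)

def parse_service_urn(urn: str) -> list:
    """Return the lowercased labels of a service URN, top level first."""
    match = _SERVICE_URN.match(urn)
    if match is None:
        raise ValueError(f"not a service URN: {urn!r}")
    return match.group(1).lower().split(".")

print(parse_service_urn("urn:service:sos"))
print(parse_service_urn("urn:service:sos.police"))
```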

   Henning: These are internal to the system, used for call routing.
   ... Allows devices to have an entry that gets to the right people.
   ... NG911 and NG112 have been moving in this direction. For 911 it
   seems to be accepted and getting deployment.
   ... We determined that this was extensible to other services too.
   ... N11 services: reserved three-digit numbers whose first digit is
   anything other than 0 or 1 and which end in 11, e.g. 211, 311

   <ahill2> 611 is used by the cell company for their information

   Henning: In that same spirit, we explored extending it to
   non-communication services.
   ... Designed for consumer use, things that we can label, e.g.
   "food", "fuel", "business", "communication" (wifi hotspots, internet
   café). Design not to have thousands of categories, but still similar
   to a GPS POI finder.
   ... 13 top level categories that then have further detail within.
   ... e.g. transportation.airport
   ... So, remaining issues and to-dos:
   ... Are there systems out there already? Can we extend it without
   breaking? Maintained in some way that there is coherence in the
   labeling? If not, and we go forward with the URN model, would we
   register these things? With IANA?
   ... IANA would be just a database, would we sub-delegate that?
   ... How would maintenance be done on that?
   ... How is it maintained? I'm partial to something like the Olson
   time zone database model.
   ... Not an official group, or a government thing. It used to just be
   one person, but now there's a mailing list with consensus process.
   ... It will be maintained on a long term basis through IANA.

   -> [8]http://tools.ietf.org/html/draft-lear-iana-timezone-database-02.html IANA Procedures for Maintaining the Timezone Database

      [8] http://tools.ietf.org/html/draft-lear-iana-timezone-database-02.html
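   [[Illustration of the dotted category labels mentioned above, e.g.
   "transportation.airport": a minimal sketch of how a query for a
   top-level category could also match its sub-categories. The
   function name is hypothetical, and comparing whole labels rather
   than string prefixes is an assumption about how matching would
   work.]]

```python
def is_within(category: str, ancestor: str) -> bool:
    """True if `category` equals `ancestor` or sits below it."""
    cat = category.lower().split(".")
    anc = ancestor.lower().split(".")
    # Compare label by label so "transport" does not match "transportation".
    return cat[:len(anc)] == anc

print(is_within("transportation.airport", "transportation"))
print(is_within("transportation", "transportation.airport"))
```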

   Henning: So that's my brief summary of what I'm trying to do.
   ... So two things: is there obviously related work that someone else
   has done? Or if not is there a group of people that might be
   interested in forming a nucleus of an organization to do this?
   ... If people take it up, yes, great, if not the harm is fairly
   limited.

   robman: Two questions: how would URNs support non-English?

   Henning: Are you familiar with IETF i18n document?
   ... This falls in the category of protocol label, not meant for
   human consumption. Each label would be called something different in
   each language, but for a protocol label used for routing and
   querying you should stick to one language.
   ... At the IETF, it's been English. There's nothing inherently in
   the labels to prevent other languages, modulo the i18n URN problems.

   robman: Because it's much more into the meaning than normal labels,
   it'd have labels that mean things just in certain cultures.

   Henning: Right, if there was a category that doesn't have a good
   label in English, it's still plausible. It's not intended to
   preclude anything, but with it being a protocol label there's less
   concern over it at the moment.

   robman: This is very top down. What if people could create random
   labels? And if they're not used, they'll die off or be redundant
   anyway?

   Henning: There's likely not going to be a way to solve the language
   labeling problem. I'm not opposed to the free text model. Because of
   the need for i18n, and because it's not used as a search term, I
   have some doubt that free text will be successful.

   [[I was just thinking of schema.org given this conversation, and now
   I see: [9]http://www.schema.org/Place]]

      [9] http://www.schema.org/Place

   Henning: I'm not thinking the urn would be typed into a search for
   instance.
   ... Right now we don't have network databases, but these static ones
   that vendors maintain.

   robman: I think the international bridging point you made is quite a
   good argument for it.

   ahill2: How do you see this translation between labels and URNs
   happening?
   ... I can imagine a world where Google says "these are the
   categorizations" and translate them to URNs, but for an ordinary
   individual, what services are available to them to translate their
   terms and searches into these categories?

   Henning: I imagine the software built by, say, a tourist app or a
   GPS vendor would, depending on the appropriate interface, build some
   subset of these labels into their system and translate and expose
   them appropriately.

   ahill2: You think this translation between common labels and URNs
   happens ad-hoc? No central database of any sort?

   Henning: I hadn't thought that far ahead, but if we were more
   ambitious, there would be a description for each term, geared
   towards localization.
   ... Nothing prevents that if we get enough people together.
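   [[Illustration of the label-to-display-name translation discussed
   above: a minimal sketch, assuming each application ships its own
   table. The labels "food" and "fuel" come from the presentation; the
   table layout, the translations and the function name are
   illustrative assumptions.]]

```python
# Hypothetical per-application table: one protocol label, many display names.
DISPLAY_NAMES = {
    "food": {"en": "Food", "de": "Essen"},
    "fuel": {"en": "Fuel", "de": "Kraftstoff"},
}

def display_name(label: str, lang: str, fallback: str = "en") -> str:
    """Pick the name for `lang`, then the fallback language, then the raw label."""
    names = DISPLAY_NAMES.get(label, {})
    return names.get(lang) or names.get(fallback) or label

print(display_name("food", "de"))
print(display_name("fuel", "fr"))
```

   The label itself stays in one language for routing and querying;
   only the user-facing text changes.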

   ahill2: Isn't that what is happening today with Google? Isn't it the
   translator today?
   ... If this knowledge is being crowd sourced somewhere, Google or
   OSM, we should use that.

   Henning: I was unimpressed by OSM process.

   ahill2: I'm envisioning a search where these urns are available,
   e.g. every result has a category. That would be neat, because then
   the search would be the translator between free text and labels and
   urns.

   Henning: If someone could do that I'd be delighted.

   <robman> [10]http://en.wikipedia.org/wiki/Web_Ontology_Language

     [10] http://en.wikipedia.org/wiki/Web_Ontology_Language

   robman: What about these taxonomy communities that are working in
   their own domains, like OWL.
   ... It's not just locations we're talking about, but we're cross
   domain here.

   Henning: Yes, that's part of why I was asking for pointers to these
   communities. Once we get into the ontology side more, that would be
   helpful.
   ... I'm unsure whether that type of work, e.g. property attributes,
   is directly applicable, or whether sub-pieces of it can be pulled
   out. We don't, from a POI perspective, want a complete ontology that
   crosses categories and properties.
   ... If there are communities we should know about, please let me
   know. We looked about a year ago into this, building a system that
   could combine ontologies, e.g. find a specific movie and dinner with
   a cuisine type. We didn't find anything then, but we might have
   looked in the wrong place.

   ahill2: Can we remind ourselves of some of the other categorization
   efforts that we discussed. I believe Library of Congress was
   discussed and a number of them had URLs involved.

   Henning: Any pointers you have, please pass along. One difference
   between identifying specific objects and categorization is
   one-to-one vs one-to-many.

   Raj: Geonames

   ahill2: Does that do categorization?

   rsingh2: Yes, but they're just categorizing places, not business
   classifications.
   ... They started with USGS classification system, but theirs is a
   much smaller problem than ours.

   Henning: Looking at geonames, they've got postal codes as the lowest
   level I see.
   ... School, post office, cemetery, etc. Not sure how many features
   they have.

   rsingh2: That's right out of USGS.

   <karls> hi

   ahill2: What did we propose to use?

   Henning: All I know is from the doc: NAICS.

   rsingh2: I think coming up with a single country classification
   scheme is easy, but what's harder is a POI system like for AT&T,
   where they want you to search for say where to get phone cards.
   ... That's at one type of business in the USA, but another type in
   other countries.
   ... Reconciling that between countries is very difficult.

   karls: There's a ton of work on brand binding and chain binding to
   help that work.
   ... That side-steps classification though.
   ... Using NAICS is mostly for information exchange, most of the time
   these are hand tuned by the app devs. Many schemes that are app
   specific.
   ... The low-level standards are just used for hand-off so people can
   do mappings.

   Henning: That's what I've seen as well.

   karls: It's useful to carry around NAICS codes in terms of the spec,
   as our spec is about exchanging information, but in terms of
   customer facing stuff, it's pretty open ended. Our model should be
   we'll support the structure, but you make it up.

   rsingh2: My instinct is similar, we're not ready to tackle that in
   version 1.
   ... We might be overstepping the bounds of what innovative
   developers would build.

   karls: Typically these systems were done for handhelds, or
   constrained environments. I think search trumps all though.
   ... The conversation at Microsoft/Nokia/NavTEQ is do we care about
   categorization anymore?

   Henning: The search experience is good from large providers, but it
   requires a fair amount of user skill to get what you want. Looking
   at Restaurant, you have two things like Google maps, but also
   specific ones like Urban Spoon.
   ... There are more relevant hits in the latter.

   karls: Here's what I see: one end there are proprietary category
   systems, on the other there's web page crawling for open ended
   search.
   ... In between, you've got a lot of POI gazetteers who are doing
   meta tagging, as it facilitates parametric search.
   ... The middle ground is the tagging. I thought the spec addressed
   that capability to open endedly do the metatagging.

   ahill2: Can you elucidate on that a bit more karls?

   karls: Take a service like Open Table, where they have restaurant
   categories and sub-categories.

   <robman> +1 to link based structured data 8)

   karls: You're not going to get that information out of scraping a
   web page. That information is best consumed by an application by OT
   if a POI has a pre-set, open-ended list of terms that describe it
   well. It's tantamount to the meta tag on HTML pages.
   ... Gazetteers are doing field ops, web scraping, crowd sourcing,
   etc, to distill down to ten or twenty keywords that are the most
   descriptive to put in the POI.

   <rsingh2> parametric search = faceted search

   <rsingh2> [11]http://en.wikipedia.org/wiki/Faceted_search

     [11] http://en.wikipedia.org/wiki/Faceted_search

   karls: Typically the app tier puts a parametric search on top of
   that: hours, beer, etc.

   ahill2: We're talking about somewhere between category only search
   and free text.

   karls: You could argue that it's all categories or parameters, e.g.
   24 hour restaurant could be a category or a property.

   rsingh2: The popular term would be faceted search.

   Henning: Close, but not quite, you might have things like types of
   credit cards accepted, and it might be labels drawn from a set, or
   specific information that isn't categorized: e.g. open hours.
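   [[Illustration of the parametric (faceted) search discussed above:
   a minimal sketch that filters POIs by one category plus any number
   of property values. The Poi class, field names and sample data are
   hypothetical.]]

```python
from dataclasses import dataclass, field

@dataclass
class Poi:
    name: str
    categories: set = field(default_factory=set)
    properties: dict = field(default_factory=dict)

def search(pois, category=None, **facets):
    """Keep POIs in `category` whose properties equal every given facet."""
    results = []
    for poi in pois:
        if category is not None and category not in poi.categories:
            continue
        if all(poi.properties.get(k) == v for k, v in facets.items()):
            results.append(poi)
    return results

pois = [
    Poi("Diner A", {"restaurant"}, {"open_24h": True, "serves_beer": False}),
    Poi("Bistro B", {"restaurant"}, {"open_24h": False, "serves_beer": True}),
    Poi("Station C", {"fuel"}, {"open_24h": True}),
]
print([p.name for p in search(pois, "restaurant", open_24h=True)])
```

   Whether "24 hour restaurant" is a category or a property only
   changes which argument carries it, which is the ambiguity raised in
   the discussion.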

   <rsingh2> I'm late for another call. Bye all.

   robman: That's why we were thinking open ended links, because it is
   so closely tied to the users' mind space when they search.
   ... If we approach it as a categorization problem we have to
   approach it differently.

   Henning: I think I differ on that. If you look at OT, they do do
   categorization, they do much better than just crowd source tagging.

   karls: I think what we want to do is to be able to have OT exchange
   their POIs outside their business sphere.
   ... So, we want to make sure the spec can support rich and
   proprietary tagging, without defining the facets ourselves.

   Henning: Why not some of the facets? I think I've demonstrated that
   some are viable.

   ahill2: One of the things we've been careful about is making sure
   that there are multiple categorizations that could apply to a POI.

   Henning: It could have multiple category schemes too.
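   [[Illustration of a POI carrying several categorizations under
   several schemes at once: a minimal sketch using a plain dictionary.
   The scheme names, category values and helper function are
   hypothetical and are not taken from the draft spec.]]

```python
# Hypothetical record: scheme names and values are illustrative only.
poi = {
    "name": "Example Diner",
    "categories": {
        "naics": ["Limited-Service Restaurants"],
        "service-label": ["food.restaurant"],
        "vendor-tags": ["diner", "breakfast"],
    },
}

def categories_for(poi: dict, scheme: str) -> list:
    """Return the categories a POI has under one scheme (empty if none)."""
    return list(poi["categories"].get(scheme, []))

print(categories_for(poi, "vendor-tags"))
print(categories_for(poi, "unknown-scheme"))
```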

   ahill2: In your proposal, are you open to the idea that NAICS ends
   up adding some of these categorizations that are facets as opposed
   to routing to a specific business?
   ... That is: if there were a number of different categorizations
   that a business has, would NAICS be the appropriate place to build
   up a category?

   Henning: I'm not part of NAICS, but given that they're part of the
   census, I imagine they wouldn't be looking at these properties. I
   can't say what they should do, but my perception is that their
   mission is industry classification statistics.
   ... e.g. how many people work in fast food restaurants, rather than
   say what credit cards they take.

   karls: They're also missing juicy POIs, like golf courses, transit
   stops, etc.

   Henning: Yes, so far it seems outside their mission of what they're
   doing.

   ahill2: Sorry, I think I asked the question wrong. In your URN
   proposal, would you see those categories, which are facets outside
   of a category being appropriate, e.g. hours, or all the way down to
   the kind of information from crowd sourcing.
   ... Where do you draw the line?

   Henning: A URN to my mind is not as suitable for these
   non-categorization models. You've identified some binary things, but
   many are not easily represented in the same fashion. That said,
   separately (I didn't talk about it here as it's preliminary), the
   system we built has the ability to retrieve an XML-type document
   with suitable tags that carry that information.
   ... We could envision that being useful for us to agree on labeling
   to enable exchange.

   <ahill2> thanks, that answers my question

   Henning: There's an opportunity there, didn't discuss it here, and
   it's to some extent orthogonal, but there's a need for that as well,
   maybe industry specific bodies, which might be in a position to do
   that more appropriately.
   ... I look forward to the mailing list conversation.

   cperey: As for next steps, Matt will publish the minutes of the
   meeting. It's almost a transcript.
   ... He publishes that as a URL, it becomes archives for the group.
   That gets it out to a larger audience, but after that it's kind of
   up to this group. We're having our F2F in two weeks.
   ... We should work on this at the F2F and followup with actions from
   that.

   Henning: There's no dependency here, so that's fine.
   ... Right now, I don't even see it as appropriate to include it in
   the doc, as it's not specific to this effort. But, I would like to
   look for a community of interest to take it to the next level of
   specificity.
   ... I'm not asking the WG to take on this particular task, it's
   probably outside the immediate scope.

   matt: Could be a CG perhaps? POI WG decided not to do this.
   ... Thank you!

   Henning: Thanks, and thanks to Christine for arranging this.

Summary of Action Items

   [End of minutes]
     _________________________________________________________


    Minutes formatted by David Booth's [12]scribe.perl version 1.136
    ([13]CVS log)
    $Date: 2011/09/08 15:29:05 $
     _________________________________________________________

     [12] http://dev.w3.org/cvsweb/~checkout~/2002/scribe/scribedoc.htm
     [13] http://dev.w3.org/cvsweb/2002/scribe/


Received on Thursday, 15 September 2011 02:59:21 UTC