I was on the call when this was decided, but please don't take this as authoritative.  I am still trying to grok for myself the distinctions between levels.
Captions are fundamentally essential for multimedia to be accessible.  The use of Closed Captions means the feature can be invisible to those who don't want it.  There are years and years of experience with providing captions both on the Web and with broadcast video.  Post-production software, like MAGpie, is freely available and is not difficult to use, although captioning does take time.  If a video is important enough to be on a Web site, it is important enough to be made accessible.  If the requirement to add captions means that a site will not include video, that -is- an acceptable compromise!  If the site is interested in being accessible, they will make the effort, especially if the requirement is at Level 1.  If the site doesn't care about WCAG 2.0 Single A conformance, then the distinction between Level 1 and Level 2 doesn't matter anyway.
Personally, I cannot think of a credible reason why captions on prerecorded multimedia should not be at Level 1.  I would be curious to hear how that happened in a previous public working draft, but I am trying to let that go!
Live captioning for broadcast television is a reasonable expectation.  The resources required to produce studio-based broadcast television with live captioning are not significantly greater than the resources required to produce live broadcast television without captioning.
Live Internet-based video broadcasting is still in its infancy.  We don't have sufficient experience with the medium.  We know that single individuals can provide live video streams.  We know that, with current technology (and the limitations of automatic voice recognition), a single individual can *not* provide live captioning while they are in a video chat.
It is desirable not to limit the kinds of content a small site might provide.  As it stands, a single individual could have a WCAG 2.0 Single A Web site and provide prerecorded and live multimedia.
I am sure there is some stuff on the cutting edge, but the technology I am familiar with (for live intranet-based multimedia) only supports open captioning.  In any case, open captioning is simpler than Closed Captioning, but open captions are not invisible.
So, those are two reasons why the requirement for live captioning is reasonable at Level 2:  (1) The current commonly available technology for providing live captions on live video is not invisible.  (2) The burden (difficulty and expense) of live video with live captioning is significantly greater than that of both (a) prerecorded multimedia with post-production captions; and (b) live video without captions.
Despite being able to articulate this, I am not entirely comfortable with this distinction.  I would be inclined to assert that any organization that is committed to accessibility and capable of producing *quality* live multimedia is perfectly capable of providing live captions.

