Badly, I expect.

I'd probably do the evaluation with one episode but then get other
episodes later on.

What sort of meaningful test could you write where you could *tell*
what the implementation did?

