In my opinion, it doesn't really make sense to make judgements based on 2-hero teams. The game was designed to be played with 3 to 5 heroes. Every hero plays a role that contributes to the team's performance. Two-hero games are going to be heavily biased against heroes who fill certain niche roles and heroes whose performance is enhanced when other heroes are covering their weaknesses. This is, after all, a team game.
It would be like judging the bishop's and knight's worth in chess by playing one game where you use only 5 bishops and another where you use only 5 knights. That is not how chess is meant to be played, and it would be in no way indicative of the worth of those pieces. They both shine in different scenarios, when working together with other pieces as part of a team.
I just don't think it makes sense to run tests based on a style of play (2-hero teams) that the game was not designed for, and then use those results to judge a hero's worth.
The tests are with 3 Hero teams. The Duos are just set pairs of heroes that we will pair with the main hero we are testing: Duo+Test Hero = 3 total. Sorry for any confusion there.
Nobody will stop you from doing this, of course. We're just not sure how useful it is as an exercise. It sounds terrifically boring, but do let us know how it goes if you try it. We may all learn something!
As far as who to use, I recommend mid-difficulty villains who offer a lot of chances for everybody's skills to come into play. Citizen Dawn indeed does that. La Capitan would work well too, I think. What these villains share is a need to strike a balance between attacking minions and attacking the boss, and they require hero finesse in managing their flip conditions.
As an environment, I would choose somewhere fairly neutral that doesn't take over the game. Insula Primalis is a decent default choice. Final Wasteland, Time Cataclysm, or Realm of Discord may work well.
Heroes: I would go with partners that I've rarely heard people complain about one way or the other. Some combination of Tachyon (original), Ra, Fanatic, and Haka would probably do well for you. (Don't use Ra with Insula Primalis, though.)
Alright! Good idea! We'll do the Plague Rat test first, then back to Citizen Dawn and La Capitan. It would SOUND boring if I ever found myself bored by the game :D. Luckily, that hasn't happened yet. Hopefully these tests don't burn me out :D
10 is way too small of a sample. The standard deviation would be around 15%. The difference between Fixer and Legacy is only 14% in the stats project.
Now if you played 100 games, you would have about a 5% error, which would allow you to distinguish between the best and the worst heroes, though only at a marginal level of statistical significance.
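For anyone who wants to check those figures, here is a rough sketch of where they likely come from. It assumes each game is an independent win or loss and that the true win rate is somewhere near 50%; both are my assumptions, not something taken from the stats project, and the function name is just made up for illustration.

```python
import math

def win_rate_std_error(n_games, win_rate=0.5):
    """Standard error of an observed win rate over n_games games.

    Assumes independent win/loss games (binomial model). The default
    win_rate of 0.5 is an assumption; it is the worst case for the error.
    """
    return math.sqrt(win_rate * (1.0 - win_rate) / n_games)

for n in (10, 100):
    # Report the error in percentage points of win rate.
    print(f"{n} games: +/- {win_rate_std_error(n) * 100:.1f} points")
# prints:
# 10 games: +/- 15.8 points
# 100 games: +/- 5.0 points
```

Note that when comparing two heroes, each with their own error bar, the uncertainty on the difference is larger than either one alone, which is why a 14% gap is only barely distinguishable even at 100 games each.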
Two thoughts:
1) I don't believe that all the heroes need to be or can be balanced. It is impossible.
2) I believe that using random teams is the best way to measure balance, from a statistical perspective.
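If someone wanted to try the random-team approach from point 2, a minimal sketch might look like the following. The roster is hypothetical (just heroes named in this thread, not the full game), the function names are made up for illustration, and the results are assumed to be recorded by hand after each game.

```python
import random

# Hypothetical roster: only heroes mentioned in this thread.
ROSTER = ["Tachyon", "Ra", "Fanatic", "Haka", "Mr. Fixer", "Legacy"]

def draw_random_team(roster, size=3, rng=random):
    """Pick `size` distinct heroes uniformly at random from the roster."""
    return rng.sample(roster, size)

def tally_win_rates(results):
    """Summarize hand-recorded results.

    results: iterable of (team, won) pairs, where team is a list of hero
    names and won is True/False.
    Returns {hero: (wins, games, win_rate)}.
    """
    wins, games = {}, {}
    for team, won in results:
        for hero in team:
            games[hero] = games.get(hero, 0) + 1
            wins[hero] = wins.get(hero, 0) + (1 if won else 0)
    return {h: (wins[h], games[h], wins[h] / games[h]) for h in games}

# Draw a team for the next game, then log the outcome once it is played.
print(draw_random_team(ROSTER))
```

Because every hero ends up with a random mix of partners, no single strong or weak pairing dominates any one hero's record, which is the appeal of this approach over fixed teams.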
Oy! Just played a few games today with Mr. Fixer (not vs. Plague Rat) and he managed to make the bosses cry! I don't know what is different, but I can't complain about him anymore.