Sentinels Statistics Project

Ah, ok. I see the issue.  That does make sense.  I guess I would be inclined to simply drop those four villain entries (Agent Spite C, Agent Spite C+A, Skin Gloom C, Skin Gloom C+A) from the model entirely and not report them, given the concerns you raise above, but I get it.  

The only reason this matters to me, really, is that I have a spreadsheet set up now that randomly picks games for me to play on my iPad but weights the selection (villains, heroes, environment) by the inverse of the number of games that have been reported.  So I’m going to be doing a lot of Challenge + Advanced games over the next little while with obscure promo heroes.  :-)   
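For anyone wanting to try the same thing outside a spreadsheet, here is a minimal sketch of inverse-count weighting. The function name and the example counts are made up for illustration, and the "+1" is an added assumption so that entries with zero reported games don't cause a division by zero; the actual spreadsheet may handle that differently.

```python
import random

def pick_weighted_by_inverse_count(game_counts, rng=random):
    """Pick one name, favoring entries with fewer reported games."""
    names = list(game_counts)
    # Assumption: add 1 so entries with zero reported games do not divide by zero.
    weights = [1.0 / (game_counts[name] + 1) for name in names]
    return rng.choices(names, weights=weights, k=1)[0]

# Hypothetical counts, for illustration only.
example_counts = {"Hero A": 500, "Hero B": 50, "Hero C": 5}
print(pick_weighted_by_inverse_count(example_counts))
```

The same function could be called once each for villains, heroes, and environments, which is why rarely reported combinations come up so often.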

EDIT: it looks like there are only 73 total games across those four villains. If LynkFox were willing to alter the form in some fashion (even something as small as a note next to the Agent of Gloom Spite and Skinwalker Gloomweaver entries on the first page saying “if you played challenge mode use agent of gloom spite and record for the entire game played against both villains”) it would seem reasonable to me to drop those 73 games and start afresh. But I’m glad other people are doing this work, not me, so I’m happy as is. :-)

As much as I would -love- that, Dandolo, there are many many people who log games there who don't visit the forum, and even where I tried to standardize things, it still gets inputted wrong. :P

 

Always recall that any given coding project / internet thing can be broken because someone will not be thinking in the same way you are and will do something you never expected.

 

As it is, I'm currently learning SQL and Database management, so I plan on transferring the project into a far more robust and powerful system, which will hopefully help clean up a lot of this!

Well, that's exciting. :D

Mindwanderer, I've been pondering Fear Factor in the statistics, and I am confused by something.  All of the Villain fear factors are positive.  How does this come about?  I'm trying to wrap my head around both the math that makes that happen and what it means.  I think it implies that the win rates against ALL villains are somehow inflated, but that doesn't seem possible. Could you provide some insight?

Good question, and it took me a minute of thinking about it to figure it out, but it's actually simple: More successful heroes are more popular than less successful heroes overall, so every villain is more likely to have been challenged by "better" heroes than by "worse" heroes.  There aren't any villains against which people actually prefer to use "worse" heroes, on average, so all the values are positive.

That’s interesting.  I really don’t see much correlation between win rate and number of games (even accounting for the age of the card), to directly support the idea that “better” heroes are played more often.  But it’s not clear you would see it, really.  All it takes is a tendency to have at least one good hero per game and the fear factors would then end up net positive.  The good heroes could be “carrying” the bad, essentially.

I had never really thought about it this way before, but this “carrying” idea matches with anecdotal experience.  I’ve been in many a game that was won, but one or more heroes in the game (usually the likes of Bunker, Absolute Zero, Argent Adept, heroes that take a lot of set up time) didn’t really contribute much overall to the game.  It might have been a five hero game, but really it was a three or four hero game with a little extra help.  To put it another way, it seems possible that “good” heroes increase the overall chance of a win more than “bad” heroes decrease it. 

This morning it occurred to me that there is another effect that could be affecting the fear factor for villains: "publication" bias.

Looking at the video game data, the completion rates for games can be as low as 25%.  Now there are all kinds of reasons why games would not be completed, but I think it is safe to assume that a large fraction of those games would have been losses.  Therefore, the win rates from the video game data are almost certainly inflated compared to some kind of theoretical true "win rate" which includes games that were given up as futile.  This is similar to publication bias in scientific research; papers with positive results are more likely to be published.

Now, I personally will play a game to the bitter end nearly every time, but in hindsight I recognize that I am likely an outlier; most people are not such gluttons for tedious punishment in their spare time.  So it is very likely that the submitted games also are affected by this publication bias.  And since, anecdotally, the games I have been inclined to give up on seem to be far more frequently related to villain effects (e.g. the villain has generated a situation that seems impossible) rather than hero effects (e.g. all the heroes seem useless), this could affect the villain fear factor much more than the hero fear factor.  Also, I think a larger proportion of submitted games are coming from the app these days, and I suspect it is easier to end a solo game on the app for futility than it is to end a social game played with cards.

Does this seem likely?  It's really hard to judge, because the submission form does not allow for submission of incomplete games.  I suspect some submitters submit technically incomplete games where the end result was patently obvious.

It’s an imperfect solution but you could count incomplete games as losses. I suspect you are right and most of them would have been had they been completed.

It might be worth doing the calculations just to see the effect.

I actually did do that when I first got the video game data, to decide what the most appropriate way to handle incomplete games was.  Very little changed when I treated them as losses.  Obviously the win rates went down, but the relative rankings and the regression coefficients were basically identical.

How do you calculate the Fear Factor?

It's in the notes, although not expressed mathematically.  Basically, for heroes, for each game, look up the average win rate of the villain in that game.  Sum across all games and then divide by the number of games.  Subtract the average win rate of all villains.  For villains, do the same but use hero win rates.
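A rough sketch of the hero version of that calculation, for anyone who prefers to see it as code. The function name, the game record layout, and the example numbers are all hypothetical, and the baseline here is taken as the unweighted mean over all villains, which is one reading of "the average win rate of all villains" and may not match the actual spreadsheet.

```python
from statistics import mean

def hero_fear_factor(hero, games, villain_win_rate):
    """Mean villain win rate over this hero's games, minus the mean over all villains."""
    # Look up the villain's win rate for every game this hero appeared in.
    faced = [villain_win_rate[g["villain"]] for g in games if hero in g["heroes"]]
    if not faced:
        raise ValueError(f"no games recorded for {hero!r}")
    # Assumption: the baseline is the unweighted mean across all villains.
    baseline = mean(villain_win_rate.values())
    return mean(faced) - baseline

# Hypothetical data, for illustration only.
villain_win_rate = {"Villain 1": 0.8, "Villain 2": 0.4}
games = [
    {"villain": "Villain 1", "heroes": ["Hero A", "Hero B"]},
    {"villain": "Villain 1", "heroes": ["Hero A"]},
    {"villain": "Villain 2", "heroes": ["Hero B"]},
]
for hero in ("Hero A", "Hero B"):
    print(hero, round(hero_fear_factor(hero, games, villain_win_rate), 3))
```

The villain version would swap the roles: average the win rates of the heroes in each of the villain's games, then subtract the average win rate across all heroes.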

Huh, that’s very interesting.  How could that be?  The regression coefficients are predicting win rate, and if win rate is lower, then something has to shift.  As an example, if you cut all win rates by half in the data, then something in the coefficients would have to change in such a way that the predicted win probs generated on the calculator and randomizer pages produced values that were half what they produced previously.    

I’m going to go out on a limb and say that I think you mean the relative coefficients for villains, heroes, environments remained much the same, and the only thing that changed was the intercept parameter.  A lower intercept would reduce ALL win rates at once across the board in a consistent (albeit non-linear) fashion.  For example, multiplying the current intercept (~-2.12) by a factor of 1.1 reduces predicted win rates by about 1-5% (more for win rates closer to 50% and less as you get towards 0% and 100%).  A factor of 1.5 reduces them by 10-20%.
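To make the intercept effect concrete, here is a small sketch. It assumes the model is a logistic regression, so that predicted win rates come from a log-odds score; the thread does not spell that out, and the function names are made up. The ~-2.12 intercept is the value quoted above, and everything else in the score is held fixed.

```python
import math

def sigmoid(x):
    """Convert a log-odds score to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def logit(p):
    """Convert a probability to a log-odds score."""
    return math.log(p / (1.0 - p))

def win_rate_after_scaling_intercept(p, intercept=-2.12, factor=1.1):
    """Predicted win rate after the intercept is multiplied by `factor`,
    with every other term of the score left unchanged."""
    # Scaling the intercept shifts every game's score by the same amount.
    shift = intercept * (factor - 1.0)
    return sigmoid(logit(p) + shift)

for p in (0.1, 0.5, 0.9):
    new_p = win_rate_after_scaling_intercept(p)
    print(f"{p:.2f} -> {new_p:.3f}")
```

Because the shift is constant on the log-odds scale, the drop in probability is largest near 50% and shrinks towards 0% and 100%, which is the non-linear pattern described above.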

If I’m right and that’s what you meant, that means that completion rate is uncorrelated with choice of villain, hero, environment, etc.  That is interesting to me!  

Just to be clear, I really am not trying to make trouble here.  I LOVE that you and Lynkfox do all this work, and as far as I am concerned you can keep doing it exactly how you are doing it.  I’m just fascinated by this rich trove of data and all the things that can conceivably be learned from it about this game I love.  

EDIT: I would hypothesize that if there is any correlation between completion rate and villain (which I think you are saying there might not be), it would not be correlated with villain difficulty so much as villain “frustration factor”.  That is, villains that generate annoying situations where you have to do the same repeated actions over and over again in the app on the way to losing (or winning for that matter).  Villains that just take FOREVER to deal with.  Villains that generate obviously impossible situations but will still take a while to beat. Villains with low “frustration factor” are villains that are straightforward, or that play quickly, or that end the game in sudden, unexpected ways.

Ten Lowest Completion Rate villains (Lowest first): Gloomweaver: Skinwalker, Spite, Citizen Dawn, Akash’bhuta, Spite: Agent of Gloom, The Matriarch, The Chairman, Miss Information, Gloomweaver, La Capitan

Ten Highest Completion Rate Villains  (Highest last): Ambuscade, Plague Rat, Deadline, Wager Master, Infinitor: Tormented, Baron Blade, Kismet, Chokepoint, Kaargra Warfang, Baron Blade: Mad Bomber

Looking at those lists, I feel like my frustration factor idea is borne out, with emphasis on the sudden ending issue.

You got it exactly right--the intercept going down was the major change.  IIRC there were some small changes to the other coefficients (i.e. the slope), but the rankings were the same.

There's probably a correlation between abandonment rate and length of game.  Unfortunately we don't have good data on length of game.  I'd have to compare the self-report data to the digital data, and the "number of rounds" field is optional (and I know I personally fill it in for short games, when I remember, and leave it blank for longer ones, where I lose track).  Looking at it ad-hoc, it certainly seems to hold true--games with abandonment rates disproportionate to win rates include Challenge/Ultimate Akash'Bhuta, Ultimate AoG Spite, and Ultimate Ennead.

Huh. I would’ve done it by measuring how far each hero is from the median of the win rate histogram. Interesting.

I love what you've done with the statistics results! The randomizer is AWESOME! I wanted to point out that Stuntman and his variant haven't been added to the randomizer and calculator yet. Otherwise, it's really great! I love going through and playing games based on what is needed for the stats.

Whoops, don't know how I missed Stuntman!  Fixed.

Just a heads-up (and I mentioned it in the notes of a response I sent for a game I played earlier tonight in the environment), Celestial Tribunal's also missing, but only from the video game's randomizer. I figured now was a good time to bring it up, since I imagine the Void Guard (sans variants, for now) are getting plugged in there soon.

Bah, it was in the page, just missing the "The".  Fixed.

I noticed my games for the last couple days haven't shown up in the stats. Looks like that issue with the data sheets is cropping up again.

 

Edit: Never mind, my games from 1/2 did show up. So I must have either forgotten to record my games from the previous days or made some other mistake in entering them.

Boom.

(Note that the data is necessarily limited.  A lot of games go incomplete and there are a lot of combinations.  In particular, trying to infer anything about Advanced is pretty pointless.)