HoF: WAAS methodology unveiled

cheekimonk · Post by **cheekimonk** » Mon Feb 06, 2012 2:58 pm

The Reader's Digest (shorter) version is in the replies.

About mid January I posted a thread about a Peter Keating article for ESPNtheMag that came out just before the MLB All-Star vote. In it he suggested that while he thought the growing use of WAR (Wins Above Replacement) to compare players independent of position, prototype, era, etc. was a good thing, he was troubled by the cumulative nature of WAR as it relates to comparing careers (esp., for Hall of Fame voting). He suggests the use of a variation on WAR and I thought the idea had a lot of merit. Here I'll break with Keating to introduce the stat, discuss the shortcomings specific to OOTP, and talk about my method of hashing them out. I'll be wordy in the beginning but then wrap up quickly. I'll follow with a post using the stat to take a fresh look at our HoF and possibly one looking at the current pool of candidates (I'll defer to what you guys think on that as many people have probably already voted).

Start from the Start

So WAR gives you the number of "wins" a player delivered compared with a theoretical "replacement" player (borderline AAA or bench player). Two questions should jump out: What is a "win"?; and Who is this "replacement" player? First things first, a "win" does not mean exactly what you think it might. The evolution of WAR didn't come from a desire to determine a player's value to his team, but rather his relative value compared to other individual players. So if you have it in your head that Manuel Aguilar's 12.9 WAR so far this year means he personally delivered 13 of Carolina's 84 wins, drop that line of thought. If that were true about WAR then I wouldn't need a context...I could go to the trade market and ask, "Who wants 13 more wins?" The "replacement player" is a sticking point with virtually everyone who uses WAR for anything, but I've come very close to what OOTP seems to use and that's really my aim.

Now that we've started fresh, to compare players you have to get them in the same dimensions. Telling you I traveled 400 to my parents' home but 600 to the beach tells you nothing unless I confirm that both of those numbers are in miles. Or kilometers, or yards, or cubits...it doesn't matter. As long as they are both the same thing then what I said makes some sense. So what units do we use to compare baseball players? Well, we have had raw stats since the beginning and they haven't helped too much. Walks, home runs, strikeouts, doubles, stolen bases, wild pitches...they are pretty much useless outside of defining what a player did. Even saying Player A had 3 times the number of doubles as Player B tells us nothing. So the next step up the "scale" was a look at how those stats might come together. Well, what are baseball teams trying to do? They're trying to win. To increase the chances of winning they have to score runs and prevent runs.

But runs are not really what players directly deliver the majority of the time. In fact, we all know what happens to players who constantly try to hit the home run or strike out every hitter they face. Fail. But runs do have in common that they are all made by baserunners...no one can score sitting on the bench. So for hitters the question is how well the player did in creating baserunners or turning existing ones into runs. But when my single allows a guy on 2nd base to score neither of us should get 100% credit for that run. By the same token, the other player wouldn't have been so valuable had he been on 1st base. Now, since we've gotten to the subject of runs, we can quickly break down WAR.

What is it good for?

As mentioned, once we have WAR it can - is designed to, in fact - be used to compare any player regardless of any other factor(s). But it's probably obvious that you have to come at it from a different angle for pitchers than position players. So:

Pitching WAR

Early last decade a stathead whose last name is McCracken had a breakthrough theory on pitching that was revolutionary mostly because pretty much everyone saw the work when published and said, "Welp, makes sense to me." That never happens in the baseball stats universe. He determined that if you want to know the baserunners allowed 100% by a pitcher, how dangerous each one was, and how many runs they eventually led to, then you have to start with the only things a pitcher does control 100%: walks, hit by pitch, strikeouts, and home runs. From that he did a bunch of mathy stuff and came up with:

Fielding Independent Pitching (FIP) = ((13*HR)+(3*(BB+HB-IBB))-(2*K)) / IP + constant (the constant is typically 3.20 and is only used to put FIP on a similar scale as ERA)

From there it's more mathy stuff to turn FIP into a "runs allowed/prevented" number and then into WAR, but we are fortunately spared that as OOTP dev recognized that they have the data they need for this pitching WAR and so it's included going all the way back to 1973.
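For anyone who wants to check a FIP by hand, here is a minimal sketch of the formula above. The function name `fip`, its argument names, and the sample season line are made up for illustration; the 3.20 constant is just the typical value quoted above, not necessarily what OOTP uses.

```python
def fip(hr, bb, hbp, ibb, k, ip, constant=3.20):
    """Fielding Independent Pitching, following the formula in the post.

    ip is innings pitched as a real number, so a box-score "200.1"
    (200 and one third innings) should be passed as 200 + 1/3.
    constant only shifts FIP onto an ERA-like scale.
    """
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    return (13 * hr + 3 * (bb + hbp - ibb) - 2 * k) / ip + constant


# Hypothetical season line, invented purely for illustration.
example = fip(hr=20, bb=50, hbp=5, ibb=3, k=180, ip=200.0)
print(round(example, 2))  # prints 3.48
```

Note the parentheses around BB+HB-IBB: the factor of 3 applies to all three terms, not just to walks.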

Position WAR - Offense

WAR for position players is made up of an offensive (how many baserunners/runs/wins did the player produce) and defensive (how many baserunners/runs/wins did the player prevent) component.

On offense, there are various methods to use depending on what data is available. Remember that the first aim is identifying how many baserunners the player produced (including himself) and how dangerous those baserunners were. This typically means you start by using wOBA (weighted On-Base Average) and converting that into wRAA (weighted Runs Above Average). But OOTP, I discovered, uses VORP (Value Over Replacement Player) instead. For our purposes it makes little difference because wRAA and VORP are both expressed in runs measured against a league baseline (league average for wRAA, replacement level for VORP). But, of course, we only have VORP going back to 1995. What to do with those missing years?

It turns out that while we were not given VORP, we do have RC/27 (Runs Created per 27 outs) for every HoF player going back to the beginning. This is very fortunate because:

VORP = (player Runs Created) - (league avg Runs Created*replacement factor)

Handy-dandy. But how to turn RC/27 into RC? And how is the "league avg. Runs Created" calculated? First, we'll cover RC. It's tempting to think that,

RC (for entire league) = Runs (for entire league)

but that's not the case. The numbers are similar, but remember that we're using "runs" and "wins" just as expressions to keep us talking the same language. Besides, for a runner that scores from second on a base hit, he gets some credit for the run and the hitter gets some credit, too. But before that, if a defender kept him from turning a double into a triple with a superior defensive play, that defender is going to get defensive credit for making a baserunner less dangerous...so you can see how it would line up over an entire season. Luckily, we can say this:

RC = A*B/C, where A = on-base factor, B = advancement factor, and C = opportunity factor

Thanks to digging around on the Internet, I found one formulation used for A, B, & C is:

RC = (H+BB)*TB/(AB+BB), which (taking OBP as (H+BB)/(AB+BB) and SLG as TB/AB) can be rewritten as,

RC = OBP * SLG * AB
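A quick sanity check that the two forms really are the same thing, as a sketch with a made-up batting line. The function names are hypothetical, and the OBP here is the simplified (H+BB)/(AB+BB); a published OBP also counts HBP and SF, so plugging in a listed OBP will give a slightly different RC.

```python
def runs_created_counting(h, bb, tb, ab):
    """Basic Runs Created from counting stats: (H + BB) * TB / (AB + BB)."""
    return (h + bb) * tb / (ab + bb)


def runs_created_rates(h, bb, tb, ab):
    """The same quantity written as OBP * SLG * AB."""
    obp = (h + bb) / (ab + bb)  # simplified OBP: ignores HBP and SF
    slg = tb / ab
    return obp * slg * ab


# Hypothetical batting line, invented for illustration only.
line = dict(h=180, bb=70, tb=300, ab=600)
rc_a = runs_created_counting(**line)
rc_b = runs_created_rates(**line)
print(round(rc_a, 1), round(rc_b, 1))  # prints 111.9 111.9
```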

Bingo!! Give me some of that because I've got all those digits. Now how many outs per player in a season? Meh, a little bit tricky but still straightforward:

Outs = AB - H + SF + SH + GDP

So all the ABs that were outs plus everything that was an out but not included in ABs. Now I need to know what "league average RC" is and I'm good to go. Well breaking down VORP further, now that we know about the whole "outs" thing, it looks like this:

VORP = (player Runs Created) - (league average Runs Created for the same amount of outs*replacement factor)

So if we gave an average league player the same number of outs as our player, how many runs would they create? Well, taking the total RC for a league and dividing by the total number of outs for a league gives us the Runs Created/Out for a league average player. So,

VORP = (player Runs Created) - ((league avg RC/out*# of player outs)*replacement factor)

Awesome! So what's this replacement factor? Put simply, if you don't use a replacement factor all of the VORPs in the league would add up to zero. That makes no sense because we are trying to compare our guy to a replacement level player. So the league average is multiplied by a value that represents how good, percentage wise, we consider a replacement level player to be. I got my hands dirty here and figured out that from 1995 - 2003 (the years we have full data) OOTP is using a replacement factor of 0.8 with a variation of around 1% either way. Works for me!

VORP = RC - RCavg*OUTSplayer*0.8
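Here is the whole chain from outs to VORP as a rough sketch, reading RCavg in the formula above as the league's RC per out. The function names and the sample numbers are hypothetical; the 0.8 replacement factor is my estimate from the 1995 - 2003 data, not a documented OOTP setting, and 0.2 is only the ballpark league RC/out.

```python
REPLACEMENT_FACTOR = 0.8  # estimated from 1995-2003 data (assumption)


def outs_made(ab, h, sf, sh, gdp):
    """Outs = AB - H + SF + SH + GDP."""
    return ab - h + sf + sh + gdp


def league_rc_per_out(league_rc, league_outs):
    """League-wide Runs Created divided by league-wide outs."""
    return league_rc / league_outs


def vorp(player_rc, player_outs, lg_rc_per_out,
         replacement_factor=REPLACEMENT_FACTOR):
    """Runs above what a replacement player creates in the same outs."""
    replacement_runs = lg_rc_per_out * player_outs * replacement_factor
    return player_rc - replacement_runs


# Hypothetical player season, invented for illustration only.
outs = outs_made(ab=600, h=180, sf=5, sh=2, gdp=13)
value = vorp(player_rc=112.0, player_outs=outs, lg_rc_per_out=0.2)
print(outs, round(value, 1))  # prints 440 41.6
```

Setting `replacement_factor=1.0` turns the same function into "runs above average", which is why the VORPs would sum to roughly zero without the factor.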

Assumption: Now when I went back beyond 1995 there was no way to calculate the league-wide OBP/SLG/AB nor the total number of outs (since we don't have SF, SH, or GDP). So I skipped ahead to the output and looked at the league avg RC per out for every full season back to 1995. It went from a high of 0.218553 in 2001 to a low of 0.188134 in 1999. That's a pretty low variation and I would probably have been safe just picking 0.2, but I used trend analysis instead and distributed the same numbers similarly over the years back to 1973 so we have a representation of high offense and low offense years.

Position WAR - Defense

On defense, everyone is familiar with using ZR. For its defensive component, WAR uses UZR. If you don't have ZRs this stinks, but if you do have ZRs then you're in luck because:

UZR = SUM(all ZRs for the player)

So an outfielder may have, over the course of a season, ZRs at LF, RF, and CF. Add them all up and you get UZR. And, if you look at the origins of ZR, those numbers are expressed in number of runs prevented over/under average at a position (with a 0.0 ZR being average). But we don't even have ZRs before 1995. Honestly, this was stumping me. All I could think to do was take any years with ZR and extrapolate them backwards, or assign an average ZR to every player. In the end, though, it didn't matter because ZRs in OOTP mostly swing between +10 and -10 with the large majority in a normal distribution from around -3.5 to +3.5. It also didn't matter because...

WAR = (VORP + UZR) / 10

Yep. Remember that the aim is to objectively compare players with every other factor held neutral. The purpose of WAR was to take all of the numbers that were in terms of "runs" and abstract them one more layer. That extra layer was termed "wins" and the statheads worked out that even though the exact number fluctuated a bit from season to season, it was consistently true that roughly 10 runs = 1 win. This lets me stress a lot less about not having pure ZR numbers from before 1995 as we are only talking differences of around +/- .3 WAR.
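To make the runs-to-wins step concrete, here is a small sketch that reads the formula as (VORP + UZR) / 10, which is what "10 runs = 1 win" implies. The function name, the dictionary layout, and the sample ZRs are hypothetical.

```python
RUNS_PER_WIN = 10.0  # the rough "10 runs = 1 win" rule of thumb


def position_war(vorp_runs, zr_by_position):
    """WAR for a position player: offensive runs plus summed ZRs, in wins."""
    uzr = sum(zr_by_position.values())  # UZR = sum of all the player's ZRs
    return (vorp_runs + uzr) / RUNS_PER_WIN


# Hypothetical outfielder with ZRs at three spots (made-up numbers).
with_zr = position_war(41.6, {"LF": 1.5, "CF": -0.5, "RF": 2.0})
without_zr = position_war(41.6, {})  # pre-1995 case: no ZR data at all
print(round(with_zr, 2), round(without_zr, 2))  # prints 4.46 4.16
```

The gap between the two results is just UZR / 10, so a typical ZR somewhere between -3.5 and +3.5 moves WAR by about a third of a win at most.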

Geez, more?

Well, now we have to take the step that Keating recommended. Because VORP and WAR and UZR are all cumulative statistics, a player's career WAR is just the sum of all the WARs from his individual seasons. That means that when we are presented with HoF candidates their WAR could be the result of 15 seasons or 10. Doing an average WAR or a WAR/season doesn't make sense statistically because, again, WAR is cumulative.

Keating suggested that when you are looking at HoF candidates what we should be considering is not how many years they were able to pile up stats. We're looking for the cream of the crop, right? Well, the cream of the crop is typically found in the All-Star game every year (not always but we'll get to that). So, why not figure out the bottom threshold of WAR for being an All-Star (i.e., elite) baseball player and compare candidates against that? That is, subtract the threshold from every season's WAR and ignore negative years when adding up the new number, since we're only concerned with "elite" seasons. He called it WAAS (Wins Above All-Star).

So I went back and gathered the WAR for every single player on every All-Star squad from 1995 to now. Then I took the 20th percentile of their WARs for every season (and also 10th and 15th for comparison) and then I did a weighted average on those. What I found out was that,

MBWBA All-Star threshold = 2.0 WAR
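A rough sketch of that kind of calculation, for anyone who wants to redo it with different percentiles. Everything here is illustrative: the WAR lists are invented, the linear-interpolation percentile is one of several common definitions, and weighting each season by its number of All-Stars is an assumption, not necessarily the weighting behind the 2.0 figure.

```python
def percentile(values, pct):
    """Linear-interpolation percentile (pct in 0..100) of a non-empty list."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("need at least one value")
    rank = (len(ordered) - 1) * pct / 100.0
    lo = int(rank)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


def all_star_threshold(war_by_season, pct=20):
    """Average of per-season percentiles, weighted by squad size (assumption)."""
    total = sum(len(wars) for wars in war_by_season.values())
    weighted = sum(percentile(wars, pct) * len(wars)
                   for wars in war_by_season.values())
    return weighted / total


# Hypothetical All-Star WARs for two seasons (made-up numbers).
sample = {
    2001: [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    2002: [0.5, 2.5, 4.5, 6.5],
}
threshold = all_star_threshold(sample, pct=20)
print(round(threshold, 2))  # prints 1.88
```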

The figure that Keating arrived at for MLB is 2.5 WAR, so we're in the neighborhood. Now, what about all those lame players that get named to all-star squads just because they're famous (or because they play for the Cardinals)? It's not really an issue because if the player I'm considering for the HoF actually had an All-Star year but just didn't get voted to the team (or he got blocked because there was a glut of talent at the time at his position) he still gets credit for having an "All-Star" year if his WAR was above 2.0. The effect of this can be seen in two examples from our current HoF:

RF Peter Pete played 7 seasons from 1973 to 1979. 3B John McNecirty played 15 seasons from 1973 to 1987. Comparing them on WARs:

McNecirty: +48.4 WAR
Pete: +33.2 WAR

But it turns out that while 6 of the 7 seasons we have on the books for Pete were "All-Star" level seasons or greater, only 11 of McNecirty's 15 years reached that threshold. On the new stat a more focused picture emerges:

McNecirty: +18.9 WAAS
Pete: +22.3 WAAS

So Pete only needed 6 seasons at his "elite" level of production to top McNecirty's 11 years at his top level. This not only shows us that 4 of McNecirty's 15 seasons (more than a quarter) were less than All-Star worthy, but that even in the seasons where he was at All-Star level, he didn't clear it by nearly as much as Pete did. He's a player who played around bare-minimum All-Star levels for long enough to get voted into the Hall of Fame.
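The WAAS arithmetic itself is tiny. Here is a minimal sketch with two made-up careers (not Pete's or McNecirty's actual season lines) showing how a short, high peak can out-WAAS a long, steady career that has more total WAR. The function and variable names are hypothetical.

```python
ALL_STAR_WAR = 2.0  # the MBWBA threshold from above


def waas(season_wars, threshold=ALL_STAR_WAR):
    """Sum of (WAR - threshold) over seasons, ignoring sub-threshold years."""
    return sum(max(0.0, war - threshold) for war in season_wars)


def elite_seasons(season_wars, threshold=ALL_STAR_WAR):
    """How many seasons cleared the All-Star threshold."""
    return sum(1 for war in season_wars if war > threshold)


# Two hypothetical careers, invented for illustration only.
short_peak = [6.0, 5.5, 7.0, 1.0]
long_steady = [2.5, 2.5, 3.0, 2.5, 1.5, 2.5, 3.0, 2.5, 1.0, 2.5]

for name, wars in (("short_peak", short_peak), ("long_steady", long_steady)):
    print(name, sum(wars), waas(wars), elite_seasons(wars))
# prints:
# short_peak 19.5 12.5 3
# long_steady 23.5 5.0 8
```

Changing `threshold` to 2.5 gives the MLB version Keating used.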

So that's that. I'll post the numbers for current HoF members next, but I think I put all the crunching above for everyone to ignore or dissect. Any comments or questions on the methodology just post them here and we'll discuss. Maybe I'll adjust some factors, too, based on feedback.

LambeauLeap · Post by **LambeauLeap** » Mon Feb 06, 2012 3:12 pm

My eyes hurt.

nverhoev · Post by **nverhoev** » Mon Feb 06, 2012 4:39 pm

Awesome. Really interesting.

jumpmancol · Post by **jumpmancol** » Mon Feb 06, 2012 4:58 pm

I love it!

cheekimonk · Post by **cheekimonk** » Mon Feb 06, 2012 5:58 pm

LambeauLeap wrote:

My eyes hurt.

Sorry, bro. I knew I'd get carried away...

cheekimonk · Post by **cheekimonk** » Mon Feb 06, 2012 6:34 pm

WAR (Wins Above Replacement) is used to compare players. It is position, team, park, and era independent. A variation of WAR called WAAS (Wins Above All-Star) is suggested by ESPN's Peter Keating.

Unit is "wins" but it doesn't literally mean the number of wins a player created. It is just a relative comparison and a recognition of the theory that the progression of "critical" things done by players is:

(The teams' goal is to win. They win by scoring runs. Runs are only scored by baserunners.)
Baserunners created by player (inc. themselves, how they advance other baserunners, and how valuable the baserunners are that they create) -> Runs generated/prevented by player -> Wins created by player

Pitching WAR

Based on FIP (Fielding Independent Pitching) which builds on the recognition that pitchers can only 100% control strikeouts, hit batters, walks, and home runs.

FIP = ((13*HR)+(3*(BB+HB-IBB))-(2*K)) / IP + constant (the constant is typically 3.20 and is only used to put FIP on a similar scale as ERA)

FIP is converted to an expression of "runs allowed/prevented" and then to WAR. OOTP provides WAR for pitchers going all the way back to 1973.

Position WAR - Batting

WAR includes an offensive (how many baserunners/runs/wins did the player produce) and defensive (how many baserunners/runs/wins did the player prevent) component.

OOTP uses VORP (Value Over Replacement Player) which is in terms of RC (Runs Created):

VORP = playerRC - (league avgRC * replacement factor)

OOTP does not have VORP before 1995, but does provide RC/27 (Runs Created per 27 Outs)

RC = A*B/C; where A = on-base factor, B = advancement factor, and C = opportunity factor

One formulation of A, B, and C is:

RC = OBP*SLG*AB

From this a player's RC can be calculated, but:

RCavg/Out = (RC by entire league) / (Total number of outs in league)

then...

RCavg = RCavg/Out * playerOUTS

This is the number of runs an average player would have created given the same number of outs as our player. For "replacement factor" (which gets us to "runs created above replacement player") OOTP is using a replacement factor of 0.8 with a variation of around 1% either way.

VORP = RC - (RCavg/Out) * playerOUTS * 0.8

Position WAR - Defense

For its defensive component WAR uses UZR (Ultimate Zone Rating):

UZR = SUM(all ZRs for the player)

ZR, and so UZR also, is already in terms of runs prevented/allowed. So:

WAR = (VORP + UZR) / 10; (given that 10 runs = 1 win)

WAAS

A player's career WAR is simply:

WARcareer = SUM (all WARs from each season)

WAAS (Wins Above All-Star) sets a minimum threshold WAR for All-Star level performance. For MLB Keating suggests +2.5, but for MBWBA I suggest:

WARallstar = +2.0

Therefore,

WAAS = SUM (WAR - WARallstar for each season); negative seasons - when the player was below All-Star performance - are ignored since these are HoF candidates

jiminyhopkins · Post by **jiminyhopkins** » Mon Feb 06, 2012 9:29 pm

An admirable effort. However...

This is how I know I will never win in this league. It's so far beyond my understanding. I am a baseball fan, not a math fan. At all.

But well done, for those who have the fortitude to understand such things.

cheekimonk · Post by **cheekimonk** » Mon Feb 06, 2012 10:38 pm

jiminyhopkins wrote:I am a baseball fan, not a math fan. At all.

Is this even possible?

bschr682 · Post by **bschr682** » Tue Feb 07, 2012 12:40 pm

nicely done

scottsdale_joe · Post by **scottsdale_joe** » Tue Feb 07, 2012 10:31 pm

jiminyhopkins wrote:An admirable effort. However...
This is how I know I will never win in this league. It's so far beyond my understanding. I am a baseball fan, not a math fan. At all.
But well done, for those who have the fortitude to understand such things.

Yup, it is well done.
But hogwash so far as it being critical to winning.
I pay attention mainly to BA, OBP, and RBIs although I hate lots of Ks too. I do think VORP is indicative.
For pitchers I look mainly at overall success: Ws vs Ls, Ks to BBs, ERA, I consider WHIP but not VORP for pitchers.
All the other new-age Bill-James Billy-Beane Sabermetric type statistics I pretty much ignore. They give me a headache.
Oh, and fielding is VERY important. Discount it at your peril.
And player and team strategies can do a lot.

The Brewster
