The Brewster

Posted: **Fri Dec 07, 2012 7:59 pm**

Damn! This took a lot more effort than I anticipated. One thing I track for Carolina during the season that helps tremendously is the variance of my team stats to enhance the league rankings given in OOTP. When the 2007 season ended, I decided to run the numbers for every team and provide them for consumption. The spreadsheet for all of MBWBA along with graphs (explained below) for each club on separate tabs can be found at the link below:

MBWBA_TMstats

Here's a quick explanation if needed:

Unique Stats

Offensive stat: BB% - Takes the number of walks earned by a team and expresses the rate at which the team earns walks: BB% = BB / PA
Offensive stat: K% - Takes the number of strikeouts by a team and expresses the rate at which the team strikes out: K% = K / PA

Fielding stat: UR% - Attempts to provide another overall assessment of team defense by expressing the percentage of total runs allowed that were unearned runs: UR% = (RA - ER) / RA
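
For anyone who would rather script these than build them in a spreadsheet, here is a minimal sketch of the three rate stats in Python. The function name and the team totals are made up for illustration and are not taken from the MBWBA data.

```python
def rate_stats(bb, k, pa, ra, er):
    """Return (BB%, K%, UR%) as fractions, using the formulas above."""
    bb_pct = bb / pa          # walks per plate appearance
    k_pct = k / pa            # strikeouts per plate appearance
    ur_pct = (ra - er) / ra   # share of runs allowed that were unearned
    return bb_pct, k_pct, ur_pct


# Hypothetical season totals for one team (illustrative numbers only).
bb_pct, k_pct, ur_pct = rate_stats(bb=500, k=1000, pa=6000, ra=700, er=630)
print(f"BB% = {bb_pct:.1%}, K% = {k_pct:.1%}, UR% = {ur_pct:.1%}")
```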

Variance vs. Ranking

So why care about variance at all? Let's say that a team ranked 22nd out of the 24 teams in MBWBA in stolen bases and also 22nd out of 24 in errors. The easy analysis is that the team has as much work to do in catching up to the rest of the league in stolen bases as it does in defense (errors). That's not necessarily true, however. Imagine you took every team in the league and plotted their number of stolen bases on a straight line from least to most, and then did the same for errors. It might be the case that there is generally a lot of room between teams on the stolen bases line but on the errors line the teams are clumped up very closely. It would be obvious, then, that the team in question is actually much worse in stolen bases (relative to the rest of the league) than in errors, even though they rank #22 in both. That's a very critical piece of information. As most of you likely know, stats that are widely spread out from least to most have high variance and stats where teams are clumped together have low variance.
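
To make the stolen bases vs. errors scenario concrete, here is a rough sketch with two invented 24-team leagues. All of the numbers are assumptions chosen so that the team ranks 22nd in both stats; the helper name rank_and_z is hypothetical, and statistics.stdev is used because it is the sample standard deviation.

```python
from statistics import mean, stdev


def rank_and_z(values, team_value, higher_is_better=True):
    """Return (rank, z) for one team; rank 1 is best, positive z is good."""
    if higher_is_better:
        better = sum(1 for v in values if v > team_value)
    else:
        better = sum(1 for v in values if v < team_value)
    z = (team_value - mean(values)) / stdev(values)
    if not higher_is_better:
        z = -z  # flip so that negative always means "worse than average"
    return better + 1, z


# Invented data: stolen bases have a big gap between the bottom three and the pack.
stolen_bases = [60, 65, 70] + list(range(100, 205, 5))
# Invented data: errors are tightly packed except for two very bad teams.
errors = list(range(90, 111)) + [111, 130, 140]

sb_rank, sb_z = rank_and_z(stolen_bases, 70, higher_is_better=True)
err_rank, err_z = rank_and_z(errors, 111, higher_is_better=False)

print(f"Stolen bases: rank {sb_rank}, {sb_z:+.2f} standard deviations")
print(f"Errors:       rank {err_rank}, {err_z:+.2f} standard deviations")
```

Both stats come out at rank 22, but the stolen bases figure sits much further below the league average than the errors figure does, which is the point of looking past the ranking.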

How to analyze the team graphs

You probably expect a long explanation of normal distributions and bell curves here, but it's not really needed. Here is what I did for each team with each stat:

1. Calculate the league average for every stat (i.e., the mean).

2. Use Excel's built-in capabilities to calculate the standard deviation for every stat. The standard deviation is the square root of the variance, so it describes the same spread but in the stat's own units, which is all I need. Reading it the way I do here assumes that all of the teams' stats in one category fall roughly into a normal distribution (most teams found around the average with a small number being much higher and a small number being much lower). The distribution itself could be tall and skinny or wide and flat, but regardless it turns out that if I start from the average and go n standard deviations below it, I will have covered essentially the same percentage of teams as if I had gone n standard deviations above the average.

For anyone needing a primer, if you look at the stats spreadsheet you'll see that the standard deviation (STDEV) for each stat is different. The "number" of standard deviations is easily explained by considering the case where the average of a set of numbers is 100 and the standard deviation is given as 10. In that case, 90 would be -1.0 standard deviation from the average. 80 would be -2.0, 85 would be -1.5, 110 would be +1.0, 130 would be +3.0, and so on.
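
The same primer as a few lines of Python, in case that is easier to follow than prose. The function name is just for illustration; the average of 100 and standard deviation of 10 are the example values from the paragraph above.

```python
def num_std_devs(value, average, std_dev):
    """How many standard deviations a value sits from the average."""
    return (value - average) / std_dev


for value in (90, 80, 85, 110, 130):
    print(value, f"{num_std_devs(value, 100, 10):+.1f}")
```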

3. Given (1) and (2), I calculate the difference between a team's stat (say, number of strikeouts) and the league average.

4. Taking the result from (3) and dividing by the standard deviation from (2) gives the number of standard deviations away from league average for each team. This is the number that is plotted on the graphs (there is a correction done at this step where stats that are better the lower they are - ERA and WHIP for example - are normalized to make the graphs easier to read...in every case, GREEN = GOOD and RED = BAD).
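
Steps 1 through 4 can be sketched in Python roughly as follows. This is not the spreadsheet itself, just an approximation of the same arithmetic: the team names and ERA values are invented, the LOWER_IS_BETTER set only lists the two examples mentioned above, and statistics.stdev is the sample standard deviation, which should correspond to Excel's STDEV.

```python
from statistics import mean, stdev

# Stats where a lower number is better get their sign flipped (step 4 correction).
LOWER_IS_BETTER = {"ERA", "WHIP"}


def standard_scores(stat_name, team_values):
    """Map each team to its number of standard deviations from league average.

    Positive always means better than average (GREEN), negative worse (RED).
    """
    values = list(team_values.values())
    league_avg = mean(values)   # step 1
    league_sd = stdev(values)   # step 2
    scores = {}
    for team, value in team_values.items():
        if stat_name in LOWER_IS_BETTER:
            diff = league_avg - value   # step 3, flipped for lower-is-better stats
        else:
            diff = value - league_avg   # step 3
        scores[team] = diff / league_sd  # step 4
    return scores


# Hypothetical three-team league, for illustration only.
era = {"Team A": 3.50, "Team B": 4.00, "Team C": 4.50}
scores = standard_scores("ERA", era)
for team, z in scores.items():
    print(team, f"{z:+.2f}")
```

Here the team with the lowest ERA gets the positive (GREEN) number even though its raw stat is below the league average.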

The Real Skinny

Basically, the further your team is from zero, the more extreme its 2007 result was: much better than the league (positive numbers; GREEN on the graphs) or much worse (negative numbers; RED on the graphs). For reasons I won't delve into, within +/- 0.68 is considered average, +/- 1.50 means your team is either very good or very bad, and beyond +/- 2.0 is extremely good or bad.
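
If you want to turn a plotted number into a label, something like this sketch would do it. The 0.68, 1.50 and 2.0 cutoffs are the ones given above; the "somewhat" label for the band between 0.68 and 1.50 is my own placeholder, since no name is given for that range.

```python
def describe(z):
    """Turn a number of standard deviations into a rough label."""
    size = abs(z)
    if size <= 0.68:
        return "average"
    direction = "good" if z > 0 else "bad"
    if size > 2.0:
        return f"extremely {direction}"
    if size >= 1.5:
        return f"very {direction}"
    # Assumption: this in-between band has no stated label.
    return f"somewhat {direction}"


for z in (0.5, -1.0, -1.5, 2.3):
    print(f"{z:+.2f} -> {describe(z)}")
```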

If you see corrections that are needed or have questions, or just for discussion (including opinions on whether this is useful or not, or how to improve it), use this thread.

As I said, I do this for my team throughout the season and have it as automated as possible. I'll gladly provide that spreadsheet if anyone wants to do the same...there's just not enough time to do this for every team for posting or I would.

Posted: **Fri Dec 07, 2012 8:11 pm**

Awesome job Ben....

Posted: **Fri Dec 07, 2012 9:56 pm**

Awesome work - lot of RED in Chicago!!

Posted: **Sat Dec 08, 2012 1:16 am**

great writeup

Posted: **Sat Dec 08, 2012 9:53 am**

Wow, eye opening work!! Not sure I even like my P's anymore..

Posted: **Sat Dec 08, 2012 9:57 am**

Great. Gives a nice graphical representation of our teams' stats relative to other teams.

(Er, I think that's what it does. Despite being a baseball fanatic, I hated statistics classes.)

Posted: **Sat Dec 08, 2012 1:57 pm**

Al-Hoot wrote: Great. Gives a nice graphical representation of our teams' stats relative to other teams.

(Er, I think that's what it does. Despite being a baseball fanatic, I hated statistics classes.)

Just to make sure it's not misleading, your team's graph doesn't actually show your stats relative to other teams. It shows your stats relative to the league average, while the stat ranking itself (on the first tab where all the stats are listed...towards the bottom) is relative to other teams. The plot on the graph is meant to enhance the meaning and usefulness of your team's ranking.

Posted: **Sat Dec 08, 2012 2:17 pm**

So, here are Hawaii's rankings in offensive team stats (number of standard deviations is below). I took this from the first tab and just added labels above the Tropics' line:

And here's Hawaii's graph for offensive stats:

If you look at the top image, you'll see that Hawaii ranked #21 out of 24 in HITS last season and #22 out of 24 in DOUBLES. Now look at the graph and find the "H" and "2B" (HITS and DOUBLES) lines. You'll notice that even though only one "spot" separates the Tropics' ranks in HITS and DOUBLES, the "H" line lands between -0.5 and -1.0 while the "2B" line is nearly at -1.50.

What does that mean? It means that though Hawaii ranked near the bottom in HITS, they were still at the low end of the league-average range...so a little improvement in HITS is likely to increase team ranking a LOT. Given that you're roughly within the -0.68 to +0.68 average region, though, it may make more sense to not pay attention to the very low ranking. In DOUBLES, on the other hand, the Tropics not only ranked near the bottom of the league but are also nearly at the -1.50 line, which is significantly low. I don't know what your opinion is on the importance of doubles, but if you were going to choose between focusing on improving HITS or DOUBLES, DOUBLES is the better choice by far since it's a significant outlier.

Hope that helps a bit.

Posted: **Mon Dec 10, 2012 8:27 am**

more confirmation that we suck

Actually, I love it

You put far more thought into things than I do!

Posted: **Tue Jan 08, 2013 10:28 am**

Giving some old features a re-read and felt this one needed a definite bump.

I'd love to see a similar analysis for team year-to-year variance (to try to develop an 'improvement' metric - and also to correlate with year-to-year PYTH records to try and extrapolate what the most important stats to increase are), but I'm debating on whether or not I want to put the time in, haha.

Posted: **Tue Jan 08, 2013 1:39 pm**

agrudez wrote: Giving some old features a re-read and felt this one needed a definite bump.

I'd love to see a similar analysis for team year-to-year variance (to try to develop an 'improvement' metric - and also to correlate with year-to-year PYTH records to try and extrapolate what the most important stats to increase are), but I'm debating on whether or not I want to put the time in, haha.

I offered to make my spreadsheet available, and the one I used this one time is at the link provided. The first tab contains the raw data, scraped and pasted, along with the formulas in each cell. The one I use during the season only requires me to pull up the Frick League batting, pitching, and fielding reports, export each to my browser, and scrape/paste into my spreadsheet, and it's done.


So your pitchers suck? Do you know how badly?
