There might be other reports coming from the data, for instance on ballpark effects, etc. but we will see.
First, here is the executive summary: Except for GAP and AvoidK, all the ratings turn out to do pretty much what you expect them to do. For predicting wOBA, CON and AvoidK together account for about half the total effect. In an MLB-average situation, AvoidK should come out "in the wash", but in BBA that is not the case: in fact, it is almost as important as POW or EYE, which are of approximately equal importance. GAP is less than half as important as POW or EYE. Surprisingly, left-handed batters have an advantage above and beyond just looking at their vR ratings.
The data I used was exclusively the vL and vR split data. For the vL splits, I set all the ratings to their vL versions; for the vR splits, I set them to their vR versions.
One thing many of you know, but some may not, is that in the player editor, most hitting and pitching ratings have a cutoff at 100 (or 5, using 1 to 10 ratings). According to the player editor, the effect of a rating is pretty much linear below 100 for all ratings (except for extremely low values of AvoidK), and it is also linear above 100, but the two segments have different "slopes". Based on the 2045 results, a cutoff at 5 for each rating seemed to match the data, and the "slopes" of the lines were close to what the player editor shows. The following table gives the player editor's relative slopes (below 100 compared to above 100) alongside my estimates for BBA 2045. Where my estimate was not significantly different from the player editor, I kept the player editor's value; where it was significantly different, I rounded to the estimated slope from the BBA 2045 stats:
Rating | Effect | Player Editor | BBA Est. |
---|---|---|---|
CON | Avg | x2 | x2 |
GAP | ebh/AB | x1 | x1.5 |
POW | hr/AB | x.5 | x.5 |
EYE | bb/pa | x.5 | x.7 |
AvK | k/AB | x1.5 | x1.5 |
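To make the piecewise-linear idea concrete, here is a minimal Python sketch of how a rating could be recoded before it goes into a regression. It assumes the 1 to 10 scale with the kink at 5, and it assumes the multiplier in the table applies to the below-cutoff side (slope below 5 = multiplier times slope above 5). That is one reading of the table, not something confirmed by the player editor, and the function and variable names are made up for illustration.

```python
# Relative slope below the cutoff vs. above it ("BBA Est." column of the table above).
BELOW_CUTOFF_FACTOR = {"CON": 2.0, "GAP": 1.5, "POW": 0.5, "EYE": 0.7, "AvK": 1.5}


def piecewise_rating(rating, below_factor, cutoff=5):
    """Recode a 1-10 rating as 'steps from the cutoff'.

    Above the cutoff each point counts as 1 step; below it each point
    counts as `below_factor` steps (an assumed reading of the table).
    """
    steps = rating - cutoff
    if steps >= 0:
        return float(steps)
    return below_factor * steps


# A 6 CON batter is +1 step; a 4 CON batter is -2 steps, because the
# slope below the cutoff is assumed to be twice as steep for CON.
con_up = piecewise_rating(6, BELOW_CUTOFF_FACTOR["CON"])
con_down = piecewise_rating(4, BELOW_CUTOFF_FACTOR["CON"])
```

With a recoding like this, a single regression coefficient per rating can describe both sides of the kink.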
Since these are piecewise linear, a stepwise regression on the result should generate useful results. So I did such a stepwise regression (look back to the pitcher effectiveness post if you want to see how a stepwise regression works), adding several other potential predictors:
1. batadv corresponds to when a lefty batter is going against a righty pitcher, or where a righty batter is going against a lefty pitcher.
2. b_S is set to 1 if the batter is a switch-hitter, b_R is set to 1 if a batter is a righty (we don't need b_L to be included because that is indicated when both other variables are 0).
3. I also added SPE since that is supposed to affect percentage of extra base hits that are triples.
4. I set alpha to .006 to try to focus on getting the most clear results.
5. As with pitchers, I did a "weighted" stepwise regression, weighting by number of plate appearances for each vL, vR split. Only players that had more than 25 PAs for the given split were included in the analysis.
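The weighting and the PA cutoff from item 5 can be sketched in a few lines of Python. This is only the single-predictor building block of a weighted fit, not the full stepwise procedure with its alpha test, and the rows are made-up numbers rather than BBA data. The function name and the tuple layout are assumptions for illustration.

```python
def weighted_fit(rows, min_pa=25):
    """Weighted least-squares line y = a + b*x, weighting each split by its PAs.

    rows: iterable of (x, y, pa) tuples, one per vL or vR split.
    Splits with pa <= min_pa are dropped before fitting.
    """
    kept = [(x, y, pa) for x, y, pa in rows if pa > min_pa]
    total_w = sum(pa for _, _, pa in kept)
    # Weighted means of predictor and outcome.
    x_bar = sum(pa * x for x, _, pa in kept) / total_w
    y_bar = sum(pa * y for _, y, pa in kept) / total_w
    # Weighted covariance and variance give the slope.
    sxy = sum(pa * (x - x_bar) * (y - y_bar) for x, y, pa in kept)
    sxx = sum(pa * (x - x_bar) ** 2 for x, _, pa in kept)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical splits: (EYE steps from 5, bb/pa, plate appearances).
example_rows = [
    (-1, 0.05, 120),
    (0, 0.07, 300),
    (1, 0.09, 450),
    (2, 0.11, 80),
    (3, 0.50, 10),  # tiny sample, excluded by the PA cutoff
]
intercept, slope = weighted_fit(example_rows)
```

Weighting by PAs means a split with 450 plate appearances pulls the line much harder than one with 80, which is the point of doing it this way.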
_DEPVAR_ | _TYPE_ | _RSQ_ | Intercept | batadv | CON | GAP | POW | EYE | AvK | b_S | b_R | SPE |
---|---|---|---|---|---|---|---|---|---|---|---|---|
AVG | PARMS | 0.55016 | 0.19622 | . | 0.018371 | . | . | . | 0.007718 | . | -.009767527 | . |
AVG | STDERR | . | 0.00265 | . | 0.000890 | . | . | . | 0.000862 | . | 0.002196009 | . |
bb/pa | PARMS | 0.68632 | 0.07249 | 0.008064 | . | . | . | 0.022982 | . | . | . | . |
bb/pa | STDERR | . | 0.00104 | 0.001377 | . | . | . | 0.000513 | . | . | . | . |
ebh/ab | PARMS | 0.37914 | 0.03239 | . | . | .007265693 | -.001734033 | -0.001217 | 0.002195 | . | . | . |
ebh/ab | STDERR | . | 0.00137 | . | . | .000400295 | 0.000290149 | 0.000425 | 0.000385 | . | . | . |
hr/ab | PARMS | 0.67868 | 0.01893 | 0.003364 | . | . | 0.009564971 | . | . | . | . | . |
hr/ab | STDERR | . | 0.00076 | 0.000930 | . | . | 0.000217765 | . | . | . | . | . |
so/ab | PARMS | 0.78522 | 0.34976 | -0.012657 | -0.006011 | . | 0.002401271 | 0.002931 | -0.042131 | . | . | . |
so/ab | STDERR | . | 0.00306 | 0.002715 | 0.001239 | . | 0.000735608 | 0.001028 | 0.001161 | . | . | . |
wOBA | PARMS | 0.56438 | 0.24563 | . | 0.014478 | .003133732 | 0.008726320 | 0.008183 | 0.006116 | . | -.008491819 | . |
wOBA | STDERR | . | 0.00348 | . | 0.001326 | .001028786 | 0.000715892 | 0.000964 | 0.001106 | . | 0.002446564 | . |
Batting average has a nicely high r-square value (55%) and is affected by three of the predictors: CON, AvK, and b_R (in a negative direction). Why being a right-handed batter has a negative effect is not clear to me, but, essentially, being a right-handed batter as opposed to a left-handed or switch hitter results in about a ten point drop in average. The other interesting result is that AvoidK contributes above and beyond CON alone. As most of you know, the CON rating is calculated by the engine from the batter's BABIP rating, POW rating, and AvK rating, and it is calculated so as to be a perfect predictor, all by itself, of avg. The fact that AvK contributes above and beyond CON is not entirely surprising, however. This result seems to occur whenever a league has particularly high STU pitchers relative to MLB "normal", which would seem to characterize BBA. A way of thinking about this overall is that a player with a 5 CON, 5 AvK would be expected to get a .196 average (the intercept value) in the BBA if they are left-handed or a switch-hitter, but only a .186 average if they are a righty. It would be about .026 higher for a 6 CON, 6 AvK batter (.018 from CON plus .008 from AvK).
bb/pa, as expected, is closely related to EYE. Every additional point in EYE increases walk rate by about 2.3 percentage points, and having a batting advantage against the pitcher adds another 0.8 percentage points. The r-square of 68.6% indicates that more than 2/3 of the variance of all walk rates can be accounted for by the rounded EYE value and batadv.
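For anyone who wants to plug in their own numbers, the batting average arithmetic looks roughly like this in Python. The coefficients are copied from the AVG row of the table; the function name is made up, and the sketch only handles ratings at or above 5, since how the below-5 side was scaled in the regression is not spelled out here.

```python
# Coefficients copied from the AVG row of the regression table.
AVG_INTERCEPT = 0.19622
AVG_CON = 0.018371
AVG_AVK = 0.007718
AVG_B_R = -0.009767527


def expected_avg(con, avk, righty):
    """Expected batting average for ratings at or above 5 (1-10 scale)."""
    if con < 5 or avk < 5:
        raise ValueError("sketch only covers ratings at or above the cutoff of 5")
    avg = AVG_INTERCEPT + AVG_CON * (con - 5) + AVG_AVK * (avk - 5)
    if righty:
        avg += AVG_B_R  # right-handed batters take the b_R penalty
    return avg


lefty_5 = expected_avg(5, 5, righty=False)   # intercept only
righty_5 = expected_avg(5, 5, righty=True)   # intercept plus b_R
lefty_6 = expected_avg(6, 6, righty=False)   # one point up in CON and AvK
```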
ebh/ab is not as simple as just looking at GAP, which replicates the finding I have had for every single analysis I have done in OOTP. First of all, the r-square is by far the lowest of all of these, at around 38%. Secondly, it appears to be affected negatively to some extent by POW and EYE, and positively to some extent by AvK.
hr/ab, however, IS as simple as looking at POW and batadv. A POW 5 hitter will hit, on average, about ten home runs a year, plus another 5 for each point in POW above 5. The r-square here is about the same as it is for bb/pa, at about 68%.
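The home run numbers can be reproduced with a small Python sketch. The per-AB coefficients come from the hr/ab row of the table; the 550 at-bats per season is my own assumption for a full-time hitter, as is the `batadv_share` argument (the fraction of at-bats taken with the platoon advantage), so treat the output as ballpark figures.

```python
# Coefficients copied from the hr/ab row of the regression table.
HR_INTERCEPT = 0.01893
HR_POW = 0.009564971
HR_BATADV = 0.003364


def expected_hr(pow_rating, at_bats=550, batadv_share=0.0):
    """Rough season home run total for POW at or above 5 (1-10 scale).

    at_bats and batadv_share are illustrative assumptions, not league data.
    """
    if pow_rating < 5:
        raise ValueError("sketch only covers POW at or above the cutoff of 5")
    per_ab = HR_INTERCEPT + HR_POW * (pow_rating - 5) + HR_BATADV * batadv_share
    return per_ab * at_bats


hr_pow5 = expected_hr(5)                        # no platoon advantage
hr_pow6 = expected_hr(6)                        # one more point of POW
hr_pow5_adv = expected_hr(5, batadv_share=1.0)  # always has the advantage
```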
so/ab is also more complicated than expected. Although the r-square is the highest of any of the measures here at almost 79%, it appears to be affected not just by batadv and AvK, but also by POW and EYE (high values leading to more strike outs), and by CON (higher values leading to fewer strike outs). Somebody with a 5 rating across the board here would be expected to get strike outs in about 35% of their at bats when they don't have the advantage against the pitcher, or about 33.7% when they do.
wOBA is the stat everybody has been waiting for, of course. Considering that this is a complicated stat with many predictors, getting an r-square of 56.4% is impressive, I think. All the ratings come into play here, except SPE; batadv does not enter the model either. SPE might show up once we have more data, but I am a bit surprised that batadv does NOT show an effect, since walks, strike outs, and home runs all do show an effect with it. Perhaps it is being masked by b_R, which shows an effect similar to the one it has on BA, and more data will bring the batadv effect out. Here, a righty with ratings of 5 across the board would only be expected to get a .237 wOBA (.246 for a lefty or switch-hitter). The rest of the results are summarized in the executive summary above.
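The relative-importance claims in the executive summary can be checked directly from the wOBA row. The Python sketch below simply compares the raw coefficients, which assumes that one point of each rating is a comparable "unit"; that is a simplification on my part, and the variable names are made up.

```python
# Rating coefficients copied from the wOBA row of the regression table.
WOBA_INTERCEPT = 0.24563
WOBA_B_R = -0.008491819
WOBA_COEFS = {
    "CON": 0.014478,
    "GAP": 0.003133732,
    "POW": 0.008726320,
    "EYE": 0.008183,
    "AvK": 0.006116,
}

# Share of the summed per-point effect attributable to each rating.
total = sum(WOBA_COEFS.values())
shares = {name: coef / total for name, coef in WOBA_COEFS.items()}

con_avk_share = shares["CON"] + shares["AvK"]    # about half the total
gap_vs_pow = WOBA_COEFS["GAP"] / WOBA_COEFS["POW"]  # GAP relative to POW
righty_all_5 = WOBA_INTERCEPT + WOBA_B_R         # righty, 5s across the board
```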
Thoughts? Is this close to what you all expected or quite different? Is there anything you find somewhat surprising?