Friday, February 29, 2008

Stat Primer

I know the level of discourse on here (writing quality notwithstanding) is somewhat higher than what your average fan is used to. Therefore I thought it would be handy if I went over some of the stats I reference.


Batting Average (BA): This used to be the #1 place a person would look to see how good a player is - not so anymore. Batting average's main flaw is how luck dependent it is. How many times have you seen a guy go 0 for 4 in a game with four line drives hit right at people, or go 3 for 4 with two bloops over the infield and a "seeing-eye" grounder just past the shortstop? BA is important in that a guy who is able to consistently hit for a high average (like Albert Pujols, Vlad Guerrero, or Ichiro) has good bat-control skills, which makes him a better hitter. An "empty" batting average occurs when a guy hits .300 (or some other high mark) but that goes along with few walks and little power (Juan Pierre), which severely dampens the overall effectiveness of the player as an offensive force. BA gives you some info about a player and - all else being equal - it's better for it to be higher, but it doesn't tell you too much about a player's overall value.

BA: [.300 and above is good, .250 and below is not.]

Batting Average on Balls In Play (BABIP): This goes right along with the luck aspect of batting average. It is calculated by [Hits - Home Runs] / [At Bats - Home Runs - Strike Outs + Sacrifice Flies]. It shows how often a ball that is put in play by the batter ends up being a hit. The league average for BABIP is around .300. That means that a player who hits .330 but has a BABIP of .400 has probably gotten lucky (a lot more balls are falling in than can be expected) and his BA is likely to go down. Similarly, a player who hits .250 but has a BABIP of .210 has been unlucky, and his BA is expected to go up. Batters do have some control over this statistic (Ichiro, for example, has a higher BABIP because he uses his speed to beat out infield hits) and so it is useful to look at a particular player's career BABIP numbers, but in general it serves as a useful way to look at BA in some context. There are many studies looking at which factors go into BABIP so that a better understanding of lucky/unlucky can be developed.

BABIP: [.340 and above is good, .230 and below is not.]
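For anyone who wants to check a BABIP by hand, here is a minimal Python sketch of the formula given above. The function name and the sample season line are made up for illustration; they are not a real player's numbers.

```python
def babip(hits, home_runs, at_bats, strikeouts, sac_flies):
    """Batting average on balls in play: (H - HR) / (AB - HR - K + SF)."""
    balls_in_play = at_bats - home_runs - strikeouts + sac_flies
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return (hits - home_runs) / balls_in_play


# Hypothetical season line (illustrative only)
value = babip(hits=180, home_runs=20, at_bats=600, strikeouts=100, sac_flies=5)
print(f"BABIP: {value:.3f}")
```

Comparing the result to the player's own career BABIP, rather than only to the league's .300 or so, is the more useful check.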

Line Drive Percentage (LD%): LD% is important since line drives result in hits much more often than ground balls or fly balls, and so a better hitter (Pujols) who hits more line drives is likely to have a higher BABIP (and thus BA, all else being equal) that isn't as luck-based. Intuitively, if a guy is always hitting the ball hard he's probably a pretty good hitter.

LD%: [.200 and above is good; .150 and below is not.]

Home Run per Fly Ball (HR/FB): It's a measure of both power and luck. If a player sees a drop in HR/FB then it could indicate a loss of power. By actually looking at the distances of balls hit, this can be corroborated (if all HRs were shorter) or possibly negated (if there were several balls that just missed, but still a good number of long ones). On the other hand, a HR/FB rate much higher than a player's established career baseline could indicate a flukey HR season that won't be repeated.

On-Base Percentage (OBP): This is, in general, the most important statistic for determining a player's value to an offense. The objective of the batter is to not make an out - outs are a scarce resource, and as soon as you use up 27 (usually) you don't get any more chances to score runs. OBP shows you how good a player is at not making outs. Having a high OBP (via walks) also indicates a patient approach and good pitch-recognition skills. These things allow a hitter to not only walk more often, but to be better able to identify good pitches to hit (swing at strikes, not at balls) and not to make "bad" outs (bad in the sense that they have little chance to become hits - pop-ups, easy grounders, and of course strike-outs). Team OBP correlates very highly to scoring runs, as the first part of scoring is getting guys on base.

OBP: [.360 and above is good; .330 and below is not.]

Slugging Percentage (SLG): This is, in general, the second most important statistic for determining a player's offensive value. It is a way to measure the efficiency of a player's at bats (not plate appearances). The higher the SLG, the more bases a player accumulates and the closer he is to scoring a run (and the better he is at driving them in). Hitting a HR results in at least one run automatically; a triple will generally score all runners on the bases and put the batter just 90 ft. away from home plate himself; and so on. Therefore, a player with a high SLG is able to contribute to the second thing necessary to score runs - advancement.

SLG: [.450 and above is good; .400 and below is not.]

Triple Slash (BA / OBP / SLG): Showing all three stats like this gives a quick picture of the hitter's skills. .300/.340/.370 is a Juan Pierre type hitter; .240/.380/.500 is an Adam Dunn type; .260/.310/.480 is a Mike Jacobs type; and so on.
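As a rough sketch of where the three numbers come from, here is some Python using the standard definitions (BA = H/AB, OBP = (H + BB + HBP) / (AB + BB + HBP + SF), SLG = total bases / AB). The function names and the sample stat line are my own invention for illustration.

```python
def triple_slash(ab, h, doubles, triples, hr, bb, hbp, sf):
    """Return (BA, OBP, SLG) from basic counting stats."""
    singles = h - doubles - triples - hr
    total_bases = singles + 2 * doubles + 3 * triples + 4 * hr
    ba = h / ab
    obp = (h + bb + hbp) / (ab + bb + hbp + sf)
    slg = total_bases / ab
    return ba, obp, slg


def format_slash(ba, obp, slg):
    """Format as the usual .300/.340/.370 style (leading zero dropped)."""
    return "/".join(f"{x:.3f}".lstrip("0") for x in (ba, obp, slg))


# Hypothetical hitter (illustrative only)
line = triple_slash(ab=500, h=150, doubles=30, triples=5, hr=20, bb=60, hbp=5, sf=5)
print(format_slash(*line))  # prints .300/.377/.500
```

Note that OBP uses a bigger denominator than the other two (walks, hit-by-pitches, and sacrifice flies count), which is why SLG is a per-at-bat stat and not a per-plate-appearance one.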

Isolated Power (ISO): It's SLG minus BA, and gives an idea of how much power a player has by looking at the number of extra bases per at bat. A guy can have a SLG of .450 as a product of hitting .350 with a lot of singles and only a little bit of power (.100 ISO), or he can hit .250 with a lot of doubles/triples/homers (.200 ISO). League average is around .150. [Baseball Prospectus has a modified version of ISO that counts triples the same as doubles, since stretching a double into a triple is more a product of speed than power.]

ISO: [.215 and above is good; .110 and below is not.]
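To show that "SLG minus BA" really is just extra bases per at bat, here is a small Python sketch that computes ISO both ways. The function names and sample numbers are hypothetical.

```python
def iso_from_rates(slg, ba):
    """ISO as defined above: slugging percentage minus batting average."""
    return slg - ba


def iso_from_counts(doubles, triples, hr, ab):
    """Same thing from counting stats: extra bases per at bat.

    A double is 1 extra base beyond a single, a triple 2, a homer 3.
    """
    return (doubles + 2 * triples + 3 * hr) / ab


# The two hitters from the text, both slugging .450
print(f"{iso_from_rates(0.450, 0.350):.3f}")  # singles hitter
print(f"{iso_from_rates(0.450, 0.250):.3f}")  # power hitter

# Hypothetical counting line (illustrative only)
print(f"{iso_from_counts(doubles=30, triples=5, hr=20, ab=500):.3f}")
```

The singles fall out of the subtraction entirely, which is the whole point of the stat.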

On-Base Plus Slugging Percentage (OPS): While not a complete measure, OPS does a great job of quickly summarizing a player's total offensive contribution. A high OPS means a player is good at both getting on base and advancement, or exceptional at one of the two. It does weight OBP and SLG the same, which is an issue.

OPS: [.800 and above is good; .700 and below is not.]

OPS+: The "+" indicates that the player's OPS has been adjusted for context (their home park, for example) and then compared to the rest of the league. An OPS+ of 100 indicates league average. An OPS+ of 110 means a player's adjusted OPS is about 10% better than league average. This allows players from different eras (when pitching was dominant vs. when hitters had the upper hand) to be compared, since each is measured against everyone he played with. It still has the same weighting issue as OPS.
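Here is a rough Python sketch of how an OPS+ style number can be put together. It uses the commonly published form, 100 * (OBP / lgOBP + SLG / lgSLG - 1), and assumes the league figures passed in have already been park-adjusted; the function name and the sample numbers are hypothetical.

```python
def ops_plus(obp, slg, lg_obp, lg_slg):
    """OPS+ sketch: each component is compared to the league separately.

    Assumes lg_obp and lg_slg are already adjusted for the player's park.
    """
    return 100 * (obp / lg_obp + slg / lg_slg - 1)


# A league-average hitter comes out at 100 by construction
print(round(ops_plus(0.330, 0.420, lg_obp=0.330, lg_slg=0.420)))

# Hypothetical good hitter in the same league (illustrative only)
print(round(ops_plus(0.380, 0.500, lg_obp=0.330, lg_slg=0.420)))
```

Because each half is divided by its own league average before they are combined, the result is not exactly "raw OPS divided by league OPS", though the two usually land close together.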

Walks, or Base on Balls (BB): A player with a good walk rate can maintain a high OBP even when his BA is low (whether because of ability or bad luck). As mentioned with OBP, patience and pitch-recognition are valuable all the time, not just when taking a base on balls. BB% is the rate at which a player walks.

BB%: [12% and above is good; 7% and below is not.]

Strike-Outs (K): In general, a K is just like any other out. Players who K a lot can still be very productive hitters. A high K rate does make it harder to have a high BA as there are fewer balls being put in play that can result in hits. Also, a K does not help advance runners, whereas a ground ball can (though it can result in a double play also - one out is bad; two is much worse). For younger players a high K rate could indicate a problem in their swing (can't handle inside pitches, for example), poor pitch recognition (swings at breaking balls in the dirt), or a propensity to swing for the fences (which tends to show up in how often they pull the ball also). These things can keep a player from having success at the major league level if they aren't corrected. K% is the rate at which a player strikes out.

K%: [13% and below is good; 23% and above is not.]

Stolen Bases (SB): Stealing bases at a success rate below about 70% actually costs a team runs. Compare the value of having a player (starting at first base with none out) standing on second with none out against having him back on the bench with one out: being caught costs roughly two and a half times as many expected runs as a successful steal gains. This can be found by looking at a run expectancy matrix. For 2005 it looked like this:

Runners  0 outs  1 out   2 outs
---      0.5165  0.2796  0.1075
1--      0.8968  0.5487  0.2370
-2-      1.1385  0.6911  0.3502
12-      1.4693  0.9143  0.4433
--3      1.5120  0.9795  0.3718
1-3      1.8228  1.1830  0.4931
-23      2.0363  1.4144  0.6073
123      2.3109  1.5279  0.7485

So with a runner on first and no outs, a team from 2005 scored an average of 0.8968 runs. (These values tend to be pretty close in recent years. In 2006 it might have been .9001 runs or something like that.) Having the runner steal second successfully will result in E(R) going up by 0.2417 to 1.1385. If he is caught, it goes down by 0.6172 to 0.2796. Therefore a player needed to be successful 71.9% of the time (0.6172 / [0.6172 + 0.2417]) for it to be worth it. [It's actually been more like 74% recently.] Guys that are good at stealing bases help their team by running selectively; guys that aren't good at stealing bases should usually stay put (even if they're fast).
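The break-even arithmetic above can be reproduced with a few lines of Python. This is only a sketch: the dictionary holds the three 2005 run expectancy values the calculation needs, and the names are my own.

```python
# 2005 run expectancy, keyed by (runners, outs); values from the matrix above
RUN_EXPECTANCY = {
    ("---", 1): 0.2796,  # bases empty, one out (runner caught stealing)
    ("1--", 0): 0.8968,  # runner on first, none out (before the attempt)
    ("-2-", 0): 1.1385,  # runner on second, none out (successful steal)
}


def break_even_rate(re_start, re_success, re_caught):
    """Success rate at which a steal attempt neither gains nor loses runs."""
    gain = re_success - re_start
    loss = re_start - re_caught
    return loss / (gain + loss)


rate = break_even_rate(
    re_start=RUN_EXPECTANCY[("1--", 0)],
    re_success=RUN_EXPECTANCY[("-2-", 0)],
    re_caught=RUN_EXPECTANCY[("---", 1)],
)
print(f"Break-even success rate: {rate:.1%}")  # prints 71.9%
```

The same function works for any other base/out situation as long as you feed it the right three cells from the matrix.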

Runs (R) and Runs Batted In (RBI): I mention these as they give an idea of how a hitter is doing, even though they say very little about his actual abilities. Not particularly good hitters can have 100 RBI (Jay Gibbons) or score 100 runs (Juan Pierre) almost entirely because of opportunities they get. Jay Gibbons hit 4th and had many chances to hit with men on base. Juan Pierre bats first and so gets on base more times than other hitters (even though his OBP is lower) and so has more chances to have someone drive him in. They are very team dependent stats, and tell you little about a player's value.

Weighted On-Base Average (wOBA): It combines OBP and SLG basically using linear weights.
The nice thing about baseball is that everything that happens on the field has happened that way hundreds of times before (in a non-context dependent sense - a single to right; not a player's 3000th hit on a single to right). That means we can look and see how much any given event helps a team score runs. For example, on average, a double increases a team's expected runs scored by about 1.08 (relative to making an out). For a walk it's about 0.62 runs. Linear weights multiplies the value of each event by how many of them a player accumulated. Tom Tango (and co.) took these values for a bunch of stats, scaled them to OBP, and then divided by plate appearances to get a rate stat. That way, wOBA is on the same scale as OBP (since they're linear), so a wOBA of .340 is about average. Except it gives a more complete measure of offensive production.

wOBA: [.370 and above is good; .300 and below is not.]
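Here is a rough Python sketch of the linear weights idea behind wOBA. Only the double (1.08) and walk (0.62) run values come from the text above; the single, triple, and home run values and the 1.15 scaling factor are illustrative assumptions on my part, so treat the output as a demonstration of the method rather than an official wOBA.

```python
# Run values relative to making an out.
# 2B and BB are from the text; 1B, 3B, HR are ASSUMED for illustration.
RUN_VALUES = {"BB": 0.62, "1B": 0.78, "2B": 1.08, "3B": 1.36, "HR": 1.70}

# ASSUMED factor that stretches the result onto the OBP scale
OBP_SCALE = 1.15


def woba_sketch(events, plate_appearances):
    """Weight each event by its run value, then divide by plate appearances."""
    runs = sum(RUN_VALUES[name] * count for name, count in events.items())
    return OBP_SCALE * runs / plate_appearances


# Hypothetical season line (illustrative only)
season = {"BB": 60, "1B": 95, "2B": 30, "3B": 5, "HR": 20}
print(f"wOBA (sketch): {woba_sketch(season, plate_appearances=570):.3f}")
```

The design point is that a double counts for more than a walk but less than twice a single, which is the weighting problem OPS can't handle.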


Hits Allowed (H): This can give some idea of the effectiveness of a pitcher, but just like BA for offense it is also dependent on luck, as well as a team's defense. A pitcher with a high BABIP against is giving up more hits than he should (the BABIP should be compared to the team average, not the league, as this normalizes it for the defense he pitches in front of). A pitcher who has a good season based on giving up few hits while having a very low BABIP should be expected to regress in the future. Some pitchers, through deception or pitch selection or what have you, have been able to keep their BABIP lower than would be expected (by giving up fewer line drives, generally). Thus, it is a good idea to look at career BABIP also.

BABIP: [.275 and below is good; .325 and above is not.]

Walks, or Base on Balls (BB): A pitcher with high walk totals is not only going to allow more men on base, he is usually going to be less effective when not walking guys and he'll have to throw more pitches (and thus pitch fewer innings). Pitchers whose pitches are not very effective themselves (they have bad "stuff") need to limit their walk rate (BB/9 - walks per nine innings pitched) to be able to have success at the major league level. A high BB/9 will not doom a pitcher, but it makes the margin for error in other parts of his game very small. If a guy can't get on base he can't score.

BB/9: [2.5 and below is good; 4.5 and above is not.]

Strike-Outs (K): Every batter that a pitcher K's is one that can't get a hit. A high K rate (K/9) means a pitcher is being successful more often than not, and a low K rate means that a pitcher is in trouble without some other saving grace. Like BB/9, a bad K/9 can be made up for - it requires very good control (low BB/9) or the ability to get ground balls (to prevent extra-base hits). Many sinker-ball pitchers have low K/9 but are still successful - it is just a fine line that they walk with their control and infield defense being very important.

K/9: [7.5 and above is good; 5.5 and below is not.]

Walks and Hits per Inning Pitched (WHIP): This is a quick way to see how many runners a pitcher is allowing. Don't give up hits and don't walk guys, and it makes it tough for the opposition to score.

WHIP: [1.20 and below is good; 1.40 and above is not.]
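The pitching rate stats so far (BB/9, K/9, WHIP) are all simple ratios against innings pitched. A minimal Python sketch, with made-up function names and a made-up pitcher, assuming innings are given as a true decimal (200 and a third innings is 200.333, not the box-score shorthand 200.1):

```python
def per_nine(count, innings):
    """Scale a counting stat to a per-nine-innings rate (BB/9, K/9, HR/9)."""
    return 9 * count / innings


def whip(walks, hits, innings):
    """Walks plus hits per inning pitched."""
    return (walks + hits) / innings


# Hypothetical pitcher season (illustrative only)
ip, bb, k, h = 200.0, 50, 180, 170
print(f"BB/9: {per_nine(bb, ip):.2f}")
print(f"K/9:  {per_nine(k, ip):.2f}")
print(f"WHIP: {whip(bb, h, ip):.2f}")
```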

Home Runs (HR): A pitcher who gives up a lot of home runs will, in general (Johan Santana seems to be doing OK), give up a lot of runs. Keeping the ball in the park forces a team to put together strings of hits (and walks) to score runs, which is much more difficult. This is park dependent though, as giving up 1 HR/9 at the Cell in Chicago is very different from giving up 1 HR/9 in Petco.

HR/9: [0.70 and below is good; 1.35 and above is not.]

Ground Ball Percentage (GB%): You can't hit a grounder for a HR, so it helps in that respect. All extra base hits are less frequent on groundballs, in fact. It results in more outs than line drives, and also more double plays. Brandon Webb would appreciate having a good defensive infield behind him.

Earned Run Average (ERA): ERA is dependent on luck - not only how many hits a pitcher gives up but their distribution. Also, the earned run vs. unearned run disparity doesn't make that much sense anymore and so it isn't even a good measure of how many runs a pitcher gives up. Another problem occurs when relief pitchers allow runners a starter puts on to score. It is park and defense dependent and generally not very useful for saying much about a pitcher's abilities. It does give a general idea though.

ERA: [3.50 and below is good; 5.00 and above is not.]

ERA+: Like OPS+, this adjusts ERA for home park, and compares it to the league average.

Fielding Independent Pitching (FIP): Looks at the things a pitcher controls more directly (strike-outs, walks, and home runs) and scales it to ERA. It takes out a lot of the luck from balls in play (though not flukey HR rates - there's xFIP which gives expected FIP given a regressed HR rate).

FIP: [3.50 and below is good; 5.00 and above is not.]
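For the curious, the commonly published form of FIP is (13*HR + 3*BB - 2*K) / IP plus a constant that lines league FIP up with league ERA. Here is a hedged Python sketch; the constant of 3.10 is an assumed placeholder (the real one is recalculated each season), and the pitcher is hypothetical.

```python
def fip(hr, bb, k, ip, constant=3.10):
    """Fielding Independent Pitching, basic form.

    constant is an ASSUMED league-scaling value; it changes by season.
    """
    return (13 * hr + 3 * bb - 2 * k) / ip + constant


# Hypothetical pitcher season (illustrative only)
print(f"FIP: {fip(hr=20, bb=50, k=180, ip=200.0):.2f}")
```

Notice that hits allowed don't appear anywhere in the formula, which is how the balls-in-play luck gets taken out.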

tRA: "Developed by Graham MacAree... tRA involves assigning run and out values to all events under a pitcher's control and coming up with an expected number of runs allowed and outs generated in a defense and park neutral environment. tRA is on a R/9 scale and does not involve any regression of the rates so while it should be more useful at determining a pitcher's true talent level, the best method for pitching projection is to use tRA*, the regressed version of tRA." -

tRA is calculated in a manner similar to wOBA in that it assigns run values to different events (a K saves about 0.11 runs, while a BB costs about 0.33 runs, a line drive allowed costs about 0.38 runs, and so on). Then the number of runs a pitcher is expected to give up is divided by the number of outs he is expected to get, and then multiplied by 27 to give the expected number of runs allowed per 9 innings. So one can think of it kind of like ERA (except without the whole earned/unearned thing, so it's about 8% higher).

tRA: [3.75 and below is good; 5.50 and above is not.]

tRA+: Adjusted tRA, like OPS+ and ERA+.

FanGraphs also has info. on what pitches a guy throws, how often he throws them, and their velocity. And - for both pitchers and hitters - the percent of pitches thrown in and out of the strike zone; the percent (also split out) that are swung at; and the contact rate (also split out). They are putting out a ton of useful stats.


This is the part of the game where the usefulness of stats is still catching up. There are a lot of places to go to get data regarding defensive value, including Plus/Minus from the Fielding Bible; Ultimate Zone Rating (plus arm and double play turning stats) from FanGraphs; the Probabilistic Model of Range from BaseballMusings; the Fans' Scouting Reports compiled by Tom Tango at TheBook; and a whole host of other sources.
