26 March 2010

JA Happ and Building Bridges

from Sports Illustrated
 

I was watching the Braves-Phillies game on ESPN Wednesday, and JA Happ took the mound against Tim Hudson. Seeing Happ reminds me of a lot of things wrong in the advancement of sabermetrics. It's not his fault. He really has nothing to do with it, but his performance is a symbol of it.

Phillies fans love him, and why wouldn't they? The 27-year-old comes in and goes 12-4 with a 2.89 ERA in 166 innings, and everything baseball fans have ever been told portrays this performance as a masterpiece of sorts. It's hard enough to have an ERA under 4 in the majors, and this guy just had an ERA under 3! If he had done it over just three starts, Philly fans would have known that he probably couldn't continue doing such an impressive job, but after 23 starts, it becomes more difficult to believe that he isn't that good.

Of course, sabermetricians have other ideas. They've come to realize that wins and ERA are context-dependent and don't properly value a pitcher's performance. Wins do somewhat measure a pitcher's performance (he is the one pitching), but they also depend on run support, how long the pitcher pitches, defense, etc. ERA is similar. It does reflect, somewhat, a pitcher's performance, but it's also dependent on defense, where hit balls fall, scorer's decisions, etc. So when sabermetricians see the above statistics, they've been trained to be wary.

Instead, they use things like FIP, which is based on K/9, BB/9, and HR, the statistics that Voros McCracken determined (through rigorous research) are the only ones pitchers have control over. The idea is that a pitcher's stuff determines how many bats he misses (K/9), his command determines how many batters get on strictly because of him (BB/9), and his stuff and tendencies determine the number of fair balls struck that a defense can't get to (HR). According to FIP, Happ isn't such a good pitcher. His FIP (which is scaled to look like ERA) was 4.33, nearly a run and a half higher than his ERA. This disconnect between the sabermetric and traditionalist perspectives causes a little rift between the two groups in a few ways.
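For anyone who wants to see how a FIP-style number gets built from those three ingredients, here is a rough sketch. It uses the commonly published weights (13 per home run, 3 per walk, minus 2 per strikeout, per inning, plus a constant that puts the result on the ERA scale). The constant of 3.10 is my assumption, since the real one shifts a little every season, and hit batters are left out, so don't expect it to match a published FIP exactly. The function name is made up for illustration. The input rates are Glavine's career numbers quoted further down in this post.

```python
def fip_from_rates(k9, bb9, hr9, constant=3.10):
    """Approximate FIP from per-nine-inning rates.

    Weights: 13 per HR, 3 per BB, -2 per K. Dividing by 9 turns the
    per-nine rates back into per-inning rates.
    Assumptions: hit batters are ignored, and `constant` is a
    placeholder for the league constant, which varies by season.
    """
    return (13 * hr9 + 3 * bb9 - 2 * k9) / 9 + constant


# Glavine's career rates as quoted in this post.
glavine_fip = fip_from_rates(k9=5.32, bb9=3.06, hr9=0.73)
print(round(glavine_fip, 2))
```

With these inputs the sketch lands close to 4, in the same neighborhood as the 3.95 career FIP quoted for Glavine. The small gap comes from the assumed constant and the missing hit batters.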

1) Sabermetricians generally attribute the gap between ERA and FIP to "luck", and oh, does that cause problems. Here in America, we don't like "luck". We get what we get because we "deserve" it. Of course, the world doesn't really work that way, but when it comes to criticisms of ourselves or of our team's players, don't even go there (I wonder how many Philly fans agreed Kevin Millwood was having a fluky season but disagreed about Happ). The problem is that the difference wasn't necessarily due to luck, per se. Part of it was probably due to Philadelphia having an above-average defense (that was intended), which made plays on balls in play that other defenses wouldn't have (with other pitchers on other teams, the runner reaches base and has an additional opportunity to score). Part of it was due to Happ's unusually low .270 BABIP. But even that doesn't mean he was "lucky". He might be able to hold hitters down to that level, though it's very unlikely, and we won't know for a few more years. Regardless, "luck" really isn't the right word, though there likely is some luck involved. Instead, let's try "due to differences in team support and chance". "Luck" implies that Fortune smiled on Happ and gave him an unfair advantage, but "chance" doesn't carry that connotation and implies that it just happened to end up the way it did, which is what most sabermetricians mean anyway (language is a funny thing).

2) What's a small sample size? Most fans realize that a pitcher won't pitch the same every time out. If he throws a complete-game shutout, he likely won't do it again the next time out. Most people even realize that a month's worth of starts doesn't properly indicate what a pitcher is. But what about a full season? It seems so long that the bounces should even out, right? 162 games of 54 (sometimes 51 and sometimes more) outs will likely see everything even out, right? Sorry, but they don't. There's been a lot of research done, and there's a lot of fluctuation from year to year. Happ's 166 innings are impressive, but we need a couple more seasons to get a real indication of his performance (really more like an entire career).

3) Even if people understand all this, the popular comparison for Happ is Tom Glavine. Superficially, neither throws particularly hard and both have good change-ups. Statistically, Glavine wasn't all that impressive. He struck out 5.32 per 9 and walked 3.06 per 9 for an uninspiring 1.74 K/BB. But he had a way with home runs (0.73 HR/9). If he could do it, then why can't Happ (6.69 K/9, 3.21 BB/9, 2.08 K/BB) do it as well? The problem is that we don't really know. Glavine was an exception, and unfortunately we don't know why. He outperformed his FIP (3.95) by almost half a run (3.54 ERA), and pitchers just don't do that. Around 95% (someone correct me if I'm wrong) of pitchers' career FIPs end up within 0.2 of their career ERAs, but Glavine seemed to have some ability to suppress runs. Identifying this ability would help explain the (possible) difference between Happ and Glavine, but not having it leaves a hole in the argument. (To be honest, I've always considered Glavine the worst of the Big Three, and I've often wondered if he really wasn't a Hall of Fame pitcher and just a very good pitcher who benefited from a good team and almost no injuries, which allowed him to accumulate counting stats in a manner similar to what Andy Pettitte could do.) Happ could be that good, but the odds are against it.

4) Even knowing this, Happ will regress. Glavine, with his "special abilities", couldn't sustain sub-3 ERAs, and neither will Happ. Bill James and CHONE see major regressions to the 4.30 ERA range. A few things, however, can happen. One, Happ regresses to this level, and someone will claim that he got screwed on the luck scale (which may or may not be right). Two, he regresses a bit to the 3.60-3.70 range, and we'll need some further evaluation. Three, Happ repeats, and we're left to wonder whether he's that good or if he just received the positive side of "team support and chance" (even that phrase is somewhat colored; damn). In any case, we won't really know. Unfortunately, we have to wait a few more years, and really an entire career, but we don't really like to wait that long even though it's the only way to gain historical perspective.

5) Traditionalists take this to mean that sabermetricians don't think Happ is a good pitcher, and that just isn't true. Happ's ERA indicates an ace, but he just isn't that good. He's a solid pitcher, and there's nothing wrong with that. It still makes him better than most major-league pitchers, and he's definitely better than 99.999% of all pitchers in the USA. Unfortunately, in the effort to prove that Happ wasn't ace-good, sabermetricians might have gone overboard. It's also very possible that Happ fans overreacted to a bit of criticism. It's most likely that a bit of both happened.

6) It's also entirely possible that Happ improves. Hitters improve (learning the strike zone, what pitches they can hit, etc.), and pitchers can, too. Happ could strike out a few more batters by a) learning a new effective pitch (he's apparently learned a cutter, which didn't help when Prado raked it for a home run, but that's a really small sample size, isn't it?), b) re-ordering his pitches and learning to change speeds better, and/or c) throwing his pitches with increased consistency at a high level. He could also stand to walk a few fewer hitters (3.04 BB/9 is okay but leaves something to be desired). If he does this, his FIP will come down and his ERA won't rise as much. This might cause a few problems, though. One, it doesn't mean sabermetricians were wrong about 2009, but Happ fans might not see it that way. Two, it demonstrates the weakness of statistics: they can't predict new variables like that (yes, ladies and gentlemen, intangibles exist, but yes, they will also be measured). Either way, improvement won't look good for sabermetrics, which really isn't fair.

7) The word "regress" is another nasty word. It has a few definitions. One is "to go back; move backward", and another is "to return to a previous, usually worse or less developed, state". Most fans identify these definitions with what sabermetricians mean by "regress". This, then, implies that Happ will somehow lose talent, and fans, rightly, don't see how that would happen. Of course, this isn't what sabermetricians mean. The other definition of regress is "to have a tendency to approach or go back to a statistical mean". This definition has no connotation like the other two do. Statisticians just mean that Happ's statistics will move toward a mean, or average, and "regress" doesn't necessarily mean that in a bad way. Ricky Nolasco will also "regress" this season, but his statistics will likely go the opposite way. Fans need to realize this. Sabermetricians also need to realize what most fans will think, and when they use the word "regress", they should also mention what they mean by that word. It helps. Happ's talent won't regress, in the sense of the first two definitions. It's just that the results were a little better than his talent implies.
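To put a rough number on the "chance" and sample-size points above, here is a small sketch that asks how often a pitcher whose true BABIP is .300 would still end a season at about .270 or lower. Both inputs are my assumptions for illustration, not figures from Happ's actual record: the .300 true rate and the 480 balls in play. It also treats every ball in play as an independent coin flip, which real baseball is not.

```python
from math import comb


def prob_at_or_below(hits, balls_in_play, true_rate):
    """Exact binomial probability of allowing `hits` or fewer hits."""
    return sum(
        comb(balls_in_play, h)
        * true_rate**h
        * (1 - true_rate) ** (balls_in_play - h)
        for h in range(hits + 1)
    )


# Hypothetical inputs: an assumed season's worth of balls in play and
# an assumed true-talent BABIP of .300.
balls_in_play = 480
observed_hits = round(0.270 * balls_in_play)  # hits that give roughly .270

p = prob_at_or_below(observed_hits, balls_in_play, true_rate=0.300)
print(f"Chance of a BABIP near .270 or lower: {p:.1%}")
```

Under these assumptions the probability is small but far from negligible, which is the whole point: a full season is still a sample in which chance alone can produce a line like that.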
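The statistical meaning of "regress" can also be sketched in a few lines. One common way to do it is to blend the observed number with the league average, giving the observed number more weight the more innings are behind it. Everything specific here is an assumption for illustration: the 4.20 league ERA, the 200-inning weighting constant, and the second pitcher with a 5.00 ERA are all hypothetical. This is not how Bill James or CHONE build their projections.

```python
def regress_toward_mean(observed, innings, league_mean, stabilizer):
    """Blend an observed rate with the league mean.

    The more innings behind the observed rate, the more weight it gets.
    `stabilizer` is the number of innings at which the observed rate and
    the league mean count equally (an assumed value here).
    """
    weight = innings / (innings + stabilizer)
    return weight * observed + (1 - weight) * league_mean


LEAGUE_ERA = 4.20      # assumed league average, for illustration only
STABILIZER_IP = 200    # assumed weighting constant, for illustration only

# Happ's 2.89 ERA in 166 innings moves up toward the mean...
happ_estimate = regress_toward_mean(2.89, 166, LEAGUE_ERA, STABILIZER_IP)

# ...while a hypothetical pitcher with a 5.00 ERA moves down toward it.
other_estimate = regress_toward_mean(5.00, 166, LEAGUE_ERA, STABILIZER_IP)

print(round(happ_estimate, 2), round(other_estimate, 2))
```

Both pitchers "regress" in this sketch, but in opposite directions, and neither one loses or gains any talent in the process.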

My point in all this isn't to prove whether Happ is a great or just a good pitcher. Other people who understand these stats better have done that, and I assume that a Google search will lead one to one of those explanations. The point I'm trying to make is that there are certain misunderstandings that occur between traditionalists and sabermetricians. In order to build bridges instead of burning them, we have to be constantly aware of the words we use and their underlying meanings and connotations. It takes more work to do this, but to get somewhere in this little dilemma, we have to be willing to put in a little extra effort.
