Rany moves quickly into one of the core areas of argument about the draft among fans today: College vs. High School players. Before you keep reading, think through your prejudices on this subject. It’s important to recognize them now before you rationalize them away later if you’re wrong. Do you think that high schoolers have a higher upside? That college players make the majors more often? It’s been a while since I’ve read these articles in detail so these write ups are a good review for me as well.
The starting point for Rany is to determine whether college or high school players make the majors at a greater frequency. His first chart shows that at every subset of 5 picks, college players reach the majors at a higher frequency than high schoolers picked at the same point in the draft. It’s a relatively even split in the dataset with 749 of the picks coming from high schools and 715 coming from college. (I know that doesn’t add up to 1,526 — the junior college players are left out right now.) Again, it’s a noisier version of the linear trend we saw in Rany’s first article. There are some random spikes and valleys but the higher the draft pick the more likely to make the majors. All the 1st and 2nd draft picks that were college players between 1984 and 1999 reached the majors. For high schoolers it was 75%. Those high draft picks have a pretty good success rate as far as reaching the majors.
The study continues by splitting up those same percentages into college vs HS for 1984 through 1991 and 1992 through 1999. It isn’t surprising that we see a noisier version of the same trend. The separation between college draftees and high school draftees making the majors is similar in each 8-year data set. We’re down to around 375 data points in each set now, so the signal-to-noise ratio is decreasing and random variation has a more significant effect, but Rany’s initial point that college draftees are a better bet holds true.
He arrives at Draft Rule 3 via the following data:
High School, 1984-1991: 41%
High School, 1992-1999: 39%
College, 1984-1991: 60%
College, 1992-1999: 57%
The gap between college picks and high school picks reaching the majors, which was 19% in the earlier era, has shrunk all the way to…18%.
And hence, Draft Rule #3: College players are roughly 50% more likely to reach the major leagues than high-school players of equal draft caliber. This advantage has not changed over time. But wait, you say, we already knew this. No one disputes that college players make it to the majors more frequently. Acknowledging this, Rany sets up to deal with the next premise in this debate: college players make the majors more frequently but more often as marginal role players.
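To see where "roughly 50% more likely" comes from, here is a small sketch that turns the four percentages quoted above into both the percentage-point gap and the relative advantage. The dictionary layout and the function name `college_edge` are my own, not anything from Rany's article.

```python
# Percent of picks reaching the majors, as quoted from Rany's article above.
reach_rate = {
    ("High School", "1984-1991"): 41,
    ("High School", "1992-1999"): 39,
    ("College", "1984-1991"): 60,
    ("College", "1992-1999"): 57,
}


def college_edge(era):
    """Return (gap in percentage points, relative advantage in percent) for one era."""
    hs = reach_rate[("High School", era)]
    college = reach_rate[("College", era)]
    gap_points = college - hs
    relative_pct = (college / hs - 1) * 100
    return gap_points, relative_pct


for era in ("1984-1991", "1992-1999"):
    gap, rel = college_edge(era)
    print(f"{era}: gap of {gap} points, college about {rel:.0f}% more likely")
```

The gap is 19 and then 18 points, and in both eras the relative advantage works out to about 46%, which is what the rule rounds to "roughly 50%".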
We’re going to get a little mathy, so hold on. He begins by calculating the WARP for each draft pick in each year they’ve played. That is to say there are 16 “bins” to dump data into, one for the draft year and one for each of the 15 years after the player was drafted. Each of the 1-100 picks in 1984 would have 16 bins, which Rany calls Y0 through Y15, representing the years 1984 through 1999. Each of the 1-100 picks in 1985 would have 16 bins representing the years 1985 through 2000. Draftees from 1990 on obviously won’t be able to have the full 16 bins since the study was done in 2005. For each pick slot, the WARP in each bin (starting with Y0) is added together and divided by the number of contributors. So if sixteen #1 draft picks accumulate WARP in Y6 (their 6th year after the draft year), then the Y6 WARP for #1 draft picks would be their total divided by sixteen. Similarly, since only 6 #55 draft picks have reached their Y14, that total WARP is divided by 6. The higher the bin number, the lower the number of players we divide by, because fewer players have reached that separation since their draft year. The goal is to show how much the players that make the majors contribute on average. If you sum up the averages across the bins for each pick, that shows you the cumulative WARP that that pick contributes on average.
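Here is a minimal sketch of that bin averaging as I understand it from the description, not Rany's actual code. Each player is a dictionary mapping a bin number (0 for Y0, 1 for Y1, and so on) to WARP, and a bin is only present if the player has reached that year. The function names and the three sample players are made up for illustration.

```python
from collections import defaultdict


def average_warp_by_bin(players):
    """Average WARP per bin, dividing by the number of players present in that bin."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for seasons in players:
        for year_bin, warp in seasons.items():
            totals[year_bin] += warp
            counts[year_bin] += 1
    return {b: totals[b] / counts[b] for b in sorted(totals)}


def cumulative_warp(bin_averages):
    """Sum the per-bin averages to get the slot's average cumulative WARP."""
    return sum(bin_averages.values())


# Hypothetical players taken at the same draft slot in different years.
# The second player was drafted too recently to have a Y2 bin yet.
players_at_slot = [
    {0: 0.0, 1: 2.0, 2: 4.0},
    {0: 0.0, 1: 1.0},
    {0: 3.0, 1: 3.0, 2: 2.0},
]

bin_averages = average_warp_by_bin(players_at_slot)
print(bin_averages)
print(cumulative_warp(bin_averages))
```

Note that Y2 is divided by two rather than three, because only two of the made-up players have reached it. That is the shrinking denominator described above.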
Rany singles out the first round draft picks and finds that the college players among them a) reach the majors quicker than their HS counterparts, b) are more productive in the first 4 years, and c) are less productive over the 15 years as they tail off quicker. Since we’re talking about a set of players that is 3-4 years older than their high school counterparts, this isn’t an unexpected result. He also cedes the point that since our data set is so small at this point, it’s more difficult to draw conclusions, but “exactly one-half of the #1 overall picks out of high school went on to become superstars.” I’m willing to agree with his conclusion that comes next. The #1 draft pick is so highly scrutinized and, prior to MLB pushing teams around, often the consensus best talent regardless of whether they are from high school or college. Hence, Draft Rule #4: In a year where there is a clear superstar talent available in the high school ranks, it is a perfectly acceptable draft strategy to select that player with the #1 overall pick.
As we approach the rest of the draft, picks 2-100, that draft rule doesn’t apply. Rany’s last chart shows that throughout the course of the draft college players outperform high schoolers taken with the same pick. There are two data spikes (picks 31-35 and 46-50) where high school draftees are better, but given the definitive separation elsewhere and that there’s no discernible reason why those spikes occur, I’d guess that’s random variation. If anyone is truly ambitious, you could look over the draft years at baseball-reference in those pick ranges and see if you can’t identify a few high schoolers that turned into Hall of Famers or future inductees. My guess is there are some aberrant data points that are attributable to luck of the draw without affecting our overall conclusion. In picks #16 through #20, college players produced around 17 WARP over their careers on average while high school draftees produced around 12. (Peter Kozma, a high school draftee, was taken by the Cardinals at pick #18 this past year, in case you forgot.)
The difference in contributions is staggering. For picks 2-70, the college players outproduce the high school players by over 45%. After that, the difference is even greater, with college players contributing 180% more than high school players drafted from pick 71 through 100. So not only did Rany find way back at the beginning of this piece that college draftees reach the majors more frequently, but they’re better players too. For those of us, myself included, who clamor for “high upside” high school picks, that can be a tough pill to swallow. Rany enumerates Draft Rule #5: In the first three rounds, not only are college players about 50% more likely to reach the major leagues than high-school players drafted in the same slot, they produce approximately 55% more value over the course of their careers. This advantage is persistent at every point after the #1 pick.
I have to admit that’s some bitter medicine as I tout Colby Rasmus as the future of the Cardinals organization (along with Pujols). I like the risk-reward combination that high school picks have to offer, but so far Rany’s study has shown that that’s a losing strategy unless I’m the Tampa Bay Rays (who had the first pick for 3 of the last 10 seasons; no other team has had it more than once). Obviously, these rules can never ever be hard and fast absolutes. I’m not arguing for decisions made in a vacuum, rather that decisions be informed by the greater overall trends of drafting.

How does the study handle high school picks that go on to college? For example, I get drafted out of high school, go to junior college for two years, then get drafted by another club and after that make the majors. Who gets credit? Or better yet, how much credit does each team receive?
Players are only considered as “drafted” when they’re signed. It ignores the instances where a player is drafted but unsigned — i.e. only the team that drafted them out of college would get “credit”.
so a guy like jeremy sowers who was an unsigned first rounder out of high school would count as a failed high school pick, but a successful college pick even though he is the same player.
Re: “Tough pill to swallow.”
Does this really disprove your feeling that HS players are “higher upside”? My take-home message is that college players accumulate 50% more WARP over the course of their careers, but this doesn’t necessarily indicate the rate at which each draft model (HS or Coll) produces superstars, does it?
What’s the WARP for a superstar season? 10? How many single seasons of a 10 did each draft slot produce? What about for college vs HS? Maybe the college players are more likely to be reliable MLBers, but the HS players are more likely to account for those All-Star seasons?
Or does his analysis also get into this?
FGC - no he would only count as a successful college pick. Rany threw out all picks that went unsigned — it’s like that pick never happened.
sidd — I think he touches on that in a later installment but I’m not sure to what extent he delves into it.
Interesting data. I did an analysis a few weeks ago trying to figure out not just where the major league pitchers come from, but where did the best pitchers come from. I didn’t have access to WARP data, but used ERA+ as a measure of success. Looking at currently active pitchers with 1000+ IPs, those data seemed to indicate that a superior pitcher (>110 ERA+) was a high school draft pick 55% of the time, a college pitcher 24% of the time and an amateur free agent 21% of the time.
I know this was a small sample as there were only 38 active starting pitchers who fit the criteria, but I am curious as to whether WARP data would indicate the same tendency? I would also like to know if the WARP data for pitchers and position players showed a similar breakdown between HS and college.
i see a parallel in drafting high school players to signing high school football or basketball players which suggests to me that part of this is physical. 17-18 year-olds can change dramatically in terms of size and strength over the next 2-4 years. i suspect that is why drafting the physically more mature players out of college helps their relative success. the new transfer rules for ncaa baseball will, i believe, improve the quality of college baseball and increase the viability and desirability of drafting college players (20-21 year old college and juco players).
Interesting…one study I would like to see done that would help settle the higher upside debate a little better is just looking at all of the superstars throughout the league right now, and check out their roots (see if they’re a high school, college, or JuCo draftee or international signing).
I’m not sure if I understand the study above correctly, but it sounds like it just takes average values. If that’s the case, college players look better in the last point because they’re more likely to reach the majors. With high schoolers, you could get more superstars, but also a lot more busts, making the average value lower, and thus making it appear as though high schoolers aren’t as likely to be superstars.
If one were to look at the roots of current superstars around the league, one might come away with a different view of things.
“If that’s the case, college players look better in the last point because they’re more likely to reach the majors.”
This is inaccurate. College players were a) more likely to reach and b) on average, provided more value. A greater number of players reaching the majors does not equate to a higher average WARP value.
“With high schoolers, you could get more superstars, but also a lot more busts, making the average value lower, and thus making it appear as though high schoolers aren’t as likely to be superstars.”
The players that don’t make it at all don’t contribute to the WARP value and aren’t a part of the denominator in averaging. They simply aren’t considered in this part of the study.
Thanks for clarifying that azruavatar; I wasn’t sure if I understood the study.
Still, I’m not sure if it still means college picks are better. College players could have averaged the WARP value of a solid regular, with a lot of players falling around this WARP. High school players could have included more superstars and also a lot of players that don’t have much value and the average WARP would be a lot lower.
Without looking into it too much, it seems to me especially that the top pitchers around the league are far more likely to come from a high school background than a college background. Specifically, if we are looking at where the Cardinals usually end up picking, somewhere around pick #15 and below, if you look at the top pitchers in the league who were taken after that, I don’t think you’ll find too many from college.
This is obviously a quick and dirty list, but take a look at the background of some of the top pitchers around the league. Only one of them (Webb) was drafted or signed after his 21st birthday.
Erik Bedard was drafted out of Norwalk Tech University in the 6th round at age 20 by Baltimore.
Johan Santana was an international signing by Houston.
Chris Carpenter was drafted in the 1st round out of high school by Toronto.
John Lackey was drafted in the 2nd round out of Grayson County Junior College by Anaheim.
Jake Peavy was drafted in the 15th round out of high school by San Diego.
Brandon Webb was drafted in the 8th round from the University of Kentucky by Arizona.
Roy Oswalt was drafted in the 23rd round out of Holmes Community College by Houston.
Brad Penny was drafted in the 5th round out of high school by Arizona.
Cole Hamels was drafted in the 1st round out of high school by Philadelphia.
Matt Cain was drafted in the 1st round out of high school by San Francisco.
Fausto Carmona was an international signing by Cleveland.
Dan Haren was drafted in the 2nd round out of Pepperdine University by St. Louis at age 20.
C.C. Sabathia was drafted in the 1st round out of high school by Cleveland.
Josh Beckett was drafted in the 1st round out of high school by Florida.
Scott Kazmir was drafted in the 1st round out of high school by New York.
Something seems wrong with your/his data/comments.
“All the 1st and 2nd draft picks that were college players between 1984 and 1999 reached the majors.” and the fact that about 60% of the college guys in the first 100 picks made the majors would probably mean that almost none of the college guys in the top 100 picks after the first round made the majors. The same would hold true with the 75% comment for HS players in the first two rounds and the 40% of the first 100. For all those numbers to be correct is almost a statistical impossibility.
Doing a little more research it is obvious that not all 1st and 2nd round college picks from 84 to 99 reached the majors. Just looking at 1988 and 1991, the Cardinals had 2 1st or 2nd round college draftees in each year that didn’t reach the majors. (per thebaseballcube.com)
CCC - I think you are misreading “1st and 2nd”. Jazy means 1st or 2nd OVERALL, not 1st or 2nd ROUNDERS. At least, that’s how I interpret it.
Thanks sid for clearing up my poor reading skills!!!
here’s a look at the starting lineups in this year’s all-star game and how they entered professional baseball.
american league
ichiro- FA (japan)
jeter- high school
ortiz- FA (dominican)
a. rodriguez- high school
guerrero- FA (dominican)
ordonez- FA (venezuela)
p. rodriguez- FA (puerto rico)
polanco- junior college
haren- college
national league
reyes- FA (dominican)
bonds- college
beltran- high school
griffey- high school
wright- high school
fielder- high school
martin- junior college
utley- college
peavy- high school
so, out of the 18 starting spots in last year’s all-star game, only 3 players came from 4 year colleges while 7 were from high school and 2 more from junior colleges. now i know this isn’t the most comprehensive of research, but obviously high school players do have some pretty high upside.
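As a quick check on that tally, here is a small sketch that counts the entry routes exactly as they are listed in the comment above. The dictionary and the route labels are just a transcription of that list, and the variable names are my own.

```python
from collections import Counter

# Entry route for each All-Star starter, transcribed from the list above.
starters = {
    "ichiro": "FA", "jeter": "high school", "ortiz": "FA",
    "a. rodriguez": "high school", "guerrero": "FA", "ordonez": "FA",
    "p. rodriguez": "FA", "polanco": "junior college", "haren": "college",
    "reyes": "FA", "bonds": "college", "beltran": "high school",
    "griffey": "high school", "wright": "high school", "fielder": "high school",
    "martin": "junior college", "utley": "college", "peavy": "high school",
}

route_counts = Counter(starters.values())
for route, count in route_counts.most_common():
    print(f"{route}: {count} of {len(starters)}")
```

The counts match the comment (7 high school, 3 college, 2 junior college), and the remaining 6 starters were signed as free agents outside the draft.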
“now i know this isn’t the most comprehensive of research”
Rany took 16 seasons of draftees and some 1500 data points. I’m not sure how you can debate the merits of what he finds via 18 players. If you want to question his methodology, that’s one thing, but at the point you cede his methodology is correct, then you really can’t debate his conclusions much.
AZ and FGC, you guys aren’t contradicting each other, and FGC is not contradicting Rany. Rather, FGC is just showing a snapshot that supports the question that both FP Slacker and I asked earlier - even if the TOTAL WARP that collegians put up is higher/similar to the TOTAL put up by HS players, can’t that still allow room for HS players to comprise the major portions of future stars? Which in turn supports the idea held by many of us (including AZ) that HS draftees are higher risk/reward?
Put another way, maybe 5 college draftees currently in the Cards system will go on to each post a total WARP of 50 in the majors, for a total of 250. Say, Mortensen, Henley, Jay, Perez, and Ottavino. From the same system, only 1 HS player makes it (Rasmus), but he puts up a total WARP of 250 all by himself. He’s a perennial all-star, while the others are the plodders. As I read it, this scenario could easily fit within Rany’s findings. Am I right?
The solution, of course, is not just to look at total WARP accumulated, but also at the mean WARP from each spot, and the variance in that mean. My working hypothesis would be that for each spot, the mean WARP for a HS player would be much higher, but that the variance in it would also be much greater.
“From the same system, only 1 HS player makes it (Rasmus), but he puts up a total WARP of 250 all by himself. He’s a perennial all-star, while the others are the plodders. As I read it, this scenario could easily fit within Rany’s findings. Am I right?”
This isn’t a correct interpretation, but this is my fault as several people have asked a similar question. The average WARP is only an average of the players that MAKE the majors. So the other 4 players would at least have to get a cup of coffee in the majors to be included in that; otherwise the average WARP from your scenario would be 250.
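To make that concrete, here is a rough sketch of the scenario using the numbers from the comment: five college draftees at 50 WARP each and one high schooler at 250. The four other high school draftees come from my reply. `None` marks a player who never reached the majors, and the function name is my own invention, not Rany's method.

```python
def average_warp_of_major_leaguers(career_warps):
    """Average career WARP over players who reached the majors (None = never reached)."""
    reached = [w for w in career_warps if w is not None]
    return sum(reached) / len(reached) if reached else 0.0


college = [50, 50, 50, 50, 50]

# Only Rasmus reaches the majors; the other four never do.
hs_only_rasmus = [250, None, None, None, None]

# Same group, but the other four each get a cup of coffee worth 0 WARP.
hs_with_cups_of_coffee = [250, 0, 0, 0, 0]

print(average_warp_of_major_leaguers(college))
print(average_warp_of_major_leaguers(hs_only_rasmus))
print(average_warp_of_major_leaguers(hs_with_cups_of_coffee))
```

In the first high school case the average is 250, five times the college average. It only drops to 50 once the other four players actually appear in the majors and join the denominator.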
I guess it is possible that the high school players have a larger percentage of high-WARP players but you’d need some kind of breakout into bins of WARP produced so you could see the distribution curve of those players. FGC seems to be talking about would require some kind of a heavy binomial distribution where the players are clustered as either really good or really marginal without very many in the middle. While that’s possible, that doesn’t strike me as the most likely distribution.
I think Bill James argued that it was a pyramid scheme where the talent decreases by a factor from level to level (however you want to define “level”: all-star to above-average to average to marginal). So the mean point would be located closer to the average and marginal players. If you want to contend that HS players have a greater number of all-stars (for lack of a better term) . . . . . I guess I could see that, but I’ve never read anything to indicate that’s the likely answer. It seems a much simpler explanation that college players on the whole simply outproduce high school players. Quite honestly, that would be a study all in itself: looking at the composition of, say, the 1000 players that accumulated the most WARP in their first 10 seasons from the last 30 years.
But I’ll accept the fact that I don’t have any research to really refute the line of argumentation that there’s still higher upside from high school players.
AZ,
Yes, I see now that average WARP is the variable, not total. This makes it less likely for HS players to provide the disproportionate number of stars, but still possible if they have the stars and scrubs type of contribution.
I think you mean a “bimodal” distribution instead of “binomial” when you say “FGC seems to be talking about would require some kind of a heavy binomial distribution where the players are clustered as either really good or really marginal without very many in the middle.”
Actually, though, it wouldn’t have to be. A right-skewed distribution with a mean that’s smaller than for HSers but a mode to the left and a tail that extends further to the right would also do the trick. In your example of picks 16-20, imagine if the distributions centered on 17 for collegians and 12 for HSers, but the tail for one ran to 35 while the tail for the other ran to 55. It might be that 35-55 range where the all-stars lie. These kinds of distributions happen all the time.
meant to say “…a mean that’s smaller for HSers”, not “a mean that’s smaller THAN for HSers”, obviously.
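A toy illustration of the skewed-distribution idea: the two samples below are entirely invented (they are not data from Rany's study) and were picked only so the means come out to 17 and 12 with tails reaching 35 and 55, as in the picks 16-20 example. The `all_star_cutoff` of 35 is likewise an assumption taken from the range suggested above.

```python
from statistics import mean, median

# Invented career WARP totals, chosen to match the means and tails described above.
college_warp = [6, 10, 14, 16, 17, 18, 20, 35]
hs_warp = [2, 3, 4, 5, 6, 8, 13, 55]

all_star_cutoff = 35  # assumed threshold; careers strictly above it count as stars


def summarize(label, careers):
    """Print the mean, median, best career, and number of careers above the cutoff."""
    stars = sum(1 for w in careers if w > all_star_cutoff)
    print(f"{label}: mean {mean(careers)}, median {median(careers)}, "
          f"max {max(careers)}, above cutoff {stars}")
    return stars


college_stars = summarize("college", college_warp)
hs_stars = summarize("high school", hs_warp)
```

With these made-up numbers the high school group has the lower mean and a much lower median, yet it owns the only career above the cutoff. So a lower average and a longer tail can coexist.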
Damnit. Yes I absolutely meant bimodal.
The skewed distribution would be possible as well, and probably more likely than a bimodal one. That would be a nice study to read about. I don’t think anyone has looked at the distribution of the top level talent in terms of the draft. Rany touches on it but I don’t recall him getting into it very much.
az, chill out. i was saying mine wasn’t the most comprehensive, not the guy that wrote the article.