This is a very long post. If you want to skip reading it, the point is – all is well!
(Now, stop thinking about Animal House)
Seriously though, this is a long, data-based look at ND’s performance. There is a lot of skepticism and negativity on this board, and I don’t think it’s warranted. When I (or anyone, it doesn’t have to be me) look at performance through the data, there is great cause for optimism. I understand, as I’m sure everyone here does, that we tend to make judgments in life based on perception and on piecing together our experiences, and that we rarely change those perceptions based on data. With that in mind, I don’t expect this post to convince anyone. Perhaps it is no more than a mix of a test of how quickly people give up on a long post and an ND fan-style Rorschach test. But I enjoy playing with data, and the journey of looking at this data has been fun for me. Perhaps some here will find it interesting as well.
Preamble (yes, this is long enough to need a preamble)
I started my yards per play differential (yppd) analysis at the start of the year, trying to find an objective way to measure performance. After playing with the data and looking at a variety of stats, I’ve come to believe it is an excellent indicator of how good a team is or isn’t. Using this metric to assess ND’s progress shows some good news, a pretty optimistic story actually. Let me show you more of the data and the thinking behind it.
The theory
Anyone who has done a lot of data analytics knows that it’s very easy to get data to lie to you, particularly when you take small cross sections of data; there can be a lot of noise that creates misleading stats. I was looking for a simple, effective measure of team strength. I also wanted a way to adjust that metric for strength of schedule (SOS), since SOS has a very significant effect on how a team’s stats look.
I settled on yards per play differential. I had started off with total yards differential, inspired by omahadomer’s point differential. I wanted to use yards because points can swing significantly on things I consider just as much luck as skill (like turnovers returned for TDs). Granted, better teams may do this more, but in general a team that moves the ball better will win games more frequently.
In looking at yards, I saw that the number of plays a team ran had a very significant impact on its yardage totals. There was just a lot of variability, but I noticed that yards per play differential seemed far more consistent, after some adjustment for strength of schedule. It still captures the core concept that better teams move the ball better, and it is simple, so I went with it. (I should also note that while this metric says ND is getting better, I started looking at it last year, well before I knew it would tell an optimistic story about ND.)
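The post never writes the formula out, so here is a minimal Python sketch of how an unadjusted yppd could be computed. It assumes yppd means yards per play gained minus yards per play allowed, with yards and plays pooled over the season (averaging each game’s differential would be another reasonable reading). `GameStats` and `season_yppd` are hypothetical names, and no strength of schedule adjustment is included.

```python
from dataclasses import dataclass


@dataclass
class GameStats:
    """One game's totals for the team being rated (hypothetical structure)."""
    off_yards: int  # yards gained by the team
    off_plays: int  # offensive plays run by the team
    def_yards: int  # yards allowed to the opponent
    def_plays: int  # offensive plays run by the opponent


def season_yppd(games: list[GameStats]) -> float:
    """Unadjusted yppd: yards per play gained minus yards per play allowed,
    pooling yards and plays across every game (no SOS adjustment)."""
    gained = sum(g.off_yards for g in games) / sum(g.off_plays for g in games)
    allowed = sum(g.def_yards for g in games) / sum(g.def_plays for g in games)
    return gained - allowed


# Made-up numbers for illustration only.
sample = [GameStats(450, 75, 300, 60), GameStats(390, 65, 360, 72)]
print(round(season_yppd(sample), 2))  # prints 1.0 (840/140 - 660/132 = 6.0 - 5.0)
```

Dividing by plays is what keeps a team that simply ran more snaps from looking better than it is, which is the problem with raw yardage totals described above.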
Does the theory work?
Using yppd with a few adjustments (like home-field advantage), I can “predict” the winners of games about 75% of the time. It takes about 5-6 games into the season for the stats to smooth out, but after that it becomes a very good indicator (although my strength of schedule adjustment needs to get better).
Not only can I predict winners, I can look at how teams have performed in the past, and this shows a very strong link between yppd and overall winning percentage. Using my “yppd score” (which includes a strength of schedule adjustment), I get the following distribution of teams (2008-2011 data):
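The post doesn’t describe the adjustments in detail, so this is only a sketch of one way a yppd-based pick with a home-field adjustment could work, plus how the hit rate would be measured. `HOME_EDGE = 0.3` is a made-up placeholder (the post gives no value), and `Game`, `predict_home_win`, and `hit_rate` are hypothetical names.

```python
from dataclasses import dataclass

# Hypothetical placeholder: the post does not say how big the home-field adjustment is.
HOME_EDGE = 0.3


@dataclass
class Game:
    home_score: float  # home team's yppd score going into the game
    away_score: float  # visiting team's yppd score going into the game
    home_won: bool     # what actually happened


def predict_home_win(game: Game, home_edge: float = HOME_EDGE) -> bool:
    """Pick the home team when its score plus the home edge beats the visitor's."""
    return game.home_score + home_edge > game.away_score


def hit_rate(games: list[Game], home_edge: float = HOME_EDGE) -> float:
    """Share of games where the pick matched the real result."""
    correct = sum(predict_home_win(g, home_edge) == g.home_won for g in games)
    return correct / len(games)


# Made-up games for illustration only.
sample_games = [
    Game(1.0, 0.2, True),
    Game(-0.5, 0.1, False),
    Game(0.0, 0.2, False),  # pick is the home team, but it lost
    Game(0.4, 0.6, True),
]
print(hit_rate(sample_games))  # prints 0.75
```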
yppd score, % of teams at or above, win %
2.5, 4%, 93%
2, 9%, 87%
1.5, 19%, 77%
1, 33%, 69%
0.5, 48%, 63%
0, 79%, 50%
-0.5, 91%, 41%
-1, 96%, 27%
-1.5, 99%, 25%
-2, 99%, 12%
-2.5, 100%, 11%
The way to read this table: each score is a range. 0 covers -0.4 to +0.4, 0.5 covers 0.5 to 0.9, and so on. The middle column is cumulative. For example, 4% of teams get a yppd score of 2.5 or above, and they win 93% of their games.
The relationship holds up well. Thanks to another poster (I forget your handle; take credit for helping me out!), I also have data from 2000-2007. This data does not have my strength of schedule adjustment, but comparing unadjusted yppd against final AP Top 25 position tells the same story:
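Here is a small Python sketch of that table lookup. It encodes my reading of the bucket rule: scores are given to one decimal and truncated toward zero in half-point steps, so 0.9 falls in the 0.5 row and -0.4 in the 0 row, with anything beyond ±2.5 capped. `TABLE`, `half_steps`, `bucket`, and `lookup` are hypothetical names; the numbers are copied from the table above.

```python
import math

# Rows of the 2008-2011 table, keyed by half-point steps (5 -> 2.5, ..., -5 -> -2.5).
# Value: (% of teams at or above this row, win % within this row).
TABLE = {
    5: (4, 93), 4: (9, 87), 3: (19, 77), 2: (33, 69), 1: (48, 63),
    0: (79, 50), -1: (91, 41), -2: (96, 27), -3: (99, 25), -4: (99, 12),
    -5: (100, 11),
}


def half_steps(score: float) -> int:
    """Truncate toward zero in 0.5 steps, capped at +/-2.5 (assumed rule)."""
    return max(-5, min(5, math.trunc(score * 2)))


def bucket(score: float) -> float:
    """Table row a score falls in, e.g. 0.9 -> 0.5 and -0.4 -> 0.0."""
    return half_steps(score) * 0.5


def lookup(score: float) -> tuple[int, int]:
    """(% of teams at or above, win %) for the row a score falls in."""
    return TABLE[half_steps(score)]


# ND's yppd scores as listed in the post.
for year, score in [(2008, 0.0), (2009, 0.3), (2010, 0.9), (2011, 1.3)]:
    pct_at_or_above, win_pct = lookup(score)
    print(year, bucket(score), f"top {pct_at_or_above}%", f"{win_pct}% wins")
```

Under this reading, ND’s 2008-2011 scores land in the 0, 0, 0.5, and 1 rows, which matches the top 79%, 79%, 48%, and 33% placements described later in the post.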
Final AP ranking, yppd (unadjusted)
1 to 4, 1.8
5 to 9, 1.4
10 to 14, 1.1
15 to 19, 0.8
20 to 24, 0.9
If this doesn’t convince you of the validity of the statistic, nothing will – but I am a big fan.
What does this mean for ND?
For me, it means a lot of optimism. Why? Check this out. Since 2000 (thanks again, other poster!), here is ND’s unadjusted yppd (numbers in parentheses are negative):
2000 (0.5)
2001 (0.6)
2002 0.3
2003 (0.4)
2004 (0.3)
2005 0.2
2006 0.2
2007 (1.3)
2008 0.2
2009 0.2
2010 0.5
2011 1.0
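To put a number on that trend, this short Python snippet averages the unadjusted figures above by era, entering the parenthesized values as negatives. By hand, the averages come out to about -0.3 for 2000-2007 and about +0.5 (0.475) for 2008-2011.

```python
from statistics import mean

# ND's unadjusted yppd by season, copied from the list above
# (parenthesized values entered as negatives).
ND_YPPD = {
    2000: -0.5, 2001: -0.6, 2002: 0.3, 2003: -0.4, 2004: -0.3,
    2005: 0.2, 2006: 0.2, 2007: -1.3, 2008: 0.2, 2009: 0.2,
    2010: 0.5, 2011: 1.0,
}

before = mean(v for year, v in ND_YPPD.items() if year <= 2007)
after = mean(v for year, v in ND_YPPD.items() if year >= 2008)
print(f"2000-2007 average: {before:+.2f}")
print(f"2008-2011 average: {after:+.2f}")
```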
Using my “yppd score” (the SOS-adjusted number that corresponds to the first table above), ND has been:
2008 0
2009 0.3
2010 0.9
2011 1.3
Put differently, ND was in the top 79% of teams in 2008 & 2009 (although on the cusp of the top 48% in 2009), then in the top 48% of teams in 2010 (although, on the cusp of the top 33%), and is in the top 33% of teams in 2011 (although on the cusp of the top 19%).
This is a clear, significant upward trend. We’ve gone from decidedly average to right around the top 25.
What I regret is that I haven’t found a way to get the yppd score to fully account for strength of schedule. For example, Houston has one of the highest yppd scores this year. But if I split their schedule into thirds (top 4 = their four toughest opponents, top 8 = opponents five through eight, rest = everyone else), here is the average yppd score of the opponents in each third:
1. top 4 0.7
2. top 8 (0.8)
3. rest (1.7)
Overall (0.5)
The overall average for D-IA is:
1. top 4 1.1
2. top 8 (0.1)
3. rest (1.2)
Overall (0.0)
Wow, so their schedule is easy – and their yppd score looks better than it should because of this.
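A minimal Python sketch of the split, assuming “difficulty” means the average yppd score of the opponents in each group and that “top 8” means opponents five through eight. `schedule_thirds` is a hypothetical helper and the opponent scores below are invented, not any real team’s. The Overall figures above don’t always equal the average of the three groups, so the post’s exact method may differ from this.

```python
from statistics import mean


def schedule_thirds(opponent_scores: list[float]) -> dict[str, float]:
    """Rank opponents toughest-first by yppd score, then average the top 4,
    opponents 5-8, the rest, and the whole schedule.
    Assumes at least 9 opponents so that every group is non-empty."""
    ranked = sorted(opponent_scores, reverse=True)
    return {
        "top 4": mean(ranked[:4]),
        "5-8": mean(ranked[4:8]),
        "rest": mean(ranked[8:]),
        "overall": mean(ranked),
    }


# Made-up 12-game schedule for illustration only.
example = [1.8, -0.2, 0.9, -1.4, 0.5, 1.2, -0.7, 0.0, -2.1, 0.3, -1.0, 0.6]
for label, avg in schedule_thirds(example).items():
    print(f"{label}: {avg:+.2f}")
```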
Compare this with Alabama:
1. top 4 1.6
2. top 8 0.4
3. rest (1.2)
Overall 0.4
Wow, what a night-and-day difference. Alabama’s four toughest opponents average a score in roughly the top 20% of teams, compared with Houston’s toughest four averaging in the top 48%.
But strength of schedule is another reason I’m optimistic. Compare ND’s schedule strength, by opponent yppd score, in 2009 vs. 2011:
2009
1. top 4 1.1
2. top 8 0.3
3. rest (0.9)
Overall 0.2
2011
1. top 4 1.6
2. top 8 0.1
3. rest (0.5)
Overall 0.5
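The year-over-year change is easy to check. This short Python snippet subtracts the 2009 splits from the 2011 splits, with values copied from the two lists above (parenthesized values entered as negatives, and “top 8” labeled “5-8” on the assumption that it means opponents five through eight).

```python
# ND's opponent-strength splits, copied from the 2009 and 2011 lists above.
ND_2009 = {"top 4": 1.1, "5-8": 0.3, "rest": -0.9, "overall": 0.2}
ND_2011 = {"top 4": 1.6, "5-8": 0.1, "rest": -0.5, "overall": 0.5}

# Positive = tougher in 2011; rounded to one decimal like the source numbers.
change = {k: round(ND_2011[k] - ND_2009[k], 1) for k in ND_2009}
print(change)  # prints {'top 4': 0.5, '5-8': -0.2, 'rest': 0.4, 'overall': 0.3}
```

The toughest and easiest opponents got harder, while the middle group got slightly easier.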
Granted, my yppd score is supposed to account for strength of schedule, but it doesn’t go far enough. You can see that our schedule is harder overall this year (much tougher at the top and bottom, slightly easier in the middle, and comparable to Alabama’s), so the 8-4 record doesn’t look as bad. (I could go a lot further into how SOS affects W’s, and it has a very significant impact on how many games a team wins, but this post is long enough already.)
What about next year?
For fun, I ran the same calculations on next year’s schedule, based on our opponents’ performance this year, and came up with:
2012
1. top 4 1.6
2. top 8 0.5
3. rest (0.5)
Overall 0.5
Wow, this is a killer schedule – likely to be one of the toughest in the nation.
Why does this data make me feel good?
I know I get accused of playing with data, but this data and analysis is very good, I promise ;)
Seriously though, data analytics has been part of my career, and I’ve seen all sorts of good and bad analysis. If I were to be critical of what I’ve just shown you, I would say that SOS is not fully accounted for, so there may be some noise in these numbers. Otherwise, this is really good stuff. And the conclusions are very clear: ND is making good progress with Kelly. Is it great progress? No; great progress would be more W’s. But things are moving in the right direction.
And I can also promise you that I’ve looked at more data than probably anyone on this board (or anywhere?) when it comes to evaluating a team’s performance based on numbers. I’ve looked at point differential, wins against different qualities of teams, yards, turnovers, the contribution of a running-focused offense to winning %... and everything I look at tells the same story as here: we are making progress.
Why does this data make me feel sad?
Because we just might be killing a good thing. Go look at the yppd numbers for ND since 2000. Wow, we are clearly doing better now relative to the last 10 years. If you want, go check out point differential, two year win totals, strength of schedule, have at it – but you’ll get the same story.
I look at this board as the loudest ND fan site on the net (props to the board ops). I don’t know if that’s actually the case, but it appears to be. I assume that some players, recruits, and maybe even coaches read this site. I also believe that, with time, people become what they are perceived as. And I am genuinely concerned that the negativity on this board could be one of the things most likely to derail the team’s success.
Why?
For Kelly to be successful, players and recruits need to believe in him. It sounds cheesy, but I have never seen a leader succeed when people don’t believe in them. And if the negativity on here continues to swell, I wager it will trickle into the program (if it hasn’t already).
Granted, we’ll never know if this happens, just like we’ll never know if Crist would have beaten UM, USC, and Stanford. But, just in case, would you be willing to give some support, say, until signing day this year?
Because I can tell you that next year’s schedule will be more difficult than this year’s, and who knows what that will bring. It would be a great irony if the school’s most passionate fans drove away a very capable coach because we asked for too much too soon.
Final thought
For full disclosure, I’m a big believer that you win games when you outprepare the other team. For ND, this means recruiting, strength training, practice, player development, etc. I think Kelly is building the program by making these work.
As fans, we don’t see this part of the program. What we do see are the W’s and the L’s, the play calls, the helmets. But the data I’ve shown you here, which measures a lot of this, shows that progress is happening. Is it happening due to luck? No (three goal-line fumbles for TDs, none in our favor? When has that ever happened?). It’s not happening because we’re playing easier teams. It’s not happening because we have a stud QB winning extra games for us.
Maybe, just maybe, it’s happening because we have coaches who identified and recruited talent (thank you, Weis and Kelly) and a coach who is developing that talent and putting a program together (thank you, Kelly).
In any case, rather than calling our coach shanty or coach 30 or something else, I think a little more optimism (perhaps optimism is the wrong word; it’s just seeing things as they are and showing some patience) may go a lot further than any of us can appreciate.
"yppd", or "yards per play differential", is the #1 non-scoring metric that identifies a winning team (79% of winning teams won this metric). The y-score is an SOS-adjusted version of yppd for measuring the strength of NCAAF teams.
