F1: Are today's cars more likely to finish each race?

As an F1 fan, I have the impression that the cars are growing more reliable (despite the FIA requiring engines to rev higher, and components to last longer). It also seems like there are fewer accidents than there used to be. I'd like to see if those perceptions are correct. I'll do so by looking at a dataset of F1 results that covers 1950 through 2017.

I'll cut to the chase by showing this figure, which communicates both the absolute and relative results of races over time, then I’ll show my work below.


Confirmed: modern F1 cars have fewer DNFs.

Far more cars now make it to the end of each race, in both absolute and relative terms. The decrease in DNFs is driven by improved mechanical reliability, fewer accidents, and more predictable performance, which means fewer cars fail to qualify.


Below are the contents of the Jupyter notebook I used to make the figure above.

import numpy as np
import pandas as pd

import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
sns.set(context='notebook', font_scale=3, style='ticks', palette=sns.cubehelix_palette(rot=-.6))
results = pd.read_csv('~/git/f1/data/results.csv')
status = pd.read_csv('~/git/f1/data/status.csv')
races = pd.read_csv('~/git/f1/data/races.csv')
# Give the generic race columns race-specific names so they stay distinguishable after merging
rename_races = {'name': 'raceName',
                'url': 'raceUrl',
                'date': 'raceDate',
                'time': 'raceTime',
                'year': 'raceYear'}
cleaned_races = races.rename(columns=rename_races)
wide = pd.merge(results, status, on='statusId')
wide = pd.merge(wide, cleaned_races, on='raceId')
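
Because both merges are inner joins, any result row whose key has no match would be dropped silently. A minimal sketch of a sanity check, using made-up toy rows (the `*_demo` frames are hypothetical stand-ins, not the real files) and the `validate` argument of `pd.merge`:

```python
import pandas as pd

# Hypothetical toy rows standing in for results.csv and status.csv
results_demo = pd.DataFrame({'resultId': [1, 2, 3],
                             'raceId': [10, 10, 11],
                             'statusId': [1, 4, 1]})
status_demo = pd.DataFrame({'statusId': [1, 4],
                            'status': ['Finished', 'Collision']})

# validate='many_to_one' raises if statusId is duplicated in the lookup table
merged_demo = pd.merge(results_demo, status_demo, on='statusId',
                       how='inner', validate='many_to_one')

# An inner join drops unmatched rows, so compare row counts before and after
rows_lost = len(results_demo) - len(merged_demo)
```

If `rows_lost` is not zero on the real data, some results refer to a status or race that is missing from the lookup table.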

The dataset has 134 possible statuses for each driver in each race, which is far too granular for my purposes. I'll reduce cardinality by creating a new column with simplified statuses of Finished, Accident, Mechanical, DNQ, and Misc.

def simpler_status(result):
    """Collapse a detailed race status into one of five broad categories."""
    status = result['status']
    # Lapped cars ('+1 Lap', '+2 Laps', ...) still count as finishers
    lapped = ["+{} Lap{}".format(x, 's' if x > 1 else '') for x in range(1, 50)]
    if status in ['Finished'] + lapped:
        return 'Finished'
    elif status in ['Accident', 'Collision', 'Spun off', 'Tyre puncture', 'Puncture', 'Collision damage', 'Injured', 'Injury', 'Eye injury', 'Fatal accident']:
        return 'Accident'
    elif status in ['Driver unwell', 'Retired', 'Not classified', 'Disqualified']:
        return 'Misc'
    elif status in ['Did not prequalify', 'Did not qualify', '107% Rule']:
        return 'DNQ'
    else:
        # Every remaining status is treated as a mechanical failure
        return 'Mechanical'
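
The 'Finished' bucket depends on generating the lapped-car labels correctly, including the singular '+1 Lap'. The sketch below rebuilds that list on its own so it can be inspected; `lapped_labels`, `finished_labels`, and `is_finisher` are names introduced here for illustration only.

```python
# Rebuild the lapped-finisher labels that are merged into the 'Finished' bucket
lapped_labels = ["+{} Lap{}".format(n, 's' if n > 1 else '') for n in range(1, 50)]

# A set gives constant-time membership checks
finished_labels = set(['Finished'] + lapped_labels)


def is_finisher(status):
    """Return True if the status string counts as completing the race."""
    return status in finished_labels
```

Note that `range(1, 50)` stops at '+49 Laps', so a car classified further behind than that would fall through to the catch-all category.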

wide['sStatus'] = wide.apply(simpler_status, axis=1)
annual_results = wide.groupby(['raceYear', 'sStatus']).agg({'sStatus': 'count'})
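# Share of each status within its year: count divided by that year's total
annual_results['sStatusPct'] = annual_results['sStatus'] / annual_results.groupby(level='raceYear')['sStatus'].transform('sum')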

annual_summary = annual_results.unstack(fill_value=0)['sStatus'][['Finished', 'Mechanical', 'Accident', 'DNQ', 'Misc']]
annual_summary_pct = annual_results.unstack(fill_value=0)['sStatusPct'][['Finished', 'Mechanical', 'Accident', 'DNQ', 'Misc']]
labels = ['Completed Race', 'Mechanical Failure', 'Accident', 'Failure to Qualify', 'Other']
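
The count, share, and unstack steps above are easier to verify on a few rows than on the full dataset. This is a minimal sketch on made-up data (the `demo` frame is hypothetical), using `size()` and `transform('sum')` rather than the exact calls in the notebook:

```python
import pandas as pd

# Hypothetical toy data: three results in 1950, two in 2017
demo = pd.DataFrame({
    'raceYear': [1950, 1950, 1950, 2017, 2017],
    'sStatus': ['Finished', 'Mechanical', 'Mechanical', 'Finished', 'Finished'],
})

# Count results per (year, status) pair
counts = demo.groupby(['raceYear', 'sStatus']).size()

# Divide each count by the total for its year to get a share
shares = counts / counts.groupby(level='raceYear').transform('sum')

# One row per year, one column per status; missing combinations become 0
shares_wide = shares.unstack(fill_value=0)
```

Each row of `shares_wide` should sum to 1, which is a quick check to run on the real `annual_summary_pct` as well.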

f, ax = plt.subplots(2, 1, sharex='col', figsize=(25, 15), gridspec_kw = {'height_ratios':[5, 1]}, constrained_layout=True)
ax[0].stackplot(annual_summary.index, annual_summary.T, baseline='zero', labels=labels)
ax[0].legend(loc='upper left')
ax[0].set_xlim(left=1950, right=2017)
ax[0].set_ylabel('# of race results')

ax[1].stackplot(annual_summary_pct.index, annual_summary_pct.T)

f.suptitle('Formula 1 Race Results by Year (1950-2017)')

sns.despine(fig=f, offset=10)


  • The underlying dataset can be obtained here.