F1: The Parade Factor

A common criticism of modern Formula 1 is that some races are "parades" in which the finishing order is essentially the same as the starting order. Such races are less exciting to watch because they are effectively won in qualifying. I'd like to identify which modern circuits are most and least likely to produce on-track action of any sort.

To do so, I will create an "F1 Parade Factor": a measurement of how closely the finishing order matches the starting order in each race, for each circuit used in the past decade. Specifically, I'll use the Kendall rank correlation coefficient (Kendall's tau) to quantify this. A value of 1 means the starting and finishing orders agree perfectly, a value of 0 means there is no association between them, and a value of -1 means they are perfectly inverted. I expect to see positive values for all tracks, given that the Formula 1 starting grid is ordered by ascending qualifying times, but I'll be interested to see how much it varies between tracks.
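To make the scale concrete, here is a minimal sketch of the coefficient computed by hand from concordant and discordant pairs. The function name `kendall_tau` and the five-car grids are made up for illustration. This is the simple tau-a form, which assumes no tied positions; the notebook itself relies on pandas, whose Kendall method also accounts for ties.

```python
from itertools import combinations


def kendall_tau(grid, finish):
    """Kendall's tau-a for two equal-length rankings without ties."""
    if len(grid) != len(finish) or len(grid) < 2:
        raise ValueError("need two rankings of the same length (at least 2 cars)")
    concordant = discordant = 0
    # Compare every pair of cars: did their relative order survive the race?
    for i, j in combinations(range(len(grid)), 2):
        direction = (grid[i] - grid[j]) * (finish[i] - finish[j])
        if direction > 0:
            concordant += 1   # same relative order at start and finish
        elif direction < 0:
            discordant += 1   # the two cars swapped relative order
    n_pairs = len(grid) * (len(grid) - 1) // 2
    return (concordant - discordant) / n_pairs


grid = [1, 2, 3, 4, 5]                      # hypothetical starting slots
print(kendall_tau(grid, [1, 2, 3, 4, 5]))   # a pure parade: 1.0
print(kendall_tau(grid, [5, 4, 3, 2, 1]))   # fully inverted order: -1.0
print(kendall_tau(grid, [2, 1, 3, 5, 4]))   # two swaps out of ten pairs: 0.6
```

In the last case eight of the ten pairs keep their order and two are swapped, so the value is (8 - 2) / 10 = 0.6.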

This sort of metric can be difficult to parse, so in addition to calculating the tau for each circuit, I will plot it alongside a chart of how often cars have issues at a particular circuit, such as crashing or having mechanical problems.
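The second chart boils down to the share of race results that fall into each outcome category at a circuit. A minimal sketch of that calculation follows, with a hypothetical helper `outcome_shares` and invented sample data; the category names mirror the simplified statuses used in the notebook.

```python
from collections import Counter


def outcome_shares(statuses):
    """Return the fraction of results in each outcome category."""
    counts = Counter(statuses)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {status: count / total for status, count in counts.items()}


# Invented results for one circuit: 7 finishers, 2 mechanical failures, 1 accident
example = ['Finished'] * 7 + ['Mechanical'] * 2 + ['Accident']
for status, share in outcome_shares(example).items():
    print(f"{status}: {share:.0%}")
```

A circuit with a high tau and a high share of finishers would be a parade in the plainest sense, while a high tau alongside many retirements suggests the order was preserved among the cars that survived.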

Results

This method looks promising: it identifies a number of low-action Tilkedromes as the most parade-like circuits and several of my personal favorites as the least.

The Five Most Parade-Like Circuits:

  1. Istanbul
  2. Magny-Cours
  3. Rodriguez
  4. Bahrain
  5. Hockenheimring

The Five Least Parade-Like Circuits:

  1. Albert Park
  2. Spa
  3. Fuji
  4. Baku
  5. Sepang

Note

  • The underlying dataset can be obtained here.

The Jupyter Notebook

import numpy as np
import pandas as pd

import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
sns.set(context='notebook', font_scale=3, style='ticks', palette=sns.cubehelix_palette(rot=-.6))
# Load the raw tables from the dataset
results = pd.read_csv('~/git/f1/data/results.csv')
status = pd.read_csv('~/git/f1/data/status.csv')
circuits = pd.read_csv('~/git/f1/data/circuits.csv', encoding='ISO-8859-1')
races = pd.read_csv('~/git/f1/data/races.csv')
rename_races = { 'name': 'raceName', 
                'url': 'raceUrl', 
                'date': 'raceDate', 
                'time': 'raceTime', 
                'year': 'raceYear'}
cleaned_races = races.rename(rename_races, axis=1)
rename_circuits = { 'name': 'circuitName', 'url': 'circuitUrl'}
cleaned_circuits = circuits.rename(rename_circuits, axis=1)
wide = pd.merge(results, status, on='statusId')
wide = pd.merge(wide, cleaned_races, on='raceId')
wide = pd.merge(wide, cleaned_circuits, on='circuitId')
# Statuses for classified finishers: 'Finished', '+1 Lap', '+2 Laps', ...
FINISHED = ['Finished'] + ["+{} Lap{}".format(x, 's' if x > 1 else '') for x in range(1, 50)]

def simpler_status(result):
    """Collapse the detailed status text into a handful of broad categories."""
    if result['status'] in FINISHED:
        return 'Finished'
    elif result['status'] in ['Accident', 'Collision', 'Spun off', 'Tyre puncture', 'Puncture', 'Collision damage', 'Injured', 'Injury', 'Eye injury', 'Fatal accident']:
        return 'Accident'
    elif result['status'] in ['Driver unwell', 'Retired', 'Not classified', 'Disqualified']:
        return 'Misc'
    elif result['status'] in ['Did not prequalify', 'Did not qualify', '107% Rule']:
        return 'DNQ'
    else:
        # Everything else is treated as a mechanical problem
        return 'Mechanical'

wide['sStatus'] = wide.apply(simpler_status, axis=1)
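# Keep seasons from 2008 onward and only the columns needed for the rank comparison
grid_results = wide[wide['raceYear'] > 2007][['circuitId', 'raceId', 'grid', 'positionOrder']]
# Kendall's tau between grid slot and finishing order, pooled over all results at each circuit
circuit_corrs = grid_results.groupby('circuitId')[['grid', 'positionOrder']].corr(method='kendall').unstack()['grid'][['positionOrder']]
# Attach circuit metadata; the 'positionOrder' column now holds tau, sorted from least to most parade-like
ordered_df = pd.merge(cleaned_circuits, circuit_corrs, on='circuitId').sort_values('positionOrder')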
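# Count results per circuit and simplified status
track_statuses = wide[wide['raceYear'] > 2007].groupby(['circuitRef', 'sStatus']).agg({'sStatus': 'count'})
# Convert the counts to each status's share of all results at that circuit
track_statuses['sStatusPct'] = track_statuses['sStatus'] / track_statuses.groupby(level='circuitRef')['sStatus'].transform('sum')
# One row per circuit, one column per status, in the same order as the tau ranking
track_statuses_pct = pd.merge(ordered_df, track_statuses.unstack(fill_value=0)['sStatusPct'], on='circuitRef')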
labels = ['Completed Race', 'Mechanical Failure', 'Accident', 'Failure to Qualify', 'Other']

f, ax = plt.subplots(2, 1, sharex='col', figsize=(30, 20), gridspec_kw = {'height_ratios':[1, 1]}, constrained_layout=True)
f.suptitle('Formula 1 "Parade Factor" by Circuit (2008-2017)')

sns.barplot(x=track_statuses_pct['circuitRef'], y=track_statuses_pct['positionOrder'], ax=ax[0], palette="rocket")
ax[0].set_ylabel('Kendall\'s Tau')
ax[0].set_xlabel('')

ax[1].stackplot(track_statuses_pct['circuitRef'], track_statuses_pct[['Finished', 'Mechanical', 'Accident', 'DNQ', 'Misc']].T, baseline='zero', labels=labels)
ax[1].legend(loc='lower left', frameon=True)

ax[1].set_ylabel('% of race results')
ax[1].yaxis.set_major_formatter(mtick.PercentFormatter(1.0))

ax[1].xaxis.set_tick_params(which='major', labelrotation=90, labelsize=30)
ax[1].set_xlabel('Circuit')

sns.despine(fig=f, offset=10)