As seen previously, information can be extracted both from the race in which horses run (and their performances in that race) and from the horses themselves to assist in assessing form.
Any data point in isolation may give rise to misleading conclusions, which is what can happen with so-called yardstick handicapping, in which a race is rated on the assumption that one horse “ran to form”. Bring many data points together, however, and that likelihood diminishes.
Here, I would like to introduce a relatively new concept in handicapping, which harnesses both approaches and which may be termed “horse uplift”.
It works by measuring the degree to which horses tend to improve upon, or fall short of, their pre-race ratings when occupying prominent positions in the race under consideration.
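As a rough sketch of that measurement, the snippet below averages uplift (performance rating minus pre-race rating) by finishing position across past runnings of a race. The tuple layout, the function name `average_uplift_by_position` and every number in `past_results` are my own assumptions for illustration, not real ratings.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical past runnings of one race. Every figure is invented.
# Layout (an assumption): (year, finishing position, pre-race rating, performance rating)
past_results = [
    (2017, 1, 110.0, 121.0),
    (2017, 2, 118.0, 120.0),
    (2018, 1, 117.0, 122.0),
    (2018, 2, 121.0, 119.0),
]


def average_uplift_by_position(results):
    """Return {finishing position: mean uplift}, where uplift is the
    performance rating minus the pre-race rating."""
    uplifts = defaultdict(list)
    for _year, position, pre_race, performance in results:
        uplifts[position].append(performance - pre_race)
    return {pos: mean(vals) for pos, vals in sorted(uplifts.items())}


uplift_table = average_uplift_by_position(past_results)
for pos, uplift in uplift_table.items():
    print(f"position {pos}: average uplift {uplift:+.1f}")
```

A negative value simply means that horses finishing in that position have tended to run below their pre-race ratings.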
“Pre-race ratings” can be problematical here in that the BHA – whose ratings I have been referring to thus far – do not publish performance ratings for all races or master ratings for all horses. You could calculate your own master ratings (a horse’s one defining rating at this moment in time), for instance by identifying the peak rating in recent starts.
Timeform do, however, so for an illustration of horse uplift I will use their master ratings and The Derby at Epsom in 2019. A reminder that Timeform ratings are about 5 higher than the BHA ones for better horses as a result of the BHA figures having dropped over time.
Those figures include the pre-race Timeform rating for each horse, the average uplift on pre-race ratings for horses occupying those positions in the previous five years, and what that indicates as a standard rating for the winner of this race when added to pre-race rating and the amount in pounds a horse was behind the winner.
If you have been paying attention, you will know that those final figures need to be weighted according to each horse’s finishing position (1/N, where N is finishing position). The result is a horse uplift standard figure of 122.6 (see if you can arrive at the same).
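For anyone wishing to check their arithmetic, here is a minimal sketch of the weighting step. Each horse implies a rating for the winner (pre-race rating plus average uplift for its position plus pounds behind the winner), and those implied figures are averaged with weights of 1/N. The function name `uplift_standard` and the three runners are hypothetical, the numbers are invented, and they are not the 2019 Derby figures, so the result will not match 122.6.

```python
def uplift_standard(runners):
    """Weighted average of the winner ratings implied by each runner.

    runners: iterable of (position, pre_race_rating, average_uplift, lb_behind_winner)
    Weight for each runner is 1/N, where N is its finishing position.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for position, pre_race, average_uplift, lb_behind in runners:
        # What this horse's run suggests the winner should be rated.
        implied_winner_rating = pre_race + average_uplift + lb_behind
        weight = 1.0 / position
        weighted_sum += weight * implied_winner_rating
        weight_total += weight
    return weighted_sum / weight_total


# Invented example data, NOT taken from any real race.
example_runners = [
    (1, 115.0, 8.0, 0.0),
    (2, 118.0, 3.0, 1.0),
    (3, 120.0, 0.0, 1.5),
]

standard = uplift_standard(example_runners)
print(f"horse uplift standard: {standard:.1f}")
```

Dividing by the sum of the weights keeps the result on the same scale as the ratings, however many horses are included.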
Timeform initially rated Anthony Van Dyck 125 but edged him down to 123 in due course. The latter is equivalent to about 118 on the BHA scale, which is exactly what the BHA ultimately rated the horse’s performance in this race.
A couple of observations. First, most races will have a greater incidence of negative uplift values than this one because, regardless of what they have achieved previously, horses generally improve more when figuring prominently in a race like The Derby than in normal races. Second, the by-year figures vary between 119.5 (based on Masar’s win in 2018) and 124.7 (Wings of Eagles the year before).
There is a fair amount of variation in figures when using horse uplift, and when we get increased variation it may be sensible to increase sample size, by using horses further back in the field (that makes little difference here, as it happens) and/or races from further back. Always be wary of losing topicality by doing the latter, however.
This seems like an opportune moment at which to discuss briefly how master ratings – which are the inputs in this process – are arrived at.
Timeform pick from an array of recent performance ratings (i.e. a horse’s race-by-race ratings) in order to attempt to convey what a horse should be capable of now, and they layer some qualitative judgements on top of that.
The BHA introduce a horse on a particular figure and then allow that figure to decay, or increase, according to how that horse (and form associated with it) performs thereafter: in some cases, a horse will have a rating to which it has never actually run.
Both approaches have merit. There is, however, evidence that the “best recent rating a horse has achieved” can be improved upon by using all of a horse’s recent performance ratings (such as the five most recent in this code in the last 12 months). Those ratings are weighted according to both the maximum figure, something which influences the Timeform approach especially, and all the other figures, depending on how recently they were achieved.
This is another complex area, and I do not intend to explore it further in this context. Readers should, however, be aware that a master rating derived from a string of performances with ratings of, say, 100/90/95/100/100 (most recent effort first) should be viewed differently to one with ratings of, say, 70/90/80/100/50, despite them both having the same maximum value.
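To make the contrast concrete, here is one possible way of blending the maximum figure with a recency-weighted average of all the figures. It is purely a sketch: the function name `blended_master_rating`, the 50/50 blend and the 0.8 decay per start are assumptions of mine, not the method used by Timeform, the BHA or anyone else.

```python
def blended_master_rating(ratings, max_weight=0.5, decay=0.8):
    """Blend the peak rating with a recency-weighted mean.

    ratings: performance ratings, most recent effort first.
    max_weight: share given to the peak figure (assumed value).
    decay: each older run counts `decay` times the run after it (assumed value).
    """
    weights = [decay ** i for i in range(len(ratings))]
    recency_mean = sum(w * r for w, r in zip(weights, ratings)) / sum(weights)
    return max_weight * max(ratings) + (1.0 - max_weight) * recency_mean


consistent = [100, 90, 95, 100, 100]  # the first string from the text
erratic = [70, 90, 80, 100, 50]       # the second string from the text

consistent_rating = blended_master_rating(consistent)
erratic_rating = blended_master_rating(erratic)

print(f"peak of each string: {max(consistent)} and {max(erratic)}")
print(f"blended: {consistent_rating:.1f} versus {erratic_rating:.1f}")
```

Under these assumptions the two strings share a peak but end up with clearly different blended figures, which is the point being made above. Different weights would change the size of the gap, not its direction.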
Readers should also be aware that a formalised approach to rating a race at the time at which it occurs lacks one essential ingredient: it does not know how well or how badly that race will work out.
We take great care with our initial assessment, hoping to get it “right”, but the best handicapping systems are dynamic and respond to new information as it comes in, though not in an overly sensitive way. This is elementary information analysis: in the face of uncertainty, you should adjust your assessments if useful information subsequently appears.
This is what is known as “back handicapping”, or “retrospective handicapping”, and it is why, for instance, Timeform will have reduced their initial assessment of the 2019 Derby in the light of subsequent events.
Anthony Van Dyck was beaten in his five subsequent starts, though he did run creditably in three of them, while Madhmoon won one of his next three, but at Group 3 level.
Japan won his next three, including the International Stakes at York, and is now Timeform’s highest-rated horse from the race on 126. But if you rated The Derby “around” what Japan did after Epsom you would be giving others in that race far too much credit.
Back-handicapping can also be formalised according to strict rules, though that is challenging and this is an area in which human judgement is likely to continue to be influential. Horses can run well or poorly for a multitude of reasons, some of which should have little impact on a previous assessment of that horse.
Rate the race at the time, be aware of the likely parameters between which that rating falls, and tweak your assessment as it is tested against reality.
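One simple way to express that idea in code is a damped update: each new piece of evidence moves the rating only part of the way towards what it implies, and the result is kept within the range you judged plausible at the outset. This is a sketch under my own assumptions (the function name `back_handicap`, a sensitivity of 0.2, and invented numbers throughout), not a description of how any ratings organisation actually does it.

```python
def back_handicap(initial, low, high, subsequent_evidence, sensitivity=0.2):
    """Nudge an initial race rating towards later evidence.

    initial: rating given to the race at the time.
    low, high: the parameters between which the rating was judged likely to fall.
    subsequent_evidence: race ratings implied by later runs, oldest first.
    sensitivity: fraction of each discrepancy acted upon (assumed value).
    """
    rating = initial
    for implied in subsequent_evidence:
        # Move only part of the way, so no single run dominates.
        rating += sensitivity * (implied - rating)
        # Stay within the range thought plausible at the time.
        rating = min(max(rating, low), high)
    return rating


# Invented figures for illustration only.
revised = back_handicap(initial=120.0, low=116.0, high=122.0,
                        subsequent_evidence=[117.0, 116.0, 118.0])
print(f"revised race rating: {revised:.1f}")
```

A human handicapper would also decide which later runs deserve to be in `subsequent_evidence` at all, since a horse can run poorly for reasons that say little about the original race.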
Back-handicapping is crucial to a properly run handicap. Besides anything else, without it standardisation will become self-perpetuating rather than fluidly responsive to real-life events.
The preceding is intended as an introduction to the subject, if one that tackles some fairly advanced issues. For instance, there is more that could be said about the effect of field-size on performance, about pounds per second (from which pounds per length can be derived) under different circumstances, and about the need to interpret events in a more touchy-feely manner when significant non-completers or significantly unlucky horses are concerned.
Whether you choose to take handicapping seriously, lightly, or leave it up to others, I would recommend some form of interaction with the numbers that underpin horseracing analysis. They force you to take a clear and quantifiable view, and they keep those views accountable and honest. You are likely to learn a great deal about what is going on even if you never go near a spreadsheet yourself.
Modern data techniques mean it is highly realistic to automate or semi-automate many of the procedures and to tackle racing in its entirety: indeed, that has already happened in some areas. For the rest, specialisation is recommended. Perhaps try rating all Group/Graded races first then see if you can expand your approach.
I will be back in the final part of this series next Wednesday with some more working examples from recent races.