
Large Scale Pace Study

(Note: This document was reconstituted in 2008 from a cached version, after AOL deleted the 'hometown' web page where it had existed for over 10 years. A graphic and some formatting and links have been lost. I hope to find the original document on an old computer or disk, but haven't yet.)

Contents:

Purpose
Methodology
Resulting Performance Curves
Comparison of Various Models
Discussion of Sources of Error
Conclusions

Purpose

I've had a strong interest in running pace versus distance curves since I started running in the mid-late 1970's. At the time, my "analysis" consisted of plotting my pace per mile in races versus the race distance, and plotting world records on the same graph, just to see how far I had to go! (Never got anywhere close, to say the least!) I developed empirical rules for the slowing with increasing distance, but lacked the mathematical experience to go much further with it.

I began seriously considering the problem again in 1991 after I started coaching high school track and cross country. This time, "armed" with a physics education, some computer programming experience, and 15 years or so of my own running records, I developed a computer model (see top of this page). I've been re-evaluating the program in light of new data from time to time, and this study is the biggest attempt I've made. The purpose is thus to learn as much as I can about performance curves, with the added "bonus" that I can use this data to evaluate my program and make any needed adjustments.

Methodology

In an effort to get some really definitive information on the pace vs. distance relationship, and in the process test (and if necessary refine) my program, I conducted a rather large scale study of runners' personal records at various distances. I posted a request for such data in the rec.running newsgroup, which so far (as of June 21, 1997) has generated responses from only 4 people. I also had data for myself, my wife, and a couple of people I have corresponded with via email. I also had data for the high school runners I coach, but they tend to run only a narrow range of distances (mostly shorter than 5K) and in many cases progress too rapidly to compare spring track performances with fall cross country races, so I omitted them, even though such data had played a large role in the development of Runpace.

With fewer than 10 data sets collected on my own, I decided to make use of Steve Isham's wonderful Gopper's Running PR's page, on which well over 200 runners of a wide range of abilities from all over the world have posted their PR's. The standard entries on this page are for the 1 mile (or alternatively 1500 m), 5K, 10K, half marathon, and marathon, but many posters add times for other distances as comments.

Ideally, one would want all the runners in the sample to have provided a PR for each distance. In practice, this was not the case. I did manage to find a total of 115 runners (106 of them from Isham's web page) who had recorded PR's at three or more fairly standard distances. Of these, 85 had times in common for the 5K, the 10K, and either the mile or the 1500 meters.

The following are (all, I think) the instances in which I "screened" the data:

Races run at a slower pace than a longer race by the same runner were omitted.

Times identified as splits in a longer race were omitted.

Generally, races identified as having occurred more than about 10 years apart were omitted (though this rule was not followed with great care, and in many cases dates were unavailable).

A 4:10 marathon performance given with a 1:21 half marathon was omitted from the final analysis.

The 19+ minute 5K and 42+ minute 10K performances of a runner with a sub-5-minute mile were omitted.

A 3 mile time for one runner was extrapolated using a linear pace vs. log distance rule to a 5K.

The 10K PR and better 6.5 mile PR of one runner were combined (pace vs. log distance interpolation for the 6.5 mile, then averaged with the 10K) to produce that runner's 10K PR.

A few times were not precise and were adjusted to a most likely time. For example, the "xx" in times such as "18:xx" was given the value 30, yielding, in this example, "18:30", unless the race was identified as a cross country or difficult course, in which case a faster time, like 18:10, was used. Times rounded off to the nearest minute were given an additional 15 seconds, so that a "3:23" marathon would become "3:23:15".

One runner's 6:03 mile, described as "two weeks after marathon PR ... track, cold and windy" was adjusted to 5:53 to compensate for the adverse conditions.

A 4:23 mile described as a "won fairly slow tactical race 67,67,67,62" was adjusted to 4:20.

The 115 data sets actually came from 113 runners. Two of these runners (including myself) had data sets from periods 12 or more years apart, which were considered separately (hence 115 data sets).

The average year of birth and year of starting to run (available for all but a few in the sample) were 1966 and 1984, respectively. Only 11 of the 115 data sets were those of women; nine women were included in the final analysis.

The following table summarizes the number of runners providing PR's at each of the listed distances:

Distance       100m  200m  400m  800m  1.5K/1mi  3K/2mi   5K  10K  10mi  H.M.  Mara
115 initial       5     7    18    29       105      40  109   96     7    68    60
84 final          3     5     8    16        84      21   84   84     6    59    48

Interpolation/extrapolation schemes:

In 38 cases, a 1500 meter time was converted into a mile time by assuming a constant pace increase with logarithm of distance (linear relation between time/distance and log(distance)), interpolating along such a line from the 1500 to the next greater distance given (usually a 3000, 2 mile, or 5K). Experience has shown that this is a good approximation (very good for such a minor adjustment as going from 1500 to the mile) and yet amenable to easy analysis on a spreadsheet.
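A minimal Python sketch of this log-distance rule, assuming distances in meters and times in seconds. The function name `log_pace_time` and the example runner's times are hypothetical; the study itself did this on a spreadsheet, not with this code.

```python
import math

MILE_M = 1609.344  # meters in a mile


def log_pace_time(d1, t1, d2, t2, d):
    """Time (s) at distance d (m), assuming pace (s/m) is linear in log(distance)
    along the line through two known performances (d1, t1) and (d2, t2)."""
    p1, p2 = t1 / d1, t2 / d2
    slope = (p2 - p1) / math.log(d2 / d1)
    pace = p1 + slope * math.log(d / d1)
    return pace * d


# Hypothetical runner: 1500 m in 4:30 and 5K in 17:00 (illustrative values only)
mile = log_pace_time(1500, 270.0, 5000, 1020.0, MILE_M)
minutes, seconds = divmod(mile, 60)
print(f"estimated mile: {int(minutes)}:{seconds:04.1f}")  # → estimated mile: 4:51.9
```

Because the mile is so close to 1500 m, the result barely depends on which longer race is used as the second point, which is why the approximation is good for this particular adjustment.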

The 5K and 10K columns were virtually complete; it remained to fill in the rest. For each other distance, all the samples that had that distance (e.g. 3000 meters) were analyzed to determine its relation to adjacent distances. The log-based rule was used to get a comparison time by interpolation or extrapolation from adjacent distances. Then the average ratio of the actual time to the comparison time was determined, and the process was used in reverse to "fill in the blanks". For example, to fill in missing 3000 meter times, all the samples with a 3000, a mile, and a 5K were collected. For each mile and 5K pair, a "predicted" (using the log relation) 3000 was calculated. These were all averaged and compared to the average of the actual 3000's. If, for example, the predictions averaged 0.998 times the actuals, then the log-based interpolations used to fill in the blanks were divided by 0.998 to follow the actual curve, whether it is log-based or not. Separate factors were determined for different interpolation endpoints, to allow interpolation between the closest distances for which a good number of samples was available.
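The correction-factor step can be sketched the same way. This is an illustrative reconstruction under stated assumptions: `correction_factor` and `fill_blank` are hypothetical names, the factor is taken as the mean predicted time divided by the mean actual time (as in the 0.998 example), and the sample times are made up.

```python
import math
from statistics import mean

MILE_M = 1609.344


def log_pace_time(d1, t1, d2, t2, d):
    """Pace (s/m) linear in log(distance) through two known performances."""
    p1, p2 = t1 / d1, t2 / d2
    slope = (p2 - p1) / math.log(d2 / d1)
    return (p1 + slope * math.log(d / d1)) * d


def correction_factor(samples, d1, d2, d):
    """samples: (t1, t2, t_actual) tuples from runners who reported all three
    distances. Returns mean(log-based prediction) / mean(actual time)."""
    predicted = [log_pace_time(d1, t1, d2, t2, d) for t1, t2, _ in samples]
    actual = [t for _, _, t in samples]
    return mean(predicted) / mean(actual)


def fill_blank(t1, t2, d1, d2, d, factor):
    """Estimate a missing time: the log-based prediction divided by the factor."""
    return log_pace_time(d1, t1, d2, t2, d) / factor


# Hypothetical (mile, 5K, 3000 m) times in seconds, for illustration only
samples = [(300.0, 1080.0, 615.0), (330.0, 1200.0, 680.0), (280.0, 1000.0, 570.0)]
factor = correction_factor(samples, MILE_M, 5000, 3000)
estimate = fill_blank(310.0, 1110.0, MILE_M, 5000, 3000, factor)
print(f"factor {factor:.4f}, filled-in 3000 m: {estimate:.1f} s")
```

A factor below 1 means the log rule runs slightly fast at that distance, so dividing by it slows the filled-in times by the same proportion.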

Extrapolation was a bigger problem than interpolation. The half marathon and marathon "blanks" presented a greater challenge, because different types of runners (with regard to distance specialty) might have different correction factors (such as the 0.998 in the example above) to the log-based extrapolation. To handle this difficulty, runners were sorted by ratios such as 10K time/mile time (typically between 7 and 8) and placed in groups. The factors used to adjust the extrapolations were then roughly appropriate to the "type" of runner. For the half marathon/marathon relation, a simple ratio specific to the type of runner was used, averaged in with the log-based approach. Fortunately, over half the runners had either marathon or half marathon times anyway, and of course these were left alone!
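Grouping by runner type might look like the following sketch. Several assumptions are not from the study: runners are split into equal-sized groups by 10K/mile ratio, the half marathon is extrapolated from the 5K and 10K, and the separate half marathon/marathon ratio that was averaged in is omitted. All function and field names are hypothetical.

```python
import math
from statistics import mean

HALF_M = 21097.5  # half marathon in meters


def log_pace_time(d1, t1, d2, t2, d):
    """Pace (s/m) linear in log(distance) through two known performances."""
    p1, p2 = t1 / d1, t2 / d2
    slope = (p2 - p1) / math.log(d2 / d1)
    return (p1 + slope * math.log(d / d1)) * d


def split_by_type(runners, n_groups):
    """Sort by 10K/mile ratio (small = long-distance type) and cut the list
    into n_groups contiguous groups of nearly equal size."""
    ranked = sorted(runners, key=lambda r: r["10k"] / r["mile"])
    size, extra = divmod(len(ranked), n_groups)
    groups, start = [], 0
    for i in range(n_groups):
        end = start + size + (1 if i < extra else 0)
        groups.append(ranked[start:end])
        start = end
    return groups


def fill_half_marathon(group):
    """Extrapolate from 5K and 10K with the log rule, rescaled by a factor
    estimated only from this group's runners who did report a half marathon."""
    def predict(r):
        return log_pace_time(5000, r["5k"], 10000, r["10k"], HALF_M)

    known = [r for r in group if r.get("hm") is not None]
    if not known:
        return  # nobody of this type to calibrate against; leave the blanks
    factor = mean(predict(r) for r in known) / mean(r["hm"] for r in known)
    for r in group:
        if r.get("hm") is None:
            r["hm"] = predict(r) / factor
```

Each runner is a dict of times in seconds (keys `mile`, `5k`, `10k`, and optionally `hm`); calling `fill_half_marathon` on each group from `split_by_type` fills the blanks in place with a type-specific correction.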

The sprints presented a greater problem. The same types of methods described above were employed here, but with the smaller number of runners reporting these times, the results are less dependable. Still, I think the results are fairly good once the entire sample is used. Isolated instances of unrealistic times occurred (such as a female runner with a predicted 100m time of 9.4 seconds!), but for the most part, it "all comes out in the wash".

Resulting Performance Curves

I sorted the 84 "finalists" into groups according to "ability" (actually 5K time) and "specialty" (determined by the ratio of 10K to mile time). The groups are as follows:

1. The fastest quartile (21 runners) by 5K times.
2. The fastest half (42 runners).
3. All 84 runners averaged together.
4. The slowest half (42 runners).
5. The slowest quartile (21 runners).
6. The shortest-distance specialty quartile (21 runners, those with the greatest 10K/mile ratio).
7. The shortest-distance specialty half (42 runners).
8. All 84 runners averaged together, again, for clarity.
9. The longest-distance specialty half (42 runners).
10. The longest-distance specialty quartile (21 runners, those with the smallest 10K/mile ratio).
11. All 84 runners averaged together, one last time!
12. A group of 51 runners collected to "weed out" poor marathon performances. (This will be described below.)
The first table shows the average times for each set of runners. The second gives their speed at the indicated distance as a percentage of the 100m average speed.

Set  100m   200m   400m   800m   1mi    3K     5K     10K    10 mi    H.M.     Marathon

1    13.2   25.8   55.4   2:00   4:22   8:42   15:04  31:50  0:53:52  1:11:59  2:42:42
2    13.0   25.5   55.1   2:01   4:27   8:54   15:31  32:40  0:55:28  1:14:15  2:46:50
3    13.6   26.9   59.0   2:11   4:54   9:49   17:07  36:19  1:02:01  1:23:18  3:07:55
4    14.3   28.4   62.8   2:22   5:21   10:43  18:44  39:59  1:08:34  1:32:21  3:29:00
5    14.7   29.4   65.7   2:30   5:41   11:26  19:57  42:39  1:13:17  1:38:50  3:43:03

6    12.2   24.7   55.5   2:09   4:57   10:05  17:50  38:56  1:07:25  1:31:19  3:26:23
7    12.5   25.1   56.0   2:08   4:53   9:53   17:25  37:30  1:04:25  1:26:50  3:17:59
8    13.6   26.9   59.0   2:11   4:54   9:49   17:07  36:19  1:02:01  1:23:18  3:07:55
9    14.7   28.8   62.0   2:15   4:55   9:45   16:49  35:09  0:59:37  1:19:46  2:57:51
10   15.6   30.6   65.5   2:21   5:08   10:04  17:18  35:54  1:01:13  1:22:10  3:02:02

11   13.6   26.9   59.0   2:11   4:54   9:49   17:07  36:19  1:02:01  1:23:18  3:07:55
12   13.7   27.0   59.0   2:11   4:54   9:49   17:06  36:19  1:01:26  1:22:06  3:00:42

Set  200m   400m   800m   1mi    3K     5K     10K    10 mi  H.M.   Marathon

1    102.2  95.2   87.8   80.8   75.7   72.9   69.0   65.6   64.4   56.9
2    101.7  94.1   85.8   78.1   72.7   69.6   66.1   62.7   61.4   54.6
3    101.0  92.4   82.9   74.5   69.4   66.3   62.5   58.9   57.5   51.0
4    100.4  90.9   80.4   71.5   66.6   63.5   59.5   55.8   54.3   48.0
5    99.9   89.3   78.3   69.2   64.2   61.3   57.4   53.7   52.2   46.3

6    99.1   88.3   76.0   66.3   60.7   57.2   52.4   48.7   47.1   41.7
7    99.8   89.5   78.2   68.8   63.3   59.9   55.7   52.2   50.7   44.5
8    101.0  92.4   82.9   74.5   69.4   66.3   62.5   58.9   57.5   51.0
9    102.1  94.9   87.4   80.1   75.5   72.8   69.7   66.2   64.8   58.1
10   102.4  95.6   88.6   81.8   77.7   75.4   72.7   68.6   67.0   60.4

11   101.0  92.4   82.9   74.5   69.4   66.3   62.5   58.9   57.5   51.0
12   101.1  92.7   83.2   74.9   69.7   66.6   62.7   59.7   58.6   53.2
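The second table can be derived from the first: each entry is the average speed over the distance divided by the average 100m speed, times 100. A small sketch with hypothetical helper names; for set 1 at 5K it gives about 73.0 against the tabulated 72.9, the small gap presumably coming from the rounding of the tabulated average times.

```python
def to_seconds(t):
    """Parse 'ss.s', 'm:ss' or 'h:mm:ss' into seconds."""
    secs = 0.0
    for part in t.split(":"):
        secs = secs * 60 + float(part)
    return secs


def pct_of_100m_speed(t100, distance_m, t):
    """Average speed over distance_m as a percentage of the average 100m speed."""
    speed = distance_m / to_seconds(t)
    speed_100 = 100.0 / to_seconds(t100)
    return 100.0 * speed / speed_100


# Set 1 (fastest quartile): 100m in 13.2 s, 5K in 15:04
print(round(pct_of_100m_speed("13.2", 5000, "15:04"), 1))  # → 73.0
```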

Discussion:

Sets 1 through 5 show that runners who are faster at 5K also tend to be faster at all other distances. The data for the shortest distances are sparse and therefore not very reliable, but they hint that the very fastest 5K runners have no advantage at 800m and shorter over those nearly as fast; again, this is only a suggestion from limited data. Also, not surprisingly, the faster 5K runners maintain a higher percentage of their maximum (i.e. 100m) speed as distance increases. At 5K, the fastest 25% maintain about 73% of their 100m speed, while the slowest 25% manage only about 61%.

Sets 6 through 10 show profiles of runners with different "specialties", presumably resulting from some combination of genetic and training factors. The data clearly show that runners equal at one distance can be quite different in ability at another. This is no surprise, but it may be interesting to note the amount of variation here. For example, notice that while the extreme quartiles (sets 6 and 10) have very similar 3K times just over 10 minutes, they differ by about 10 seconds in the 400m, 3 minutes in the 10K, and 24 minutes in the marathon.

Unfortunately, training data was available for only a small fraction of the sample, so a meaningful analysis of the connection between specialization and type or amount of training is not possible here. It's probably a safe bet, though, that the typical long distance specialist runs greater weekly mileage than the typical short distance specialist. A related question is whether long distance specialists are that way because they run high mileage, whether they run high mileage because they are long distance specialists, or some of both!

Finally, sets 11 (the original data, for comparison) and 12 illustrate the results of an experiment I did on the data. With only a bit over half the sample reporting a marathon, I suspect that many of those who did had run only one or two. I've never run one myself (did a 30K once), but from what I hear a lot can go wrong, especially on a first attempt. This suggests that the average marathon time in the sample may be "contaminated" by performances that were well short of the runners' potential. Personally, I think that if I were to try, I would be concerned only with finishing on a first attempt.

To "weed out" particularly bad marathon performances, I divided the marathon/10K ratio by the 10K/mile ratio and sorted the set by this index. By eliminating 33 of the highest-index runners, while exercising some choice in order to keep the 5K time and 10K/mile ratio of the remaining group constant, I produced a new data set (12). Of course this data is "doctored", but I think it's worthwhile to consider that a sample of runners having completed at least, say, 3 marathons each might be more like set 12 than the original set, 11. For comparison with empirical rules that I've heard before, the original data has a marathon/10K ratio of 5.17, while the "doctored" data's ratio is 4.98.
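The culling index is simple to compute. The sketch below (hypothetical names) ranks runners by the index and drops the highest-ranked ones; the hand selection used to hold the group's 5K time and 10K/mile ratio constant is not reproduced. It also recomputes the two marathon/10K ratios quoted above from the set 11 and 12 averages.

```python
def to_seconds(t):
    """Parse 'ss.s', 'm:ss' or 'h:mm:ss' into seconds."""
    secs = 0.0
    for part in t.split(":"):
        secs = secs * 60 + float(part)
    return secs


def marathon_index(r):
    """(marathon/10K) divided by (10K/mile): large values flag marathons that
    look slow relative to the runner's mile-to-10K profile."""
    mile, ten_k, mara = (to_seconds(r[k]) for k in ("mile", "10k", "mara"))
    return (mara / ten_k) / (ten_k / mile)


def weed_out(runners, n_drop):
    """Drop the n_drop highest-index runners (the hand selection that kept the
    group's 5K time and 10K/mile ratio constant is not reproduced)."""
    ranked = sorted(runners, key=marathon_index)
    return ranked[:len(ranked) - n_drop]


# Marathon/10K ratios of the averaged sets 11 and 12 quoted above
print(round(to_seconds("3:07:55") / to_seconds("36:19"), 2))  # → 5.17
print(round(to_seconds("3:00:42") / to_seconds("36:19"), 2))  # → 4.98
```

Dividing by the 10K/mile ratio keeps natural long-distance specialists from being flagged merely for having a fast marathon relative to their mile.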

I believe the 10 mile times are a bit too slow and that this is an artifact of the race being rarely run; only 7 of the total 115 listed a time for it and I wouldn't be surprised if even these had run it only once or twice. More will be said about this in the discussion of sources of error.

Comparison of Various Models

Numerous attempts have been made to model various aspects of the "running performance curve". Before discussing these in turn, it is important to clearly define the problem. As I see it, "running performance curves" fall into two general categories: (1) comparison of performances at different distances in general and (2) modeling of an individual's own running performance curves. It is the latter of these that we are chiefly interested in here, but the former warrants discussion as well since it is sometimes used as if it were the latter.

First of all, there are a lot of frequently encountered and well known instances of the first category. These include decathlon and heptathlon points, sets of qualifying standards for certain track and field competitions, Purdy points, and Hungarian points. Any set of performance records for a large group of people at a certain "level" is essentially what these efforts model. An example is the set of world records at various distances. At other levels one finds national records, state high school records, etc.

The second category refers to the set of performances by an individual. A bit of thought quickly shows this to be a different problem from the first, and with a different solution. If the two problems were the same then the world best marathon performer might well also be the world record holder in the 100 meter dash. While this is obviously not the case and the phenomenon of specialization is expected, it may be tempting to use Purdy points, etc. to predict performance at one distance from those at another. Actually, as we will see, this can work rather well as long as the distances being predicted are close to the one being "plugged in", at least if these distances are close to the runner's "best" distance. Still, individual performance predictions are simply not the purpose of these models.

There is actually a bit more to this issue. An individual's performance curve could change somewhat over time, especially if the training focus is shifted dramatically. An 800 meter specialist in high school might later shift to high mileage training to run marathons and thereby change the "tilt" of his or her performance curve. The set of all this individual's performances over time would then look a bit more like the Purdy point type models than would that person's performance curve at any one time. The degree to which this occurs depends on the "nature versus nurture" (nurture here being mainly training focus) balance of the act of running. I don't have enough data to comment very strongly on this, but my own experience was that a major increase in my mileage from about 5-10 miles a week to 40-50 mostly slower miles per week (from the ages of 16 to 18) produced only a small change in my profile. Instead of being best at the mile, I became best at perhaps 2-3 miles - less change in specialty than I would have thought. Aging may produce some changes, but I've yet to collect enough data to quantify this (maybe that's next!). Overall, I believe a runner's performance curve is a stable enough entity to be worthy of modeling.

Now we introduce the models:

1. Pace (inverse speed) linear with logarithm of distance: time = pace × distance, where pace = a + b log(distance), with a and b determined from two input times. A couple of interesting plots for one individual's data can be found on Kevin Krisciunas' web page.

2. Race time proportional to distance raised to a power: t = a distance^b, with a and b determined from two input times.

3. Race time proportional to distance raised to the 1.07 power (a specific instance of #2, found on Runner's World's web page and also, I think, on the Team Oregon Pace Wizard): t = a distance^1.07, with a determined from one input time.

4. Runpace, a model I've developed over the last few years in the form of a computer program. It can use from one to five input times along with other data, but here it is forced to fit through two input times.

5. Purdy points - not intended as an individual performance predictor, but included for comparison as it is sometimes used this way. A calculator for these and other such models can be found on Patrick Hoffman's web page.

6. Purdy1 points, a variant of the Purdy point scheme by Patrick Hoffman, fit to a different set of data (see Hoffman's web page for details).
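Models 1 through 3 are simple enough to sketch directly. The Python below is an illustration under assumptions: distances in meters, times in seconds, natural logarithms (the base does not change the fitted curve), and 42195 m for the marathon; the function names are hypothetical, and Runpace and the Purdy tables are not reproduced. Fitting models 1 and 2 to the 84-runner mile and 10K averages, and model 3 to the 5K average, lands within a few seconds of the predictions tabulated for the main group, with a fitted exponent close to the 1.097 quoted there; exact agreement isn't expected because the inputs are rounded averages.

```python
import math

MILE = 1609.344  # meters
MARATHON = 42195


def fit_log(d1, t1, d2, t2):
    """Model 1: pace = a + b*ln(distance) through two performances; returns t(d)."""
    p1, p2 = t1 / d1, t2 / d2
    b = (p2 - p1) / math.log(d2 / d1)
    a = p1 - b * math.log(d1)
    return lambda d: (a + b * math.log(d)) * d


def fit_power(d1, t1, d2, t2):
    """Model 2: t = a * d**b through two performances; returns (t(d), b)."""
    b = math.log(t2 / t1) / math.log(d2 / d1)
    a = t1 / d1 ** b
    return (lambda d: a * d ** b), b


def fit_power_fixed(d1, t1, b=1.07):
    """Model 3: t = a * d**1.07 through a single performance; returns t(d)."""
    a = t1 / d1 ** b
    return lambda d: a * d ** b


def hms(t):
    """Format seconds as h:mm:ss, or m:ss under an hour."""
    h, rem = divmod(round(t), 3600)
    m, s = divmod(rem, 60)
    return f"{h}:{m:02d}:{s:02d}" if h else f"{m}:{s:02d}"


# Averages for all 84 runners: mile 4:54 (294 s), 5K 17:07 (1027 s), 10K 36:19 (2179 s)
log_model = fit_log(MILE, 294, 10000, 2179)
power_model, power = fit_power(MILE, 294, 10000, 2179)
fixed_model = fit_power_fixed(5000, 1027)

print(f"fitted power: {power:.3f}")
for name, d in [("3K", 3000), ("5K", 5000), ("Marathon", MARATHON)]:
    print(f"{name:9s} log {hms(log_model(d))}  power {hms(power_model(d))}"
          f"  1.07 {hms(fixed_model(d))}")
```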

I'd like to comment on the inclusion of my own model in this analysis, which I intend to be fully objective. Since I do offer a version of my program for sale, I'm concerned that some might think its inclusion here smacks of advertising (there is a free version with the performance model fully functional, but with other features absent). To anyone with this impression I can only say that I wrote the program because I'm interested in this issue, and I did this study because I'm interested in this issue. If the program works, it's because of my past efforts in this area. If it didn't, I'd certainly try to change the model to fit the data, and not vice versa. On to the analysis!

Models 1, 2, and 4 are by nature curves through two given points; as input I chose the 1 mile and 10K times. Models 3, 5, and 6 draw a definite curve through one input point, for which I used the 5K time.

Runpace was used here in the "fit best two" mode, which forces a single solution up to nearly 10K. I also entered "24-year-old male", but this matters only with the "use all data" option and was irrelevant here. Over 10K, weekly mileage begins to have an effect, so I decided a priori to use the following weekly mileages:

main group of 84 runners: 40; group of 51 (bad marathons culled out): 40
slowest 25%: 30; slowest 50%: 35; fastest 50%: 45; fastest 25%: 50
shortest specialty 25%: 25; shortest 50%: 30; longest 50%: 50; longest 25%: 55

The predicted (and input) times for all distances are listed for each model. Below these predicted times are the net errors (using predicted minus actual) in seconds per mile. The first table shows errors for the entire group of 84 runners.

Again: 1 = log, 2 = general power, 3 = 1.07 power, 4 = Runpace, 5 = Purdy, 6 = Purdy1

Model  100m   200m   400m   800m   1mi    3K     5K     10K    10 mi    H.M.     Marathon

Act    13.6   26.9   59.0   2:11   4:54   9:49   17:07  36:19  1:02:01  1:23:18  3:07:55

1      12.9   28.5   62.4   2:15   4:54   9:44   17:03  36:19  1:00:55  1:21:42  2:52:48
       -11.3  12.5   13.6   8.1    ----   -2.5   -1.4   -----  -6.6     -7.3     -34.6

2      14.0   29.9   63.9   2:17   4:54   9:42   16:59  36:19  1:01:12  1:22:22  2:56:07
       5.8    23.6   19.9   10.6   ----   -3.6   -2.6   -----  -4.8     -4.3     -27.0

3      15.6   32.8   68.9   2:25   5:05   9:55   17:07  35:57  0:59:49  1:19:55  2:47:46
       32.4   47.1   39.9   26.6   11.4   3.2    -----  -3.6   -13.2    -15.5    -46.1

4      13.6   27.4   59.4   2:14   4:54   9:48   17:09  36:19  1:01:01  1:22:13  3:01:27
       -1.0   3.6    1.8    5.3    ----   -0.5   0.5    -----  -6.0     -5.0     -14.8

5      11.9   24.4   55.4   2:10   4:54   9:51   17:07  35:58  0:59:59  1:20:07  2:48:48
       -27.3  -20.8  -14.3  -2.8   -0.1   1.2    -----  -3.5   -12.2    -14.6    -43.7

6      11.9   24.6   55.0   2:08   4:59   9:58   17:07  35:26  0:58:47  1:18:53  2:53:21
       -28.2  -18.6  -16.1  -6.8   4.9    4.9    -----  -8.6   -19.4    -20.2    -33.3

The graph below should aid in visualizing the above data. I've left out Purdy1 as it is quite similar to Purdy and the graph was cluttered enough already!

(I've lost the original graphic and will place it here if I can find it)

Next we consider how to give a more overall view of the correspondence between each model and the actual data. I decided to evaluate mean absolute error and mean net error (both in seconds per mile, the latter in the rows below the former) for three groups of distances. The first group is 1 mile through 10K, inclusive, but excluding the input data. In other words, for models 1, 2, and 4 this means the 3K and 5K, and for 3, 5, and 6 it means the mile, 3K, and 10K. The second group is the entire range from 100 meters through the marathon, excluding the input data. The third is the same as the second, but with 100m, 200m, and 10 miles omitted because of the scarce and therefore somewhat unreliable data at those distances. Note that the power in the "general power law" fit, as determined by two points, is given in parentheses.
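The error bookkeeping can be sketched as below (hypothetical names). A net error in seconds per mile is the time difference divided by the race distance in miles; the summary statistics are plain means over each distance group, skipping a model's input distances. Fed the rounded per-distance errors of the log model from the table above, it gives 10.88 and -3.28 for 100m-mara, matching the summary that follows, and values within a few hundredths of it for the other two groups.

```python
from statistics import mean

MILE = 1609.344  # meters
DISTANCES = ["100m", "200m", "400m", "800m", "1mi", "3K", "5K", "10K",
             "10mi", "HM", "Mara"]
GROUPS = {
    "mile-10K": ["1mi", "3K", "5K", "10K"],
    "100m-mara": DISTANCES,
    "400m-mara (no 10 mi)": ["400m", "800m", "1mi", "3K", "5K", "10K", "HM", "Mara"],
}


def to_seconds(t):
    """Parse 'ss.s', 'm:ss' or 'h:mm:ss' into seconds."""
    secs = 0.0
    for part in t.split(":"):
        secs = secs * 60 + float(part)
    return secs


def net_error_per_mile(predicted, actual, distance_m):
    """Net error (predicted minus actual) in seconds per mile."""
    return (to_seconds(predicted) - to_seconds(actual)) / (distance_m / MILE)


def summarize(errors, group, inputs):
    """Mean absolute and mean net error over a distance group, skipping the
    model's own input distances."""
    used = [errors[d] for d in group if d not in inputs]
    return mean(abs(e) for e in used), mean(used)


# 1.07 power law at the marathon: 2:47:46 predicted vs 3:07:55 actual
print(round(net_error_per_mile("2:47:46", "3:07:55", 42195), 1))  # → -46.1

# Log model, all 84 runners: rounded per-distance errors; None marks the inputs
log_errors = dict(zip(DISTANCES, [-11.3, 12.5, 13.6, 8.1, None, -2.5, -1.4, None,
                                  -6.6, -7.3, -34.6]))
for name, group in GROUPS.items():
    mae, net = summarize(log_errors, group, inputs={"1mi", "10K"})
    print(f"{name:22s} ABS {mae:6.2f}  NET {net:6.2f}")
```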

For the main group of 84 runners, we obtain:

Distance group        Log         Gen Power   1.07 Power  Runpace     Purdy       Purdy1
                                  (here 1.097)
mile-10K:       ABS   1.97        3.12        6.07        0.49        1.56        2.51
                NET   -1.97       -3.12       3.64        0.03        -0.79       2.39

100m-mara:      ABS   10.88       11.36       23.89       4.27        14.02       16.41
                NET   -3.28       1.94        8.20        -1.79       -13.79      -15.32

400m-mara:      ABS   11.26       11.33       20.89       4.62        11.43       13.56
(no 10 mi)      NET   -4.03       -1.18       2.25        -2.11       -11.10      -18.84

For the "bad marathons weeded out" group of 51 runners, we get:

Distance group        Log         Gen Power   1.07 Power  Runpace     Purdy       Purdy1
                                  (here 1.097)
mile-10K:       ABS   1.88        3.04        6.11        0.59        1.64        2.61
                NET   -1.88       -3.04       3.46        0.25        -0.66       2.61

100m-mara:      ABS   7.89        8.34        21.16       1.93        11.71       13.83
                NET   -0.84       4.43        10.28       0.86        -11.41      -12.67

400m-mara:      ABS   7.33        7.83        17.75       1.68        8.38        10.52
(no 10 mi)      NET   -0.47       2.40        5.08        1.57        -7.96       -7.60

The second group's mile-10K data is almost identical to the first's, because they were matched to be similar in that range. Note the across-the-board improvement in the other two data sets, which include the "doctored" marathon times.

Considering each model in turn as applied to these large groups of runners, we see first that the log model has a very small mean net error, but does show significant mean absolute errors as it attempts to model the wavy data as a straight line. The general power law is somewhat similar, but not as good, because its slight concave-up shape on the log plot opposes the slight convex-up bend in the actual data in the 800 meter to 10 mile range.

The 1.07 power law has much too shallow a slope for this data, in addition to sharing the concave-up form that produces some of the general power law's error. This model's strength, namely its requirement of only one input performance, is also its weakness, as it will fit only runners in a narrow range of specialization. For the groups of runners above, the errors become very large as distances stray farther from the input performance.

Runpace has not only very small mean net error, but since its curves tend to match those of the data, it has by far the smallest absolute errors of all the models tested. Of course this is possible partly because the model is not nearly as simple as the others, which is why a program is needed to use it.

The Purdy and Purdy1 curves are not actually intended to model individuals' performances, as discussed earlier. Nevertheless, the fit is quite good over part of the curve. That part is the region immediately surrounding the runner's best distance, which in this case appears to be about 3K, though this of course depends on the particular population of competitors (e.g. a "long distance specialist" in a high school full of sprinters might be a 400-800 m runner in a school full of distance runners). Clearly, though, the Purdy curves, like any general performance curves, are definitely convex-up on the log plot and can match an individual only over a limited range. Since these curves have a different purpose anyway, they will be omitted from the rest of this analysis.

Next we consider the versatility of the different models as applied to runners of differing abilities and specialties. I will present here four data sets - those of the extreme quartiles for ability and for specialization. For brevity I show only the 1mi-10K and 100m-marathon mean absolute and net errors, omitting the large tables of errors for each distance.

First we look at the fastest (5K) 25% of the sample:

Distance group        Log         Gen Power   1.07 Power  Runpace
                                  (here 1.087)
mile-10K:       ABS   1.45        2.26        2.99        1.04
                NET   -1.45       -2.26       1.67        0.01

100m-mara:      ABS   7.89        7.32        12.79       3.99
                NET   -5.10       -1.37       2.43        -3.76

For the most part, the models do slightly better with this group than with the main group. The 1.07 power law is now at least reasonable, though it is still the least accurate of the four. The detailed data (not shown here) show that the 1.07 power curve is still too "flat", even for these runners, though it is definitely better for this group than it is for the main 84. Runpace is still the best in terms of absolute error, and shows virtually no net error from 1 mile through 10K, though the power laws now show the smallest net errors for 100m through the marathon (but with significant absolute errors).

Now look at the slowest (5K) 25% of the sample:

Distance group        Log         Gen Power   1.07 Power  Runpace
                                  (here 1.103)
mile-10K:       ABS   1.60        3.13        8.61        1.38
                NET   -1.60       -3.13       3.77        0.75

100m-mara:      ABS   14.36       17.74       36.02       8.03
                NET   -1.14       5.78        12.47       -0.12

Here, the log, general power, and Runpace models hold up about as before - just a bit more error in terms of seconds per mile, but considering the errors as a percentage of somewhat longer race times, the difference in performance between slower runners and faster ones is very small indeed. The order remains the same: Runpace best, log model next, and general power law third. The 1.07 power law is clearly inappropriate for these runners. The specific data have it predicting 80.3 seconds in the 400, versus the actual 65.7, and a 3:15:32 marathon versus the actual 3:43:03.

Next we consider the longest distance specialty (smallest 10K/mile ratio) 25%:

Distance group        Log         Gen Power   1.07 Power  Runpace
                                  (here 1.065)
mile-10K:       ABS   2.76        3.46        2.20        1.35
                NET   -2.76       -3.46       2.15        -1.35

100m-mara:      ABS   10.55       10.62       12.31       3.76
                NET   -3.09       0.14        4.19        -2.76

Here, the 1.07 power law is reasonable, on a par with the general power law (which here fit with a power of 1.065) and the still-somewhat-better log model. Runpace's errors are still much smaller than the others.

Finally we have the shortest distance specialty (largest 10K/mile ratio) 25%:

Distance group        Log         Gen Power   1.07 Power  Runpace
                                  (here 1.129)
mile-10K:       ABS   1.17        2.87        11.14       0.99
                NET   -1.17       -2.87       5.14        0.99

100m-mara:      ABS   11.19       12.27       35.83       5.66
                NET   -3.47       4.15        12.21       -0.08

The order of performance is the same as in all the other cases. The 1.07 power is not a good fit for these short distance specialists.

The nature of the errors (or at least the "mismatches" between the models and the data) can be understood with the help of the graph. The log model is of course a straight line on the graph, while Runpace is gently convex-up and the power laws are gently concave-up. This means that net errors (at least for the two-point input models) tend to be opposite in sign in between the mile and 10K as compared with outside this range.

Two striking features, which the simple models fail to match, are the upward curves at both the short (under 400 meters) and long (approaching the marathon) distances. On the short end, much of this is due to the effect of reaction and acceleration time in sprints. In fact, I'd suggest that these simple models could be extended better to the sprints by chopping a couple of seconds off the input, and then adding it back into the output. I've done essentially this for Runpace, and it's at least part of the reason it works well down into the sprint range.
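The sprint adjustment suggested here can be sketched as a wrapper around any two-point model. This illustrates the idea only and is not Runpace's actual correction: the wrapper name and the 2-second default (standing in for "a couple of seconds") are assumptions, and the log model is used merely as an example.

```python
import math

MILE = 1609.344  # meters


def fit_log(d1, t1, d2, t2):
    """Log-pace model through two performances; returns t(d)."""
    p1, p2 = t1 / d1, t2 / d2
    b = (p2 - p1) / math.log(d2 / d1)
    return lambda d: (p1 + b * math.log(d / d1)) * d


def fit_with_start_offset(fit, d1, t1, d2, t2, offset=2.0):
    """Remove a fixed start-up time (reaction plus acceleration) from both inputs,
    fit the model to what remains, then add the same offset back to every
    prediction. offset=2.0 s is an assumed value, not a calibrated one."""
    base = fit(d1, t1 - offset, d2, t2 - offset)
    return lambda d: base(d) + offset


# 84-runner averages: mile 4:54 (294 s), 10K 36:19 (2179 s)
plain = fit_log(MILE, 294, 10000, 2179)
shifted = fit_with_start_offset(fit_log, MILE, 294, 10000, 2179)
for d in (100, 200, 400):
    print(d, round(plain(d), 1), round(shifted(d), 1))
```

Because the offset is a fixed number of seconds, it is a large fraction of a sprint time but negligible over a mile or more, so it bends only the short end of the curve while both input performances are still reproduced exactly.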

The upward curve at the long end may be due largely to physiological phenomena such as glycogen depletion and even increased opportunity and likelihood of injury. As mentioned before, I suspect that some of the slowing in this range is due to these races being run relatively infrequently, and perhaps only once, thus missing out on chances to reach one's current potential. Runpace does have an empirical correction that largely produces the upward curve. The degree of this correction depends on weekly mileage, which is why some care was called for when running the program, even in the "fit best two" mode. Again, these values were decided in advance, not manipulated to fit the data.

Sources of Error

With at least three of the models fitting the data reasonably well, any comparisons among them depend critically on the integrity of the data. I'm well aware that this is not a rigorously controlled study, but rather an effort to squeeze information out of what was not originally intended as scientific data. A discussion of error is certainly in order!

First of all, there is the expected variation in performance due to the weather, how the runner felt that day, how hard he or she tried, and so on. The hope here is that, with the large amount of data, these "random" types of error pretty much even out. Random errors "conspiring" to throw the results off would likely do so in a way that could not be modeled in a reasonably simple way. The success of any models at all argues for the assertion that random errors here have only a small effect.

The effect of such errors becomes more significant, however, if the number of performances at a given distance was small. This was especially the case with the 100, 200, and to some extent 400 meter distances, and also with the rarely run 10 mile. In the case of the 10 mile, which could easily be compared with the more frequently run 10K and half marathon, I believe the times simply weren't the best these runners could do if they had tried it more often (this said without knowing how many attempts were made!). The 100, 200, and 400 could suffer from this effect as well, but since these are extrapolated outside the range of more reliable data, instead of lying between better established values as the 10 mile does, I can't be at all sure of it.

Perhaps more important than the issue of random error is that of systematic error. Such errors could cause fundamental "warping" of the "true" curve. In particular, consider the issue of whether the pace curve, as plotted against the log of the distance, should be essentially straight, concave-up, or convex-up. This question is best asked in the range of about 400 or 800 meters up through the 10K and possibly the half marathon. In this range, the data do show a slight convex-up nature, with times a bit slower than would be interpolated by the log model (and hence faster upon extrapolation), though only by 4 to 5 seconds in the 3K and 5K as interpolated between the mile and 10K.

Following is a list of sources of error, first those which may have made the curve from 1 mile to 10K bulge upward (convex-up, like Runpace) and then those that could cause it to sag in the middle (concave-up, like general power laws).

Tending to make convex-up:

1. Mile times were usually run on tracks, whereas 5 and 10K races are often road races or cross country courses, which may be hilly and thus slower, making the curve rise quickly from the mile to 5K and then level off somewhat.

2. We might speculate that some of the "mile" times were actually 1600 meter times. Because the 1600 is only about 9 meters short of a mile, it occasionally gets called "the mile" at high school track meets (I'm guilty of this when I get lazy!).

3. Performances may span several years, during which the runner may have had a chance to specialize to some extent in both short and long races, thus tending a bit towards general performance curves such as Purdy's.

Tending to make concave-up:

1. Runners make more attempts at their favorite distance, increasing the likelihood of "having everything fall into place" for a good performance relative to shorter and longer distances.

2. Even when runners do run distances shorter or longer than their favorite, they may make pacing errors due to the unfamiliar distance, thereby increasing their times relative to their races at intermediate distances.

3. Some 5 and 10K races may have been mismeasured. Even if overmeasurement is as likely as undermeasurement, chances are a runner will eventually encounter a short course and get a PR on it. Even properly measured courses often descend somewhat from start to finish. This will make the curve sag somewhat, especially if the 5K is run often.

Quantifying these effects is very difficult, especially when we lack details about what surface the races were run on, what the conditions were, etc. In a very few cases, I made some adjustments when specific information was provided; all of these were discussed in the methodology section and constitute a very small fraction of the data.

In the case of convex-up #1, most road race and cross-country courses are indeed "slower" than a track, sometimes a lot slower. Not all are, though, so a PR is likely to have been run on one of the "easier" courses, reducing what would otherwise be a major "warping mechanism".

An effort can be made here to quantify convex-up #2. Thirty-eight of the 105 "mile times" were actually converted from 1500's, which are in this respect "clean". With an average starting year of 1984, when high school track programs were rapidly converting from imperial to metric distances, perhaps half of the remaining 67 runners actually did run a full mile. Of the remaining 30 or 40 who actually ran 1600 meters, perhaps a bit less than half remembered to convert (by adding about 2 seconds). If roughly 20 of the 105 runners mistakenly gave 1600 times as "miles", our "mile" times would be fast by a bit less than half a second on average, not a major source of error.
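The arithmetic behind that estimate can be checked with a short Python sketch. The helper names are hypothetical, and the 20-of-105 count and the 2-second correction are the rough estimates from the paragraph above, not measured values.

```python
MILE_M = 1609.344  # metres in one mile


def extra_seconds_for_mile(t_1600):
    """Hypothetical helper: time to cover the last ~9.3 m of a mile
    at the average pace of a 1600 m run of t_1600 seconds."""
    return t_1600 / 1600 * (MILE_M - 1600)


def mean_bias(n_mislabeled, n_total, shortfall_s):
    """Hypothetical helper: average amount (seconds) by which the whole
    sample's "mile" times are too fast if n_mislabeled are really 1600's."""
    return n_mislabeled * shortfall_s / n_total


print(round(extra_seconds_for_mile(300.0), 2))  # a 5:00 1600
print(round(extra_seconds_for_mile(360.0), 2))  # a 6:00 1600
print(round(mean_bias(20, 105, 2.0), 2))        # 20 of 105 runners, 2 s each
```

A 5:00 or 6:00 1600 needs about 1.75 or 2.1 more seconds to finish the mile, bracketing the "about 2 seconds" above, and spreading 20 such 2-second errors over 105 runners gives roughly 0.38 seconds.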

Convex-up #3 must be considered as a potential source of error. A high school 800 meter runner who later changes training focus and intensity may provide times that could not all have been run in the same season. One's best 10K is not necessarily run in the same season as one's best mile. I believe this has some effect on the curve; however, as discussed earlier, I believe the effect is fairly minor, as genetics probably play a major role in determining specialty and, besides, many runners keep essentially the same training routine for most of their running careers (or at least haven't changed it much since that average starting date of 1984).

As for concave-up #1, I believe this has some effect on most people's profiles. The effect is lessened, though, when the number of races is large. For example, someone who has run the 5K twice and the mile and 10K only once each is very likely to have a sagging curve. Twenty 5K's with only 10 each of the mile and 10K probably has less effect, as after 10 races one's PR's tend to have stabilized fairly close to one's potential (given training) at that distance. At least some runners in the sample have run a lot of races, even at their "less favorite" distances.

Concave-up #2 likely has some effect, though, as with #1, the effect lessens quickly with even a modest amount of experience. Runners interested enough in PR's to post them on the internet may well tend to be those who have tried various distances enough to avoid beginner's mistakes.

Finally, concave-up #3 is reasonable, though perhaps most courses are well-measured. Downhill courses are not too uncommon, and may be advertised as "fast" courses on which to get qualifying times for other races. Whether or not a slight downhill or mismeasurement manages to outweigh the considerable effect of hills, though, is hard to say.

Arguments on either side regarding the track versus other surfaces problem should perhaps be considered in light of the likelihood that there is a gradual transition from the track to other surfaces as distance increases from the mile (almost always on a track), to the 3000 (still pretty much exclusively a track event), to the 5K (perhaps a road race or cross country for most, but commonly a track race in college, at all-comers meets, etc.), to the 10K (a classic road race distance, but still common on the track at the college level and above). The problems may still have effects, but I believe they will be mitigated somewhat, at least within the mile to 10K range.

All in all, it seems to me that the net result of these effects is quite small, though with the small differences between some of the models in the mile to 10K range, there's still a chance these effects could be decisive. For now, the most reasonable course seems to me to be to cautiously accept the data as is.

Aside from the question of error in the general curve, there is the question of likely error when using a model on a particular occasion to predict an individual's time. Such data is not available here, as only personal records, not particular performances, were provided. I did, however, analyze how well simple models predict an individual's PR's from his or her other PR's. This was practical only with models simple enough to put on the spreadsheet; Runpace was not included because I would have had to run it hundreds of times!

This type of error is perhaps best shown by the standard deviation of the errors. When predicting 5K times from the mile and 10K using the log model, the mean net error is only -4.5 seconds and the median error -7.2 seconds, but the standard deviation is 27.5 seconds. Similarly, when predicting the mile and 3K from the 800 and 5K, mean net errors were -2.8 and -5.4 seconds for the mile and 3K, respectively, with corresponding standard deviations of 3.3 and 6.6 seconds. Part of the improvement at these shorter distances is simply because less time is involved. Another factor may be that they are more likely to have been run on tracks.
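For anyone wanting to repeat this kind of check, here is a minimal Python sketch of the summary statistics using the standard `statistics` module. Two assumptions: an error is taken as predicted minus actual time (so a negative error means the model predicted too fast), and the standard deviation is the sample version; the paragraph above does not say which conventions were used. The function name and the numbers are hypothetical.

```python
import statistics


def error_summary(predicted, actual):
    """Hypothetical helper: summarize prediction errors in seconds.

    Assumes error = predicted - actual (negative means predicted too fast).
    """
    errors = [p - a for p, a in zip(predicted, actual)]
    return {
        "mean": statistics.mean(errors),    # mean net error: signs can cancel
        "median": statistics.median(errors),
        "stdev": statistics.stdev(errors),  # sample standard deviation (needs >= 2 errors)
    }


# Hypothetical 5K predictions vs. actual PR's, in seconds.
summary = error_summary([1000, 1010, 990, 1200], [1005, 1000, 1000, 1230])
print({k: round(v, 1) for k, v in summary.items()})
```

Reporting the standard deviation alongside the mean matters because positive and negative errors cancel in the mean net error, so a small mean can hide a large spread, as with the 5K figures above.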

Another experiment consisted of using log-based extrapolations and also simple multipliers to predict marathon time from mile and 10K. This time the extrapolations contained correction factors to make the mean net error zero, which was also the case for the simple multipliers since I got them by dividing the average marathon time by the average mile or 10K time.

The best of these was a simple multiplier of 5.173 times the 10K time, with a mean absolute error of 10.0 minutes. About the same, but interestingly no better, was a mile through 10K extrapolation, with a mean absolute error of 10.3 minutes. A multiplier of 38.342 times the mile gave a mean absolute error of 12.9 minutes. It would seem from this, then, that marathon times are inherently difficult to predict to within about 10 minutes, regardless of how good the model is in general.
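The simple-multiplier approach can be sketched in a few lines of Python. As described above, the multiplier is the ratio of the average target time to the average input time, which forces the mean net error to zero; the mean absolute error is then the figure of merit. The function names and sample data are hypothetical.

```python
import statistics


def fit_multiplier(inputs, targets):
    """Hypothetical helper: average target time / average input time
    (e.g. average marathon / average 10K). Makes the mean net error zero."""
    return statistics.mean(targets) / statistics.mean(inputs)


def mean_absolute_error(predicted, actual):
    """Hypothetical helper: average size of the errors, ignoring sign."""
    return statistics.mean([abs(p - a) for p, a in zip(predicted, actual)])


# Hypothetical 10K and marathon PR's, in seconds.
ten_k = [2400, 2700, 3000]
marathon = [12500, 14000, 15800]

k = fit_multiplier(ten_k, marathon)
predicted = [k * t for t in ten_k]
print(f"multiplier {k:.3f}, MAE {mean_absolute_error(predicted, marathon) / 60:.1f} min")
```

Applied with the 5.173 multiplier above, a 40:00 10K (2400 seconds) predicts 2400 × 5.173 ≈ 12,415 seconds, or about 3:26:55.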

Conclusions

Pace prediction, at least for large ensembles of individuals, even if grouped by ability or specialty, can be done with considerable accuracy. Simple, two-point input models may work very well, at least for interpolation between distances in the 800 meter to 10K range. Outside this range, they tend to fail, as changes in the performance curve may demand more complex models. An apparent convex-up "hump" in the performance curve (as viewed on a pace versus log of distance plot), though uncertain given possible error in the data, is modeled best here by the Runpace program, second-best by the straight line of the log model, and least accurately by the power-law model, which is slightly concave-up.

Single point models, such as the 1.07 power law, have the attraction of requiring only one input performance, but at the cost of considerably reduced accuracy for most runners, as by their very nature they assume what amounts to a particular distance specialty. Moreover, the 1.07 power law is appropriate only for about a quarter of the population of this sample, namely those who specialize at the longest distances. To a lesser extent, it is also somewhat appropriate for the fastest quartile of 5K runners. To be successful for most runners, a single point model needs some other input to help determine the runner's specialty. If one insists on a power law fit, the most appropriate one for the sample of 84 runners considered here is approximately the 1.097 power.
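Assuming the "1.07 power law" takes the usual single-point form, in which time grows as distance raised to the given exponent, a prediction from one performance looks like the Python sketch below (the function name is hypothetical). Since pace then scales as distance raised to the exponent minus 1, which curves upward against log distance, this form also shows why the power-law model is slightly concave-up on a pace versus log of distance plot.

```python
def power_law_time(t1, d1, d2, exponent=1.07):
    """Hypothetical helper: predict time at distance d2 from one performance.

    Assumes t2 = t1 * (d2 / d1) ** exponent; an exponent of 1.0 would
    mean constant pace at every distance.
    """
    return t1 * (d2 / d1) ** exponent


# Hypothetical 20:00 5K, projected to 10K with the two exponents discussed above.
for k in (1.07, 1.097):
    t = power_law_time(1200.0, 5000, 10000, k)
    m, s = divmod(round(t), 60)  # whole minutes and seconds
    print(f"exponent {k}: {m}:{s:02d}")
```

The larger exponent gives the slower projection (roughly 42:47 versus 41:59 here), reflecting the greater slowing with distance of the sample as a whole compared with the long-distance specialists.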

Some models, such as Purdy points, decathlon points, and many others, are not actually intended to predict an individual's running performance over a range of distances, but are simply ways of comparing performances (usually by different individuals) in general. Since no individual is likely to be equally good at all distances, these performance curves are strongly convex-up relative to an individual's curve, touching the individual's best performance at one point and then being faster at all other distances. The difference between these curves and an individual's own curve is small enough, though, that useful predictions can be made for distances in a moderately wide range surrounding the runner's best event.

While individual performance curves can be modeled quite well, it should be remembered that on any given day, such predictions can be far off. Errors may commonly be less than 5 seconds per mile for track events shorter than 5K, but 5K road race predictions (at least in the case of PR's predicted from other PR's) are often off by up to 30 seconds (about 10 seconds per mile). Marathons may be difficult to predict to within 10 minutes, or about 20-30 seconds per mile.
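The per-mile figures come from dividing each whole-race error by the race distance in miles. A trivial Python sketch (the helper name is hypothetical) makes the conversion explicit.

```python
KM_PER_MILE = 1.609344  # kilometres in one mile


def error_per_mile(error_s, distance_km):
    """Hypothetical helper: convert a whole-race time error (seconds)
    into seconds per mile."""
    return error_s / (distance_km / KM_PER_MILE)


print(round(error_per_mile(30, 5), 1))        # 5K off by 30 seconds
print(round(error_per_mile(600, 42.195), 1))  # marathon off by 10 minutes
```

These work out to about 9.7 and 22.9 seconds per mile, in line with the rough figures above.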

One last thought:

Predicted PR's don't count ... that's too easy! :-)

Note: This is a bit of a rough draft, and I still need to look over it. Comments, suggestions, or corrections are welcome! I may want to list more credits, recommended reading, etc. Aside from the web pages I've provided links to, one reference to note is:

Gardner, James B. and Purdy, J. Gerry. Computerized Running Training Programs. Los Altos, California: Tafnews Press, 1970.