AT Hiking Rates, Section by Section


map man
02-12-2006, 18:02
AT Hiking Rates, Section by Section
(updated in March 2008)

by map man (Steve Shuman)



I often see anecdotal estimates of how long it takes to hike various sections of the AT, sometimes as miles-per-day figures and sometimes as proportions (someone might say, "allow three days to hike in the White Mountains a distance that had taken two on other parts of the trail"). Since the advent of Trailjournals.com, though, we have access to hundreds of detailed accounts of AT thru-hikes, and since I'm a numbers nerd from way back I decided to see if I could verify with real numbers just how long the typical Thorough Journal Keeper (I'll call them TJKs from now on) takes to hike the AT and various sections within it.

So I went through many, many journals day by day for the 2001 through 2007 hiking seasons and ended up with a group of 173 TJKs who documented their hikes precisely: not only when they started and finished their thru-hike and when they passed certain relevant landmarks along the way, but even when and where they took their "zero days."

This study is limited to northbound thru-hikers (NOBOs) who completed their hike in one hiking season and passed ten section-defining landmarks along the way in south-to-north chronological order (though the study does include hikers taking occasional SOBO dayhikes within sections). The sample sizes are not yet large enough for me to make a meaningful study of SOBOs or flip-floppers, but someday I would like to replicate this study for SOBOs.

The section-defining landmarks I chose either have psychological significance (Harpers Ferry, often called the psychological halfway point, is an example of this) or mark a change in topography that can influence hiking rates (the Glencliff to Gorham section is an example of this). The points I chose are: Georgia Border, Fontana NC, Damascus VA, Waynesboro VA, Harpers Ferry WV, Delaware Water Gap (DWG) PA, Kent CT, Glencliff NH, Gorham NH and Stratton ME. Combined with Springer and Katahdin they mark off eleven distinct sections.

The study includes 7 NOBOs from the class of 2001, 17 from 2002, 24 from 2003, 33 from 2004, 24 from 2005, 38 from 2006 and 30 from 2007 (these were the only journals from these years detailed enough for this study). This is how the journal keepers broke down by gender: 121 were male, 26 were female, 25 journals were for a male and female hiking together and 1 was for two females hiking together (I count each journal as one hike for this study even if it is for multiple people). When calculating the distances between my landmark points I was aware that these distances sometimes changed slightly from year to year as the trail changes, so with the help of the AT Data Books for various years I calculated a weighted average for the distance of each section based on how many hikers were in the study for a given year.
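The year weighting described above can be sketched in Python as follows. The hiker counts per season are the study's own; the per-year section lengths are hypothetical placeholders, since the actual Data Book figures aren't listed here, and the function name is illustrative only.

```python
# Hikers in the study per season (counts given in the paragraph above).
HIKERS_PER_YEAR = {2001: 7, 2002: 17, 2003: 24, 2004: 33,
                   2005: 24, 2006: 38, 2007: 30}

def weighted_section_distance(distance_by_year, hikers_by_year):
    """Average a section's length across years, weighting each year's
    Data Book distance by how many study hikers came from that year."""
    total_hikers = sum(hikers_by_year[year] for year in distance_by_year)
    weighted_sum = sum(distance_by_year[year] * hikers_by_year[year]
                       for year in distance_by_year)
    return weighted_sum / total_hikers

# HYPOTHETICAL per-year lengths (miles) for one section, illustration only.
example_distances = {2001: 75.0, 2002: 75.0, 2003: 75.0, 2004: 75.0,
                     2005: 75.4, 2006: 75.4, 2007: 75.4}

print(round(weighted_section_distance(example_distances, HIKERS_PER_YEAR), 2))
```

Because more of the hikers come from the later seasons, the weighted figure leans toward the later years' distance rather than sitting at the plain average of the seven yearly values.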

The result of all this journal reading and number calculating is a DESCRIPTIVE study of a certain thru-hiking population (NOBO thorough journal keepers) and may or may not be representative of all thru-hikers (though I hope it's pretty close). This study is in no way meant to be a PRESCRIPTIVE analysis of how people OUGHT to hike the AT.

So with that said, let's get to the good stuff. Table 1 shows the average (mean) number of days it took these TJKs to hike each section. The first number is the days for that section, the second number is the running total for the hike, and the median number of days to hike each section is listed in parentheses after the description of that section. (The mean number of days for these TJKs to thru-hike was 167.7, while the median was 170.)


TABLE 1 -- Days to Complete Various Sections

DAYS ~~~ TOTAL DAYS ~~ SECTION (MEDIAN DAYS)
7.9 days..........(7.9)............Springer to Georgia Border (7.7 days)
7.8 days.........(15.7)...........Georgia Border to Fontana (7.7 days)
24.0 days.......(39.7)...........Fontana to Damascus (24 days)
28.7 days.......(68.3)...........Damascus to Waynesboro (28 days)
11.2 days.......(79.5)...........Waynesboro to Harpers Ferry (11 days)
19.4 days.......(98.9)...........Harpers Ferry to DWG (19 days)
12.4 days......(111.3)...........DWG to Kent (12 days)
23.3 days......(134.6)...........Kent to Glencliff (23 days)
9.6 days........(144.2)..........Glencliff to Gorham (9.5 days)
9.8 days........(153.9)..........Gorham to Stratton (9.5 days)
13.8 days.......(167.7)..........Stratton to Katahdin (14 days)


When I started this study I was not planning to count zero days, but it quickly became apparent that some sections were a lot more prone to hikers taking zero days than others. That in turn affected how long it took to hike each section and could make miles-per-day figures for the average TJK in a given section somewhat misleading. Did a section take longer to hike solely because of its difficulty, or were many tempting places to take zero days also playing a role? By counting how many zero days were taken in each section I could calculate both Miles Per Day (MPD) and Miles Per Hiking Day (MPHD). With the zero days stripped out (the MPHD figures), Table 2 shows a remarkably smooth, nearly linear progression over the first four sections of the trail as TJKs gradually increased their daily mileage. The table also shows that thru-hikers were slowed down by the rugged terrain of the White Mountains and western Maine, though perhaps not quite as much as legend suggests. The first number in this table is MPD (Miles Per Day) and the second number is MPHD (Miles Per Hiking Day). The weighted distance of each section follows the section description:


TABLE 2 -- Miles Per Day and Miles Per Hiking Day

MPD ~~~~~~~ MPHD ~~~~~~ SECTION (WEIGHTED DISTANCE)
9.5 miles..........(10.1 miles).........Springer to Georgia Border (75.2 miles)
11.3 miles........(12.0 miles).........Georgia Border to Fontana (87.6 miles)
12.3 miles........(14.0 miles).........Fontana to Damascus (296.0 miles)
13.5 miles........(16.0 miles).........Damascus to Waynesboro (388.0 miles)
14.4 miles........(16.8 miles).........Waynesboro to Harpers Ferry (161.3 miles)
13.9 miles........(16.9 miles).........Harpers Ferry to DWG (270.2 miles)
13.9 miles........(16.1 miles).........DWG to Kent (172.5 miles)
13.9 miles........(15.6 miles).........Kent to Glencliff (323.8 miles)
10.5 miles........(11.6 miles).........Glencliff to Gorham (100.6 miles)
11.3 miles........(12.7 miles).........Gorham to Stratton (110.1 miles)
13.6 miles........(14.6 miles).........Stratton to Katahdin (187.9 miles)
13.0 miles........(14.7 miles).........The entire AT (2173.2 miles)
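As a rough check on how MPD and MPHD relate, this Python sketch recomputes a few rows of Table 2 from the Table 1 day counts and the Table 5 zero-day percentages, assuming hiking days = days × (1 − %zero/100). Because every input is already rounded, the results match Table 2 only to within about a tenth of a mile (Glencliff to Gorham, for instance, comes out at 11.5 MPHD rather than 11.6).

```python
# (section, weighted miles from Table 2, mean days from Table 1,
#  percent of days that were zero days from Table 5)
SECTIONS = [
    ("Springer to Georgia Border", 75.2, 7.9, 5.4),
    ("Fontana to Damascus", 296.0, 24.0, 12.1),
    ("Glencliff to Gorham", 100.6, 9.6, 9.0),
    ("The entire AT", 2173.2, 167.7, 12.1),
]

def miles_per_day(miles, days):
    """MPD: zero days stay in the denominator."""
    return miles / days

def miles_per_hiking_day(miles, days, zero_pct):
    """MPHD: remove the zero days, leaving only days with trail miles."""
    hiking_days = days * (1 - zero_pct / 100)
    return miles / hiking_days

for name, miles, days, zero_pct in SECTIONS:
    print(f"{name}: MPD {miles_per_day(miles, days):.1f}, "
          f"MPHD {miles_per_hiking_day(miles, days, zero_pct):.1f}")
```

The same arithmetic recovers the mean zero-day count for the whole hike: 167.7 days × 12.1% ≈ 20.3 zero days.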


Here's the distribution of hikers grouped by the month they left Springer and the month they reached Katahdin:

1 hiker in this study left Springer in January
25 in February
102 in March (59%)
42 in April
3 in May

4 arrived at Katahdin in June
25 in July
44 in August
77 in September
23 in October
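Because the months are listed in order, the month holding the median hiker can be found by walking a running total up to the middle of the 173 hikers (the 87th). A short Python sketch with hypothetical helper names; it puts the median start in March and the median finish in September, consistent with the median dates in Table 3.

```python
MONTHS_LEFT_SPRINGER = [("January", 1), ("February", 25), ("March", 102),
                        ("April", 42), ("May", 3)]
MONTHS_REACHED_KATAHDIN = [("June", 4), ("July", 25), ("August", 44),
                           ("September", 77), ("October", 23)]

def median_month(counts):
    """Return the month containing the middle hiker, with months in
    chronological order (for 173 hikers, the 87th)."""
    total = sum(n for _, n in counts)
    middle = (total + 1) / 2
    running = 0
    for month, n in counts:
        running += n
        if running >= middle:
            return month
    raise ValueError("empty distribution")

def share(counts, month):
    """Percentage of all hikers falling in the given month."""
    total = sum(n for _, n in counts)
    return 100 * dict(counts)[month] / total

print(median_month(MONTHS_LEFT_SPRINGER), median_month(MONTHS_REACHED_KATAHDIN))
print(f"March starters: {share(MONTHS_LEFT_SPRINGER, 'March'):.0f}%")
```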

The first date in Table 3 is the median date each point was reached by the TJKs (same number of hikers arriving after this moment as before) and the second date is the mean date:


TABLE 3 -- Date Landmarks Were Reached

MEDIAN DAY ~~ MEAN DAY ~~~ LANDMARK
March 16............March 18...........Springer
March 24............March 26...........Georgia Border
April 1................April 3...............Fontana
April 26..............April 27..............Damascus
May 28..............May 26..............Waynesboro
June 9...............June 6...............Harpers Ferry
June 27.............June 25..............DWG
July 10..............July 8................Kent
Aug. 3...............July 31..............Glencliff
Aug. 13.............Aug. 9...............Gorham
Aug. 24.............Aug. 19..............Stratton
Sept. 8..............Sept. 2..............Katahdin!
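The mean dates can be cross-checked against Table 1: because a mean is linear, the mean arrival date at a landmark should equal the mean start date plus the mean number of days to get there (medians don't combine this way). This Python sketch assumes the Table 3 mean Springer date of March 18 and an arbitrary non-leap year. It reproduces every mean date in Table 3 to within one day; Waynesboro and Kent come out a day early, presumably because the mean start date is itself rounded to a whole day.

```python
from datetime import date, timedelta

# Mean cumulative days from Springer, taken from Table 1's running totals.
MEAN_TOTAL_DAYS = {
    "Georgia Border": 7.9, "Fontana": 15.7, "Damascus": 39.7,
    "Waynesboro": 68.3, "Harpers Ferry": 79.5, "DWG": 98.9,
    "Kent": 111.3, "Glencliff": 134.6, "Gorham": 144.2,
    "Stratton": 153.9, "Katahdin": 167.7,
}

# Mean Springer start date from Table 3; the year is arbitrary (non-leap).
MEAN_START = date(2006, 3, 18)

def mean_arrival(start, mean_days):
    """Mean arrival = mean start + mean elapsed days, rounded to a day."""
    return start + timedelta(days=round(mean_days))

for landmark, days in MEAN_TOTAL_DAYS.items():
    print(landmark, mean_arrival(MEAN_START, days).strftime("%b %d"))
```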


Of course no single hiker is "typical," and people will vary in their own ways from the 167.7 days (about five and a half months) it took these TJKs to get to Katahdin. But I think it can be useful for planning purposes to see the rate of progress these TJK thru-hikers experienced on their way there. One thing I discovered is that the proportions between faster and slower groups hold fairly steady from section to section: if a group of hikers in this study taking four months to thru-hike needs 20% less time overall than a group taking five months, 33% less than six-month hikers and 43% less than seven-month hikers, those same ratios tend to hold within each section of the thru-hike as well. With that in mind I calculated the number of days a "typical" TJK thru-hiker might have needed to reach each section landmark on four hypothetical hikes. Table 4 lists the "typical" number of days to reach each landmark for hikers taking 4 months (122 days), 5 months (153 days), 6 months (183 days) and 7 months (214 days) to thru-hike:


TABLE 4 -- Four Hypothetical Hikes

4-MONTH ~~ 5-MONTH ~~ 6-MONTH ~~ 7-MONTH ~~ LANDMARK
6 days...........7 days..........9 days..........10 days..........Georgia Border
11 days.........14 days........17 days.........20 days..........Fontana
29 days.........36 days........44 days.........51 days..........Damascus
50 days.........63 days........75 days.........88 days..........Waynesboro
58 days.........73 days........87 days.........102 days.........Harpers Ferry
72 days.........90 days........108 days.......127 days.........DWG
81 days.........102 days......122 days.......142 days.........Kent
98 days.........123 days......147 days.......172 days.........Glencliff
105 days.......131 days......157 days.......184 days.........Gorham
112 days.......140 days......168 days.......197 days.........Stratton
122 days.......153 days......183 days.......214 days.........Katahdin
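The proportional rule described above amounts to multiplying each Table 1 running total by (target days ÷ 167.7). This Python sketch does exactly that. It reproduces Table 4 to within one day at every landmark, so it is a close approximation but may not be precisely how the table was built.

```python
# Mean running totals from Table 1 (days from Springer to each landmark).
MEAN_TOTAL_DAYS = [
    ("Georgia Border", 7.9), ("Fontana", 15.7), ("Damascus", 39.7),
    ("Waynesboro", 68.3), ("Harpers Ferry", 79.5), ("DWG", 98.9),
    ("Kent", 111.3), ("Glencliff", 134.6), ("Gorham", 144.2),
    ("Stratton", 153.9), ("Katahdin", 167.7),
]
MEAN_THRU_HIKE = 167.7

def scaled_schedule(target_days):
    """Stretch or shrink the mean schedule so the whole hike lasts
    target_days, keeping each landmark at the same fraction of the hike."""
    factor = target_days / MEAN_THRU_HIKE
    return [(name, round(days * factor)) for name, days in MEAN_TOTAL_DAYS]

for months, target in [(4, 122), (5, 153), (6, 183), (7, 214)]:
    print(f"{months}-month hike:", scaled_schedule(target))
```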


Finally, I wanted to take a close look at the nature of "zero days," the days on which hikers log no AT miles (TJKs took a mean of 20.3 of them on their thru-hikes; the median was 19). I looked at both short-term breaks (1 or 2 days off the trail) and long-term breaks (3 or more consecutive days with no AT miles hiked). I hypothesized that hikers would need to take a lot of short-term breaks early on to cope with hiker's fatigue and with the sometimes nasty weather in the southern Appalachians in March and April, and that these breaks would become less frequent as hikers walked north. As Table 5 shows, it appears I was wrong: TJKs took very few of these zero days in the first two sections. My speculation now is that hikers take these short-term breaks largely because of the availability of trail towns and the concentration of hiker-focused shuttle services and hostels. For example, the section with the highest percentage of short-term zero days was Fontana to Damascus, with Hot Springs, Erwin and many famously hospitable hiker services along this stretch.

On the other hand, I hypothesized that long-term breaks (breaks of 3 or more consecutive days, when hikers often leave the vicinity of the trail completely) would be scarce early on, when the novelty of the experience alone might carry people forward, scarce again toward the end when the goal was so close, and more frequent in the middle of the journey. On this, it sure looks like I was right, as the "Long Term Break" percentages for each section in Table 5 show. In this table the first number is the percentage of days taken to complete a section that were zero days. The second and third numbers break those zero days into two groups: STBs (zero days taken in Short Term Breaks of 1 or 2 days) and LTBs (zero days taken in Long Term Breaks of 3 or more straight days):


TABLE 5 -- Zero Days

%ZERO DAYS ~ %STB ~~~ %LTB ~~~ SECTION
....(5.4%)..........(4.8%)........(0.5%)........Springer to Georgia Border
....(6.5%)..........(5.6%)........(0.8%)........Georgia Border to Fontana
....(12.1%)........(9.4%)........(2.6%).........Fontana to Damascus
....(15.2%)........(8.0%)........(7.2%).........Damascus to Waynesboro
....(14.1%)........(8.0%)........(6.1%).........Waynesboro to Harpers Ferry
....(17.6%)........(6.9%).......(10.7%)........Harpers Ferry to DWG
....(13.8%)........(7.8%)........(6.0%).........DWG to Kent
....(10.7%)........(6.9%)........(3.8%).........Kent to Glencliff
....(9.0%)..........(6.2%)........(2.7%).........Glencliff to Gorham
....(11.2%)........(8.9%)........(2.2%).........Gorham to Stratton
....(6.8%)..........(5.6%)........(1.2%)........Stratton to Katahdin
....(12.1%)........(7.4%)........(4.7%).........The entire AT
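The STB/LTB split is a run-length rule: consecutive zero days are grouped, runs of 1 or 2 count as short-term breaks and runs of 3 or more as long-term breaks. A minimal Python sketch over a hypothetical daily log (trail miles per calendar day within one section); it ignores the fractional days at landmarks described under Methodology.

```python
from itertools import groupby

def classify_zero_days(daily_miles):
    """Return (STB days, LTB days): zero days in runs of 1-2 consecutive
    days count as short-term breaks, runs of 3 or more as long-term."""
    stb = ltb = 0
    for is_zero, run in groupby(daily_miles, key=lambda miles: miles == 0):
        if not is_zero:
            continue
        length = len(list(run))
        if length >= 3:
            ltb += length
        else:
            stb += length
    return stb, ltb

# HYPOTHETICAL log: one single zero, one pair, and one 4-day break.
log = [12.0, 14.5, 0, 13.2, 15.0, 0, 0, 16.1, 0, 0, 0, 0, 17.3, 15.8]
stb, ltb = classify_zero_days(log)
total_days = len(log)
print(f"zero {100 * (stb + ltb) / total_days:.1f}%, "
      f"STB {100 * stb / total_days:.1f}%, LTB {100 * ltb / total_days:.1f}%")
```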


METHODOLOGY

If a hiker started on the approach trail to Springer and only went as far as the Springer Mountain Shelter I didn't count that as the first day of the thru-hike even though .2 miles of the AT were covered. Likewise, if a person got a ride to USFS 42 and walked the .9 miles to Springer and hiked no more of the AT that day I didn't count that as the first day of the hike either. I think these small partial days would distort the results for the first short section to the Georgia border. The day a hiker passes USFS 42 going north is the day I start the thru-hike clock ticking for the purposes of this study.

When a hiker reached one of my landmark points (for example, Waynesboro), I stopped the clock for that section (in this case, the Damascus to Waynesboro section) and started the clock for the next section. So any zero days that hiker took in Waynesboro are counted in the Waynesboro to Harpers Ferry section.

If a hiker passed a landmark, let's say DWG, and hiked on past it without stopping for the day, I divided that day into tenths. So if that day began at Kirkridge Shelter, 6.4 miles short of DWG, and ended at the "Backpacker Site" 4.8 miles past DWG, I counted six tenths of that day in the Harpers Ferry to DWG section and four tenths of that day in the DWG to Kent section.
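The split in that example is proportional to the miles walked on each side of the landmark, rounded to tenths (6.4 of 11.2 miles is about 0.57 of the day). A small Python sketch under that assumption; the rounding rule is inferred from the example rather than stated, and the function name is hypothetical.

```python
def split_day(miles_before, miles_after):
    """Divide one hiking day between two sections in proportion to the
    miles walked before and after the landmark, rounded to tenths."""
    before = round(miles_before / (miles_before + miles_after), 1)
    after = round(1.0 - before, 1)
    return before, after

# Kirkridge Shelter to DWG (6.4 mi), then DWG to the Backpacker Site (4.8 mi).
print(split_day(6.4, 4.8))  # (0.6, 0.4)
```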

In tracing hikers' progress through their journals I used every clue available to tally zero days and hiking progress. Some journals were very thorough, giving exact starting and ending points for each day, accurately logged mileage and separate entries for each zero day. These journals were easy to follow. But not all journals used in the study were this thorough. Some gave only starting and stopping points. Some only recorded mileage. Some were odd combinations of the two. Some left gaps when the hiker took zero days. Some recounted multiple days of hiking in one entry (and all of these oddities often meant that the "Stats" section shown for each journal at Trailjournals.com had inaccurate numbers for "zero days" and "hiking days"). As long as I could reconstruct what had happened, even if it took reading the entire text of multiple journal entries, I made every effort to do it. But if anything in a journal left me uncertain whether every stretch of trail was actually hiked, or which days were hiked and which were zero days, I did not include that journal in this study.

(For a more in-depth discussion of how data was gathered for this article, and a series of tables and illustrations going into more detail about different aspects of the data, as well as my responses to suggestions from White Blaze members with more knowledge of statistical methods than I have, see Post #28 in this thread. For information on which towns TJKs were most likely to take zero days, see Post #69. For preliminary findings on how the numbers for men and women compare, see Post #80. For a table correlating miles hiked per day with trail ruggedness, see Post #93.)


ACKNOWLEDGEMENTS

A big thank you to the folks who created and maintain Trailjournals.com and WhiteBlaze.net. These sites are a great service to the hiking community. I could not have done this study without Trailjournals and could not hope to share it with so many people without WhiteBlaze. And thank you as well to all the WhiteBlaze members who offered suggestions to improve this article or offered encouragement.

Roland
02-12-2006, 18:23
Wow! How incredibly time consuming this must have been. I'm going to have to read through it a few times to fully appreciate all the information you present.

Topcat
02-12-2006, 18:35
Map Man,
I hate to mooch off others' work, but I would love to see the raw data on this study. I do statistics for work and also teach it. This would be great for me to use, something different and interesting compared to the dry stuff in our curriculum. Thanks for the interesting analysis.

Kerosene
02-12-2006, 18:47
Whoa, someone's got a little bit of extra time on his hands!

This is quite interesting, map_man. I'll bet that the ATC might be interested in your analysis.

When all is said and done, though, I hope that no one (except potential record-setters) uses this to modify their thru-hike. A lot of the pre-planning goes out the window when you get an injury, hit a big storm, get off the trail with your new-found buds, forget that the post office is closed on Sunday, etc. They will, however, be able to classify themselves in retrospect and see how they stack up to "Joe Hiker".

Cuffs
02-12-2006, 18:59
That is one piece of phenomenal work! I do have a couple questions for you...

Do you have the numbers on the genders? And what about the age brackets?

I read many, many trailjournals, and those are the two things I look for. I'm looking to find people (women) in my age group (35-40) who are doing a thru. I find I can understand and empathize with them and learn a lot from them.

Whistler
02-12-2006, 20:33
Okay. Wow. What a cool study. Let's put this one in the Articles section. Great work, map man.

-Mark

Billygoatbritt
02-12-2006, 21:49
Simply awesome!

attroll
02-12-2006, 21:55
Wow, oh my goodness. I cannot imagine how much time was spent on this. Maybe this thread should be moved to the Articles section. What do you think, Doctari?

Tha Wookie
02-12-2006, 22:11
I think you're right, attroll. This is a good piece of work. Still, I'd like to hear more about the methods.

How were the averages weighted?

What did the distributions look like? Normal? How did you deal with outliers?

VERY COOL analysis, map man.... thanks for sharing your work. :sun

Alligator
02-12-2006, 23:19
Some thoughts.

The TJKs were grouped without regard to start date or start year. It hasn't been demonstrated that this is a reasonable assumption. There could be year-to-year differences in the means, or even start-month-to-start-month differences, that have been masked by taking an overall grand mean. Further, this grand mean may be biased if there are significant year-to-year or start-month-to-start-month differences.

Kudos for defining the sample population. It is a self-selected sample, not a random sample however. That bothers me, not necessarily anyone else. I will say though that while everyone who starts hopes to finish, the reality is that about 80% don't.
map man wrote: "The result of all this journal reading and number calculating is a DESCRIPTIVE study of a certain thru-hiking population (NOBO thorough journal keepers) and may or may not be representative of all thru-hikers (though I suspect it's probably close)."
It is also stated that journalers with ambiguous zero days were dropped, another subsetting. Just splitting hairs here :)


map man wrote: "But I think it can be useful for some hikers who are so inclined to get an idea of the rate of progress a typical hiker might experience on their way there."
I may be taking this out of context but ... the typical hiker fails. About 80-85% of them. The population that these numbers may apply to are successful thru-hikers. Drawing inferences is not possible, because starting out, no hiker really knows if they are going to finish. This is a descriptive study, as stated.


Regarding zero days: on average, by the results presented, it takes about 16 days to get to Fontana. IMO, that may not be enough time before the long reality of the journey sets in. In other words, injuries may not have seriously developed yet, mental fatigue may have yet to develop, the weather could still go bad, etc. I don't feel like the data is sufficient for the following:

map man wrote: "After having read a fair number of journals for my own enjoyment I hypothesized that hikers would need to take a lot of the short term breaks in their earlier days on the trail to cope with hiker's fatigue and with the sometimes nasty weather in the southern Appalachians in March and April, and that these short term breaks would lessen in frequency as a hiker walked north. Turns out I was dead wrong, as Table 5 shows."
The data could be available. The journals might give a reasonable idea as to the purpose of the zero day. "I was nursing a sore tendon", "I needed a beer", "I had to make repairs", etc.

Of course, standard errors or confidence intervals would greatly improve the understanding of the means :D. There is no description of the variability of these estimates.
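For readers unfamiliar with the terms: a standard error is the sample standard deviation divided by √n, and an approximate 95% confidence interval is the mean ± 1.96 standard errors. A Python sketch on hypothetical per-hiker section times (the raw data weren't posted in this thread); with a sample this small, a t multiplier (about 2.26 for 10 values) would give a somewhat wider interval than the z value used here.

```python
import math
import statistics

def mean_with_ci(values, z=1.96):
    """Sample mean, standard error, and an approximate 95% confidence
    interval using the normal (z) approximation."""
    n = len(values)
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(n)
    return mean, se, (mean - z * se, mean + z * se)

# HYPOTHETICAL days for ten hikers to finish one section.
section_days = [7, 8, 8, 9, 7, 8, 10, 6, 8, 9]
mean, se, (low, high) = mean_with_ci(section_days)
print(f"mean {mean:.2f} days, SE {se:.2f}, 95% CI {low:.2f} to {high:.2f}")
```

This is the kind of statement Alligator describes later in the thread: "8 days +/- 0.5" and "8 days +/- 3" share a mean but carry very different planning value.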

A very detailed analysis though and I applaud the effort:clap .

ARambler
02-13-2006, 01:02
Alligator wrote: "Some thoughts. ... A very detailed analysis though and I applaud the effort :clap"

Actually, I didn't understand many of the thoughts in between.
For me it is very important to look at only data for those who completed the trail.

I recalculated Table 4 for my 2004 hike. I took the number of days in Table 1, multiplied them by the fraction of days hiked, (100% - %zero)/100, and adjusted by the ratio (total non-zero days / my non-zero days). Even though the number of days I hiked was relatively far from the mean, I was only off from these calculated days by +/- 1 day. Also, almost all of the discrepancy can be explained by snow around Franklin and slowing down for the last section between Stratton and Katahdin. I was over a day faster for these two sections in 2005. (On the other hand, I lost a couple days to snow near Erwin and a couple days to tendonitis in NJ in 2005.)
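One reading of the recalculation described above, sketched in Python: convert each section's mean days to hiking days with the Table 5 zero percentage, then scale by the ratio of your own total hiking days to the study's mean total (about 147.4 hiking days, from 167.7 days with 12.1% zeros). Only three sections are included for brevity, and the 130-day hiker is hypothetical.

```python
# (section, mean days from Table 1, percent zero days from Table 5)
SECTIONS = [
    ("Springer to Georgia Border", 7.9, 5.4),
    ("Georgia Border to Fontana", 7.8, 6.5),
    ("Fontana to Damascus", 24.0, 12.1),
]
MEAN_TOTAL_DAYS = 167.7
MEAN_ZERO_PCT = 12.1

def personal_hiking_days(my_hiking_days):
    """Scale the study's mean hiking (non-zero) days per section by the
    ratio of your total hiking days to the study's mean total."""
    mean_hiking_total = MEAN_TOTAL_DAYS * (1 - MEAN_ZERO_PCT / 100)
    ratio = my_hiking_days / mean_hiking_total
    return [(name, days * (1 - zero_pct / 100) * ratio)
            for name, days, zero_pct in SECTIONS]

# HYPOTHETICAL hiker with 130 days of actual hiking.
for name, days in personal_hiking_days(130):
    print(f"{name}: {days:.1f} hiking days")
```

Your own zero days would then be added on top wherever you expect to take them.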

Was there a greater variation in non zero day pace for this last section? It seemed to me that half the hikers were putting their head down and charging to Katahdin and the other half were trying to keep the hike from ending.
Rambler

dje97001
02-13-2006, 01:26
Alligator, the value is in that looking at hikers who finish the whole thing may provide lessons as far as pacing (esp. at the beginning when you are worried about people passing you, or wondering whether you started a week or two too late to make it to ME in time) and setting more realistic approximations for mail-drop locations.

It might be interesting to examine those who don't finish and compare daily mileage... maybe they pushed themselves too hard too fast.... maybe they realized that they would never make it to Maine at the speed they were capable of... all of it is interesting stuff.

Yes, it is obvious this isn't a predictive model... but who cares? Consider the years he examined to be the population that exists. The means for the population may be significantly different than the means for those who don't complete the thru, those who don't journal online at trailjournals.com, those who completed in a previous year (not included in the analysis), or those who completed the hike by sections, etc. ... but he didn't claim this extended to those pops. Still, to worry about kurtosis or skewness in this case is pretty much a waste of time--most people don't even bother checking for that stuff anyway... they just live and die by the central limit theorem. Sure, an exceptionally rainy year may have slowed people early on (resulting in more zero days) but then really dry years may have resulted in faster paces (with fewer zeros). It should all even out (life is all about probabilities). We're only talking about 5 years (and ideally we'd have more) so there is likely to be a larger SE than you'd like, but without anything else to use, this is damn good stuff. You could always compare the 3 measures of central tendency to get a better idea of whether or not you have some outliers screwing with the data if you are really worried about it, but again, why bother, this is really interesting to chew on. Thanks map man!

Peaks
02-13-2006, 09:21
Lot of great work. Thanks Map Man.

Roland Mueser did a limited survey in 1988 of thru-hikers. His survey shows a mean of 174 days, and 24 zero days average. I'd say it's a close correlation. So, while many things have changed, your analysis shows that some things have not changed.

Tha Wookie
02-13-2006, 09:30
I think the last two folks who've responded to Alligator have missed his point that the data is good, but there really is no way of knowing how close to reality it is without understanding the nature of the data set. If you have a heavily skewed distribution, then mean averages can be very misleading. In those cases, the outliers could be dropped, the data could be weighted (he already said they were, but by what factor?) or the medians used in place of means.

Like what 'gater said, the sample population is what it is. It's like all the psychology studies that can only be extrapolated to college students, because they were the sample group.

All in all, valiant effort, and I think it can be made better to really solidify the results. Alligator isn't just being picky, but adhering to the assumptions of stats (normal distribution, random sample, etc.).

Interesting conversation.

camich
02-13-2006, 09:57
I think this is great. Thanks for all the time you put into it. I'm always happy to have additional information to help plan.:clap :clap :clap

dje97001
02-13-2006, 10:14
I get all of that. I think we all do. But you know as well as I do that very few studies actually use anything other than convenience samples (which are non-random--I'm not talking about random assignment here) because of cost, time and difficulty in compiling the true pop list. So basically we all assume normality, again unless we look at the skewness (or how flat or peaked the distribution is), which no one does. I'm sure you could do it, but again, I'm not sure what good it would do. Just compare the mean to the median... the closer they are to each other the less likely skew exists.

But let's be honest, we aren't doing significance tests on this data, nor ANOVAs nor Correlations nor anything else for that matter. If you wanted a study that could be published in a journal you probably want to worry about these things--yet again, this is a content analysis not necessarily subject to the same issues of experimental research (samples for CA are often non-random). Frankly with a sample size of 105 (unless map man included Squeaky) there aren't likely to be substantial outliers (again, these people are already outliers from the "normal" population... most people wouldn't walk over 2000 miles). I think that this compilation of data is awesome in and of itself and really doesn't need anything else.

I probably over-reacted, but academics (who have had sufficient training in stats and methodology) commonly do what Alligator did: finding potential flaws/holes and making it apparent that if the flaws did exist then any conclusions would be shaky at best and then finishing it up by saying something nice about the effort--no knock on Alligator, I've seen it a million times. While it can be beneficial in improving the research, it also can be perceived as really jerky (esp. to people who aren't in academia). It is always easier to critique a study than to conduct one yourself. My apologies, Alligator, if my comment came across jerky.

Anyway, none of that matters to anyone outside of grad students in quantitative programs, faculty who are obsessed with statistics and methodology and people who review/edit quantitative journal submissions (this last group is comprised of people who had previously been in the other groups).

Alligator
02-13-2006, 10:29
dje97001 wrote:
1. "Alligator, the value is in that looking at hikers who finish the whole thing may provide lessons as far as pacing (esp. at the beginning when you are worried about people passing you, or wondering whether you started a week or two too late to make it to ME in time) and setting more realistic approximations for mail-drop locations."
2. "Yes, it is obvious this isn't a predictive model... but who cares?"
IMO, the above two statements are contradictory. But, referencing 1., that's why a confidence interval/s.e. would be useful. If it says that it takes a mean of 8 days +/- 0.5 days that would be helpful. If however, it says it takes 8 days +/- 3 days, that creates a different situation. See?

dje97001 wrote: "It might be interesting to examine those who don't finish and compare daily mileage... maybe they pushed themselves too hard too fast.... maybe they realized that they would never make it to Maine at the speed they were capable of... all of it is interesting stuff."
Sure.
dje97001 wrote: "Yes, it is obvious this isn't a predictive model... but who cares? Consider the years he examined to be the population that exists."
It is most certainly a sample and not the population.
dje97001 wrote: "The means for the population may be significantly different than the means for those who don't complete the thru, those who don't journal online at trailjournals.com, those who completed in a previous year (not included in the analysis), or those who completed the hike by sections, etc. ... but he didn't claim this extended to those pops."
I fully understand that. But if there are actual differences, saying the mean is the same for all groups (even just thruhiker groups) masks what may be important underlying differences. Map man is making a serious effort. I understand that, that is why I gave what he presented a serious review. Items placed up for Articles are subject to review. It makes them better. It's extremely common to have things reviewed. Relax, I didn't mark the article rejected for review box:cool: .

dje97001 wrote: "Still, to worry about kurtosis or skewness in this case is pretty much a waste of time--most people don't even bother checking for that stuff anyway... they just live and die by the central limit theorem. Sure, an exceptionally rainy year may have slowed people early on (resulting in more zero days) but then really dry years may have resulted in faster paces (with fewer zeros). It should all even out (life is all about probabilities)."
But if there are differences, that evening out may not have any meaning.
dje97001 wrote: "We're only talking about 5 years (and ideally we'd have more) so there is likely to be a larger SE than you'd like, but without anything else to use, this is damn good stuff. You could always compare the 3 measures of central tendency to get a better idea of whether or not you have some outliers screwing with the data if you are really worried about it, but again, why bother, this is really interesting to chew on. Thanks map man!"
I didn't mention any distributional problems. Given his 105 samples, I wouldn't expect serious problems with his means. Also, I actually liked that he presented the medians as an alternative. He should be cognizant of any extreme outliers though. Further, I only asked for the SE, I haven't commented on how large it may or may not be.

Sure, a lot of effort was put into it. It may certainly be reasonable to use. But care needs to be taken to ensure that any underlying biases are at least considered and hopefully controlled for. For instance, wouldn't it be interesting to know if there are age differences among hikers and if the sample was representative age-wise? Someone previously mentioned they pick a hiker similar to themselves to compare to. And wasn't there a really wet year on the AT in that pool of data? Do Feb. starters really complete the trail, on average, in the same time as April starters? It is important to consider factors such as these and not to immediately discount them.

It is good stuff! I'd consider using it.

dje97001
02-13-2006, 10:35
WRT the "contradiction"... I was making the statement that it wasn't predictive in the sense of an academic model. But there are massive differences between theory and practice. Theoretically, you can't make causal assumptions about correlational data (unless you've taken care of all of those pre-requisites, i.e. temporal ordering, etc.)... but practically? I would definitely use this data to "predict" where I will be at x days out... especially since this is the best compilation of numbers that I've seen.

carolinahiker
02-13-2006, 10:41
I'm going to section hike from Erwin, TN to Hot Springs, NC in May. Has anyone done the section lately, and what's it like trail-wise? Thanks.

Rick

Alligator
02-13-2006, 11:30
I get all of that. I think we all do. But you know as well as I do that very few studies actually use anything other than convenience samples (which are non-random--I'm not talking about random assignment here) because of cost, time and difficulty in compiling the true pop list. So basically we all assume normality, again unless we look at the skewness (or how flat or peaked the distribution is), which no one does. I'm sure you could do it, but again, I'm not sure what good it would do. Just compare the mean to the median... the closer they are to each other, the less likely it is that skew exists.

BTW, skewness refers to the distribution's symmetry: heavy tails on the right, heavy tails on the left. Kurtosis refers to "peakedness".
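The mean-versus-median rule of thumb and the skewness/kurtosis definitions above can be checked numerically. Below is a minimal Python sketch, assuming one total-days value per hiker in a list; the function name `describe_shape` is hypothetical, and it uses the simple moment-based formulas without small-sample corrections, so software that applies those corrections will report slightly different values.

```python
import statistics

def describe_shape(days):
    """Return (mean, median, skewness, excess kurtosis) for a list of per-hiker day counts."""
    n = len(days)
    mean = statistics.fmean(days)
    median = statistics.median(days)
    # Central moments divided by n (simple textbook estimators, no small-sample correction)
    m2 = sum((d - mean) ** 2 for d in days) / n
    m3 = sum((d - mean) ** 3 for d in days) / n
    m4 = sum((d - mean) ** 4 for d in days) / n
    skewness = m3 / m2 ** 1.5            # negative: longer left tail; positive: longer right tail
    excess_kurtosis = m4 / m2 ** 2 - 3   # 0 for a normal curve; positive: heavier tails / sharper peak
    return mean, median, skewness, excess_kurtosis
```

If the mean and median are close and the skewness is near zero, the distribution is roughly symmetric; a mean noticeably below the median with negative skewness points to a long left tail.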

And I conduct my own research for applied science all the time.

dje97001
02-13-2006, 11:55
Yeah thanks pal. I know that (lepto, platy, meso...). I was giving you options... consider it a grammatical mistake.

The point is, whether you believe it or not, you and I probably agree on 98+% of this crap. Save possibly your stance on standardized vs. unstandardized (correlation vs. covariation) or maximum likelihood vs. least squares assumptions. I agree with your statements from a research standpoint. The funny thing is that those who haven't spent much time in stats/methods don't realize that these debates can be just as intense as the "hammock vs. tent" or even the dreaded "purist" debate.

The point I'm trying to make--one that Chris made to me a while ago (but it took a while to accept)--is that there are already too many numbers for most people to spend much time on. Hikers understand that their mileage may vary. Confidence intervals, while very useful in gaining more precision (most of the time extremely desirable), in this case will simply obscure the value of these numbers to most people (i.e. they don't want to know that there is a 95% chance of them making it from Springer to the Georgia border in 7.25 to 8.45 days); they just want the best guess, for which the mean (or median) should suffice. Yes the pace for april starters may be different from march or feb starters for only the first 2 sections... but map man didn't ask for a critique. He didn't even ask that it be placed in the articles section. Such a detailed criticism without prompting will only make it less likely for people to share potentially valuable information. I for one, think that in its present form it is definitely of value to the hiking community.



That set of numbers map man listed is complicated enough.

Alligator
02-13-2006, 13:51
Yeah thanks pal. I know that (lepto, platy, meso...). I was giving you options... consider it a grammatical mistake.
You're welcome.
...
The point I'm trying to make--one that Chris made to me a while ago (but it took a while to accept)--is that there are already too many numbers for most people to spend much time on. Hikers understand that their mileage may vary. Confidence intervals, while very useful in gaining more precision (most of the time extremely desirable), in this case will simply obscure the value of these numbers to most people (i.e. they don't want to know that there is a 95% chance of them making it from Springer to the Georgia border in 7.25 to 8.45 days); they just want the best guess, for which the mean (or median) should suffice.
Most estimates that people give are ranges of days. As an example, 5-7 days to finish the Smokies. I'd want to know what this range is if I were a slow hiker, so I could plan on eating the last day. Of course YMMV; that's why a confidence interval or even a range is much better than a point estimate. I think most folks can understand 6 days give or take a day. Those interested could look to the right of the estimate or ignore it. In a similar vein, the estimate of 167.8 days would be vastly improved by saying it took them on average 168 days +/- 21 days. (I made the interval up.) This certainly would give an impression that there is a lot of variability. In particular, being risk averse, I would want the upper bounds. Then a hiker could have a conservative, reliable estimate as to how much time, money, and supplies are necessary for the journey.

[Note to map man: Actually, what I think would be better is to take, say, the 10th and 90th percentiles for the time it takes to do a section, along with the median. A confidence interval for the mean still relates to the mean, but the 10th and 90th percentiles would give you a good idea of the range yet would exclude extreme outliers. It would also be distribution free.]
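The 10th percentile / median / 90th percentile summary suggested in the note is easy to compute once the per-hiker numbers for a section are in a list. A minimal sketch using Python's standard `statistics.quantiles`; `percentile_summary` is a hypothetical helper name, and the "inclusive" method is only one of several interpolation conventions, so a spreadsheet may give slightly different values for small samples.

```python
import statistics

def percentile_summary(days):
    """Return the (10th, 50th, 90th) percentiles of a list of per-hiker day counts."""
    # quantiles(n=10) returns the 9 decile cut points: index 0 is the 10th percentile,
    # index 4 the 50th (the median), index 8 the 90th.
    # method="inclusive" treats the list as the whole group rather than a sample of a larger one.
    deciles = statistics.quantiles(days, n=10, method="inclusive")
    return deciles[0], deciles[4], deciles[8]
```

Reported per section, these three numbers give a typical value plus a range that ignores the most extreme tenth at each end, without assuming any particular distribution.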

Yes the pace for april starters may be different from march or feb starters for only the first 2 sections...but map man didn't ask for a critique. He didn't even ask that it be placed in the articles section. Such a detailed criticism without prompting will only make it less likely for people to share potentially valuable information. I for one, think that in its present form it is definitely of value to the hiking community.

That set of numbers map man listed is complicated enough.
As far as I know, MM posted this in the Articles section. The forum where this thread is currently located is not the finished articles section. Once an article gets feedback, it gets elevated to the completed articles section. I don't see your view of MM's intent as being correct. Further, within limits, any topics placed on the site are open to discussion, it is a public forum. Now, if you could, please give MM a chance to speak if he so chooses. Thanks pal. I'm so happy I have a new buddy!

map man
02-14-2006, 01:35
I assumed there would be people here at WhiteBlaze with more knowledge of statistical methods than I have and I'm feeling pretty good that some of you are giving me some thoughtful advice on my proposed article. After thinking about what dje97001, Alligator and Tha Wookie in particular have had to say I've spent some time this evening calculating some medians to incorporate in the study. I've already edited my article to add the median for total days hiked and total zero days taken and I've calculated the median for days taken to hike each section, though I'm still debating how best to include those figures in the article. And it's getting too late in the evening for me to think clearly about it at the moment.

I also incorporated a suggestion of ALHikerGal to mention the gender breakdown of the 143 hikers in the study and that, too, I've already edited into the article. Ages of the hikers is impossible to know because just like here at WhiteBlaze, not everyone at Trailjournals chooses to reveal their age. I'm not going to break down the hiking rates by gender in the article because the number of female hikers in the study at this point is not high enough to make the numbers meaningful, I think.

I've got to be candid with Alligator and Tha Wookie -- I don't know how to calculate confidence intervals and though I know what an "outlier" is, I don't know the statistical methods for figuring out just how outlandish (wink, wink) an oddball bit of data needs to be to throw it out. And no, this is not an invitation for anyone to give me a crash course. I'm thinking over Alligator's idea for giving the 10th and 90th percentile values for hiking days per section as a way of dealing with extreme numbers at either end, because that is something I do know how to do. But I'm still thinking about it. I'll post more on this in the next day or two.

Finally, Topcat's interest in seeing my raw data is something I've also been thinking about. Right now the data is written out by hand (in very small print) on several tally sheets, but I've known all along that it would probably be a good idea to convert this stuff to an electronic spreadsheet of some kind, and this just provides extra motivation (but as of March 2008, I still haven't done it:rolleyes: ).

And by the way, I should mention that my original posting of the article was indeed in the "Articles Forum" because I intended from the beginning for it to be an article if it passed muster. But since three or four of the first posters said something like, "hey, this should go in the articles section," I can understand why it wasn't clear to some that that was my intent. Anyway, every post that I've seen that has made suggestions for the article has in my view been in the spirit of wanting to see the article be as good as it can be, and for that I'm thankful. Actually, the thing I feared most was that after my months of work the article might be greeted with utter indifference, and it's clear from all the responses and views the thread has gotten in a little over 24 hours that this isn't the case.

dje97001
02-15-2006, 05:15
It appears my perception was in error. I still stand by my statement that too many numbers will only confuse the issue--but I'm willing to accept that I am in the minority on this. So, since map man doesn't mind, feel free to edit away. Enjoy!

Austexs
02-15-2006, 05:32
Wow!

Great post, Map Man.

:cool:

Bilko
02-15-2006, 09:25
map man. Thanks for the work. As a section hiker I often looked at different journals and tried to figure out how long it took them to hike certain sections. I can actually see how my section hikes fit into a thru-hike. Your work allows us to see how long it took the people that were able to document their achievements. I enjoyed the study greatly. I liked the way you broke it into the 11 sections, and I enjoyed looking at the tables and your explanations of how and why you counted days the way you did.
How long did it take you? Did you make copies of all the journals? Did some of the journals seem unlikely to have occurred? Your next assignment: find out the common reasons people drop out before the end of the first section or by Fontana. My guess is improper food and dehydration. However, you may never actually find out the reason, which may be best.

ARambler
02-15-2006, 18:11
0) I seem to be the only poster who has actually used your data. I was going to complain about all of the bandwidth wasted by those who say "the users are too stupid to use so much data" or "I'm so smart I can't use your data, unless you provide the gene sequence on chromosome 18 for each hiker who starts at Springer." However, I see the updates you have made so far are really good. Keep up the good work. I'll get back to how I've used and would like to use the data.

1) You have two types of very useful quantitative data: How people hike, and how people don't hike. My zero days were sporadic, and did not correlate well with your averages. So, as I posted earlier, I looked at my Hiked Days versus your hiked days. I calculated your hiked days by multiplying your mean number of days by (100-%zero)/100. I get essentially the same numbers by taking (miles/section)/your miles per hiked day. This is my calculation:

Section                       Mean days  % zero days  Hiking days
Springer to Georgia border         7.95         5.50         7.51
Georgia border to Fontana          7.71         4.60         7.36
Fontana to Damascus               24.34        13.00        21.18
Damascus to Waynesboro            28.60        15.30        24.22
Waynesboro to Harpers Ferry       11.32        15.10         9.61
Harpers Ferry to DWG              19.32        17.00        16.04
DWG to Kent                       12.32        13.80        10.62
Kent to Glencliff                 23.11        10.00        20.80
Glencliff to Gorham                9.60        10.20         8.62
Gorham to Stratton                 9.79        10.00         8.81
Stratton to Katahdin              13.77         6.60        12.86
-----------------------------------------------------------------
Total                            167.83        12.00       147.69

(Hiking days = Mean days x (100 - % zero days) / 100)
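The calculation in the table can be reproduced directly from its first two columns. A minimal Python sketch; the helper name `hiked_days` is hypothetical, and the eleven values are copied from the table in section order.

```python
# Mean days per section and % zero days per section, copied from the table above
mean_days = [7.95, 7.71, 24.34, 28.60, 11.32, 19.32, 12.32, 23.11, 9.60, 9.79, 13.77]
pct_zero = [5.50, 4.60, 13.00, 15.30, 15.10, 17.00, 13.80, 10.00, 10.20, 10.00, 6.60]

def hiked_days(days, pct):
    """Days actually spent hiking = total days * (100 - %zero) / 100."""
    return days * (100 - pct) / 100

per_section = [hiked_days(d, p) for d, p in zip(mean_days, pct_zero)]
for d, p, h in zip(mean_days, pct_zero, per_section):
    print(f"{d:6.2f} {p:6.2f} {h:6.2f}")

# Total row: the overall 12 % zero rate applied to the 167.83-day total
print(f"{sum(mean_days):.2f} total days, {hiked_days(sum(mean_days), 12.00):.2f} hiking days")
print(f"sum of per-section hiking days: {sum(per_section):.2f}")
```

Summing the eleven per-section figures gives about 147.63 rather than 147.69, because the day-weighted average of the section percentages works out to about 12.04 %, slightly above the overall 12.00 % used in the total row.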


2) I like to use the hiking days data separately. Also, especially from a statistical point of view, you should not combine hiking and not-hiking days without testing for independence. What does a simple plot of the number of zero days versus the number of days hiked look like? I would guess that the slope below the mean would be more than proportional, i.e. cutting the days hiked in half, from 148 to 74, would reduce the zero days by more than half (from 20 to fewer than 10). (This would presumably be an extrapolation of the line.) However, at the upper end, the slope might be less than proportional: a hypothetical hiker doing 148 + 74 = 222 hiked days might have to reduce the number of zero days to complete the hike before winter. Therefore, Table 4 could be off a little.

3) Similarly, I'm a little reluctant to build a Table 4b, for a "typical Hiking Days" number for 4, 5, 6, and 7 months. I would want to know whether the sub-130 day people started in good physical shape (and aggressive mentality) and sped up by the same or less percentage as others, or the sub-130 day people started the same or slightly faster and sped up significantly more than the average. I'd have similar concerns about 180+ day hikers speeding up in Maine to beat the snow. Note, "Table 4b" methodology worked very well for me.

4) Skewed distributions: Thanks for the detailed mean versus median data. It is interesting to see how the mean catches up to the median. I guess the data show that the people who start very late have to catch up relative to the median.
It does not surprise me that the total days distribution is skewed to the left and consequently the median is higher than the mean. Arguments similar to 2) above would make me believe that it is significantly more likely that a hiker would finish in 122 days (168 - 46) than in more than 214 days (168 + 46). (This may just be that 4 month people brag more than 7 month people.) This feature of the data will make it difficult to use statistical tests that rely on normality for the probability of finishing in a given time. So what? Hikers should worry about being in the 20 % of starters who finish, not about being in the top 10 % (2 % of starters) or about having enough time to be in the bottom 10 % of that 20 %.

5) Outliers: I'm surprised outliers did not seem to be a concern to you. In 2005, Apple Pie left the trail in Erwin for about 50 days. This is over half of the LTB you report for the Fontana to Damascus section. Similarly, Stumpknocker took almost 365 days to hike the trail in 2004, but he hikes at a 4 month pace. (Hippy LS also did 360+ days but I don't think she had a complete TJournal. FB & Silver Girl took > 80 days off but their journal was not on Trailjournals.) I don't have much of a point to make about outliers, they are a part of life and a part of life on the AT. However, I think they are more a factor affecting zero days, and that's another reason for separating out zero days.

6) Variability:
a) By far the most common expression of variability is the standard deviation, or variance = std. dev. squared. It should be calculated using a spreadsheet or the standard deviation function on a programmable calculator. If you have to calculate it by hand:
Std.Dev^2 = [sum of each (day squared) - n*Ave^2]/(n-1) = [sum(Di*Di) - 2,957,525]/104; where Di = total days for each hiker, i, and 2,957,525 = 105 hikers*167.83*167.83 average days. For the Hiked Days it would be: Std.Dev^2=[sum(di*di) - 2,290,295]/104; Note, 105*147.69*147.69 = 2,290,295.
I hope you have the Excel function for sample standard deviation.
b) The easiest calculation for variability is the range: just the longest minus the shortest number of days. I believe one must assume a normal distribution to convert the range to an (unbiased) variance (std. dev.^2).
c) The other commonly used expression for variability is a confidence range. The most common is a 95 % interval, which for a normal distribution is about plus or minus 2 std. dev. from the mean. I recommend against using these confidence intervals with such skewed distributions. Note, the 95 % range leaves 2.5 % of hikers faster and 2.5 % slower. Since the normal assumption makes the estimates symmetric, the interval is often expressed as Average +/- Interval/2, e.g. 168 +/- 30 days for a std. dev. of about 15 days.
d) You could report the actual percentiles as a pseudo-confidence interval. Just find the number of days that only 2.6 hikers (2.5 % of 105) took longer than, and the number that only 2.6 hikers beat. What has been proposed instead is reporting the 10th and 90th percentiles. For the section data, I think you will find it difficult to interpolate between whole days for the 10th/90th percentile numbers, which in my mind are arbitrary, non-standard percentiles for a pseudo-confidence interval. The numbers might be highly dependent on how many people hiked through the section boundary on one day.
e) If you remove the variability associated with the zero days, you might be able to give a good representation of hiking variability just by reporting aggregate data. This data might also be the easiest to understand and use in a statistics-free way. I propose to aggregate the data for each section into five groups. Because the section distances vary by such a large amount, the widths of the groupings should also vary; I suggest that each of the five groups span m = 1, 2, or 3 days. You would then report 8 values per section: g1.days, m, n.g1, n.g2, n.g3, n.g4, n.g5, Slow. I'm not sure whether g1.days should be an integer marking the start of the first interval. Assuming that it is, you would get numbers like:
5, 1, 12, 21, 32, 19, 11, 2. For the first section, 12 hikers would reach the GA line in 5.0 to 5.9 days, 21 hikers in 6.0 to 6.9 days, 32 hikers in 7.0 to 7.9 days, 19 hikers in 8.0 to 8.9 days, 11 hikers in 9.0 to 9.9 days, and 2 hikers in more than 9.9 days (optional). By subtraction, 105 - (12+21+32+19+11+2) = 8 hikers took less than 5.0 days. The relative distribution for the Damascus to Waynesboro section will not be exactly the same, but if it were, the data would be reported as 17, 3, 12, 21, 32, 19, 11, 2, and the groupings would be: 17 to 19.9 days, 20 to 22.9 days, 23 to 25.9 days, 26 to 28.9 days, and 29 to 31.9 days. Slow hikers would look at this raw data, see that 11 in 105 needed 9.0 to 9.9 days of food to reach the GA border and 29 to 31.9 days to get from Damascus to Waynesboro, and would plan on packing that amount. (Hopefully, not all at once.)
f) I will be very interested if the variability is significantly different for the first couple and last couple of sections.
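Items a), c) and e) above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names are hypothetical, `normal_range` assumes a normal distribution (which, as c) warns, these skewed data do not follow), and the sample numbers are made up only to exercise the functions. For reference, the constant in a) checks out: 105 x 167.83^2 is about 2,957,525.

```python
import statistics
from statistics import NormalDist

def sample_variance_shortcut(days):
    """a) Hand formula: [sum(Di^2) - n * mean^2] / (n - 1)."""
    n = len(days)
    mean = sum(days) / n
    return (sum(d * d for d in days) - n * mean * mean) / (n - 1)

def normal_range(mean, sd, coverage=0.95):
    """c) Symmetric range mean +/- z*sd holding `coverage` of a normal distribution."""
    z = NormalDist().inv_cdf(0.5 + coverage / 2)  # about 1.96 when coverage is 0.95
    return mean - z * sd, mean + z * sd

def group_counts(days, g1, m, groups=5):
    """e) Count hikers below g1, in each bin [g1 + k*m, g1 + (k+1)*m), and at or above the top edge."""
    counts = [0] * groups
    below = slow = 0
    for d in days:
        if d < g1:
            below += 1
        elif d >= g1 + groups * m:
            slow += 1
        else:
            counts[min(int((d - g1) // m), groups - 1)] += 1
    return below, counts, slow

sample = [160, 170, 180, 150]  # made-up totals, not study data
print(sample_variance_shortcut(sample), statistics.variance(sample))
print(normal_range(168, 15))
print(group_counts([4.5, 5.0, 5.9, 6.2, 7.0, 9.95, 10.0, 12], g1=5, m=1))  # → (1, [2, 1, 1, 0, 1], 2)
```

The hand formula agrees with `statistics.variance`, but it subtracts two large, nearly equal numbers and can lose precision, so a built-in sample-variance function is the safer choice. `normal_range(168, 15)` gives roughly 168 +/- 29.4 days, the "168 +/- 30" figure in c).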
Rambler

domnokmis
02-15-2006, 20:07
While it can be beneficial in improving the research, it also can be perceived as really jerky (esp. to people who aren't in academia). It is always easier to critique a study than to conduct one yourself. I thought he was being pretty jerky, myself. And I'm not in academia, so you are obviously correct. Perhaps if I were more insulated from the practical, I could be more picky.

But as far as I can tell, he applied the study to something the author did not extend it to, then said, it can't be used for this purpose. Duh.

Besides, you can take ANY study and trash it as he did.

Different years might make a difference? Sure, so the author correlates them by year. Ah, but he didn't combine dry years and rainy years, did he? So he does. Ah, but one of his rainy years was really a dry year with a hurricane that skewed the numbers. So he puts it with dry years. Ah, but it was wet by definition of # of inches of rain. So the author puts it with the wet years. Ah, but it was a dry year with a hurricane.

So the author drops the year in question.

AH HA! Now you are selecting data!!!!!! Bad bad bad.

Jerky, as you said.

Great study, good to use to see how your hike is lining up with those who made it to the end.

Maybe sometime someone can make up a similar one about re-supply, if academicians can get over the use of caches and hiker boxes.

map man
02-15-2006, 23:42
(I'm planning to use this post as the one place where I answer questions about the procedures I used in my AT Hiking Rate study, as well as more detailed information about the resulting data [info that in my judgment might bog down the article], and my responses to suggestions from WhiteBlaze members with expertise in statistical methods.)

A couple of the questions so far have dealt with how I collected the raw data, so I will talk a little about that here. First, I did not have to read through every entry of every journal. As I mentioned in the article, I only looked at journals with at least 70 entries (the number of entries is included on the page that lists the journals for each year), and that eliminated most of the journals right away. When I did look at a journal, I first skipped ahead to the last entries to see whether the hike ended at Katahdin. If it did, I looked back at the first entry to see whether it started at Springer. If it did, I quickly scrolled through the list of journal entry dates looking for any gaping gaps, and where I found one I quickly looked at the entries on either side of it to see whether trail was skipped or that part of the hike was not detailed thoroughly enough for my study. These quick steps eliminated a whole lot more journals. Only then did I go back to the start of the journal and begin tallying, day by day, things like zero days and the dates that landmarks were passed.

When doing this, if again the journal keeping was not thorough enough to reveal the info I wanted to collect, most of the time this revealed itself fairly quickly and I could move on to the next journal. Sometimes I would get a long way into the hike before the journal omitted info for a section and when this happened I would just have to shrug my shoulders and move on.
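The quick screening steps described above can be written as a filter. This is a hypothetical sketch, not map man's actual process (which was done by hand): the `Journal` record and `screen` function are made up, the 70-entry minimum comes from the article, and the 7-day gap threshold is an assumption, since the post only speaks of "gaping gaps." Journals with a large gap are flagged for reading by hand rather than rejected outright, matching the description.

```python
from dataclasses import dataclass
from datetime import date

MIN_ENTRIES = 70   # from the article
MAX_GAP_DAYS = 7   # assumption: the post does not define a "gaping gap"

@dataclass
class Journal:
    """Hypothetical summary of one online trail journal."""
    entries: int
    first_place: str
    last_place: str
    entry_dates: list[date]  # dates of the journal entries, in order

def screen(journal: Journal) -> str:
    """Return 'reject', 'check gaps by hand', or 'tally', following the quick checks in order."""
    if journal.entries < MIN_ENTRIES:
        return "reject"
    # Last entries first (ended at Katahdin?), then the first entry (started at Springer?)
    if journal.last_place != "Katahdin" or journal.first_place != "Springer":
        return "reject"
    dates = journal.entry_dates
    if any((later - earlier).days > MAX_GAP_DAYS for earlier, later in zip(dates, dates[1:])):
        return "check gaps by hand"  # read the entries on either side of each gap
    return "tally"
```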

I kept two tally sheets for each journal. On one tally sheet I wrote down three things for each landmark a hiker reached: the date the landmark was reached, the number of days passed since the last landmark and the cumulative number of days for the whole hike up to that point. It looked something like this:

NAME OF HIKER ~~~ GEORGIA BORDER ~~~~ FONTANA etc. etc.
John Doe (March 1)......March 8 (8) [8]...............March 19 (11) [19]........
Jane Doe (March 12)....March 18 (6.7) [6.7]........March 25 (7.3) [14]......
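The whole-day numbers on this first tally sheet follow from the landmark dates. A minimal sketch, assuming (from the John Doe line) that the start date counts as day 1; `section_days` is a hypothetical name, the year 2006 is assumed for the example, and fractional days like Jane Doe's 6.7 cannot be recovered from dates alone.

```python
from datetime import date, timedelta
from itertools import accumulate

def section_days(start, landmark_dates):
    """Whole days per section and the running total, counting the start date as day 1."""
    # Step back one day so the first section includes the start day itself
    boundaries = [start - timedelta(days=1)] + list(landmark_dates)
    per_section = [(b - a).days for a, b in zip(boundaries, boundaries[1:])]
    return per_section, list(accumulate(per_section))

# John Doe: started March 1, Georgia border March 8, Fontana March 19
print(section_days(date(2006, 3, 1), [date(2006, 3, 8), date(2006, 3, 19)]))  # → ([8, 11], [8, 19])
```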

I would write very small so I could get an entire year's worth of hikers on one sheet of paper. On the second tally sheet I would keep track of zero days taken in each section. It would look something like this:

NAME OF HIKER ~~ GEORGIA BORDER ~~~~~~~ FONTANA
John Doe.................1,1.....(2) [0] {2}..................1,4,1...(2) [4] {6}.....
Jane Doe..........................(0) [0] {0}..................1........(1) [0] {1}.....

In this case John Doe took two one-day breaks in the first section and then a one-day, a four-day, and another one-day break in the next section. In each section, the number in parentheses is the total days taken in short term breaks, the number in square brackets is the total days taken in long term breaks, and the number in curly braces is the grand total of zero days. I would only fill in these bracketed numbers when I had gotten to the end of that hiker's journal.
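The bracketed totals on the second tally sheet follow mechanically from the list of breaks taken in a section. A minimal sketch with a hypothetical `tally_breaks` helper, using the study's definitions: a short term break is one or two zero days, a long term break is three or more straight zero days.

```python
def tally_breaks(breaks):
    """Split one section's breaks (lengths in zero days) into (short, long, total)."""
    short = sum(b for b in breaks if b <= 2)   # one- or two-day breaks
    long_ = sum(b for b in breaks if b >= 3)   # three or more straight zero days
    return short, long_, short + long_

print(tally_breaks([1, 4, 1]))  # John Doe, second section → (2, 4, 6)
```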

Now for those who want more detail about the distributions of the data, here are some illustrations in the form of primitive histograms. In each case I set the bin boundaries before tabulating the data, to try to prevent bias. First, here's the distribution of the days taken to complete the AT. On the left are the ranges for number of days, and the number in parentheses that follows is the number of hikers who fall in that range (in the illustrations that follow, if there are any outliers that are not practical to illustrate, I list them without graphics on the tail of the data they belong in):

(The following five illustrations have been updated in March 2008 to include the 2001 through 2007 hiker classes.)


ILLUSTRATION 1 -- Days to Complete AT

080-089 (01): X
090-099 (01): X
100-109 (03): XXX
110-119 (06): XXXXXX
120-129 (06): XXXXXX
130-139 (05): XXXXX
140-149 (14): XXXXXXXXXXXXXX
150-159 (19): XXXXXXXXXXXXXXXXXXX
160-169 (30): XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
170-179 (32): XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
180-189 (19): XXXXXXXXXXXXXXXXXXX
190-199 (21): XXXXXXXXXXXXXXXXXXXXX
200-209 (10): XXXXXXXXXX
210-219 (04): XXXX
220-229 (01): X
230-239 (01): X


This is the distribution of the days actually spent hiking the AT (covering at least one tenth of a mile), excluding zero days:


ILLUSTRATION 2 -- Hiking Days to Complete AT

080-089 (01): X
090-099 (02): XX
100-109 (07): XXXXXXX
110-119 (06): XXXXXX
120-129 (12): XXXXXXXXXXXX
130-139 (21): XXXXXXXXXXXXXXXXXXXXX
140-149 (40): XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
150-159 (34): XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
160-169 (30): XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
170-179 (14): XXXXXXXXXXXXXX
180-189 (05): XXXXX
190-199 (01): X


This is the distribution of the total number of zero days taken during the course of hiking the AT:


ILLUSTRATION 3 -- Total Zero Days Taken

00-04 (08): XXXXXXXX
05-09 (25): XXXXXXXXXXXXXXXXXXXXXXXXX
10-14 (20): XXXXXXXXXXXXXXXXXXXX
15-19 (38): XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
20-24 (35): XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
25-29 (18): XXXXXXXXXXXXXXXXXX
30-34 (13): XXXXXXXXXXXXX
35-39 (06): XXXXXX
40-44 (05): XXXXX
45-49 (00):
50-54 (02): XX
55-59 (00):
60-64 (01): X
65-69 (01): X
70-74 (01): X


Here's the distribution of days devoted to Short Term Breaks (breaks of only one or two consecutive zero days):


ILLUSTRATION 4 -- Zero Days Taken in Short Term Breaks

00-01 (03): XXX
02-03 (06): XXXXXX
04-05 (09): XXXXXXXXX
06-07 (15): XXXXXXXXXXXXXXX
08-09 (25): XXXXXXXXXXXXXXXXXXXXXXXXX
10-11 (25): XXXXXXXXXXXXXXXXXXXXXXXXX
12-13 (24): XXXXXXXXXXXXXXXXXXXXXXXX
14-15 (19): XXXXXXXXXXXXXXXXXXX
16-17 (12): XXXXXXXXXXXX
18-19 (12): XXXXXXXXXXXX
20-21 (12): XXXXXXXXXXXX
22-23 (04): XXXX
24-25 (03): XXX
26-27 (01): X
28-29 (02): XX
37 (01)


Here's the distribution of days devoted to Long Term Breaks (zero days of at least three straight days):


ILLUSTRATION 5 -- Zero Days Taken in Long Term Breaks

00-00 (44): XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
03-04 (31): XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
05-06 (27): XXXXXXXXXXXXXXXXXXXXXXXXXXX
07-08 (11): XXXXXXXXXXX
09-10 (14): XXXXXXXXXXXXXX
11-12 (11): XXXXXXXXXXX
13-14 (09): XXXXXXXXX
15-16 (05): XXXXX
17-18 (04): XXXX
19-20 (02): XX
21-22 (03): XXX
23-24 (05): XXXXX
25-26 (02): XX
27-28 (01): X
40 (01)
43 (01)
49 (01)
61 (01)

Alligator suggested I calculate the 10th and 90th percentile figures for the number of days to complete each section, as one tool for judging whether the distribution of hiking times was screwy for this population of hikers (thorough journal keeping thru-hikers). I've calculated them, and here is a table with the mean days to complete each section, the median, and the range of days taken to complete each section excluding the fastest and slowest 10 percent of hikers for that section. In addition, I'm including the range of days to complete each section for the hypothetical 4 and 7 month hikes (a fast hike and a slow one) that are referenced in Table 4 of the article. I created these hypothetical hikes by the statistically crude method of a simple ratio: if a seven month hiker took 1.271 times longer than the mean of 168.4 days to hike the whole trail, then that hiker would take 1.271 times longer to hike each section too. In Table 4 of the article I give only cumulative day totals rounded to whole days, but in this table I use section totals rounded to tenths. I'm including them here to show that they're so darn similar to the 10th and 90th percentile figures:

(Tables A, B, C, and D are for the years 2001-2005 only; Table E is for 2001-2007)


Table A -- Range of Days to Complete Each Section

MEAN ~ MEDIAN ~MID 80% ~~~4 MO-7 MO ~~SECTION
7.95.........8.0........5.7-9.7..........5.8-10.1........Springer-Ga. Border
7.71.........7.7........5.7-9.7..........5.6-9.9..........Ga. Border-Fontana
24.34........24.........19-31...........17.7-31..........Fontana-Damascus
28.60........28........21.5-36.........20.8-36.5........Damascus-Waynesboro
11.32........11..........8-14............8.2-14.4........Waynesboro-Harpers
19.32.......18.5......14.3-23..........14.0-24.7.......Harpers-DWG
12.32.......12.5.......8.6-15...........9.0-15.7........DWG-Kent
23.11........23........18.3-29.........16.8-29.5........Kent-Glencliff
9.60..........9..........6.5-13...........6.9-12.2........Glencliff-Gorham
9.79.........9.9........7.3-12...........7.2-12.5........Gorham-Stratton
13.77........14.........11-16...........10.0-17.5.......Stratton-Katahdin
167.83......172.......137-197.........122-214.........For entire AT
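Both kinds of range in Table A can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: `mid_80` uses the standard library's `statistics.quantiles` with `method="inclusive"`, which is only one of several percentile conventions and not necessarily the one used for the table, and `scaled_section` is a hypothetical helper implementing the simple ratio described above.

```python
import statistics


def mid_80(days):
    """10th and 90th percentiles of a list of day counts.

    Other percentile conventions can shift these bounds slightly.
    """
    deciles = statistics.quantiles(days, n=10, method="inclusive")
    return deciles[0], deciles[-1]


def scaled_section(section_mean, total_mean, hike_days):
    """Ratio method: a hike k times the average length is k times longer in every section."""
    return section_mean * hike_days / total_mean


# Springer-Ga. Border scaled to a 4-month (122-day) and a 7-month (214-day)
# hike, using the 168.4-day mean quoted above.
print(round(scaled_section(7.95, 168.4, 122), 1))  # 5.8
print(round(scaled_section(7.95, 168.4, 214), 1))  # 10.1
```

For the first section this reproduces the 5.8-10.1 shown in Table A; other rows need not match exactly when recomputed from Table A's rounded means.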


Alligator felt that stating a realistic range for each section, excluding the extremes on either end, might be a better way to describe typical progress for the TJKs in my study, and more useful for hikers than a single flat number, whether that number is the mean or the median (I hope I'm stating his opinion correctly here). My thinking is that hikers can get an idea of these realistic ranges by looking at the Table 4 hypothetical hike values for each section, which are very, very similar to the 10th and 90th percentile ranges but in a form that's more intuitive for them. If readers with concerns about statistical methods want more depth than I give in the article proper, I can always mention at the end of the article that some of that information is available in this thread, which will be accessible through the "Released Articles Forum" if the day comes when this article is accepted. This is an idea I'm entertaining anyway, and I'll be curious to see what others think.

ARambler and Alligator suggested I take a closer look at "Hiking Days," so here is a table similar to the one above, but for Hiking Days (excluding zero days) instead of Total Days. The table includes mean hiking days to complete each section, median, the entire range of days TJKs took for each section (least days and most), and the "Middle 80%" range figure excludes the ten percent of hikers taking the most and least days to hike:


Table B -- Range of Hiking Days to Complete Each Section

MEAN ~~MEDIAN ~~RANGE ~~~~~MID80%~~~SECTION
7.52.........(7.7)........(3.6-11.3)........(5.5-9)..........Springer-Ga. Border
7.36.........(7.3)........(3.3-10)..........(5.7-9.3)........Ga. Border-Fontana
21.17........(21)........(11.4-28).........(17-26)..........Fontana-Damascus
24.22........(25).........(15-33)...........(20-28).........Damascus-Waynesboro
9.61.........(10).........(5.9-15)..........(7.2-12).........Waynesboro-Harpers
16.04........(16)........(9.8-22.4)........(13-19)..........Harpers-DWG
10.62........(11).........(6-13.5)..........(8.3-13)........DWG-Kent
20.80........(21).........(11-28)..........(16.4-25)........Kent-Glencliff
8.62.........(8.5)........(4.2-12)...........(6-11)..........Glencliff-Gorham
8.81..........(9)..........(4.1-15)...........(7-11)..........Gorham-Stratton
12.87........(13)..........(7-20)...........(10.7-15).......Stratton-Katahdin
147.64......(150).......(85-199).........(121-171).......For entire AT


I need some time to digest a lot of what is in ARambler's second post before I respond to it, but Table A, above, might shed some light on a question he had in his first post. He wondered if the distribution of days to hike the last section spread out from previous sections as some hikers got VERY determined to get the hike over with while others wanted to linger so the experience wouldn't come to an end. I know what he means from reading people's journals. But to my surprise the range figure in Table A that shows the days to hike the last section seems to show the opposite. The percentage difference between the 10th percentile hiker and 90th percentile hiker in that section is smaller than in any other section of the AT.
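The comparison above can be checked directly from Table A's mid-80% column. A minimal sketch, assuming "percentage difference" means how much longer the 90th percentile hiker took than the 10th percentile hiker, i.e. (90th - 10th) / 10th; the helper name `spread_pct` is hypothetical.

```python
# Mid-80% bounds (10th, 90th percentile days) copied from Table A.
MID_80 = {
    "Springer-Ga. Border": (5.7, 9.7),
    "Ga. Border-Fontana": (5.7, 9.7),
    "Fontana-Damascus": (19, 31),
    "Damascus-Waynesboro": (21.5, 36),
    "Waynesboro-Harpers": (8, 14),
    "Harpers-DWG": (14.3, 23),
    "DWG-Kent": (8.6, 15),
    "Kent-Glencliff": (18.3, 29),
    "Glencliff-Gorham": (6.5, 13),
    "Gorham-Stratton": (7.3, 12),
    "Stratton-Katahdin": (11, 16),
}


def spread_pct(lo, hi):
    """How much longer, in percent, the 90th percentile hiker took than the 10th."""
    return (hi - lo) / lo * 100


# List sections from the tightest spread to the widest.
for section, (lo, hi) in sorted(MID_80.items(), key=lambda kv: spread_pct(*kv[1])):
    print(f"{section:22s} {spread_pct(lo, hi):5.1f}%")
```

Stratton-Katahdin comes out first at about 45%, while every other section is above 55%.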

LostInSpace asked about the relationship between start date and the length of time taken to hike the AT. The following table breaks TJKs into five groups based on starting date. Unlike the histograms above, where I set the bins before tabulating, here I tallied the start dates before deciding how to group them: there were several gaps in the distribution of start dates on the calendar where it seemed natural to place group boundaries. It shouldn't be surprising that thru-hikers starting later in the season took less time to complete the trail, since with Baxter State Park closing in mid-October they had little choice. The quickness of the very earliest group, though, takes a little explaining. My belief, based on anecdotal evidence, is that a lot of novice hikers cluster around the March 1 and March 15 start dates for one reason or another, while many hikers leaving before then, knowing they have a lot of true winter hiking ahead of them, are more likely to be veterans, and it seems likely that veteran hikers as a group take less time to complete the trail than novices. That's one idea, anyway. In Table C I give both the mean and median number of days for each group to complete the trail (the number of hikers in each group is given in parentheses after the date range):


Table C -- Time to Complete Grouped by Start Date

DATE RANGE ~~~~~~~~~~~ MEAN ~~~~~~~~ MEDIAN
Feb. 27 or before (13)..............165.0....................157
Feb. 28-March 5 (19)...............177.6....................177
March 6-March 17 (30).............170.0....................174
March 18-April 8 (29)...............164.7....................172
April 9 or after (14).................159.1....................166.5
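The grouping in Table C can be sketched like this. It is a minimal sketch, assuming start dates are available as Python `date` objects; the group boundaries are the ones in the table, but the function names and the sample hikers are hypothetical. Comparing (month, day) pairs ignores the year, so a leap-day start lands in the Feb. 28-March 5 group.

```python
import statistics
from bisect import bisect_left
from datetime import date

# Last (month, day) of each of the first four Table C groups;
# later dates fall in the fifth group.
GROUP_ENDS = [(2, 27), (3, 5), (3, 17), (4, 8)]
LABELS = ["Feb. 27 or before", "Feb. 28-March 5", "March 6-March 17",
          "March 18-April 8", "April 9 or after"]


def start_group(start):
    """Label of the Table C start-date group containing `start` (a date)."""
    return LABELS[bisect_left(GROUP_ENDS, (start.month, start.day))]


def summarize(hikes):
    """Map group label -> (hikers, mean days, median days) for (start, days) pairs."""
    groups = {label: [] for label in LABELS}
    for start, days in hikes:
        groups[start_group(start)].append(days)
    return {label: (len(d), statistics.mean(d), statistics.median(d))
            for label, d in groups.items() if d}


# Hypothetical hikers, not study data.
print(summarize([(date(2006, 2, 20), 160), (date(2006, 3, 1), 170),
                 (date(2006, 3, 4), 180), (date(2006, 4, 12), 150)]))
```

`bisect_left` is used because each boundary date belongs to the group it ends, not the next one.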


Some asked about the relationship between number of days to complete the AT and zero days taken, specifically for the very fastest and slowest hikers. So I split the TJKs into six groups based on their percentile ranking for days to complete the AT: the fastest 10% of hikers, the hikers in the 70th to 90th percentile group, 50th to 70th, 30th to 50th, 10th to 30th, and the slowest 10% of hikers. I looked at how long these groups took to complete, the number of hiking days (HDs), zero days, zero days taken in Short Term Breaks (STBs), and zero days taken in Long Term Breaks (LTBs). I also computed the percentage of the total days to complete that was taken in each of the three zero day categories:


Table D -- Relationship Between Time to Hike and Zero Days

PERCENTILE ~DAYS ~~HD's ~~ZERO DAYS ~~STB's ~~~~~LTB's
90-100...........113.9....106.2.....7.7 (6.8%).......6.5 (5.7%).....1.2 (1.1%)
70-90............147.9.....136.0.....11.9 (8.1%)......8.2 (5.6%).....3.7 (2.5%)
50-70............165.4.....144.9.....20.5 (12.4%)....13.9 (8.4%)....6.6 (4.0%)
30-50............175.5.....154.2.....21.3 (12.1%)....14.0 (8.0%)....7.3 (4.2%)
10-30............188.9.....165.5.....23.4 (12.4%)....14.6 (7.7%)....8.7 (4.6%)
0-10..............208.0....167.0.....41.0 (19.7%)....19.0 (9.1%)....22.0 (10.6%)


Table D shows that the difference between the very slowest group of thru-hikers and the next slowest bunch was almost entirely in the number of zero days they took; there's hardly any difference in hiking days at all (167.0 versus 165.5). On the other hand, the big group of hikers between the 10th and 70th percentiles was consistent about the percentage of zero days taken (between 12.1 and 12.4%), so the main difference within this big block of mainstream thru-hikers tended to be miles per hiking day, not the frequency of zero days.
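The zero-day percentages in Table D can be recomputed from the day totals. A minimal sketch using the rounded table values; `zero_share` is a hypothetical helper. Because the inputs are rounded, a recomputed figure can differ from the table by 0.1 point (the 70-90 row gives 8.0% rather than 8.1%).

```python
# (percentile group, total days, hiking days, zero days), copied from Table D
TABLE_D = [
    ("90-100", 113.9, 106.2, 7.7),
    ("70-90", 147.9, 136.0, 11.9),
    ("50-70", 165.4, 144.9, 20.5),
    ("30-50", 175.5, 154.2, 21.3),
    ("10-30", 188.9, 165.5, 23.4),
    ("0-10", 208.0, 167.0, 41.0),
]


def zero_share(total_days, zero_days):
    """Fraction of all days to complete that were zero days."""
    return zero_days / total_days


for group, total, hiking, zero in TABLE_D:
    # Sanity check: hiking days plus zero days should give the total (to rounding).
    assert abs(hiking + zero - total) < 0.05, group
    print(f"{group:>6}: {zero_share(total, zero):.1%} zero days")
```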

Finally, here's a chart comparing the seven different years in the study. Keep in mind that the study population for any one year is pretty small, so I don't think the values for any given year are nearly as reliable as the totals for the whole range of years, 2001 to 2007 ("HD" in the following chart means Hiking Days):


Table E -- Hikes Grouped by Year of Hike

YEAR ~MEAN ~MEDIAN ~HD MEAN ~HD MED~ZERO MEAN ~ZERO MED
2001.....173.9.....170.0.......149.3........146.0..........24.6............24.0
2002.....175.1.....176.0.......153.7........152.0..........21.4............22.5
2003.....167.5.....168.0.......148.3........151.0..........19.2............21.0
2004.....169.9.....175.0.......149.4........150.0..........20.5............17.0
2005.....158.3.....163.5.......139.8........142.0..........18.6............17.0
2006.....169.7.....171.0.......148.3........149.0..........21.5............20.0
2007.....164.9.....164.5.......145.9........147.5..........19.0............16.5
Total.....167.7.....170.0.......147.5........149.0..........20.3............19.0

If the 2005 numbers are representative, and not a quirk of the small number of TJKs involved, my speculation from having read so many journals is that the monsoons that hit New England as August was turning into September, and that pretty much lasted through the rest of the hiking season, prevented some late-season hikers from completing the AT who might have finished their thru-hike in a more typical year. Late-season hikers by nature tend to take more days to complete their hike, so if there were fewer of them than in a normal year, that might account for 2005's average number of days to complete being lower than in a typical year.

LostInSpace
02-16-2006, 00:15
Do the data show the start date as a significance factor?

Alligator
02-16-2006, 00:35
2) I like to use the hiking days data separately. Also, especially from a statistical point of view, you should not combine hiking and not-hiking days without testing for independence. What does a simple plot of the number of zero days versus the number of days hiked look like? I would guess that the slope below the mean would be more than proportional, i.e., halving the days hiked from 148 to 74 would reduce the zero days by more than half (more than from 20 to 10). (This would presumably be an extrapolation of the line.) However, at the upper end, the slope might be less than proportional: a hypothetical hiker doing 148+74 = 222 hiked days might have to reduce the number of zero days to complete the hike before winter. Therefore, Table 4 could be off a little.

Yes. Are the zero days and hiking days correlated? Probably at least weakly positive.

4) Skewed distributions: Thanks for the detailed mean versus median data. It is interesting to see how the mean catches up to the median. I guess the data show that the people who start very late have to catch up relative to the median.
It does not surprise me that the total days distribution is skewed to the left and consequently the median is higher than the mean.

That doesn't mean it's skewed. A single observation can create the difference. Imagine a nice symmetrical distribution where the mean and median coincide. Now, take one observation just left of the median and move it to the far left. It won't alter the shape of the distribution to any great extent, yet it will drag down the mean.
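The point about a single observation can be seen in a toy example. The finishing times below are made up: the first list is symmetric about 170 days, and the second is the same list with the value just left of the median (168) moved far to the left.

```python
import statistics

# Made-up finishing times, symmetric about 170 days.
symmetric = [160, 165, 168, 170, 172, 175, 180]
# Same data with the value just left of the median (168) moved to 100.
shifted = [100, 160, 165, 170, 172, 175, 180]

for label, data in (("symmetric", symmetric), ("shifted", shifted)):
    print(label, round(statistics.mean(data), 1), statistics.median(data))
```

The median stays at 170 in both cases, while the mean drops by about 10 days.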

5) Outliers: I'm surprised outliers did not seem to be a concern to you. In 2005, Apple Pie left the trail in Erwin for about 50 days. This is over half of the LTB you report for the Fontana to Damascus section. Similarly, Stumpknocker took almost 365 days to hike the trail in 2004, but he hikes at a 4 month pace. (Hippy LS also did 360+ days but I don't think she had a complete TJournal. FB & Silver Girl took > 80 days off but their journal was not on Trailjournals.) I don't have much of a point to make about outliers, they are a part of life and a part of life on the AT. However, I think they are more a factor affecting zero days, and that's another reason for separating out zero days.

6) Variability:
a) By far the most common expression of variability is the standard deviation, or variance = std. dev. squared. It should be calculated using a spreadsheet or the standard deviation function on a programmable calculator. If you have to calculate it by hand:
Std.Dev^2 = [sum of each (day squared) - n*Ave^2]/(n-1) = [sum(Di*Di) - 2,957,525]/104; where Di = total days for each hiker, i, and 2,957,525 = 105 hikers*167.83*167.83 average days. For the Hiked Days it would be: Std.Dev^2=[sum(di*di) - 2,290,295]/104; Note, 105*147.69*147.69 = 2,290,295.
I hope you have the Excel function for sample standard deviation.
b) The easiest calculation for variability is range: just the most days minus the fewest days. I believe one must assume a normal distribution to convert range to an (unbiased) variance (std.dev^2).
A rough guide is to divide the range by four to get an estimate of the population standard deviation. This is based on normality.
c) The other commonly used expression for variability is a confidence range. The most common range is a 95% interval, which for a normal distribution is about plus or minus 2 std. dev. from the mean. I recommend against using these confidence intervals with such skewed distributions. Note, the 95% confidence range means 2.5% faster and 2.5% slower.

No, a 95% confidence interval for the mean says that if you repeatedly took samples of size n and built an interval from each, about 19 times out of 20 the interval would contain the true mean. It refers to where the mean lies, not what observations are in the tails. Are you missing a chromosome or what?

Since the normal assumption makes the estimates symmetric, the confidence interval is often expressed as Average +/- Interval/2, e.g., 168 +/- 30 days for a std. dev. of about 15 days.

A 95% confidence interval for the mean would be mean +/- 2(sigma/n^0.5). In words, the mean plus or minus 2 times the standard deviation divided by the square root of the sample size. The value sigma/n^0.5 is referred to as the standard error. It is entirely reasonable to assume normality here, as the distribution of the sample mean converges to the normal with large sample sizes; in this case, n is 105. It hasn't been demonstrated what the distributions of the numbers of hiking days for the sections or the whole trail look like. They may not be noticeably/significantly skewed.
d) You could report the actual % interval as a pseudo-confidence interval. Just figure out the number of days that 2.6 hikers were slower and another number which 2.6 hikers were faster. What has been proposed is reporting the lower and upper numbers of -10% and +90 %. For the section data, I think you will find it difficult to interpolate between whole days for the -10%/+90% number, which in my mind is an arbitrary, non-standard percentile, pseudo-confidence interval.
It is arbitrary. It is the idea behind a trimmed mean, where the influence of outliers has been removed. If one were to take the 10th, 50th, and 90th percentiles in each section, the slowest and fastest hikers would be excluded, the median would serve as the measure of central tendency, and no distributional assumptions would be violated. 5, 50, and 95 could be used also.
e) If you remove the variability associated with the zero days, you might be able to give a good representation for hiking variability just by reporting aggregate data. This data might also be easiest to understand and use in a statistic free way. I propose to aggregate the data for each section into five groups. Because the distances vary by such a large amount, the intervals for the groupings should also vary. I suggest that each of the five groups vary by m=1, 2, or 3 days. You would then report 8 values/section: g1.days, m, n.g1, n.g2, n.g3, n.g4, n.g5, Slow. I'm not sure whether the g1.days should be integer and the start of the interval. Assuming that it is, you would get numbers like:
5, 1, 12, 21, 32, 19, 11, 2. For the first section, 12 hikers would reach the GA line in 5.0 to 5.9 days, 21 hikers in 6.0 to 6.9 days, 32 hikers in 7.0 to 7.9 days, 19 hikers in 8.0 to 8.9 days, 11 hikers in 9.0 to 9.9 days, and 2 hikers in over 9.9 days (optional). By calculation, 105-(12+21+32+19+11+2)=8 hikers took less than 5.0 days. The relative distribution for Damascus to Waynesboro will not be exactly the same, but if it were, the data would be reported as 17, 3, 12, 21, 32, 19, 11, 2, and the groupings would be: 17 to 19.9 days, 20 to 22.9 days, 23 to 25.9 days, 26 to 28.9 days, and 29 to 31.9 days. Slow hikers would look at this raw data and see 11 in 105 needed 8-8.9 days of food to reach the GA border and 23 to 25.9 days to get to Waynesboro, and would plan on packing this amount. (Hopefully, not all at once.)
What you are describing are the bins of a histogram. A histogram would be easier to follow. A boxplot for each section of the number of hiking days it took would be even better.
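The hand formula in a) and the interval in c) can both be sketched in Python. This is a minimal sketch; the function names are hypothetical, the sample totals are made up, and the standard deviation of 15 days is the illustrative figure from the post, not a computed study value.

```python
import math
import statistics


def variance_shortcut(days):
    """Sample variance by the hand formula [sum(D_i^2) - n*mean^2] / (n - 1)."""
    n = len(days)
    mean = sum(days) / n
    return (sum(d * d for d in days) - n * mean * mean) / (n - 1)


def mean_ci95(mean, sd, n):
    """Approximate 95% confidence interval for the mean: mean +/- 2 * sd / sqrt(n)."""
    half_width = 2 * sd / math.sqrt(n)  # 2 standard errors
    return mean - half_width, mean + half_width


# The n * mean^2 constant from the post: 105 hikers averaging 167.83 days.
print(round(105 * 167.83 ** 2))  # 2957525

sample = [150, 163, 171, 190]  # made-up totals
print(variance_shortcut(sample), statistics.variance(sample))

# With a standard deviation of about 15 days and n = 105:
print(mean_ci95(168, 15, 105))
```

The interval for the mean comes out at roughly 168 +/- 2.9 days, far narrower than the +/- 30 days that describes the spread of individual hikers. For large values the subtraction in the hand formula can lose precision, which is one reason to prefer `statistics.variance`.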

Alligator
02-16-2006, 00:57
I thought he was being pretty jerky, myself. And I'm not in academia, so you are obviously correct. Perhaps if I were more insulated from the practical, I could be more picky.
I'd consider your having your head surrounded by your ass to be well insulated from the practical. How's that for jerky?
But as far as I can tell, he applied the study to something the author did not extend it to, then said, it can't be used for this purpose. Duh.
I didn't apply the study to anything. I expressed reservations about where it could be applied. Duh.
Besides, you can take ANY study and trash it as he did.
No, I didn't trash it. I suggested that there could be bias present due to the way the sample was selected.
Different years might make a difference? Sure, so the author correlates them by year. Ah, but he didn't combine dry years and rainy years, did he? So he does. Ah, but one of his rainy years was really a dry year with a hurricane that skewed the numbers. So he puts it with dry years. Ah, but it was wet by definition of # of inches of rain. So the author puts it with the wet years. Ah, but it was a dry year with a hurricane.

So the author drops the year in question.

AH HA! Now you are selecting data!!!!!! Bad bad bad.
...
No, what the author could try is to see if the means by year are similar or dissimilar. If they are dissimilar, this would certainly suggest that grouping was inappropriate. But hey, you know what, time is so static and such an unimportant factor in science that you might as well just throw it out. I mean nothing ever changes with time, right? I'm sure that 3 weeks of rain, a heavy March snow, and unseasonably warm temperatures have absolutely no effect on a hiker's progress. And these things happen year in and year out exactly the same too.

Alligator
02-16-2006, 08:44
BTW domnokmis, is it Frosty I'm speaking with or Mrs. Frosty?

ARambler
02-16-2006, 15:16
Map man:
Thanks for your analysis and follow-up.
1) I take it from your methodology that you have not calculated the actual hiked days/section for any individual hiker. You have just subtracted the total zero days from the total days to get hiked days. So you will need to start from your raw data to analyze the non-zero-day hiking rates. I think it would still be useful to plot zero days versus total days (including zeros). Not sure how to present it in this forum.

2) The easiest way to get a picture of non-zero data is to look at the non-zero hiking day Range for each section. You would need to go down both tables and subtract the zero days from the total days and compare that to the interim min and max for that section. Note, you already have the average number of non zero days for each section (my earlier post recalculated it from your %zero and mean data).

3) I found your Mid 80% data very interesting. I assume you are saying 10 to 11 hikers were below this range and 10 to 11 hikers above it. For the entire AT, the lower bound, 137, is 31 days below the original mean and the upper bound, 197, is 29 days above the mean. I don't know how to statistically test this, but the distribution (without the 10% tails) seems only slightly skewed (about the original mean). Also, the sum of the section ranges gives a "lower and upper time" of 125.9 and 208.4 days. So the same 21 hikers were not in the tails for all of the sections. (What I would expect.)
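The section sums in point 3) can be reproduced from Table A. A minimal sketch: adding up each section's 10th and 90th percentile bounds gives a wider band than the whole-trail 137 to 197, because the hikers who make up a section's tails change from section to section, so percentiles do not simply add.

```python
# Mid-80% bounds for each section from Table A, Springer to Katahdin.
LOWER = [5.7, 5.7, 19, 21.5, 8, 14.3, 8.6, 18.3, 6.5, 7.3, 11]
UPPER = [9.7, 9.7, 31, 36, 14, 23, 15, 29, 13, 12, 16]

lower_sum = round(sum(LOWER), 1)
upper_sum = round(sum(UPPER), 1)
print(lower_sum, upper_sum)  # 125.9 208.4
print("whole-trail mid 80%: 137 to 197")
```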

Alligator
Thanks for your comments.
1) I'm generally reluctant to trim the data to reduce outliers. In this case, I think outliers may be created by long stretches of zero days. So, although it is arbitrary and I don't know how to handle it statistically, your mid 80% range seems good to me.

2) If map man can provide the total range per section for the non-zero day data, I will be more willing to assume a normal distribution. What tests would you want to do before you make that assumption? If we are at least tentatively willing to assume the data are normal, I think we would divide the range by about 5 (for a large set of 105 data points) to get an estimate of the standard deviation.

3) I'm sure I used the term "confidence interval" in a nonstandard manner. There are only one or two sample sizes of interest: 105 for the Trailjournals data and possibly 1 for the reader's expected value. I'm not sure what sample size, n, you are talking about.

4) This following quote is wrong, but I'm sure you misunderstood. Statisticians must always be careful about confusing cause and effect. I said I guessed the total distribution was skewed to the left, i.e., more hikers would complete the trail in less than 137 days (31 less than the mean) than would complete it in more than 199 days (31 days more than the mean). This seems true based on map man's data, but I'm not so sure it's statistically significant. Nonetheless, for all distributions that I have seen, distributions skewed to the left have a mean that is less than the median. Do you have any real life counter examples? What criteria do you use to conclude the parent distribution is skewed?



That doesn't mean it's skewed. A single observation can create the difference. Imagine a nice symmetrical distribution where the mean and median coincide. Now, take one observation just left of the median and move it to the far left. It won't alter the shape of the distribution to any great extent, yet it will drag down the mean.

...

What you are describing are the bins of a histogram. A histogram would be easier to follow. A boxplot for each section of the number of hiking days it took would be even better.

Thanks for the support on Histogram data.
Rambler

Alligator
02-16-2006, 17:43
Map man:
Thanks for your analysis and follow-up.
1) I take it from your methodology that you have not calculated the actual hiked days/section for any individual hiker. You have just subtracted the total zero days from the total days to get hiked days. So you will need to start from your raw data to analyze the non-zero-day hiking rates. I think it would still be useful to plot zero days versus total days (including zeros). Not sure how to present it in this forum.
I too would seek to remove the zero days. Personally, I would hypothesize that zero days are a function of days hiked plus some highly random amount. I think it could be hard to model zero days because they are used not only for rest, washing clothes, and resupply, but for *** occurrences, nice camp areas, pink blazing, etc.

Alligator
Thanks for your comments.
1) I'm generally reluctant to trim the data to reduce outliers. In this case, I think outliers may be created by long stretches of zero days. So, although it is arbitrary and I don't know how to handle it statistically, your mid 80% range seems good to me.

I don't like to take out outliers either; I prefer to accommodate them. If I felt like giving myself a headache, I might try some M-estimators or other robust techniques.

2) If map man can provide the total range per section for the non-zero day data, I will be more willing to assume a normal distribution. What tests would you want to do before you make that assumption? If we are at least tentatively willing to assume the data are normal, I think we would divide the range by about 5 (for a large set of 105 data points) to get an estimate of the standard deviation.

I would be interested in comparing the empirical data to theoretical distributions through Q-Q plots. If the data are compared with a normal distribution, skewness and heavy or light tails can be demonstrated from the shape of the plot. Examination of boxplots will also suggest a distribution. Shapiro-Wilk, Anderson-Darling, Lilliefors, and other goodness-of-fit tests will give a numerical answer. Generally, many of the statistical procedures I use are robust to departures from normality. When examining residuals, I'm generally satisfied with "good" adherence to a normal probability plot. A lot of the numerical tests are not perfect anyway.

For whatever group you wish, if you want an estimate of the standard deviation, just use s, the sample standard deviation. I only mentioned the range/4 method because you had mentioned equating the range to the s.d.
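How many standard deviations a sample range spans depends on the sample size, which is why the divide-by-four and divide-by-five rules of thumb can both be reasonable for different n. A minimal sketch under the normal assumption both posters mention: it simulates standard-normal samples with Python's `random` module (the seed and trial count are arbitrary choices) and averages the ranges.

```python
import random
import statistics


def mean_range_in_sds(n, trials=2000, seed=1):
    """Average (max - min) of n standard-normal draws, i.e. range in units of sigma."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    ranges = []
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        ranges.append(max(sample) - min(sample))
    return statistics.mean(ranges)


for n in (10, 30, 105):
    print(n, round(mean_range_in_sds(n), 2))
```

For n = 105 the result comes out close to 5, which supports dividing by about 5 for this sample; dividing by 4 fits a smaller sample of a few dozen values. As noted above, the sample standard deviation itself is the more direct estimate whenever the raw data are available.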

3) I'm sure I used the term "confidence interval" in a nonstandard manner. There are only one or two sample sizes of interest: 105 for the Trailjournals data and possibly 1 for the reader's expected value. I'm not sure what sample size, n, you are talking about.

I was using the general formula for computing a confidence interval for the mean. You stated the confidence interval as approximately +/-2 times the s.d. Follow this link. http://davidmlane.com/hyperstat/B7483.html
What was missing is you left out the division by the square root of n, the sample size. Here n=105.


4) This following quote is wrong, but I'm sure you misunderstood. Statisticians must always be careful about confusing cause and effect. I said I guessed the total distribution was skewed to the left, i.e., more hikers would complete the trail in less than 137 days (31 less than the mean) than would complete it in more than 199 days (31 days more than the mean). This seems true based on map man's data, but I'm not so sure it's statistically significant. Nonetheless, for all distributions that I have seen, distributions skewed to the left have a mean that is less than the median. Do you have any real life counter examples? What criteria do you use to conclude the parent distribution is skewed?
I agree with you that a skewed left distribution will have a mean less than the median. The converse is not necessarily true for sample data. Having a mean less than the median does not always mean skewness. Due to random variation, it is entirely possible to have a normal distribution where the mean and median do not coincide. Even more so if you are measuring in discrete units like days and taking means down to the tenths or hundredths place. In fact, I entirely expect the mean and median to not be exactly the same in a sample. In Map Man's data, four of the sections have means less than the median, the other seven above. The overall mean is less than the median.

What criteria do you use to conclude the parent distribution is skewed?
Compare in a Q-Q plot to a standard normal, as stated previously. Probably compare it to any theoretical symmetric distribution and it ought to show skewness. The way the plot deviates will tell you how the distribution is different: left or right skew, heavy or light tails. Honestly, I always have to look up the shapes as I forget which is which. Alternatively, guess the parameters correctly for a hypothesized skewed distribution, and look for a match on the Q-Q plot. That's much harder, though.

Do you have any real life counter examples? See attached. The data are red maple tree heights from the same plot. I mostly need to check residuals for normality, so I didn't have anything specific handy. This was, however, the second data set I pulled. The data are normal by the Anderson-Darling goodness-of-fit test, but the mean is 69.49 and the median 70. The slight variations between the mean and median can be more pronounced in smaller data sets.

...

I too would seek to remove the zero days. Personally, I would hypothesize that zero days are a function of days hiked plus some highly random amount. I think it could be hard to model zero days because they are used not only for rest, washing clothes, and resupply, but for *** occurrences, nice camp areas, pink blazing, weddings, funerals, etc.

Sorry to keep you in the crosshairs there Map Man. I would be happy to look at the hiking days per section in order to answer some of my own questions. What I would be interested in doing is to plot the number of hiking days by sections and comparing the distributions across years and across start dates. Really just a visual check. If you want that is, it's your data.

map man
02-16-2006, 22:43
From this point forward I'm planning to post the detailed responses I have to member's questions and suggestions that deal with further elaborating the data or how better to employ statistical methods for this study at Post #28 of this thread. But I still want to hear from and respond to anybody who has anything to say about the article at all.

And speaking of responses, although I've been trying to address people who have specific suggestions to improve the article, there's a long list of people who've been letting me know that they enjoyed reading the study, so I want right now to say, "thank you."

Jack Tarlin
02-16-2006, 22:54
I think this is all fascinating stuff; thanks for working on this and sharing it with us.

map man
02-19-2006, 20:02
I've updated post #28 of this thread by editing into it some new info that some WhiteBlaze members requested.

ARambler
02-19-2006, 23:00
Thanks for the continued updates.
Rambler

FiveWay
02-20-2006, 22:22
Great work. Thanks for taking the time. As I finish my prep work for my '06 thru-hike, this helps me look at what I had planned and what to expect. FiveWay

map man
03-20-2006, 22:34
I'm curious, now that I've made several edits and additions to the main article over the last few weeks, whether people think the article "works." Is it useful? Is the information presented in as clear a fashion as possible? Is it accessible -- that is, does it avoid being too complicated and avoid appealing only to statistics nerds like myself? I've arrived at the idea of referring readers to the article thread (specifically, post #28) for a more in-depth discussion of the data. Does that seem like a good idea? For those who've brought up issues concerning the statistical methods used, have your concerns been addressed in either the article or post #28? And finally, does the article in its present form meet the standards that an article here at WhiteBlaze ought to meet?

Now that the weather is nicer here in my neck of the woods (Iowa), I'm out hiking on weekends instead of messing around with the raw data used in the study like I was this winter, but I'd still like to hear from people if you have any suggestions at all to make the article better.

sdoownek
03-21-2006, 04:38
I always find it interesting to see how mathematicians do things.

To that end, I note that there are several of you that have commented on this thread that seem rather, to use a term previously mentioned, "jerky". To those people, I would ask that you read the following commentary:

http://80below.com/archives/114-You-guys-are-tired-of-looking-at-Schroedingers-dead-ones..html

While it doesn't directly apply to the discussion at hand, it does apply to the people willing to defame and unreasonably question the author.

Just a thought. Or a slight shift, if you will.

joel137
03-27-2006, 23:38
Just for amusement and comparison, I thought I'd post a section hiker's equivalent data:

Winding Stair Gap -> Amicalola: 10 days / 115.4 mi
Winding Stair Gap -> Hot Springs: 14 days / 167.5 mi
Hot Springs -> Pearisburg: 26 days / 343.5 mi
Pearisburg -> Harpers Ferry: 28 days / 379.4 mi
Harpers Ferry -> US 7 MA: 35 days / 499.4 mi
US 7 MA -> Katahdin: 51 days / 646.1 mi

Broken up into the same sections as listed for the thru-hikers:

Springer to GA border 8 days
GA border to Fontana 7 days
Fontana to Damascus 21 days
Damascus to Waynesboro 29 days
Waynesboro to Harpers Ferry 11 days
Harpers Ferry to DWG 19 days
DWG to Kent 12 days
Kent to Glencliff 22 days
Glencliff to Gorham 9 days
Gorham to Stratton 10 days
Stratton to Katahdin 14 days

The above are actual hiking days. I only took one zero day, in Damascus, so depending on how you want to count, an extra day might be added there.

roxy33x
03-30-2006, 16:28
I am going to be doing an overnight hike with my husband and two dogs and I was wondering if anyone had any recommendations in the NC/TN area or anywhere near... I live in Charlotte. It just needs to be dog friendly... Preferably on the AT.
Thanks

cutman11
03-31-2006, 00:53
Absolutely a great piece of work, map man. I am also a section hiker but have not completed the entire trail yet to add my section data, but I can say that the numbers generated correlate very well with my own hiking log as far as hiking days from GA to PA, where I ended last fall. If it's not too much trouble to generate, I would be interested in a table of the locations and frequency of the zero days, with some sort of "probability" that a zero would be taken at a given location, e.g., Damascus 95% probability of a zero, Hiawassee 45% probability of a zero, that sort of thing. Or list the zero locations from most to least frequent. I have been keeping a note of the "zero days" I would have taken if I had been thru-hiking, just to give myself a sense of the total days I would have used if I had done the hike all at once instead of sectioning. It would probably be useful for future thrus to consider the "most likely" zero stops when planning their hikes.

joel137
03-31-2006, 09:32
One observation from my one data point;)

regarding differences between sectioning and thruing

It would appear that sectioners are less prone to taking zero days.

My guess would be that sectioners are more on a schedule than thru's.

I only took one, and in my current, second go-around there have been no zero days. I doubt there will be any more while sectioning, barring medical concerns.

With luck I'll get the chance to thru when I retire, in ten years with luck of a different sort. I imagine I will do zero days in the course of that journey.

rickb
03-31-2006, 10:51
Wow, that is great stuff.

To ask for more would be greedy, right? But I will anyway.

I'd be interested to know how different demographics within your sample conform to the averages. Are there any differences between how men and women hike? Is age a significant factor in zero days? That sort of thing. Do people starting in early March end up spending much more time on the Trail than those starting in April? Are the fastest starters more likely to finish?

I guess what I am saying is that I really enjoyed what you did and hated to see the end of the thread!

map man
03-31-2006, 14:53
To respond to some recent posts:

joel137, my guess is you're right that section hikers take fewer zero days than thru-hikers, but I have no objective numbers to back that up. I'd guess that your long section hikes (I know one was over 600 miles long) with no zero days at all (just that one in Damascus) are a little outside the norm, even if section hikers in general do take fewer zero days.

rickboudrie, the demographic breakdowns you're talking about I can't really do right now with my data. I can't break the hikers down by age because not everyone at trailjournals.com reveals their age. (edit insert: And as of March 2008 there are only 26 female hikers in the study, still too few to include a gender breakdown in the main article.) However, there is some info in post #28 in this thread about the relationship between start date and days to complete that you asked about. The group leaving in early March did take longer than those leaving in April or later, but again the problem is that once I start breaking the hikers into five or six groups based on anything, the samples get too small (I believe there were somewhere around 19 in the group I'd define as "early March") to make reliable generalizations.

cutman11, I think looking at the percentage of thru-hikers who take zero days in each trail town along the way, and comparing each town that way, would be a cool thing to know, and it had not occurred to me until you mentioned it. The way I have the data recorded, though, I can't pull those numbers out right now (I could tell you how many zero days each hiker had in the DWG to Kent section, for example, but not how many for just DWG itself). I'd need to go back to each of the 173 journals in my study and figure it out. But when cold weather descends again next winter, and I hang up my hiking shoes for the year, and I have the time again to take on the task, that's something I will consider doing.

cutman11
04-01-2006, 01:00
thanks for the reply map man, and in a sense, the result of my request would be the de facto vote of which is the most popular "trail stop" !!!
Will look forward to your data when you're able to get to it.

Vi+
04-01-2006, 11:03
Map man,

Thank you for providing the results of your analysis.

You “... decided to (calculate) how long the typical (Thru-Hiker) takes to hike the AT (section by section).”

You gathered nominal data, crunched the numbers, and then reported some of your mathematical deductions as hundredths of a day. Some users have overlooked the humor inherent in appraising six months in 2.4 hour segments.

I never considered being able to determine exactly where I and everyone else would stand at every moment of every day. You’ve presented some very handy information.

Thank you, again.

SGT Rock
04-27-2006, 13:39
I'll get this one moved over someday as well ::eek:

nhalbrook
06-14-2006, 23:04
If there is info in the data to identify those who are seasoned hikers and those who are not, it would be interesting to see what that separation shows in terms of zero and hiking days.

Tramper Al
06-15-2006, 10:54
Mapman,

This is great stuff - thank you for your efforts.

I too am well trained in and occupied in the practice of the methods that have been used and could be used to analyze such data. I am also well aware of the issues concerning sampling and generalizability here.

My only suggestion is with regard to "outliers". I think you should take note of your strongest, loudest, most zealous and pompous critic, and ignore him.

Thanks again, very interesting.

Alligator
06-15-2006, 12:17
...
I too am well trained in and occupied in the practice of the methods that have been used and could be used to analyze such data. I am also well aware of the issues concerning sampling and generalizability here.
...


It's OK to take the cheap shot Al, that one was pretty good. Whoever you are speaking to is probably laughing. If I were listening in to this discussion between well trained professionals though, I'd certainly want to know why you believe it's ok to gloss over the sampling and generalizability issues here.

I'd also be wondering Al, given your experience, what do you think about the mean differences in hiking times for the years 2001-2005 given in Table E, post #29. Like, do you think there are any significant differences between the years and how that might affect generalizations. I have some thoughts on the subject, but I'm a little bit shy so I'll let you go first. Let's hope that the zealous, pompous, LOUD, and STRONG critic doesn't pop in in the interim.

Alligator
06-16-2006, 10:12
What's up Doc, can't answer the questions? They're actually very relevant. Doubts are surfacing regarding your expertise:-? .

Alligator
06-30-2006, 12:48
I didn’t want to leave this hanging. Initially, I raised the issue that year could be an important factor regarding hiking time. I was accused of being quite a number of things. Whatever. Here’s the crux of the problem that I see. If there are significant year to year variations, it could be very inappropriate to average across years. I have returned to this issue because the information regarding years was not initially included and was added later.

I took a moment to graph out the hiking day means by year. I put these into two figures. One figure uses a narrow range of days on the y-axis, the other I used a wide range to include the whole range of hiking days reported. The mean is plotted as a solid black line throughout. Looking at either figure, it appears that there are differences between years, most notably 2002 and 2005. There's just about a 14 day difference between those two years, which is slightly less than 10% of the overall grand mean of 147.6 days. That's a strong suggestion to me that there are temporal variations. Now, it could be that these two years are just random fluctuations. This would happen if the overall variance of the observations was wide. It's hard to say without the full data set. If this had been a random sample, this could be tested with a simple ANOVA or a nonparametric test if the group variances were not equal. And if you wanted to discount the non-random selection procedure, it could be tested anyway.
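The "simple ANOVA" mentioned above boils down to one F statistic: between-year variation divided by within-year variation. A minimal standard-library sketch follows; the per-year totals are hypothetical, since the per-hiker data aren't posted in this thread. Turning F into a p-value needs the F distribution (for example `scipy.stats.f_oneway`, which returns both), and `scipy.stats.kruskal` is a common nonparametric alternative. As noted above, either test formally assumes a random sample.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA on lists of observations."""
    all_obs = [x for g in groups for x in g]
    grand_mean = mean(all_obs)
    k, n = len(groups), len(all_obs)
    # Between-group sum of squares: how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: scatter of hikers around their own group's mean.
    ss_within = 0.0
    for g in groups:
        g_mean = mean(g)
        ss_within += sum((x - g_mean) ** 2 for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical total-days-to-Katahdin figures for three years (illustration only).
by_year = {
    "year A": [140, 152, 147, 160, 138],
    "year B": [155, 149, 162, 151, 158],
    "year C": [133, 141, 137, 145, 139],
}
f_stat, df1, df2 = one_way_anova_f(list(by_year.values()))
print(f"F = {f_stat:.2f} on ({df1}, {df2}) degrees of freedom")
```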

Call me a stickler, but I’d want to know the whys of this potential difference before I started averaging over years.

map man
07-10-2006, 21:53
Some observations on how or if this article can be used for planning or gauging a thru-hike, now that I've had a few months to reflect on it:

First off, it's hard to know if the group I studied, thorough journal keepers (TJKs), are representative of all NOBO thru-hikers. You could make the case that maybe people so disciplined about journal keeping are disciplined in other ways too and more likely to get to Katahdin quicker than normal. Or maybe all the time spent at computer terminals or typing away at pocket-mail devices slows them down compared to other hikers. It's hard to know.

So I'd be hesitant to look at the "average" number of days in Table 1 in the article that it took TJKs to get to each landmark and conclude when comparing myself to this group that I was a fast or slow hiker. I wouldn't want hikers taking twenty days to get to Fontana instead of the 15.8 average days in Table 1 to get discouraged about their chances of completing their thru-hike. The same thing goes for those who happen to take more than the "average" of around twenty zero days on the way to Katahdin.

What interested me most when I set out to do this study was to figure out how hiking rates varied from one section of the trail to another. So I think it's more pertinent to know, for example, that the "average" NOBO TJK covers about 60% more miles per hiking day in the Damascus to Waynesboro section than in the Springer to Georgia border section. The exact numbers, sixteen miles per hiking day versus ten, aren't as useful for planning or gauging purposes as the underlying proportionality is.

And I can't see many reasons why this proportionality between sections would be significantly different for my study group, TJKs, than it would be for the entire NOBO thru-hiking population (though I can't prove that statistically). So what I think is the most useful and reliable are the numbers in Table 2 in the article, miles per day and miles per hiking day, when looked at for the underlying proportionality and not as exact predictive numbers. Likewise, the numbers in Table 4, where progress is projected for four, five, six and seven month hikes based on those proportions between sections, should also prove useful, I think.
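To show how a proportion-based projection works, here is a minimal sketch that scales a total hike length by the mean cumulative percentages from Table F in this post (2001-2005 TJKs, total days). It does not attempt to reproduce Table 4 in the article, which may be built from a different hiker set or from hiking days; the 30.5 days per month figure is an assumption.

```python
# Mean cumulative % of total time to reach each landmark (Table F, 2001-2005 TJKs).
MEAN_CUMULATIVE_PCT = {
    "Georgia border": 4.7,
    "Fontana": 9.3,
    "Damascus": 23.8,
    "Waynesboro": 40.9,
    "Harpers Ferry": 47.6,
    "DWG": 59.1,
    "Kent": 66.5,
    "Glencliff": 80.2,
    "Gorham": 86.0,
    "Stratton": 91.8,
    "Katahdin": 100.0,
}

DAYS_PER_MONTH = 30.5  # assumption: an average calendar month


def projected_day(total_days, landmark):
    """Elapsed days at which a 'typical' hiker would reach the landmark."""
    return total_days * MEAN_CUMULATIVE_PCT[landmark] / 100


for months in (4, 5, 6, 7):
    total = months * DAYS_PER_MONTH
    schedule = ", ".join(
        f"{name} {projected_day(total, name):.0f}" for name in MEAN_CUMULATIVE_PCT
    )
    print(f"{months}-month hike ({total:.0f} days): {schedule}")
```

For a five-month hike (152.5 days) this puts Waynesboro at about 62.4 elapsed days, i.e. sometime on the 63rd day.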

Noting Alligator's reluctance to group hikers from different years and different starting times I decided to compare the progress of TJKs, breaking them into groups based on: 1) the year they hiked; 2) the starting date; 3) the speed of the hike (looking at the sixteen fastest hikers, for instance, who actually did average four months to complete the trail, etcetera). I computed the amount of time for the mean hiker to reach each landmark as a percentage of the total time to hike the whole trail. I computed the same thing for each of the sub-groups based on the three criteria I mentioned and I looked at the total days to hike as well as the "hiking days" (excluding zero days). So I ended up with six separate tables. I'm not going to reproduce all six here, but instead will show the one table where sub-groups varied the most from the overall mean percentages, to show just how closely even the most widely varying sub-groups compared to the overall mean.

The following table shows five different groups based on start date. I used the same grouping I did in Table C in post #28. The earliest group started Feb. 27 or before; group #2, Feb. 28-March 5; group #3, March 6-17; group #4, March 18-April 8; the latest group, April 9 or later:

(Table includes only hikers from years 2001-2005)


Table F -- Percentage of Total Hiking Time to Reach Each Landmark, by Start Date

Landmark          Earliest  Group 2  Group 3  Group 4  Latest   Mean
Georgia border      4.8%     4.8%     4.9%     4.7%     4.4%    4.7%
Fontana             9.4%     9.3%     9.7%     9.2%     8.7%    9.3%
Damascus           24.3%    24.4%    24.2%    23.3%    22.8%   23.8%
Waynesboro         41.0%    41.3%    41.5%    40.8%    38.8%   40.9%
Harpers Ferry      46.8%    47.8%    48.8%    47.6%    45.6%   47.6%
DWG                60.2%    58.8%    60.1%    58.6%    57.6%   59.1%
Kent               67.1%    66.5%    67.3%    65.9%    65.2%   66.5%
Glencliff          80.6%    80.1%    80.9%    79.9%    79.3%   80.2%
Gorham             86.1%    86.2%    86.3%    85.5%    85.7%   86.0%
Stratton           91.8%    92.0%    92.0%    91.5%    92.0%   91.8%
Katahdin          100%     100%     100%     100%     100%    100%


I put in bold the single number in all six comparisons that varied the most from the mean. TJKs leaving April 9 or later took 38.8% of their hike to get to Waynesboro while mean TJKs took 40.9% of their hike to get there. A very small difference. This late starting group took a little over five months to complete the trail and if you compute how many days the "typical" five month hiker takes to get to Waynesboro -- sometime on the 63rd day -- you find that a "typical" five month hiker leaving April 9 or later would get there between two and three days earlier. Again, a very small difference. And keep in mind, that is the single greatest variance of all the numbers I looked at for start year, start date, or hiking speed -- a lot of numbers.

So for the purpose of figuring out "typical" hiking progress based on proportions between sections, I think it's legitimate, and even desirable, to lump all 173 TJKs together (and now my "Alligator-sense" is tingling -- I've got a feeling I'm going to hear from him on this one).

It was a useful exercise to go through, though. As small as the variances were, I did find out that start date made for a slightly greater variance than start year, and that both of these definitely showed more variance than that based on the speed of the hike. Not surprisingly, "hiking days" (excluding zero days) showed less variance from the means when comparing these sub-groups than total days did.

It would be better to be able to sample all AT hikers instead of relying on the self-reporting of hikers keeping journals at trailjournals.com, but how would one do it? Just imagine a modern-day Marlin Perkins and his buddy, Jim (for those of you old enough to remember "Wild Kingdom"), bringing down thru-hikers at Springer with tranquilizer guns and fitting them with radio collars or GPS devices and then watching the hikers wobble off down the trail as the tranquilizer wears off. Just picture it with me. Am I the only one here warped enough to find that entertaining?

Mountain Dog
07-15-2006, 17:40
Very interesting article and it can be used by a lot of us. Thanks for all of the good work.

Alligator
08-01-2006, 15:54
...
I'm not going to reproduce all six here, but instead will show the one table where sub-groups varied the most from the overall mean percentages, to show just how closely even the most widely varying sub-groups compared to the overall mean.
...
Theoretically, the usual accepted practice is to test whether the group means are equal. Then interest generally moves to which are different. Do the overall F-test, then use some multiple comparisons procedure to figure out which groups are different from which. This differs from what you have done because you have compared the groups to the mean. The differences are more pronounced when comparing the min proportion to the max.
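The point about comparing min to max rather than each group to the mean can be checked directly against Table F. This sketch is descriptive only: the overall F-test and multiple-comparisons procedure described above need per-hiker data, which isn't in the thread. Katahdin is left out because every group is at 100%.

```python
# Table F: cumulative % for Earliest, Group 2, Group 3, Group 4, Latest; then the Mean.
TABLE_F = {
    "Georgia border": ([4.8, 4.8, 4.9, 4.7, 4.4], 4.7),
    "Fontana": ([9.4, 9.3, 9.7, 9.2, 8.7], 9.3),
    "Damascus": ([24.3, 24.4, 24.2, 23.3, 22.8], 23.8),
    "Waynesboro": ([41.0, 41.3, 41.5, 40.8, 38.8], 40.9),
    "Harpers Ferry": ([46.8, 47.8, 48.8, 47.6, 45.6], 47.6),
    "DWG": ([60.2, 58.8, 60.1, 58.6, 57.6], 59.1),
    "Kent": ([67.1, 66.5, 67.3, 65.9, 65.2], 66.5),
    "Glencliff": ([80.6, 80.1, 80.9, 79.9, 79.3], 80.2),
    "Gorham": ([86.1, 86.2, 86.3, 85.5, 85.7], 86.0),
    "Stratton": ([91.8, 92.0, 92.0, 91.5, 92.0], 91.8),
}


def spread(groups, overall_mean):
    """Return (largest |group - mean|, max group - min group), in percentage points."""
    max_dev = max(abs(g - overall_mean) for g in groups)
    full_range = max(groups) - min(groups)
    return round(max_dev, 1), round(full_range, 1)


for landmark, (groups, overall) in TABLE_F.items():
    dev, rng = spread(groups, overall)
    print(f"{landmark:15s} vs mean: {dev:.1f}   min to max: {rng:.1f}")
```

Measured against the mean, Waynesboro stands out (2.1 points, the Latest group's 38.8%). Measured min to max, Harpers Ferry has the widest spread (3.2 points, Group 3 versus Latest).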

Further, a more complex problem is that every section is correlated with the last, since the overall measurement is a proportion. I'm not trying to pull some mumbo jumbo here either; a multivariate stats class would help a lot. The proportions for each section are measured on the same hiker. There are some number of hikers in each group, with multiple measurements on each hiker. This creates a multivariate analysis. There would be a group mean vector. With eleven sections, this group mean vector would have ten elements, representing the mean proportions for each section. The 11th section's group mean proportion is redundant, because the proportions all sum to 1. Still with me? :) Then, if one really wanted to know if they were different, one might use a multivariate test such as a multivariate analysis of variance (MANOVA), followed by univariate tests for each section's group mean. Technically, what you are interested in is whether the profiles are nearly the same. See the chart in the attached spreadsheet. Most of these profiles do look the same, based on the scale of the graph. However, there is some difference between early and late starts; see the table. The differences are not the same for all sections. This means the profiles are not parallel, indicating a somewhat different pattern to the groups.
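The profile comparison above can be sketched from Table F's group means by turning the cumulative percentages into the share of total time spent in each section. This only uses the published means for the Earliest and Latest groups; the MANOVA and profile tests described above would need the per-hiker data. Because each profile sums to 100%, the section-by-section differences must sum to zero, so any nonzero difference already rules out parallel profiles; the useful question is where the gaps are largest.

```python
LANDMARKS = ["Springer", "Georgia border", "Fontana", "Damascus", "Waynesboro",
             "Harpers Ferry", "DWG", "Kent", "Glencliff", "Gorham", "Stratton", "Katahdin"]
# Cumulative % of total time from Table F, with Springer added as 0%.
EARLIEST_CUM = [0.0, 4.8, 9.4, 24.3, 41.0, 46.8, 60.2, 67.1, 80.6, 86.1, 91.8, 100.0]
LATEST_CUM = [0.0, 4.4, 8.7, 22.8, 38.8, 45.6, 57.6, 65.2, 79.3, 85.7, 92.0, 100.0]


def section_profile(cumulative):
    """Convert cumulative % at each landmark into % of total time spent in each section."""
    return [round(b - a, 1) for a, b in zip(cumulative, cumulative[1:])]


early = section_profile(EARLIEST_CUM)
late = section_profile(LATEST_CUM)
diffs = [round(e - l, 1) for e, l in zip(early, late)]
for start, end, e, l, d in zip(LANDMARKS, LANDMARKS[1:], early, late, diffs):
    print(f"{start} -> {end}: earliest {e:.1f}%, latest {l:.1f}%, diff {d:+.1f}")

# Sections where the two profiles pull apart the most.
i = max(range(len(diffs)), key=lambda j: abs(diffs[j]))
print(f"largest gap: {LANDMARKS[i]} -> {LANDMARKS[i + 1]} ({diffs[i]:+.1f} points)")
```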

Other differences:
Group 3 took longer to reach HF than did the early or late groups.
The latest group is taking the least amount of time to reach Waynesboro, while the other groups are fairly close to one another. This suggests the late group is moving a bit faster, possibly feeling early pressure to move along or maybe enjoying better weather. I'm not sure.

I'm not bringing in the complex topic of MANOVA to cloud the discussion. What I want to point out is that the profiles could have been completely out of sync, like a sine and cosine wave, yet the manner in which you approached looking for differences, the univariate approach, could have completely missed a situation of that nature.

Normally, I'd test the data first, then look for the differences. It avoids data snooping:-? . For this response variable, given the small differences, I think you have a pretty good case. That is the differences in proportions are small enough to not worry about the lump grouping.

BUT, you did change the response variable to a proportion in order to compare the groups. What you did was to in effect change the topic of discussion, answer the question as applied to a different measure, get the answer you wanted, then innocently presented this as a form of rebuttal to my original caution about grouping the TJK's:sun . My argument centered on the total number of hiking days by years, and how the groups appear different. Lump grouping appears reasonable for one measure, but not the other. OK?

Jack Tarlin
08-02-2006, 13:05
I'm not remotely an expert on mathematics or statistics, so I'm not going to comment on Map Man's supposed methodology, biases, weaknesses, etc.

What should be mentioned, tho, is that in my opinion, the information he's providing re. how long it takes the average person to hike a certain section is quite accurate indeed. There are perhaps one or two places where I think the "typical" thru-hiker (if indeed such a thing exists) might in fact require a bit more time than MM's figures indicate, but on the whole, these are very minor quibbles. However he obtained or came by these figures, I happen to think they're pretty accurate, and therefore, I think this article can be very useful for folks in the planning/preparation stage of their trips, especially those who are attempting to figure out their approximate itinerary, schedule, etc. This is particularly useful information for folks who are wondering if they're giving themselves enough time for their hike, or folks who need to be in a specific place at a specific date and are wondering how much time they'll need to get there, etc.

I think this is a great piece of work and a very worthy contribution to Whiteblaze.

StarLyte
08-02-2006, 13:48
This is incredible. I'm actually going to print this out to study it. Thank you for posting it.

cutman11
08-02-2006, 14:10
Hey B Jack, since you have weighed in on this most interesting thread, I would ask you for a prediction of the results of my request made earlier in the thread to map man regarding the most common zero day stops along the trail. I had the impression 20 - 25 zeros were likely the most common, but which towns will have the highest likelihood of being a zero day? He indicated he would try to generate the data next winter when he has time to go over the journals again, but as one who has your breadth of experience, what would you predict his data will show?

Jack Tarlin
08-02-2006, 15:49
Wow.

In re. to zero days, everyone is different. Some folks go thru the whole Trail and take hardly any; some have as many as 30 to 50.

I'm not sure what the average would be.

But off the top of my head, people seem to take more in the first half of their trip, partly because they seem to need them more, and partly because there are more facilities. Of course, there are other factors, like one's budget, whether one is a partier, etc.

So here goes.....

*Some folks will take a day off as soon as Neels Gap or Helen, but these are usually people who are out of shape, are hurting, etc.

*Quite a few folks take a full day off in Hiawassee, especially if the weather has been rough (Keep in mind that the earlier one starts a Northbound thru-hike, the greater the likelihood of encountering bad weather, and the more days off one is likely to take.)

*A lot of folks have been zero-ing in Franklin, especially since folks like Ron Haven have made it such a hiker-friendly town.

*Some folks will zero in Fontana Dam, tho the R&R facilities (lodging options, restaurants, etc.) are limited.

*Many folks take a day off in Gatlinburg, especially if they've had a rough stretch in the Smokies.

*I'll zero at Standing Bear Farm as it's such a cool place.

*Lots of folks zero in Hot Springs, not because there are a lot of facilities (there aren't) but mainly because it's such a friendly little town.

*A great many people zero at Erwin, especially if they stay at Miss Janet's.

*Some zero at Kincora, but most continue and take time off in Damascus.

After arrival in Virginia, people seem to take less time off-----they're healthier, fitter, have lighter packs, and are covering distances between towns sooner. Also, the weather is more co-operative and folks simply seem to need less rest time.

*Some folks, but not a lot, will zero in Bland or Pearisburg. Likewise, Daleville/Troutville. But many won't take a full day off til Waynesboro which is spread out, but has great facilities. Before Waynesboro, some hikers will take some down time at Rusty's, but lots fewer than did in years past.

*Some folks, but not a lot, zero in Front Royal or Harpers Ferry.

By now, for most folks, it's getting close to the 4th of July. Most hikers take some time off at this point, especially if they have friends or relatives that live nearby. Many take time off in Trail towns (Duncannon or Delaware Gap; others may go into such places as Philly or New York).

*Zeros in New Jersey, New York, or Connecticut are few and far between for a lot of folks. Facilities are limited and more expensive than ones in the South.

*In Massachusetts, some folks will take some time off in Great Barrington, Lee, or Dalton, especially if it's been really hot. I zero in Williamstown because I really like the place.

*Not a whole lot of stops in Vermont, tho some will take time down in Bennington, Manchester Center, or especially Rutland.

*People don't zero in Hanover as much as they used to, now that the Dartmouth dorms have pretty much stopped taking in hikers. But it's still a great place to take some rest. Of course, I'm biased, as I live here.

*Some, but not many, folks zero at the wonderful hostel in Glencliff, or in North Woodstock, especially if the weather has been rough.

*Many folks zero in Gorham, as they've just left the White Mountains and are about to hit the toughest parts in Maine.

*Lots of folks take time off in Andover, as the hostels there are great.

*People might overnight, but in most cases, not zero in Stratton, Rangeley, or Caratunk, unless the weather's been bad or they're a bit banged up.

*About fifty per cent of hikers will take a last zero day in Monson before hitting the last stretch.

Hope this answers some of your questions.

But off the top of my head, around 20-25 seems about right for most folks. It works out to roughly one day off a week for most folks.

bfitz
08-02-2006, 17:01
I kind of took it like a work week, with one or two neros or zeros per "week", and maybe a one-week "vacation" (a trip home or something; there's often a wedding or a funeral or the like that has to be attended). As on weekends at home, these times were for partying and resting. I'd say 30 or so zero days is just about right, especially if preceded by "nero days" plus slackpacking, etc. It's important to have a good time and not wear yourself out, or else you'll lose motivation. Some of those places mentioned by Jack might suck you in for several days of slackpacking and mental regeneration... don't resist an offer of a good thing or a slackpack too much. They may not pass your way later, so take them when offered/available! The #1 priority is to enjoy the hike! Some forget to do this....

Alligator
08-02-2006, 17:50
I'm going to jump on the bandwagon for a moment. I was thinking that Map_man's proportionality argument was very insightful. It led me to do some thinking, and I came up with an interesting formula. Before I state it, I don't want any credit for it as, without reading through this thread for the 10th time, I think I am simply rephrasing/rearranging some of Map_man's ideas.

Here's the formula. Take the amount of time it takes to hike from Fontana Dam to Damascus and multiply it by 7 (6.9 to be more exact). That's how long the hike will take. If you like it and it works call it the M&M formula.

Map_man presents the mean percent of time in each section in his post 56 Table F. After I plotted the profiles for the start date groups, I really wondered why the groups were proportional like that. Since Map_man said the years and hiking speeds were too (nearly), it would mean that something else was potentially the main agent.

I took the miles for each of his sections and figured out what percentage of the trail the section accounted for. For instance, from Springer to the GA border is 77.3 miles, and (77.3/2175) x 100 ~ 3.6%. I deconstructed his Table F into % of time spent in each section, instead of cumulative % of time to reach each landmark. The mean % of time spent by TJK's in the GA border section is 4.7%. I did this for all of the sections and it's in the percentage sheet in the attached xls file. If you look at chart 2 in this file, you will see that TJK's mean % hiking time nearly mirrors the % of miles in the section. Early on it seems that hikers spend a slightly larger % of their time in GA than the section's % of miles. That's completely understandable. Then a near parallel, with some change in the Glencliff to Gorham and Gorham to Stratton sections. Map_man mentions one of these in his first post.
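The deconstruction of Table F described above is just successive differences of the Mean column. Here is a minimal sketch of that step plus the miles-share calculation. Only two sections' mile shares appear in this thread (Springer to GA border at 77.3 miles, and Fontana to Damascus at 13.7%), so the 0.987 correlation can't be recomputed here; with all eleven mile shares in hand, Python 3.10+'s `statistics.correlation` would give the Pearson r.

```python
# Mean cumulative % of total time at each landmark, Table F (Georgia border ... Katahdin).
MEAN_CUM = [4.7, 9.3, 23.8, 40.9, 47.6, 59.1, 66.5, 80.2, 86.0, 91.8, 100.0]
TRAIL_MILES = 2175  # total trail length used in the post


def per_section_share(cumulative):
    """% of total time spent in each section, starting from Springer at 0%."""
    shares, previous = [], 0.0
    for value in cumulative:
        shares.append(round(value - previous, 1))
        previous = value
    return shares


def mile_share(section_miles, trail_miles=TRAIL_MILES):
    """Section length as a % of the whole trail."""
    return 100 * section_miles / trail_miles


time_share = per_section_share(MEAN_CUM)
print("mean % of time per section:", time_share)
print(f"Springer-GA border: {time_share[0]:.1f}% of time vs {mile_share(77.3):.1f}% of miles")
print(f"Fontana-Damascus: {time_share[2]:.1f}% of time vs 13.7% of miles (figure from the post)")
```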

The correlation between mean % time in a section and % of miles in a section is 0.987. That's nearly 1:1. So, I picked the Fontana-Damascus section (mean % of time in section: 14.5%; % of miles in section: 13.7%) because it is early in the hike, but has allowed enough time for hikers to get their trail legs. 100/14.5 ~ 6.9, or about 7.

Now I'm taking a foot off the bandwagon for a moment. In table C, post 29, the difference between group 2 hikers (dates) and late hikers (date) is 18 days. This is a second observation about potential group differences. The first was the difference in completion times by year. It was 14 days.

Did this have a big influence on what I stated above? I doubt it. Why? There's a strong relationship between the mean percentage of time spent in a section and the % of miles in the section. Since it's percentages and not the actual hiking rate, year to year and start date differences in number of days are not important. Any bias I protested will be incorporated into an average number of hiking days which becomes an average hiking rate, which determines the time in a section, but the time becomes a percentage. I don't see any bias mucking things up.

Expansion of the formula: multiply the number of days spent in the Fontana-Damascus section by 6.9. That's the projected total hike time. Then either use the mean % of time hikers spent in the next section, or simply find the next section's miles as a % of the total trail, and multiply this by the total hike time.

Example. 20 days to get from Fontana to Damascus. Total hike 6.9x20=138 days. How long to go 217 miles? (217/2175)x138~14 days.
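The projection in the example can be written as a small Python sketch. It assumes, as the post does, that a section's share of hiking time roughly equals its share of the 2175 miles; the function names are hypothetical.

```python
TRAIL_MILES = 2175.0  # total trail length assumed in the post

def projected_total_days(section_days, multiplier=6.9):
    """Project total hike days from days spent in a reference section.

    The default 6.9 is the Fontana-Damascus multiplier from the post.
    """
    return section_days * multiplier

def days_for_miles(miles, total_days, trail_miles=TRAIL_MILES):
    """Days for a stretch, assuming its share of time equals its share of miles."""
    return (miles / trail_miles) * total_days

total = projected_total_days(20)             # 20 days from Fontana to Damascus
print(round(total))                          # 138
print(round(days_for_miles(217, total)))     # 14
```

The same function accepts other multipliers; for instance, map man's Springer-to-Damascus figure of 4.2 would project a hypothetical hiker who reaches Damascus on day 40 to about 168 total days.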

Two feet off the bandwagon: in order to prove this conclusively, a random sample would surely help ;)

HapKiDo
08-06-2006, 12:33
My hiking partner and I plan a different strategy for our 2007 thru-hike. We'll be hiking by "hours" rather than miles. It should prove interesting and maybe helpful for thru-hikers in following years. We plan to start out hiking six to eight hours a day and increase the time hiked until we're hiking twelve or thirteen hours a day. My theory is that we won't need zero days because we won't be having 'high mileage' days to wear us out. We'll take a long break in the middle of the day to eat our big meal and rest and recuperate, and then hike the remaining 'time' for that day.

This also means that we have to be aware of where water and camping spots are, and the approach may not work in the Smokies because of the camping restrictions there.

I'll post an article after our Thru Hike explaining hours & mileage and how it all worked out. (Possibly our data will be very close to the information in this article.)

map man
08-08-2006, 23:30
I got back this weekend from a week's hike on the Superior Hiking Trail and saw there had been a number of new replies to this article.

First off, thank you Jack Tarlin for the very kind words. If the hiking rates in my study for the various sections really do come pretty close to what you've seen through all your years of hiking and observation, I'm really encouraged. Thanks again.

Alligator, I have indeed taken your reservations about grouping together different years and starting times for the total time to hike different sections and the entire trail, and pretended they applied to the proportions between sections for these different years and starting times. You caught me red-handed ;)! However, I will say that it was largely your concern about the discrepancies (for instance, the fewer days it took my 2005 group to thru than my other years) that was on my mind when I decided to check whether the proportionality between sections really held up for the various sub-groups based on start date, year, and hiking speed.

Your interest in looking at the time to hike a particular section of the trail (you used Fontana to Damascus) to project a likely total time to hike the whole trail is indeed something I've thought about. In fact, I'd even thought about Damascus as a useful place for a hiker to think about this, although I was thinking in terms of the total time from Springer to Damascus (since all hikers seem to have the date they started from Springer pretty firmly in their minds, while other intermediate dates often get a little hazy). If the entire trail up to Damascus was used the multiplier would be 4.2 instead of your 6.9 (based on Fontana to Damascus). Of course, if any hiker needed to take two weeks away from the trail at Hot Springs, both of our numbers would end up out of whack. You and ARambler have mentioned more than once that this sort of thing is why using calculations based on "hiking days" rather than total days would be a more accurate barometer.

And this illustrates the balancing act that's at play in the article. On the one hand is the statistical integrity of the numbers in the study. On the other is the intuitiveness, understandability and usefulness of the numbers in the study. So in this case, a hiker keeping track of "hiking days" would definitely be more likely to be able to project accurately total hiking days for the whole hike based on the hiking days for a section like Fontana to Damascus, instead of using total days with all the variability that zero days bring to that number. But of course at any given point on the trail most hikers know what date it is and know what date they started their thru-hike, but it seems like most hikers are less likely to be able to remember exactly how many zero days they've taken, especially the further along the trail they get. That's one reason I use total days and not "hiking days" in Table 1 in the article, the table that shows the number of days TJKs took to get to various landmarks on the trail.

I realize you didn't even raise the "hiking days" vs. total days issue when you talked about projecting hiking time for the entire AT based on hiking time for a section. It was just something I've been thinking about.

Finally, cutman11 and Jack Tarlin are making me more and more curious about which trail towns will prove to be the most popular for zero days when I go back to the journals and figure that out. My money is on Damascus to top the list.

Jack Tarlin
08-09-2006, 11:41
I agree that Damascus is at the top of the list for zero days, and would be even if one didn't count days taken off for Trail Days. It's simply a great place to take some down time.

Following Damascus, I'd say Hiawassee, Franklin, and Gatlinburg are way up there, partly because they come so soon in the trip, partly because there are lots of facilities/lodging/dining options, and partly because a lot of folks are wet, tired, and kind of banged up when they get there.

Erwin is way up there, too, especially if you stay at Miss Janet's. For a lot of folks, this is a tough place to leave.

Pearisburg gets more folks than you think, especially if the weather's been bad.

Waynesboro gets more popular every year.

Duncannon is more popular than it used to be, especially since Pat and Vicki took over running the Doyle.

Still not a lot of zeros in the mid-Atlantic states or Southern New England as the towns are small, facilities few and far between, and the hotels expensive.

More folks are starting to zero in Dalton and Rutland now that there are places for them to stay.

Fewer folks seem to stick around Hanover anymore, for the simple reason that there's no cheap place to lodge since the dorms essentially stopped taking in hikers. Too bad. In the old days, folks would frequently lay over here for two or three days.

Likewise, Gorham isn't as popular as it once was. Most folks stay one night.

The big surprise, seeing as how tough it is, is that most folks don't zero much in Maine. The most popular spot for a day off in Maine is the Cabin in East Andover, tho most folks don't actually zero there, but instead slackpack for a few days and return to the hostel in mid-afternoon. Also, a lot of folks are getting on a tight budget once they're this far north, and while many folks wouldn't mind taking a day off in Stratton or Rangeley, they simply can't.

And lastly, of course, a lot of folks stay an extra day in Monson, especially now that the word on the new folks at Shaw's boarding house is overwhelmingly positive.

And as far as zero days not actually spent in towns, I'd say the NOC, Fontana Dam, and Kincora are at the top of the list.

(And my own personal favorite is a zero spent in the middle of nowhere, say OverMountain Shelter or Grayson Highlands Park).

bfitz
08-09-2006, 18:45
I think I zeroed or slacked in each of the places you mentioned, Jack.

map man
08-30-2006, 21:36
Cutman11 suggested looking at the group of hikers in this study and figuring out which towns were the most popular for zero days. I was originally going to wait until the hiking season was over to tackle this, but I came back from a one-week hike in early August with badly blistered feet and wasn't going to get any more hiking done soon anyway, so I set to work going through the 173 journals in the study and compiled this list. So first off, here are the towns on the AT, in order of popularity, where the largest percentage of these NOBO thru-hikers (classes of 2001 thru 2007) took their zero days (this list is confined to towns where at least 20 percent of hikers zeroed):

84%.....Damascus VA
61%.....Hot Springs NC
53%.....Pearisburg VA
50%.....Waynesboro VA
48%.....Harpers Ferry WV
45%.....Erwin TN
45%.....Gorham NH
43%.....Fontana NC
38%.....Delaware Water Gap PA
35%.....Daleville (and Roanoke etc.) VA
31%.....Monson ME
29%.....Hanover NH
24%.....Duncannon PA
23%.....Hiawassee GA
23%.....Franklin NC
22%.....Manchester Center VT
20%.....Gatlinburg TN

More details on how I counted these towns: If someone got off the trail in Front Royal and hitched a ride back to Damascus to attend Trail Days, I credited both places with zero days. This differed from how I counted zero days in my original study. In that study, when figuring out how long it took to hike each section of the trail, all these zero days would have been counted in the Waynesboro to Harpers Ferry section since that's where the hiker got off the trail. When counting places popular for zeroing it seemed wrong, though, to neglect counting either town. If that hiker left the trail in Front Royal to tour Washington DC for a couple days, however, for obvious reasons I don't consider Washington a "trail town" so the only town credited with zero days would be Front Royal.

Also, in my original study I don't count a day as a zero day if any trail miles at all were walked, no matter how few. So if a hiker credited herself with hiking just a few tenths of a mile to get from one part of Fontana to another, for example, I didn't consider that a zero day. In this project, however, since that hiker would have spent at least one complete day in Fontana, I chose to count it as a zero day for Fontana.
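The counting rules above amount to "percent of hikers who zeroed at least once in a town, counting each town once per hiker." Here is a minimal Python sketch of that idea; the data layout (one list of zero-day towns per hiker) and the sample journals are hypothetical, and decisions such as crediting both Front Royal and Damascus for a Trail Days trip would be made when building each hiker's list.

```python
from collections import Counter

def zero_day_town_percentages(hikers):
    """Percent of hikers who took at least one zero day in each town.

    `hikers` has one entry per hiker: the towns where that hiker zeroed,
    repeats allowed. Each town is counted at most once per hiker.
    """
    counts = Counter()
    for towns in hikers:
        counts.update(set(towns))  # de-duplicate within one hiker
    n = len(hikers)
    return {town: 100.0 * c / n for town, c in counts.items()}

def popular_towns(percentages, threshold=20.0):
    """Towns at or above the threshold, most popular first."""
    ranked = sorted(percentages.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(town, pct) for town, pct in ranked if pct >= threshold]

# Hypothetical journals: hiker 1 zeroed twice in Damascus and once in Front Royal.
hikers = [
    ["Damascus VA", "Damascus VA", "Front Royal VA"],
    ["Hot Springs NC", "Damascus VA"],
    ["Hot Springs NC"],
    [],
]
pcts = zero_day_town_percentages(hikers)
print(popular_towns(pcts, threshold=50.0))
# [('Damascus VA', 50.0), ('Hot Springs NC', 50.0)]
```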

OK, so here's a longer list that shows the percentage of hikers who zeroed at each trail location where at least 5 percent of hikers zeroed. This list goes from south to north, following a NOBO on the trail. For most locations I give a town name and the road crossing where the trail was exited, but in some locations, in the White Mountains for instance, at least 5 percent of hikers got off the trail but it's difficult to generalize about the town where the zero was spent. It's just a place, like Crawford Notch, where hikers exited the trail at a road crossing.

Neels Gap GA (US 19, 129), 7%
Helen, Unicoi Gap GA (GA 75), 6%
Hiawassee GA (US 76), 23%
Franklin NC (US 64), 23%
Nantahala Outdoor Center NC (US 19, 74), 13%
Fontana NC (NC 18), 43%
Gatlinburg TN (US 441), 20%
Standing Bear Farm NC (NC 284, I 40, Waterville School Road), 8%
Hot Springs NC (US 25, 70), 61%
Erwin TN, 45%
Elk Park NC, Roan Mountain TN (US 19E), 5%
Kincora Hostel, Laurel Fork Lodge TN (Dennis Cove Road), 14%
Damascus VA, 84%
Troutdale VA (VA 16), 6%
Sugar Grove VA (VA 16), 5%
Atkins VA (US 11), 8%
Bland VA (US 21/52), 6%
Pearisburg VA (US 460), 53%
Catawba VA (VA 311, 624), 10%
Daleville, Roanoke, etc. VA (US 11, 220), 35%
Waynesboro VA (US 250, I 64), 50%
Front Royal VA (US 522), 17%
Harpers Ferry WV, 48%
Pine Grove Furnace State Park PA, 7%
Boiling Springs PA (PA 174), 6%
Duncannon PA, 24%
Port Clinton PA, 17%
Palmerton PA (PA 873), 10%
Delaware Water Gap PA, 38%
Vernon NJ (NJ 94), 8%
Arden NY (NY 17), 5%
Bear Mountain NY, 10%
Pawling NY (County 20), 5%
Kent CT (CT 341), 15%
Salisbury CT (CT 41), 6%
Great Barrington MA (MA 23), 8%
Upper Goose Pond Cabin MA, 6%
Dalton MA, 19%
North Adams MA (MA 2), 5%
Bennington VT (VT 9), 5%
Manchester Center VT (VT 11, 30), 22%
Killington, Rutland, Inn at Long Trail VT (US 4), 17%
Hanover NH, 29%
Glencliff NH (NH 25), 11%
Franconia Notch NH, 13%
Crawford Notch NH, 11%
Pinkham Notch NH, 10%
Gorham NH (US 2), 45%
Andover ME (East B Hill Road, South Arm Road), 16%
Rangeley ME (ME 4), 14%
Stratton ME (ME 27), 17%
Caratunk ME (US 201), 6%
Monson ME, 31%
Baxter State Park and vicinity ME (Abol Bridge, Millinocket etc.), 12%

There is one place I'd like to mention: the top zero day location served by no roads at all, in other words, a place you can only reach by hiking. Upper Goose Pond Cabin MA had 6% of hikers zeroing there for its beauty alone.

Again, thank you to cutman11 for suggesting the idea of using my earlier research to trace zero days to particular trail towns.

Jack Tarlin
08-30-2006, 21:54
Very interesting stuff.

Query: The Damascus figure seems high. Does this include Trail Days zero days or does this refer to people who took time off in Damascus NOT connected to Trail Days?

map man
08-30-2006, 22:12
Jack, the Damascus figure includes both. If a hiker zeroed in Damascus in the normal course of the hike, that's counted. If hikers interrupted their hike to travel to Damascus for Trail Days, that's counted in the Damascus total too (I only count Damascus once, though, for each hiker). Same goes for other trail towns that sometimes get hikers traveling from a different point on the trail to stay at that trail town (Gorham gets hikers from different road crossings in the Whites; Waynesboro has hikers travel there from Rusty's and other places, etc.).

map man
08-30-2006, 22:42
Jack, as near as I can tell from my notes listing the zero day locations for each hiker, if only the zero days taken in the normal course of the hike were counted for Damascus (excluding trips from points elsewhere on the trail for Trail Days) the figure would be 71% instead of 85%. The balance (14%) spent zero days in Damascus for Trail Days, making a special trip, but did not zero there when they passed through Damascus in the normal course of their hike. (The numbers above were calculated when I had only the 2001-2006 info at my disposal.)

mweinstone
09-14-2006, 18:08
sounds great, keep up the good work

iamscottym
11-23-2006, 03:07
I was referred to this site from bper, seeking AT info in preparation for an '07 thru. I must say I am duly impressed! I do have a couple of angles I would like to cover in regard to all these stats. Also, please bear with me if I missed something, as I'm an engineer (Northwestern University, M.E.) and have not been exposed to stats since high school.

I saw that someone mentioned breakdowns by hiker age and hiking hours, and that this information was not available. There are many categories along these lines, though probably not available, that would be equally significant, such as pack weight, fitness level (though probably too subjective to measure), caloric intake (big eater, or the ramen noodle bunch), etc.

However, there is a potentially useful category yet untapped: elevation change (delta h) per hiking day. I think this might shed light on the progression of the hikers' fitness better than miles/day, mainly because elevation gain is related to work (in the physics sense, PE = mgh) and totally independent of horizontal distance. Granted, this neglects differences in trail conditions, etc., but obviously we can't factor in a near-infinite number of variables (nor would that likely be more useful to the average hiker).
To me, this would be most beneficial for training for the AT, since I could then take my GPS to the local hills and know my own delta h for the day.
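As a rough illustration of the PE = mgh point, here is a Python sketch of the mechanical work needed to lift a hiker through a day's climbing. It counts only the lifting work, not metabolic energy or the cost of descending, and the 80 kg mass and 3000 ft of climb are hypothetical inputs.

```python
G = 9.80665         # standard gravity, m/s^2
M_PER_FT = 0.3048   # meters per foot (exact definition)

def lifting_work_joules(mass_kg, gain_ft):
    """Mechanical work (PE = m*g*h) to raise mass_kg through gain_ft of climb.

    Horizontal distance does not appear anywhere in the formula.
    """
    return mass_kg * G * gain_ft * M_PER_FT

# Hypothetical: an 80 kg hiker-plus-pack climbing 3000 ft in a day.
work = lifting_work_joules(80, 3000)
print(f"{work / 1000:.0f} kJ")  # 717 kJ
```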

I'm sure there's a way to figure out the exact delta h with MapSource software, and I have some; I just haven't figured out how to do that yet (just got it, for my Garmin 60CSx). If someone else is already proficient with it and would like to crunch the numbers, that'd be great. Otherwise, I'm sure I'll get to it.

-Scott

map man
11-23-2006, 12:21
Hey Scott, welcome to WhiteBlaze, and I'm glad you found this article interesting. Now, to address what you are saying about tracing elevation gain and loss on various sections of the trail in order to get a more complete understanding of the changes in miles per hiking day in this article for the different trail sections (specifically, how much the increase in miles hiked in the early sections of the trail might be due to increased fitness, and how much might be due to a flatter trail):

I know that elevation change info is out there -- otherwise the organizations that create maps for the various trail sections could not create the graphic elevation profiles they include with their maps. And I know the GPS info is out there and could be found in multiple places, I'm guessing, by googling "GPS" and "Appalachian Trail" or other related search terms. Of course, in order to get entirely accurate elevation change numbers the GPS equipment would need to be very exact and every single point on the trail where falling ground changed to rising ground and vice versa would need to be logged -- one heck of a lot of data points! -- and I'm just not sure if it's been done that thoroughly yet. Others here on WhiteBlaze know a lot more about this than I do, and there have been several threads discussing GPS and mapping software.
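Totalling gain and loss from logged elevation points comes down to summing every rise and every drop between consecutive points. Here is a minimal Python sketch of that, not any particular mapping program's method. The optional `threshold` is an assumption added to illustrate one common way of keeping GPS jitter from inflating the totals: a change only counts once elevation has moved more than the threshold away from the last counted point.

```python
def gain_and_loss(elevations, threshold=0.0):
    """Total climb and descent along an ordered list of elevation samples.

    With threshold > 0, small wiggles are ignored (simple hysteresis); any
    leftover change smaller than the threshold at the end is not counted.
    """
    if not elevations:
        return 0.0, 0.0
    gain = loss = 0.0
    ref = elevations[0]  # last elevation that was counted
    for e in elevations[1:]:
        delta = e - ref
        if delta > threshold:
            gain += delta
            ref = e
        elif delta < -threshold:
            loss -= delta
            ref = e
    return gain, loss

print(gain_and_loss([100, 150, 120, 200]))                          # (130.0, 30.0)
print(gain_and_loss([100, 105, 100, 105, 100, 150]))                # (60.0, 10.0)
print(gain_and_loss([100, 105, 100, 105, 100, 150], threshold=10))  # (50.0, 0.0)
```

The last two lines show the trade-off: missing or noisy turning points change the totals, which is why densely logged data matters.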

The form I would find most useful for this information to take would be elevation gain and loss, in number form (not elevation profile) between every AT shelter and road crossing on the trail. It could be useful in planning and also useful to refer to while hiking when deciding whether to hike on to that next shelter toward the end of the day. If that info could be put into a format like the AT Data Book -- or crammed into the Data Book itself (I don't care how small they had to make the print!) -- I would be in hiker nerd heaven.

Now, applying that info to this study: you've already noted that there are other factors as well as elevation change (one that I've found to be a big deal is the rockiness/rootiness of the trail -- makes hiking on flats much harder and, ironically, makes hiking on significant slopes often easier), but elevation change undeniably affects hiking progress and is easier to quantify than those other various and sundry factors, so it makes sense to look at that. In fact, if the elevation change info was in detailed format and one easy for me to understand, I'd be interested in applying it to my study. My article as it stands deals with two variables: time and horizontal distance (with the chronology of the order one hiked the sections thrown in as well -- after all, I did limit the study to NOBOs). In simple terms, looking closely at elevation gain would add a third variable: vertical distance taking the form of elevation change both up and down.

If any WhiteBlazers know if this elevation info already exists in the detail I'm talking about, please post a link here. And Scott, thank you for bringing this up -- it might just give me something more to work on!

Edit: on rereading your post I see that part of your interest is in knowing how much elevation rise and fall there is on the trail in order to help in training. I believe that for the entire AT the elevation gain is somewhere between 200 and 250 feet per trail mile (some sections more, some less). I'm sorry I can't remember where I saw this or the exact number; I'm guessing others here can direct you to it. Anyway, this means that over the course of a thru-hike one is likely to average between 2000 and 4000 vertical feet a day, depending on hiking speed. I know that when I was preparing to hike the Superior Hiking Trail I hiked in a local state park that had a known elevation change of 150 feet from valley floor to rim (and it was tough here in central Iowa to find even that much!). I made a point of going up and down the valley between 15 and 20 times with a full pack so I would get around 2500 feet of elevation change, and I did this on a few weekends before I hiked. I think it really helped me.
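The training arithmetic above can be checked with a short Python sketch. The paces of 10 and 16 miles per day are hypothetical slow and fast values chosen to bracket the 2000 to 4000 ft range; the function names are also hypothetical.

```python
import math

def daily_gain_ft(gain_per_mile, miles_per_day):
    """Vertical feet climbed per day at a given climb rate and pace."""
    return gain_per_mile * miles_per_day

def laps_needed(target_gain_ft, climb_per_lap_ft):
    """Whole up-and-down laps needed to reach a target amount of climbing."""
    return math.ceil(target_gain_ft / climb_per_lap_ft)

print(daily_gain_ft(200, 10), daily_gain_ft(250, 16))  # 2000 4000
print(laps_needed(2500, 150))                          # 17
```

On a 150-foot slope, 15 to 20 laps works out to 2250 to 3000 feet of climbing.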

Jim Adams
11-23-2006, 13:07
what about legal zero days---everybody knows that it is illegal to hike on thursdays!

great info.

geek

iamscottym
11-23-2006, 14:41
Mapman,

I'm pretty sure there's a way in my MapSource software to follow a predetermined route (i.e., the trail) and log the elevation change. I know my GPS logged elevation change for the trip when I was out in SNP this fall.
I'm working on figuring out the delta h between your landmarks now.

Oh, and I'm just as interested in the numbers, for the numbers' sake, as you are; I just gave 'training' as a use for the numbers so I didn't feel quite so nerdy. Oh well, I'm over it now. :)

map man
11-23-2006, 15:02
You know, it occurs to me now that for purposes of this article, the most useful thing for me to know is vertical gain/loss per mile for the eleven distinct sections I chose. If I recorded that info in one of the tables in the article, say Table 2 where miles per day and miles per hiking day are listed, that would let people judge for themselves how much they thought trail ruggedness (in terms of vertical rise and fall) was influencing the changing distances hiked per day in the different sections, and how much was increasing fitness. Do I understand you correctly, Scott, that you think that might be something that can be calculated using your map software?

iamscottym
11-29-2006, 00:30
Map man,

As of yet I haven't been able to figure out how to do it, but I just got this gps and software, and haven't had one before so I don't really know what I'm doing. I'll keep working on it though.

map man
02-11-2007, 15:02
I just finished updating this hiking rates article, recalculating all the values after having added in the data for 2006 hikers. That means the info is now based on six years' worth of hikers (a total of 143 now) keeping journals at Trailjournals.com. I've also updated most of the supplemental information contained in Post #28, and the info on which towns hikers tend to spend their zero days contained in Post #69. There are no dramatic changes in the results, though some of the figures for the percentage of zero days hikers spent in a particular trail section did change some.

And I now have a new bit of information: since there are now over twenty journal keepers in the study who are female, and over twenty journals for a male and female hiking and journaling together, I thought I would go ahead and break the hiking rates down by gender, as a few members have been requesting. (The following numbers have been updated since to incorporate the 2001 through 2007 hiking classes.)

The 121 male hikers have taken a mean 164.3 days to complete with 19.8 zero days.
The 26 female hikers have taken a mean 175.2 days to complete with 20.3 zero days.
The 25 M/F couples have taken a mean 176.6 days to complete with 22.1 zero days.

The median values for all of these groups are very close to the means.

The number of hikers in the latter two groups is still low enough that I'm not confident enough about the results to include a gender breakdown in the main article, but I figured I'd mention it here for you members who have been curious.

And thank you to attroll for enabling the edit function for the article and my posts on this thread so I could get all these changes done.

terrapin_too
02-11-2007, 15:32
Well, I'm sure happy to see that my remaining AT miles are mostly in the section with the highest MPD. :)

nhalbrook
06-28-2007, 07:40
Great article.

In reading trail journals I have noticed many hikers note their progress by 100-mile milestones. If the data lends itself to a reasonable tabulation of time to hike each 100-mile segment, this would be interesting.

map man
06-29-2007, 00:08
Nhalbrook, I can't resist questions like these. The time periods I'm going to list are based on some assumptions, and you can decide if they're reasonable. One, that the 143 hikers in my study are reasonably representative of typical NOBO completing thruhikers. Two, that if hikers in the study average a certain number of miles per day in a section -- for example, the Damascus to Waynesboro section -- that the average will be about the same throughout the entire section (and not be greater after Daleville than before, for example -- this assumption is likely to be a little off, but shouldn't throw the number of days off by more than a fraction of a day, I think). You see, the form the data is in will not let me figure out exactly where each hiker is at the 200 mile post, for example, but instead tells me how many days it took each hiker to get to Fontana (at about mile 162) and Damascus (about mile 459). So, with those things said, here's my best estimate. I list the number of days to hike each 100 mile section and include the cumulative total in parentheses.

Mile 0 to 100 -- 10.2 days (10.2)
100 to 200 -- 8.6 days (18.8)
200 to 300 -- 8.2 days (27.0)
300 to 400 -- 8.2 days (35.2)
400 to 500 -- 7.9 days (43.1)
500 to 600 -- 7.5 days (50.6)
600 to 700 -- 7.4 days (58.0)
700 to 800 -- 7.5 days (65.5)
800 to 900 -- 7.1 days (72.6)
900 to 1000 -- 7.0 days (79.6)
1000 to 1100 -- 7.1 days (86.7)
1100 to 1200 -- 7.2 days (93.9)
1200 to 1300 -- 7.3 days (101.2)
1300 to 1400 -- 7.2 days (108.4)
1400 to 1500 -- 7.1 days (115.5)
1500 to 1600 -- 7.2 days (122.7)
1600 to 1700 -- 7.1 days (129.8)
1700 to 1800 -- 7.7 days (137.5)
1800 to 1900 -- 9.5 days (147.0)
1900 to 2000 -- 8.7 days (155.7)
2000 to 2100 -- 7.3 days (163.0)
2100 to end -- 5.4 days (168.4)
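The assumption behind this table, a constant pace between landmarks, is piecewise-linear interpolation of cumulative days against trail mile. Here is a Python sketch of that technique. The landmark days in the usage lines are hypothetical (a steady 10 miles a day), not the study's figures; real inputs would be the mean days to each landmark.

```python
import math
from bisect import bisect_right

def cumulative_days_at(mile, landmarks):
    """Linearly interpolate cumulative days at `mile`.

    `landmarks` is a list of (trail_mile, cumulative_days) pairs sorted by
    mile, starting at (0, 0). Pace is assumed constant between landmarks.
    """
    miles = [m for m, _ in landmarks]
    if mile <= miles[0]:
        return landmarks[0][1]
    if mile >= miles[-1]:
        return landmarks[-1][1]
    i = bisect_right(miles, mile)  # index of first landmark beyond `mile`
    (m0, d0), (m1, d1) = landmarks[i - 1], landmarks[i]
    return d0 + (d1 - d0) * (mile - m0) / (m1 - m0)

def days_per_segment(landmarks, step=100):
    """(start_mile, end_mile, days) for each `step`-mile segment."""
    end = landmarks[-1][0]
    marks = list(range(0, math.ceil(end), step)) + [end]
    return [
        (a, b, cumulative_days_at(b, landmarks) - cumulative_days_at(a, landmarks))
        for a, b in zip(marks, marks[1:])
    ]

# Hypothetical landmark days: Fontana ~mile 162, Damascus ~mile 459.
landmarks = [(0, 0.0), (162, 16.2), (459, 45.9)]
for a, b, days in days_per_segment(landmarks):
    print(f"{a} to {b}: {days:.1f} days")
```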

wudhipy
08-08-2007, 05:51
What a great piece of work... somehow the intimidation is gone and there is a sense of possibility. You have made a great abstract almost tangible.
Thank you.

see ya in the woods

Wudhipy

Route Step
08-08-2007, 21:40
Reminds me of the saying "You can always beat a dead horse".
Thanks for the info Map Man. I'll use it on a lower level than some of the folks on this site.

cutman11
08-08-2007, 21:43
Fascinating that with the exception of the first 100, when hikers are getting their trail legs, and the end, when finishing in Maine, thru-hiking the AT is mainly about persisting in putting in the time (7 days) to do the miles (100 mi). Essentially, if you can do the first 200, it's basically a mental game to persist to finish the last 1900, one 100-mile week after another.....

Salvadore L. Vagabon
10-04-2007, 06:32
This is pretty damn sweet. I think it is awesome you took the time to do this and then posted it, and though some are doubting and saying it's unusable, come on, it is a great place to start the process of planning to do the AT. Great job and thanx Map Man

-Salvadore L. Vagabon

Johnny Swank
10-24-2007, 19:59
Wow - what a cool study you've got going there. Just eyeballing things, your numbers pretty much jibe with what I'm finding in my study. I've got about 500 folks or so in this sample spread over almost 40 years, so it'll take a while to wade through things.

Reading the pissing match on the first couple of pages was a good reminder of why I'm 90% sure I'm leaving my PhD program for good though. That kind of pointless hashing is mind-numbing, and ultimately for naught. I don't know how many conference presentations I've sat through and listened to 3 people argue about a silly footnote or model. I'd rather at least argue about dogs or something like that!

Again - great stuff. Enjoyed it.

map man
10-24-2007, 21:03
Thanks, Johnny Swank. I'm glad you are finding it useful and apparently matching up some with what your large population of hikers is reporting. Others too have commented on some of the contentious posts in the early part of the thread, but one thing that's important to note is that, though some of those posters with conflicting ideas about the underlying statistical methods did some barking at each other, none of them directed any personal stuff at ME. Some of those guys had a lot more training in statistics than I have (I'm thinking of Alligator, ARambler, dje97001 and Tha Wookie) and some of the things they had to say led me to crunch some more numbers and improve the article, I think.

I hope that when you've done some more work on it that you'll share some of the results here at WB, even if one of the original motivations for the research (the PhD) goes by the boards. (One thing to keep in mind, though, is that if YOU know in your heart that what you're doing is important, whether it be research or new approaches in your field, it gets easier to tune out quibblers -- it might be worth it to keep plugging away.)

Since you've got a sample that spans so many years I'd be particularly curious to know if the way people are hiking the AT has changed with time (as equipment got lighter and information about hiking the AT got more easy to access with the internet revolution). Anyway, good luck with it and thanks again.

_terrapin_
10-24-2007, 21:11
Quoting map man: "Since you've got a sample that spans so many years I'd be particularly curious to know if the way people are hiking the AT has changed with time (as equipment got lighter and information about hiking the AT got more easy to access with the internet revolution). Anyway, good luck with it and thanks again."

Personally I find that a very interesting topic -- changes in hiking habits, regimes, etc. over time. Of all my hiking/thru-hiking books, Roland Mueser's study (of the class of '89) is one of my favorites. Interesting too how some things haven't changed at all. ;)

Johnny Swank
10-25-2007, 11:39
I've had to put things on the back burner since The Gathering to work on the book, but I'll be posting interesting nuggets on our blog as they come in. There's enough in that dataset to keep me busy for years if I let it. Hope to birth out a book on the results in about 24 months, but we'll see.

Good job again on your study. Really enjoyed it.

BearII
12-23-2007, 00:06
ok, I'm a newbie, but I'm a veteran traveler - having traveled tens of thousands of miles around the USA on my motorcycle. So now I want to try a bit of hiking, AT in particular. Wow, was I ever surprised that the same type of pissing matches that occur on bike sites happen on hiker sites!!!! Geez, I was hoping that of all people "real" hikers, feet to the trail types, would have finally figured out that it's all ABOUT THE FRIGGIN JOURNEY - stop the insanity please!!!!!!!!!!!

If there is one thing I've learned in traveling all over, the journey is so fascinating that all this other stuff is, well, nonsense. No offense map man, but really your initial post was more than enough. This entire statistical BS is just that - BS!!!!!!!!! It's what we really want/need to get away from!!!

Focus on the journey, grow with it, learn from it, who really cares if you hiked 10 miles from Springer vs 11.4. Yeah, I know so many need to "plan", NO you don't, enjoy it, hike it, work it, let "obstacles" be OPPORTUNITIES for the "trail magic" to occur. Geez folks, I really thought hikers would have this down much more than bikers (well certainly more than HD types but maybe not as much as real bikers). It is ABOUT THE JOURNEY!!!!!!!!!

map man
01-06-2008, 21:14
A little over a year ago iamscottym asked about elevation gain and loss on the AT, wondering how it correlated to the hiking speeds that I reported in this article. Well, I have now calculated the elevation change for many sections large and small on the AT and published it in "AT Elevation Gain and Loss, by Section," in the articles forum (and hopefully it will be moved to the articles section on the front page in the coming weeks). So here is a table showing average elevation gain and loss per mile for each of the eleven sections in this article. In this table MPHD is Miles Per Hiking Day, EGPM is Elevation Gain Per Mile (expressed as feet per mile), and EGPD is Elevation Gain Per Day (expressed as feet per day). So in the two elevation categories the number "3100" would mean a hiker had gone 3100 feet up and also 3100 feet down.

Section                        MPHD   EGPM (ft/mile)   EGPD (ft/day)
Springer - Georgia border      10.1   307              3100
Georgia border - Fontana       12.0   276              3310
Fontana - Damascus             14.0   270              3780
Damascus - Waynesboro          16.0   248              3970
Waynesboro - Harpers Ferry     16.8   215              3610
Harpers Ferry - DWG            16.9   139              2350
DWG - Kent                     16.1   196              3160
Kent - Glencliff               15.6   231              3600
Glencliff - Gorham             11.6   353              4090
Gorham - Stratton              12.7   335              4250
Stratton - Katahdin            14.6   198              2890
Entire Trail                   14.7   236              3470
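The three columns are tied together: feet climbed per hiking day should be about feet climbed per mile times miles per hiking day. Here is a minimal Python sketch that recomputes EGPD from the other two columns as a consistency check. It assumes the EGPD figures are that product rounded to the nearest 10 feet; that rounding rule is my guess, not something stated in the post, and the function names are hypothetical.

```python
# Rows copied from the table above:
# (section, EGPM in ft/mile, MPHD in miles/hiking day, EGPD in ft/hiking day)
ROWS = [
    ("Springer - Georgia border", 307, 10.1, 3100),
    ("Georgia border - Fontana", 276, 12.0, 3310),
    ("Fontana - Damascus", 270, 14.0, 3780),
    ("Damascus - Waynesboro", 248, 16.0, 3970),
    ("Waynesboro - Harpers Ferry", 215, 16.8, 3610),
    ("Harpers Ferry - DWG", 139, 16.9, 2350),
    ("DWG - Kent", 196, 16.1, 3160),
    ("Kent - Glencliff", 231, 15.6, 3600),
    ("Glencliff - Gorham", 353, 11.6, 4090),
    ("Gorham - Stratton", 335, 12.7, 4250),
    ("Stratton - Katahdin", 198, 14.6, 2890),
    ("Entire Trail", 236, 14.7, 3470),
]


def elevation_gain_per_day(egpm, mphd):
    """Feet climbed per hiking day, rounded to the nearest 10 ft (assumed rounding)."""
    return round(egpm * mphd, -1)


def mismatched_rows(rows):
    """Return the sections whose EGPD does not equal EGPM x MPHD after rounding."""
    return [name for name, egpm, mphd, egpd in rows
            if elevation_gain_per_day(egpm, mphd) != egpd]


print(mismatched_rows(ROWS))
```

If the table is internally consistent under that rounding assumption, the printed list is empty.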

People can decide for themselves how much of the change in daily distance is due to increasing fitness (or the increased breaking down of knees and other body parts in later stages :D ) and how much is due to the changing ruggedness of the trail.
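One rough way to look at ruggedness versus fitness is to check how closely daily mileage follows climbing per mile across the eleven sections. The sketch below is my own addition, not part of map man's method. It computes a Pearson correlation by hand so that it needs no libraries, using only the table's numbers. It also repeats the calculation starting at Fontana, to set aside the first two sections, where hikers are still getting in shape. With only eleven data points, a strongly negative value would fit the ruggedness explanation but cannot by itself separate the two effects.

```python
import math

# (EGPM in ft/mile, MPHD in miles/hiking day) for the eleven sections,
# Springer to Katahdin, in trail order, copied from the table above.
SECTIONS = [
    (307, 10.1), (276, 12.0), (270, 14.0), (248, 16.0), (215, 16.8), (139, 16.9),
    (196, 16.1), (231, 15.6), (353, 11.6), (335, 12.7), (198, 14.6),
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


egpm = [s[0] for s in SECTIONS]
mphd = [s[1] for s in SECTIONS]

r_all = pearson(egpm, mphd)
# Drop Springer - Georgia border and Georgia border - Fontana (early-fitness sections).
r_from_fontana = pearson(egpm[2:], mphd[2:])
print(f"all sections: r = {r_all:.2f}; Fontana onward: r = {r_from_fontana:.2f}")
```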

excuses
01-06-2008, 22:31
Just goes to show. People are always asking me, "How is the trail?" My answer: "Up and down."
Thanks for the big view.

1Pint
01-06-2008, 22:41
"Well, I have now calculated the elevation change for many sections large and small on the AT."

Thanks for all your work Map Man. This stuff is really interesting.

emerald
01-06-2008, 22:56
I assume the data includes both northbound and southbound hikers. It would be interesting to be able to compare northbound and southbound averages for the same sections.

longwe tru
01-06-2008, 23:13
Some of this thread sounds like the math professor on "Numbers." Way, way over my head... but I find this thread very helpful, as I just spent the last two days trying to figure out where I would be when, how, and what to do with maildrops.

Thank You

map man
01-06-2008, 23:48
Shades of Gray, the sections in post #93 are long enough that the difference in feet per mile between NOBO and SOBO doesn't amount to much, which is why I just went with the average. Every section shows a difference of ten feet per mile or less between the two directions, except for the Georgia border to Fontana section. That section is shorter and has a big elevation difference between its end points (the Georgia border is around 2000 feet higher than Fontana). NOBO there experiences 264 feet of elevation gain per mile, while SOBO has 287, for an average of 276. Still not a huge difference.
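The NOBO/SOBO gap follows from the fact that every foot one direction descends is a foot the other direction climbs. Over a section of length L, total SOBO gain minus total NOBO gain equals the net drop going north, so the per-mile figures differ by that drop divided by L. The section length isn't given in this post, so the Python sketch below works backward and solves for the length the quoted numbers imply. That result is only as good as the "around 2000 feet" estimate. The constant and function names are hypothetical.

```python
# Figures quoted above for the Georgia border - Fontana section.
NOBO_GAIN_PER_MILE = 264    # ft climbed per mile, northbound
SOBO_GAIN_PER_MILE = 287    # ft climbed per mile, southbound
NET_DROP_NORTHBOUND = 2000  # ft; Georgia border is "around 2000 feet higher" than Fontana


def direction_average(nobo, sobo):
    """Direction-neutral gain per mile, the kind of figure used in the table."""
    return (nobo + sobo) / 2


def implied_length_miles(nobo, sobo, net_drop):
    # SOBO climbs everything NOBO descends, so
    # (sobo - nobo) * length = net_drop, solved here for length.
    return net_drop / (sobo - nobo)


avg = direction_average(NOBO_GAIN_PER_MILE, SOBO_GAIN_PER_MILE)
length = implied_length_miles(NOBO_GAIN_PER_MILE, SOBO_GAIN_PER_MILE, NET_DROP_NORTHBOUND)
print(f"average gain: {avg} ft/mile (rounds to {round(avg)})")
print(f"implied section length: about {length:.0f} miles")
```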

In the article I wrote on elevation gain I do include both NOBO and SOBO in the tables dealing with shorter sections.

Edit: I just realized you may have meant the differences in hiking rates between SOBO and NOBO, rather than the difference in elevation gain. If so, the hiking rates study only uses NOBOs. The last time I checked, there were still fewer than 10 SOBO thru-hike journals at trailjournals detailed enough to use for this study, and I don't think that is enough yet for a representative sample. I would like to compare the two someday, though, once there are enough SOBOs.

map man
04-06-2008, 19:04
I've updated the "AT Hiking Rates, Section by Section" article to incorporate the class of 2007, so it now includes 173 hikers from the classes of 2001 through 2007. I've also updated most of the illustrations and tables in Post #28; Post #69, showing the popularity of various trail towns for taking zero days; Post #80, breaking down the numbers by gender; and Post #93, comparing sections for hiking speed and trail ruggedness.

The numbers just don't change much now as I add recent hiker classes. For instance, the mean number of days to hike in my original study of 105 hikers from 2001-2005 was 167.8 days. Now after having added 68 more hikers from 2006 and 2007, that mean number is 167.7. I'm not certain I'm going to keep updating the article in future years.
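A big new class barely moves the overall mean because the combined mean is a head-count-weighted average of the old and new groups. The Python sketch below works backward from the two published means to estimate the mean for the 68 hikers added from 2006 and 2007. Both published means are rounded to 0.1 day, so the sketch also brackets the estimate by shifting each mean 0.05 day in opposite directions. The function name is made up for illustration.

```python
OLD_N, OLD_MEAN = 105, 167.8      # 2001-2005 hikers, original study
TOTAL_N, TOTAL_MEAN = 173, 167.7  # 2001-2007 hikers, updated study
NEW_N = TOTAL_N - OLD_N           # hikers added from 2006-2007


def implied_new_mean(old_n, old_mean, total_n, total_mean):
    # total_n * total_mean = old_n * old_mean + new_n * new_mean, solved for new_mean.
    new_n = total_n - old_n
    return (total_n * total_mean - old_n * old_mean) / new_n


estimate = implied_new_mean(OLD_N, OLD_MEAN, TOTAL_N, TOTAL_MEAN)
# Rounding to 0.1 day means each published mean could be off by up to 0.05 day.
low = implied_new_mean(OLD_N, OLD_MEAN + 0.05, TOTAL_N, TOTAL_MEAN - 0.05)
high = implied_new_mean(OLD_N, OLD_MEAN - 0.05, TOTAL_N, TOTAL_MEAN + 0.05)
print(f"2006-2007 mean: about {estimate:.1f} days (between {low:.2f} and {high:.2f})")
```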

rasudduth
04-07-2008, 00:13
Thanks mapman. This is all very interesting. Hopefully soon there will be enough SOBO data to do that analysis. MEGA on!

UHFox
04-30-2008, 18:30
This is great information to have. Thanks for all of the time that you put into it. I think that this data is valuable for section hikers as well, at least near the southern end of the AT, where the hikers in the study may have still been getting 'trail tough'.

TRIP08
05-15-2008, 21:15
Thanks.

mtt37849
10-08-2008, 01:02
Great study; I can't believe someone has done this. Good job, man.