
On Back-Tests (Part 1)

Is back-testing useful?

Samuel Lee 28.11.2013

This article first appeared in the Morningstar ETFInvestor – September 2013.

As they are often used, back-tests are merely a legal way of fabricating a statistically bogus history of outperformance and implicitly taking credit for it. I don’t think I’m being too cynical. Most back-tested strategies I’ve seen are problematic. The worst claims come from newsletters and trading-software providers, who can say almost anything without legal repercussions under the aegis of the First Amendment. I’ve seen claims of “low risk” 30%-plus monthly returns (or 2,230% annualized), though most typically keep their back-tests in the range of 20%–50% annualized in a sad attempt to maintain a semblance of believability. It appears index providers and ETF sponsors have also produced some unbelievable back-tests.
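That annualized figure is straightforward compounding arithmetic, as a quick, purely illustrative check in Python confirms:

```python
# Compounding a 30% monthly return over 12 months:
monthly_return = 0.30
annualized = (1 + monthly_return) ** 12 - 1
print(f"{annualized:.0%}")  # -> 2230%, the figure quoted above
```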

Last year, Vanguard published a study on exchange-traded funds tracking back-tested indexes. The authors, Joel M. Dickson, Sachin Padmawar, and Sarah Hammer[1], looked at a sample of equity indexes with at least five years of back-tested history and five years of live performance. In the five years prior to their live dates, the indexes averaged excess returns of 12.25% above the U.S. equity market; in the five years after, they averaged negative 0.26%. Most ETF index back-tests are garbage, in other words.

The Vanguard study’s results aren’t all that surprising (and not just because it’s from Vanguard, a zealous advocate of low-cost cap-weighted indexes). Many back-tested indexes in its sample were cooked up at the behest of ETF sponsors who wanted a fund to cover a “hot” market segment; few providers are willing to launch an ETF invested in a loathed asset class. While Vanguard’s study didn’t check whether strategy indexes on average beat the market after their live dates, I think the study’s main points would remain unchanged. We already have a massive back-test in the actively managed equity mutual fund industry: most studies have found that an equity fund’s historical outperformance doesn’t persist in a statistically detectable fashion. If there’s one verity that can be both relied on and reliably ignored, it’s that past performance does not guarantee future results.

Back-tests are so often useless because there are many more false than true relationships in a typical data set. It’s easy to tweak an experiment to discover a “statistically significant” relationship that is in fact nonexistent. This data-snooping bias is one of the reasons why many, if not most, published research findings are false.[2]
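To see how easily snooping manufactures “significance,” consider a minimal sketch in Python (all parameters are illustrative, not drawn from any real study): generate 1,000 strategies whose returns are pure noise, crown the best in-sample performer, and watch its edge vanish on fresh data.

```python
import numpy as np

# Data-snooping sketch: 1,000 "strategies" with zero true edge.
rng = np.random.default_rng(0)
n_strategies, n_months = 1000, 60                  # 5 years of monthly returns
returns = rng.normal(0.0, 0.04, size=(n_strategies, n_months))

# Annualized in-sample Sharpe ratio of each random strategy.
sharpe_in = returns.mean(axis=1) / returns.std(axis=1) * np.sqrt(12)
best = sharpe_in.argmax()
print(f"Best in-sample Sharpe: {sharpe_in[best]:.2f}")  # often > 1.0 by luck alone

# The "winning" strategy on five fresh years: with no skill, its Sharpe
# ratio collapses toward zero on average.
fresh = rng.normal(0.0, 0.04, size=n_months)
print(f"Out-of-sample Sharpe: {fresh.mean() / fresh.std() * np.sqrt(12):.2f}")
```

Search enough noise and a few series will always look like winners; the more specifications a researcher tries, the more impressive the best back-test looks, and the less it means.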

Nothing logically precludes back-tests from being a useful way to uncover truth. Back-tests are an application of induction, a method of reasoning that, crudely stated, derives universal principles from specific observations. (I think it’s safe to say induction works. It’s valid to argue the sun will almost certainly rise tomorrow because it’s done so every day for some 4.5 billion years.) Then again, you can lump any two things together if you go to a high enough level of abstraction. A lump of coal and Richard Simmons’ hair can both be categorized as belonging to the class “things that can be burned for fuel,” but using the latter to power your home is impractical and probably a bad idea.

Reasonable investors derisive of back-tests acknowledge that in theory back-tests can illuminate something true about the world, but believe in practice back-tests can rarely be relied on to do so. The Vanguard study, by finding that past performance does not reliably predict future returns of asset classes, joins a chorus of studies by many independent researchers demonstrating the same. A skeptic might conclude back-tests are to induction what Richard Simmons’ hair is to the category of things that can be burned for fuel.

The problem with that argument is that too many successful investors are or were back-testers. Benjamin Graham, father of value investing and Warren Buffett’s mentor, devised trading rules based on studies of what would have worked in the past. Early in his career, Buffett applied Graham’s back-tested rules to identify “cigar butts,” statistically cheap stocks whose assets could be sold off for more than the stocks themselves cost. Another successful Graham acolyte, Walter Schloss, achieved 20% gross annualized returns over several decades “selecting securities by certain simple statistical methods…learned while working for Ben Graham.”[3] Ray Dalio, founder of Bridgewater Associates and arguably the most successful macro investor alive, uses back-tested strategies to run Pure Alpha, an unusual fund that only executes fundamental quant models. Mathematician James Simons’ Medallion Fund, a quantitative, fast-trading strategy, has earned 35% annualized returns after fees since 1989. In a real sense, any investor who observes a historical pattern has engaged in back-testing.

But what separates a good back-test from a bad one? Or, more generally, what separates a valid induction from an invalid one in the financial markets?

Most investors make bad inductions. I think it’s in large part because amateur investors calibrate their hurdles for accepting a proposition based on what’s worked for them in their everyday lives, leaving them susceptible to finding patterns in noise. For example, many investors see a three-year record of outperformance as evidence that a manager is probably skilled.

More-demanding investors want five or 10 or more years of performance data before coming to a conclusion. They’re all wrong (though the more-demanding ones are less so). Historical returns by themselves are rarely enough to reliably identify a skilled manager or a valid back-tested strategy. Markets are so random that blindly screening performance-related metrics for “winners” will give you a group dominated by lucky investors and strategies.
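A back-of-the-envelope calculation shows why. Under standard i.i.d. assumptions, the t-statistic of a manager’s average outperformance grows only with the square root of time: roughly t = IR × √years, where IR is the information ratio (annualized excess return divided by tracking error). The sketch below, with illustrative IR values rather than anything from the studies cited here, solves for the record length needed to clear a conventional significance hurdle:

```python
# Years of returns needed before outperformance is statistically
# distinguishable from luck: t = IR * sqrt(years), so years = (t / IR) ** 2.
# The IR values below are illustrative assumptions.
def years_needed(information_ratio: float, t_stat: float = 2.0) -> float:
    """Track-record length required to reach a given t-statistic."""
    return (t_stat / information_ratio) ** 2

for ir in (0.25, 0.50, 1.00):
    print(f"IR = {ir:.2f}: ~{years_needed(ir):.0f} years for t >= 2")
# IR = 0.25: ~64 years; IR = 0.50: ~16 years; IR = 1.00: ~4 years
```

Even a genuinely skilled manager with an information ratio of 0.5, an excellent long-run figure, would need roughly 16 years of returns before the record becomes statistically persuasive; a three-year record proves next to nothing.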

The most successful investors operate under a model, or an ensemble of them. They do not determine an asset’s attractiveness in a vacuum. They have strong opinions on how humans behave, how institutions operate, how an asset’s value is derived, the processes governing asset prices, and so forth. Their beliefs are reasonable and at least to some extent touch truth—otherwise they wouldn’t work. (Some claim just having an investing discipline and sticking to it ensures success. I couldn’t disagree more. If you believe in flat-out untrue things and stay the course, you will suffer.) With valid models, successful investors can extract information beyond what’s encapsulated in the numbers.

The typical investor often doesn’t have a well-articulated, reality-based model. He focuses on recent returns and too readily accepts propositions based on inadequate or faulty evidence. I spill much ink on philosophy and process in this article as a corrective.

In part 2 of this article, we will further explore the qualities that distinguish good from bad back-tests.

[1] Joel M. Dickson, Sachin Padmawar, and Sarah Hammer. “Joined at the Hip: ETF and Index Development.” Vanguard, 2012.

[2] John P. A. Ioannidis. “Why Most Published Research Findings Are False.” PLoS Medicine, 2005.

[3] Warren E. Buffett. “The Superinvestors of Graham and Doddsville.”

 

Samuel Lee is an ETF strategist with Morningstar and editor of Morningstar ETFInvestor.
