*A version of this article was published in the August 2013 issue of Morningstar ETFInvestor.*

In the article “Understanding Factors,” I briefly explained the three major risk factors that determine most assets' behavior: economic growth, inflation, and liquidity. These three ingredients are why stocks, bonds, and many alternative strategies have higher expected returns than cash.

In this article, I'm going to talk about the standard linear factor model. The linear factor model attempts to answer a simple question: Can a return stream be broken down into explainable and unexplainable parts?

Most studies looking at mutual fund manager performance use linear factor models to disentangle the roles of skill, luck, and risk in producing a given manager's return stream.

Roughly speaking, a linear factor model creates the best-fitting custom benchmark for an asset. It also provides additional information, such as the composition of the benchmark, how well the benchmark fits the asset's returns, by how much the asset beat the benchmark, and how likely it is that these findings are attributable to chance.

My goal for this primer is to provide a practical understanding of factor models. I try to keep the math as simple as possible, but some familiarity with statistics helps a lot.

**The Case of Fidelity Magellan**

Exhibit 1 is a scatter plot of the monthly returns over cash of Fidelity Magellan (FMAGX, a U.S.-domiciled mutual fund), plotted on the vertical or y-axis, against those of the U.S. stock market, plotted on the horizontal or x-axis, from May 2003 to March 2013.

The graph shows a strong linear relationship between the market and Magellan: When the market is up, Magellan is up about the same amount; when the market is down, the fund is down the same amount.

The data suggest there is a strong fundamental relationship between this fund and the market. And indeed there is: Magellan is a well-diversified basket of U.S. stocks and therefore is exposed to the same broad macroeconomic risks as the U.S. stock market.

**A Brief Statistical Detour**

The factor models we're interested in use linear regression, a statistical method that fits a line through two sets of data: a dependent variable that we want to "explain," and one or more independent or explanatory variables (which, naturally, do the explaining). Fidelity Magellan is our dependent variable, and the stock market is our explanatory variable. We could use the fund as the explanatory variable and the market as the dependent variable, but that would be getting our causation backward: Changes to Magellan don't move the markets; markets move Magellan. So we always begin with a fundamental story before setting up the scaffolding around the numbers.

The fitted regression line through the scatter plot provides an estimate of the true relationship between the market and this fund.

Recall from high school geometry that the equation for a line follows the form

y = mx + b,

where m is the slope term (a measure of the line's steepness) and b is the y-intercept term (the spot where the line crosses the y-axis). By the conventions of financial statistics, the intercept term b is denoted by α, the Greek letter alpha, and the slope term m is denoted by β, the Greek letter beta, and the terms are rearranged such that

y = α + βx,

where y is the asset's excess return (asset return minus the risk-free or cash rate), and x is the market's excess return (market return minus the risk-free rate). Keep in mind these are periodic returns--daily, weekly, monthly, and so on, though monthly returns are most commonly used. In order to make y and x more explicit, they'll henceforth be renamed R-Rf and Mkt-Rf, respectively, where R is the fund's monthly return, Rf is the risk-free rate of return, and Mkt is the S&P 500's return:

R-Rf = α + β*(Mkt-Rf)

The linear regression procedure finds the values of α and β that produce the best-fitting line. (A technical note for the curious: The line is fitted to minimize the total sum of the squares of the vertical distances between the data points and the line.) In this case, the procedure estimates the following:
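For readers who like to see the mechanics, here is a minimal sketch of that least-squares fit for the one-factor case, using only the Python standard library. The function name `fit_line` and the return series are my own illustrative assumptions: the numbers are made up, not Magellan's actual returns, and the fund series is constructed to sit exactly on a known line so that you can see the procedure recover it.

```python
def fit_line(x, y):
    """Ordinary least squares for y = alpha + beta * x (one explanatory variable)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length series with at least two points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("x has no variation, so a slope cannot be estimated")
    beta = sxy / sxx                  # slope that minimizes squared vertical distances
    alpha = mean_y - beta * mean_x    # the fitted line passes through the point of means
    return alpha, beta


# Hypothetical monthly excess returns in percent (NOT real fund or market data).
market_excess = [2.0, -1.0, 3.0, 0.0, -4.0, 5.0]
# A made-up fund that lies exactly on the line -0.35 + 1.16 * market.
fund_excess = [-0.35 + 1.16 * m for m in market_excess]

alpha, beta = fit_line(market_excess, fund_excess)
print(f"alpha = {alpha:.2f}, beta = {beta:.2f}")
```

With real data the points scatter around the line rather than sitting on it, so the estimates carry some uncertainty.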

R-Rf = -0.35 + 1.16*(Mkt-Rf)

This equation is straightforward to interpret: For each one-percentage-point change in the U.S. stock market's monthly excess return, Fidelity Magellan's monthly excess return is predicted to move in the same direction by 1.16 percentage points. On top of that, the fund is predicted to lose 0.35 percentage points each month regardless of what the market does. For example, the equation predicts that if the market's excess return is 10% one month, the fund's excess return will be 10%*1.16 – 0.35% = 11.25%.
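The same arithmetic can be written as a tiny function. This is only a sketch: the estimates of -0.35 and 1.16 come from the fitted equation, while the function name and the flat and down-month scenarios are my own illustrations.

```python
ALPHA = -0.35  # monthly intercept in percentage points, from the fitted equation
BETA = 1.16    # slope on the market's monthly excess return, from the fitted equation


def predicted_excess_return(market_excess_pct, alpha=ALPHA, beta=BETA):
    """Fund's predicted monthly excess return (percent) for a given market excess return."""
    return alpha + beta * market_excess_pct


# Hypothetical scenarios: market excess return of +10%, 0%, and -10% in a month.
for market in (10.0, 0.0, -10.0):
    fund = predicted_excess_return(market)
    print(f"market {market:+.2f}% -> fund {fund:+.2f}%")
```

Note that the 0.35-point drag applies in every scenario, including the month when the market's excess return is zero.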

The equation tells you two things. First, Magellan's return pattern can be replicated by simply leveraging the S&P 500 1.16 times. Second, Magellan underperformed that simple leveraged strategy by 0.35 percentage points a month, or 4.2 percentage points a year (0.35%*12).
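To make the "custom benchmark" idea concrete, here is a short sketch that splits a single month's excess return into the part explained by the leveraged market position and the part left over, then annualizes the monthly alpha by simple multiplication, as in the text. The function names and the example month are hypothetical.

```python
BETA = 1.16            # from the fitted equation
MONTHLY_ALPHA = -0.35  # percentage points per month, from the fitted equation


def benchmark_excess_return(market_excess_pct, beta=BETA):
    """Excess return (percent) of the custom benchmark: the market leveraged beta times."""
    return beta * market_excess_pct


def annualize_simple(monthly_pct, periods_per_year=12):
    """Simple (non-compounded) annualization, matching the 0.35% * 12 in the text."""
    return monthly_pct * periods_per_year


# Hypothetical month: the market's excess return is 2%.
market = 2.0
explained = benchmark_excess_return(market)   # what the leveraged market delivers
predicted_fund = explained + MONTHLY_ALPHA    # what the equation predicts for the fund
annual_alpha = annualize_simple(MONTHLY_ALPHA)

print(f"benchmark {explained:.2f}%, fund {predicted_fund:.2f}%, annual alpha {annual_alpha:.1f}%")
```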

In finance jargon, the fund's beta to the market was 1.16, and its annual alpha was negative 4.2%. Sound familiar? That's because this is what experts mean when they talk about beta and alpha.

Of course, Magellan could've been unlucky. Recall that the linear regression is an estimate of the true relationship between this fund and the market. The true relationship could actually be something like y = 0.50 + 1.20x, for example, and the period we looked at may have captured some extreme data points that skewed our estimates of alpha and beta. No one knows the true relationship; statistical methods can provide only informed guesses as to what it actually is.
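A small simulation can illustrate how much luck can move the estimates. The sketch below assumes a hypothetical fund whose true relationship really is y = 0.50 + 1.20x, generates many 119-month samples (the number of months from May 2003 to March 2013), and fits a line to each. The market's mean and volatility, the size of the noise, and the random seed are all assumptions I picked for illustration, not estimates from real data.

```python
import random
import statistics


def fit_line(x, y):
    """Ordinary least squares for y = alpha + beta * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    return mean_y - beta * mean_x, beta


def simulate_estimates(true_alpha, true_beta, n_months, n_trials, seed=0):
    """Fit a line to many random samples drawn from a known 'true' relationship."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    alphas, betas = [], []
    for _ in range(n_trials):
        # Assumed market: 0.5% mean monthly excess return, 4.5% volatility.
        market = [rng.gauss(0.5, 4.5) for _ in range(n_months)]
        # Assumed fund-specific noise: 2% monthly volatility.
        fund = [true_alpha + true_beta * m + rng.gauss(0.0, 2.0) for m in market]
        a, b = fit_line(market, fund)
        alphas.append(a)
        betas.append(b)
    return alphas, betas


alphas, betas = simulate_estimates(0.50, 1.20, n_months=119, n_trials=500)
print("alpha estimates: min", round(min(alphas), 2),
      "mean", round(statistics.mean(alphas), 2),
      "max", round(max(alphas), 2))
print("beta estimates:  min", round(min(betas), 2),
      "mean", round(statistics.mean(betas), 2),
      "max", round(max(betas), 2))
```

On average the estimates land near the true values, but any single sample can miss by a noticeable margin, which is exactly the uncertainty that the regression output tries to quantify.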

The statistical uncertainty of the alpha and beta terms is quantified by the p-value, which indicates the percent chance of obtaining an estimate at least as extreme as the one observed if the true alpha or beta were zero. An abbreviated form of the regression output looks like this:

where the intercept is the monthly unexplained return, or alpha, and Mkt-Rf is the market factor. The intercept's p-value shows that there's only a 1% chance an outcome this bad or worse could've occurred due to luck, assuming the fund's true alpha is zero (that is, the fund's managers had no skill). The t-statistic is another way of expressing the same information (it is often favored over the p-value, but its interpretation isn't as intuitive, so I'm glossing over it). Finally, the R² is a measure of fit. An R² of 100% indicates that the model perfectly fits the data. A value of 0% indicates that the model explains none of the variation in the data. The regression's R² is very high, indicating it does a good job explaining Fidelity Magellan's monthly returns.
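For the curious, the sketch below shows how these statistics are typically computed for a one-factor regression. It rests on several assumptions of mine: the function name and the eight months of returns are made up (they are not Magellan's data), and the p-values use a normal approximation, which is only reasonable for samples of roughly a hundred observations or more. A statistics package would use Student's t-distribution with n - 2 degrees of freedom instead.

```python
import math
from statistics import NormalDist


def regression_summary(x, y):
    """Alpha, beta, R-squared, t-statistics, and approximate p-values for y = alpha + beta * x."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length series with at least three points")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or ss_tot == 0:
        raise ValueError("both series must vary")
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    # Residuals are the vertical distances between the data points and the line.
    ss_res = sum((yi - (alpha + beta * xi)) ** 2 for xi, yi in zip(x, y))
    if ss_res == 0:
        raise ValueError("perfect fit: standard errors are zero")
    r_squared = 1.0 - ss_res / ss_tot
    resid_var = ss_res / (n - 2)  # two parameters were estimated
    se_alpha = math.sqrt(resid_var * (1.0 / n + mean_x ** 2 / sxx))
    se_beta = math.sqrt(resid_var / sxx)
    t_alpha, t_beta = alpha / se_alpha, beta / se_beta
    # Two-sided p-values under a normal approximation (an assumption; see the lead-in).
    normal = NormalDist()
    p_alpha = 2.0 * (1.0 - normal.cdf(abs(t_alpha)))
    p_beta = 2.0 * (1.0 - normal.cdf(abs(t_beta)))
    return {"alpha": alpha, "beta": beta, "r_squared": r_squared,
            "t_alpha": t_alpha, "t_beta": t_beta,
            "p_alpha": p_alpha, "p_beta": p_beta}


# Hypothetical monthly excess returns in percent (NOT real fund or market data).
market_excess = [2.0, -1.0, 3.0, 0.0, -4.0, 5.0, 1.0, -2.0]
fund_excess = [2.3, -1.9, 2.8, -0.1, -5.4, 5.2, 0.6, -2.9]

for name, value in regression_summary(market_excess, fund_excess).items():
    print(f"{name}: {value:.4f}")
```

The t-statistic is simply the estimate divided by its standard error, so a larger absolute t-statistic goes hand in hand with a smaller p-value.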

*In part 2 of this article, we will use a multifactor model and discuss the use and abuse of factor models.*