Chapter 2: Markets as Data Objects
A market lives in your computer as a time-indexed Pandas object. In this chapter you go from raw closing prices all the way to the statistics that drive every risk and performance report:
- Prices, raw vs adjusted close, simple and log returns, and the rules for aggregating returns across horizons.
- Cumulative gross returns (equity curves), resampling daily → monthly, and rolling statistics.
- Volatility with \(\sqrt{T}\) scaling, drawdowns via the running maximum, Sharpe ratios with proper annualisation, cross-asset correlation, and the subtle but unavoidable volatility drag that pulls compound returns below the arithmetic mean.
These are the gateway from raw return series to portfolio-level diagnostics.
Chapter Introduction
Where you’ll see this: every time you open Yahoo Finance, Robinhood, or a TikTok video screaming “this stock is up 200%!”, you are looking at the output of a pipeline that started exactly where this chapter starts — with a long, sorted list of (time, price) rows. By the end of the chapter you will be able to build that pipeline yourself in Python, and (more importantly) to spot when somebody else’s pipeline is lying to you.
Finance is one of the oldest data-intensive industries on earth. From the merchants of Venice in the 14th century, who kept ledger books of grain and silver prices, to a quant (a quantitative analyst — someone who builds math/code-driven trading strategies) on a 2026 trading desk staring at a 5-millisecond tick stream, the underlying object has not changed: a time-stamped sequence of prices. What has changed is the toolkit. The Venetian merchant had quills, ink, and arithmetic. You have Python, pandas (the standard Python library for working with tables of data), and a laptop that can scan a hundred years of daily prices in milliseconds.
This chapter is the first of two on markets as data objects. The premise is simple but consequential: every price quote a market produces — every trade, every closing price, every option chain (the menu of contracts that let you bet on a stock’s future price) — is just a piece of data tagged with a time stamp. Once you accept this view, finance stops being mysterious and becomes a problem of time-series engineering (the discipline of cleaning, aligning, and transforming data that is indexed by time). The rest of the course — risk, portfolios, factor models, sentiment, machine learning signals — is built on the foundations laid in this chapter.
The view from a real trading floor is unforgiving. At a hedge fund, the data layer is the source of more bugs than the model layer. A wrong dividend adjustment, a missing splits column, a date index that silently misaligns by one row — any of these will corrupt a backtest (a simulation of how a strategy would have done historically) in ways that no statistical test can detect until you have lost real money. The discipline you build here — always know what each column means, always check the index, always know whether prices are raw or adjusted — is not a textbook ritual. It is the difference between a profitable strategy and an embarrassing post-mortem.
Why “data objects” instead of “data”
The phrase data objects is deliberate. A pandas.Series of closing prices is not a passive list of numbers. It is an object that knows its own time index, its dtype, its name, and a battery of methods — .pct_change(), .rolling(), .resample(), .cumprod() — that encode decades of empirical finance practice into a single dot-method. The skill is less about memorizing formulas (most of them are one line of math) and more about learning to think in objects: every transformation produces a new object, every object carries its time index, and every plot you make is a window into that object’s state.
What you bring in, what you take out
You should already be comfortable with the material of the previous chapter: loading a CSV, examining df.head(), the difference between a Series and a DataFrame, basic boolean indexing, and df.plot(). This chapter adds the time dimension on top: dates as index, returns as differenced data, compounding as accumulation, volatility as standard deviation, and the \(\sqrt{T}\) scaling rule that lets you move risk between horizons. By the end you will have the vocabulary needed for Chapter 4, where we introduce portfolios as weighted combinations of these return series.
Table of Contents
- Prices and the time index
- Raw close vs adjusted close: dividends and splits
- Simple returns
- Log returns
- Simple vs log: when each one matters
- Aggregating returns over time
- Cumulative gross return and the equity curve
- Resampling: daily → weekly → monthly
- Rolling statistics: means and volatility
- Annualizing volatility: the \(\sqrt{252}\) rule
- Worked example: SPY end-to-end
- Exercises
Prices and the time index
Why this matters: before you can compute returns, risk, or anything else, you need to be able to load a price series and trust it. This section is about the rules for the underlying object — what a price series is, why the time index matters more than the prices themselves, and the kinds of silent bugs that destroy backtests before the model is even written.
What a price series actually is
Open a financial chart on any phone app and you see a wavy line moving from left to right. Strip away the chart and what remains is a table with two columns: a time stamp and a price. Everything else — moving averages (a smoothed running mean of the price), candlesticks (a chart style that shows open/high/low/close on each bar), volume bars, indicators — is derived from this minimal pair. Internally, a market data feed at any modern fund looks like exactly this: a long, sorted sequence of (timestamp, price) rows.
In pandas the natural home for such an object is a Series with a DatetimeIndex. A pandas Series is just a one-column table with row labels; a DatetimeIndex is the special kind of row label that holds actual dates instead of plain integers like 0, 1, 2. The Series stores the prices, the index stores the time stamps, and the two are kept in lock-step by pandas: when you slice the Series by date, the prices follow; when you reindex the Series to a new calendar, the prices align automatically. This tight coupling between values and time is the single most important property of a pandas time-series object.
Three practical implications follow.
First, chronological order is sacred. The whole machinery of returns, rolling windows, and resampling assumes the index is sorted. A common bug — and a costly one — is to feed pandas a Series whose dates are out of order; many methods will not raise an error, they will simply produce nonsense. Always call .sort_index() after loading a new source.
Second, the index defines the universe. When you compute \(P_t / P_{t-1}\) (read aloud as “today’s price divided by yesterday’s price” — but be careful what “yesterday” means), “\(t-1\)” does not mean “yesterday in the real world”; it means “the row immediately before \(t\) in the index”. If your data is sampled daily but skips weekends and holidays, then \(t-1\) for a Monday row is the previous Friday. If you accidentally resample to a calendar-day frequency, the weekend rows are filled with NaN (pandas’s marker for “missing value”) and the Monday return, computed against a NaN Sunday, becomes NaN as well. The arithmetic is unforgiving — the index, not the calendar, is the ground truth.
Third, the index enables time-aware methods. Series.rolling("22D"), Series.resample("1M"), Series.shift(1), and Series.asof() all rely on the index being a DatetimeIndex. Without it these methods either fall back to positional behavior or fail outright. The very first thing you do after loading a price file is verify that the index is a proper date type:
print(type(price.index))
# <class 'pandas.core.indexes.datetimes.DatetimeIndex'>
Acquiring price data with yfinance
For the rest of this chapter we use yfinance, a lightweight Python package (an open-source library you install with pip install yfinance) that scrapes Yahoo Finance for end-of-day OHLCV bars — that acronym just means the five standard daily numbers a market reports: Open (first trade of the day), High (highest price), Low (lowest price), Close (last trade), Volume (number of shares traded). It also gives you corporate actions (dividends and splits, which we explain in the next section). It is free, fast enough for teaching, and adequate for prototypes. Real work at a fund uses paid feeds (Bloomberg, Refinitiv, Polygon, IEX), but the data structure — a date-indexed OHLCV table — is identical.
A minimal first call looks like this:
import yfinance as yf
import pandas as pd
import numpy as np
msft = yf.Ticker("MSFT").history(period="5000d", interval="1d")
msft.head()
The result is a DataFrame whose index is a DatetimeIndex of trading dates and whose columns are Open, High, Low, Close, Volume, Dividends, and Stock Splits. The first five rows for MSFT look like this:
Open High Low Close Volume Dividends Stock Splits
Date
2006-04-26 00:00:00-04:00 18.903 19.008 18.847 18.917 39190000 0.0 0.0
2006-04-27 00:00:00-04:00 18.826 19.287 18.806 19.022 96509600 0.0 0.0
2006-04-28 00:00:00-04:00 16.914 17.102 16.753 16.858 591052200 0.0 0.0
2006-05-01 00:00:00-04:00 16.977 17.451 16.816 16.956 174800900 0.0 0.0
2006-05-02 00:00:00-04:00 17.095 17.451 16.683 16.760 190533500 0.0 0.0
A handful of details deserve attention. The index is timezone-aware (-04:00 means US Eastern time, the timezone of the NYSE — so pandas knows that 9:30 AM in this Series is not the same instant as 9:30 AM Hong Kong time). The Open/High/Low/Close numbers are reported to fractional cents, because Yahoo back-adjusts historical prices for splits and dividends — a subject we will return to in the next section. The Volume column is in shares traded, and Dividends and Stock Splits are non-zero only on corporate-action days (days when the company paid a dividend or split its stock).
What this gave us: a single DataFrame (a DataFrame is the pandas word for a multi-column table) where each row is one trading day and each column is one piece of information — exactly the shape we need.
Anatomy of an adjusted-close price series
Throughout this chapter we will treat the closing-price column as a stand-alone Series — one date label per row, one number per row. The diagram below sketches that object so you have a clear mental picture before we start computing returns.
When you read code in the rest of this chapter, keep this picture in mind: a single column of floats with a DatetimeIndex on the left. Every operation we apply — .pct_change(), .cummax(), .rolling(), .resample() — touches one of these two parts (the values, or the index), never anything else.
A 5,000-day window is roughly 20 trading years — enough to span at least one major regime change (the 2008 crisis, the 2020 pandemic crash, the 2022 inflation shock). When you measure volatility or test a strategy, the question is not what was the last year like but what kinds of years has this asset lived through. A one-year sample is a dangerous teacher.
Now plot the closing series — the canonical first chart in any financial analysis:
The trajectory of the curve — slow drift, sharp drawdowns, long recoveries — is the visual signature of equity returns. Notice that the level of the price is not what matters; the chart looks essentially the same whether the y-axis runs from $20 to $400 or from $2 to $40. What matters are the percentage changes between adjacent points, which are the actual quantity an investor experiences. That is the topic of the next several sections.
Beyond a single stock: ETFs, bonds, options
The same code pattern downloads any tradable asset Yahoo covers. ETFs (Exchange-Traded Funds) are the building blocks of most retail and many institutional portfolios because each ETF gives you a single-ticker bet on an entire basket — an index, a sector, a country, a bond segment.
qqq = yf.Ticker("QQQ").history(period="5000d", interval="1d") # NASDAQ-100
tlt = yf.Ticker("TLT").history(start="2020-01-01") # 20+ year Treasuries
lqd = yf.Ticker("LQD").history(start="2020-01-01")   # IG corporate bonds
A few ETFs you should know by name:
| Ticker | Asset class | Why it matters |
|---|---|---|
| SPY | S&P 500 | The benchmark for US large-cap equity |
| QQQ | NASDAQ-100 | Tech-heavy growth benchmark |
| XLK | S&P 500 Information Technology sector | A cleaner tech-only exposure |
| TLT | 20+ year US Treasuries | Long-duration safe-asset proxy |
| IEF | 7–10 year US Treasuries | Intermediate Treasuries |
| SHY | 1–3 year US Treasuries | Cash-like rate exposure |
| LQD | Investment-grade corporate bonds | Credit spread exposure |
| HYG | High-yield (junk) corporate bonds | High-credit-risk proxy |
For derivatives, yfinance also exposes option chains:
t = yf.Ticker("AAPL")
t.options # tuple of expiration dates
chain = t.option_chain(t.options[0])
calls = chain.calls # DataFrame of call contracts
puts = chain.puts    # DataFrame of put contracts
We will return to options in Chapter 6. For now the point is structural: every tradable instrument fits into the same date-indexed-table mold. Once you can manipulate a daily close Series, you can manipulate the entire investable universe.
yfinance is not a production data source
yfinance scrapes Yahoo, which means its data is best-effort: sometimes prices are missing, sometimes a ticker is silently delisted, sometimes a corporate action is applied incorrectly. For coursework and prototyping it is fine. For real money use a vendor with service-level guarantees (Bloomberg, Refinitiv, Polygon, IEX, Norgate). Even then, always spot-check.
Raw close vs adjusted close: dividends and splits
Where you’ll see this: when an Instagram post shows a chart of “Apple before and after the 2020 split” and claims investors “lost 75% overnight”, they are confusing raw with adjusted prices. Nobody lost a cent — Apple just multiplied each share into four. This section is the antidote: it tells you exactly what to look for so you never get tricked by the same bug.
The single most common source of silent error in equity analysis is confusing raw close with adjusted close. You must always know which one you are holding.
Imagine a friend gives you a HK$100 bill, then later asks for it back and hands you ten HK$10 bills instead. You haven’t gained or lost anything — but if you only tracked the biggest single bill in your wallet, it would look like you went from HK$100 to HK$10, a “90% loss”. Stock splits create that exact illusion in raw price data. Adjusted prices undo the illusion so the percentage changes match the true experience.
What corporate actions do to a price series
A company can change its share price for two reasons that have nothing to do with investor demand. First, it may pay a cash dividend — a per-share cash payment to shareholders (think: “the company takes some of its cash and mails it out to whoever owns the stock”). On the ex-dividend date (the cutoff day for who gets paid), the share price drops by roughly the dividend amount, because that cash is no longer inside the firm. Second, it may declare a stock split (or reverse split) — the number of shares is multiplied, and the per-share price is divided to match. Example: in a 4-for-1 split, every 1 share becomes 4, but the price drops to 1/4 of what it was. Your total holding is worth exactly the same; only the unit changed (like splitting a HK$100 bill into ten HK$10 bills).
Both events create mechanical “drops” in the raw price line that have no economic meaning for a buy-and-hold investor (someone who just owns the stock and doesn’t trade it). The investor either gets the dividend as cash or ends up holding more shares after the split. A naive return computation on raw prices will record these drops as losses, which is wrong.
The fix is to use adjusted close prices: a synthetic (artificially constructed) series in which historical prices are scaled so that the percentage change between any two adjacent adjusted closes equals the true total return — the price change plus any dividends paid (treated as if you immediately reinvested them) — that a buy-and-hold investor would have earned over that interval.
The construction is straightforward. Let \(P_t\) be the raw close at date \(t\), \(D_t\) the dividend paid at \(t\) (zero on non-dividend days), and \(s_t\) the split ratio at \(t\) (1 on non-split days, e.g. 2 for a 2-for-1 split). Define the adjustment factor that propagates backward in time:
\[ a_t = a_{t+1} \cdot \frac{P_{t+1} - D_{t+1}}{P_{t+1}} \cdot \frac{1}{s_{t+1}}, \qquad a_T = 1. \]
The adjusted close is then \(P_t^{\text{adj}} = a_t \cdot P_t\). The exact algorithm depends on the vendor, but the goal is universal: adjusted close returns should equal total returns.
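A minimal sketch of this backward recursion, assuming you start from unadjusted prices (for example Ticker.history(auto_adjust=False)) and hold the raw close, dividend, and split columns as aligned pandas Series; the function name is illustrative:

import pandas as pd

def adjusted_close(close, dividends, splits):
    # close: raw close P_t; dividends: cash dividend D_t (0 on most days);
    # splits: split ratio s_t (yfinance reports 0 on non-split days, so map 0 -> 1)
    splits = splits.replace(0, 1)
    f = (close - dividends) / close / splits        # per-day factor f_t
    b = f.iloc[::-1].cumprod().iloc[::-1]           # b_t = product of f_j for j >= t
    a = b.shift(-1).fillna(1.0)                     # a_t = product of f_j for j > t, with a_T = 1
    return a * close                                # adjusted close: P_t_adj = a_t * P_t

The level of the result is arbitrary; what should match the vendor's adjusted series are its percentage changes.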
Which one does yfinance give you?
Recent versions of yfinance set auto_adjust=True by default in yf.download(), and Ticker.history() likewise returns prices that are already split- and dividend-adjusted. The Close column in the DataFrame above is therefore already an adjusted close. The legacy Adj Close column is no longer separately reported.
The default has flipped at least twice in yfinance’s history. Never assume — print the first few rows, inspect the Dividends and Stock Splits columns, and convince yourself the Close column has been adjusted before computing any return.
A quick diagnostic for adjustment is to look at a known split day for a major stock — for example, Apple’s 4-for-1 split on 31 August 2020. The unadjusted close fell from about $499 to $129; the adjusted close shows a smooth percentage change of roughly +3%, the true one-day total return.
Dividend, split, and total return defined
To keep terminology clean:
- Price return between \(t-1\) and \(t\) is \(P_t/P_{t-1} - 1\), computed on raw close.
- Total return is the price return plus dividends received, treated as if reinvested at the close on the ex-date.
- Adjusted close return is computed as \(P_t^{\text{adj}}/P_{t-1}^{\text{adj}} - 1\) on the adjusted series, and approximates the total return.
For equity strategy research, you almost always want total returns — dividends are a real cash flow and a meaningful fraction of long-horizon equity returns (about 2% per year of the long-run ~7% real US equity return historically). The shortcut is to use adjusted close throughout and never compute returns on raw prices unless you have a specific reason.
Simple returns
Why this matters: every time a finance app says “AAPL +1.2% today”, that 1.2% is a simple return. This is the most common number in all of finance — but as you’ll see in two sections, it has one quirky property (it doesn’t add up cleanly across time) that forces quants to invent a second kind of return.
A return is “how much wealthier you got, expressed as a fraction of what you started with”. If your $100 grew to $102, the return is +2% — i.e. $2 of profit divided by the $100 you put in. That’s it. Everything below is just notation for this one idea.
Definition
Below, \(P_t\) is just shorthand for “the price on day \(t\)”, and \(R_t\) is “the return on day \(t\)”. Don’t memorise the symbols — just remember the picture: today’s price compared to yesterday’s. Given a price series \(P_t\), the simple return over one period is
\[ R_t = \frac{P_t - P_{t-1}}{P_{t-1}} = \frac{P_t}{P_{t-1}} - 1. \]
It is the percentage change in price between \(t-1\) and \(t\). A 1% return means \(P_t = 1.01 \, P_{t-1}\) (today’s price is 1.01 times yesterday’s); a \(-2\%\) return means \(P_t = 0.98 \, P_{t-1}\). Simple returns are how brokers report performance, how regulators define disclosures, and how investors intuitively reason about gains and losses.
In pandas: .pct_change() and the manual form
Pandas exposes this transformation in two equivalent forms. The first is .pct_change() (read it as “percent change” — it’s literally the method that does what its name says), which directly returns the simple return:
qqq["simpleR"] = qqq["Close"].pct_change()The second writes out the arithmetic explicitly, which is useful when you want to see what is happening:
qqq["simpleR"] = (qqq["Close"] - qqq["Close"].shift(1)) / qqq["Close"].shift(1)Both produce identical output. .shift(1) slides the entire series down by one row — so qqq["Close"].shift(1) puts yesterday’s closing price on today’s row, lining up “yesterday” and “today” side by side so we can subtract. The first row is NaN because there is no prior price to shift in.
Historical vs forward returns
Two related quantities show up constantly in practice and deserve clean naming.
Historical return at \(t\) is computed from \(t-1\) to \(t\): it is the return you would have earned over the period ending at \(t\). This is what .pct_change() gives you.
Forward return at \(t\) is computed from \(t\) to \(t+1\): it is the return you would earn over the next period, conditional on holding from the close at \(t\). Forward returns are constructed with .shift(-1):
qqq["H"] = qqq["Close"].pct_change() # historical: t-1 -> t
qqq["F"] = qqq["Close"].shift(-1) / qqq["Close"] - 1 # forward: t -> t+1The distinction matters whenever you build predictive models. The dependent variable in a return-prediction model is almost always a forward return — what will the next-period return be, given features observed at or before time \(t\)? Using historical returns as a target by mistake leaks information from the future into the model and produces backtests that look spectacular and lose money live.
A model that uses today’s return as both a feature and a target will achieve near-perfect in-sample \(R^2\) and zero out-of-sample profit. The bug is always the same: aligning the target to the wrong date. Make it a habit to give forward-return columns a clearly distinct name (fwd_ret_1d, y_1d, target_1d) so they never get confused with historical returns.
A quick numerical sanity check
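A minimal sketch of such a check, using three made-up prices:

import pandas as pd

p = pd.Series([100.0, 102.0, 100.98],
              index=pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04"]))
print(p.pct_change())          # NaN, 0.02, -0.01
print(p / p.shift(1) - 1)      # the same numbers, written out by hand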
What this gave us: a tiny three-row example where we can see by hand that .pct_change() produces exactly the formula’s output — useful sanity check before trusting it on millions of rows.
Anatomy: from prices to returns
The diagram below shows the two columns side by side. The price column has five values; the returns column has only four — the first row is NaN because there is no \(P_{t-1}\) to subtract from. This one-row offset is the single most common source of confusion when students start computing returns.
Read the diagram from left to right. Each return on the right is built from two prices on the left — the one on the same row, divided by the one on the row above. The very first row has no row above it, which is why returns always start with a NaN. Whenever you join a returns column back onto a price table, expect the first row to be missing and decide on purpose whether to drop it or to fill it.
Two observations: returns are unitless (they are ratios — they have no dollar sign, no percentage sign attached, they’re just a number), and a \(+2\%\) followed by a \(-1\%\) does not net out to a \(+1\%\) gain — after you think about it for a moment, you can see why: the \(-1\%\) is applied to the new (post-gain) price of $102, not to the original $100. You end at \(100 \cdot 1.02 \cdot 0.99 = 100.98\), slightly below $101. This non-symmetry of gains and losses is the entry point to log returns.
Log returns
Why this matters: simple returns are intuitive, but they don’t add up cleanly over multiple days — to combine them you have to multiply, which is annoying in statistics and finance models. Log returns fix this: they turn multiplication into addition, which is why every statistical model in this course (and most quant papers) is written in log-return space.
A log return is just a re-labelling of the simple return that makes the math nicer. Think of it like switching from Celsius to Kelvin: same physical reality, different scale that happens to make formulas cleaner. For tiny daily moves (under a couple of percent), the log return and the simple return are visually identical.
Definition
Below, \(\ln\) is the natural logarithm — the log to base \(e \approx 2.718\). If you remember log from high-school as base 10, just know that “natural log” is the version mathematicians prefer because of how it interacts with calculus and exponential growth. The log return (also called continuously compounded return) over one period is
\[ \ell_t = \ln\!\left(\frac{P_t}{P_{t-1}}\right) = \ln P_t - \ln P_{t-1}. \]
It is the natural logarithm of the gross return \(1 + R_t\) (the “gross return” is just “1 + the return”, e.g. a +3% return has gross return 1.03 — the factor by which your wealth was multiplied). Equivalently:
\[ \ell_t = \ln(1 + R_t), \qquad R_t = e^{\ell_t} - 1. \]
For small returns, \(\ln(1 + R) \approx R - R^2/2 + \cdots\), so \(\ell_t \approx R_t\) to first order. At a daily frequency, where typical returns are well below 1%, the two are numerically very close — within a few basis points (a basis point, or “bp”, is 0.01% — the standard finance unit for “a tiny amount”).
In pandas
qqq["logR"] = np.log(qqq["Close"]) - np.log(qqq["Close"].shift(1))
# equivalently:
qqq["logR"] = np.log(qqq["Close"] / qqq["Close"].shift(1))A typical tail of the joint output:
Ticker QQQ simpleR logR
Date
2026-03-05 608.91 -0.003013 -0.003017
2026-03-06 599.75 -0.015043 -0.015158
2026-03-09 607.76 0.013356 0.013267
2026-03-10 607.77 0.000016 0.000016
2026-03-11 607.69 -0.000132 -0.000132
Notice how simple and log returns agree to four decimal places when the magnitude is small, and start to differ noticeably (in the fourth decimal) at \(\pm 1.5\%\).
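The chart described next can be reproduced with a few lines; this is a sketch, and the ±3% band is an illustrative choice for the typical daily range:

import numpy as np
import matplotlib.pyplot as plt

R = np.linspace(-0.4, 0.4, 401)              # simple returns from -40% to +40%
plt.axvspan(-0.03, 0.03, color="green", alpha=0.15, label="typical daily range")
plt.plot(R, R, "--", label="45° line (log = simple)")
plt.plot(R, np.log1p(R), label="log return ln(1 + R)")
plt.xlabel("simple return"); plt.ylabel("log return")
plt.legend(); plt.show()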
Inside the green band — the typical daily move for an equity — the curve and the 45° line are visually indistinguishable, so simple and log returns are interchangeable in routine work. Outside the band the curve bends below the line: a \(+30\%\) simple return is only a \(+26\%\) log return, and a \(-30\%\) simple return is a \(-36\%\) log return. The asymmetry — losses look worse in log space than in simple space — is exactly why compound returns drag below the arithmetic mean (a fact we return to later in the chapter).
What this gave us: a chart that visualises the divergence between simple and log returns — and importantly, it shows that for the everyday ±3% range you’ll see in daily stock data, the two curves are basically the same.
Why log returns exist at all
For an asset that compounds continuously, \(\ell_t\) has two properties that simple returns lack.
Time additivity. If you hold the asset from \(t\) to \(t+k\), the log return over the whole window is the sum of the per-period log returns:
\[ \ell_{t \to t+k} = \ln\!\left(\frac{P_{t+k}}{P_t}\right) = \sum_{j=1}^{k} \ell_{t+j}. \]
This is the central reason quants love log returns: aggregating across time is addition, not multiplication. Means, sums, OLS regressions, time-series models — everything in classical statistics assumes additive structure, which simple returns do not have.
Symmetry around zero. A \(+10\%\) simple return is not the inverse of a \(-10\%\) simple return — they leave you with \(0.99\) of your original capital, not \(1.00\). A \(+10\%\) log return is the inverse of \(-10\%\). This symmetry is convenient when modelling.
Why simple returns still exist
For all the elegance of log returns, simple returns dominate one situation: portfolios. If you hold three assets with weights \(w_1, w_2, w_3\) that sum to 1, the portfolio’s one-period simple return is
\[ R_p = w_1 R_1 + w_2 R_2 + w_3 R_3. \]
A weighted sum of simple returns gives the portfolio’s simple return exactly. The analogous identity is not true for log returns: in general,
\[ \ell_p \neq w_1 \ell_1 + w_2 \ell_2 + w_3 \ell_3, \]
except as a first-order approximation.
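A tiny numeric illustration, with made-up weights and returns:

import numpy as np

w = np.array([0.5, 0.3, 0.2])          # portfolio weights, summing to 1
R = np.array([0.10, -0.05, 0.02])      # one-period simple returns

R_p = w @ R                            # exact portfolio simple return: 0.039
print(np.log1p(R_p))                   # true portfolio log return, about 0.0383
print(w @ np.log1p(R))                 # weighted sum of log returns, about 0.0362 -- not the same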
The rule of thumb most practitioners settle on:
| Use simple returns when… | Use log returns when… |
|---|---|
| Combining assets into a portfolio (cross-section) | Aggregating one asset across time |
| Reporting performance to a client | Building statistical models on returns |
| Computing weighted averages | Computing means, OLS, Sharpe ratios on long data |
| Anything that says “percent gain” | Anything that says “log-normal”, “Brownian” |
In practice both columns often coexist in a research dataframe, and you switch fluently between them.
Aggregating returns over time
Where you’ll see this: “this stock returned 10% per month for the past year — so 120% per year, right?” Wrong, and the gap between 120% and the true answer is exactly what this section unpacks. Aggregating returns is also where most spreadsheet errors happen in finance internships, because the rules feel obvious until you actually try them.
This is where the simple-vs-log distinction earns its keep.
Simple-return compounding
If you hold an asset for \(k\) periods with simple returns \(R_1, R_2, \ldots, R_k\), the gross return over the whole window is the product of the gross per-period returns:
\[ 1 + R_{1 \to k} = (1 + R_1)(1 + R_2)\cdots(1 + R_k) = \prod_{j=1}^{k} (1 + R_j). \]
The net cumulative return is \(R_{1 \to k} = \prod (1+R_j) - 1\). In pandas:
cum = (1 + r).prod() - 1   # net cumulative return over the window
Log-return summation
For log returns the same window gives
\[ \ell_{1 \to k} = \sum_{j=1}^{k} \ell_j, \]
and to convert back to the cumulative gross return: \(1 + R_{1 \to k} = e^{\ell_{1 \to k}}\).
cum = np.exp(lr.sum()) - 1
The two computations produce identical answers up to floating-point rounding — they are just two ways of writing the same algebra. The choice between them is purely about which form is easier to manipulate in the surrounding code.
Worked numerical example
Suppose a stock has three daily simple returns: \(R_1 = +2\%\), \(R_2 = -1\%\), \(R_3 = +1.5\%\).
Cumulative gross return (simple form):
\[ 1 + R_{1\to 3} = 1.02 \cdot 0.99 \cdot 1.015 = 1.02495. \]
So the three-day return is \(+2.49\%\), not \(+2.5\%\). The small shortfall (about half a basis point) is the convexity drag from the loss day.
Equivalently in log form, \(\ell_j = \ln(1 + R_j)\):
\[ \ell_1 = 0.019803, \quad \ell_2 = -0.010050, \quad \ell_3 = 0.014889, \] \[ \ell_{1\to 3} = 0.024642, \quad e^{0.024642} - 1 = 0.02495. \checkmark \]
The two routes agree. The log form makes it obvious that the sign of \(-1\%\) is the only source of drag; if all three returns were \(+2\%\) the cumulative would beat \(3\times 2\% = 6\%\) by a small amount, a phenomenon known as positive compounding.
The arithmetic mean of \(\{+2\%, -1\%, +1.5\%\}\) is \(0.833\%\). Compounded over 3 days, \((1.00833)^3 - 1 = 2.52\%\) — close to but not equal to the true cumulative \(2.49\%\). The arithmetic mean of returns is not the per-period return that would generate the observed cumulative. The quantity that does is the geometric mean, which is the per-period equivalent of the cumulative product. For investment-performance reporting, geometric means (or equivalently, annualized log-return means) are the honest measure.
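In code, the distinction looks like this (a sketch reusing the three returns above):

import numpy as np

r = np.array([0.02, -0.01, 0.015])
arith = r.mean()                               # 0.00833, the arithmetic mean
geom = (1 + r).prod() ** (1 / len(r)) - 1      # about 0.00825, the honest per-period rate
print((1 + arith) ** len(r) - 1)               # 0.0252: overstates the cumulative
print((1 + geom) ** len(r) - 1)                # 0.02495: recovers the true cumulative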
Cumulative gross return and the equity curve
Why this matters: the equity curve is the single chart every fund manager, retail investor, and YouTube finance influencer puts at the top of their pitch — it’s a visual answer to “if I had given you $1, what would I have now?” Learning to build one yourself (and read one critically) is the most important visual skill in this course.
An equity curve just answers the question “how much would $1 have grown to, day by day?”. Every up-tick is a profitable day; every down-tick is a losing day. The dramatic-looking shape of a stock chart is mostly an equity curve in disguise.
The equity curve
If you put one dollar into an asset at \(t = 0\) and reinvest all gains, your wealth at \(t\) is
\[ W_t = \prod_{j=1}^{t}(1 + R_j), \]
where \(W_0 = 1\) by convention. The series \(\{W_t\}\) is called the equity curve (or sometimes the cumulative gross return). It is the single most informative chart in performance analysis: rising stretches are profit, falling stretches are drawdowns, and the steepness encodes the rate of compounding.
In pandas there are two one-liners, mirroring the simple/log split:
eq_simple = (1 + r).cumprod() # from simple returns
eq_log = np.exp(lr.cumsum())     # from log returns
Both produce the same \(\{W_t\}\) up to floating-point error (tiny rounding errors that come from computers storing decimals in binary; harmless here). cumprod() is short for “cumulative product” — it walks down the column multiplying as it goes, so the value in row \(t\) is the product of all values from row 1 through \(t\). cumsum() (“cumulative sum”) does the same with addition. These are the two “accumulator” methods you will use constantly in performance analysis.
Code: build an equity curve
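A minimal sketch of such a cell, assuming qqq is the OHLCV DataFrame downloaded earlier (variable names are illustrative):

import numpy as np
import matplotlib.pyplot as plt

r = qqq["Close"].pct_change().dropna()       # daily simple returns
lr = np.log1p(r)                             # matching log returns

eq_simple = (1 + r).cumprod()                # wealth path from simple returns
eq_log = np.exp(lr.cumsum())                 # same wealth path from log returns

ax = eq_simple.plot(label="(1 + r).cumprod()")
eq_log.plot(ax=ax, linestyle="--", label="exp(lr.cumsum())")
ax.set_title("Growth of $1")
ax.legend()
plt.show()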
The two curves are visually indistinguishable, which is the point: they are the same object expressed two ways. In practice you pick whichever form composes more cleanly with the rest of your code. For instance, when you have a mix of cash periods (return = 0) and invested periods, np.log1p(r).cumsum() handles zeros without precision loss, while (1+r).cumprod() is easier to read.
The four panels are progressively more processed views of the same return series. Panel (a) is the raw atom — a histogram of daily returns. Panel (b) compounds those returns into a wealth path via cumprod. Panel (c) overlays the running maximum, the bookkeeping needed for drawdown. Panel (d) is the wealth path expressed as a percentage shortfall from that peak — the underwater chart — which is the picture investors actually care about because it shows depth and duration of pain simultaneously.
Reading the equity curve
Three quantities you can eyeball off any equity curve:
- CAGR (compound annual growth rate). If the curve runs from \(W_0 = 1\) at date \(t_0\) to \(W_T\) at date \(t_T\), and the number of years is \(\tau = (t_T - t_0)/365.25\), then
\[ \text{CAGR} = W_T^{1/\tau} - 1. \]
- Maximum drawdown. At each point, the drawdown is \(W_t / \max_{s \leq t} W_s - 1\) — i.e., how far below the running peak you are. The minimum of this series over the whole sample is the worst peak-to-trough loss the investor would have lived through.
- Time underwater. The fraction of dates on which the equity is below its previous peak. A strategy with a 10% drawdown that recovers in two weeks feels very different from one with a 10% drawdown that takes three years.
We will compute all three in Chapter 4.
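As a preview, all three can be read off a wealth series in a few lines; a sketch, assuming eq is an equity curve like the one built above:

years = (eq.index[-1] - eq.index[0]).days / 365.25
cagr = (eq.iloc[-1] / eq.iloc[0]) ** (1 / years) - 1

running_peak = eq.cummax()
drawdown = eq / running_peak - 1          # 0 at new highs, negative below the peak
max_dd = drawdown.min()                   # worst peak-to-trough loss

time_underwater = (drawdown < 0).mean()   # fraction of days spent below a prior peak
print(cagr, max_dd, time_underwater)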
Resampling: daily → weekly → monthly
Why this matters: academic research papers almost always work in monthly returns, while traders almost always work in daily (or faster). To read either literature, you need to be able to convert between them — and the conversion has one small trap that catches almost every beginner.
resample is just the pandas way of saying bucket these timestamps into wider windows. Daily → monthly means “for each calendar month, collapse all the daily rows inside it into a single monthly row”. The only question is how you collapse them: take the last price? Sum the returns? Average something? The choice depends on what the column means.
The two flavors of frequency conversion
A daily series can be aggregated to weekly, monthly, quarterly, or annual frequency. Pandas has one universal method — .resample(rule) — that handles this, but what you put inside it depends on whether you are aggregating a price or a return.
For a price series you usually want the last observation in the period: the closing price at the end of the week or the month is what an investor would have realized. Use .last():
monthly_close = price.resample("1M").last()
weekly_close = price.resample("1W").last()
For a return series you want the compounded return over the period. Each monthly return is the product of the daily gross returns inside the month, minus one:
monthly_ret = (1 + daily_ret).resample("1M").prod() - 1
The two operations are not interchangeable. Taking the last daily return of the month is a one-day return at month-end — it has nothing to do with the monthly return.
Common frequency rules
| Rule string | Meaning |
|---|---|
"B" |
Business day |
"1W" |
Weekly (default: Sunday-end) |
"1W-FRI" |
Weekly, anchored to Friday |
"1M" |
Calendar month end |
"BM" |
Business month end |
"1Q" |
Calendar quarter end |
"1Y" or "1A" |
Calendar year end |
For US equity work, "BM" (business month-end) is the most natural choice, because the last trading day of the calendar month is what an investor would actually transact on.
Code: daily → monthly, two ways
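A minimal sketch of the two routes, assuming price is a daily adjusted-close Series and daily_ret = price.pct_change().dropna():

import pandas as pd

# Route (a): resample the price to month-end, then take percentage changes.
monthly_from_price = price.resample("BM").last().pct_change()

# Route (b): compound the daily returns inside each month.
monthly_from_ret = (1 + daily_ret).resample("BM").prod() - 1

side_by_side = pd.DataFrame({"from_price": monthly_from_price,
                             "from_returns": monthly_from_ret})
print(side_by_side.tail())
print((side_by_side["from_price"] - side_by_side["from_returns"]).abs().max())

The first row differs by construction: route (a) has no prior month-end to difference against, while route (b) compounds a partial first month.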
What this gave us: a side-by-side comparison showing both routes produce identical monthly returns — proof that they’re algebraically the same. The two columns agree to many decimal places, as they should: \(\prod (1 + R_j) = P_{\text{end}}/P_{\text{start}}\) is an algebraic identity (the product of all the daily gross returns inside a month equals the end-of-month price divided by the start-of-month price — they’re literally the same number). The route you choose is a matter of which intermediate object you want to keep — sometimes you need the monthly price (e.g. to plot it), sometimes only the monthly return.
Why monthly?
Monthly returns are the most common research frequency in academic finance for three reasons. Noise. Daily returns are dominated by microstructure noise (bid-ask bounce, intraday flow), monthly returns less so. Macro alignment. Most macroeconomic series — CPI, unemployment, GDP, factor returns — are released monthly or less often, so monthly is the natural join frequency. Sample size. A 50-year monthly sample is 600 observations, comfortably enough for cross-sectional regressions; a 50-year daily sample is 12,600, which sounds bigger but provides less independent information per observation.
For trading, the choice is the opposite: higher frequency means more independent decisions per year and (if your edge is real — i.e. you genuinely have a small probabilistic advantage over the market) a higher Sharpe ratio. Daily and intraday data dominate quant trading research. The course will keep both perspectives alive.
Rolling statistics: means and volatility
Why this matters: “the market is more volatile than usual” — how would you actually check that? You need a moving (rolling) estimate of volatility that updates each day. Rolling statistics are also how every technical indicator on TradingView is built — the 50-day moving average, the Bollinger Bands, RSI, all of them.
A rolling window is like looking at the data through a fixed-width sliding picture frame. Today, you look at the last 22 days. Tomorrow, you slide the frame one day to the right and look at the last 22 days from tomorrow’s vantage point. The “statistic” inside the frame (mean, std, whatever) updates each time.
Rolling windows in pandas
A rolling window slides a fixed-size window across the time index and computes a statistic at each step. Pandas exposes this through .rolling(window) followed by an aggregation method (an aggregation method is just a function that collapses many numbers into one — mean, std, max, min, sum, etc.):
ma_22 = price["Close"].rolling(22).mean() # 22-day moving average
sd_22 = ret.rolling(22).std()            # 22-day rolling sample std
The first 21 values of each output are NaN because the window is not yet full — by default min_periods equals the window length.
The choice of window length is partly conventional. 22 trading days is the standard for a monthly window (a calendar month averages ~21 trading days). 63 days is a quarter, 252 days is a year. Always state explicitly which convention you are using.
Rolling mean: the moving average
A rolling mean of returns gives a slow, smoothed estimate of the local drift. A rolling mean of prices gives a smoothed trajectory beloved of technical analysts:
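For example (a sketch, assuming price is the OHLCV DataFrame used just above):

import matplotlib.pyplot as plt

ax = price["Close"].plot(alpha=0.5, label="Close")
price["Close"].rolling(22).mean().plot(ax=ax, label="22-day MA")
price["Close"].rolling(252).mean().plot(ax=ax, label="252-day MA")
ax.legend()
plt.show()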
The longer the window, the smoother the line, and the more lagged it is relative to fast price moves. This trade-off — smoothness vs lag — is the single most important design choice in any technical indicator. A 22-day MA captures monthly trends but reacts quickly; a 252-day MA defines the long-run trend but turns slowly.
Rolling volatility: the standard deviation of returns
Volatility is just statistics jargon for “how wild are the daily moves?”. Technically it’s the standard deviation of returns — but you can think of it as the typical size (positive or negative) of a daily wiggle. A volatility of 1% means “on a normal day, the price moves by roughly ±1%”. The Greek letter \(\sigma\) (sigma) is the standard symbol for it.
For a return series, rolling(window).std() computes the sample standard deviation over the window. The word “sample” just means we’re estimating from observed data, as opposed to knowing the “true” underlying value. This is the empirical analog of \(\sigma\), the most common single-number summary of risk for an asset.
vol_22 = ret.rolling(22).std()
vol_22.plot(title="22-day rolling volatility of daily returns")
A typical equity (stock) series has a daily standard deviation in the neighborhood of \(0.5\%\)–\(2\%\). It is not constant in time — periods of calm (vol ~0.5%) alternate with periods of crisis (vol > 3%). This phenomenon, called volatility clustering — turbulent days tend to come in groups, like aftershocks after an earthquake — is one of the empirical regularities every financial model must accommodate.
Why \(\sigma\) measures risk
The intuition is mechanical. If returns are roughly symmetric around zero, then the standard deviation tells you the typical size of a deviation from the mean — both upward and downward. A 22-day vol of \(0.01\) implies a one-day shock of about 1% is normal; a \(0.02\) shock is roughly two standard deviations, and a \(0.05\) shock would be a 5-sigma event under a normal distribution.
The reality is messier — return distributions are fat-tailed, with more extreme events than a normal distribution predicts — and we will refine the risk measure in Chapter 4 with Value-at-Risk and Expected Shortfall. For now, \(\sigma\) is the right starting point.
Position sizing with rolling vol
A practical application: suppose you have $1M of capital (the money you have available to invest) and want to allocate it to QQQ such that your portfolio’s daily standard deviation does not exceed 1% ($10,000). If today’s rolling 22-day vol of QQQ daily returns is \(\sigma = 0.012\), then the dollar position size \(\$X\) (the amount you actually put into the trade — could be smaller than your capital) satisfies
\[ \$X \cdot \sigma = \$10{,}000 \implies \$X = \$10{,}000 / 0.012 = \$833{,}333. \]
Equivalently, you would deploy about 83% of your capital. When QQQ vol spikes to \(0.025\), the same risk budget (the amount of daily wiggle you’ve decided to tolerate) would require shrinking the position to $400,000 — 40% of capital. This style of position sizing, called volatility targeting (sizing your position up when markets are calm and down when they’re stormy, to keep your daily risk roughly constant), is one of the simplest and most powerful risk-management tools in quant trading.
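A sketch of the same rule applied date by date, assuming ret is the QQQ daily-return Series used above:

capital = 1_000_000
risk_budget = 10_000                                    # target daily dollar standard deviation

vol_22 = ret.rolling(22).std()                          # rolling estimate of daily vol
position = (risk_budget / vol_22).clip(upper=capital)   # dollars deployed, capped at capital
weight = position / capital                             # fraction of capital in the trade
print(weight.dropna().tail())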
Annualizing volatility: the \(\sqrt{252}\) rule
Where you’ll see this: every hedge fund factsheet and every Bloomberg terminal reports volatility per year, but the raw calculation almost always happens on daily data. The conversion uses one famous number — \(\sqrt{252}\) — and applying it incorrectly is the most common mistake in student finance projects.
Means add up linearly with time (a 0.04% daily mean over 252 days is roughly 10% per year), but volatilities grow more slowly — only with the square root of time. The deeper reason: daily wiggles partially cancel each other out, so a year of random walks doesn’t accumulate 252× the daily noise, only about \(\sqrt{252} \approx 15.9\times\).
The rule
If you have an estimate of \(\sigma_{\text{daily}}\) — the standard deviation of daily returns — and you want \(\sigma_{\text{annual}}\), the rule is
\[ \sigma_{\text{annual}} = \sigma_{\text{daily}} \cdot \sqrt{252}. \]
The 252 is the number of US trading days in a year (it varies slightly across calendars; people use 252 for US equities, 256 for many futures markets, 260 for FX). Square-root, not linear: this is the famous distinguishing feature of how variance scales with time.
Why \(\sqrt{T}\) and not \(T\)?
The intuition is that returns over different days are approximately independent. Independence is the key assumption: variances of independent random variables add. If \(r_1, r_2, \ldots, r_T\) are independent with common variance \(\sigma^2\), then
\[ \text{Var}(r_1 + r_2 + \cdots + r_T) = T \sigma^2, \]
and therefore
\[ \sigma(r_1 + \cdots + r_T) = \sqrt{T} \sigma. \]
In contrast, expected values add linearly: \(\mathbb{E}[r_1 + \cdots + r_T] = T \mu\). Hence:
\[ \mu_{\text{annual}} = 252 \, \mu_{\text{daily}}, \qquad \sigma_{\text{annual}} = \sqrt{252} \, \sigma_{\text{daily}}. \]
The combination is the annual Sharpe ratio:
\[ \text{SR}_{\text{annual}} = \frac{\mu_{\text{annual}}}{\sigma_{\text{annual}}} = \frac{252 \mu_{\text{daily}}}{\sqrt{252} \sigma_{\text{daily}}} = \sqrt{252} \cdot \text{SR}_{\text{daily}}. \]
Sharpe scales with \(\sqrt{T}\) also — a strategy with a daily Sharpe of 0.06 has an annualized Sharpe of about \(0.06 \cdot \sqrt{252} \approx 0.95\).
Some practitioner numbers to memorize
| Asset | Approx. daily \(\sigma\) | Approx. annual \(\sigma\) |
|---|---|---|
| Short-term Treasuries (SHY) | 0.05% | 0.8% |
| Investment-grade credit (LQD) | 0.3% | 5% |
| 20-yr Treasuries (TLT) | 0.8% | 13% |
| S&P 500 (SPY) | 1.0% | 16% |
| NASDAQ-100 (QQQ) | 1.3% | 20% |
| Bitcoin | 3.5% | 56% |
These are order-of-magnitude figures over multi-year windows; the realized vol any given year can deviate substantially. The point is to develop a sense of scale — an equity allocation with 5% annualized vol is suspect (probably hedged or stale-priced); a fixed-income strategy with 30% annualized vol is taking equity-like risk.
The \(\sqrt{T}\) rule assumes daily returns are i.i.d. In reality they exhibit volatility clustering (vol today predicts vol tomorrow) and modest return autocorrelation (especially negative at the daily horizon for individual stocks, and positive at the monthly horizon for momentum). The \(\sqrt{T}\) approximation is good enough for back-of-envelope work; for production risk models, GARCH and realized-volatility models do better.
Worked example: SPY end-to-end
Where you’ll see this: SPY (the ETF that tracks the S&P 500, the most-traded fund on earth) is the standard “first analysis” object in quant finance. Pasted-together versions of the script below run on every desk every morning. If you can build this end-to-end from scratch, you have the core of a quant intern’s daily workflow.
We close the chapter with a full end-to-end example that exercises every concept introduced above: load SPY daily prices, build a daily return series, resample to monthly, plot the equity curve, and compute rolling volatility — all in a single self-contained script.
import yfinance as yf
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# 1. Download adjusted-close SPY prices.
spy = yf.Ticker("SPY").history(start="2010-01-01")["Close"]
spy.name = "SPY"
# 2. Daily simple returns.
daily_ret = spy.pct_change().dropna()
# 3. Monthly returns (compound the daily returns within each month).
monthly_ret = (1 + daily_ret).resample("BM").prod() - 1
# 4. Equity curves from monthly and daily returns.
eq_daily = (1 + daily_ret).cumprod()
eq_monthly = (1 + monthly_ret).cumprod()
# 5. Rolling 22-day annualized volatility.
vol_22d = daily_ret.rolling(22).std() * np.sqrt(252)
# 6. Summary statistics.
mu_d, sd_d = daily_ret.mean(), daily_ret.std()
SR_ann = (mu_d * 252) / (sd_d * np.sqrt(252))
print(f"Daily mean return: {mu_d:.5f}")
print(f"Daily std (vol): {sd_d:.5f}")
print(f"Annualized mean: {mu_d*252:.4f}")
print(f"Annualized vol: {sd_d*np.sqrt(252):.4f}")
print(f"Annualized Sharpe: {SR_ann:.3f}")
fig, axes = plt.subplots(2, 1, figsize=(9, 6), sharex=True)
eq_daily.plot(ax=axes[0], color="#1a4d80", linewidth=1.0, label="Daily-compounded")
eq_monthly.plot(ax=axes[0], color="#c43d3d", linewidth=1.3, label="Monthly-compounded")
axes[0].set_title("SPY equity curve, starting at $1")
axes[0].legend(); axes[0].grid(True, alpha=0.3)
vol_22d.plot(ax=axes[1], color="#7a3f9e", linewidth=0.9)
axes[1].set_title("SPY rolling 22-day annualized volatility")
axes[1].set_ylabel("Annual vol")
axes[1].grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
A live version of the same pipeline, with a synthetic SPY stand-in so it runs offline:
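A sketch of how such a stand-in could be generated; the drift, baseline volatility, and the two crisis windows below are invented for illustration, not estimated from SPY:

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
dates = pd.bdate_range("2010-01-01", "2024-12-31")

# Baseline daily vol of about 1%, with two invented high-vol "crisis" windows.
sigma = pd.Series(0.010, index=dates)
sigma.loc["2015-08":"2015-10"] = 0.030
sigma.loc["2020-02":"2020-04"] = 0.035

fake_ret = pd.Series(0.0004 + sigma.to_numpy() * rng.standard_normal(len(dates)),
                     index=dates, name="fake_SPY")

# Feed the synthetic returns through the same pipeline as the real script above.
eq = (1 + fake_ret).cumprod()
vol_22d = fake_ret.rolling(22).std() * np.sqrt(252)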
A few features to read off the output.
The two equity curves — daily-compounded and monthly-compounded — line up almost exactly, because compounding is associative: aggregating daily returns into monthly first and then compounding gives the same wealth as compounding daily throughout. The monthly curve is just a downsampled version of the daily one.
The volatility series shows the two stress periods clearly as spikes. In the calm middle years annualized vol sits around 12–16%, in line with the long-run SPY number. During the engineered crises it briefly exceeds 40%. The shape of this curve — long quiet stretches punctuated by sharp peaks — is universal across markets.
At any equity-strategy hedge fund, the script above is the first thing a researcher runs on any new ticker — usually as a Jupyter notebook cell (Jupyter is an interactive Python environment, the standard tool for exploratory data work). The equity curve plus rolling vol plot is the standard “first card” in a strategy review deck. If a researcher cannot reproduce these numbers from scratch, they cannot do the rest of the work. Drill these patterns until they are reflex.
Summary
The big ideas of this chapter, condensed:
- A market price series is a time-indexed pandas Series. The index is the ground truth; everything else is computed from it.
- Use adjusted close prices (split- and dividend-adjusted) for return calculations. Always check whether the data is adjusted before computing anything.
- Simple return \(R_t = P_t/P_{t-1} - 1\), in pandas: price.pct_change(). Use for portfolios and reporting.
- Log return \(\ell_t = \ln(P_t/P_{t-1})\), in pandas: np.log(price/price.shift()). Use for time aggregation and statistical modelling.
- Cumulative gross return: (1+r).cumprod() or np.exp(lr.cumsum()). This is the equity curve.
- Resampling: .resample("BM").last() on prices, (1+r).resample("BM").prod() - 1 on returns.
- Rolling stats: .rolling(n).mean(), .rolling(n).std(). Standard windows are 22 (month), 63 (quarter), 252 (year).
- Annualize volatility with \(\sigma_{\text{annual}} = \sqrt{252} \, \sigma_{\text{daily}}\). Mean returns scale by \(252\); Sharpe scales by \(\sqrt{252}\).
The next chapter extends this machinery to portfolios — weighted combinations of return series — and to the matrix language of covariance, correlation, and diversification.
Exercises
Download AAPL daily history for the last 5,000 trading days with yfinance. Then:
- Print the dtype of the index and confirm it is a DatetimeIndex.
- Print the first and last dates, and the number of rows.
- Plot the Close column. On the same axes, overlay a 252-day moving average.
- Identify the day of the 4-for-1 split on 31 August 2020. What value does the Stock Splits column take on that date? What does the Close column do across that date — does it jump or is it adjusted smooth?
Using the AAPL series from Exercise 1, construct two columns: simpleR = Close.pct_change() and logR = np.log(Close).diff(). Then:
- Plot both as histograms on the same figure. How different do they look?
- Compute and report simpleR.mean(), logR.mean(), simpleR.std(), logR.std() over the full sample.
- Verify numerically that \(\ell_t = \ln(1 + R_t)\) for the most recent 5 observations. Compute the maximum absolute discrepancy across the entire series.
Continuing with AAPL:
- Build eq1 = (1 + simpleR).cumprod().
- Build eq2 = np.exp(logR.cumsum()).
- Plot them on the same axes. They should be visually identical.
- Compute (eq1 - eq2).abs().max() to verify the equivalence numerically. What order of magnitude is the discrepancy, and where does it come from?
- Report the final wealth from $1 over the full sample, and the implied CAGR over the period.
From the AAPL daily return series:
- Compute monthly returns two ways: (a) resample the price with .resample("BM").last() and take pct_change(); (b) resample the return with (1 + ret).resample("BM").prod() - 1. Confirm they agree.
- Report the monthly mean and standard deviation of returns. Compare to the daily mean and standard deviation scaled by 21 and \(\sqrt{21}\) respectively (since there are ~21 trading days per month). How close is the empirical scaling to the i.i.d. prediction?
- Plot monthly returns as a bar chart. Highlight in red any month in which the absolute return exceeds 10%. How many such months are there?
Suppose you manage $1M and target a daily portfolio standard deviation of $10,000 (1% of capital). Using AAPL:
- Compute a 22-day rolling standard deviation of daily returns.
- For each date, compute the dollar position size that would have hit the 1% vol target.
- Plot the time series of position sizes (in $). On which dates was the position smallest (i.e. when was AAPL most volatile)?
- What is the average position size over the sample? How does it compare to the $1M cap?
- Bonus: clip the position size at $1M (you cannot lever beyond your capital in this exercise). What fraction of dates is the position constrained at the cap?
Before computing returns on any new dataset, work through this checklist:
- Is the index a DatetimeIndex? print(type(df.index)).
- Is the index sorted ascending? df.index.is_monotonic_increasing.
- Are there missing dates inside the range? pd.date_range(df.index.min(), df.index.max(), freq="B").difference(df.index).
- Are prices adjusted for splits and dividends? Spot-check a known split date.
- Are there NaN values in the price column? df["Close"].isna().sum().
A return series computed on a dataset that fails any of these checks is suspect — and finance has a long tradition of strategies that looked profitable until someone re-ran the analysis with a properly cleaned input.
Chapter Introduction
Where you’ll see this: the second half of this chapter is what every “fund report” you’ll ever read is really made of. When CNBC shows a graphic with bullet points “annual return: 12% — volatility: 18% — max drawdown: -25% — Sharpe: 0.6”, those four numbers are exactly what we now compute, by hand, in Python.
From Returns to Risk
The first half of this chapter stopped at the construction of returns. Returns are the raw object — a clean, dimensionless number per period — and they are nearly useless on their own. A stock that returned \(+0.4\%\) yesterday could be a quiet dividend payer or a leveraged technology name caught in a temporary pause. Two funds that delivered the same \(12\%\) annual return could have travelled there along radically different paths: one ground steadily upward, the other careened through a \(40\%\) drawdown and a near-bankruptcy event before recovering. The investor who finds out only at year-end what happened in between has been flying blind.
This chapter is about the diagnostics that make a return series legible. We will compute four numbers — volatility, drawdown, Sharpe ratio, correlation — that together transform a column of returns into a defensible risk profile (the standard summary of “how risky is this thing, in what ways?”). Each is one line of pandas code in real work. Each carries a non-trivial amount of statistical theory underneath. The combination is what every institutional report, every hedge-fund tear sheet (a one- or two-page fund performance summary), and every robo-advisor dashboard rests on.
Why These Four Statistics, in This Order
There is a deliberate logic to the sequence. Volatility is the simplest one-number summary of risk: the standard deviation of returns. It captures short-term fluctuation. It is symmetric in gains and losses, and it scales with the square root of the horizon — a fact we will see force itself onto every annualization choice you ever make. Drawdown captures something volatility cannot: persistent loss. A portfolio that drifts down 30% over six months and stays there has caused real pain to its holders, even if its daily standard deviation is unremarkable. Drawdown is the metric an end investor actually feels in their stomach. Sharpe ratio is the per-unit-of-risk return, the cleanest comparator across strategies of different scale. Correlation is the bridge from one asset to two — and from two assets to a portfolio. Correlation is what diversification trades in. Without it, the entire portfolio-construction industry has no language.
We close with two ideas that connect the dots: the volatility drag that makes geometric return strictly below arithmetic return for any risky asset, and a complete worked example — a 60/40 SPY/AGG portfolio analyzed end-to-end.
Returns describe outcomes. The four statistics in this chapter describe the path that produced those outcomes.
Two assets can have identical mean returns and tell completely different risk stories.
Volatility and the Square-Root-of-Time Rule
Why this matters: “this fund has 12% volatility” — what does that even mean, and how do you tell whether that’s risky? This section makes the number concrete and gives you a mental yardstick (e.g. SPY ≈ 16%, Bitcoin ≈ 55%) so you can immediately sanity-check any volatility claim you see.
Volatility is the typical size of a price wiggle. If a fund has 20% annualised volatility, then in a typical year its return will be roughly within ±20% of its mean. Higher volatility = wilder ride. The Sharpe ratio (later in the chapter) is essentially “return per unit of this wiggle”.
Definition
The starting point is simple. Given a series of period returns \(r_1, r_2, \ldots, r_T\) (just “returns on day 1, day 2, …, day T” — capital \(T\) is the total number of days in the sample), the sample volatility is the sample standard deviation:
\[ \hat{\sigma} = \sqrt{\frac{1}{T-1}\sum_{t=1}^{T}(r_t - \bar{r})^2}, \]
where \(\bar{r} = \frac{1}{T}\sum_{t=1}^{T} r_t\) is the sample mean return (the average daily return). The \(T-1\) in the denominator is the standard Bessel correction — a tiny statistical adjustment that fixes a bias when you estimate a standard deviation from a sample (you lost “one degree of freedom” by using the sample mean instead of the true mean). In pandas, this is what returns.std() returns by default — and the choice matters: NumPy’s np.std() defaults to \(T\) in the denominator instead. For any return series you will care about in this course, \(T\) is large enough that the difference is cosmetic, but you should know which library uses which convention before you reproduce someone else’s number.
Why Standard Deviation, Not Variance?
Variance has the wrong units. If returns are reported in decimal form, variance is in \(\text{decimal}^2\), which is uninterpretable. Standard deviation has the same units as the returns themselves — a daily volatility of \(0.012\) means “a typical daily move is roughly \(1.2\%\).” This makes it directly comparable to the return itself, and that comparability is exactly what powers the Sharpe ratio later in the chapter. Variance is what the math uses internally (it adds linearly when returns are independent); volatility is what the human reports.
The Square-Root-of-Time Scaling Rule
Almost every volatility number you will read in a research note, a fund factsheet, or a Bloomberg terminal is annualized. The raw computation, however, is almost always done at the daily frequency, because that is the highest-frequency clean data most investors have. The conversion from daily to annual uses a single, simple, frequently-misapplied rule:
\[ \sigma_{\text{annual}} = \sigma_{\text{daily}} \cdot \sqrt{252}. \]
The factor \(\sqrt{252}\) is not arbitrary. It comes from an assumption: if daily returns are independent and have the same variance each day, then variance of the \(T\)-day sum is \(T\) times the one-day variance, and standard deviation is \(\sqrt{T}\) times the one-day standard deviation. The U.S. equity market trades roughly \(252\) days per year (every business day minus holidays), so a one-year horizon corresponds to \(T \approx 252\) in this formula.
More generally, for any aggregation factor \(k\):
\[ \sigma_{k\text{-period}} = \sigma_{\text{1-period}} \cdot \sqrt{k}. \]
Monthly returns annualize by \(\sqrt{12}\). Weekly returns annualize by \(\sqrt{52}\). Daily returns annualize by \(\sqrt{252}\). Hourly returns over a \(24/7\) market like crypto annualize by \(\sqrt{24 \cdot 365}\).
The square-root rule assumes independence and identical distribution of returns across periods. Both fail in the real world:
- Autocorrelation. Returns exhibit small but non-zero autocorrelation at daily frequencies and substantial autocorrelation at longer horizons (momentum, mean reversion).
- Volatility clustering. Big moves cluster — a \(-3\%\) day is more likely to be followed by another large-magnitude day than the i.i.d. assumption allows. GARCH models exist precisely to model this departure.
In practice, \(\sqrt{T}\) scaling is still the industry default — it is wrong in the small, defensible in the large, and almost always the number you will be asked to compare to.
A Worked Numerical Example
Suppose a stock has a daily volatility of \(\hat{\sigma}_{\text{d}} = 0.015\), i.e. \(1.5\%\) per day. Then:
- \(\sigma_{\text{weekly}} = 0.015 \cdot \sqrt{5} \approx 0.0335\) (about \(3.35\%\) per week)
- \(\sigma_{\text{monthly}} = 0.015 \cdot \sqrt{21} \approx 0.0687\) (about \(6.87\%\) per month)
- \(\sigma_{\text{annual}} = 0.015 \cdot \sqrt{252} \approx 0.2381\) (about \(23.8\%\) per year)
A \(23.8\%\) annualized volatility is a fair description of a typical large-cap U.S. equity over the last twenty years — for reference, SPY itself has run between \(13\%\) and \(22\%\) annualized depending on the regime, with a long-run average near \(16\%\).
The blue bars are the volatilities measured directly on resampled return series at each horizon; the red bars are what the \(\sqrt{k}\) rule predicts from the daily number alone. They line up closely because the simulated returns are genuinely i.i.d. The “×” annotations above each pair are the volatility ratios relative to daily — \(\sqrt{5} \approx 2.2\), \(\sqrt{21} \approx 4.6\), \(\sqrt{252} \approx 15.9\) — exactly the multipliers practitioners memorise. In real data the agreement is approximate (volatility clustering distorts it), but the qualitative shape is universal: variance adds with time, so standard deviation grows with \(\sqrt{T}\).
Volatility in Code
The pattern is so common in practice that you should memorize the line:
`ann_vol = ret.std() * np.sqrt(252)`

It is the one-liner that converts a raw return series into the number every risk report displays.
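A slightly fuller sketch makes the \(\sqrt{k}\) scaling concrete end-to-end: simulate i.i.d. daily returns, measure volatility at several horizons by resampling, and compare against the \(\sqrt{k}\) prediction. The seed, sample length, and drift/vol parameters below are arbitrary illustration choices:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
dates = pd.bdate_range("2000-01-03", periods=252 * 25)                      # 25 years of business days
daily = pd.Series(rng.normal(0.0004, 0.01, size=len(dates)), index=dates)   # i.i.d. daily returns

daily_vol = daily.std()
# "ME"/"YE" are month-end / year-end aliases (pandas >= 2.2; older versions use "M"/"A")
for label, rule, k in [("weekly", "W", 5), ("monthly", "ME", 21), ("annual", "YE", 252)]:
    # Compound daily returns within each period, then measure that period's volatility directly
    measured = ((1 + daily).resample(rule).prod() - 1).std()
    predicted = daily_vol * np.sqrt(k)
    print(f"{label:8s} measured {measured:.4f}   sqrt(k) predicts {predicted:.4f}")
```

On genuinely i.i.d. data the two columns agree closely; on real returns, volatility clustering makes the agreement only approximate.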
Pitfalls When Annualizing
Three mistakes recur even among experienced practitioners.
Pitfall 1 — Wrong calendar factor. The factor \(252\) is for U.S. equities. International markets vary (\(245\) for the UK, \(246\) for Japan, \(250\) for the Eurozone in some conventions). FX trades around the clock and conventions differ: some shops use \(252\), some use \(260\). Crypto is genuinely \(365\). Mixing conventions across asset classes will produce volatilities that are off by several percent.
Pitfall 2 — Annualizing a number that is not a daily return. If you accidentally annualize a monthly return series with \(\sqrt{252}\), you will multiply by approximately \(15.87\) instead of \(\sqrt{12} \approx 3.46\). The resulting “annualized volatility” of \(300\%\) on a normal equity portfolio is a giveaway, but more subtle mistakes — using a \(\sqrt{252}\) factor on weekly data, for instance — produce numbers that look superficially plausible.
Pitfall 3 — Annualizing volatility but not the mean, or vice versa. Means scale linearly with horizon, volatilities scale with \(\sqrt{T}\). Annualizing mean return with a factor of \(252\) and volatility with a factor of \(\sqrt{252}\) is correct. Using \(252\) for both, or \(\sqrt{252}\) for both, is a common bug in homework solutions and student code.
The annualized Sharpe ratio for a well-diversified equity portfolio sits roughly between \(0.3\) and \(0.7\) over long horizons. If your annualized mean is \(25\%\) and your annualized volatility is \(5\%\), giving a Sharpe of \(5.0\), you have almost certainly mis-annualized one of the two numbers.
Drawdowns and the Running Maximum
Where you’ll see this: when a fund manager hides behind “but my volatility is only 10%!” while their investors are panicking — that’s the gap drawdown reveals. A YouTube finance bro proudly showing his portfolio is at an all-time high tells you nothing about the 40% drop he might have lived through last year. Drawdowns are how you measure that pain.
A drawdown is how far below your previous best you currently are. If your portfolio peaked at $100, then fell to $70, your drawdown right now is -30%. The maximum drawdown over your whole history is the worst pain you ever endured — the question “how bad was the worst slump?” reduced to a single number.
Volatility tells you the average magnitude of daily wiggles. It says nothing about the worst experience an investor has actually lived through. For that, we need the drawdown — the percentage decline from a portfolio’s running peak (its previous best level).
Definition
Let \(W_t\) denote the cumulative wealth at time \(t\) — that is, the value of one dollar invested at time \(0\) and grown by the realized returns through time \(t\):
\[ W_t = \prod_{s=1}^{t}(1 + r_s). \]
The running maximum is the largest value of wealth attained up to and including time \(t\):
\[ M_t = \max_{1 \le s \le t} W_s. \]
The drawdown at time \(t\) is the percentage shortfall from that running peak:
\[ D_t = \frac{W_t - M_t}{M_t} = \frac{W_t}{M_t} - 1. \]
By construction \(D_t \le 0\) for all \(t\): drawdown is either zero (the portfolio is at a new all-time high) or negative (the portfolio is below its previous best). The maximum drawdown is the worst of these values across the sample:
\[ \text{MDD} = \min_{1 \le t \le T} D_t. \]
A maximum drawdown of \(-0.45\) means: “at the worst point of the sample, the portfolio had lost \(45\%\) of its peak value.” For context: SPY’s drawdown from October 2007 to March 2009 (the global financial crisis) was about \(-55\%\); QQQ’s drawdown from March 2000 to October 2002 (the dot-com bust) was about \(-83\%\) — meaning a tech investor who bought at the peak had to wait fifteen years to break even.
Why Drawdown Is the Risk Metric Investors Actually Feel
Volatility is a property of the return distribution. Drawdown is a property of the price path. Two return distributions with identical means and variances can produce wildly different drawdown experiences if the timing of the negative returns differs. A long run of small losses concentrated together produces a deep drawdown; the same losses scattered randomly across the sample produce a much milder one.
End investors — especially retail investors and pension fund trustees — almost always react to drawdown, not volatility. When the news headline reads “Portfolio down 30% from peak,” redemptions follow. When it reads “Portfolio volatility 18% annualized,” nothing happens. This asymmetry is what makes drawdown the single most important number on a fund tear sheet, even though academic finance has spent fifty years developing more sophisticated risk measures.
Computing Drawdowns in Pandas
The cummax() method on a pandas Series returns the running maximum (cummax = “cumulative maximum” — for each row \(t\), it gives you the largest value seen so far, i.e. from row 1 through \(t\)). This is exactly the \(M_t\) we need. The full computation is three lines:
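A minimal sketch, assuming `ret` is a pandas Series of daily returns:

```python
# W_t, M_t, D_t in three lines
wealth = (1 + ret).cumprod()            # W_t: growth of $1 invested at time 0
running_max = wealth.cummax()           # M_t: best level seen so far
drawdown = wealth / running_max - 1     # D_t: shortfall from the peak, <= 0

print("Current drawdown:", round(drawdown.iloc[-1], 4))     # where you ended
print("Max drawdown:    ", round(drawdown.min(), 4))        # the worst slump
print("Trough date:     ", drawdown.idxmin())               # when it happened
print("At new high?     ", drawdown.iloc[-1] == 0)          # have you recovered?
```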
What this gave us: four numbers that together describe the painful side of the track record — where you ended, how deep the worst slump was, when it happened, and whether you’ve recovered. .iloc[-1] is pandas slang for “the last row” (negative indices count from the end, just like in plain Python lists).
Anatomy of a drawdown calculation
A drawdown is built in three stages, each producing its own Series. The diagram below stacks the three Series vertically — equity curve on top, running max in the middle, drawdown on the bottom — and labels what each step does to the row above it.
Read the diagram top-to-bottom. The green dots on the equity curve mark dates where \(W_t\) ties its own running peak — those are the only dates where the drawdown is exactly zero. The middle red step function is the non-decreasing envelope: it only ever ratchets up, never down. The bottom red filled region is the gap between the two — exactly the drawdown — and its lowest point is the maximum drawdown. Three Series, two lines of pandas, one risk number.
The drawdown series is also useful in its own right, not just its minimum. Plotting \(D_t\) over time produces an “underwater chart” — a visualization beloved by hedge-fund allocators because it shows simultaneously how deep and how long the drawdown was.
Visualizing the Underwater Chart
The underwater chart visualizes two distinct dimensions of pain: the depth (how far below water did we go?) and the duration (how long did it take to climb back to a new high?). A shallow but multi-year drawdown is a different animal from a deep but quickly-recovered one, and they require different conversations with investors. The duration of the longest drawdown — sometimes called the time underwater — is a separate metric that institutional allocators frequently compute alongside the max drawdown.
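A sketch of both dimensions, continuing from the `drawdown` Series computed above (the plotting style and the run-length trick are illustrative choices):

```python
import matplotlib.pyplot as plt

# Depth: shade the drawdown series below zero
fig, ax = plt.subplots(figsize=(10, 3))
ax.fill_between(drawdown.index, drawdown, 0, color="firebrick", alpha=0.4)
ax.set_ylabel("Drawdown from peak")
ax.set_title("Underwater chart")

# Duration: the longest run of consecutive days spent below the prior peak
underwater = drawdown < 0
run_length = underwater.groupby((~underwater).cumsum()).cumsum()
print("Longest time underwater:", int(run_length.max()), "days")
```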
Maximum drawdown is a minimum-order statistic over the sample. It is therefore extremely sensitive to the sample endpoints. Reporting “max drawdown of \(-12\%\)” on a five-year sample that happens to omit 2008 and 2020 is technically true and economically misleading. When you read a tear sheet, the first thing to check is the sample period.
The Sharpe Ratio
Where you’ll see this: every fund pitch deck contains a line like “we achieved a Sharpe ratio of 2.5!” — and most of those claims are either over-fitted, computed on too-short a sample, or use leverage that the volatility hides. By the end of this section you’ll know exactly how to compute it yourself, what numbers are plausible, and what counts as a red flag.
The Sharpe ratio is “reward divided by risk”. If two funds both made 10% last year, but Fund A’s volatility was 5% while Fund B’s was 30%, Fund A is the better fund — it earned the same reward with less stomach-churning. Sharpe makes that comparison numerical: higher Sharpe = more return per unit of risk.
From Two Numbers to One
So far we have produced two numbers from a return series: a mean \(\bar{r}\) (average return) and a volatility \(\hat{\sigma}\) (the wiggle size). Neither alone tells you whether the investment was worth it. A \(20\%\) return is excellent if it came with \(10\%\) volatility; it is mediocre at best if it came with \(40\%\) volatility, because at that risk level you could have built a leveraged Treasury position — a borrowed-money bet on safe US government bonds — that produced the same expected return with cleaner downside behavior.
William Sharpe’s 1966 paper proposed the cleanest one-number summary of this trade-off: the Sharpe ratio, defined as the excess return per unit of volatility. In the formula below, \(\bar{r}\) is your investment’s average return, \(r_f\) is what you could have earned doing nothing risky (the “risk-free rate” — see below), and \(\hat{\sigma}\) is the volatility we just defined:
\[ \text{SR} = \frac{\bar{r} - r_f}{\hat{\sigma}}. \]
Here \(r_f\) is the risk-free rate — the return on essentially safe assets like short-term US government bills, your “do nothing risky” benchmark — measured over the same horizon as the returns. The numerator measures how much you earned over and above that safe alternative. The denominator scales by how much fluctuation you had to live through to earn it. The ratio is a slope: the steeper the slope, the more reward per unit of risk.
For decades, Sharpe ratio has been the single most cited performance metric in finance. It appears in every fund factsheet, every consultant evaluation, every robo-advisor dashboard. Sharpe himself shared the 1990 Nobel Memorial Prize in part for this insight.
Annualizing the Sharpe Ratio
Because mean returns scale linearly with horizon and volatilities scale with \(\sqrt{T}\), the Sharpe ratio scales with \(\sqrt{T}\):
\[ \text{SR}_{\text{annual}} = \text{SR}_{\text{daily}} \cdot \sqrt{252}. \]
The derivation is a one-line calculation. If daily excess return has mean \(\mu_d\) and standard deviation \(\sigma_d\), then annual excess return has mean \(252 \mu_d\) and standard deviation \(\sigma_d \sqrt{252}\), so
\[ \text{SR}_{\text{annual}} = \frac{252 \mu_d}{\sigma_d \sqrt{252}} = \frac{\mu_d}{\sigma_d} \cdot \sqrt{252} = \text{SR}_{\text{daily}} \cdot \sqrt{252}. \]
This is why a daily Sharpe ratio of \(0.05\) is actually excellent (\(0.05 \cdot \sqrt{252} \approx 0.79\) annualized), and a daily Sharpe of \(0.01\) is mediocre (\(\approx 0.16\) annualized). The raw daily number looks tiny because \(\sqrt{252}\) is large.
Benchmarks for Sharpe Ratios
Calibration is everything. The following table summarizes roughly what to expect across asset classes and strategies, based on long-run U.S. data:
| Strategy | Typical annualized Sharpe |
|---|---|
| Cash / T-bills (the risk-free leg) | \(0.00\) by construction |
| U.S. equities (SPY, long-only) | \(0.4\) – \(0.5\) |
| U.S. aggregate bonds (AGG) | \(0.3\) – \(0.5\) |
| 60/40 balanced portfolio | \(0.5\) – \(0.7\) |
| Diversified hedge-fund composite | \(0.5\) – \(0.8\) |
| Top-decile quant equity market-neutral | \(1.0\) – \(1.5\) |
| High-frequency market-making (private) | \(3.0\) – \(10.0\)+ |
A claim of annualized Sharpe above \(2\) on a long-only equity strategy (one that only buys stocks, no shorting, no derivatives) should be treated with deep skepticism — either the sample is too short, the strategy is using leverage (borrowed money) that the reported volatility ignores, or there is a methodological error somewhere. Sharpe ratios above \(1\) are rare and almost always come from short-horizon, high-turnover strategies — i.e. ones that trade many times a day and can only manage a small amount of money before their edge disappears.
Computing the Sharpe Ratio
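A minimal sketch, assuming `ret` is a pandas Series of daily returns and a constant 3% annual risk-free rate (the rate is an assumption for illustration):

```python
import numpy as np

rf_daily = 0.03 / 252                   # constant annual risk-free rate, spread over 252 days
excess = ret - rf_daily                 # excess returns

mean_daily = ret.mean()
vol_daily = ret.std()
sharpe_daily = excess.mean() / excess.std()
sharpe_annual = sharpe_daily * np.sqrt(252)

print(f"mean (daily):    {mean_daily:.5f}")
print(f"vol (daily):     {vol_daily:.5f}")
print(f"Sharpe (daily):  {sharpe_daily:.3f}")
print(f"Sharpe (annual): {sharpe_annual:.3f}")
```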
What this gave us: the four numbers you’d put on a one-page strategy summary — average return, volatility, daily Sharpe (which always looks tiny), and the annualised Sharpe that everyone actually quotes.
A few real-world details that matter:
- Use excess returns, not raw returns, when computing the standard deviation in the denominator. (Excess return = your return minus the risk-free rate.) In practice, the difference is tiny at daily frequencies because \(r_f\) is small and nearly constant, but the correct definition uses the std of the excess series.
- Be honest about \(r_f\). A 5-year sample spanning 2020–2024 saw the U.S. risk-free rate vary from near \(0\%\) to over \(5\%\). Using a single average is fine for a rough number; using a time-varying \(r_f\) from the FRED 3-month T-bill series (FRED is the free macro database run by the St. Louis Fed; T-bill = short-term Treasury bill, the canonical “safe asset”) is what a real performance-attribution system does.
- The Sharpe ratio is itself a statistic with sampling error — meaning that if you re-ran history with different luck, you’d get a different Sharpe even from the same strategy. The standard error (a measure of how much that “luck wobble” affects the number) of an annualized Sharpe ratio computed from \(T\) daily observations is approximately \(\sqrt{(1 + 0.5 \cdot \text{SR}_{\text{daily}}^2)/T} \cdot \sqrt{252}\), where \(\text{SR}_{\text{daily}}\) is the Sharpe at the daily frequency. For a five-year sample (\(T = 1260\)) and a true annualized Sharpe of \(0.5\), that standard error is around \(0.45\) (see the sketch below). Two strategies with reported Sharpes of \(0.6\) and \(0.9\) are not reliably distinguishable on a five-year sample. Most casual comparisons of Sharpe ratios ignore this.
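A sketch of that standard-error formula under the i.i.d. approximation (the function name is illustrative):

```python
import numpy as np

def sharpe_standard_error(sr_annual: float, n_days: int) -> float:
    """Approximate standard error of an annualized Sharpe estimated from n_days daily returns."""
    sr_daily = sr_annual / np.sqrt(252)
    se_daily = np.sqrt((1 + 0.5 * sr_daily**2) / n_days)   # i.i.d. approximation
    return se_daily * np.sqrt(252)

print(round(sharpe_standard_error(0.5, 252 * 5), 2))        # ~0.45 on a five-year sample
```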
Why the Sharpe Ratio Alone Is Not Enough
The Sharpe ratio compresses a return distribution into two moments — mean and volatility (in statistics, “moments” are summary numbers like mean, variance, skewness, kurtosis that describe the shape of a distribution) — and discards everything else. This is a feature when those two moments are sufficient (i.e. when returns are approximately normally distributed, the classic bell-curve shape). It is a bug when they are not, which is essentially always in finance.
Tails. Real return distributions have fatter tails than the normal distribution — meaning extreme events (big crashes, big rallies) happen much more often than a bell curve would predict. A strategy that earns small positive returns most days and occasionally suffers a catastrophic loss — selling out-of-the-money options (insurance-like contracts that pay zero most of the time but can lose huge amounts in a crash) is the canonical example — can have a beautiful Sharpe ratio for years and blow up in a single afternoon. The 1998 collapse of Long-Term Capital Management, a hedge fund run by Nobel laureates that lost $4.6 billion in months, followed exactly this profile. Volatility does not see tail risk, so Sharpe does not see tail risk.
Skewness. Two strategies with the same mean and same volatility can have very different skewness — the lopsidedness of the return distribution. Insurance-like strategies have negative skew (many small wins, occasional large losses — think of selling earthquake insurance: you collect premiums until the big one hits). Lottery-like strategies have positive skew (many small losses, occasional large wins). Investors strongly prefer positive skew, all else equal, but Sharpe is blind to skew.
Drawdown blindness. Sharpe says nothing about how losses are clustered in time. A strategy with steady drip-drip losses concentrated in a six-month window has the same Sharpe as one whose losses are scattered uniformly through the sample — but the first one terrifies investors. This is precisely the case we examined in the previous section.
This is why the modern performance report displays Sharpe alongside drawdown, alongside skewness and kurtosis (kurtosis = “how fat are the tails of the distribution” — higher kurtosis means more outliers than a bell curve), and increasingly alongside a tail-risk measure such as conditional value-at-risk (CVaR — the average loss on your worst-5% days, a more honest summary of crash risk than vol alone). Sharpe is the entry ticket. It is not the full performance picture.
The six portfolios are simulated tracks calibrated to occupy different corners of the risk plane. The “High-Sharpe trap” is the cautionary tale: its day-to-day volatility is low, so the Sharpe ratio is flattering, yet a single concentrated stress event pulls its drawdown deep into the red. A practitioner who screens funds on Sharpe alone would have ranked it near the top; an allocator who also checked the drawdown column would have rejected it on sight. Reading risk in two dimensions — Sharpe and drawdown — is the minimum defensible standard.
A high Sharpe ratio is necessary but not sufficient evidence of a good strategy. Always pair Sharpe with maximum drawdown and a glance at the return distribution shape (skewness, kurtosis, worst-day return) before drawing conclusions.
Cross-Asset Correlation
Why this matters: “diversification reduces risk” is the most repeated cliché in personal finance. But how much it reduces risk depends entirely on a single number: the correlation between the things you bought. If you own AAPL and MSFT, you’re not very diversified — they march together. Correlation tells you, precisely, what counts as diversified.
Correlation is a number between −1 and +1 that measures how synchronously two assets move. +1 means “they march in lockstep”. 0 means “they have nothing to do with each other”. −1 means “when one goes up, the other goes down by a proportional amount”. Diversification benefits come from correlations below +1.
From One Asset to Two
Every statistic so far has been a property of a single return series. Once we hold two assets, a new question becomes the dominant one: how do they move together? Correlation is the answer.
Recall from Chapter 3 (and from any prior statistics course) that the Pearson correlation between two return series \(\{r^{(1)}_t\}\) and \(\{r^{(2)}_t\}\) is:
\[ \rho_{1,2} = \frac{\text{Cov}(r^{(1)}, r^{(2)})}{\sigma_1 \, \sigma_2} = \frac{\sum_{t=1}^T (r^{(1)}_t - \bar{r}^{(1)})(r^{(2)}_t - \bar{r}^{(2)})}{\sqrt{\sum_{t=1}^T (r^{(1)}_t - \bar{r}^{(1)})^2 \cdot \sum_{t=1}^T (r^{(2)}_t - \bar{r}^{(2)})^2}}. \]
The metric lives in \([-1, +1]\). A value near \(+1\) says the two assets march in lock-step; near \(-1\) says they move oppositely; near \(0\) says they are uncorrelated, at least in the linear sense. The pairwise correlations across \(N\) assets form an \(N \times N\) symmetric matrix with ones on the diagonal — the correlation matrix, which is the input every portfolio optimizer ever written consumes.
Typical Cross-Asset Correlations
Some long-run correlations from U.S. data are worth committing to memory because they organize how you think about diversification:
- SPY and large-cap U.S. stocks (e.g. AAPL, MSFT): \(\rho \approx 0.6\) to \(0.8\). Large-caps inherit most of their daily variation from the market.
- SPY and small-cap U.S. stocks (IWM): \(\rho \approx 0.85\) to \(0.95\) — closer than people expect.
- SPY and developed international equities (EFA): \(\rho \approx 0.7\) to \(0.9\).
- SPY and U.S. aggregate bonds (AGG): \(\rho \approx -0.1\) to \(+0.3\), highly regime-dependent. The post-2000 stock-bond correlation was generally negative; the post-2022 regime flipped it back to positive.
- SPY and gold: \(\rho \approx -0.1\) to \(+0.2\), weakly negative on average.
- SPY and Bitcoin: \(\rho \approx 0.2\) to \(0.5\) since 2020, despite the marketing.
- Two random S&P 500 stocks in the same sector: \(\rho \approx 0.5\) to \(0.7\).
- Two random S&P 500 stocks in different sectors: \(\rho \approx 0.3\) to \(0.5\).
The stock-bond correlation deserves its own warning: it is not a constant. Investors who built 60/40 portfolios in 1995–2020 on the assumption of \(\rho \approx -0.3\) discovered in 2022 that the correlation can flip to \(+0.5\) in a regime shift, and the portfolio’s diversification benefit largely evaporates in that regime.
Correlation Matrix in Pandas
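A minimal sketch, assuming `ret` is a DataFrame of daily returns with one column per asset:

```python
corr = ret.corr()            # N x N Pearson correlation matrix, ones on the diagonal
print(corr.round(2))

# At larger scale a heatmap is easier to read, e.g.:
# import seaborn as sns; sns.heatmap(corr, vmin=-1, vmax=1, cmap="RdBu_r", annot=True)
```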
The diagonal is identically one. The off-diagonal entries are the pairwise correlations. For visual analysis at larger scale, you would pass the matrix to a heatmap (e.g. seaborn.heatmap); the matrix itself is the object every risk model consumes.
Rolling Correlation
A single correlation number computed over the entire sample masks an important fact: correlations move. A rolling correlation computed over a 60-day window reveals when two assets are coupling and when they are decoupling. This is the diagnostic that flagged the 2022 stock-bond correlation flip months before consensus caught up.
The window length is a modeling choice: 20 days is reactive but noisy, 252 days (one year) is stable but slow to detect regime change. Sixty days is a common compromise. In any case, plotting the rolling correlation is usually more informative than reporting the full-sample number.
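A sketch of the rolling computation, assuming a return DataFrame `ret` with SPY and AGG columns as in the worked example later in the chapter:

```python
# 60-day rolling Pearson correlation between the two return series
roll_corr = ret["SPY"].rolling(60).corr(ret["AGG"])
roll_corr.plot(title="60-day rolling SPY/AGG correlation")   # requires matplotlib
```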
A correlation of zero does not mean two assets are independent. It means there is no linear association. Two assets can have \(\rho = 0\) and still co-move strongly via a quadratic relationship (e.g. both move sharply when a third variable moves, regardless of direction). In finance, the most important non-linear effect is the tail correlation jump: pairs of assets that look uncorrelated in normal markets often become highly correlated during crises. The 2008 financial crisis is the canonical case — virtually every risky asset class became highly correlated as liquidity dried up.
Diversification: The Two-Asset Portfolio Variance
Why this matters: when a robo-advisor builds a “balanced portfolio” for you, this formula is doing the work behind the scenes. It is also the single insight Harry Markowitz won a Nobel Prize for in 1990 — so it’s worth understanding rather than just trusting the app.
Suppose you hold two assets. The portfolio’s return is just a weighted average of the two returns (linear, intuitive). But the portfolio’s risk is not — it has an extra term that depends on how the two assets move together. When they’re less than perfectly correlated, that term makes the combined risk smaller than the average of the two individual risks. That gap is the “free lunch” of diversification.
Why Correlation Pays the Bills
Now the punchline. Correlation is not just a descriptive statistic — it directly determines how much risk reduction you get from holding more than one asset. This is the mathematical core of diversification, and it was the insight for which Harry Markowitz shared the 1990 Nobel Prize.
Consider a portfolio of two assets with weights \(w_1\) and \(w_2 = 1 - w_1\). A weight is just the fraction of your money in each asset — if you put 60% in stocks and 40% in bonds, then \(w_1 = 0.6\) and \(w_2 = 0.4\). They have to add up to 1 because you can’t allocate more than 100% of what you have (without borrowing). Let the assets have expected returns \(\mu_1, \mu_2\) (Greek letter “mu” — the standard symbol for “mean”), volatilities \(\sigma_1, \sigma_2\), and correlation \(\rho\) (Greek letter “rho”). Portfolio return is the weighted average:
\[ r_p = w_1 r_1 + w_2 r_2, \qquad \mathbb{E}[r_p] = w_1 \mu_1 + w_2 \mu_2. \]
Portfolio variance, by contrast, is not a weighted average. It includes a cross-term:
\[ \sigma_p^2 = w_1^2 \sigma_1^2 + w_2^2 \sigma_2^2 + 2 w_1 w_2 \rho \sigma_1 \sigma_2. \]
The volatility is the square root:
\[ \sigma_p = \sqrt{w_1^2 \sigma_1^2 + w_2^2 \sigma_2^2 + 2 w_1 w_2 \rho \sigma_1 \sigma_2}. \]
The entire diversification story is in the third term, \(2 w_1 w_2 \rho \sigma_1 \sigma_2\).
Three Special Cases
Case 1: \(\rho = +1\) (perfect positive correlation). The variance formula collapses to \(\sigma_p^2 = (w_1 \sigma_1 + w_2 \sigma_2)^2\), so \(\sigma_p = w_1 \sigma_1 + w_2 \sigma_2\). Volatility is a weighted average; there is no diversification benefit at all. The two assets are economically the same risk in different packaging.
Case 2: \(\rho = 0\) (uncorrelated). The cross-term vanishes. Variance is the weighted sum of variances: \(\sigma_p^2 = w_1^2 \sigma_1^2 + w_2^2 \sigma_2^2\). For an equal-weighted portfolio with \(\sigma_1 = \sigma_2 = \sigma\), this gives \(\sigma_p = \sigma / \sqrt{2}\) — volatility drops by roughly \(30\%\) just from holding two equal-volatility uncorrelated assets instead of one.
Case 3: \(\rho = -1\) (perfect negative correlation). With the right weights, \(\sigma_p\) can be driven all the way to zero. Specifically, setting \(w_1 = \sigma_2 / (\sigma_1 + \sigma_2)\) produces \(\sigma_p = 0\) — a risk-free portfolio out of two risky assets. This is the theoretical limit. In practice, no two real assets have \(\rho = -1\), but pairs with \(\rho\) near \(-0.5\) exist (long-duration Treasuries vs. equities in some regimes) and produce meaningful risk reduction.
Why Less-Than-Perfect Correlation Is Free Money
This is the punchline of modern portfolio theory: as long as \(\rho < 1\), combining two risky assets always produces a portfolio whose volatility is strictly less than the weighted average of the two individual volatilities. The reduction is proportional to \((1 - \rho)\). Diversification is a free lunch in volatility terms — the only free lunch most academics will admit exists in finance.
The intuition: when one asset zigs and the other zags, their fluctuations partially cancel, leaving a smoother combined path. The deeper the cancellation (more negative \(\rho\)), the smoother the path. Even small reductions in \(\rho\) produce measurable risk reduction; this is why a portfolio of \(50\) stocks from different sectors has dramatically lower volatility than a single stock, even though each individual stock might have \(\rho \approx 0.5\) with each other.
A Numerical Illustration
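A sketch of the two-asset variance formula at a 50/50 weight, assuming two assets with \(18\%\) volatility each (the same parameters as Exercise 4):

```python
import numpy as np

sigma1 = sigma2 = 0.18                        # equal volatilities (illustrative assumption)
w1 = w2 = 0.5                                 # equal weights
naive_avg = w1 * sigma1 + w2 * sigma2         # the "no diversification" weighted average

for rho in [1.0, 0.5, 0.0, -0.5, -1.0]:
    var_p = w1**2 * sigma1**2 + w2**2 * sigma2**2 + 2 * w1 * w2 * rho * sigma1 * sigma2
    sigma_p = np.sqrt(var_p)
    print(f"rho = {rho:+.1f}  portfolio vol = {sigma_p:.3f}  reduction = {1 - sigma_p / naive_avg:.1%}")
```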
The pattern: at \(\rho = +1\) there is zero reduction; at \(\rho = 0\) the portfolio volatility falls by roughly \(29.3\%\) relative to the weighted average; at \(\rho = -0.5\) it falls by about \(50\%\).
Each curve plots portfolio volatility as a function of the weight in the first asset, for a different correlation. At \(\rho = +1\) the curve coincides with the naive weighted-average dotted line — there is no diversification at all. As \(\rho\) falls, the curve sags below that line, and the gap is the diversification benefit. At \(\rho = -1\) the curve touches zero at the 50/50 point, which is the mathematical limit of two-asset diversification. Real cross-asset pairs sit in the \(\rho \in [-0.3, 0.5]\) band, well inside this picture.
Beyond Two Assets
For \(N\) assets, the portfolio variance formula generalizes to:
\[ \sigma_p^2 = \sum_{i=1}^N \sum_{j=1}^N w_i w_j \rho_{ij} \sigma_i \sigma_j = \mathbf{w}^\top \Sigma \mathbf{w}, \]
where \(\Sigma\) is the \(N \times N\) covariance matrix. The structure is identical — it is just a quadratic form. As \(N\) grows, the average pairwise correlation \(\bar{\rho}\) becomes the dominant driver of portfolio volatility; the individual variances matter less. This is why, when an equity portfolio grows to \(30\) or \(50\) stocks, additional names produce diminishing risk reduction: you have already absorbed most of the diversification you can get given the average pairwise correlation in the asset class.
Volatility Drag and the CAGR Approximation
Where you’ll see this: “this strategy averaged 15% per year!” — sounds great, except average can mean two different things, and which one the marketer uses changes the actual wealth you end up with by a lot. This section explains the silent gap that swallows real investor money.
If a stock goes -50% one year and +50% the next, the arithmetic average is 0%. But you actually ended at $0.75 from each $1, i.e. a loss of 25%. The arithmetic mean is misleading whenever the asset is volatile, and the size of the lie grows with the square of the volatility. This invisible tax is called volatility drag.
The Arithmetic-Geometric Gap
There is one more piece of vocabulary every investor needs: the gap between arithmetic mean return (the everyday “add them up and divide by N” average) and geometric (compound) mean return (the per-period rate that actually generates the observed final wealth). They are not the same number, and the difference — the volatility drag — grows with volatility.
The arithmetic mean of a return series is what we have been computing all along:
\[ \bar{r}_{\text{arith}} = \frac{1}{T} \sum_{t=1}^T r_t. \]
The geometric mean is what an investor actually earned per period, accounting for compounding:
\[ \bar{r}_{\text{geom}} = \left( \prod_{t=1}^T (1 + r_t) \right)^{1/T} - 1. \]
The geometric mean is what is sometimes called the CAGR (Compound Annual Growth Rate) when computed at the annual frequency — it’s the “if I plug a single constant growth rate into a compound interest formula, what rate would reproduce the actual final wealth?” number.
The Volatility Drag Approximation
For “small” returns (which daily and even monthly returns are), there is a beautiful approximation that connects the two:
\[ \bar{r}_{\text{geom}} \approx \bar{r}_{\text{arith}} - \frac{\sigma^2}{2}. \]
The correction term \(\sigma^2/2\) is the volatility drag. It is non-negative — volatility always pulls the realized geometric return below the arithmetic mean. The intuition is elementary: a \(-50\%\) return followed by a \(+50\%\) return averages to \(0\%\) arithmetically, but leaves you at \(75\%\) of your starting capital (because the second year’s +50% applies to your reduced $0.50, not your original $1). The volatility forced a permanent loss of capital that the arithmetic mean cannot see.
The derivation comes from a second-order Taylor expansion of \(\log(1 + r)\) around \(r = 0\):
\[ \log(1 + r) \approx r - \frac{r^2}{2}. \]
Taking expectations on both sides and recognizing that the geometric mean of \((1 + r_t)\) is \(\exp(\mathbb{E}[\log(1 + r_t)])\):
\[ \mathbb{E}[\log(1 + r)] \approx \mu - \frac{\mathbb{E}[r^2]}{2} = \mu - \frac{\mu^2 + \sigma^2}{2} \approx \mu - \frac{\sigma^2}{2}, \]
where the last step drops \(\mu^2/2\) as a higher-order term when \(\mu\) is small.
Why This Matters in Practice
For an equity portfolio with annualized arithmetic mean \(10\%\) and annualized volatility \(20\%\), the volatility drag is:
\[ \frac{\sigma^2}{2} = \frac{0.20^2}{2} = 0.02 = 2\%. \]
That is the gap between the headline arithmetic mean (\(10\%\)) and the CAGR the investor actually compounds at (\(\approx 8\%\)). Two percent per year, compounded over a 40-year career, is the difference between \(1.10^{40} \approx 45.3\times\) and \(1.08^{40} \approx 21.7\times\) — roughly half the terminal wealth. Volatility drag is not a footnote. It is one of the largest line items in a long-horizon investor’s lifetime P&L.
Two operational consequences follow:
Always quote CAGR, not arithmetic mean, when reporting long-horizon returns. A fund that advertises “12% average annual return” while running a 30% volatility is misleading its investors. The actual compound rate is closer to \(12\% - 0.30^2/2 = 7.5\%\).
Risk reduction has direct return consequences. Cutting volatility from \(30\%\) to \(20\%\) (via diversification) recovers \(0.30^2/2 - 0.20^2/2 = 2.5\%\) of geometric return per year, even if the arithmetic mean is unchanged. This is the most overlooked argument for diversification: it does not just reduce risk, it raises long-run compound growth.
\(\text{CAGR} \approx \text{Arithmetic mean} - \frac{1}{2} \sigma^2\).
For most equity portfolios you encounter, \(\sigma^2/2\) is in the range of \(1\%\) to \(3\%\) per year. That is the volatility tax you pay every year for taking on risk.
Computing CAGR and Drag
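A minimal sketch, assuming `ret` is a pandas Series of daily returns:

```python
import numpy as np

n_years = len(ret) / 252

arith = ret.mean() * 252                            # annualized arithmetic mean
vol = ret.std() * np.sqrt(252)                      # annualized volatility
drag = vol**2 / 2                                   # volatility drag, sigma^2 / 2
cagr_approx = arith - drag                          # the rule-of-thumb approximation
cagr_exact = (1 + ret).prod() ** (1 / n_years) - 1  # the rate that reproduces final wealth

for name, x in [("arith mean", arith), ("vol", vol), ("drag", drag),
                ("approx CAGR", cagr_approx), ("exact CAGR", cagr_exact)]:
    print(f"{name:12s} {x:.4f}")
```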
What this gave us: five numbers that, side by side, show the volatility-drag formula working in practice: arithmetic mean minus drag really is approximately CAGR, validating the rule of thumb.
The “approx” line and the exact CAGR should agree to within a few basis points. When they disagree by more than that, the issue is usually a long sample over which the approximation \(\mu^2/2 \approx 0\) breaks down.
Worked Example: A 60/40 SPY/AGG Portfolio
Why this matters: if you’ve ever heard the phrase “balanced portfolio” — in a personal-finance article, in an MBA class, from your parents’ financial advisor — they’re almost always talking about something close to 60/40. It is the default retirement portfolio for hundreds of millions of people, and this final worked example shows you every diagnostic you’d run on it before recommending it.
We now assemble every tool in this chapter into a single, end-to-end analysis of a classic balanced portfolio: \(60\%\) U.S. equities (proxied by SPY) and \(40\%\) U.S. investment-grade bonds — bonds rated as low-default-risk by credit agencies — proxied by AGG (an ETF tracking the broad US bond market). The \(60/40\) allocation has been the default mix for U.S. pensions and individual retirement accounts for forty years. Understanding its risk-return profile is foundational.
For reproducibility in the browser, we simulate plausible daily return paths for SPY and AGG with statistical properties calibrated to long-run historical data: SPY at roughly \(10\%\) annual mean return and \(16\%\) annual volatility, AGG at \(4\%\) mean and \(5\%\) volatility, with a mildly negative correlation of \(-0.2\).
Step 1: Generate the Return Data
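A minimal sketch of that simulation, using the parameters stated above; the seed, the ten-year sample length, and the start date are assumptions made here for reproducibility:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n_days = 252 * 10                                   # ten years of daily data

# Annualized parameters from the text, converted to daily
mu = np.array([0.10, 0.04]) / 252                   # SPY, AGG mean
vol = np.array([0.16, 0.05]) / np.sqrt(252)         # SPY, AGG volatility
rho = -0.2
cov = np.array([[vol[0]**2,             rho * vol[0] * vol[1]],
                [rho * vol[0] * vol[1], vol[1]**2]])

dates = pd.bdate_range("2015-01-01", periods=n_days)
ret = pd.DataFrame(rng.multivariate_normal(mu, cov, size=n_days),
                   index=dates, columns=["SPY", "AGG"])
```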
Step 2: Build the 60/40 Portfolio
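Continuing from Step 1, a sketch consistent with the description below:

```python
weights = np.array([0.6, 0.4])        # 60% SPY, 40% AGG

# Row-wise dot product with the weight vector: 0.6 * SPY return + 0.4 * AGG return each day
ret["60/40"] = ret @ weights
```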
What this gave us: a new column "60/40" whose value each day is just 0.6 × SPY return + 0.4 × AGG return. The @ symbol in ret @ weights is Python’s matrix-multiplication operator — here it’s just a slick way to compute the weighted sum without a loop.
The portfolio return on day \(t\) is \(r_{p,t} = 0.6 \cdot r_{\text{SPY},t} + 0.4 \cdot r_{\text{AGG},t}\). That is the entire portfolio-construction step; the daily rebalancing assumption (rebalancing = adjusting the holdings back to the target 60/40 split after they drift) is the only thing keeping the weights fixed at \(60/40\) over the sample. In real work, monthly or quarterly rebalancing is closer to industry practice — daily rebalancing produces nearly identical results when transaction costs are ignored.
Step 3: Volatility, Mean, and Sharpe for Each Series
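A sketch of the per-series statistics, assuming the constant 3% annual risk-free rate used throughout this example:

```python
rf_daily = 0.03 / 252

ann_mean = ret.mean() * 252
ann_vol = ret.std() * np.sqrt(252)
sharpe_ann = (ret.mean() - rf_daily) / ret.std() * np.sqrt(252)

print(pd.DataFrame({"ann_mean": ann_mean, "ann_vol": ann_vol, "Sharpe": sharpe_ann}).round(3))
```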
The expected pattern: the \(60/40\) portfolio has volatility much closer to AGG than to SPY (about \(10\%\) versus SPY’s \(16\%\) and AGG’s \(5\%\)), and its Sharpe ratio is higher than either component’s. This is the entire point of diversification — the portfolio’s risk-adjusted return exceeds that of its parts.
Step 4: Drawdowns and the Underwater Chart
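A sketch of the drawdown computation applied to all three columns at once:

```python
wealth = (1 + ret).cumprod()            # W_t: equity curve for each series
running_max = wealth.cummax()           # M_t: running peak
drawdown = wealth / running_max - 1     # D_t: underwater series, <= 0

print(drawdown.min().round(3))          # maximum drawdown per series

# Underwater chart, one panel per series (requires matplotlib):
# drawdown.plot(subplots=True, title="Drawdown from running peak")
```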
Expect the \(60/40\) portfolio’s maximum drawdown to be deeper than AGG’s but substantially shallower than SPY’s — typically around half of SPY’s. That is the risk-reduction benefit of the bond sleeve, made visible.
Step 5: Volatility Drag and CAGR
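A sketch comparing the drag approximation with the exact CAGR for each column:

```python
n_years = len(ret) / 252

arith = ret.mean() * 252                            # annualized arithmetic mean
vol = ret.std() * np.sqrt(252)
drag = vol**2 / 2                                   # volatility drag, sigma^2 / 2
cagr = (1 + ret).prod() ** (1 / n_years) - 1        # exact compound annual growth rate

print(pd.DataFrame({"arith": arith, "drag": drag,
                    "arith - drag": arith - drag, "CAGR": cagr}).round(4))
```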
The volatility drag for SPY (around \(1.3\%\) annually) is roughly ten times that of AGG (about \(0.13\%\)). The \(60/40\) portfolio’s drag sits in between but is much closer to the lower end because portfolio volatility is non-linear in the weights — diversification disproportionately cuts the drag.
Step 6: Putting It All Together
The full picture for this simulated sample:
| Series | Annualized mean | Annualized vol | Sharpe | Max DD | CAGR |
|---|---|---|---|---|---|
| SPY | ~10% | ~16% | ~0.44 | ~-25% to -35% | ~8.7% |
| AGG | ~4% | ~5% | ~0.20 | ~-7% to -10% | ~3.9% |
| 60/40 | ~7.6% | ~10% | ~0.46 | ~-13% to -18% | ~7.1% |
Three things to notice:
- The \(60/40\) Sharpe ratio is higher than either SPY or AGG individually. This is diversification at work — the portfolio has earned more return per unit of risk than any single component could.
- The \(60/40\) maximum drawdown is roughly half of SPY’s. The bond sleeve cushions equity crashes (assuming the negative or near-zero stock-bond correlation regime holds).
- The \(60/40\) CAGR is closer to SPY than to AGG, but the path it took to get there was much smoother. For an investor with finite tolerance for drawdown — i.e. every investor — that smoother path is the entire reason to hold AGG at all.
Diversification raises Sharpe, reduces drawdown, and improves compounding — simultaneously. It is the closest thing to a free lunch in finance, and the entire portfolio-management industry exists to exploit it. The only price is that the diversifying asset (here, AGG) must have \(\rho < 1\) with the core asset. When that correlation flips — as it did in 2022 — the diversification benefit shrinks dramatically, and 60/40 portfolios deliver worse drawdowns than expected. Always monitor the rolling correlation.
Exercises
Exercise 1 — Annualizing volatility correctly
A research analyst sends you the following statistics for an emerging-markets equity fund:
- Mean return: \(0.08\%\) per trading day
- Standard deviation of returns: \(1.4\%\) per trading day
A second analyst sends you the equivalent statistics from a monthly dataset for the same fund:
- Mean return: \(1.7\%\) per month
- Standard deviation of returns: \(6.2\%\) per month
Compute the annualized mean and annualized volatility from each dataset (using \(252\) trading days and \(12\) months per year). They should approximately agree. Do they? If not, what is the most likely explanation? Hint: think about the assumptions underlying \(\sqrt{T}\) scaling.
Exercise 2 — Building a drawdown function
Write a Python function drawdown_stats(returns) that takes a pandas Series of daily returns and returns a dictionary containing:
- The maximum drawdown (a negative number).
- The date (or integer index) at which the maximum drawdown was reached.
- The peak date (or index) that preceded the maximum drawdown — i.e. where the wealth was highest before the worst loss.
- The recovery date — the first date after the trough at which wealth returned to the previous peak. If the recovery has not occurred by the end of the sample, return `None` for this field.
Test your function on the simulated SPY series from the 60/40 worked example.
Exercise 3 — Sharpe under a time-varying risk-free rate
In the worked example we assumed a constant \(3\%\) annual risk-free rate. In reality, the U.S. 3-month T-bill rate moved from approximately \(0.05\%\) in 2021 to over \(5.3\%\) in 2023. Simulate a daily risk-free rate path that linearly rises from \(0.0001\) (about \(2.5\%\) annual) to \(0.0002\) (about \(5\%\) annual) over a 5-year sample, and recompute the Sharpe ratio of a fixed return series under (a) the average \(r_f\) and (b) the time-varying \(r_f\). How much does the choice matter? Under what circumstances would it matter more?
Exercise 4 — Diversification under regime change
Consider two assets with annualized volatilities \(\sigma_1 = 0.18\) and \(\sigma_2 = 0.18\), equal weights \(w_1 = w_2 = 0.5\), and correlation \(\rho\). Compute portfolio volatility for \(\rho \in \{-0.5, 0, 0.3, 0.6, 0.9, 1.0\}\). Plot portfolio volatility against \(\rho\). By what percentage does portfolio volatility increase when \(\rho\) rises from \(0\) to \(0.6\) (a typical “all assets going down together” regime)? Discuss the implication for a fund manager whose stress tests assume \(\rho = 0\).
Exercise 5 — Volatility drag over a long horizon
A portfolio has arithmetic annual mean \(12\%\) and annual volatility \(\sigma\). The investor holds it for \(30\) years.
Compute the terminal wealth from \(\$1\) starting capital using the CAGR approximation \(\bar{r}_{\text{geom}} \approx \bar{r}_{\text{arith}} - \sigma^2/2\), for \(\sigma \in \{0.10, 0.20, 0.30, 0.40\}\).
The investor’s financial advisor is using the arithmetic mean of \(12\%\) to project terminal wealth (i.e. ignoring volatility drag entirely). For each \(\sigma\), compute the projection error — the ratio of the advisor’s predicted wealth to the true expected compound wealth.
At what level of \(\sigma\) does the advisor’s projection overstate terminal wealth by a factor of \(2\) or more? Comment on the implications for retirement planning under high-volatility strategies.
You now have the four core diagnostics — volatility, drawdown, Sharpe, correlation — and the two derived ideas — diversification benefit and volatility drag — that govern essentially every conversation in performance evaluation, asset allocation, and risk management. Chapter 5 lifts these tools from a single portfolio to cross-sectional questions: how do we compare hundreds of assets at once? How do we screen, rank, and build portfolios systematically? The statistics of this chapter become the inputs to the optimization machinery of the next.