# Knowledge Base/Finance/Commentary on Mathematical Finance Textbooks

Whenever you're called on to make up your mind,
and you're hampered by not having any,
the best way to solve the dilemma, you'll find,
is simply by spinning a penny.
No — not so that chance shall decide the affair
while you're passively standing there moping;
but the moment the penny is up in the air,
you suddenly know what you're hoping.

A grook by Piet Hein (1905-1996)

## Measure, Integral and Probability (second edition) by Marek Capiński and Ekkehard Kopp

### Page 8, Example 1.3

We are "using the fact that f is increasing" in that we notice that

$M_i = a_i = \frac{i}{n}$ in $U(P_n, f)$
$m_i = a_{i-1} = \frac{i-1}{n}$ in $L(P_n, f)$

The algebra is easy enough to check:

$U(P_n, f) = \sum_{i=1}^n (\frac{i}{n}) \{(\frac{i}{n})^2 - (\frac{i-1}{n})^2\}$
$= \sum_{i=1}^n (\frac{i}{n}) \{\frac{i^2}{n^2} - \frac{i^2 - 2i + 1}{n^2}\}$
$= \sum_{i=1}^n (\frac{2i^2 - i}{n^3})$
$= \frac{1}{n^3} \sum_{i=1}^n (2i^2 - i)$

Similarly:

$L(P_n, f) = \sum_{i=1}^n (\frac{i-1}{n}) \{(\frac{i}{n})^2 - (\frac{i-1}{n})^2\}$
$= \sum_{i=1}^n (\frac{i-1}{n}) \{\frac{i^2 - i^2 + 2i - 1}{n^2}\}$
$= \sum_{i=1}^n \{ \frac{(i - 1)(2i - 1)}{n^3} \}$
$= \sum_{i=1}^n \{ \frac{2i^2 - i - 2i + 1}{n^3} \}$
$= \frac{1}{n^3} \sum_{i=1}^n (2i^2 - 3i + 1)$

The book claims that "the integral must be $\frac{2}{3}$, since both $U(P_n, f)$ and $L(P_n, f)$ converge to this value, as is easily seen".

This is easily seen if and only if one remembers the following elementary results:

$\sum_{i=1}^n 1 = n$

which is obvious as we are adding n 1's;

$\sum_{i=1}^n i = \frac{1}{2} n (n + 1)$

which is the sum of an arithmetic series;

$\sum_{i=1}^n i^2 = \frac{1}{6} n (n + 1) (2n + 1)$

which is the $n$th square pyramidal number. These results are certainly worth remembering!

So

$\frac{1}{n^3} \sum_{i=1}^n (2i^2 - i)$
$= \frac{1}{n^3} \{ 2 \sum_{i=1}^n i^2 - \sum_{i=1}^n i \}$
$= \frac{1}{n^3} \{ 2 \cdot \frac{2n^3 + 3n^2 + n}{6} - \frac{n^2 + n}{2} \}$
$= \frac{2n^3 + 3n^2 + n}{3n^3} - \frac{n^2 + n}{2n^3}$; now multiply the numerator and denominator of each fraction by $n^{-3}$:
$= \frac{2 + 3n^{-1} + n^{-2}}{3} - \frac{n^{-1} + n^{-2}}{2} \rightarrow \frac{2}{3}$ as $n \rightarrow \infty$

because $n^{-1}, n^{-2} \rightarrow 0$ as $n \rightarrow \infty$.

Similarly,

$\frac{1}{n^3} \sum_{i=1}^n (2i^2 - 3i + 1)$
$= \frac{1}{n^3} \{ 2 \sum_{i=1}^n i^2 - 3 \sum_{i=1}^n i + \sum_{i=1}^n 1 \}$
$= \frac{1}{n^3} \{ 2 \cdot \frac{2n^3 + 3n^2 + n}{6} - 3 \cdot \frac{n^2 + n}{2} + n \}$
$= \frac{2n^3 + 3n^2 + n}{3n^3} - \frac{3n^2 + 3n}{2n^3} + \frac{1}{n^2}$; now multiply the numerator and denominator of each fraction by $n^{-3}$:

$= \frac{2 + 3n^{-1} + n^{-2}}{3} - \frac{3n^{-1} + 3n^{-2}}{2} + n^{-2} \rightarrow \frac{2}{3}$ as $n \rightarrow \infty$

because $n^{-1}, n^{-2} \rightarrow 0$ as $n \rightarrow \infty$.
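Both limits are easy to confirm numerically. Here is a minimal Python sketch (ours, not the book's) that evaluates the two sums above for increasing n:

```python
# Evaluate the upper and lower Riemann sums via the formulas derived above:
# U(P_n, f) = (1/n^3) * sum_{i=1}^n (2i^2 - i)
# L(P_n, f) = (1/n^3) * sum_{i=1}^n (2i^2 - 3i + 1)
def upper_sum(n):
    return sum(2 * i * i - i for i in range(1, n + 1)) / n ** 3

def lower_sum(n):
    return sum(2 * i * i - 3 * i + 1 for i in range(1, n + 1)) / n ** 3

for n in (10, 100, 1000):
    print(n, lower_sum(n), upper_sum(n))  # both approach 2/3
```

The lower sum stays below $\frac{2}{3}$ and the upper sum above it, squeezing the integral between them.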

### Page 9, "(which is of course automatically bounded)"

Continuous functions on the closed interval [a,b] are bounded. Despite the "of course", and although the result is somewhat intuitive, it is not obvious.

There are several ways to prove this result.

We could start with the Heine-Borel theorem: whichever way you write [a,b] as a union of a collection of open sets, you can also write it as a union of a finite subcollection of that collection of open sets. Consider the sets $U_x = \{x': x' \in [a, b], |f(x') - f(x)| < 1\}$ for each $x \in [a, b]$. The sets $U_x$ are open in [a,b] (because f is continuous) and their union is all of [a,b]. By the Heine-Borel theorem, there are $x_1, \ldots, x_n \in [a, b]$ such that the union of the sets $U_{x_i}$ is also all of [a,b]. But then $|f|$ is bounded by $1 + \max_i |f(x_i)|$.

Another proof starts with the Bolzano-Weierstrass theorem: every sequence in [a,b] (in fact, every bounded sequence) has a convergent subsequence. Assume for a contradiction that a continuous function f from [a,b] to $\mathbb{R}$ is unbounded. Then there is a sequence $(x_n)$ in [a,b] (hence bounded) such that $|f(x_n)| > n$ for every n. Pick a convergent subsequence from this sequence. Say it converges to some $x \in [a, b]$. We can then check that f cannot be continuous at x. Contradiction.

How different are these proofs? Both the Heine-Borel theorem and Bolzano-Weierstrass theorem state, in somewhat different ways, that [a,b] is compact. (Indeed, a metrisable space is compact if and only if it is sequentially compact, which makes the two theorems essentially the same for $\mathbb{R}^n$.)

In http://www.dpmms.cam.ac.uk/~wtg10/bounded.html, Timothy Gowers explains "How to find an unusual proof that continuous functions on the closed interval [0,1] are bounded" which is more elementary than the two proofs above. It is worth reading.

### Page 18, "Step 2. The intervals $I_k^n$ are formed into a sequence"

We use the ancient trick, the "diagonal argument", which is commonly used to show that $\mathbb{Q}$ and some other sets are countable (this is mentioned on page 16). We can enumerate the $I_k^n$. The trick here is the order. It's easier to visualise it if we produce the following diagram:

$\begin{array}{cccccccccccc} I_1^1 & \rightarrow & I_2^1 & & I_3^1 & \rightarrow & I_4^1 & & I_5^1 & \rightarrow & I_6^1 & \cdots \\ & \swarrow & & \nearrow & & \swarrow & & \nearrow & & \swarrow & & \\ I_1^2 & & I_2^2 & & I_3^2 & & I_4^2 & & I_5^2 & & I_6^2 & \cdots \\ \downarrow & \nearrow & & \swarrow & & \nearrow & & \swarrow & & \nearrow & & \\ I_1^3 & & I_2^3 & & I_3^3 & & I_4^3 & & I_5^3 & & I_6^3 & \cdots \\ & \swarrow & & \nearrow & & \swarrow & & \nearrow & & \swarrow & & \\ I_1^4 & & I_2^4 & & I_3^4 & & I_4^4 & & I_5^4 & & I_6^4 & \cdots \\ \downarrow & \nearrow & & \swarrow & & \nearrow & & \swarrow & & \nearrow & & \\ I_1^5 & & I_2^5 & & I_3^5 & & I_4^5 & & I_5^5 & & I_6^5 & \cdots \\ \vdots & & \vdots & & \vdots & & \vdots & & \vdots & & \vdots \end{array}$

This is not quite the order of enumeration used by the authors but it will work just as well (you can see that there are many variations on this theme).
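To convince yourself that such an enumeration reaches every interval exactly once, it can help to generate the index pairs programmatically. The sketch below walks the anti-diagonals $k + n = d$, one of the many variations on the theme:

```python
from itertools import islice

def diagonal_pairs():
    """Yield the index pairs (k, n) of the intervals I_k^n, one anti-diagonal
    (k + n = d) at a time, so every pair appears exactly once."""
    d = 2
    while True:
        for k in range(1, d):
            yield (k, d - k)
        d += 1

print(list(islice(diagonal_pairs(), 10)))
```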

### Page 18, "We can rearrange the double sum because the components are non-negative (a fact from elementary analysis)"

This is indeed a fact from elementary analysis. We decided to comment on it just in case the reader has forgotten it. When is it the case that

$\sum_{j = 1}^{\infty} \sum_{k = 1}^{\infty} a_{jk} = \sum_{k = 1}^{\infty} \sum_{j = 1}^{\infty} a_{jk}$

i.e. when can we interchange the order of summation?

For finite sums, this is always possible:

$\sum_{j = 1}^m \sum_{k = 1}^n a_{jk} = \sum_{k = 1}^n \sum_{j = 1}^m a_{jk}$

This is a simple consequence of the associativity and commutativity of addition.

Let's consider, for example, the case when m = n = 3 (the proof generalises naturally to other values):

$\sum_{j = 1}^3 \sum_{k = 1}^3 a_{jk} = (a_{11} + a_{12} + a_{13}) + (a_{21} + a_{22} + a_{23}) + (a_{31} + a_{32} + a_{33})$

We can drop the brackets because of the associativity, which makes them irrelevant:

$\sum_{j = 1}^3 \sum_{k = 1}^3 a_{jk} = a_{11} + a_{12} + a_{13} + a_{21} + a_{22} + a_{23} + a_{31} + a_{32} + a_{33}$

And because of the commutativity we can rearrange the terms:

$\sum_{j = 1}^3 \sum_{k = 1}^3 a_{jk} = a_{11} + a_{21} + a_{31} + a_{12} + a_{22} + a_{32} + a_{13} + a_{23} + a_{33} = \sum_{k = 1}^3 \sum_{j = 1}^3 a_{jk}$

For infinite sums, we are dealing with two limit processes:

$\sum_{j = 1}^{\infty} \sum_{k = 1}^{\infty} a_{jk} = \sum_{j = 1}^{\infty} \left\{ \lim_{n \rightarrow \infty} \sum_{k = 1}^n a_{jk} \right\} = \lim_{m \rightarrow \infty} \sum_{j = 1}^m \left\{ \lim_{n \rightarrow \infty} \sum_{k = 1}^n a_{jk} \right\}$

It turns out that we can interchange the order of summation if

• The terms are all nonnegative (the sum could be infinite; if so, it is infinite both ways);
• The series is absolutely convergent, i.e. the double sum of absolute values is finite.
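As a small numerical illustration of the first condition, take the nonnegative terms $a_{jk} = 2^{-(j+k)}$ (our choice, purely for illustration); both orders of summation give the same value:

```python
# Partial double sums of a_{jk} = 2^{-(j+k)}: rows first vs columns first.
N = 30
row_first = sum(sum(2.0 ** -(j + k) for k in range(1, N + 1)) for j in range(1, N + 1))
col_first = sum(sum(2.0 ** -(j + k) for j in range(1, N + 1)) for k in range(1, N + 1))
print(row_first, col_first)  # both close to (sum_j 2^-j)^2 = 1
```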

## Financial Calculus: An Introduction to Derivative Pricing by Martin Baxter and Andrew Rennie

### Page vii, "as Dr Johnson might have put it"

The reference here is to the famous quote: "Your manuscript is both good and original, but the part that is good is not original and the part that is original is not good". According to Wikiquote this quote is misattributed as it has not been found in any of Johnson's writings or in the writings of contemporaries who quoted him.

### Page 7, "Integration and the law of the unconscious statistician then tells us..."

We are asked to compute $\mathbb{E}(S_0 \exp(X)) = S_0 \mathbb{E}(\exp(X))$. But $\mathbb{E}(\exp(X))$ is simply the expectation of a lognormal random variable $Y = \exp(X)$ and we know what it is (or we can look it up): $\mathbb{E}(Y) = \exp(\mu + \frac{1}{2}\sigma^2)$.

You may argue that looking up $\mathbb{E}(Y) = \exp(\mu + \frac{1}{2}\sigma^2)$ is cheating. Perhaps.

Let us apply the law of the unconscious statistician with $h(X) = S_0 \exp(X)$:

$\mathbb{E}(h(X)) = \int_{-\infty}^{\infty} h(x) f(x) dx$
$= \int_{-\infty}^{\infty} S_0 \exp(x) \cdot \frac{1}{\sqrt{2 \pi \sigma^2}} \cdot \exp( \frac{-(x - \mu)^2}{2\sigma^2} ) dx$
$= S_0 \int_{-\infty}^{\infty} \frac{1}{\sqrt{2 \pi \sigma^2}} \cdot \exp( \frac{2\sigma^2 x -(x - \mu)^2}{2\sigma^2} ) dx$
$= S_0 \int_{-\infty}^{\infty} \frac{1}{\sqrt{2 \pi \sigma^2}} \cdot \exp( \frac{2\sigma^2 x -x^2 + 2 \mu x - \mu^2}{2\sigma^2} ) dx$

Let us complete the square (note that the linear coefficient is $2(\mu + \sigma^2)$, since $2\sigma^2 x + 2\mu x = 2(\mu + \sigma^2)x$):

$= S_0 \int_{-\infty}^{\infty} \frac{1}{\sqrt{2 \pi \sigma^2}} \cdot \exp( \frac{-x^2 + 2 (\mu + \sigma^2) x - (\mu + \sigma^2)^2 + 2\mu\sigma^2 + \sigma^4}{2\sigma^2} ) dx$
$= S_0 \int_{-\infty}^{\infty} \frac{1}{\sqrt{2 \pi \sigma^2}} \cdot \exp( \frac{-(x - (\mu + \sigma^2))^2}{2\sigma^2} ) \exp( \frac{2\mu\sigma^2 + \sigma^4}{2\sigma^2} ) dx$
$= S_0 \exp( \mu + \frac{1}{2}\sigma^2 ) \int_{-\infty}^{\infty} \frac{1}{\sqrt{2 \pi \sigma^2}} \cdot \exp( \frac{-(x - (\mu + \sigma^2))^2}{2\sigma^2} ) dx$

Now we observe that our integral is in fact over the probability density function of $N(\mu + \sigma^2, \sigma^2)$ and therefore it evaluates to 1. We are left with

$= S_0 \exp( \mu + \frac{1}{2}\sigma^2 ) \cdot 1$
$= S_0 \exp(\mu + \frac{1}{2}\sigma^2)$

as required.
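A quick Monte Carlo check of this lognormal mean (with hypothetical parameter values of our choosing):

```python
import math
import random

random.seed(0)

# Sample mean of S0 * exp(X) with X ~ N(mu, sigma^2), compared with the
# closed form S0 * exp(mu + sigma^2 / 2) derived above.
S0, mu, sigma, n = 100.0, 0.0, 0.2, 200_000
estimate = sum(S0 * math.exp(random.gauss(mu, sigma)) for _ in range(n)) / n
exact = S0 * math.exp(mu + 0.5 * sigma ** 2)
print(estimate, exact)
```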

### Page 58, "because of the variance structure of Brownian motion"

This means the following. We have a key fact about the variance

$Var(X) = \mathbb{E}(X^2) - [\mathbb{E}(X)]^2$

We know that $W_t \sim N(0, t)$, so

$Var(W_t) = t$
$\mathbb{E}(W_t) = 0$

and from the above fact it follows that $\mathbb{E}(W_t^2) = t$.

### Page 58, "What went wrong? Consider a Taylor expansion..."

First of all, can we recognise this as a Taylor expansion? Taylor expansion is given in the Thalesian Formula Sheet as

$f(a + h) = f(a) + h f'(a) + \frac{h^2}{2!} f''(a) + \frac{h^3}{3!} f'''(a) + \ldots$

And on page 58 we are given

$df(W_t) = f'(W_t) dW_t + \frac{1}{2} f''(W_t) (dW_t)^2 + \frac{1}{3!} f'''(W_t) (dW_t)^3 + \ldots$

Is the second equation a special case of the first?

Well, perhaps there is some notational confusion. First of all, what is $df(W_t)$, really?

Let's rewrite our Taylor expansion with $a = W_t$ and $h = dW_t$:

$f(W_t + dW_t) = f(W_t) + dW_t f'(W_t) + \frac{(dW_t)^2}{2!} f''(W_t) + \frac{(dW_t)^3}{3!} f'''(W_t) + \ldots$

Now let's move $f(W_t)$ to the left-hand side:

$f(W_t + dW_t) - f(W_t) = dW_t f'(W_t) + \frac{(dW_t)^2}{2!} f''(W_t) + \frac{(dW_t)^3}{3!} f'''(W_t) + \ldots$

Aha, now it's clear that in the book's notation

$df(W_t) = f(W_t + dW_t) - f(W_t)$

so we do, in fact, have a Taylor expansion here:

$df(W_t) = dW_t f'(W_t) + \frac{(dW_t)^2}{2!} f''(W_t) + \frac{(dW_t)^3}{3!} f'''(W_t) + \ldots$

The key here is the behaviour of the powers of the increments. If instead of $W_t$ we had a nice Newtonian $x_t$ we would have

$df(x_t) = dx_t f'(x_t) + \frac{(dx_t)^2}{2!} f''(x_t) + \frac{(dx_t)^3}{3!} f'''(x_t) + \ldots$

Then

$(dx_t)^2 = (dx_t)^3 = \ldots = 0$

and we end up with the simple Newtonian chain rule:

$df(x_t) = f'(x_t) dx_t$

But Brownian motion presents us with a surprise:

$(dW_t)^2 = dt$ while $(dW_t)^3 = (dW_t)^4 = (dW_t)^5 = \ldots = 0$

so we get

$df(W_t) = f'(W_t) dW_t + \frac{1}{2} f''(W_t) dt$

This is, in essence, Itô's formula.
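The rule $(dW_t)^2 = dt$ can be checked numerically: over a fine partition of $[0, T]$, the sum of squared Brownian increments comes out close to T, while the sum of higher powers is close to zero. A sketch:

```python
import random

random.seed(0)

T, n = 1.0, 50_000
dt = T / n
dW = [random.gauss(0.0, dt ** 0.5) for _ in range(n)]

quadratic = sum(x * x for x in dW)    # ~ T, the "(dW)^2 = dt" effect
cubic = sum(abs(x) ** 3 for x in dW)  # ~ 0, the "(dW)^3 = 0" effect
print(quadratic, cubic)
```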

### Page 59, "By the weak law of large numbers..."

We first encountered Kolmogorov's strong law of large numbers on page 4:

"Suppose we have a sequence of independent random numbers $X_1, X_2, X_3$, and so on, all sampled from the same distribution, which has mean (expectation) μ, and we let $S_n$ be the arithmetical average of the sequence up to the nth term, that is $S_n = (X_1 + X_2 + \ldots + X_n) / n$. Then, with probability one, as n gets larger the value of $S_n$ tends towards the mean μ of the distribution."

Now there's also a weak version of Kolmogorov's law. Let us explain the difference between them.

Both versions of the law state that the sample average

$S_n = (X_1 + X_2 + \ldots + X_n) / n$

converges to the (finite) expected value

$S_n \rightarrow \mu$ for $n \rightarrow \infty$

The difference between the strong and the weak version concerns the mode of convergence being asserted.

The weak law of large numbers states that the sample average converges in probability towards the expected value

$S_n \rightarrow^P \mu$ for $n \rightarrow \infty$

That is to say that for any positive number $\epsilon$

$\lim_{n \rightarrow \infty} P(|S_n - \mu| < \epsilon) = 1$

Interpreting this result, the weak law essentially states that for any nonzero margin specified, no matter how small, with a sufficiently large sample there will be a very high probability that the average of the observations will be close to the expected value, that is, within that margin.

Convergence in probability is also called weak convergence of random variables. This version is called the weak law because random variables may converge weakly (in probability) as above without converging strongly (almost surely) as below.

The strong law of large numbers states that the sample average converges almost surely to the expected value

$S_n \rightarrow^{\text{a.s.}} \mu$ for $n \rightarrow \infty$

That is,

$P(\lim_{n \rightarrow \infty} S_n = \mu) = 1$

(Note that we have a probability of the limit, whereas for the weak version we have a limit of the probability!)

The proof is more complex than that of the weak law. This law justifies the intuitive interpretation of the expected value of a random variable as the "long-term average when sampling repeatedly".

Almost sure convergence is also called strong convergence of random variables. This version is called the strong law because random variables which converge strongly (almost surely) are guaranteed to converge weakly (in probability). The strong law implies the weak law.
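For instance, a minimal simulation of fair coin flips (mean $\mu = \frac{1}{2}$) shows the sample average settling towards the mean as n grows:

```python
import random

random.seed(42)

def sample_average(n):
    """Average of n fair coin flips (1 for heads, 0 for tails)."""
    return sum(random.random() < 0.5 for _ in range(n)) / n

for n in (100, 10_000, 1_000_000):
    print(n, sample_average(n))  # drifts towards 0.5
```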

### Page 62, The product rule

We shall explain in some detail how the Itô product rule

$d(X_t Y_t) = X_t dY_t + Y_t dX_t + \sigma_t \rho_t dt$

arises for two stochastic processes, $X_t$ and $Y_t$, adapted to the same Brownian motion $W_t$ (of course, this is a special case, but an important one at that):

$dX_t = \sigma_t dW_t + \mu_t dt$
$dY_t = \rho_t dW_t + \nu_t dt$

We are given a "trick":

$\frac{1}{2} ((X_t + Y_t)^2 - X_t^2 - Y_t^2) = X_t Y_t$

So

$d(X_t Y_t) = \frac{1}{2} d(X_t + Y_t)^2 - \frac{1}{2} dX_t^2 - \frac{1}{2} dY_t^2$

We shall define $f(z) = z^2$ for all real $z$, so $f'(z) = 2z$ and $f''(z) = 2$.

We shall now apply Itô's formula three times. We shall first substitute $X_t$ for $z$:

$dX_t^2$
$= df(X_t)$
$= (\sigma_t f'(X_t)) dW_t + (\mu_t f'(X_t) + \frac{1}{2} \sigma_t^2 f''(X_t)) dt$
$= 2 \sigma_t X_t dW_t + (2 \mu_t X_t + \sigma_t^2) dt$

Next, we substitute $Y_t$ for $z$, remembering that now our mean is $\nu_t$ and variance is $\rho_t^2$:

$dY_t^2$
$= df(Y_t)$
$= (\rho_t f'(Y_t)) dW_t + (\nu_t f'(Y_t) + \frac{1}{2} \rho_t^2 f''(Y_t)) dt$
$= 2 \rho_t Y_t dW_t + (2 \nu_t Y_t + \rho_t^2) dt$

Finally, we substitute $X_t + Y_t$ for $z$. And as

$d(X_t + Y_t) = (\sigma_t + \rho_t) dW_t + (\mu_t + \nu_t) dt$

our mean is $\mu_t + \nu_t$ and variance is $(\sigma_t + \rho_t)^2$:

$d((X_t + Y_t)^2)$
$= df(X_t + Y_t)$
$= ((\sigma_t + \rho_t) f'(X_t + Y_t)) dW_t + ((\mu_t + \nu_t) f'(X_t + Y_t) + \frac{1}{2} (\sigma_t + \rho_t)^2 f''(X_t + Y_t)) dt$
$= 2 (\sigma_t + \rho_t)(X_t + Y_t) dW_t + (2 (\mu_t + \nu_t)(X_t + Y_t) + (\sigma_t + \rho_t)^2) dt$
$= 2 \sigma_t X_t dW_t + 2 \sigma_t Y_t dW_t + 2 \rho_t X_t dW_t + 2 \rho_t Y_t dW_t + 2 \mu_t X_t dt + 2 \mu_t Y_t dt + 2 \nu_t X_t dt + 2 \nu_t Y_t dt + \sigma_t^2 dt + 2 \sigma_t \rho_t dt + \rho_t^2 dt$

Now we substitute the three results into

$d(X_t Y_t) = \frac{1}{2} d(X_t + Y_t)^2 - \frac{1}{2} dX_t^2 - \frac{1}{2} dY_t^2$

and collect the terms:

$d(X_t Y_t) = \sigma_t Y_t dW_t + \rho_t X_t dW_t + \mu_t Y_t dt + \nu_t X_t dt + \sigma_t \rho_t dt$

But

$X_t dY_t + Y_t dX_t = \sigma_t Y_t dW_t + \rho_t X_t dW_t + \mu_t Y_t dt + \nu_t X_t dt$

so we have

$d(X_t Y_t) = X_t dY_t + Y_t dX_t + \sigma_t \rho_t dt$

as required.
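For discrete increments the identity $\Delta(XY) = X \Delta Y + Y \Delta X + \Delta X \Delta Y$ holds exactly, so a simple Euler simulation (with hypothetical constant coefficients of our choosing) shows the accumulated cross term coming out close to $\sigma_t \rho_t T$:

```python
import random

random.seed(1)

T, n = 1.0, 50_000
dt = T / n
sigma, rho, mu, nu = 0.3, 0.2, 0.05, 0.07  # hypothetical constant coefficients

X, Y = 1.0, 1.0
X0Y0 = X * Y
lhs = 0.0    # accumulates X dY + Y dX
cross = 0.0  # accumulates dX dY
for _ in range(n):
    dW = random.gauss(0.0, dt ** 0.5)
    dX = sigma * dW + mu * dt
    dY = rho * dW + nu * dt
    lhs += X * dY + Y * dX
    cross += dX * dY
    X, Y = X + dX, Y + dY

print(X * Y - X0Y0, lhs + cross)  # these two agree
print(cross, sigma * rho * T)     # the cross term is close to sigma*rho*T
```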

### Page 62, "The final term above is actually..."

$dX_t dY_t = \sigma_t \rho_t dt$. This is what we are told, but why? Here is an "explanation".

$dX_t dY_t = (\sigma_t dW_t + \mu_t dt)(\rho_t dW_t + \nu_t dt)$
$= \sigma_t \rho_t (dW_t)^2 + \mu_t \rho_t dW_t dt + \sigma_t \nu_t dW_t dt + \mu_t \nu_t (dt)^2$

Now let's use the fact (discovered on page 59) that $(dW_t)^2 = dt$, so $dW_t dt = (dW_t)^3$ and $(dt)^2 = (dW_t)^4$:

$= \sigma_t \rho_t dt + \mu_t \rho_t (dW_t)^3 + \sigma_t \nu_t (dW_t)^3 + \mu_t \nu_t (dW_t)^4$

On page 59 we are also told that, mysteriously, $(dW_t)^3, (dW_t)^4, \ldots$ are zero. So $dX_t dY_t = \sigma_t \rho_t dt$.

### Page 72, Identifying normals

We are told that a random variable X is a normal N(μ,σ2) under a measure $\mathbb{P}$ if and only if

$\mathbb{E} (\exp(\theta X)) = \exp(\theta \mu + \frac{1}{2} \theta^2 \sigma^2)$

for all real θ.

This is the first time we encounter moment generating functions in this book.

These functions are important, so it may be worth revising them at this point.

Recall that a moment generating function is defined as (we shall suppress the measure $\mathbb{P}$ for the time being)

$M(\theta) = \mathbb{E} (\exp(\theta X))$
$= \sum_x \exp(\theta x) p(x)$ if X is discrete with mass function p(x)
$= \int_{-\infty}^{\infty} \exp(\theta x) f(x) dx$ if X is continuous with density f(x).

Let us explain why these functions are called moment generating. Recall that $\mathbb{E} (X)$ is also called the first moment of the probability distribution of X, $\mathbb{E} (X^2)$ is called the second moment, $\mathbb{E} (X^3)$ is called the third moment and so on.

$M'(\theta) = \frac{d}{d\theta} \mathbb{E} (\exp(\theta X))$
$= \mathbb{E} (\frac{d}{d\theta} \exp(\theta X))$
$= \mathbb{E} (X \exp(\theta X))$

(Here we have assumed that the interchange of the differentiation and expectation operators is legitimate. Indeed, this is "almost always" the case.)

Thus

$M'(0) = \mathbb{E} (X)$,

which is the first moment of the distribution of X.

Similarly

$M''(\theta) = \frac{d}{d\theta} M'(\theta)$
$= \frac{d}{d\theta} \mathbb{E} (X \exp(\theta X))$
$= \mathbb{E} (\frac{d}{d\theta} X \exp(\theta X))$
$= \mathbb{E} (X^2 \exp(\theta X))$

So

$M''(0) = \mathbb{E} (X^2)$,

which is the second moment.

In general,

$M^{(n)}(\theta) = \mathbb{E} (X^n \exp(\theta X))$

and

$M^{(n)}(0) = \mathbb{E} (X^n)$,

which means that the moment generating function enables us to back out each of the moments of the distribution of X by taking the appropriate derivative with respect to θ and computing it at zero. Hence the name, "moment generating function".

One of the most important facts about moment generating functions is that a moment generating function which is finite in a neighbourhood of zero identifies the probability distribution uniquely. (The proof of this result is quite technical, so we won't give it here.) For example, if we know that

$\mathbb{E} (\exp(\theta X)) = \exp(\theta \mu + \frac{1}{2} \theta^2 \sigma^2)$

which we recognise as the moment generating function of N(μ,σ2), then we know that the distribution of X is indeed N(μ,σ2). The authors rely on this result in the discussion that ensues.
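As a sanity check, we can compute $\mathbb{E}(\exp(\theta X))$ for a normal X by numerical integration (a sketch of ours, using the trapezoidal rule) and compare it with $\exp(\theta \mu + \frac{1}{2} \theta^2 \sigma^2)$:

```python
import math

def mgf_normal(theta, mu, sigma, lo=-15.0, hi=15.0, n=30_000):
    """Trapezoidal approximation of E(exp(theta * X)) for X ~ N(mu, sigma^2)."""
    h = (hi - lo) / n
    def f(x):
        pdf = math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))
        return math.exp(theta * x) * pdf
    return h * (0.5 * (f(lo) + f(hi)) + sum(f(lo + i * h) for i in range(1, n)))

print(mgf_normal(1.0, 0.0, 1.0), math.exp(0.5))  # these agree closely
```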

### Page 72, "This equals..."

It may not be immediately apparent how the result

$\mathbb{E}_{\mathbb{P}} (\exp(-\gamma W_T - \frac{1}{2} \gamma^2 T + \theta W_T)) = \exp(-\frac{1}{2} \gamma^2 T + \frac{1}{2} (\theta - \gamma)^2 T)$

is derived.

In fact, it is derived using the "Identifying normals" formula (actually it's a moment generating function), which appears on the same page and is later used again, in reverse. Let's go through the derivation step by step:

$\mathbb{E}_{\mathbb{P}} (\exp(-\gamma W_T - \frac{1}{2} \gamma^2 T + \theta W_T))$
$= \mathbb{E}_{\mathbb{P}} (\exp(-\frac{1}{2} \gamma^2 T + (\theta - \gamma) W_T))$
$= \mathbb{E}_{\mathbb{P}} (\exp(-\frac{1}{2} \gamma^2 T) \exp((\theta - \gamma) W_T))$
$= \exp(-\frac{1}{2} \gamma^2 T) \mathbb{E}_{\mathbb{P}} (\exp((\theta - \gamma) W_T))$

since $\exp(-\frac{1}{2} \gamma^2 T)$ is a constant,

$= \exp(-\frac{1}{2} \gamma^2 T) \exp((\theta - \gamma) \mu + \frac{1}{2} (\theta - \gamma)^2 \sigma^2)$

by applying the "Identifying normals" formula but replacing θ in that formula with θ − γ.

But $W_T \sim N(0, T)$, so $\mu = 0$, $\sigma^2 = T$ and we obtain

$\exp(-\frac{1}{2} \gamma^2 T + \frac{1}{2} (\theta - \gamma)^2 T)$

as required.

### Page 78, "Martingale representation theorem"

$N_t = N_0 + \int_0^t \phi_s dM_s$

becomes

$dN_t = \phi_t dM_t$

in differential notation. It is important to understand that the process $\phi_t$ is "simply the ratio of volatilities" of $M_t$ and $N_t$.

### Page 79, Exercise 3.11

"σt is a function of both time and sample path" means that σt is really σ(t,ω). When we say that it is a bounded function, we mean that there is a constant K such that for all $t \leq T$ and for all $\omega \in \Omega$ we have $|\sigma(t, \omega)| \leq K$.

### Page 85, "But of course Itô makes it possible to write down the SDE..."

Let us justify

$dS_t = \sigma S_t dW_t + (\mu + \frac{1}{2} \sigma^2) S_t dt$

The notation becomes a little bit clumsy because we have reserved X for the value of the claim. In Itô's formula (page 59) we use $X_t$ for the original process and $Y_t = f(X_t)$ for the transformed process.

On page 85 our original process is $Y_t = \log(S_t) = \sigma W_t + \mu t$ and our transformed process is $S_t = f(Y_t) = \exp(Y_t)$. Now $f(Y_t) = f'(Y_t) = f''(Y_t) = \exp(Y_t) = S_t$, which is very nice indeed.

Time to apply Itô's formula:

$dS_t = (\sigma_t f'(Y_t)) dW_t + (\mu_t f'(Y_t) + \frac{1}{2} \sigma_t^2 f''(Y_t)) dt$

Here σt = σ is our volatility for Yt, which is constant in the Black-Scholes model (so the subscript disappears). Similarly, μt = μ is our drift for Yt, which is also constant in the Black-Scholes model (so the subscript disappears).

$dS_t = (\sigma S_t) dW_t + (\mu S_t + \frac{1}{2} \sigma^2 S_t) dt$

### Page 85, "the first thing to do is to kill the drift"

We need to "kill the drift" for dSt. That's the coefficient of dt in

$dS_t = \sigma S_t dW_t + (\mu + \frac{1}{2} \sigma^2) S_t dt$

In other words, we need to kill $(\mu + \frac{1}{2} \sigma^2) S_t$. We can apply the Cameron-Martin-Girsanov (C-M-G) theorem to alter the drift of Wt — by changing the measure. Note that we are playing with Wt, not with St itself.

What sort of drift do we need?

Well, (iii) in C-M-G can be written as

$d\tilde{W}_t = dW_t + \gamma_t dt$

Thus C-M-G will replace our $dW_t$ with $d\tilde{W}_t - \gamma_t dt$ (and change the measure from $\mathbb{P}$ to $\mathbb{Q}$).

So we'll have

$dS_t = \sigma S_t (d\tilde{W}_t - \gamma_t dt) + (\mu + \frac{1}{2} \sigma^2) S_t dt$
$= \sigma S_t d\tilde{W}_t - \sigma S_t \gamma_t dt + (\mu + \frac{1}{2} \sigma^2) S_t dt$

of course we want

$dS_t = \sigma S_t d\tilde{W}_t$

(no drift), so our γt should be set to

$\gamma_t = \frac{1}{\sigma} (\mu + \frac{1}{2} \sigma^2)$

just as the book suggests.
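As a quick arithmetic check (with hypothetical values for $\mu$, $\sigma$ and $S_t$ of our choosing), this choice of $\gamma_t$ does make the dt-coefficient vanish:

```python
# The dt-coefficient of dS_t after the change of measure is
# (mu + sigma^2/2) * S - sigma * S * gamma; with gamma = (mu + sigma^2/2) / sigma
# it vanishes.
mu, sigma, S = 0.08, 0.2, 100.0  # hypothetical values
gamma = (mu + 0.5 * sigma ** 2) / sigma
drift = (mu + 0.5 * sigma ** 2) * S - sigma * S * gamma
print(drift)
```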

We just need one final justification. $\gamma_t$ is constant, as in our model $\mu$ and $\sigma$ are both constant. Note that they don't carry the $t$ subscript. Therefore

$\mathbb{E}_{\mathbb{P}} (\frac{1}{2} \int_0^T \gamma_t^2 dt) = \frac{1}{2 \sigma^2} (\mu + \frac{1}{2} \sigma^2)^2 T < \infty$

Thus the boundedness condition of C-M-G is satisfied and we can apply this theorem. In other words, we have our

$dS_t = \sigma S_t d\tilde{W}_t$

under the new measure $\mathbb{Q}$.

Note that we didn't have to write down $d\mathbb{Q}/d\mathbb{P}$. Great!

### Page 85, "The exponential martingales box (section 3.5) contains a condition..."

Let's check this condition. In our case we have $dS_t = \sigma S_t d\tilde{W}_t$, so we need

$\mathbb{E}(\exp(\frac{1}{2} \int_0^T \sigma^2 ds)) < \infty$

But σ is merely a constant, so

$\mathbb{E}(\exp(\frac{1}{2} \int_0^T \sigma^2 ds))$
$= \mathbb{E}(\exp(\frac{1}{2} \sigma^2 T))$
$= \exp(\frac{1}{2} \sigma^2 T)$
$< \infty$

as required. The condition is satisfied and St is a $\mathbb{Q}$-martingale. Consequently, $\mathbb{Q}$ is the martingale measure for St.

### Page 100, "Consider the following static replicating strategy..."

This example of static replication is quite simple. Because of the ambiguous notation it is also somewhat confusing.

We are told that at time t we have some units of sterling cash bonds and some units of dollar cash bonds. However, t does not feature, it's T that appears in the formulae giving the units. Does this mean that the value of our portfolio is constant? No.

Here is what's happening. We want to have 1 unit of sterling cash bonds at time T. This means that we had to purchase $\exp(-uT)$ units of sterling cash bonds at time 0.

We could only purchase them for units of dollar cash bonds, which we would have to borrow, which would cost us. So at time 0 we have a negative position in dollar cash bonds, $-C_0 \exp(-uT)$.

At time t our position in sterling cash bonds is worth $\exp(-u(T-t))$ in sterling and our position in dollar cash bonds is worth $-C_0 \exp(-uT) \exp(rt)$ in dollars.

At time T, this gives us $\exp(-u(T-T)) = 1$ unit of sterling cash bonds (by construction) against $C_0 \exp(-uT) \exp(rT) = C_0 \exp((r-u)T)$ units of dollar cash bonds, the quantity that is defined as the forward price. This is effectively the exchange rate that we have manufactured using our static replication technique.
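Here is a numeric sketch of the strategy; the rates u and r, the maturity T and the spot rate $C_0$ below are illustrative values of ours, not the book's:

```python
import math

u, r, T, C0 = 0.03, 0.05, 2.0, 1.25  # sterling rate, dollar rate, maturity, spot rate

sterling_units = math.exp(-u * T)      # sterling cash bonds bought at time 0
dollar_units = -C0 * math.exp(-u * T)  # dollar cash bonds borrowed to pay for them

sterling_at_T = sterling_units * math.exp(u * T)  # grows to exactly 1 unit
dollar_at_T = dollar_units * math.exp(r * T)      # a debt of C0 * exp((r - u) * T)
print(sterling_at_T, -dollar_at_T)
```

The dollar debt at T equals the forward price $C_0 \exp((r-u)T)$, whatever the illustrative numbers.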

## The Concepts and Practice of Mathematical Finance by Mark Joshi

### Page 14, Exercise 1.2

In this Exercise, the die is rolled once (not once for each of the six assets). The die is rolled, the number on the die is recorded, then each asset's index is checked to see whether it matches this number and the asset pays off accordingly. In this way, exactly one of the six assets will pay off 1, the rest will pay off 0. Thus, for the sum of the assets, the payoff is deterministically 1.

The point of this exercise is to give an example of diversifiable risk.

The resource InvestorWords gives the following definition of diversifiable risk: "The risk of price change due to the unique circumstances of a specific security, as opposed to the overall market. The risk can be virtually eliminated from a portfolio through diversification. Also called unsystematic risk."

In this exercise the risk was eliminated completely.
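The exercise is easy to simulate; a sketch:

```python
import random

random.seed(7)

def portfolio_payoff():
    """One die roll decides all six payoffs at once; asset i pays 1 only
    if the roll equals i, so the six payoffs always sum to exactly 1."""
    roll = random.randint(1, 6)
    return sum(1 if i == roll else 0 for i in range(1, 7))

print({portfolio_payoff() for _ in range(1000)})  # always {1}
```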

### Page 94, Equation 5.3

I would like to thank Dr. Paul Davis for clarifying this to me.

In Section 5.3 (Stochastic Processes), the following equation appears on page 94:

$X_{t+h} - X_t = h \mu(t) + \int_{t}^{t+h} e(r) dr + h^{1/2} \sigma(t) N(0, 1) + g(t, h) N(0, 1)$

Unfortunately it isn't immediately apparent from the explanation how this equation is derived (at least it wasn't immediately apparent to me, which certainly doesn't mean much).

Here

$g(t, h) = (h \sigma(t)^2 + \int_{t}^{t+h} f(r) dr)^{1/2} - h^{1/2} \sigma(t)$

What is happening? Nothing extraordinary. The algebra is trivial once the logic behind the derivation is understood. I shall derive (5.3) carefully (perhaps too carefully!), step by step.

We have written

$\mu(r) = \mu(t) + e(r)$

and

$\sigma^2(r) = \sigma^2(t) + f(r)$

Thus we have expressed the mean (variance) at r in terms of the mean (variance) at t plus the error term.

In the first equation on page 94 we were assuming that σ is identically zero:

$X_t - X_s = \int_s^t \mu(r) dr$

We are no longer assuming that. Let us reinstate our stochastic component, as on page 90 (remember $\mu + \sigma N(0,1)$):

$X_{t+h} - X_t = \int_t^{t+h} \mu(r) dr + \sqrt{\int_t^{t+h} \sigma^2(r) dr} \cdot N(0, 1)$

Now let us substitute our expressions for μ(r) and σ2(r):

$X_{t+h} - X_t = \int_t^{t+h} (\mu(t) + e(r)) dr + \sqrt{\int_t^{t+h} (\sigma^2(t) + f(r)) dr} \cdot N(0, 1)$

Hence

$X_{t+h} - X_t = \mu(t) \int_t^{t+h} dr + \int_t^{t+h} e(r) dr + \sqrt{\sigma^2(t) \int_t^{t+h} dr + \int_t^{t+h} f(r) dr} \cdot N(0, 1)$
$X_{t+h} - X_t = h \mu(t) + \int_t^{t+h} e(r) dr + \sqrt{h \sigma^2(t) + \int_t^{t+h} f(r) dr} \cdot N(0, 1)$
$X_{t+h} - X_t = h \mu(t) + \int_t^{t+h} e(r) dr + \sqrt{h \sigma^2(t) (1 + h^{-1} \sigma^{-2}(t) \int_t^{t+h} f(r) dr)} \cdot N(0, 1)$
$X_{t+h} - X_t = h \mu(t) + \int_t^{t+h} e(r) dr + h^{1/2} \sigma(t) \sqrt{1 + h^{-1} \sigma^{-2}(t) \int_t^{t+h} f(r) dr} \cdot N(0, 1)$
$X_{t+h} - X_t = h \mu(t) + \int_t^{t+h} e(r) dr + h^{1/2} \sigma(t) \left( 1 + \sqrt{1 + h^{-1} \sigma^{-2}(t) \int_t^{t+h} f(r) dr} - 1 \right) \cdot N(0, 1)$
$X_{t+h} - X_t = h \mu(t) + \int_t^{t+h} e(r) dr + h^{1/2} \sigma(t) N(0, 1) + \left[ h^{1/2} \sigma(t) \left( \left( 1 + h^{-1} \sigma^{-2}(t) \int_t^{t+h} f(r) dr \right)^{\frac{1}{2}} - 1 \right) \right] \cdot N(0, 1)$

The factor in square brackets is, of course, g(t,h), and we are home.
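The last two steps amount to adding and subtracting $h^{1/2}\sigma(t)$, so that $h^{1/2}\sigma(t) + g(t,h)$ reproduces the original square-root coefficient. A quick numeric check with hypothetical values (F stands in for the integral of f):

```python
import math

h, sigma_t, F = 0.01, 0.3, 0.0005  # hypothetical values of our choosing

# g(t, h) as defined above:
g = math.sqrt(h * sigma_t ** 2 + F) - math.sqrt(h) * sigma_t

# h^{1/2} sigma(t) + g recovers the full square-root coefficient:
lhs = math.sqrt(h) * sigma_t + g
rhs = math.sqrt(h * sigma_t ** 2 + F)
print(lhs, rhs)
```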