# Knowledge Base/Finance/Commentary on Mathematical Finance Textbooks

### From Thalesians

*Whenever you're called on to make up your mind,*

*and you're hampered by not having any,*

*the best way to solve the dilemma, you'll find,*

*is simply by spinning a penny.*

*No — not so that chance shall decide the affair*

*while you're passively standing there moping;*

*but the moment the penny is up in the air,*

*you suddenly know what you're hoping.*

A grook by Piet Hein (1905-1996)

## *Measure, Integral and Probability* (second edition) by Marek Capiński and Ekkehard Kopp

### Page 8, Example 1.3

We are "using the fact that *f* is increasing" in that we notice that

- the supremum of *f* over each subinterval, used in *U*(*P*_{n},*f*), is attained at the right endpoint, and
- the infimum of *f* over each subinterval, used in *L*(*P*_{n},*f*), is attained at the left endpoint.

The algebra is easy enough to check:

Similarly:

The book claims that "the integral must be , since both *U*(*P*_{n},*f*) and *L*(*P*_{n},*f*) converge to this value, as is easily seen".

This is easily seen if and only if one remembers the following elementary results:

$$\sum_{i=1}^{n} 1 = n,$$

which is obvious as we are adding *n* 1's;

$$\sum_{i=1}^{n} i = \frac{n(n+1)}{2},$$

which is the sum of an arithmetic series;

$$\sum_{i=1}^{n} i^2 = \frac{n(n+1)(2n+1)}{6},$$

which is the *n*th square pyramidal number. These results are certainly worth remembering!

So

— now multiply both sides by

as *n* → ∞,

because 1/*n* → 0 as *n* → ∞.

Similarly,

$$= \frac{2n^3 + 3n^2 + n}{3n^3} - \frac{3n^2 + 3n}{2n^3} + \frac{1}{n^2}$$

— now multiply both sides by

as *n* → ∞,

because 1/*n* → 0 as *n* → ∞.
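The squeeze between the upper and lower sums is easy to check numerically. Here is a minimal sketch, assuming for illustration the increasing function *f*(*x*) = *x*² on [0,1] (the book's example may use a different *f*):

```python
def darboux_sums(f, n):
    """Lower and upper Darboux sums of an increasing f over the
    uniform partition of [0, 1] into n subintervals."""
    h = 1.0 / n
    lower = sum(f(i * h) for i in range(n)) * h        # infima at left endpoints
    upper = sum(f((i + 1) * h) for i in range(n)) * h  # suprema at right endpoints
    return lower, upper

lo, up = darboux_sums(lambda x: x * x, 1000)
# Because f is increasing, the gap up - lo is exactly (f(1) - f(0))/n,
# so both sums squeeze the integral (here 1/3) as n grows.
```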

### Page 9, "(which is of course automatically bounded)"

Continuous functions on the closed interval [*a*,*b*] are bounded. Despite the "of course", and although the result is somewhat intuitive, it is not obvious.

There are several ways to prove this result.

We could start with the **Heine-Borel theorem**: whichever way you write [*a*,*b*] as a union of a collection of open sets, you can also write it as a union of a finite subcollection of that collection of open sets. Consider the sets *U*_{x} = {*y* ∈ [*a*,*b*] : |*f*(*y*) − *f*(*x*)| < 1} for each *x* ∈ [*a*,*b*]. The sets *U*_{x} are open (because *f* is continuous) and their union is all of [*a*,*b*]. By the Heine-Borel theorem, there are *x*_{1},...,*x*_{n} such that the union of the sets *U*_{x_{1}},...,*U*_{x_{n}} is also all of [*a*,*b*]. But then |*f*| is bounded by 1 + max_{i} |*f*(*x*_{i})|.

Another proof starts with the **Bolzano-Weierstrass theorem**: every sequence in [*a*,*b*] (in fact, every bounded sequence) has a convergent subsequence. Assume for a contradiction that a continuous function *f* from [*a*,*b*] to ℝ is unbounded. Then there is a sequence (*x*_{n}) in [*a*,*b*] such that |*f*(*x*_{n})| > *n* for every *n*. Pick a convergent subsequence from this sequence; say it converges to some *x* ∈ [*a*,*b*]. We can then check that *f* cannot be continuous at *x*. Contradiction.

How different are these proofs? Both the Heine-Borel theorem and the Bolzano-Weierstrass theorem state, in somewhat different ways, that [*a*,*b*] is compact. (Indeed, a metrisable space is compact if and only if it is sequentially compact, which makes the two theorems essentially the same for [*a*,*b*].)

In http://www.dpmms.cam.ac.uk/~wtg10/bounded.html, Timothy Gowers explains "How to find an unusual proof that continuous functions on the closed interval [0,1] are bounded" which is more elementary than the two proofs above. It is worth reading.

### Page 18, "Step 2. The intervals are formed into a sequence"

We use the ancient trick, the "diagonal argument", which is commonly used to show that ℚ and some other sets are countable (this is mentioned on page 16). We can enumerate the intervals *I*_{k}^{n}. The trick here is the order. It's easier to visualise it if we produce the following diagram:

$$\begin{array}{cccccccccccc}
I_1^1 & \rightarrow & I_2^1 & & I_3^1 & \rightarrow & I_4^1 & & I_5^1 & \rightarrow & I_6^1 & \cdots \\
& \swarrow & & \nearrow & & \swarrow & & \nearrow & & \swarrow & & \\
I_1^2 & & I_2^2 & & I_3^2 & & I_4^2 & & I_5^2 & & I_6^2 & \cdots \\
\downarrow & \nearrow & & \swarrow & & \nearrow & & \swarrow & & \nearrow & & \\
I_1^3 & & I_2^3 & & I_3^3 & & I_4^3 & & I_5^3 & & I_6^3 & \cdots \\
& \swarrow & & \nearrow & & \swarrow & & \nearrow & & \swarrow & & \\
I_1^4 & & I_2^4 & & I_3^4 & & I_4^4 & & I_5^4 & & I_6^4 & \cdots \\
\downarrow & \nearrow & & \swarrow & & \nearrow & & \swarrow & & \nearrow & & \\
I_1^5 & & I_2^5 & & I_3^5 & & I_4^5 & & I_5^5 & & I_6^5 & \cdots \\
\vdots & & \vdots & & \vdots & & \vdots & & \vdots & & \vdots & \\
\end{array}$$

This is not quite the order of enumeration used by the authors but it will work just as well (you can see that there are many variations on this theme).
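The enumeration can also be sketched as a small generator (a hypothetical helper, not from the book) that walks the anti-diagonals *n* + *k* = 2, 3, 4, ... in turn, so that every interval *I*_{k}^{n} receives a finite position in the sequence:

```python
from itertools import islice

def diagonals():
    """Yield index pairs (n, k), n >= 1, k >= 1, one anti-diagonal at a time."""
    d = 2
    while True:
        for n in range(1, d):  # on this diagonal n + k = d
            yield (n, d - n)
        d += 1

print(list(islice(diagonals(), 6)))
# -> [(1, 1), (1, 2), (2, 1), (1, 3), (2, 2), (3, 1)]
```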

### Page 18, "We can rearrange the double sum because the components are non-negative (a fact from elementary analysis)"

This is indeed a fact from elementary analysis. We decided to comment on it just in case the reader has forgotten it. When is it the case that

$$\sum_{i=1}^{\infty}\sum_{j=1}^{\infty} a_{ij} = \sum_{j=1}^{\infty}\sum_{i=1}^{\infty} a_{ij},$$

i.e. when can we interchange the order of summation?

For finite sums, this is always possible:

$$\sum_{i=1}^{m}\sum_{j=1}^{n} a_{ij} = \sum_{j=1}^{n}\sum_{i=1}^{m} a_{ij}.$$

This is a simple consequence of the associativity and commutativity of addition.

Let's consider, for example, the case when *m* = *n* = 3 (the proof generalises naturally to other values):

$$\sum_{i=1}^{3}\sum_{j=1}^{3} a_{ij} = (a_{11} + a_{12} + a_{13}) + (a_{21} + a_{22} + a_{23}) + (a_{31} + a_{32} + a_{33}).$$

We can drop the brackets because of the associativity, which makes them irrelevant:

$$= a_{11} + a_{12} + a_{13} + a_{21} + a_{22} + a_{23} + a_{31} + a_{32} + a_{33}.$$

And because of the commutativity we can rearrange the terms:

$$= (a_{11} + a_{21} + a_{31}) + (a_{12} + a_{22} + a_{32}) + (a_{13} + a_{23} + a_{33}) = \sum_{j=1}^{3}\sum_{i=1}^{3} a_{ij}.$$

For infinite sums, we are dealing with two limit processes:

$$\sum_{i=1}^{\infty}\sum_{j=1}^{\infty} a_{ij} = \lim_{m\to\infty}\sum_{i=1}^{m}\left(\lim_{n\to\infty}\sum_{j=1}^{n} a_{ij}\right).$$

It turns out that we can interchange the order of summation if

- The terms are all nonnegative (the sum could be infinite; if so, it is infinite both ways); **or**
- The series is absolutely convergent, i.e. the double sum of absolute values is finite.

For more information, see http://www.artofproblemsolving.com/Forum/viewtopic.php?t=109133 and http://www.math.ubc.ca/~feldman/m321/twosumSlides.pdf
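To see why some such condition is needed, here is a sketch with a classic counterexample (not from the book): the array *a*(*i*,*j*) with +1 on the diagonal and −1 just above it is neither nonnegative nor absolutely summable, and its two iterated sums disagree.

```python
def a(i, j):
    # +1 on the diagonal, -1 immediately to its right, 0 elsewhere.
    return 1 if j == i else (-1 if j == i + 1 else 0)

def row_sum(i):
    # Row i has exactly two nonzero entries (+1 and -1), so it sums to 0.
    return sum(a(i, j) for j in range(i + 2))

def col_sum(j):
    # Column j can only be nonzero at i = j - 1 and i = j.
    return sum(a(i, j) for i in range(max(j - 1, 0), j + 1))

rows_then_cols = sum(row_sum(i) for i in range(100))  # every row sums to 0
cols_then_rows = sum(col_sum(j) for j in range(100))  # only column 0 survives
print(rows_then_cols, cols_then_rows)  # -> 0 1
```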

## *Financial Calculus: An Introduction to Derivative Pricing* by Martin Baxter and Andrew Rennie

### Page vii, "as Dr Johnson might have put it"

The reference here is to the famous quote: "Your manuscript is both good and original, but the part that is good is not original and the part that is original is not good". According to Wikiquote this quote is misattributed as it has not been found in any of Johnson's writings or in the writings of contemporaries who quoted him.

### Page 7, "Integration and the law of the unconscious statistician then tells us..."

We are asked to compute $\mathbb{E}[S_0 \exp(X)]$, where *X* ∼ *N*(μ,σ²). But this is simply *S*_{0} times the expectation of a lognormal random variable *Y* = exp(*X*), and we know what it is (or we can look it up): $\mathbb{E}[\exp(X)] = \exp(\mu + \sigma^2/2)$.

You may argue that looking up $\mathbb{E}[\exp(X)]$ is cheating. Perhaps.

Let us apply the law of the unconscious statistician with *h*(*X*) = *S*_{0}exp(*X*):

$$\mathbb{E}[h(X)] = \int_{-\infty}^{\infty} S_0 e^{x}\,\frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx.$$

Let us complete the square:

$$x - \frac{(x-\mu)^2}{2\sigma^2} = \mu + \frac{\sigma^2}{2} - \frac{\left(x - (\mu + \sigma^2)\right)^2}{2\sigma^2},$$

so that

$$\mathbb{E}[h(X)] = S_0\,e^{\mu + \frac{\sigma^2}{2}} \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-\frac{\left(x - (\mu + \sigma^2)\right)^2}{2\sigma^2}}\,dx.$$

Now we observe that our integral is in fact over the probability density function of *N*(μ + σ², σ²) and therefore it evaluates to 1. We are left with

$$\mathbb{E}[h(X)] = S_0\,e^{\mu + \frac{\sigma^2}{2}},$$

as required.
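The lognormal expectation can also be sanity-checked by Monte Carlo; the parameter values below are arbitrary illustrations.

```python
import math, random

random.seed(0)
S0, mu, sigma = 100.0, 0.05, 0.2
n = 200_000

# Average S0 * exp(X) over draws X ~ N(mu, sigma^2)...
estimate = sum(S0 * math.exp(random.gauss(mu, sigma)) for _ in range(n)) / n
# ...and compare with the closed form S0 * exp(mu + sigma^2 / 2).
exact = S0 * math.exp(mu + 0.5 * sigma * sigma)
print(abs(estimate - exact) / exact < 0.01)  # relative error well under 1%
```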

### Page 58, "because of the variance structure of Brownian motion"

This means the following. We have a key fact about the variance:

$$\mathrm{Var}(X) = \mathbb{E}[X^2] - (\mathbb{E}[X])^2.$$

We know that *W*_{t} ∼ *N*(0,*t*), so

$$\mathrm{Var}(W_t) = t,$$

and from the above fact it follows that $\mathbb{E}[W_t^2] = t$.

### Page 58, "What went wrong? Consider a Taylor expansion..."

First of all, can we recognise this as a Taylor expansion? The Taylor expansion is given in the Thalesian Formula Sheet as

$$f(a + h) = f(a) + f'(a)\,h + \frac{f''(a)}{2!}\,h^2 + \frac{f'''(a)}{3!}\,h^3 + \cdots$$

And on page 58 we are given

$$df(W_t) = f'(W_t)\,dW_t + \frac{1}{2}f''(W_t)\,(dW_t)^2 + \frac{1}{3!}f'''(W_t)\,(dW_t)^3 + \cdots$$
Is the second equation a special case of the first?

Well, perhaps there is some notational confusion. First of all, what is *d**f*(*W*_{t}), really?

Let's rewrite our Taylor expansion with *a* = *W*_{t} and *h* = *d**W*_{t}:

$$f(W_t + dW_t) = f(W_t) + f'(W_t)\,dW_t + \frac{1}{2}f''(W_t)\,(dW_t)^2 + \cdots$$

Now let's move *f*(*W*_{t}) to the left-hand side:

$$f(W_t + dW_t) - f(W_t) = f'(W_t)\,dW_t + \frac{1}{2}f''(W_t)\,(dW_t)^2 + \cdots$$

Aha, now it's clear that in the book's notation

*d**f*(*W*_{t}) =*f*(*W*_{t}+*d**W*_{t}) −*f*(*W*_{t}),

so we do, in fact, have a Taylor expansion here:

$$df(W_t) = f'(W_t)\,dW_t + \frac{1}{2}f''(W_t)\,(dW_t)^2 + \frac{1}{3!}f'''(W_t)\,(dW_t)^3 + \cdots$$

The key here is the behaviour of the powers of the increments. If instead of *W*_{t} we had a nice Newtonian *x*_{t} we would have

$$(dx_t)^2 = (dx_t)^3 = \cdots = 0$$

(squared and higher powers of the increment are negligible in the limit). Then

$$df(x_t) = f'(x_t)\,dx_t + 0 + 0 + \cdots$$

and we end up with the simple Newtonian chain rule:

*d**f*(*x*_{t}) =*f*'(*x*_{t})*d**x*_{t}

But Brownian motion presents us with a surprise:

$$(dW_t)^2 = dt, \quad \text{while} \quad (dW_t)^3 = (dW_t)^4 = \cdots = 0,$$

so we get

$$df(W_t) = f'(W_t)\,dW_t + \frac{1}{2}f''(W_t)\,dt,$$

which is, in essence, Itô's formula.
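This "surprise" shows up clearly in simulation: over a fine partition of [0,1], the squared increments of a Brownian path sum to roughly *t* = 1, while the cubed (and higher) increments sum to almost nothing. A sketch:

```python
import math, random

random.seed(1)
n = 100_000
dt = 1.0 / n

# Brownian increments over [0, 1]: dW ~ N(0, dt).
dW = [random.gauss(0.0, math.sqrt(dt)) for _ in range(n)]

quadratic = sum(w * w for w in dW)    # ~ 1, i.e. (dW)^2 behaves like dt
cubic = sum(abs(w) ** 3 for w in dW)  # ~ 0, higher powers are negligible
print(quadratic, cubic)
```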

### Page 59, "By the weak law of large numbers..."

We first encountered Kolmogorov's strong law of large numbers on page 4:

"Suppose we have a sequence of independent random numbers *X*_{1},*X*_{2},*X*_{3}, and so on all sampled from the same distribution, which has mean (expectation) μ, and we let *S*_{n} be the arithmetical average of the sequence up to the *n*th term, that is $S_n = \frac{1}{n}(X_1 + X_2 + \cdots + X_n)$. Then, with probability one, as *n* gets larger the value of *S*_{n} tends towards the mean μ of the distribution."

Now there's also a weak version of Kolmogorov's law. Let us explain the difference between them.

Both versions of the law state that the sample average

$$\overline{X}_n = \frac{1}{n}(X_1 + X_2 + \cdots + X_n)$$

converges to the (finite) expected value:

$$\overline{X}_n \to \mu \quad \text{for } n \to \infty.$$

The difference between the strong and the weak version is concerned with the mode of convergence being asserted.

The weak law of large numbers states that the sample average **converges in probability** towards the expected value:

$$\overline{X}_n \xrightarrow{\ P\ } \mu \quad \text{for } n \to \infty.$$

That is to say that for any positive number ε,

$$\lim_{n \to \infty} \mathbb{P}\left(\left|\overline{X}_n - \mu\right| > \varepsilon\right) = 0.$$

Interpreting this result, the weak law essentially states that for any nonzero margin specified, no matter how small, with a sufficiently large sample there will be a very high probability that the average of the observations will be close to the expected value, that is, within the margin.

**Convergence in probability** is also called **weak convergence** of random variables. This version is called the weak law because random variables may converge weakly (in probability) as above without converging strongly (almost surely) as below.

The strong law of large numbers states that the sample average converges **almost surely** to the expected value:

$$\overline{X}_n \xrightarrow{\ \text{a.s.}\ } \mu \quad \text{for } n \to \infty.$$

That is,

$$\mathbb{P}\left(\lim_{n \to \infty} \overline{X}_n = \mu\right) = 1.$$

(Note that we have a probability of the limit, whereas for the weak version we have a limit of the probability!)

The proof is more complex than that of the weak law. This law justifies the intuitive interpretation of the expected value of a random variable as the "long-term average when sampling repeatedly".

**Almost sure convergence** is also called **strong convergence** of random variables. This version is called the strong law because random variables which converge strongly (almost surely) are guaranteed to converge weakly (in probability). The strong law implies the weak law.
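The "long-term average" interpretation is easy to watch in a simulation; here is a sketch with fair-die rolls, whose mean is 3.5.

```python
import random

random.seed(2)
rolls = [random.randint(1, 6) for _ in range(100_000)]

# Sample averages approach the mean 3.5; the deviation shrinks
# roughly like 1/sqrt(n).
for n in (100, 10_000, 100_000):
    avg = sum(rolls[:n]) / n
    print(n, avg)
```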

### Page 62, The product rule

We shall explain in some detail how the Itô product rule

*d*(*X*_{t}*Y*_{t}) =*X*_{t}*d**Y*_{t}+*Y*_{t}*d**X*_{t}+ σ_{t}ρ_{t}*d**t*

arises for two stochastic processes, *X*_{t} and *Y*_{t}, adapted to the same Brownian motion *W*_{t} (of course, this is a special case, but an important one at that):

$$dX_t = \sigma_t\,dW_t + \mu_t\,dt, \qquad dY_t = \rho_t\,dW_t + \nu_t\,dt.$$

We are given a "trick":

$$X_t Y_t = \frac{1}{2}\left((X_t + Y_t)^2 - X_t^2 - Y_t^2\right).$$

So

$$d(X_t Y_t) = \frac{1}{2}\left(d\left((X_t + Y_t)^2\right) - d(X_t^2) - d(Y_t^2)\right).$$

We shall define *f*(*z*) = *z*^{2} for all real *z*, so *f*'(*z*) = 2*z* and *f*''(*z*) = 2.

We shall now apply Itô's formula three times. We shall first substitute *X*_{t} for *z*:

$$d(X_t^2) = df(X_t) = 2\sigma_t X_t\,dW_t + \left(2\mu_t X_t + \sigma_t^2\right)dt.$$

Next, we substitute *Y*_{t} for *z*, remembering that now our mean is ν_{t} and variance is ρ_{t}²:

$$d(Y_t^2) = df(Y_t) = 2\rho_t Y_t\,dW_t + \left(2\nu_t Y_t + \rho_t^2\right)dt.$$

Finally, we substitute *X*_{t} + *Y*_{t} for *z*. And as

*d*(*X*_{t}+*Y*_{t}) = (σ_{t}+ ρ_{t})*d**W*_{t}+ (μ_{t}+ ν_{t})*d**t*,

our mean is μ_{t} + ν_{t} and variance is (σ_{t} + ρ_{t})^{2}:

$$d\left((X_t + Y_t)^2\right) = df(X_t + Y_t) = 2(\sigma_t + \rho_t)(X_t + Y_t)\,dW_t + \left(2(\mu_t + \nu_t)(X_t + Y_t) + (\sigma_t + \rho_t)^2\right)dt.$$

Now we substitute the three results into

$$d(X_t Y_t) = \frac{1}{2}\left(d\left((X_t + Y_t)^2\right) - d(X_t^2) - d(Y_t^2)\right)$$

and collect the terms:

*d*(*X*_{t}*Y*_{t}) = σ_{t}*Y*_{t}*d**W*_{t}+ ρ_{t}*X*_{t}*d**W*_{t}+ μ_{t}*Y*_{t}*d**t*+ ν_{t}*X*_{t}*d**t*+ σ_{t}ρ_{t}*d**t*

But

*X*_{t}*d**Y*_{t}+*Y*_{t}*d**X*_{t}= σ_{t}*Y*_{t}*d**W*_{t}+ ρ_{t}*X*_{t}*d**W*_{t}+ μ_{t}*Y*_{t}*d**t*+ ν_{t}*X*_{t}*d**t*

so we have

*d*(*X*_{t}*Y*_{t}) =*X*_{t}*d**Y*_{t}+*Y*_{t}*d**X*_{t}+ σ_{t}ρ_{t}*d**t*

as required.
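Collecting the terms is pure algebra, so it can be sanity-checked by substituting random numbers for every symbol; here *dW* and *dt* are treated as formal placeholders rather than genuine increments.

```python
import random

random.seed(3)
X, Y, s, r, m, nu, dW, dt = (random.uniform(-2.0, 2.0) for _ in range(8))

# The three applications of Ito's formula, as displayed above
# (s, r are the volatilities, m, nu the drifts).
dX2 = 2 * s * X * dW + (2 * m * X + s * s) * dt
dY2 = 2 * r * Y * dW + (2 * nu * Y + r * r) * dt
dXY2 = 2 * (s + r) * (X + Y) * dW + (2 * (m + nu) * (X + Y) + (s + r) ** 2) * dt

lhs = (dXY2 - dX2 - dY2) / 2
rhs = s * Y * dW + r * X * dW + m * Y * dt + nu * X * dt + s * r * dt
print(abs(lhs - rhs) < 1e-9)  # the identity holds for any values
```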

### Page 62, "The final term above is actually..."

*d**X*_{t}*d**Y*_{t} = σ_{t}ρ_{t}*d**t*. This is what we are told, but why? Here is an "explanation".

$$dX_t\,dY_t = (\sigma_t\,dW_t + \mu_t\,dt)(\rho_t\,dW_t + \nu_t\,dt) = \sigma_t\rho_t\,(dW_t)^2 + \mu_t\rho_t\,dW_t\,dt + \sigma_t\nu_t\,dW_t\,dt + \mu_t\nu_t\,(dt)^2.$$

Now let's use the fact (discovered on page 59) that (*d**W*_{t})² = *d**t*, so that *d**W*_{t}*d**t* = (*d**W*_{t})³ and (*d**t*)² = (*d**W*_{t})⁴:

$$dX_t\,dY_t = \sigma_t\rho_t\,dt + \mu_t\rho_t\,(dW_t)^3 + \sigma_t\nu_t\,(dW_t)^3 + \mu_t\nu_t\,(dW_t)^4.$$

On page 59 we are also told that, mysteriously, (*d**W*_{t})³ and all higher powers of *d**W*_{t} are zero. So *d**X*_{t}*d**Y*_{t} = σ_{t}ρ_{t}*d**t*.

### Page 72, Identifying normals

We are told that a random variable *X* is a normal *N*(μ,σ²) under a measure ℙ if and only if

$$\mathbb{E}_{\mathbb{P}}\left[\exp(\theta X)\right] = \exp\left(\theta\mu + \tfrac{1}{2}\theta^2\sigma^2\right)$$

for all real θ.

This is the first time we encounter **moment generating functions** in this book.

These functions are important, so it may be worth revising them at this point.

Recall that a moment generating function is defined as (we shall imply the measure, ℙ, for the time being)

$$M_X(\theta) = \mathbb{E}\left[\exp(\theta X)\right] = \sum_{x} \exp(\theta x)\,p(x)$$

if *X* is discrete with mass function *p*(*x*), and

$$M_X(\theta) = \int_{-\infty}^{\infty} \exp(\theta x)\,f(x)\,dx$$

if *X* is continuous with density *f*(*x*).

Let us explain why these functions are called moment generating. Recall that $\mathbb{E}[X]$ is also called the first moment of the probability distribution of *X*, $\mathbb{E}[X^2]$ is called the second moment, $\mathbb{E}[X^3]$ is called the third moment and so on.

Differentiating under the expectation sign, we get

$$M_X'(\theta) = \frac{d}{d\theta}\,\mathbb{E}\left[\exp(\theta X)\right] = \mathbb{E}\left[X \exp(\theta X)\right].$$

(Here we have assumed that the interchange of the differentiation and expectation operators is legitimate. Indeed, this is "almost always" the case.)

Thus

$$M_X'(0) = \mathbb{E}[X],$$

which is the first moment of the distribution of *X*.

Similarly,

$$M_X''(\theta) = \mathbb{E}\left[X^2 \exp(\theta X)\right].$$

So

$$M_X''(0) = \mathbb{E}[X^2],$$

which is the second moment.

In general,

$$M_X^{(n)}(\theta) = \mathbb{E}\left[X^n \exp(\theta X)\right]$$

and

$$M_X^{(n)}(0) = \mathbb{E}[X^n],$$

which means that the moment generating function enables us to back out each of the moments of the distribution of *X* by taking the appropriate derivative with respect to θ and computing it at zero. Hence the name, "moment generating function".

One of the most important facts about moment generating functions is that a finite moment generating function identifies the probability distribution uniquely. (The proof of this result is quite technical, so we won't give it here.) For example, if we know that

$$M_X(\theta) = \exp\left(\theta\mu + \tfrac{1}{2}\theta^2\sigma^2\right),$$

which we recognise as the moment generating function of *N*(μ,σ²), then we know that the distribution of *X* is indeed *N*(μ,σ²). The authors rely on this result in the discussion that ensues.
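The back-out-the-moments recipe can be tested numerically: differentiate the *N*(μ,σ²) moment generating function at zero by finite differences and compare with $\mathbb{E}[X] = \mu$ and $\mathbb{E}[X^2] = \sigma^2 + \mu^2$. The parameter values are illustrative.

```python
import math

mu, sigma = 1.5, 0.7

def M(t):
    # MGF of N(mu, sigma^2).
    return math.exp(mu * t + 0.5 * sigma * sigma * t * t)

h = 1e-5
first = (M(h) - M(-h)) / (2 * h)              # ~ M'(0)  = E[X]   = mu
second = (M(h) - 2 * M(0) + M(-h)) / (h * h)  # ~ M''(0) = E[X^2] = sigma^2 + mu^2
print(first, second)
```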

### Page 72, "This equals..."

It may not be immediately apparent how the result

$$\mathbb{E}_{\mathbb{P}}\left[\exp\left(\theta W_T - \gamma W_T - \tfrac{1}{2}\gamma^2 T\right)\right] = \exp\left(\tfrac{1}{2}\theta^2 T - \gamma\theta T\right)$$

is derived.

In fact, it is derived using the "Identifying normals" formula (actually it's a moment generating function), which appears on the same page and is later used again, in reverse. Let's go through the derivation step by step:

$$\mathbb{E}_{\mathbb{P}}\left[\exp\left((\theta - \gamma) W_T - \tfrac{1}{2}\gamma^2 T\right)\right] = \exp\left(-\tfrac{1}{2}\gamma^2 T\right)\,\mathbb{E}_{\mathbb{P}}\left[\exp\left((\theta - \gamma) W_T\right)\right],$$

since $\exp(-\tfrac{1}{2}\gamma^2 T)$ is a constant,

$$\mathbb{E}_{\mathbb{P}}\left[\exp\left((\theta - \gamma) W_T\right)\right] = \exp\left((\theta - \gamma)\mu + \tfrac{1}{2}(\theta - \gamma)^2\sigma^2\right)$$

by applying the "Identifying normals" formula but replacing θ in that formula with θ − γ.

But *W*_{T} ∼ *N*(0,*T*), so μ = 0, σ² = *T* and

$$\exp\left(-\tfrac{1}{2}\gamma^2 T\right)\exp\left(\tfrac{1}{2}(\theta - \gamma)^2 T\right) = \exp\left(\tfrac{1}{2}\theta^2 T - \gamma\theta T\right),$$

as required.

### Page 78, "Martingale representation theorem"

The representation

$$N_t = N_0 + \int_0^t \phi_s\,dM_s$$

becomes

*d**N*_{t}= ϕ_{t}*d**M*_{t}

in differential notation. It is important to understand that the process ϕ_{t} is "simply the ratio of volatilities" of *M*_{t} and *N*_{t}.

### Page 79, Exercise 3.11

"σ_{t} is a function of both time and sample path" means that σ_{t} is really σ(*t*,ω). When we say that it is a *bounded* function, we mean that there is a constant *K* such that for all *t* and for all ω we have |σ(*t*,ω)| ≤ *K*.

### Page 85, "But of course Itô makes it possible to write down the SDE..."

Let us justify

$$dS_t = S_t\left(\sigma\,dW_t + \left(\mu + \tfrac{1}{2}\sigma^2\right)dt\right).$$

The notation becomes a little bit clumsy because we have reserved *X* for the value of the claim. In Itô's formula (page 59) we use *X*_{t} for the original process and *Y*_{t} = *f*(*X*_{t}) for the transformed process.

On page 85, *Y*_{t} = log(*S*_{t}) = σ*W*_{t} + μ*t* is our original process and *S*_{t} = *f*(*Y*_{t}) = exp(*Y*_{t}) is our transformed process. Now *f*(*Y*_{t}) = *f*'(*Y*_{t}) = *f*''(*Y*_{t}) = exp(*Y*_{t}) = *S*_{t}, which is very nice indeed.

Time to apply Itô's formula:

$$dS_t = df(Y_t) = \sigma f'(Y_t)\,dW_t + \left(\mu f'(Y_t) + \tfrac{1}{2}\sigma^2 f''(Y_t)\right)dt = S_t\left(\sigma\,dW_t + \left(\mu + \tfrac{1}{2}\sigma^2\right)dt\right).$$

Here σ_{t} = σ is our volatility for *Y*_{t}, which is constant in the Black-Scholes model (so the subscript disappears). Similarly, μ_{t} = μ is our drift for *Y*_{t}, which is also constant in the Black-Scholes model (so the subscript disappears).

### Page 85, "the first thing to do is to kill the drift"

We need to "kill the drift" for *d**S*_{t}. That's the coefficient of *d**t* in

$$dS_t = S_t\left(\sigma\,dW_t + \left(\mu + \tfrac{1}{2}\sigma^2\right)dt\right).$$

In other words, we need to kill μ + ½σ². We can apply the Cameron-Martin-Girsanov (C-M-G) theorem to alter the drift of *W*_{t} — by changing the measure. Note that we are playing with *W*_{t}, not with *S*_{t} itself.

What sort of drift do we need?

Well, (iii) in C-M-G can be written as

$$d\tilde{W}_t = dW_t + \gamma_t\,dt.$$

Thus C-M-G will replace our *d**W*_{t} with $d\tilde{W}_t - \gamma_t\,dt$ (and change the measure from ℙ to ℚ).

So we'll have

$$dS_t = S_t\left(\sigma\,d\tilde{W}_t + \left(\mu + \tfrac{1}{2}\sigma^2 - \sigma\gamma_t\right)dt\right);$$

of course we want

$$dS_t = \sigma S_t\,d\tilde{W}_t$$

(no drift), so our γ_{t} should be set to

$$\gamma_t = \frac{\mu + \tfrac{1}{2}\sigma^2}{\sigma},$$

just as the book suggests.

We just need one final justification: γ_{t} is constant, as in our model μ and σ are both constant (note that they don't have the *t* subscript). Therefore

$$\mathbb{E}_{\mathbb{P}}\left[\exp\left(\tfrac{1}{2}\int_0^T \gamma_t^2\,dt\right)\right] = \exp\left(\tfrac{1}{2}\gamma^2 T\right) < \infty.$$

Thus the boundedness condition of C-M-G is satisfied and we can apply this theorem. In other words, we have our

$$dS_t = \sigma S_t\,d\tilde{W}_t$$

under the new measure ℚ.

Note that we didn't have to write down the Radon-Nikodym derivative *d*ℚ/*d*ℙ explicitly. Great!

### Page 85, "The exponential martingales box (section 3.5) contains a condition..."

Let's check this condition. In our case we have *d**S*_{t} = σ*S*_{t}*d*W̃_{t}, so we need

$$\mathbb{E}\left[\exp\left(\tfrac{1}{2}\int_0^T \sigma^2\,dt\right)\right] < \infty.$$

But σ is merely a constant, so

$$\mathbb{E}\left[\exp\left(\tfrac{1}{2}\int_0^T \sigma^2\,dt\right)\right] = \exp\left(\tfrac{1}{2}\sigma^2 T\right) < \infty,$$

as required. The condition is satisfied and *S*_{t} is a ℚ-martingale. Consequently, ℚ is the martingale measure for *S*_{t}.
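With the drift killed, *S*_{t} should have constant expectation under the new measure. A Monte Carlo sketch, assuming (as above) the constant-σ solution *S*_{t} = *S*_{0}exp(σ*W̃*_{t} − ½σ²*t*) and illustrative parameter values:

```python
import math, random

random.seed(4)
S0, sigma, t = 1.0, 0.3, 2.0
n = 200_000

# Draw W_t ~ N(0, t) and average S_t = S0 * exp(sigma * W_t - sigma^2 t / 2).
mean_St = sum(
    S0 * math.exp(sigma * random.gauss(0.0, math.sqrt(t)) - 0.5 * sigma * sigma * t)
    for _ in range(n)
) / n
print(abs(mean_St - S0) < 0.01)  # martingale property: E[S_t] = S_0
```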

### Page 100, "Consider the following static replicating strategy..."

This example of static replication is quite simple. Because of the ambiguous notation it is also somewhat confusing.

We are told that at time *t* we have some units of sterling cash bonds and some units of dollar cash bonds. However, *t* does not feature; it's *T* that appears in the formulae giving the units. Does this mean that the value of our portfolio is constant? No.

Here is what's happening. We want to have 1 unit of sterling cash bonds at time *T*. This means that we had to purchase exp(−*u**T*) units of sterling cash bonds at time 0.

We could only purchase them for *C*_{0}exp(−*u**T*) units of dollar cash, which we would have to borrow, which would cost us. So at time 0 we have a negative position in dollar cash bonds, *C*_{0}exp(−*u**T*).

At time *t* our position in sterling cash bonds is exp(−*u*(*T* − *t*)) and in dollar cash bonds it is *C*_{0}exp(−*u**T*)exp(*r**t*).

At time *T*, this gives us exp(−*u*(*T* − *T*)) = 1 unit of sterling cash bonds (by construction) and *C*_{0}exp(−*u**T*)exp(*r**T*) = *C*_{0}exp((*r* − *u*)*T*) units of dollar cash bonds, the quantity that is defined as the **forward price**. This is effectively the exchange rate that we have manufactured using our static replication technique.
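The replication arithmetic is easy to verify with numbers; the rates and spot rate below are assumptions for illustration (*u* the sterling rate, *r* the dollar rate, *C*_{0} the spot exchange rate).

```python
import math

u, r, C0, T = 0.04, 0.06, 1.25, 2.0

sterling_units_at_0 = math.exp(-u * T)    # grows to exactly 1 sterling bond at T
dollar_cost_at_0 = C0 * math.exp(-u * T)  # borrowed at time 0 to buy the sterling
dollar_owed_at_T = dollar_cost_at_0 * math.exp(r * T)

forward = C0 * math.exp((r - u) * T)      # the manufactured forward price
print(abs(dollar_owed_at_T - forward) < 1e-9)
```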

## *The Concepts and Practice of Mathematical Finance* by Mark Joshi

### Page 14, Exercise 1.2

In this Exercise, the die is rolled once (not once for each of the six assets). The die is rolled, the number on the die is recorded, then each asset's index is checked to see whether it matches this number and the asset pays off accordingly. In this way, exactly one of the six assets will pay off 1, the rest will pay off 0. Thus, for the sum of the assets, the payoff is deterministically 1.

The point of this exercise is to give an example of diversifiable risk.

The resource InvestorWords gives the following definition of **diversifiable risk**: "The risk of price change due to the unique circumstances of a specific security, as opposed to the overall market. The risk can be virtually eliminated from a portfolio through diversification. Also called **unsystematic risk**."

In this exercise the risk was eliminated completely.
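A one-line simulation confirms the point: whatever the roll, exactly one asset pays off, so the portfolio payoff is certain.

```python
import random

random.seed(5)
for _ in range(5):
    roll = random.randint(1, 6)
    # Asset i pays 1 precisely when the die shows i.
    payoffs = [1 if i == roll else 0 for i in range(1, 7)]
    assert sum(payoffs) == 1  # deterministic: the risk diversifies away
print("portfolio payoff is 1 on every roll")
```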

### Page 94, Equation 5.3

*I would like to thank Dr. Paul Davis for clarifying this to me.*

In Section 5.3 (Stochastic Processes), the following equation appears on page 94:

Unfortunately it isn't immediately apparent from the explanation how this equation is derived (at least it wasn't immediately apparent to me, which certainly doesn't mean much).

Here

What is happening? Nothing extraordinary. The algebra is trivial once the logic behind the derivation is understood. I shall derive (5.3) carefully (perhaps too carefully!), step by step.

We have written

$$\mu(r) = \mu(t) + e(r)$$

and

$$\sigma^2(r) = \sigma^2(t) + f(r).$$

Thus we have expressed the mean (variance) at *r* in terms of the mean (variance) at *t* plus the error term.

In the first equation on page 94 we were assuming that σ is identically zero:

**We are no longer assuming that.** Let us reinstate our stochastic component, as on page 90 (remember μ + σ*N*(0,1)):

Now let us substitute our expressions for μ(*r*) and σ^{2}(*r*):

Hence

The factor in square brackets is, of course, *g*(*t*,*h*), and we are home.

- This page was last modified on 14 May 2011, at 12:57.