Integration by substitution: the ultimate fudge?

I have often worked with students on the topic of integration by substitution. This isn’t much of a surprise – it’s a fiddly topic with plenty of room for error and, conceptually, it contains challenging ideas. The thing that interests me, though, is that the challenges faced by students on this topic are never the same. In part, this is because there are many different ways of teaching the topic in the first place – and most of them involve some sort of fudge.

If you’re not familiar with the term, a fudge is when something is presented in a vague way to avoid an underlying issue, perhaps because the author feels the truth is too complicated for the reader – but sometimes because the truth is too complicated for the author and they lack the skills to explain it! In maths, a fudge might occur when a student knows where to start and where they should finish, but has to gloss over a step in the middle. Or it may occur when an answer is obtained which is obviously wrong – for example an inequality is pointing in the wrong direction – but finding the source of the error proves too tricky. In these cases, a student might fudge the matter by correcting the answer and hoping that nobody notices the gap in the logic en route.

Another common fudge in mathematics is the abuse of notation, where valid notation is manipulated in a non-standard way in order to obtain a conclusion. Some abuses of notation are straight-up errors, but others are justifiable either as an appropriate shorthand or because they rely on an idea which is true but which is yet to be developed in a particular context. Consider the following example:

Evaluate  \int_0^3 x(2x+1)^3~\mathrm{d}x.

You may have been taught any of several ways to solve this – some not involving substitution at all – but the most obvious approach is to use the substitution  u=2x+1. You might therefore write:

 u=2x+1  so   \dfrac{\mathrm{d}u}{\mathrm{d}t} = 2 \dfrac{\mathrm{d}x}{\mathrm{d}t}  and therefore   \frac12\mathrm{d}u=\mathrm{d}x.

 u=2x+1  so   x=\dfrac{u-1}2.
When   x=0,u=1. When  x=3, u=7.

 Hence  \displaystyle{\int_0^3\! x(2x+1)^3~\mathrm{d}x = \int_1^7\frac{u-1}2\times u^3~\frac12\mathrm{d}u = \frac14\int_1^7 \!\left(u^4-u^3\right)~\mathrm{d}u = \frac14\left[\frac15 u^5 -\frac14 u^4\right]_1^7,}

so  \displaystyle\int_0^3 x(2x+1)^3~\mathrm{d}x = 690.3

And actually, if a student wrote this, I’d be pretty happy. But the first line is problematic. Where did the  t come from? And then where did the  \mathrm{d}t vanish to? It makes it look like  \frac{\mathrm{d}x}{\mathrm{d}t} is a fraction – but most students are taught (correctly) that derivatives aren’t fractions and that you can’t just split them into numerators and denominators… so why make an exception now?
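Whatever we make of that first line, the final answer can be checked independently. The sketch below is a minimal check in Python, not part of the method itself: it evaluates the integral in  u exactly, then evaluates the original integral by expanding  x(2x+1)^3 into a polynomial, using exact fractions in both cases. The function names are my own, chosen for illustration.

```python
from fractions import Fraction


def integral_in_u(lower, upper):
    """Exact value of (1/4) * [u^5/5 - u^4/4] between the given u-limits."""
    def antiderivative(u):
        u = Fraction(u)
        return u**5 / 5 - u**4 / 4
    return Fraction(1, 4) * (antiderivative(upper) - antiderivative(lower))


def integral_in_x(lower, upper):
    """Exact value using the expansion x(2x+1)^3 = 8x^4 + 12x^3 + 6x^2 + x."""
    def antiderivative(x):
        x = Fraction(x)
        return Fraction(8, 5) * x**5 + 3 * x**4 + 2 * x**3 + x**2 / 2
    return antiderivative(upper) - antiderivative(lower)


via_substitution = integral_in_u(1, 7)   # limits u = 1 and u = 7
via_expansion = integral_in_x(0, 3)      # limits x = 0 and x = 3
print(via_substitution, via_expansion, float(via_substitution))
```

Both routes give 6903/10, that is 690.3, so the substitution has produced the right value even though one step of the working was fudged.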

A formal derivation

For many students, an informal understanding of how this works is enough. But for those looking to pursue mathematics at a higher level, a more thorough understanding can be beneficial.

So: consider functions  \mathrm{F}, \mathrm{f} and  \mathrm{u} such that  \mathrm{F}^\prime(x)=\mathrm{f}(x), and values  x_1, x_2, u_1 and  u_2 such that  \mathrm{u}(x_i) = u_i.

Since  \mathrm{f} is the derivative of  \mathrm{F}, it follows from the fundamental theorem of calculus that

 \displaystyle{\int_a^b\! \mathrm{f}(t) ~\mathrm{d} t = \left[\mathrm{F}(t)\right]_a^b = \mathrm{F}(b)-\mathrm{F}(a)}

whatever letter we use for the variable of integration.

Specifically, we note that  \displaystyle\int_{u_1}^{u_2}\! \mathrm{f}(u) ~\mathrm{d}u = \mathrm{F}(u_2)-\mathrm{F}(u_1).

However, since  \mathrm{u} is a function of  x, we can apply the chain rule to  \mathrm{F}(\mathrm{u}(x)). This shows us that

 \frac{\mathrm{d}}{\mathrm{d}x} \mathrm{F} (\mathrm{u} (x) ) = \mathrm{f}(\mathrm{u}(x)) \frac{\mathrm{d}u}{\mathrm{d}x}.

Applying the fundamental theorem of calculus again, it follows that

 \displaystyle{\int_{x_1}^{x_2} \mathrm{f}(\mathrm{u}(x)) \frac{\mathrm{d}u}{\mathrm{d}x} ~\mathrm{d}x = \left[\mathrm{F} (\mathrm{u} (x) ) \right]_{x_1}^{x_2} = \mathrm{F}(\mathrm{u}(x_2))-\mathrm{F}(\mathrm{u}(x_1))}.

Finally, recall that  \mathrm{u}(x_1) = u_1 and  \mathrm{u}(x_2) = u_2. We can substitute these into the previous result to show that

 \displaystyle{\int_{x_1}^{x_2} \mathrm{f}(\mathrm{u}(x)) \frac{\mathrm{d}u}{\mathrm{d}x} ~\mathrm{d}x = \mathrm{F}(u_2)-\mathrm{F}(u_1)}.

We have therefore formed two integrals, each of which is equal to  \mathrm{F}(u_2)-\mathrm{F}(u_1). Since both equal the same quantity, they must equal one another. Therefore,

 \displaystyle{\int_{x_1}^{x_2} \mathrm{f}(\mathrm{u}(x)) \frac{\mathrm{d}u}{\mathrm{d}x} ~\mathrm{d}x = \int_{u_1}^{u_2} \mathrm{f}(u) ~\mathrm{d}u}.
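The identity above can be spot-checked numerically. The following is a rough sketch rather than a proof: it assumes a particular choice of  \mathrm{f} and  \mathrm{u} (cosine and  x^2, picked by me purely for illustration) and approximates both sides with a hand-written composite Simpson's rule.

```python
import math


def simpson(g, a, b, n=1000):
    """Composite Simpson's rule for the integral of g over [a, b]; n must be even."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


# Illustrative choices (assumptions, not taken from the derivation itself):
f = math.cos                  # f(u), whose antiderivative F is sin
u = lambda x: x * x           # u(x) = x^2
du_dx = lambda x: 2 * x       # du/dx
x1, x2 = 0.0, 2.0
u1, u2 = u(x1), u(x2)         # u_i = u(x_i)

left = simpson(lambda x: f(u(x)) * du_dx(x), x1, x2)   # integral with respect to x
right = simpson(f, u1, u2)                             # integral with respect to u
exact = math.sin(u2) - math.sin(u1)                    # F(u_2) - F(u_1)
print(left, right, exact)
```

The three printed values should agree to many decimal places. Agreement for one choice of functions proves nothing in general, of course, but it is a useful way to catch a slip in the limits or a missing factor of  \frac{\mathrm{d}u}{\mathrm{d}x}.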

This result is the core of integration by substitution, and students can use it directly should they wish. The formula shows us that a complicated integral (the left-hand side of the equation) can be replaced with a simpler one (the right-hand side), which should be easier to work with. Let’s return to our original example:

Evaluate  \displaystyle{\int_0^3 x(2x+1)^3~\mathrm{d}x}.

We will again use the substitution  u=2x+1, so  \frac{\mathrm{d}u}{\mathrm{d}x} = 2 . Halving both sides gives us that  \frac{1}{2}\frac{\mathrm{d}u}{\mathrm{d}x} = 1 .

Forming an expression with value 1 is important for the next stage: because 1 is the multiplicative identity, we can multiply the integrand by this expression without changing the value of the integral.

Therefore,

 \displaystyle{\int_0^3 \!x(2x+1)^3~\mathrm{d}x = \int_0^3 \!x(2x+1)^3 \frac{1}{2}\frac{\mathrm{d}u}{\mathrm{d}x}~\mathrm{d}x}

and by the result proved earlier, we can write

 \displaystyle{\int_0^3 \! x(2x+1)^3 \frac{1}{2}\frac{\mathrm{d}u}{\mathrm{d}x}~\mathrm{d}x = \int_1^7 \!\frac{u-1}2 \times u^3 \times \frac{1}{2}~\mathrm{d}u},

which we can integrate as before to obtain 690.3.

So why do it this way? Is it essential to learn the technique of integration by substitution rigorously, or is it enough to have an informal understanding of why it works? Really, that depends on the student. Whilst some will be perfectly happy to apply methods without fully deriving them, others prefer to know exactly where the rules have come from.

The product rule

In fact, there are plenty of other techniques and methods that teachers tend to state when they could instead be derived – many of them to do with calculus. For instance, the product rule for differentiation, which states that

 \frac {\mathrm{d}} {\mathrm{d}x} (uv) = u v^\prime + v u^\prime,

can be easily derived if students have already covered implicit differentiation and the differentiation of the natural logarithm – both quite approachable topics. Start by letting  y = uv, where  y,  u and  v are all functions of  x. Taking the natural logarithm of both sides enables the product to be written as a sum, and this is the key to finding its derivative, since

 \frac {\mathrm{d}} {\mathrm{d}x} (f+g) = \frac {\mathrm{d}f} {\mathrm{d}x} +\frac {\mathrm{d} g} {\mathrm{d}x}.

So,

 y = uv \Rightarrow \ln y = \ln u + \ln v

and so, differentiating implicitly, we note that

 \frac{1}{y} y^\prime = \frac{1}{u} u^\prime + \frac{1}{v} v^\prime.

Multiplying through by  y gives

 y^\prime = \frac{y}{u} u^\prime + \frac{y}{v} v^\prime,

and, recalling that  y = uv, we can conclude that

 (uv)^\prime = v u^\prime + u v^\prime,

 or, alternatively,

 \frac {\mathrm{d}} {\mathrm{d}x} (uv) = u \frac {\mathrm{d}v} {\mathrm{d}x} + v \frac {\mathrm{d}u} {\mathrm{d}x}.
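For anyone who wants to see the derivation in action, here is a small numerical sketch. It assumes two example functions of my own choosing, both positive so that  \ln u and  \ln v are defined, and compares three things at a single point: a direct finite-difference estimate of  y^\prime, the intermediate form  y^\prime = \frac{y}{u} u^\prime + \frac{y}{v} v^\prime, and the final product rule.

```python
import math


def derivative(g, x, h=1e-5):
    """Central-difference estimate of g'(x)."""
    return (g(x + h) - g(x - h)) / (2 * h)


# Illustrative functions (assumptions): both are positive everywhere,
# so taking logarithms as in the derivation is legitimate.
u = lambda x: x**2 + 1
v = lambda x: math.exp(x)
y = lambda x: u(x) * v(x)

x0 = 0.7
du, dv = derivative(u, x0), derivative(v, x0)

numeric = derivative(y, x0)                          # y' estimated directly
log_form = y(x0) * (du / u(x0) + dv / v(x0))         # y' = (y/u) u' + (y/v) v'
product_rule = u(x0) * dv + v(x0) * du               # y' = u v' + v u'
print(numeric, log_form, product_rule)
```

The three values agree up to the small error of the finite-difference estimate, which is what the algebra predicts.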

What are the advantages of learning to derive the formula? It’s certainly not something which most students would ever need to do in an examination. However, working through the derivation gives students the opportunity to practise other skills that they do need to master – differentiating implicitly; applying the laws of logarithms; differentiating the logarithmic function – in addition to being introduced to new strategies such as converting a product to a sum. For students going on to tackle more advanced problems in the future, having a broader range of strategies at their disposal will make them more versatile and more likely to succeed in particularly difficult circumstances such as Olympiad or STEP problems.

The quotient rule

I have also found that students who have become familiar with the idea of deriving formulae for themselves are more likely to look for strategies to derive other results. Take, for example, the quotient rule. Once a student has learnt to derive the product rule, they might consider applying a similar strategy to derive the quotient rule. Starting by letting

 y = \dfrac u v,

a student might observe that

 \ln y = \ln u - \ln v

and so, differentiating implicitly, find similarly that

 \dfrac{1}{y} y^\prime = \dfrac{1}{u} u^\prime - \dfrac{1}{v} v^\prime.

Substituting   y = \frac u v  yields

 \dfrac{v}{u} y^\prime = \dfrac{1}{u} u^\prime - \dfrac{1}{v} v^\prime,

and this is then easily rearranged to give the familiar form of the quotient rule:

 \dfrac {\mathrm{d}} {\mathrm{d}x} \left( \dfrac u v \right) = \dfrac{v u^\prime - u v^\prime}{v^2}.
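For completeness, the rearrangement can be written out as follows. This is just one way to set it out: multiply both sides of the previous equation by  \frac u v and put the result over a common denominator. The align* environment assumes the amsmath package.

```latex
\begin{align*}
y^\prime &= \frac{u}{v}\left(\frac{1}{u} u^\prime - \frac{1}{v} v^\prime\right) \\
         &= \frac{u^\prime}{v} - \frac{u v^\prime}{v^2} \\
         &= \frac{v u^\prime - u v^\prime}{v^2}.
\end{align*}
```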

Students who have developed an appetite for derivation, however, might also consider applying the product rule to  y = u v^{-1} in conjunction with the chain rule, which also enables a derivation of the quotient rule. The best students will then ask themselves which approach they prefer, and why. Is one easier? More elegant? Faster?
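A sketch of that second route, for comparison, might look like this. The layout and annotations are mine, and the align* environment again assumes amsmath; the only ingredients are the product rule and the chain rule applied to  v^{-1}.

```latex
\begin{align*}
y &= u v^{-1} \\
y^\prime &= u^\prime v^{-1} + u \left(v^{-1}\right)^\prime
  && \text{(product rule)} \\
&= u^\prime v^{-1} - u v^{-2} v^\prime
  && \text{(chain rule: } \left(v^{-1}\right)^\prime = -v^{-2} v^\prime \text{)} \\
&= \frac{v u^\prime - u v^\prime}{v^2}.
\end{align*}
```

This version needs no logarithms, so it does not depend on  u and  v being positive, which is one point a student might weigh when comparing the two approaches.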

So why derive?

There are lots of contexts in which we don’t need to know why something works in order to use it. I don’t need to know how an engine works to ride a bus, or even to drive a car; I don’t need to understand microwaves to heat up my noodles safely. In each case, I just need to know the circumstances under which the given tool can be used safely. The same goes for integration by substitution: provided I understand clearly the circumstances under which a certain short-cut can be used, I can apply taught principles – even if they are a bit fudgey – to obtain correct results. But anybody aiming for high performance in a particular field needs to understand more than the basics of operation. Learning exactly where rules come from, and precisely why they work, enables the development of expertise. This in turn creates more flexible thinkers, able to adapt rules to suit new situations and to develop new rules when the circumstances demand. And since derivation often provides an opportunity to practise other important skills, it has tangible value of its own as well.

 

Copyright © 2022 Warp Drive Tutors, Inc.