Why Taylor Series Work
Taylor series allow us to very accurately approximate certain infinitely differentiable functions while only having knowledge of their derivatives at a single point. This post deals with understanding how this is possible.
Only a basic familiarity with calculus is assumed, but the motivation to understand the mathematics is essential. Formally, you should know about sequences, series, and the general properties and applications of derivatives.
All of the proofs in this post are presented as a set of exercises. The idea is to guide you through each proof and encourage a better understanding of the underlying motivation and concepts. Solutions and hints are provided in case you get stuck.
Let's say you have an approximation $p$ of $\pi$ so that $|\pi - p| < 1$. That is, suppose we have a pretty close approximation of $\pi$, like $p = 3$ or $p = 3.14$. Let $n$ be the number of decimal places of $\pi$ that $p$ correctly approximates. For example, taking $p = 3.14$ would mean $n = 2$.
Then $p + \sin(p)$ approximates $\pi$ to $3n$ decimal places. That means that $p + \sin(p)$ is at least three times better than $p$ at approximating $\pi$. To see this, we can start with $p = 3.1$. Then $p + \sin(p) = 3.1 + \sin(3.1) \approx 3.14158066243$. Repeating this once more, we have $3.14158066243 + \sin(3.14158066243) \approx 3.14159265359$. After repeating this only twice, we started with a number which approximates $\pi$ to 1 decimal place of accuracy and received a number which approximates $\pi$ to 11 decimal places of accuracy. Why does this happen?
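You can watch this happen numerically. Here is a short Python sketch (my own illustration) that iterates $p \mapsto p + \sin(p)$ and prints the error at each step:

```python
import math

# Iterate the map p -> p + sin(p), printing how far each iterate is from pi.
# Note: double-precision floats cap the accuracy at roughly 16 digits.
p = 3.1
for step in range(3):
    print(f"step {step}: p = {p:.15f}, error = {abs(math.pi - p):.3e}")
    p += math.sin(p)
print(f"final:  p = {p:.15f}, error = {abs(math.pi - p):.3e}")
```

Each step roughly triples the number of correct digits until floating-point precision takes over.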
Like most things in math, this is no coincidence, and the tools provided by calculus allow you to see precisely why this "magic trick" works so well.
The first challenge is determining the role of $\sin$. Clearly it has something to do with the approximations getting closer to $\pi$, but it is not so clear what this relationship is. What we'll try to do is simplify this problem so that instead of studying functions like $\sin$ or $\cos$, we study polynomials.
This is because polynomials are generally easier to work with: they are nicer to differentiate, it is easy to study their roots, and it is easier to describe how they transform a given input (i.e. describing what $x^2$ does to $x$ is easier than describing what $\sin(x)$ does to $x$).
Our new goal will be to write the aforementioned functions (and many more) as polynomials. It turns out that if a function is "nice" enough (we'll figure out what "nice" means later), we can represent it as an infinite series. This infinite series is called the Taylor series of $f$, and its partial sums are polynomials. To understand this, we first need to understand what an "infinite series" and a "partial sum" are.
Let $(a_n)_{n \ge 1}$ be a sequence. We can then define another sequence $(s_n)_{n \ge 1}$ by $s_n = \sum_{k=1}^{n} a_k$. Then we define the infinite series $\sum_{k=1}^{\infty} a_k$ by $$\sum_{k=1}^{\infty} a_k = \lim_{n \to \infty} \sum_{k=1}^{n} a_k = \lim_{n \to \infty} s_n.$$ For a particular positive integer $k$, $s_k$ is called the $k$th partial sum of the series.
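As a quick illustration (a sketch of my own; the name `partial_sum` is just illustrative), partial sums can be computed directly:

```python
def partial_sum(a, n):
    """Compute the n-th partial sum s_n = a(1) + a(2) + ... + a(n)."""
    return sum(a(k) for k in range(1, n + 1))

# The geometric series with a_k = 1/2^k converges to 1,
# and its partial sums approach that value.
for n in [1, 5, 10, 20]:
    print(n, partial_sum(lambda k: 0.5 ** k, n))
```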
If the sequence $(s_n)$ diverges (i.e. does not converge), then $\sum_{k=1}^{\infty} a_k$ is said to diverge and does not represent any value.
There are some properties of series that are important for us to note. The first is that not every infinite series converges. For example, $\sum_{n=1}^{\infty} n$ diverges.
Another, less obvious note is that even if $a_n \to 0$ (read "$a_n$ converges to zero"), $\sum_{n=1}^{\infty} a_n$ does not necessarily converge. For example, choosing $a_n = \frac{1}{n}$ yields what is called the harmonic series: $$\sum_{n=1}^{\infty} \frac{1}{n}.$$ This series diverges, and various proofs of this are well known. However, $a_n$ does converge to zero.
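You can see this slow divergence numerically. The sketch below (my own illustration) shows the partial sums of the harmonic series creeping upward like $\ln(n)$ even as the terms shrink to zero:

```python
import math

# Partial sums of the harmonic series grow without bound,
# roughly like ln(n), even though the terms 1/n tend to 0.
for n in [10, 1_000, 100_000]:
    s = sum(1.0 / k for k in range(1, n + 1))
    print(f"n = {n:>6}: s_n = {s:.4f}, ln(n) = {math.log(n):.4f}")
```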
We now know enough about series to study the Taylor series.
Suppose $f$ is defined on $(a, b)$ and has derivatives of all orders at some point $c$ with $a < c < b$. Then the Taylor series of $f$ centered at $c$ is a function defined by $$T_c(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(c)}{n!} (x - c)^n.$$ Note that $T_c(x)$ need not converge, meaning that $T_c$ need not be defined for all values in the domain of $f$.
We also define $$P_n(x) = \sum_{k=0}^{n} \frac{f^{(k)}(c)}{k!} (x - c)^k$$ to denote the first $n + 1$ terms of the Taylor series. $P_n$ is called the $n$th Taylor polynomial of $f$ centered at $c$. It is a polynomial of degree (at most) $n$. The interesting part about the Taylor series is that for certain functions $f$ and certain values of $x$, $T_c(x) = f(x)$. To see why this happens, we need to study Taylor's theorem.
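To make the definition concrete, here is a minimal sketch (the name `taylor_poly` is my own, purely illustrative) that evaluates $P_n(x)$ from a list of derivative values at the center:

```python
import math

def taylor_poly(derivs, c, x):
    """Evaluate P_n(x) = sum of f^(k)(c)/k! * (x - c)^k for k = 0..n,
    where derivs = [f(c), f'(c), ..., f^(n)(c)]."""
    return sum(d / math.factorial(k) * (x - c) ** k
               for k, d in enumerate(derivs))

# P_3 for sin centered at c = 0: the derivatives at 0 are 0, 1, 0, -1.
print(taylor_poly([0, 1, 0, -1], 0.0, 0.5))  # ~0.479167
print(math.sin(0.5))                          # ~0.479426
```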
Taylor's theorem gives us a way to estimate how closely $P_n$ approximates the original function. To quantify this "closeness," we define the remainder $R_n(x)$ by $$R_n(x) = f(x) - P_n(x) = f(x) - \sum_{k=0}^{n} \frac{f^{(k)}(c)}{k!} (x - c)^k.$$ Some simple algebraic manipulation shows us exactly what this remainder means, as well as why it is called the remainder: $$f(x) = P_n(x) + R_n(x).$$ Taylor's theorem gives us a formula for the remainder, which will later prove to be useful in studying the Taylor series of a function.
Taylor's Theorem: Let $n$ be a positive integer. Suppose $f$ is a function defined on $(a, b)$ (we allow $a = -\infty$ and/or $b = \infty$) which is differentiable $n + 1$ times on $(a, b)$. Then for any real numbers $a < c < x < b$, there exists some $c < z < x$ so that $$f(x) = \sum_{k=0}^{n} \frac{f^{(k)}(c)}{k!} (x - c)^k + \frac{f^{(n+1)}(z)}{(n+1)!} (x - c)^{n+1}.$$
Exercise 1: Give an example of a function which satisfies the hypothesis of Taylor's theorem on some interval $(a, b)$.
Since $\sin$ has derivatives of all orders on $(-\infty, \infty)$, it satisfies the hypothesis of Taylor's theorem.
Exercise 2: Explain how the theorem tells us that for a function $f$ satisfying the hypothesis above, $$R_n(x) = \frac{f^{(n+1)}(z)}{(n+1)!} (x - c)^{n+1}$$ for some $c < z < x$.
Observe that $$\frac{f^{(n+1)}(z)}{(n+1)!} (x - c)^{n+1} = f(x) - \sum_{k=0}^{n} \frac{f^{(k)}(c)}{k!} (x - c)^k = f(x) - P_n(x) = R_n(x).$$ The first equality is given by Taylor's theorem.
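To see that such a $z$ really exists, here is a small numerical sketch (my own, using $f = \exp$ with $c = 0$, $x = 1$, $n = 2$) that recovers the $z$ promised by the theorem:

```python
import math

# For f = exp, c = 0, x = 1, n = 2, solve the Lagrange remainder
# formula R_n(x) = exp(z) * (x - c)^(n+1) / (n+1)! for z.
c, x, n = 0.0, 1.0, 2
p_n = sum(math.exp(c) / math.factorial(k) * (x - c) ** k for k in range(n + 1))
remainder = math.exp(x) - p_n
implied = remainder * math.factorial(n + 1) / (x - c) ** (n + 1)
z = math.log(implied)  # invert f^(n+1) = exp
print(z, c < z < x)    # ~0.2698, True
```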
The proof of Taylor's theorem relies on Rolle's Theorem (see below). If you want to prove Rolle's theorem yourself, you can see a similar set of guiding exercises in my other post.
Rolle's Theorem: If a function $h$ is continuous on $[a, b]$ and differentiable on $(a, b)$ with $h(a) = h(b)$, then there exists at least one $p$ in $(a, b)$ such that $h'(p) = 0$.
Our method of proof will be to observe that there exists an $M$ which satisfies $$f(x) = P_n(x) + \frac{M}{(n+1)!} (x - c)^{n+1}.$$ Now we just need to show that $M = f^{(n+1)}(z)$ for some $c < z < x$.
Exercise 3: Explain why such an M exists.
Solving the equation above for $M$ gives $$M = \frac{(n+1)! \, \bigl(f(x) - P_n(x)\bigr)}{(x - c)^{n+1}},$$ which is well defined since $x \neq c$. So such an $M$ exists for any particular $x$ and $c$.
Let's define a function $g$ on $(a, b)$ by $$g(t) = P_n(t) + \frac{M}{(n+1)!} (t - c)^{n+1} - f(t).$$ We will first try to prove some properties of $g$. Soon, the reason for defining $g$ in this way will become clear.
Exercise 4: Is g differentiable n+1 times on (a,b)? Explain.
The answer is yes. Can you explain why? Try to use the fact that g is a sum of a polynomial and a function which is differentiable n+1 times.
Polynomials have all derivatives on (a,b) and f is differentiable n+1 times on (a,b). Since g is a sum of functions differentiable n+1 times on (a,b), g is itself differentiable n+1 times.
Exercise 5: Let $j$ be an integer such that $0 \le j \le n$. Show that $g^{(j)}(c) = 0$ for all such $j$.
Differentiation can be done term by term, meaning $$g^{(j)}(c) = P_n^{(j)}(c) + \frac{d^j}{dt^j} \left( \frac{M}{(n+1)!} (t - c)^{n+1} \right) \bigg|_{t=c} - f^{(j)}(c).$$
Since $c - c = 0$ and $j < n + 1$, $$\frac{d^j}{dt^j} \left( \frac{M}{(n+1)!} (t - c)^{n+1} \right) \bigg|_{t=c} = \frac{M}{(n+1-j)!} (c - c)^{n+1-j} = 0.$$
Using the fact that $P_n$ is a polynomial, we see that $$P_n^{(j)}(c) = \sum_{k=j}^{n} \frac{f^{(k)}(c)}{(k-j)!} (c - c)^{k-j} = f^{(j)}(c),$$ since only the $k = j$ term survives.
Read in order, the three hints above constitute the full solution.
Exercise 6: Compute $g^{(n+1)}$.
Recall that we defined $g$ by $g(t) = P_n(t) + \frac{M}{(n+1)!} (t - c)^{n+1} - f(t)$. Since $P_n(t)$ is a polynomial of degree at most $n$, $P_n^{(n+1)}(t) = 0$. Similarly, $\frac{d^{n+1}}{dt^{n+1}} \left( \frac{M}{(n+1)!} (t - c)^{n+1} \right) = M$. Hence, $g^{(n+1)}(t) = M - f^{(n+1)}(t)$.
Exercise 7: Show that there exists $c < x_1 < x$ so that $g'(x_1) = 0$.
Does g satisfy the conditions for Rolle's theorem on [c,x]?
Show that g satisfies the conditions for Rolle's theorem on [c,x] by showing that:
- g is continuous on [c,x]
- g is differentiable on (c,x)
- g(c)=g(x)
Since g is differentiable n+1 times on (a,b) by Exercise 4, it is continuous on (a,b). Then g is continuous on [c,x] and differentiable on (c,x) since both [c,x] and (c,x) are contained in (a,b). Moreover, g(c)=g(x)=0 by Exercise 5 and the definition of M.
Thus $g$ satisfies the conditions for Rolle's theorem on $[c, x]$, meaning there exists $c < x_1 < x$ so that $g'(x_1) = 0$.
Exercise 8: Show that there exists $c < x_{n+1} < x$ such that $g^{(n+1)}(x_{n+1}) = 0$. This exercise is the meat of the proof, so make sure you understand it. For this reason, plenty of hints are provided.
Use a technique similar to what was used in Exercise 7.
Observe that $g'(x_1) = g'(c) = 0$.
Apply Rolle's theorem to $g'$ on $[c, x_1]$ (where $x_1$ is defined as in Exercise 7) to find $c < x_2 < x_1 < x$ so that $g''(x_2) = 0$.
Recall $x_2$ from the previous hint. Can we apply Rolle's theorem to $g''$ on $[c, x_2]$?
Recall what was concluded in Exercise 5. This should help you show that every $g^{(j)}$ (where $0 \le j \le n$) satisfies the conditions of Rolle's theorem on some subinterval of $[c, x]$.
Recall $x_1$ from Exercise 7 and observe that $g^{(j)}(c) = 0$ for $0 \le j \le n$.
Since $g'(x_1) = g'(c) = 0$ and $[c, x_1]$ is contained in $[c, x]$, there exists $c < x_2 < x_1 < x$ such that $g''(x_2) = 0$ by Rolle's theorem. By repeating this process, we find some $c < x_{n+1} < \cdots < x_1 < x$ so that $g^{(n+1)}(x_{n+1}) = 0$.
Exercise 9: Use Exercises 6 and 8 to complete the proof of Taylor's theorem.
By Exercise 6, $g^{(n+1)}(t) = M - f^{(n+1)}(t)$. Setting $z = x_{n+1}$ from Exercise 8 results in $$g^{(n+1)}(z) = M - f^{(n+1)}(z) = 0 \iff M = f^{(n+1)}(z),$$ as desired. □
Note that the $\iff$ symbol means "if and only if." As used above, it shows that the two equations are algebraically equivalent to each other.
Exercise 10: Explain why Taylor's theorem also covers the case where $c > x$ (when the center is bigger than $x$) even though we assumed that $a < c < x < b$.
Since $x$ and $c$ are arbitrary, we can choose $x$ as the center, and the theorem gives us an equivalent result for $f(c)$. Thus, simply "switching" $x$ and $c$ suffices when $x < c$.
We have now shown that the remainder $R_n(x)$ can be written as $$R_n(x) = \frac{f^{(n+1)}(z)}{(n+1)!} (x - c)^{n+1}$$ for some $z$ between $x$ and $c$. The above expression is called the Lagrange form of the remainder.
Recall that for a function $f$ which is infinitely differentiable at some point $c$, we denoted the Taylor series of $f$ centered at $c$ by $T_c$. Although it was hinted that $T_c(x)$ equals $f(x)$ for certain values of $x$, it is not yet clear when this happens. This is precisely what Taylor's theorem allows us to study.
First, note that by the definition of an infinite series, $T_c(x) = \lim_{n \to \infty} P_n(x)$. Now suppose $\lim_{n \to \infty} R_n(x) = 0$. Then $$\lim_{n \to \infty} \bigl[ f(x) - P_n(x) \bigr] = 0 \iff \lim_{n \to \infty} P_n(x) = f(x) \iff T_c(x) = f(x).$$
So, we can show that $T_c(x) = f(x)$ by showing that $R_n(x) \to 0$. To do this, we often consider the Lagrange error bound of the function, which follows intuitively from Taylor's theorem. Specifically, pick any $B \ge |f^{(n+1)}(t)|$ for all $t$ between $c$ and $x$. Then $$|R_n(x)| \le \frac{B}{(n+1)!} |x - c|^{n+1}.$$
Let's try applying this to the function $\sin$. Let $x$ and $c$ be real numbers. Since $|\sin|$ and $|\cos|$ are both bounded by 1, we see that $$|R_n(x)| \le \frac{|x - c|^{n+1}}{(n+1)!}.$$ Since $\lim_{n \to \infty} \frac{|x - c|^{n+1}}{(n+1)!} = 0$ (the proof of this is beyond the scope of this post, but it can be concluded from Stirling's formula), $\lim_{n \to \infty} R_n(x) = \lim_{n \to \infty} |R_n(x)| = 0$.
Note: For a proof which does not involve Stirling's formula, you can see the last slide of my presentation on the irrationality of π.
Thus, we have shown that the Taylor series $T_c(x)$ for $\sin$ converges to $\sin(x)$ for all $x$, irrespective of the choice of center $c$.
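Here is a numerical sketch of this convergence (my own illustration), comparing the actual remainder $|R_n(x)|$ for $\sin$ centered at $c = 0$ against the Lagrange error bound with $B = 1$:

```python
import math

# The derivatives of sin at 0 cycle through 0, 1, 0, -1 with period 4.
x, c = 2.0, 0.0
derivs = [0.0, 1.0, 0.0, -1.0]
for n in range(1, 16, 2):
    p_n = sum(derivs[k % 4] / math.factorial(k) * (x - c) ** k
              for k in range(n + 1))
    actual = abs(math.sin(x) - p_n)
    bound = abs(x - c) ** (n + 1) / math.factorial(n + 1)
    print(f"n = {n:2}: |R_n(x)| = {actual:.2e} <= bound {bound:.2e}")
```

Both columns shrink to zero as $n$ grows, with the actual remainder always under the bound.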
We now have the tools we need to make sense of the π approximation trick from before. To start, we'll formally write this "trick" in mathematical terms.
Theorem: Let $p$ be a real number so that $|\pi - p| < 10^{-j}$ for some positive integer $j$. Then $|\pi - (p + \sin(p))| < 10^{-3j}$.
You will once again prove this theorem through a series of exercises. Before proceeding, make sure you understand how the above theorem formalizes the math trick presented in the first section.
Exercise 11: Write $P_4(p)$ centered at $\pi$ for the function $\sin$.
First, observe that $\sin(\pi) = 0$, $\sin^{(1)}(\pi) = -1$, $\sin^{(2)}(\pi) = 0$, $\sin^{(3)}(\pi) = 1$, and $\sin^{(4)}(\pi) = 0$. Then we write $$P_4(p) = \pi - p + \frac{1}{6} (p - \pi)^3.$$
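As a quick sanity check (my own, not part of the solution), this cubic really does hug $\sin$ near $\pi$:

```python
import math

# P_4(p) = (pi - p) + (1/6)(p - pi)^3 should closely match sin(p) near pi.
p = 3.1
p4 = (math.pi - p) + (p - math.pi) ** 3 / 6
print(p4, math.sin(p))  # both ~0.0415807
```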
Exercise 12: Use the Lagrange error bound to find an upper bound for $|R_4(p)|$.
Since $\cos$ is bounded by 1, we may write $$|R_4(p)| \le \frac{|p - \pi|^5}{120}.$$
Exercise 13: Show that $|\pi - (p + \sin(p))| < 10^{-3j}$. You may use the triangle inequality, which states that $|a + b| \le |a| + |b|$ for all real numbers $a, b$.
Substitute $P_4(p) + R_4(p)$ for $\sin(p)$.
Substituting $P_4(p) + R_4(p)$ for $\sin(p)$ yields $$\pi - (p + \sin(p)) = \pi - p - \sin(p) = -\frac{1}{6} (p - \pi)^3 - R_4(p) = -\left( \frac{1}{6} (p - \pi)^3 + R_4(p) \right).$$ Invoking the triangle inequality and Exercise 12, we have that $$|\pi - (p + \sin(p))| \le \frac{1}{6} |p - \pi|^3 + \frac{1}{120} |p - \pi|^5 < \frac{1}{6} 10^{-3j} + \frac{1}{120} 10^{-5j}.$$ Since $\frac{1}{120} 10^{-5j} \le \frac{5}{6} 10^{-3j}$, the right-hand side is at most $10^{-3j}$, and the proof is complete.
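Finally, a short sketch (my own) that checks the theorem numerically for a few values of $j$; double-precision floats limit the check to small $j$:

```python
import math

# If |pi - p| < 10^-j, the theorem promises |pi - (p + sin(p))| < 10^-3j.
for j in [1, 2, 3, 4]:
    p = round(math.pi, j)  # an approximation accurate to about j decimal places
    err = abs(math.pi - (p + math.sin(p)))
    print(f"j = {j}: error = {err:.2e}, within 10^-{3 * j}: {err < 10 ** (-3 * j)}")
```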
Last Updated: December 2022