Welcome back! So far, we've explored partial derivatives, $f_x$ and $f_y$, which tell us the rate of change of a function in the specific directions of the coordinate axes. But what if we're standing on a mountain and want to know the slope in the exact direction we're facing, say, northeast? Today, we develop the tools to answer that question. We will generalize the derivative to find the rate of change in any direction.
Imagine you are a hiker standing on a mountainside. The surface of the mountain is the graph of the function $z=f(x,y)$.
In the graph below, move the sliders for $a$ and $b$ to change the direction vector $\mathbf{u}=\langle a,b \rangle$ and see how the steepness of the tangent line on the surface changes.
The directional derivative measures the rate of change of $f$ per unit of distance traveled. To make this a standardized measure, we must use a unit vector $\mathbf{u}$. A unit vector has a magnitude of 1, so it represents a pure direction. If we used a vector of length 2 instead, our answer would be twice as large: it would reflect the length of the vector we happened to choose, not the instantaneous rate of change per unit distance at the point.
Conceptually, the directional derivative is defined by a limit that mirrors the definition of the derivative from Calculus I. It measures the change in $f$ as we move a small distance $h$ in the direction of a unit vector $\mathbf{u} = \langle a, b \rangle$, in the limit as $h \to 0$:
$$D_{\mathbf{u}}f(x,y) = \lim_{h \to 0} \frac{f(x+ha,\, y+hb) - f(x,y)}{h}$$
While the limit definition is the formal foundation, we can derive a simpler computational formula. To do this, hold the point $(x,y)$ fixed and define a function of one variable, $g(h) = f(x+ha, y+hb)$. The directional derivative is then simply $g'(0)$. Applying the multivariable chain rule from our last lecture to $g(h)$, with inner functions $x+ha$ and $y+hb$ whose derivatives with respect to $h$ are $a$ and $b$, we get $g'(h) = f_x \cdot a + f_y \cdot b$, where the partial derivatives are evaluated at $(x+ha, y+hb)$. Setting $h=0$ gives $g'(0) = f_x(x,y)a + f_y(x,y)b$, which is our computational formula.
If $f$ is a differentiable function of $x$ and $y$, then the directional derivative of $f$ in the direction of the unit vector $\mathbf{u} = \langle a, b \rangle$ is:
$$D_{\mathbf{u}}f(x,y) = f_x(x,y)a + f_y(x,y)b$$
Find the directional derivative of $f(x,y) = x^2 y^3 - 4y$ at the point $(2, -1)$ in the direction of the vector $\mathbf{v} = \langle 2, 5 \rangle$.
Step 1: Normalize the direction vector. The vector $\mathbf{v}$ is not a unit vector. We find its magnitude: $|\mathbf{v}| = \sqrt{2^2 + 5^2} = \sqrt{4+25} = \sqrt{29}$.
The unit vector $\mathbf{u}$ is $\mathbf{u} = \frac{\mathbf{v}}{|\mathbf{v}|} = \left\langle \frac{2}{\sqrt{29}}, \frac{5}{\sqrt{29}} \right\rangle$. So, $a = \frac{2}{\sqrt{29}}$ and $b = \frac{5}{\sqrt{29}}$.
Step 2: Find the partial derivatives.
$f_x(x,y) = 2xy^3$
$f_y(x,y) = 3x^2y^2 - 4$
Step 3: Evaluate the partial derivatives at the point $(2, -1)$.
$f_x(2,-1) = 2(2)(-1)^3 = -4$
$f_y(2,-1) = 3(2)^2(-1)^2 - 4 = 3(4)(1) - 4 = 12 - 4 = 8$
Step 4: Use the formula.
$$D_{\mathbf{u}}f(2,-1) = f_x(2,-1)a + f_y(2,-1)b = (-4)\left(\frac{2}{\sqrt{29}}\right) + (8)\left(\frac{5}{\sqrt{29}}\right)$$
$$= \frac{-8}{\sqrt{29}} + \frac{40}{\sqrt{29}} = \frac{32}{\sqrt{29}}$$
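The result of this example can be checked numerically. The following is a minimal sketch in plain Python, not part of the lecture itself: the helper names (`unit`, `directional_derivative_formula`, `directional_derivative_limit`) and the step size `h` are illustrative assumptions. It compares the formula $f_x a + f_y b$ against a central-difference approximation of the limit definition.

```python
import math

def f(x, y):
    # Function from the worked example: f(x, y) = x^2 y^3 - 4y
    return x**2 * y**3 - 4 * y

def unit(v):
    # Scale v to length 1 so that it represents a pure direction
    norm = math.hypot(v[0], v[1])
    return (v[0] / norm, v[1] / norm)

def directional_derivative_formula(fx, fy, u):
    # D_u f = f_x * a + f_y * b for a unit vector u = <a, b>
    return fx * u[0] + fy * u[1]

def directional_derivative_limit(func, point, u, h=1e-6):
    # Central-difference approximation of the limit definition
    x, y = point
    a, b = u
    return (func(x + h * a, y + h * b) - func(x - h * a, y - h * b)) / (2 * h)

u = unit((2, 5))
exact = directional_derivative_formula(-4, 8, u)  # partials evaluated at (2, -1)
approx = directional_derivative_limit(f, (2, -1), u)
print(exact, approx, 32 / math.sqrt(29))
```

All three printed values should agree to several decimal places. Passing the unnormalized vector $\langle 2, 5 \rangle$ to the formula instead would scale the answer by $\sqrt{29}$, which is why Step 1 matters.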
Find the directional derivative of $g(x,y) = xe^y$ at the point $(2,0)$ in the direction of $\mathbf{v} = \langle 3, -4 \rangle$.
The formula for the directional derivative, $f_x a + f_y b$, looks like a dot product. This is no coincidence! We can "package" the partial derivatives of a function into a special new vector called the gradient.
The gradient of a function $f$, denoted $\nabla f$ (pronounced "del f"), is a vector field that contains all the first-order partial derivative information of $f$.
For $f(x,y)$, the gradient is a two-dimensional vector: $\nabla f(x,y) = \langle f_x(x,y), f_y(x,y) \rangle$.
For $f(x,y,z)$, the gradient is: $\nabla f(x,y,z) = \langle f_x(x,y,z), f_y(x,y,z), f_z(x,y,z) \rangle$.
The directional derivative can now be written concisely as the dot product of the gradient and the unit direction vector:
$$D_{\mathbf{u}}f = \nabla f \cdot \mathbf{u}$$
This dot product has a beautiful geometric interpretation. Recall from our study of vectors that the scalar projection of a vector $\mathbf{a}$ onto a vector $\mathbf{b}$ is given by $\frac{\mathbf{a} \cdot \mathbf{b}}{|\mathbf{b}|}$. This number tells us the signed magnitude of the "shadow" that $\mathbf{a}$ casts on $\mathbf{b}$.
In our case, the directional derivative is precisely the scalar projection of the gradient vector $\nabla f$ onto the direction vector $\mathbf{u}$. Since $\mathbf{u}$ is a unit vector, its magnitude $|\mathbf{u}|$ is 1. The formula thus simplifies beautifully:
$$\text{scalar projection of } \nabla f \text{ onto } \mathbf{u} = \frac{\nabla f \cdot \mathbf{u}}{|\mathbf{u}|} = \frac{\nabla f \cdot \mathbf{u}}{1} = \nabla f \cdot \mathbf{u}$$
This scalar (a number) represents the "steepness" of the mountain heading in the direction $\mathbf{u}$, or more formally, the instantaneous rate of change of the function in the direction of $\mathbf{u}$.
Find the gradient of $f(x,y,z) = x\sin(yz)$ at the point $(1, 3, 0)$.
First, we find the partial derivatives:
$f_x = \sin(yz)$
$f_y = xz\cos(yz)$
$f_z = xy\cos(yz)$
Now, we evaluate each partial derivative at the point $(1,3,0)$:
$f_x(1,3,0) = \sin(3 \cdot 0) = \sin(0) = 0$
$f_y(1,3,0) = (1)(0)\cos(0) = 0$
$f_z(1,3,0) = (1)(3)\cos(0) = 3$
The gradient vector at this point is $\nabla f(1,3,0) = \langle 0, 0, 3 \rangle$.
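As a sanity check on this example, the gradient can be approximated with central differences, one coordinate at a time. This is a sketch under stated assumptions: `numerical_gradient` is a hypothetical helper written for illustration, and the step size `h` is an arbitrary small value.

```python
import math

def numerical_gradient(func, point, h=1e-6):
    # Central difference in each coordinate direction, one at a time
    grad = []
    for i in range(len(point)):
        forward = list(point)
        backward = list(point)
        forward[i] += h
        backward[i] -= h
        grad.append((func(*forward) - func(*backward)) / (2 * h))
    return grad

def f(x, y, z):
    # Function from the worked example: f(x, y, z) = x sin(yz)
    return x * math.sin(y * z)

grad = numerical_gradient(f, (1.0, 3.0, 0.0))
print(grad)
```

The printed components should be close to $\langle 0, 0, 3 \rangle$, matching the hand computation.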
Find the gradient of $g(x,y,z) = z^2 e^{xy}$ at the point $(0,1,2)$.
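The gradient is far more than a notational shortcut. Using the alternate formula for the dot product, $D_{\mathbf{u}}f = |\nabla f| |\mathbf{u}| \cos\theta$, and knowing $|\mathbf{u}|=1$, we get:
$$D_{\mathbf{u}}f = |\nabla f| \cos\theta$$
where $\theta$ is the angle between the gradient vector $\nabla f$ and the direction vector $\mathbf{u}$. This simple equation reveals three crucial properties:
1. Maximum increase: $D_{\mathbf{u}}f$ is largest when $\cos\theta = 1$ ($\theta = 0$), that is, when $\mathbf{u}$ points in the direction of $\nabla f$. The maximum rate of change is $|\nabla f|$.
2. Maximum decrease: $D_{\mathbf{u}}f$ is smallest when $\cos\theta = -1$ ($\theta = \pi$), that is, when $\mathbf{u}$ points in the direction of $-\nabla f$. The rate of change is then $-|\nabla f|$.
3. Zero change: $D_{\mathbf{u}}f = 0$ when $\cos\theta = 0$ ($\theta = \pi/2$), that is, when $\mathbf{u}$ is perpendicular to $\nabla f$.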
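That last point is huge: if you move in a direction perpendicular to the gradient, the function's instantaneous rate of change is zero. A level curve (or surface) is exactly a path along which the function's value stays constant, so its tangent direction must be perpendicular to the gradient. This means the gradient vector at a point is always normal (perpendicular) to the level curve (or surface) passing through that point.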
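The dependence of the directional derivative on the angle to the gradient can be explored numerically. The sketch below is illustrative only: it reuses the gradient $\langle -4, 8 \rangle$ of $f(x,y) = x^2y^3 - 4y$ at $(2,-1)$ from the earlier example, and the names (`directional_derivative`, `along`, `against`, `across`) are assumptions made for this demonstration.

```python
import math

grad = (-4.0, 8.0)  # gradient of f(x, y) = x^2 y^3 - 4y at (2, -1)
grad_norm = math.hypot(*grad)

def directional_derivative(angle):
    # u = <cos(angle), sin(angle)> is a unit vector for every angle
    u = (math.cos(angle), math.sin(angle))
    return grad[0] * u[0] + grad[1] * u[1]

# Angle of the gradient itself, measured from the positive x-axis
grad_angle = math.atan2(grad[1], grad[0])

along = directional_derivative(grad_angle)                 # theta = 0
against = directional_derivative(grad_angle + math.pi)     # theta = pi
across = directional_derivative(grad_angle + math.pi / 2)  # theta = pi/2

# Sample 360 directions (one per degree) and record the largest rate found
best = max(directional_derivative(math.radians(d)) for d in range(360))
print(along, against, across, best, grad_norm)
```

Here `along` should equal $|\nabla f|$, `against` should equal $-|\nabla f|$, `across` should be zero up to rounding, and no sampled direction should beat `along`.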
It is important to clarify what "moving in the direction of the gradient" means. For a function $z=f(x,y)$, the gradient $\nabla f$ is a 2D vector that lies in the xy-plane (like a compass direction on a map). "Going in the direction of the gradient" means walking on the 3D surface (the mountain) in such a way that your shadow on the xy-plane (the map) moves in the direction of $\nabla f$.
In the graph below, unclick the first equation for the surface to see the relationship in the xy-plane. The gradient (red) points in the direction of steepest ascent, and it is orthogonal to the tangent vector (black) of the level curve (blue).
The concept of "steepest descent" is the foundation of many modern optimization algorithms. In machine learning, a "cost function" measures how inaccurate a model's predictions are. This function is a high-dimensional surface, and the goal is to find its lowest point. The gradient descent algorithm does this by starting at a random point on the surface and repeatedly taking a small step in the direction of the negative gradient, $-\nabla f$. By always moving in the direction of steepest descent, it "walks down the hill" toward a minimum of the cost function, thereby improving the model. In general this is a local minimum, not necessarily the lowest point overall, and the steps must be small enough that the walk does not overshoot.
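The idea can be sketched in a few lines of Python. This is a toy illustration, not a production optimizer: the cost function $f(x,y) = x^2 + 4y^2$, the fixed starting point (used in place of a random one so the run is reproducible), the step size, and the number of steps are all assumptions chosen for the example.

```python
def grad_f(x, y):
    # Gradient of the illustrative cost f(x, y) = x^2 + 4y^2
    return (2 * x, 8 * y)

def gradient_descent(start, step_size=0.1, steps=100):
    x, y = start
    for _ in range(steps):
        gx, gy = grad_f(x, y)
        # Take a small step in the direction of the negative gradient
        x -= step_size * gx
        y -= step_size * gy
    return x, y

x_min, y_min = gradient_descent((2.0, -1.0))
print(x_min, y_min)
```

Both coordinates should end up very close to 0, the minimum of this particular function. With a much larger step size (for example 0.3) the $y$-coordinate would overshoot and grow at every step, which is why the step is kept small.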
Here is a visualization of the gradient descent algorithm we just discussed, which repeatedly takes a step in the direction of the negative gradient to descend to a minimum.
Let $f(x,y) = x^2 + 4y^2$. Find the direction of maximum increase and the value of this maximum rate of change at the point $P(2, -1)$.
The direction of maximum increase is simply the direction of the gradient vector. First, we compute the gradient:
$\nabla f = \langle f_x, f_y \rangle = \langle 2x, 8y \rangle$.
Next, evaluate the gradient at the point $P(2, -1)$:
$\nabla f(2, -1) = \langle 2(2), 8(-1) \rangle = \langle 4, -8 \rangle$.
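The direction of maximum increase is $\langle 4, -8 \rangle$, or, written as a unit vector, $\left\langle \frac{1}{\sqrt{5}}, -\frac{2}{\sqrt{5}} \right\rangle$.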
The maximum rate of change is the magnitude of this gradient vector:
$|\nabla f(2,-1)| = \sqrt{4^2 + (-8)^2} = \sqrt{16 + 64} = \sqrt{80} = 4\sqrt{5}$.
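As a quick check, we can compute the directional derivative in the gradient's own direction and confirm that it equals the magnitude found above. Taking $\mathbf{u} = \nabla f(2,-1) / |\nabla f(2,-1)|$:

$$D_{\mathbf{u}}f(2,-1) = \nabla f(2,-1) \cdot \mathbf{u} = \frac{\langle 4, -8 \rangle \cdot \langle 4, -8 \rangle}{4\sqrt{5}} = \frac{16 + 64}{4\sqrt{5}} = \frac{80}{4\sqrt{5}} = 4\sqrt{5}$$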
For the function $f(x,y) = \sin(xy)$, find the direction of steepest descent at the point $(\pi, 1/2)$.
Recall that for a function $F(x,y,z)$, a level surface is the set of all points $(x,y,z)$ where the function has a constant value, i.e., $F(x,y,z) = k$. For example, the level surfaces of $F=x^2+y^2+z^2$ are concentric spheres.
Just as the 2D gradient is normal to level curves, the 3D gradient $\nabla F$ at a point $P(x_0,y_0,z_0)$ is normal (perpendicular) to the level surface that passes through $P$. This gives us an easy way to find the tangent plane to a surface.
The equation of the tangent plane to the level surface $F(x,y,z)=k$ at the point $P(x_0,y_0,z_0)$ is:
$$F_x(P)(x-x_0) + F_y(P)(y-y_0) + F_z(P)(z-z_0) = 0$$
The normal vector to this plane is the gradient $\nabla F(P)$. The normal line is the line that passes through $P$ in the direction of this normal vector.
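For reference, the normal line can be written in parametric form. Assuming $\nabla F(P) \neq \mathbf{0}$ and using $t$ as the parameter (a symbol introduced here for illustration), one standard way to write it is:

$$x = x_0 + tF_x(P), \quad y = y_0 + tF_y(P), \quad z = z_0 + tF_z(P)$$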
Find the equation of the tangent plane to the ellipsoid $x^2 + 4y^2 + z^2 = 18$ at the point $(1, 2, 1)$.
The ellipsoid is a level surface of the function $F(x,y,z) = x^2 + 4y^2 + z^2$ for the constant $k=18$. The normal vector to the tangent plane is the gradient of $F$.
$\nabla F = \langle 2x, 8y, 2z \rangle$.
Evaluate the gradient at the point $(1,2,1)$ to get the specific normal vector:
$\nabla F(1,2,1) = \langle 2(1), 8(2), 2(1) \rangle = \langle 2, 16, 2 \rangle$.
Using the point $(x_0,y_0,z_0)=(1,2,1)$ and the normal vector $\langle 2, 16, 2 \rangle$, the equation of the plane is:
$2(x-1) + 16(y-2) + 2(z-1) = 0$
$2x - 2 + 16y - 32 + 2z - 2 = 0$
$2x + 16y + 2z = 36$, or more simply, $x + 8y + z = 18$.
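The steps of this example translate directly into code. The following is a minimal sketch, with `tangent_plane` and `grad_F` as hypothetical helper names chosen for illustration: it expands $\nabla F(P) \cdot \langle x-x_0, y-y_0, z-z_0 \rangle = 0$ into the form $Ax + By + Cz = D$.

```python
def tangent_plane(grad_at_p, p):
    # Plane n . (r - p) = 0 rewritten as A x + B y + C z = D, with n = grad F(P)
    a, b, c = grad_at_p
    d = a * p[0] + b * p[1] + c * p[2]
    return a, b, c, d

def grad_F(x, y, z):
    # Gradient of F(x, y, z) = x^2 + 4y^2 + z^2
    return (2 * x, 8 * y, 2 * z)

p = (1, 2, 1)
plane = tangent_plane(grad_F(*p), p)
print(plane)  # (2, 16, 2, 36)
```

The tuple corresponds to $2x + 16y + 2z = 36$, which reduces to $x + 8y + z = 18$ after dividing by 2.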
Find the equation of the tangent plane to the paraboloid $z = x^2 + y^2$ at the point $(1, 2, 5)$.
The gradient is one of the most important concepts in multivariable calculus. This table summarizes its key properties and formulas.
| Property/Concept | Formula / Description |
|---|---|
| Definition | $\nabla f = \langle f_x, f_y, f_z \rangle$ |
| Directional Derivative | $D_{\mathbf{u}}f = \nabla f \cdot \mathbf{u}$ |
| Direction of Max Increase | The direction of the gradient vector, $\nabla f$. |
| Max Rate of Increase | The magnitude of the gradient vector, $|\nabla f|$. |
| Direction of Max Decrease | The direction opposite the gradient vector, $-\nabla f$. |
| Orthogonality | $\nabla f$ is perpendicular to the level curves/surfaces of $f$. |
| Tangent Plane Normal | $\nabla F$ is the normal vector to the level surface $F(x,y,z)=k$. |
Today we moved beyond the limitations of partial derivatives. The directional derivative lets us find the rate of change in any direction, and the gradient vector is the key to calculating it. More importantly, the gradient itself tells us the direction of steepest ascent and is always normal to level curves, making it a powerful tool for understanding the geometry of multivariable functions.
The temperature at a point $(x,y)$ on a metal plate is given by $T(x,y) = 400 e^{-(x^2+y)/2}$. An ant at $(1,1)$ wants to walk in the direction in which it will cool off the fastest. In what direction should it walk?
Find the equations of the tangent plane and the normal line to the hyperboloid $x^2 + y^2 - z^2 = 1$ at the point $P(1, 1, 1)$.