5
$\begingroup$

I've just moved on from single-variable to multivariable calculus and am having trouble understanding the gradient, as I'm trying to draw a comparison between single-variable and multivariable derivatives.

Why is the gradient a vector? Let's take an example: a scalar function with 2 inputs, i.e. $z= F(x,y) = x^2 - y^2$. I do understand that the gradient is the vector of partial derivatives of $z$ (or $F(x,y)$). However, why do we need to write it as a vector? If we write it as a vector, we are also assigning it a direction.

And to me, this direction does not seem intuitive, as the partial derivatives ($\partial_x z, \partial_y z$) tell us the rate of change of $z$ with respect to $x$ or $y$ ($2x$ and $-2y$ in the example above).

So, if we plug in some values of $x$ and $y$ (let's say 2 and 3), we end up with 2 scalar values that tell us the rate of change of $z$. In our example, that would be $4$ (with respect to $x$) and $-6$ (with respect to $y$). Using these we can get a net change in $z$.

And to take this forward, these scalar values could be multiplied with the components of any unit vector to get the directional derivative (or a net change in $z$). I'm just not sure why the gradient is a vector with a direction.

Also, the proof that the gradient is the direction of steepest ascent/descent (the dot product is largest when 2 vectors point in the same direction) seems to depend on this fact (the gradient being a vector).

The source I'm using is Khan Academy's Multivariable calculus.

$\endgroup$
2
  • 1
    $\begingroup$ The basic idea is that the length/norm of the gradient is the maximum rate of change of $z(x,y)$ at the point $(x,y)$. It also turns out that the direction of the maximum rate of change is also the direction in which the gradient points. For those two reasons, it is nice to think of the gradient as a vector. Then plotting the gradient of a scalar function as a vector field shows which direction is "uphill". $\endgroup$ Commented Feb 14, 2022 at 19:10
  • 1
    $\begingroup$ Differentiability means linear approximation at a point. The "gradient" is the vector representation of the linear transformation in this approximation. There are some geometrical motivations that lead one to think of the gradient as a "direction of maximal increase" (this is a good intuition, albeit not a mathematical theorem). This is the direction you were missing. $\endgroup$
    – William M.
    Commented Feb 14, 2022 at 19:46

4 Answers

6
$\begingroup$

Simple explanation:

You have the directional derivative of a scalar function defined as:

$$ D_v f(x) = \lim_{\epsilon \to 0} \frac{f(x+\epsilon v)-f(x)}{\epsilon}$$

Intuitively, this measures how much the function changes, per unit distance, when you move a small distance $\epsilon$ from the original point $x$ (a point in the input plane) in the direction of $v$.

Question: Does there exist a vector, denoted $\nabla F$, such that $\nabla F \cdot v= D_v F$ for every direction $v$ (writing $F$ for the function $f$ above)? Let's suppose that it exists; what would its components be? Well... in the Cartesian basis (2D) we have:

$$ \nabla F = a \hat{i} + b \hat{j}$$

where $a,b$ are some functions. Suppose we dot both sides with $\hat{i}$; then:

$$D_{\hat{i}} F= \nabla F \cdot \hat{i} = a$$

But the directional derivative in the $\hat{i}$ direction is just $\frac{\partial F}{\partial x}$, meaning that:

$$ a= \frac{\partial F}{\partial x}$$

Dotting with $\hat{j}$ gives $b = \frac{\partial F}{\partial y}$ in the same way. This specifies all the components of the gradient, and hence the vector.
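The relation between the directional derivative and the partial derivatives can be checked numerically. The following is only a minimal sketch: it assumes the question's example $F(x,y)=x^2-y^2$ at the point $(2,3)$, and the helper name `directional_derivative`, the step size, and the 45-degree direction are illustrative choices rather than part of the answer.

```python
import math

def F(x, y):
    # Example function taken from the question: F(x, y) = x^2 - y^2
    return x**2 - y**2

def directional_derivative(f, point, v, eps=1e-6):
    # Finite-difference approximation of the limit definition:
    # (f(p + eps*v) - f(p)) / eps for a small eps
    x, y = point
    vx, vy = v
    return (f(x + eps * vx, y + eps * vy) - f(x, y)) / eps

point = (2.0, 3.0)
v = (math.cos(math.pi / 4), math.sin(math.pi / 4))  # unit vector at 45 degrees

approx = directional_derivative(F, point, v)

# Partial derivatives of F at (2, 3): dF/dx = 2x = 4, dF/dy = -2y = -6
grad = (4.0, -6.0)
from_gradient = grad[0] * v[0] + grad[1] * v[1]

print(approx, from_gradient)  # the two numbers should agree closely
```

Trying other unit vectors `v` should give the same agreement, which is the sense in which one vector of partial derivatives encodes every directional derivative.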

Now for the steepest ascent: suppose we want to know the unit direction $\hat{v}$ in which the function $F$ increases the fastest. Recall the defining property of the gradient: $$D_{\hat{v} } F = \nabla F \cdot \hat{v}$$

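At a given point $\nabla F$ is fixed, and the length of $\hat{v}$ is fixed at 1, so $\nabla F \cdot \hat{v} = |\nabla F| \cos\theta$ and the only thing we can vary is the angle $\theta$ between the vectors. The dot product is maximized when $\cos\theta = 1$, that is, when the vectors are parallel, meaning $\hat{v}$ points in the same direction as $\nabla F$.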

In other words, $\nabla F$ points in the direction in which the function increases the fastest.
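A brute-force check of this claim is to sample many unit directions and see which one gives the largest directional derivative. This is a rough sketch, assuming the question's example $F(x,y)=x^2-y^2$ at $(2,3)$, where the gradient is $(4,-6)$; the sampling resolution `steps` is an arbitrary illustrative value.

```python
import math

# Gradient of the question's example F(x, y) = x^2 - y^2 at the point (2, 3)
grad = (4.0, -6.0)

best_angle, best_rate = None, -math.inf
steps = 3600  # sample unit vectors every 0.1 degree (illustrative resolution)
for k in range(steps):
    angle = 2 * math.pi * k / steps
    v = (math.cos(angle), math.sin(angle))
    rate = grad[0] * v[0] + grad[1] * v[1]  # directional derivative grad . v
    if rate > best_rate:
        best_angle, best_rate = angle, rate

# Angle and length of the gradient itself, for comparison
grad_angle = math.atan2(grad[1], grad[0]) % (2 * math.pi)
grad_norm = math.hypot(grad[0], grad[1])

print(best_angle, grad_angle)  # the two angles should nearly coincide
print(best_rate, grad_norm)    # the best rate should be close to |grad F|
```

The best sampled direction lines up with the gradient to within the sampling resolution, and the best rate matches the gradient's length.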

$\endgroup$
3
$\begingroup$

There are some nice answers already. However, I wanted to tackle the main essence of the question from a different perspective. Indeed, the gradient is related to the "steepest ascent direction" and is used to compute directional derivatives as well. However, as the OP rightfully asks, why is it that combining the two partial derivatives gives us "a vector with direction"?

This all boils down to the question "What is a vector?" so I'll start from there. Suppose we are working in a 2D space. If I pick 2 arbitrary quantities and put them in an array, would that be a vector? For example, I have two scalars $m,h$ for my mass and my height. If I write them as $(m,h)$ would that be a vector? We can ask the same question for $\left(\frac{\partial f(x,y)}{\partial x},\frac{\partial f(x,y)}{\partial y}\right)$ by combining the scalars $\frac{\partial f(x,y)}{\partial x},\frac{\partial f(x,y)}{\partial y}$.

Since the OP's background is just single-variable calculus, I'll keep things simple. In 2D, you can picture generic vectors as arrows drawn on a blank sheet. As you know, a vector (an arrow) $\vec{A}$ has a magnitude and a direction.

[Figure: a vector $\vec{A}$ drawn as an arrow]

You can scale them as $\alpha \vec{A}$ with any real number $\alpha$ to make them larger or smaller. Also, you can add them as $\vec{A}+\vec{B}$ using the classical parallelogram rule. Finally, you have the special vector $\vec{0}$, which is the point from which you draw all other vectors, such that $\vec{A}+\vec{0}=\vec{A}$. Rules like these are what we call the "axioms of a vector space": the rules by which we define vectors to work.

Note that we haven't talked about the components of the vector. Up to now, vectors are just arrows, and do not seem to have anything to do with arrays of numbers. However, very often you can pick what we call a coordinate basis, coordinate system, or coordinate axes. Instead of just drawing the vector, you might want to draw these coordinate axes in 2 arbitrary non-parallel directions, for example by drawing two perpendicular rays which we might call the coordinate axes $x,y$:

[Figure: the vector $\vec{A}$ with perpendicular coordinate axes $x,y$]

Once you pick these axes, you can obtain the components $A_x,A_y$ of the vector $\vec{A}$ as the projections of the vector onto the axes (just two numbers, in fact). Hence, $(A_x,A_y)$ represents the components of the vector. The important part here is that $(A_x,A_y)$ are the components but not the vector itself. The vector is there regardless of whether you draw coordinate axes. In fact, someone else could have picked different coordinate axes $x',y'$, for example ones tilted by an angle $\theta$. Under these other coordinate axes, the projections $A_x',A_y'$ won't be the same as the $A_x,A_y$ from before:

[Figure: the same vector $\vec{A}$ with coordinate axes $x',y'$ tilted by an angle $\theta$]

Sorry for insisting, but recall that the vector $\vec{A}$ is the same regardless of whether you represent it as $(A_x,A_y)$ in the coordinate system $x,y$, or as $(A_x',A_y')$ in the coordinate system $x',y'$. However, using a little bit of geometry, you can obtain that $$ \begin{bmatrix} A_x' \\ A_y' \end{bmatrix} = \begin{bmatrix} \cos(\theta) & \sin(\theta) \\ -\sin(\theta) & \cos(\theta) \end{bmatrix} \begin{bmatrix} A_x\\ A_y \end{bmatrix} $$ So, we say that the components of a vector "transform" according to this geometrical rule defined by the relation between the coordinate systems $x,y$ and $x',y'$. Note that we could have picked a different system $x',y'$, not necessarily perpendicular:

[Figure: the vector $\vec{A}$ with non-perpendicular coordinate axes $x',y'$]

In this case, the transformation for the vector components would look a little bit more complicated, but it can be obtained using some more trigonometry.

This is what physicists mean when they say "a vector is something that transforms as a vector": when you change the coordinate system, the vector components should change as well, according to the way the coordinate systems are geometrically related to each other.

Now, we can think of what kinds of things are NOT vectors. I'll explain this part from the intuitive physics perspective first. Let's recall when we just pictured vectors as arrows, without coordinate systems. There, we also talked about scale parameters $\alpha$ which we could use to scale vectors as $\alpha\vec{A}$. These are called scalars, and are agnostic to the coordinate system as well. For example, the length of a vector is the same regardless of how you pick the coordinate axes. Once we give some meaning to two scalars, for example the mass and height $m,h$ of some person, it would be silly to think that the array of numbers $(m,h)$ changes when you change the coordinate system: your mass and height should remain the same. Thus $(m,h)$ is not a vector, since it doesn't transform like a vector.

Now, what we have been waiting for. What if we combine the quantities $\frac{\partial f(x,y)}{\partial x},\frac{\partial f(x,y)}{\partial y}$? Is this a vector? We need to check whether these transform like a vector.

First, let's see what $f(x,y)$ really means. $f$ is a scalar function which takes as an argument a position vector $(x,y)$. However, the representation $(x,y)$ depends on the chosen coordinate system. Despite this, the position of a point should be agnostic of the particular coordinate system, so it's better to say that $f$ depends on the position vector $\vec{r}$ instead, with components $x,y$ in a particular coordinate system. If we change coordinates as we did before, the value of $f(\vec{r})$ should remain invariant.

In this context, $\frac{\partial f}{\partial x},\frac{\partial f}{\partial y}$ are just the rates of change in the directions of the particular coordinate axes we chose. But we could have chosen totally different coordinates $x',y'$ as before, obtaining $\frac{\partial f}{\partial x'},\frac{\partial f}{\partial y'}$ as the rates of change in the directions of the new coordinate axes. Now, how are these related to each other?

Assume for simplicity that the coordinates change with the tilted axes using an angle $\theta$. As we obtained before, the components of the position vector are related as: $$ \begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} \cos(\theta) & \sin(\theta) \\ -\sin(\theta) & \cos(\theta) \end{bmatrix} \begin{bmatrix} x\\ y \end{bmatrix} $$ equivalently: $$ \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{bmatrix} \begin{bmatrix} x'\\ y' \end{bmatrix} $$ Hence, $x,y$ are both combinations of $x',y'$ as $x = \cos(\theta)x' - \sin(\theta)y'$, $y=\sin(\theta)x' + \cos(\theta)y'$. So, one can use the chain rule to compute $$ \begin{aligned} \frac{\partial f}{\partial x'} &= \frac{\partial{f}}{\partial x}\frac{\partial x}{\partial x'} + \frac{\partial{f}}{\partial y}\frac{\partial y}{\partial x'} = \frac{\partial{f}}{\partial x}\cos(\theta) +\frac{\partial{f}}{\partial y}\sin(\theta) \\ \frac{\partial f}{\partial y'} &= \frac{\partial{f}}{\partial x}\frac{\partial x}{\partial y'} + \frac{\partial{f}}{\partial y}\frac{\partial y}{\partial y'} = -\frac{\partial{f}}{\partial x}\sin(\theta) +\frac{\partial{f}}{\partial y}\cos(\theta) \\ \end{aligned} $$ equivalently: $$ \begin{bmatrix} \frac{\partial f}{\partial x'} \\ \frac{\partial f}{\partial y'} \end{bmatrix} = \begin{bmatrix} \cos(\theta) & \sin(\theta) \\ -\sin(\theta) & \cos(\theta) \end{bmatrix} \begin{bmatrix} \frac{\partial f}{\partial x}\\ \frac{\partial f}{\partial y} \end{bmatrix} $$ One can apply the same chain-rule reasoning to general coordinate transformations as well, although for non-perpendicular axes the partial derivatives pick up the inverse transpose of the transformation matrix, which coincides with the matrix itself only for rotations like this one.

Thus, the numbers $\frac{\partial f}{\partial x},\frac{\partial f}{\partial y}$ transform like the components of a vector! Hence, $(\frac{\partial f}{\partial x},\frac{\partial f}{\partial y})$ are the components of a vector (the gradient vector) in the chosen coordinate system.
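The transformation rule can also be checked numerically. The following is a minimal sketch under stated assumptions: the scalar field $f = x^2 y$, the point $(1,2)$ and the angle of 30 degrees are illustrative values, and the helper names `to_old`, `f_new` and `partials` are hypothetical.

```python
import math

def f(x, y):
    # Illustrative scalar field: f = x^2 * y
    return x**2 * y

theta = math.radians(30)
c, s = math.cos(theta), math.sin(theta)

def to_old(xp, yp):
    # Inverse rotation: recover (x, y) from the tilted coordinates (x', y')
    return c * xp - s * yp, s * xp + c * yp

def f_new(xp, yp):
    # The same scalar field expressed in the tilted coordinates
    return f(*to_old(xp, yp))

def partials(g, a, b, h=1e-6):
    # Central-difference estimates of the two partial derivatives of g at (a, b)
    return ((g(a + h, b) - g(a - h, b)) / (2 * h),
            (g(a, b + h) - g(a, b - h)) / (2 * h))

x, y = 1.0, 2.0
xp, yp = c * x + s * y, -s * x + c * y  # the same point in tilted coordinates

fx, fy = partials(f, x, y)              # components in the x, y system
fxp, fyp = partials(f_new, xp, yp)      # components in the x', y' system

# Rotate the old components with the same matrix used for vector components
rotated = (c * fx + s * fy, -s * fx + c * fy)

print((fxp, fyp), rotated)  # the two pairs should agree closely
```

The partial derivatives computed directly in the tilted system match the rotated version of the original ones, which is what "transforming like a vector" means here.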

Now, this can be thought of as the (informal) "proof" that the gradient is a vector. However, more deeply, what does it mean for the gradient to be a vector?

As you said, the gradient is the direction of steepest ascent for a function $f$. Let's try to start with that object, without thinking about partial derivatives. If you have your function $f$ (pictured as a huge blanket with its valleys and hills) which depends on the position vector, the direction of steepest ascent depends only on the geometry of $f$. Thus, by construction, the direction of steepest ascent is a vector and won't depend on how you choose the coordinate axes; only its components will. Despite this, one might try to find the components of this vector in the current coordinate system. As others have pointed out, these components correspond to the partial derivatives (with respect to the chosen coordinate variables $x,y$), which we showed transform consistently as vector components.

So, in summary, it is not that $(\frac{\partial f}{\partial x},\frac{\partial f}{\partial y})$ themselves comprise a vector, but that they are the components of something that IS a vector, that is, the gradient vector, which is invariant/agnostic to the particular coordinate system and points in the direction of steepest ascent.

I hope this helps!

$\endgroup$
5
  • $\begingroup$ Thanks for this detailed answer. However, I have a couple of questions: $\endgroup$
    – DDG
    Commented Feb 15, 2022 at 19:49
  • $\begingroup$ If we change the coordinate axes from $(x,y)$ to $(x',y')$, wouldn't our $f(x,y)$ / $f(\vec{r})$ change accordingly? Consider the example where $f(x,y)=x^2y$ and $x,y=1,2$, with $x',y' = 1.866,1.232$ and $\theta = 30°$. Also, I agree that changing the coordinate axes won't change the vector's magnitude; however, wouldn't the direction change (considering direction is given as an angle relative to one of the axes)? Isn't the direction dependent on the coordinate axes we choose? $\endgroup$
    – DDG
    Commented Feb 15, 2022 at 20:08
  • 1
    $\begingroup$ If you have an expression for $f(\vec{r})$ in some coordinate system, it doesn't mean that it must have the same expression in another one. For example, if $f=x^2y$, when you change to the system $x',y'$ you would need to substitute the expressions for $x,y$ in terms of $x',y'$ into the expression for $f$, thus obtaining a new expression for $f$ in the new coordinate system. Just like vector components, the "expression" of the scalar function might change under a coordinate transformation, but the scalar function itself (its values) remains the same. $\endgroup$ Commented Feb 15, 2022 at 20:20
  • 1
    $\begingroup$ Regarding whether direction changes under coordinate changes: the direction changes with respect to the coordinate system, since each coordinate system might have a different notion of what is up/down/left/right. But the direction of the vector by itself (in essence, without any coordinate system) is not affected. Think of gravity. Gravity pulls towards the earth's center (no coordinate system required). However, I can set up an arbitrary coordinate system and say that gravity pulls "to the right" at some point in space. Saying "pulls towards the earth's center" doesn't require a coordinate system. $\endgroup$ Commented Feb 15, 2022 at 20:25
  • $\begingroup$ However, if you care about direction defined as orientation with respect to a certain set of coordinate axes, then yes, that direction would change (since by definition it depends on the coordinate system). However, the vector itself (as an element of the vector space) remains invariant. $\endgroup$ Commented Feb 15, 2022 at 20:57
1
$\begingroup$

Your question is very legitimate. The natural extension of the derivative is the differential $d F = (\partial z/\partial x, \partial z/\partial y)$, which, as you point out, is not a vector but rather a linear form that can tell us (at any point $(x,y)$) the rate of change of $F$ in the $x$-direction, or in the $y$-direction, or more generally in any direction $\vec{v}$, via the formula: $$F( (x,y) + \vec{v}) - F(x,y) \approx dF(\vec{v})$$ which holds for small $\vec{v}$. In this formula, $dF$ is a linear form (a row vector) acting on a (column) vector $\vec{v}$ linearly (by left multiplication).

Arguably, the gradient is not as natural as the differential. In fact, differential geometers are well aware that it is a metric notion: it requires an inner product to be well-defined. On $\mathbb{R}^2$, we have our standard dot product, which allows the definition of the gradient by the formula: $$ dF(\vec{v}) = \operatorname{grad}F \cdot \vec{v}$$ Of course, this is equivalent in this simple setting to saying that the gradient is the column vector that is the transpose of the differential. (More generally, it is the dual vector given by the Riesz representation theorem.) But this formula is very insightful and holds the answers to any further questions you might have. For instance, it should be clear from Cauchy-Schwarz that it gives the direction of steepest change, and that the gradient vanishes if and only if the function is constant to first order.

$\endgroup$
2
  • 2
    $\begingroup$ Considering OP just finished single variable calculus, it would be quite the feat if OP understands half of this post. $\endgroup$ Commented Feb 14, 2022 at 19:41
  • $\begingroup$ IMO differential form is actually easier than vectors but the issue is people are afraid of it @Golden_Ratio $\endgroup$ Commented Apr 27, 2022 at 5:30
1
$\begingroup$

I think it's backtracking from the total derivative.

$df(x,y,z)=\frac{\partial f}{\partial x}dx+\frac{\partial f}{\partial y}dy+\frac{\partial f}{\partial z}dz$. If you introduce a displacement vector $d\vec{s}=dx\hat {i}+dy\hat{j}+dz\hat{k}$, then to recover the total derivative as a dot product with $d\vec{s}$ you need something like $\nabla f=\frac{\partial f}{\partial x}\hat{i}+\frac{\partial f}{\partial y}\hat{j}+\frac{\partial f}{\partial z}\hat{k}$.

The total derivative is a dot product of two vectors: you start with one of them (the displacement) and work out what the other (the gradient) must be. This works in other coordinate systems too.

The line element for spherical coordinates is $d\vec{s}= dr \hat{r} + r \sin \theta d\phi\hat{\phi}+rd\theta\hat{\theta}$. To get the total derivative $df = \nabla f \cdot d\vec{s}$ you'd need $\nabla f= \frac{\partial f}{\partial r}\hat{r}+\frac{1}{r\sin\theta}\frac{\partial f}{\partial \phi}\hat{\phi}+\frac{1}{r}\frac{\partial f}{\partial \theta}\hat{\theta}$.
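One way to gain confidence in the spherical form is to compare it numerically with the Cartesian gradient. The following is a rough sketch, assuming the convention that $\theta$ is the polar angle and $\phi$ the azimuth, so the components are $\partial f/\partial r$, $\frac{1}{r}\partial f/\partial\theta$ and $\frac{1}{r\sin\theta}\partial f/\partial\phi$. The field $f = xy + z^2$, the sample point, and the helper names are illustrative and not taken from the answer.

```python
import math

def f_cart(x, y, z):
    # Illustrative scalar field, not from the original answer
    return x * y + z**2

def f_sph(r, th, ph):
    # Same field in spherical coordinates (th = polar angle, ph = azimuth)
    return f_cart(r * math.sin(th) * math.cos(ph),
                  r * math.sin(th) * math.sin(ph),
                  r * math.cos(th))

def d(g, args, i, h=1e-6):
    # Central-difference partial derivative of g with respect to argument i
    up = list(args)
    dn = list(args)
    up[i] += h
    dn[i] -= h
    return (g(*up) - g(*dn)) / (2 * h)

r, th, ph = 2.0, 0.7, 1.1
p = (r, th, ph)

# Components of the gradient along r_hat, theta_hat, phi_hat
g_r = d(f_sph, p, 0)
g_th = d(f_sph, p, 1) / r
g_ph = d(f_sph, p, 2) / (r * math.sin(th))

# Spherical unit vectors written in Cartesian components
r_hat = (math.sin(th) * math.cos(ph), math.sin(th) * math.sin(ph), math.cos(th))
th_hat = (math.cos(th) * math.cos(ph), math.cos(th) * math.sin(ph), -math.sin(th))
ph_hat = (-math.sin(ph), math.cos(ph), 0.0)

# Reassemble the gradient vector from its spherical components
grad_from_sph = tuple(g_r * a + g_th * b + g_ph * c
                      for a, b, c in zip(r_hat, th_hat, ph_hat))

x, y, z = (r * comp for comp in r_hat)  # Cartesian coordinates of the point
grad_cart = (y, x, 2 * z)               # analytic gradient of x*y + z^2

print(grad_from_sph, grad_cart)         # the two triples should agree closely
```

Both routes describe the same arrow; only the components used to write it down differ.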

Suppose we constrain the motion to a level curve, along which $f$ is constant. Then $df=\nabla f \cdot d\vec{s}=0$, which shows that the gradient is perpendicular to the level curves of a scalar function. The maximum total derivative occurs when the direction traveled makes an angle of zero degrees with the gradient. This suggests direction as well as magnitude for the gradient.
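The perpendicularity can be illustrated numerically. This is a small sketch, assuming the question's example $F(x,y)=x^2-y^2$, whose level curve through $(2,3)$ is $x^2-y^2=-5$; the helper `level_curve_y` is a hypothetical name for the upper branch of that curve solved for $y$.

```python
import math

def level_curve_y(x, c=-5.0):
    # Upper branch of the level curve x^2 - y^2 = c, solved for y
    return math.sqrt(x**2 - c)

x0 = 2.0
y0 = level_curve_y(x0)  # the point (x0, y0) lies on the level curve F = -5

# Slope of the level curve at x0, estimated by a central difference
h = 1e-6
slope = (level_curve_y(x0 + h) - level_curve_y(x0 - h)) / (2 * h)
tangent = (1.0, slope)  # direction of travel along the level curve

grad = (2 * x0, -2 * y0)  # gradient of F = x^2 - y^2 at (x0, y0)
dot = grad[0] * tangent[0] + grad[1] * tangent[1]

print(dot)  # should be close to zero
```

A dot product near zero means the gradient has no component along the level curve, so all of it points across the contour lines.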

You get a better feel for the concept of direction regarding the gradient when you think of the gradient as being normal to the contour lines on a topographic map. The lines indicate constant elevation. The steeper the terrain, the closer together they are.

A more formal treatment takes the level curve aspect and suggests that's the gradient. The gradient is defined as a 1-form in differential geometry. Such considerations are probably best avoided when first being introduced to gradients.

$\endgroup$
