5
$\begingroup$

I want to intuitively understand why the gradient gives you the direction of the steepest ascent of a function.

Apart from the already posted questions, my confusion arises from the fact that we form the gradient vector from the derivative along each dimension separately. We then take the vector consisting of both derivatives (in 2D) and treat it as the direction of steepest ascent.

What if the derivative is, say, $5$ in both directions, so that our vector is at $45$ degrees to both axes, but the function actually goes down in that specific direction?

If it's not clear what I'm confused about, consider this function represented as an image:

$$ \begin{pmatrix}100&5&-100\\0&\textit{0}&5\\0& 0& 0\end{pmatrix}$$

At the italicized $0$ in the center, it makes sense that the derivative is $5$ in $x$ and in $y$, but the vector $(5,5)$ points in a direction that is not the steepest ascent. Does this have to do with the differentiability of the function? What am I missing?

$\endgroup$
4
  • 6
    $\begingroup$ That function you've given is not differentiable, and that's why it doesn't work. $\endgroup$
    – apnorton
    Commented Jan 21, 2014 at 16:46
  • 2
    $\begingroup$ You are missing the values in between 0, 5, and -100. Somewhere there is a level set connecting the two values of 5, so when you leave 0 in the direction (5,5), you first come to that level set. In other words, consider the function $\sin(x \pi)$. If you sample it at the integer values of $x$ to obtain an image, as you say, you will only see the constant function zero. Or, if you actually mean a piecewise constant function, then it is not differentiable, as the above comment says. $\endgroup$
    – user66081
    Commented Jan 21, 2014 at 16:46
  • 1
    $\begingroup$ To be differentiable means to be almost flat. Can you see how gradient works on a plane? It's the same for all differentiable functions. $\endgroup$ Commented Jan 21, 2014 at 16:52
  • $\begingroup$ Does this answer your question? Why is gradient the direction of steepest ascent? $\endgroup$ Commented Apr 27, 2022 at 5:31

3 Answers

5
$\begingroup$

What will help your intuition the most is remembering that the derivative (the gradient) is a local feature: it depends only on how the function behaves arbitrarily close to that point, not on its values any fixed distance away.

You may be visualizing a function which buckles down in the gradient direction, so it's not the steepest ascent some distance away -- but at the point where you find the tangent plane it is the steepest ascent for at least a very small distance.

At a point where a function is differentiable, the function is almost planar in a very, very small region around that point. Remember to visualize the local region as nearly a plane, and your intuition will be happier with the gradient.
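To make "almost planar" concrete, here is a minimal numerical sketch. The test function $f(x,y) = \sin(x)\,e^{y}$, the base point, and the direction are all arbitrary choices made for illustration; they do not come from the question. The code measures how far $f$ is from its tangent plane after a step of length $h$, divided by $h$. For a differentiable function that ratio should shrink as $h$ shrinks.

```python
import math

def f(x, y):
    # Hypothetical smooth test function, chosen only for illustration.
    return math.sin(x) * math.exp(y)

def grad_f(x, y):
    # Analytic gradient of the test function above.
    return (math.cos(x) * math.exp(y), math.sin(x) * math.exp(y))

def linearization_error(x, y, dx, dy):
    """Return |f(p + d) - f(p) - grad f(p) . d| for the step d = (dx, dy)."""
    gx, gy = grad_f(x, y)
    return abs(f(x + dx, y + dy) - f(x, y) - (gx * dx + gy * dy))

def relative_errors(x, y, direction, steps):
    """Tangent-plane error divided by the step length, for each step length."""
    ux, uy = direction
    return [linearization_error(x, y, h * ux, h * uy) / h for h in steps]

steps = [1e-1, 1e-2, 1e-3]
errors = relative_errors(0.3, 0.2, (0.6, 0.8), steps)  # (0.6, 0.8) is a unit vector
for h, e in zip(steps, errors):
    print(f"h = {h:g}: error / h = {e:.3e}")
```

The printed ratios drop roughly in proportion to $h$, which is the sense in which the tangent plane, and with it the gradient, describes the function near the point and nowhere else.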

$\endgroup$
1
$\begingroup$

Recall the relationship between the gradient $\nabla f(p)$ and the directional derivative $df(p) \textbf{u}$ in the direction of a unit vector $\textbf{u}$: $$df(p) \textbf{u} = \nabla f(p) \cdot \textbf{u}$$ Since $$\nabla f(p) \cdot \textbf{u} = ||\nabla f(p)|| \cos \theta$$ where $\theta$ is the angle between $\nabla f(p)$ and $\textbf{u}$, the unit vector which maximizes the directional derivative is clearly the one whose angle against $\nabla f(p)$ has cosine $1$, and this happens when $\theta = 0$.
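As a sanity check of the cosine argument, the sketch below scans unit vectors $\textbf{u} = (\cos a, \sin a)$ and records which one maximizes $\nabla f(p) \cdot \textbf{u}$. It assumes the gradient $(5,5)$ from the question and that the dot-product formula holds, meaning $f$ is differentiable at $p$. The function name and the sample count are illustrative.

```python
import math

def best_direction(gx, gy, samples=3600):
    """Scan unit vectors u = (cos a, sin a); return the angle a maximizing
    grad . u together with the maximal value found."""
    best_angle, best_value = 0.0, -math.inf
    for k in range(samples):
        a = 2 * math.pi * k / samples
        value = gx * math.cos(a) + gy * math.sin(a)
        if value > best_value:
            best_angle, best_value = a, value
    return best_angle, best_value

# Gradient (5, 5) as in the question; differentiability at p is assumed.
angle, value = best_direction(5.0, 5.0)
print(math.degrees(angle), value, math.hypot(5.0, 5.0))
```

The maximizing angle lands at about 45 degrees, the direction of the gradient itself, and the maximal directional derivative matches $||\nabla f(p)|| = 5\sqrt{2}$ up to rounding.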

The place where this geometric intuition collides with the phenomenon that you worry about is at the beginning: why are the partial derivatives, i.e. the directional derivatives in just two directions, enough to determine the directional derivatives in all directions? The answer is: they aren't. An unmentioned hypothesis in what I wrote above is that the gradient is continuous at $p$, and without continuity the behavior you describe is actually possible. Here's an example.

Let $f(x,y) = \frac{x^2y}{x^2 + y^2}$ for $(x,y) \neq (0,0)$, with $f(0,0) = 0$. First, let us calculate $f_x(0,0)$ by taking the derivative of the function $f(t,0)$ at $t=0$: $$f_x(0,0) = \left.\frac{d}{dt}\right|_{t=0} \frac{0}{t^2} = 0$$ Similarly, $f_y(0,0) = 0$, so the formula above would predict $\nabla f(0,0) \cdot \textbf{v} = 0$ for every $\textbf{v}$. Now let us calculate the directional derivative of $f$ in the direction of the vector $\textbf{v} = (1,1)$: $$df(0,0) \textbf{v} = \left.\frac{d}{dt}\right|_{t=0} f(t,t) = \left.\frac{d}{dt}\right|_{t=0} \frac{t^3}{2t^2} = \frac{1}{2}$$ So indeed, this function exhibits the sort of behavior you describe, though you can calculate that the partial derivatives of this function are not continuous at $(0,0)$.
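The example can be checked numerically. This is a minimal sketch that assumes $f(0,0) = 0$ and estimates derivatives at the origin with a central difference along a line. The helper name `derivative_along` and the step size are illustrative choices.

```python
def f(x, y):
    # f(0, 0) = 0 is assumed; away from the origin this is x^2 y / (x^2 + y^2).
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * x * y / (x * x + y * y)

def derivative_along(vx, vy, h=1e-6):
    """Central-difference estimate of d/dt f(t*vx, t*vy) at t = 0."""
    return (f(h * vx, h * vy) - f(-h * vx, -h * vy)) / (2 * h)

fx = derivative_along(1.0, 0.0)       # partial derivative in x at the origin
fy = derivative_along(0.0, 1.0)       # partial derivative in y at the origin
along_v = derivative_along(1.0, 1.0)  # derivative along v = (1, 1)
predicted = fx * 1.0 + fy * 1.0       # what grad f . v would give

print(fx, fy, along_v, predicted)
```

Both partial derivatives come out as zero, so the gradient formula predicts zero along $\textbf{v}$, yet the measured derivative along $\textbf{v}$ is about $\frac{1}{2}$. The partial derivatives alone do not determine the other directional derivatives here.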

In fact, even worse behavior is possible: there is a function whose directional derivative exists and is zero in every direction but which has nonzero derivative along the parabola $(t,t^2)$. But all of this pathology is eliminated if the partial derivatives are continuous. It is worthwhile to study the proof of this fact.

$\endgroup$
0
$\begingroup$

Consider a first-order Taylor approximation of a continuously differentiable function $f:U \to \mathbb{R}, U\subset \mathbb{R}^n$ at a point $x$ as follows:

$f(x+d)\approx f(x)+\nabla f(x)^\top d$, where $d$ is a small step taken from the point $x$. A normalized steepest descent direction can then be defined as follows:

$\Delta x = \operatorname{argmin}\{\nabla f(x)^\top d \mid \|d\|=1\}=-\operatorname{argmax}\{\nabla f(x)^\top d \mid \|d\|=1\}=- d^*.$
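For the $\ell_2$-norm, the minimizer has the closed form $-\nabla f(x)/\|\nabla f(x)\|_2$, which follows from the Cauchy-Schwarz inequality. The sketch below computes that direction for a made-up gradient value and compares it against randomly sampled unit vectors. The function names, the gradient, and the random seed are all illustrative assumptions.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def steepest_descent_direction(grad):
    """Unit l2-norm vector d minimizing grad . d, namely -grad / ||grad||."""
    norm = math.sqrt(dot(grad, grad))
    if norm == 0.0:
        raise ValueError("zero gradient: no first-order descent direction")
    return [-g / norm for g in grad]

def random_unit_vector(n, rng):
    # Normalizing Gaussian samples gives a direction uniform on the unit sphere.
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

grad = [3.0, -4.0, 12.0]  # hypothetical gradient value with l2-norm 13
d_star = steepest_descent_direction(grad)
best = dot(grad, d_star)

rng = random.Random(0)  # fixed seed so the run is repeatable
trials = [dot(grad, random_unit_vector(3, rng)) for _ in range(1000)]
print(best, min(trials))
```

None of the sampled unit vectors gives a smaller value of $\nabla f(x)^\top d$ than the closed-form direction, whose value is $-\|\nabla f(x)\|_2$.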

Geometrical visualization of the steepest descent direction (with the $\ell_2$-norm as an example): [two figures, not reproduced here]

$\endgroup$
