There are some nice answers already. However, I want to tackle the essence of the question from a different perspective. Indeed, the gradient is related to this "steepest ascent direction" and is used to compute directional derivatives as well. But, as the OP rightly asks, why does combining the two partial derivatives give us "a vector with direction"?
This all boils down to the question "What is a vector?", so I'll start from there. Suppose we are working in a 2D space. If I pick two arbitrary quantities and put them in an array, is that a vector? For example, take two scalars $m,h$: my mass and my height. If I write them as $(m,h)$, is that a vector? We can ask the same question about $\left(\frac{\partial f(x,y)}{\partial x},\frac{\partial f(x,y)}{\partial y}\right)$, built by combining the scalars $\frac{\partial f(x,y)}{\partial x},\frac{\partial f(x,y)}{\partial y}$.
Since the OP's background is just single-variable calculus, I'll keep things simple. In 2D, you can picture generic vectors as arrows drawn on a blank sheet of paper. As you know, a vector (an arrow) $\vec{A}$ has a magnitude and a direction.
![enter image description here](https://i.sstatic.net/SIYu3s.png)
You can scale them as $\alpha \vec{A}$ with any real number $\alpha$ to make them longer or shorter (a negative $\alpha$ also flips the direction). You can also add them as $\vec{A}+\vec{B}$ using the classical parallelogram rule. Finally, there is a special vector $\vec{0}$, the point from which you draw all other vectors, such that $\vec{A}+\vec{0}=\vec{A}$. These rules are what we call the "axioms of a vector space": they define how vectors are supposed to work.
Note that we haven't talked about the components of the vector yet. Up to now, vectors are just arrows and seem to have nothing to do with arrays of numbers. However, very often you can pick what we call a coordinate basis, coordinate system, or coordinate axes. Besides drawing the vector, you can draw coordinate axes in two arbitrary non-parallel directions, for example two perpendicular rays, which we call the coordinate axes $x,y$:
![enter image description here](https://i.sstatic.net/0XPBKm.png)
Once you pick these axes, you can obtain the components $A_x,A_y$ of the vector $\vec{A}$ as the projections of the vector onto the axes (just two numbers). Hence, $(A_x,A_y)$ are the components of the vector. The important point here is that $(A_x,A_y)$ are the components, not the vector itself. The vector is there regardless of whether you draw coordinate axes. In fact, someone else could have picked different coordinate axes $x',y'$, for example ones tilted by an angle $\theta$. With respect to these other axes, the projections $A_x',A_y'$ will in general differ from $A_x,A_y$:
![enter image description here](https://i.sstatic.net/Uy7Zim.png)
Sorry for insisting, but recall that the vector $\vec{A}$ is the same regardless of whether you represent it as $(A_x,A_y)$ in the coordinate system $x,y$ or as $(A_x',A_y')$ in the coordinate system $x',y'$. Nevertheless, the two sets of components are related: with a little geometry (taking the axes $x',y'$ to be the axes $x,y$ rotated counterclockwise by $\theta$), you can show that
$$
\begin{bmatrix}
A_x' \\
A_y'
\end{bmatrix}
=
\begin{bmatrix}
\cos(\theta) & \sin(\theta) \\
-\sin(\theta) & \cos(\theta)
\end{bmatrix}
\begin{bmatrix}
A_x\\
A_y
\end{bmatrix}
$$
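To check the matrix formula numerically, here is a small Python sketch, assuming the axes $x',y'$ are the axes $x,y$ rotated counterclockwise by $\theta$ (the vector $(3,1)$ and the 30° angle are arbitrary choices, and the function names are hypothetical). It computes $(A_x',A_y')$ once with the matrix and once as projections (dot products) onto the unit vectors of the new axes; both routes should agree.

```python
import math

def rotate_components(ax, ay, theta):
    """Components of the same arrow in axes x', y' rotated counterclockwise by theta."""
    ax_new = math.cos(theta) * ax + math.sin(theta) * ay
    ay_new = -math.sin(theta) * ax + math.cos(theta) * ay
    return ax_new, ay_new

def project_onto_axes(ax, ay, theta):
    """Same components, computed as projections onto the rotated unit axes."""
    ex_new = (math.cos(theta), math.sin(theta))    # unit vector along x'
    ey_new = (-math.sin(theta), math.cos(theta))   # unit vector along y'
    return (ax * ex_new[0] + ay * ex_new[1],
            ax * ey_new[0] + ay * ey_new[1])

A = (3.0, 1.0)
theta = math.radians(30)
print(rotate_components(*A, theta))   # matrix rule
print(project_onto_axes(*A, theta))   # geometric projections
```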
So we say that the components of a vector "transform" according to this geometric rule, which is determined by how the coordinate systems $x,y$ and $x',y'$ are related. Note that we could have picked a different system $x',y'$, not necessarily perpendicular:
![enter image description here](https://i.sstatic.net/PtWS8m.png)
In this case, the transformation for the vector components would look a little bit more complicated, but it can be obtained using some more trigonometry.
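For the skewed case, one way to get the components is to decompose the arrow with the parallelogram rule, $\vec{A} = a\,\hat{e}_1 + b\,\hat{e}_2$, where $\hat{e}_1,\hat{e}_2$ are unit vectors along the new axes. The sketch below assumes that definition of the components and describes each axis by its angle from the original $x$ axis (`skew_components`, `alpha` and `beta` are hypothetical names); it solves the resulting $2\times 2$ linear system with Cramer's rule.

```python
import math

def skew_components(ax, ay, alpha, beta):
    """Components (a, b) with A = a*e1 + b*e2, where e1, e2 are unit vectors
    along non-parallel axes at angles alpha, beta from the original x-axis."""
    e1 = (math.cos(alpha), math.sin(alpha))
    e2 = (math.cos(beta), math.sin(beta))
    det = e1[0] * e2[1] - e2[0] * e1[1]   # equals sin(beta - alpha)
    if math.isclose(det, 0.0, abs_tol=1e-12):
        raise ValueError("axes must not be parallel")
    # Cramer's rule for the 2x2 system a*e1 + b*e2 = (ax, ay)
    a = (ax * e2[1] - e2[0] * ay) / det
    b = (e1[0] * ay - ax * e1[1]) / det
    return a, b

# Axes at 0 and 45 degrees: the arrow (1, 1) lies entirely along the second axis
print(skew_components(1.0, 1.0, 0.0, math.pi / 4))
```

The call above should give approximately $(0, 1.414)$, i.e. $(0,\sqrt{2})$, and with `alpha=0`, `beta=pi/2` the function just returns the original perpendicular components.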
This is what physicists mean when they say "a vector is something that transforms like a vector": when you change the coordinate system, the vector components must change accordingly, following the way the two coordinate systems are geometrically related to each other.
Now, we can think about what kinds of things are NOT vectors. I'll explain this part from the intuitive physics perspective first. Let's go back to when we pictured vectors as arrows, without coordinate systems. There, we also talked about the scale parameters $\alpha$ that we used to scale vectors as $\alpha\vec{A}$. These are called scalars, and they are also agnostic to the coordinate system. For example, the length of a vector is the same regardless of how you pick the coordinate axes. Once we give some meaning to two scalars, for example the mass and height $m,h$ of some person, it would be silly to expect the array of numbers $(m,h)$ to change when you change the coordinate system: your mass and height stay the same. Thus $(m,h)$ is not a vector, since it doesn't transform like a vector.
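The difference between components and scalars can be seen in a few lines of Python. This sketch reuses the rotation rule for axes turned counterclockwise by $\theta$ (the vector $(3,4)$ and the angles are arbitrary): the components change with $\theta$, while the length, a scalar, stays at 5 up to rounding.

```python
import math

def rotate_components(ax, ay, theta):
    # Rotation rule for axes turned counterclockwise by theta
    return (math.cos(theta) * ax + math.sin(theta) * ay,
            -math.sin(theta) * ax + math.cos(theta) * ay)

A = (3.0, 4.0)
for theta in (0.0, 0.3, 1.0, 2.5):
    A_new = rotate_components(*A, theta)
    # Components depend on theta; the length math.hypot(*A_new) does not
    print(theta, A_new, math.hypot(*A_new))
```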
Now for what we have been waiting for. What if we combine the quantities $\frac{\partial f(x,y)}{\partial x},\frac{\partial f(x,y)}{\partial y}$? Is this a vector? We need to check whether these transform like the components of a vector.
First, let's see what $f(x,y)$ really means. $f$ is a scalar function that takes as its argument a position vector $(x,y)$. However, the representation $(x,y)$ depends on the chosen coordinate system. The position of a point itself should not depend on the coordinate system, so it's better to say that $f$ depends on the position vector $\vec{r}$, whose components are $x,y$ in a particular coordinate system. If we change coordinates as we did before, the value $f(\vec{r})$ at a given point should remain the same.
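A minimal sketch of this invariance, assuming a hypothetical field $f(x,y)=x^2+3y$ and axes rotated counterclockwise by $\theta$: the point's components are converted to $x',y'$, and the same field written in terms of $x',y'$ (by first recovering $x,y$ with the inverse rotation) returns the same value at the same point.

```python
import math

def f(x, y):
    # Hypothetical example field, written in the original x, y coordinates
    return x**2 + 3.0 * y

def to_new(x, y, theta):
    # Components of the position vector in axes rotated counterclockwise by theta
    return (math.cos(theta) * x + math.sin(theta) * y,
            -math.sin(theta) * x + math.cos(theta) * y)

def f_in_new_coords(xp, yp, theta):
    # Same field expressed through x', y': recover x, y, then evaluate f
    x = math.cos(theta) * xp - math.sin(theta) * yp
    y = math.sin(theta) * xp + math.cos(theta) * yp
    return f(x, y)

theta = 0.7
x, y = 1.5, -2.0
xp, yp = to_new(x, y, theta)
print(f(x, y), f_in_new_coords(xp, yp, theta))  # same value, up to rounding
```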
In this context, $\frac{\partial f}{\partial x},\frac{\partial f}{\partial y}$ are just the rates of change of $f$ along the directions of the particular coordinate axes we chose. But we could have chosen totally different coordinates $x',y'$ as before, obtaining $\frac{\partial f}{\partial x'},\frac{\partial f}{\partial y'}$ as the rates of change along the new coordinate axes. Now, how are these related to each other?
Assume for simplicity that the new axes are the old ones rotated by an angle $\theta$. As we obtained before, the components of the position vector are related by:
$$
\begin{bmatrix}
x' \\
y'
\end{bmatrix}
=
\begin{bmatrix}
\cos(\theta) & \sin(\theta) \\
-\sin(\theta) & \cos(\theta)
\end{bmatrix}
\begin{bmatrix}
x\\
y
\end{bmatrix}
$$
equivalently:
$$
\begin{bmatrix}
x \\
y
\end{bmatrix}
=
\begin{bmatrix}
\cos(\theta) & -\sin(\theta) \\
\sin(\theta) & \cos(\theta)
\end{bmatrix}
\begin{bmatrix}
x'\\
y'
\end{bmatrix}
$$
Hence, $x$ and $y$ are both linear combinations of $x'$ and $y'$: $x = \cos(\theta)x' - \sin(\theta)y'$ and $y=\sin(\theta)x' + \cos(\theta)y'$. So, we can use the chain rule to compute
$$
\begin{aligned}
\frac{\partial f}{\partial x'} &= \frac{\partial{f}}{\partial x}\frac{\partial x}{\partial x'} + \frac{\partial{f}}{\partial y}\frac{\partial y}{\partial x'} = \frac{\partial{f}}{\partial x}\cos(\theta) +\frac{\partial{f}}{\partial y}\sin(\theta) \\
\frac{\partial f}{\partial y'} &= \frac{\partial{f}}{\partial x}\frac{\partial x}{\partial y'} + \frac{\partial{f}}{\partial y}\frac{\partial y}{\partial y'} = -\frac{\partial{f}}{\partial x}\sin(\theta) +\frac{\partial{f}}{\partial y}\cos(\theta) \\
\end{aligned}
$$
equivalently:
$$
\begin{bmatrix}
\frac{\partial f}{\partial x'} \\
\frac{\partial f}{\partial y'}
\end{bmatrix}
=
\begin{bmatrix}
\cos(\theta) & \sin(\theta) \\
-\sin(\theta) & \cos(\theta)
\end{bmatrix}
\begin{bmatrix}
\frac{\partial f}{\partial x}\\
\frac{\partial f}{\partial y}
\end{bmatrix}
$$
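The rule can be tested numerically. The sketch below uses a hypothetical field $f(x,y)=x^2y+\sin y$ (chosen only because its partial derivatives are easy to write down), computes $\frac{\partial f}{\partial x'},\frac{\partial f}{\partial y'}$ directly by central finite differences in the rotated coordinates, and compares them with the rotation matrix applied to $\left(\frac{\partial f}{\partial x},\frac{\partial f}{\partial y}\right)$. The two printed pairs should agree to many decimal places.

```python
import math

def f(x, y):
    # Hypothetical example field
    return x**2 * y + math.sin(y)

def grad_xy(x, y):
    # Exact partial derivatives in the original coordinates
    return 2.0 * x * y, x**2 + math.cos(y)

def f_new(xp, yp, theta):
    # f written as a function of the rotated coordinates x', y'
    x = math.cos(theta) * xp - math.sin(theta) * yp
    y = math.sin(theta) * xp + math.cos(theta) * yp
    return f(x, y)

def grad_new_numeric(xp, yp, theta, h=1e-6):
    # Central finite differences with respect to x' and y'
    dfdxp = (f_new(xp + h, yp, theta) - f_new(xp - h, yp, theta)) / (2 * h)
    dfdyp = (f_new(xp, yp + h, theta) - f_new(xp, yp - h, theta)) / (2 * h)
    return dfdxp, dfdyp

def grad_new_rotated(x, y, theta):
    # Rotation rule from the chain rule, applied to (df/dx, df/dy)
    gx, gy = grad_xy(x, y)
    return (math.cos(theta) * gx + math.sin(theta) * gy,
            -math.sin(theta) * gx + math.cos(theta) * gy)

theta = 0.4
x, y = 1.2, 0.5
xp = math.cos(theta) * x + math.sin(theta) * y
yp = -math.sin(theta) * x + math.cos(theta) * y
print(grad_new_numeric(xp, yp, theta))
print(grad_new_rotated(x, y, theta))
```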
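A similar chain-rule computation works for general coordinate transformations as well. (For non-perpendicular axes, the partial derivatives transform with the inverse transpose of the matrix that relates the position components; they are the so-called covariant components. For rotations, the inverse transpose is the rotation matrix itself, so the two rules coincide, as above.)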
Thus, the numbers $\frac{\partial f}{\partial x},\frac{\partial f}{\partial y}$ transform exactly like the components of a vector! Hence, $(\frac{\partial f}{\partial x},\frac{\partial f}{\partial y})$ are the components of a vector (the gradient vector) in the chosen coordinate system.
Now, this can be thought of as an informal "proof" that the gradient is a vector. But, more deeply, what does it mean for the gradient to be a vector?
As you said, the gradient points in the direction of steepest ascent of a function $f$. Let's start from that object, without thinking about partial derivatives. If you have your function $f$ (pictured as a huge blanket with valleys and hills) which depends on the position vector, then the vector pointing in the direction of steepest ascent, with length equal to that steepest rate of increase, depends only on the geometry of $f$. Thus, by construction, it is a vector: it does not depend on how you choose the coordinate axes; only its components do. Still, one might want to find the components of this vector in the current coordinate system. As others have pointed out, these components are the partial derivatives (with respect to the chosen coordinate variables $x,y$), which we showed transform consistently as vector components.
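This connection can be probed numerically without assuming it. The sketch below uses a hypothetical field $f(x,y)=x^2y+\sin y$, samples unit directions every 0.1°, measures the rate of change of $f$ along each one by central finite differences (no partial derivatives involved), and picks the steepest. It then compares that direction with the direction of $\left(\frac{\partial f}{\partial x},\frac{\partial f}{\partial y}\right)$, and the steepest rate with the length of that pair; both should match up to the sampling resolution.

```python
import math

def f(x, y):
    # Hypothetical example field
    return x**2 * y + math.sin(y)

def directional_rate(x, y, phi, h=1e-6):
    # Rate of change of f along the unit direction (cos phi, sin phi), by central differences
    ux, uy = math.cos(phi), math.sin(phi)
    return (f(x + h * ux, y + h * uy) - f(x - h * ux, y - h * uy)) / (2 * h)

x, y = 1.2, 0.5
n = 3600  # directions sampled every 0.1 degree
best_phi = max((2 * math.pi * k / n for k in range(n)),
               key=lambda phi: directional_rate(x, y, phi))

gx, gy = 2.0 * x * y, x**2 + math.cos(y)   # partial derivatives of f at (x, y)
grad_phi = math.atan2(gy, gx) % (2 * math.pi)
print(math.degrees(best_phi), math.degrees(grad_phi))    # directions, in degrees
print(directional_rate(x, y, best_phi), math.hypot(gx, gy))  # steepest rate vs length
```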
So, in summary, it is not that $(\frac{\partial f}{\partial x},\frac{\partial f}{\partial y})$ by itself forms a vector, but that these numbers are the components of something that IS a vector, namely the gradient vector (pointing in the direction of steepest ascent), which is invariant/agnostic to the particular coordinate system.
I hope this helps!