First of all, temperature is the quantity that characterizes thermal equilibrium, by the zeroth law of thermodynamics.
We make contact with this quantity through thermal equilibrium itself. For example, the Celsius scale is constructed by defining $0\,^\circ\mathrm{C}$ as the volume of mercury in contact with freezing water and $100\,^\circ\mathrm{C}$ as the volume of mercury in contact with boiling water.
With more refinement, we can find a better scale for temperature: the Kelvin scale. On this scale the temperature is always positive, and the energy exchanged through the heat channel is expressed by:
$$
T\cdot \mathrm{d}S
$$
where $S$ is the entropy (some mysterious function of state).
Now, with statistical mechanics, the entropy is identified with a measure of the information ignored in our description of the system, expressed in the Napierian (natural) base and in units of a tiny constant (tiny compared with macroscopic scales), $k_B$, Boltzmann's constant:
$$
S = k_B I_e\,, \qquad I_e = -\sum_{i=1}^{N} p_i \ln(p_i)
$$
where $I_b$ denotes the Shannon entropy in base $b$; here $b = e\,.$
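As a minimal sketch of this identification (the probability distribution below is a hypothetical toy example, not data from the text), we can compute the Shannon entropy in nats and rescale it by $k_B$ to obtain a thermodynamic entropy:

```python
import math

def shannon_entropy(probs, base=math.e):
    """Shannon entropy I_b = -sum_i p_i log_b(p_i); zero-probability states are skipped."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

# Toy distribution over four microstates (hypothetical numbers for illustration).
p = [0.5, 0.25, 0.125, 0.125]

k_B = 1.380649e-23           # Boltzmann's constant in J/K (exact in SI since 2019)
I_e = shannon_entropy(p)     # information ignored, in nats (base e)
S = k_B * I_e                # thermodynamic entropy, in J/K

print(I_e)   # ≈ 1.2130 nats (= 1.75 bits)
print(S)
```

Note that the same function with `base=2` gives the entropy in bits; the choice of base only rescales the constant in front, which is the point made below about units.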
If we change the unit of temperature once more, to units of energy per $k_B$ (which amounts to setting $k_B = 1$), temperature becomes energy per unit of ignored information. This means that when we ignore more information, the mean energy increases in proportion to the temperature: $$\mathrm{d}\langle E \rangle = T\,\mathrm{d}I_e$$ where $\langle E \rangle$ is the mean energy.
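This relation can be checked numerically for a canonical (Boltzmann) distribution, where it holds exactly: varying $T$ slightly and comparing the change in $\langle E\rangle$ with the change in $I_e$ recovers the temperature. The three-level system below is a hypothetical example with $k_B = 1$:

```python
import math

def canonical(energies, T):
    """Boltzmann distribution p_i ∝ exp(-E_i/T) with k_B = 1 (T in energy units).
    Returns the mean energy <E> and the ignored information I_e in nats."""
    weights = [math.exp(-E / T) for E in energies]
    Z = sum(weights)
    p = [w / Z for w in weights]
    mean_E = sum(pi * Ei for pi, Ei in zip(p, energies))
    I_e = -sum(pi * math.log(pi) for pi in p)
    return mean_E, I_e

# Hypothetical three-level system (energies in arbitrary units).
E_levels = [0.0, 1.0, 2.0]
T, dT = 0.7, 1e-5

E_lo, I_lo = canonical(E_levels, T - dT)
E_hi, I_hi = canonical(E_levels, T + dT)

# Finite-difference ratio d<E>/dI_e should approach the temperature T.
print((E_hi - E_lo) / (I_hi - I_lo))   # ≈ 0.7
```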
Note that we can now define many units of temperature of the form $\mathrm{\frac{Energy}{constant}}\,,$ where the constant is fixed by the relation between $I_b$ and $S$ for each choice of base $b$. For the canonical ensemble, the most natural base is the Napierian one. For the microcanonical ensemble, the better base is the one that respects the decomposition of the system into subsystems.
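The decomposition property mentioned above is just additivity: for statistically independent subsystems, the Shannon entropy of the composite system is the sum of the subsystem entropies, in any base. A quick check with two hypothetical independent subsystems, counting in bits ($b = 2$):

```python
import math

def shannon(probs, base):
    """Shannon entropy in a given base, skipping zero-probability states."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

# Two independent subsystems (hypothetical distributions for illustration).
pA = [0.5, 0.5]
pB = [0.25, 0.75]

# Joint distribution of the composite system: products of the probabilities.
pAB = [a * b for a in pA for b in pB]

# Additivity: the composite entropy equals the sum of the parts (here in bits).
print(shannon(pAB, 2))
print(shannon(pA, 2) + shannon(pB, 2))   # same value
```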