I've got a microprocessor and want to quickly identify the settings of my application (stored in some eeprom regions) via a fingerprint instead of having to dump the entire memory every time.
So I was going to compute a 32-bit hash from the few kbytes, but I am not sure which algorithm to use. It doesn't need to be cryptographically secure, but it should be fast and not need some large lookup tables. And most importantly, whenever one of the settings (bytes) changes I need to recompute the hash, which I want to do efficiently.
I guessed I could use any regular hashing algorithm and modify it so that it stores intermediate results in a tree-like structure which trades some memory for a logarithmic benefit in recomputation complexity. (As I found in my research, this concept seems to be called merkle tree). But surely there's an even better way to do it.
I then found about rolling hashes which employ some recursive algorithms, generalised to
$H(arr) = f_r(\ldots f_r(f_r(f_r(f_0 \oplus arr_0) \oplus arr_1) \oplus arr_2) \ldots \oplus arr_n)$
that can be expanded to
$H(arr) = f_0 \oplus f^0(arr_0) \oplus f^1(arr_1) \oplus f^2(arr_2) \oplus \ldots \oplus f^n(arr_n)$
This appears perfect for my use case if $\oplus$ is associative, commutative and reversible, which allows to do the recomputation for updating the array at index $i$ by
$H(arr_{new}) = H(arr_{old}) \ominus f^i(arr_{old,i}) \oplus f^i(arr_{new,i})$
The natural choices for $\oplus$ are bitwise XOR and modular addition, but which hash algorithms employ them, use a both simple and fast $f^i$ implementation, and still have good hashing characteristics?
I found some but am not totally satisfied.
- CRC32 is known to be fast, and although all descriptions of the algorithm are iterative, I'm confident to be able to derive a non-recursive $f$ function (knowing that crcs have the property $\operatorname{crc}(x \oplus y \oplus z) = \operatorname{crc}(x) \oplus \operatorname{crc}(y) \oplus \operatorname{crc}(z)$). Still, CRC is used and designed for error detection, not fingerprinting, so maybe it's not suitable at all. Am I missing something there?
- Rabin fingerprint appears to be suited better, but I only found iterative implementations with explicit polynom division. Not sure how to derive $f^i$ for this (is it even possible?). Also I'm not certain how it does differ from CRC computation.
- Rabin-Karp rolling hash appears to be a good choice, but I have concerns. Does "modulo $n$" imply I won't get all 32 bit values? Isn't the exponentiation $f^i(x) = x \cdot a^i$ relatively slow? And what constants $a$/$n$ should I use?
- Cyclic polynomial / Buzhash looks easy and efficiently to implement at first. But what about the $h$ in $f^i(x) = rotate(h(x), i)$, wouldn't this need to be a function that generates a 32-bit value from my 8-bit bytes? (I read a lookup table with random data works fine, but 256*4 bytes is too large for me). And isn't this rotation susceptible for swapping two settings that happen to be 32 bytes apart? Could the upper bytes of $i$, which is effectively used modulo 32, be utilised for something?
Can you dispel my doubts about any of these? Or is there some other algorithm that I missed entirely?