fork of smhasher from https://github.com/rurban/smhasher

vaeringjar fa740201ff Just adding the hashes results to te repo to be done with it. 7 tahun lalu
doc 6654927581 HalfSipHash, SipHash13, fix crc32_hw1 7 tahun lalu
.gitignore 1a2fecf496 Added to the root dir the jodyhash .c and .h by convention of the project. 7 tahun lalu
.travis.yml fa3d00541f make VERBOSE=1 7 tahun lalu
AvalancheTest.cpp b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 tahun lalu
AvalancheTest.h b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 tahun lalu
Bitslice.cpp b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 tahun lalu
Bitvec.cpp b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 tahun lalu
Bitvec.h b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 tahun lalu
CMakeLists.txt 30531c5697 CMakeLists.txt: todo #10 7 tahun lalu
City.cpp 2b28880295 Successfully builds on FreeBSD 10.2 8 tahun lalu
City.h c00a6f1993 Add experimental CityHash32WithSeed 9 tahun lalu
CityCrc.h b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 tahun lalu
CityTest.cpp c00a6f1993 Add experimental CityHash32WithSeed 9 tahun lalu
DifferentialTest.cpp b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 tahun lalu
DifferentialTest.h b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 tahun lalu
FarmTest.cc 0414c0c2fe add Google FarmHash, deprecating CityHash 9 tahun lalu
Hashes.cpp 1a2fecf496 Added to the root dir the jodyhash .c and .h by convention of the project. 7 tahun lalu
Hashes.h 1a2fecf496 Added to the root dir the jodyhash .c and .h by convention of the project. 7 tahun lalu
KeysetTest.cpp 9c9619c3be FNV32a_YoshimitsuTRIAD: Fix AppendedZeroes 9 tahun lalu
KeysetTest.h b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 tahun lalu
MurmurHash1.cpp b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 tahun lalu
MurmurHash1.h b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 tahun lalu
MurmurHash2.cpp b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 tahun lalu
MurmurHash2.h b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 tahun lalu
MurmurHash3.cpp b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 tahun lalu
MurmurHash3.h b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 tahun lalu
PMurHash.c 7308eb413a add JenkinsOOAT_perl (an old bad artefact, used in perl until 5.16) 10 tahun lalu
PMurHash.h f834feeb52 fixes for Windows and building with MSVC. 8 tahun lalu
Platform.cpp 2b28880295 Successfully builds on FreeBSD 10.2 8 tahun lalu
Platform.h f834feeb52 fixes for Windows and building with MSVC. 8 tahun lalu
README.md 00d5c185e9 Fix Typos in README summary table #34 7 tahun lalu
Random.cpp b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 tahun lalu
Random.h b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 tahun lalu
SpeedTest.cpp 9903df95bf Serialize invocations of hash functions 8 tahun lalu
SpeedTest.h 237a0eca39 Add Average results to SpeedTest 9 tahun lalu
Spooky.cpp b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 tahun lalu
Spooky.h b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 tahun lalu
SpookyTest.cpp b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 tahun lalu
Stats.cpp b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 tahun lalu
Stats.h b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 tahun lalu
SuperFastHash.cpp b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 tahun lalu
Types.cpp b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 tahun lalu
Types.h b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 tahun lalu
build.sh 93169294cd Fix sha1 on BIG_ENDIAN 7 tahun lalu
build32.sh 1237a2fd5e minor fixes 8 tahun lalu
cmetrohash.h b2d8e49c0b Added C implementation of MetroHash64 9 tahun lalu
cmetrohash64.c 978496eaac make cmetrohash optimized variant standalone 9 tahun lalu
crc.cpp b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 tahun lalu
crc32-generated-constants.cpp b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 tahun lalu
crc32_hw.c b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 tahun lalu
crc32_hw1.c b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 tahun lalu
crc32c.cpp b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 tahun lalu
crc32c.h b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 tahun lalu
falkhash-elf64.o ff0eab7f6c add falkhash.asm from https://github.com/gamozolabs/falkhash 9 tahun lalu
falkhash-macho64.o ff0eab7f6c add falkhash.asm from https://github.com/gamozolabs/falkhash 9 tahun lalu
falkhash.asm ff0eab7f6c add falkhash.asm from https://github.com/gamozolabs/falkhash 9 tahun lalu
farmhash-c-test.cc 271d1b3e7b add farmhash (C99) 8 tahun lalu
farmhash-c.c fe3f367e04 smhasher: fix MSVC2015 x64 build. 7 tahun lalu
farmhash-c.h 271d1b3e7b add farmhash (C99) 8 tahun lalu
farmhash.cc fe3f367e04 smhasher: fix MSVC2015 x64 build. 7 tahun lalu
farmhash.h 79df5459ca update farmhash (1.1) 8 tahun lalu
fasthash.cpp fe3f367e04 smhasher: fix MSVC2015 x64 build. 7 tahun lalu
fasthash.h aa7c4d9c37 add fasthash 9 tahun lalu
fhtw.asm b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 tahun lalu
fhtw.h b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 tahun lalu
halfsiphash.c 6654927581 HalfSipHash, SipHash13, fix crc32_hw1 7 tahun lalu
hasshe2.c b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 tahun lalu
jody_hash.c 1a2fecf496 Added to the root dir the jodyhash .c and .h by convention of the project. 7 tahun lalu
jody_hash.h 1a2fecf496 Added to the root dir the jodyhash .c and .h by convention of the project. 7 tahun lalu
log.hashes fa740201ff Just adding the hashes results to te repo to be done with it. 7 tahun lalu
log.speed fa740201ff Just adding the hashes results to te repo to be done with it. 7 tahun lalu
lookup3.cpp b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 tahun lalu
main.cpp 1a2fecf496 Added to the root dir the jodyhash .c and .h by convention of the project. 7 tahun lalu
md5.cpp 3a8aac7ba9 Fixes for issue #24 (#25) 7 tahun lalu
metrohash.h c063c385e8 add metrohash64crc, do crc hw detection for metro 9 tahun lalu
metrohash128.cpp 84af152c2a add metrohash 9 tahun lalu
metrohash128crc.cpp 84af152c2a add metrohash 9 tahun lalu
metrohash64.cpp 84af152c2a add metrohash 9 tahun lalu
metrohash64crc.cpp c063c385e8 add metrohash64crc, do crc hw detection for metro 9 tahun lalu
mum.cc f8fbccfffc t1ha: adds MUM hash into SMHasher. 8 tahun lalu
mum.h 6654927581 HalfSipHash, SipHash13, fix crc32_hw1 7 tahun lalu
opt_cmetrohash.h ba0f53885b Added my optimized cmetrohash64, noop oaat read (speed reference) 9 tahun lalu
opt_cmetrohash64_1.c fe3f367e04 smhasher: fix MSVC2015 x64 build. 7 tahun lalu
os_dependent_stuff.asm b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 tahun lalu
sha1.cpp 93169294cd Fix sha1 on BIG_ENDIAN 7 tahun lalu
sha1.h 3a8aac7ba9 Fixes for issue #24 (#25) 7 tahun lalu
siphash.c 6654927581 HalfSipHash, SipHash13, fix crc32_hw1 7 tahun lalu
siphash.h 6654927581 HalfSipHash, SipHash13, fix crc32_hw1 7 tahun lalu
siphash_impl.h 1563d5ada2 add SipHash 10 tahun lalu
siphash_sse2.c 6654927581 HalfSipHash, SipHash13, fix crc32_hw1 7 tahun lalu
siphash_ssse3.c 6654927581 HalfSipHash, SipHash13, fix crc32_hw1 7 tahun lalu
speed.sh 6654927581 HalfSipHash, SipHash13, fix crc32_hw1 7 tahun lalu
speedall.sh 1a94b0b8c2 add speedall.sh 9 tahun lalu
split.pl b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 tahun lalu
t1ha.c 6cfc20e8db t1ha: update for mainstream (fixes for MSVC and GCC 4.4) (#27) 7 tahun lalu
t1ha.h 6cfc20e8db t1ha: update for mainstream (fixes for MSVC and GCC 4.4) (#27) 7 tahun lalu
testall.sh 6bf9739e99 testall.sh: use cmake in build dir 9 tahun lalu
testspeed.sh ff0eab7f6c add falkhash.asm from https://github.com/gamozolabs/falkhash 9 tahun lalu
xxhash.c be3c952c60 XXH_256: harmonize int types 9 tahun lalu
xxhash.h 0a1aba751a add xxHash 9 tahun lalu

README.md

SMhasher

Hash function MiB/sec cycles/hash Quality problems
donothing32 26868545.30 6.11 test NOP
donothing64 18810836.72 6.09 test NOP
donothing128 24694622.36 5.11 test NOP
NOP_OAAT_read64 2791.35 21.28 test NOP
BadHash 468.31 105.42 test FAIL
sumhash 11058.06 25.41 test FAIL
sumhash32 26601.48 15.61 test FAIL
--------------------------------------
crc32 352.29 143.32 insecure, 8589.93x collisions, distrib
md5_32a 277.56 716.47 8589.93x collisions, distrib
sha1_32a 324.16 1427.58 collisions, 36.6% distrib
hasshe2 1639.48 103.19 insecure,100% bias, collisions, distrib
crc32_hw 8103.92 32.36 insecure,100% bias, collisions, distrib
crc64_hw 7624.95 35.12 insecure,100% bias, collisions, distrib
crc32_hw1 22610.60 36.80 insecure,100% bias, collisions, distrib
FNV1a 791.84 69.21 zeros,100% bias, collisions, distrib
FNV1a_YT 8328.60 27.78 100% bias, collisions, distrib
FNV64 791.84 70.17 100% bias, collisions, distrib
bernstein 715.21 74.00 100% bias, collisions, distrib
sdbm 791.84 66.51 100% bias, collisions, distrib
x17 637.20 87.19 99.98% bias, collisions, distrib
JenkinsOOAT 412.39 154.66 53.5% bias, collisions, distrib
JenkinsOOAT_pl 452.50 118.25 1.5-11.5% bias, 7.2x collisions
MicroOAAT 689.16 83.45 100% bias, distrib
lookup3 1735.25 48.78 28% bias, collisions, 30% distr
superfast 2045.98 52.16 91% bias, 5273.01x collisions, 37% distr
MurmurOAAT 465.93 110.78 collisions, 99.998% distr
Crap8 2844.03 37.48 2.42% bias, collisions, 2% distrib
Murmur2 3147.50 40.85 1.7% bias, 81x coll, 1.7% distrib
Murmur2A 2843.03 47.78 12.7% bias
Murmur2B 6155.72 57.48 1.8% bias, collisions, 3.4% distrib
Murmur2C 3633.24 46.88 91% bias, collisions, distr
HalfSipHash 587.04 145.03 zeroes
--------------------------------------
GoodOAAT 929.59 79.06
SipHash 951.02 145.97
SipHash13 1678.76 115.11 0.9% bias
PMurHash32 2436.65 60.15
Murmur3A 2364.40 52.83
Murmur3C 2468.70 79.44
Murmur3F 4376.29 55.03
fasthash32 4881.26 55.11
fasthash64 5404.17 46.73
City32 3523.77 53.76
City64 8728.22 54.72 2 minor collisions
City128 9769.19 63.38
CityCrc128 13730.97 85.88
FarmHash64 8711.25 58.16 machine-specific
FarmHash128 9738.29 79.91 machine-specific
FarmHash32 24831.45 24.99 disabled. too machine-specific
farmhash32_c 24647.21 25.36
farmhash64_c 7886.79 64.48
farmhash128_c 9770.40 79.59
Spooky32 9944.88 60.12
Spooky64 9943.72 60.16
Spooky128 9936.26 60.10
xxHash32 4914.62 64.17 collisions with 4bit diff
xxHash64 8474.87 61.57
metrohash64_1 8177.73 56.05
metrohash64_2 9064.45 50.83
metrohash128_1 7931.50 65.88
metrohash128_2 8779.11 59.36
metrohash64crc_1 15827.71 55.72 cyclic collisions 8 byte
metrohash64crc_2 16072.41 56.79 cyclic collisions 8 byte
metrohash128crc_1 15468.70 66.62
metrohash128crc_2 14100.80 71.89
cmetrohash64_1_o 9054.06 50.66
cmetrohash64_1 8135.24 55.96
cmetrohash64_2 9046.35 50.95
falkhash 19888.45 173.92
t1ha 15480.28 26.41
t1ha_64be 5203.00 53.69
t1ha_32le 8930.90 29.79
t1ha_32be 6931.84 34.17
t1ha_crc 16757.73 28.69
t1ha_aes 37299.66 25.68 machine-specific
MUM 7763.68 41.28 machine-specific

Summary

I added some SSE assisted hashes and fast intel/arm CRC32-C and AES HW variants, but not the fastest crcutil yet. See our crcutil results. See also the old https://code.google.com/p/smhasher/w/list.

So the fastest hash functions on x86_64 without quality problems are:

  • falkhash (macho64 and elf64 nasm only, with HW AES extension)
  • t1ha + mum (machine specific, mum: different arch results)
  • FarmHash (not portable, too machine specific: 64 vs 32bit, old gcc, ...)
  • Metro (but not 64crc yet, WIP)
  • Spooky32
  • xxHash64
  • fasthash
  • City (deprecated)

Hash functions for symbol tables or hash tables typically use 32 bit hashes, for databases, file systems and file checksums typically 64 or 128bit, for crypto now starting with 256 bit.

Typical median key size in perl5 is 20, the most common 4. See github.com/rurban/perl-hash-stats

When used in a hash table the instruction cache will usually beat the CPU and throughput measured here. In my tests the smallest FNV1A beats the fastest crc32_hw1 with Perl 5 hash tables. Even if those worse hash functions will lead to more collisions, the overall speed advantage beats the slightly worse quality. See e.g. A Seven-Dimensional Analysis of Hashing Methods and its Implications on Query Processing for a concise overview of the best hash table strategies, confirming that the simpliest Mult hashing (bernstein, FNV*, x17, sdbm) always beat "better" hash functions (Tabulation, Murmur, Farm, ...) when used in a hash table.

The fast hash functions tested here are recommendable as fast for file digests and maybe bigger databases, but not for 32bit hash tables. The "Quality problems" lead to less uniform distribution, i.e. more collisions and worse performance, but are rarely related to real security attacks, just the 2nd sanity test against \0 invariance is security relevant.

Other

TODO

Some popular SSE-improved FNV1 (sanmayce) variants, fletcher (ZFS), ... and slower cryptographic hashes or more secure hashes are still missing. BLAKE2, SHA-2, SHA-3 (Keccak), Grøstl, JH, Skein, ...

SECURITY

The hash table attacks described in SipHash against City, Murmur or Perl JenkinsOAAT or at Hash Function Lounge are not included here.

Such an attack avoidance cannot not be the problem of the hash function, but the hash table collision resolution scheme. You can attack every single hash function, even the best and most secure if you detect the seed, e.g. from collision timings and independly the sort-order, so you need to protect your collision handling scheme from the worst-case O(n), i.e. separate chaining with linked lists. Linked lists chaining allows high load factors, but is very cache-unfriendly. The only recommendable linked list scheme is inlining the key or hash into the array. Nowadays everybody uses fast open addressing, even if the load factor needs to be ~50%, unless you use Cuckoo Hashing.

I.e. the usage of SipHash for their hash table in Python 3.4, ruby, rust, systemd, OpenDNS, Haskell and OpenBSD is pure security theatre. SipHash is not secure enough for security purposes and not fast enough for general usage. Brute-force generation of ~32k collisions need 2-4m for all these hashes. siphash being the slowest needs max 4m, other typically max 2m30s, with <10s for practical 16k collision attacks with all hash functions. Using Murmur is usually slower than a simple Mult, even in the worst case. Provable secure is only uniform hashing, i.e. 2-5 independent Mult or Tabulation, or using a guaranteed logarithmic collision scheme (a tree) or a linear collision scheme, such as Robin Hood or Cockoo hashing with collision counting.

One more note regarding security: Nowadays even SHA1 can be solved in a solver, like Z3 (or faster ones) for practical hash table collision attacks (i.e. 14-20 bits). So all hash functions with less than 256 bits tested here cannot be considered "secure" at all.

The '\0' vulnerability attack with binary keys is tested in the 2nd Sanity test.