fork of smhasher from https://github.com/rurban/smhasher

vaeringjar fa740201ff Just adding the hashes results to te repo to be done with it. 7 gadi atpakaļ
doc 6654927581 HalfSipHash, SipHash13, fix crc32_hw1 7 gadi atpakaļ
.gitignore 1a2fecf496 Added to the root dir the jodyhash .c and .h by convention of the project. 7 gadi atpakaļ
.travis.yml fa3d00541f make VERBOSE=1 7 gadi atpakaļ
AvalancheTest.cpp b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 gadi atpakaļ
AvalancheTest.h b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 gadi atpakaļ
Bitslice.cpp b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 gadi atpakaļ
Bitvec.cpp b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 gadi atpakaļ
Bitvec.h b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 gadi atpakaļ
CMakeLists.txt 30531c5697 CMakeLists.txt: todo #10 7 gadi atpakaļ
City.cpp 2b28880295 Successfully builds on FreeBSD 10.2 8 gadi atpakaļ
City.h c00a6f1993 Add experimental CityHash32WithSeed 9 gadi atpakaļ
CityCrc.h b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 gadi atpakaļ
CityTest.cpp c00a6f1993 Add experimental CityHash32WithSeed 9 gadi atpakaļ
DifferentialTest.cpp b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 gadi atpakaļ
DifferentialTest.h b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 gadi atpakaļ
FarmTest.cc 0414c0c2fe add Google FarmHash, deprecating CityHash 9 gadi atpakaļ
Hashes.cpp 1a2fecf496 Added to the root dir the jodyhash .c and .h by convention of the project. 7 gadi atpakaļ
Hashes.h 1a2fecf496 Added to the root dir the jodyhash .c and .h by convention of the project. 7 gadi atpakaļ
KeysetTest.cpp 9c9619c3be FNV32a_YoshimitsuTRIAD: Fix AppendedZeroes 9 gadi atpakaļ
KeysetTest.h b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 gadi atpakaļ
MurmurHash1.cpp b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 gadi atpakaļ
MurmurHash1.h b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 gadi atpakaļ
MurmurHash2.cpp b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 gadi atpakaļ
MurmurHash2.h b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 gadi atpakaļ
MurmurHash3.cpp b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 gadi atpakaļ
MurmurHash3.h b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 gadi atpakaļ
PMurHash.c 7308eb413a add JenkinsOOAT_perl (an old bad artefact, used in perl until 5.16) 10 gadi atpakaļ
PMurHash.h f834feeb52 fixes for Windows and building with MSVC. 8 gadi atpakaļ
Platform.cpp 2b28880295 Successfully builds on FreeBSD 10.2 8 gadi atpakaļ
Platform.h f834feeb52 fixes for Windows and building with MSVC. 8 gadi atpakaļ
README.md 00d5c185e9 Fix Typos in README summary table #34 7 gadi atpakaļ
Random.cpp b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 gadi atpakaļ
Random.h b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 gadi atpakaļ
SpeedTest.cpp 9903df95bf Serialize invocations of hash functions 8 gadi atpakaļ
SpeedTest.h 237a0eca39 Add Average results to SpeedTest 9 gadi atpakaļ
Spooky.cpp b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 gadi atpakaļ
Spooky.h b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 gadi atpakaļ
SpookyTest.cpp b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 gadi atpakaļ
Stats.cpp b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 gadi atpakaļ
Stats.h b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 gadi atpakaļ
SuperFastHash.cpp b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 gadi atpakaļ
Types.cpp b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 gadi atpakaļ
Types.h b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 gadi atpakaļ
build.sh 93169294cd Fix sha1 on BIG_ENDIAN 7 gadi atpakaļ
build32.sh 1237a2fd5e minor fixes 8 gadi atpakaļ
cmetrohash.h b2d8e49c0b Added C implementation of MetroHash64 9 gadi atpakaļ
cmetrohash64.c 978496eaac make cmetrohash optimized variant standalone 9 gadi atpakaļ
crc.cpp b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 gadi atpakaļ
crc32-generated-constants.cpp b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 gadi atpakaļ
crc32_hw.c b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 gadi atpakaļ
crc32_hw1.c b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 gadi atpakaļ
crc32c.cpp b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 gadi atpakaļ
crc32c.h b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 gadi atpakaļ
falkhash-elf64.o ff0eab7f6c add falkhash.asm from https://github.com/gamozolabs/falkhash 9 gadi atpakaļ
falkhash-macho64.o ff0eab7f6c add falkhash.asm from https://github.com/gamozolabs/falkhash 9 gadi atpakaļ
falkhash.asm ff0eab7f6c add falkhash.asm from https://github.com/gamozolabs/falkhash 9 gadi atpakaļ
farmhash-c-test.cc 271d1b3e7b add farmhash (C99) 8 gadi atpakaļ
farmhash-c.c fe3f367e04 smhasher: fix MSVC2015 x64 build. 7 gadi atpakaļ
farmhash-c.h 271d1b3e7b add farmhash (C99) 8 gadi atpakaļ
farmhash.cc fe3f367e04 smhasher: fix MSVC2015 x64 build. 7 gadi atpakaļ
farmhash.h 79df5459ca update farmhash (1.1) 8 gadi atpakaļ
fasthash.cpp fe3f367e04 smhasher: fix MSVC2015 x64 build. 7 gadi atpakaļ
fasthash.h aa7c4d9c37 add fasthash 9 gadi atpakaļ
fhtw.asm b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 gadi atpakaļ
fhtw.h b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 gadi atpakaļ
halfsiphash.c 6654927581 HalfSipHash, SipHash13, fix crc32_hw1 7 gadi atpakaļ
hasshe2.c b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 gadi atpakaļ
jody_hash.c 1a2fecf496 Added to the root dir the jodyhash .c and .h by convention of the project. 7 gadi atpakaļ
jody_hash.h 1a2fecf496 Added to the root dir the jodyhash .c and .h by convention of the project. 7 gadi atpakaļ
log.hashes fa740201ff Just adding the hashes results to te repo to be done with it. 7 gadi atpakaļ
log.speed fa740201ff Just adding the hashes results to te repo to be done with it. 7 gadi atpakaļ
lookup3.cpp b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 gadi atpakaļ
main.cpp 1a2fecf496 Added to the root dir the jodyhash .c and .h by convention of the project. 7 gadi atpakaļ
md5.cpp 3a8aac7ba9 Fixes for issue #24 (#25) 7 gadi atpakaļ
metrohash.h c063c385e8 add metrohash64crc, do crc hw detection for metro 9 gadi atpakaļ
metrohash128.cpp 84af152c2a add metrohash 9 gadi atpakaļ
metrohash128crc.cpp 84af152c2a add metrohash 9 gadi atpakaļ
metrohash64.cpp 84af152c2a add metrohash 9 gadi atpakaļ
metrohash64crc.cpp c063c385e8 add metrohash64crc, do crc hw detection for metro 9 gadi atpakaļ
mum.cc f8fbccfffc t1ha: adds MUM hash into SMHasher. 8 gadi atpakaļ
mum.h 6654927581 HalfSipHash, SipHash13, fix crc32_hw1 7 gadi atpakaļ
opt_cmetrohash.h ba0f53885b Added my optimized cmetrohash64, noop oaat read (speed reference) 9 gadi atpakaļ
opt_cmetrohash64_1.c fe3f367e04 smhasher: fix MSVC2015 x64 build. 7 gadi atpakaļ
os_dependent_stuff.asm b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 gadi atpakaļ
sha1.cpp 93169294cd Fix sha1 on BIG_ENDIAN 7 gadi atpakaļ
sha1.h 3a8aac7ba9 Fixes for issue #24 (#25) 7 gadi atpakaļ
siphash.c 6654927581 HalfSipHash, SipHash13, fix crc32_hw1 7 gadi atpakaļ
siphash.h 6654927581 HalfSipHash, SipHash13, fix crc32_hw1 7 gadi atpakaļ
siphash_impl.h 1563d5ada2 add SipHash 10 gadi atpakaļ
siphash_sse2.c 6654927581 HalfSipHash, SipHash13, fix crc32_hw1 7 gadi atpakaļ
siphash_ssse3.c 6654927581 HalfSipHash, SipHash13, fix crc32_hw1 7 gadi atpakaļ
speed.sh 6654927581 HalfSipHash, SipHash13, fix crc32_hw1 7 gadi atpakaļ
speedall.sh 1a94b0b8c2 add speedall.sh 9 gadi atpakaļ
split.pl b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 gadi atpakaļ
t1ha.c 6cfc20e8db t1ha: update for mainstream (fixes for MSVC and GCC 4.4) (#27) 7 gadi atpakaļ
t1ha.h 6cfc20e8db t1ha: update for mainstream (fixes for MSVC and GCC 4.4) (#27) 7 gadi atpakaļ
testall.sh 6bf9739e99 testall.sh: use cmake in build dir 9 gadi atpakaļ
testspeed.sh ff0eab7f6c add falkhash.asm from https://github.com/gamozolabs/falkhash 9 gadi atpakaļ
xxhash.c be3c952c60 XXH_256: harmonize int types 9 gadi atpakaļ
xxhash.h 0a1aba751a add xxHash 9 gadi atpakaļ

README.md

SMhasher

Hash function MiB/sec cycles/hash Quality problems
donothing32 26868545.30 6.11 test NOP
donothing64 18810836.72 6.09 test NOP
donothing128 24694622.36 5.11 test NOP
NOP_OAAT_read64 2791.35 21.28 test NOP
BadHash 468.31 105.42 test FAIL
sumhash 11058.06 25.41 test FAIL
sumhash32 26601.48 15.61 test FAIL
--------------------------------------
crc32 352.29 143.32 insecure, 8589.93x collisions, distrib
md5_32a 277.56 716.47 8589.93x collisions, distrib
sha1_32a 324.16 1427.58 collisions, 36.6% distrib
hasshe2 1639.48 103.19 insecure,100% bias, collisions, distrib
crc32_hw 8103.92 32.36 insecure,100% bias, collisions, distrib
crc64_hw 7624.95 35.12 insecure,100% bias, collisions, distrib
crc32_hw1 22610.60 36.80 insecure,100% bias, collisions, distrib
FNV1a 791.84 69.21 zeros,100% bias, collisions, distrib
FNV1a_YT 8328.60 27.78 100% bias, collisions, distrib
FNV64 791.84 70.17 100% bias, collisions, distrib
bernstein 715.21 74.00 100% bias, collisions, distrib
sdbm 791.84 66.51 100% bias, collisions, distrib
x17 637.20 87.19 99.98% bias, collisions, distrib
JenkinsOOAT 412.39 154.66 53.5% bias, collisions, distrib
JenkinsOOAT_pl 452.50 118.25 1.5-11.5% bias, 7.2x collisions
MicroOAAT 689.16 83.45 100% bias, distrib
lookup3 1735.25 48.78 28% bias, collisions, 30% distr
superfast 2045.98 52.16 91% bias, 5273.01x collisions, 37% distr
MurmurOAAT 465.93 110.78 collisions, 99.998% distr
Crap8 2844.03 37.48 2.42% bias, collisions, 2% distrib
Murmur2 3147.50 40.85 1.7% bias, 81x coll, 1.7% distrib
Murmur2A 2843.03 47.78 12.7% bias
Murmur2B 6155.72 57.48 1.8% bias, collisions, 3.4% distrib
Murmur2C 3633.24 46.88 91% bias, collisions, distr
HalfSipHash 587.04 145.03 zeroes
--------------------------------------
GoodOAAT 929.59 79.06
SipHash 951.02 145.97
SipHash13 1678.76 115.11 0.9% bias
PMurHash32 2436.65 60.15
Murmur3A 2364.40 52.83
Murmur3C 2468.70 79.44
Murmur3F 4376.29 55.03
fasthash32 4881.26 55.11
fasthash64 5404.17 46.73
City32 3523.77 53.76
City64 8728.22 54.72 2 minor collisions
City128 9769.19 63.38
CityCrc128 13730.97 85.88
FarmHash64 8711.25 58.16 machine-specific
FarmHash128 9738.29 79.91 machine-specific
FarmHash32 24831.45 24.99 disabled. too machine-specific
farmhash32_c 24647.21 25.36
farmhash64_c 7886.79 64.48
farmhash128_c 9770.40 79.59
Spooky32 9944.88 60.12
Spooky64 9943.72 60.16
Spooky128 9936.26 60.10
xxHash32 4914.62 64.17 collisions with 4bit diff
xxHash64 8474.87 61.57
metrohash64_1 8177.73 56.05
metrohash64_2 9064.45 50.83
metrohash128_1 7931.50 65.88
metrohash128_2 8779.11 59.36
metrohash64crc_1 15827.71 55.72 cyclic collisions 8 byte
metrohash64crc_2 16072.41 56.79 cyclic collisions 8 byte
metrohash128crc_1 15468.70 66.62
metrohash128crc_2 14100.80 71.89
cmetrohash64_1_o 9054.06 50.66
cmetrohash64_1 8135.24 55.96
cmetrohash64_2 9046.35 50.95
falkhash 19888.45 173.92
t1ha 15480.28 26.41
t1ha_64be 5203.00 53.69
t1ha_32le 8930.90 29.79
t1ha_32be 6931.84 34.17
t1ha_crc 16757.73 28.69
t1ha_aes 37299.66 25.68 machine-specific
MUM 7763.68 41.28 machine-specific

Summary

I added some SSE assisted hashes and fast intel/arm CRC32-C and AES HW variants, but not the fastest crcutil yet. See our crcutil results. See also the old https://code.google.com/p/smhasher/w/list.

So the fastest hash functions on x86_64 without quality problems are:

  • falkhash (macho64 and elf64 nasm only, with HW AES extension)
  • t1ha + mum (machine specific, mum: different arch results)
  • FarmHash (not portable, too machine specific: 64 vs 32bit, old gcc, ...)
  • Metro (but not 64crc yet, WIP)
  • Spooky32
  • xxHash64
  • fasthash
  • City (deprecated)

Hash functions for symbol tables or hash tables typically use 32 bit hashes, for databases, file systems and file checksums typically 64 or 128bit, for crypto now starting with 256 bit.

Typical median key size in perl5 is 20, the most common 4. See github.com/rurban/perl-hash-stats

When used in a hash table the instruction cache will usually beat the CPU and throughput measured here. In my tests the smallest FNV1A beats the fastest crc32_hw1 with Perl 5 hash tables. Even if those worse hash functions will lead to more collisions, the overall speed advantage beats the slightly worse quality. See e.g. A Seven-Dimensional Analysis of Hashing Methods and its Implications on Query Processing for a concise overview of the best hash table strategies, confirming that the simpliest Mult hashing (bernstein, FNV*, x17, sdbm) always beat "better" hash functions (Tabulation, Murmur, Farm, ...) when used in a hash table.

The fast hash functions tested here are recommendable as fast for file digests and maybe bigger databases, but not for 32bit hash tables. The "Quality problems" lead to less uniform distribution, i.e. more collisions and worse performance, but are rarely related to real security attacks, just the 2nd sanity test against \0 invariance is security relevant.

Other

TODO

Some popular SSE-improved FNV1 (sanmayce) variants, fletcher (ZFS), ... and slower cryptographic hashes or more secure hashes are still missing. BLAKE2, SHA-2, SHA-3 (Keccak), Grøstl, JH, Skein, ...

SECURITY

The hash table attacks described in SipHash against City, Murmur or Perl JenkinsOAAT or at Hash Function Lounge are not included here.

Such an attack avoidance cannot not be the problem of the hash function, but the hash table collision resolution scheme. You can attack every single hash function, even the best and most secure if you detect the seed, e.g. from collision timings and independly the sort-order, so you need to protect your collision handling scheme from the worst-case O(n), i.e. separate chaining with linked lists. Linked lists chaining allows high load factors, but is very cache-unfriendly. The only recommendable linked list scheme is inlining the key or hash into the array. Nowadays everybody uses fast open addressing, even if the load factor needs to be ~50%, unless you use Cuckoo Hashing.

I.e. the usage of SipHash for their hash table in Python 3.4, ruby, rust, systemd, OpenDNS, Haskell and OpenBSD is pure security theatre. SipHash is not secure enough for security purposes and not fast enough for general usage. Brute-force generation of ~32k collisions need 2-4m for all these hashes. siphash being the slowest needs max 4m, other typically max 2m30s, with <10s for practical 16k collision attacks with all hash functions. Using Murmur is usually slower than a simple Mult, even in the worst case. Provable secure is only uniform hashing, i.e. 2-5 independent Mult or Tabulation, or using a guaranteed logarithmic collision scheme (a tree) or a linear collision scheme, such as Robin Hood or Cockoo hashing with collision counting.

One more note regarding security: Nowadays even SHA1 can be solved in a solver, like Z3 (or faster ones) for practical hash table collision attacks (i.e. 14-20 bits). So all hash functions with less than 256 bits tested here cannot be considered "secure" at all.

The '\0' vulnerability attack with binary keys is tested in the 2nd Sanity test.