fork of smhasher from https://github.com/rurban/smhasher

vaeringjar fa740201ff Just adding the hashes results to te repo to be done with it. 7 年之前
doc 6654927581 HalfSipHash, SipHash13, fix crc32_hw1 7 年之前
.gitignore 1a2fecf496 Added to the root dir the jodyhash .c and .h by convention of the project. 7 年之前
.travis.yml fa3d00541f make VERBOSE=1 7 年之前
AvalancheTest.cpp b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 年之前
AvalancheTest.h b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 年之前
Bitslice.cpp b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 年之前
Bitvec.cpp b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 年之前
Bitvec.h b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 年之前
CMakeLists.txt 30531c5697 CMakeLists.txt: todo #10 7 年之前
City.cpp 2b28880295 Successfully builds on FreeBSD 10.2 8 年之前
City.h c00a6f1993 Add experimental CityHash32WithSeed 9 年之前
CityCrc.h b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 年之前
CityTest.cpp c00a6f1993 Add experimental CityHash32WithSeed 9 年之前
DifferentialTest.cpp b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 年之前
DifferentialTest.h b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 年之前
FarmTest.cc 0414c0c2fe add Google FarmHash, deprecating CityHash 9 年之前
Hashes.cpp 1a2fecf496 Added to the root dir the jodyhash .c and .h by convention of the project. 7 年之前
Hashes.h 1a2fecf496 Added to the root dir the jodyhash .c and .h by convention of the project. 7 年之前
KeysetTest.cpp 9c9619c3be FNV32a_YoshimitsuTRIAD: Fix AppendedZeroes 9 年之前
KeysetTest.h b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 年之前
MurmurHash1.cpp b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 年之前
MurmurHash1.h b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 年之前
MurmurHash2.cpp b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 年之前
MurmurHash2.h b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 年之前
MurmurHash3.cpp b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 年之前
MurmurHash3.h b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 年之前
PMurHash.c 7308eb413a add JenkinsOOAT_perl (an old bad artefact, used in perl until 5.16) 10 年之前
PMurHash.h f834feeb52 fixes for Windows and building with MSVC. 8 年之前
Platform.cpp 2b28880295 Successfully builds on FreeBSD 10.2 8 年之前
Platform.h f834feeb52 fixes for Windows and building with MSVC. 8 年之前
README.md 00d5c185e9 Fix Typos in README summary table #34 7 年之前
Random.cpp b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 年之前
Random.h b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 年之前
SpeedTest.cpp 9903df95bf Serialize invocations of hash functions 8 年之前
SpeedTest.h 237a0eca39 Add Average results to SpeedTest 9 年之前
Spooky.cpp b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 年之前
Spooky.h b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 年之前
SpookyTest.cpp b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 年之前
Stats.cpp b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 年之前
Stats.h b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 年之前
SuperFastHash.cpp b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 年之前
Types.cpp b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 年之前
Types.h b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 年之前
build.sh 93169294cd Fix sha1 on BIG_ENDIAN 7 年之前
build32.sh 1237a2fd5e minor fixes 8 年之前
cmetrohash.h b2d8e49c0b Added C implementation of MetroHash64 9 年之前
cmetrohash64.c 978496eaac make cmetrohash optimized variant standalone 9 年之前
crc.cpp b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 年之前
crc32-generated-constants.cpp b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 年之前
crc32_hw.c b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 年之前
crc32_hw1.c b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 年之前
crc32c.cpp b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 年之前
crc32c.h b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 年之前
falkhash-elf64.o ff0eab7f6c add falkhash.asm from https://github.com/gamozolabs/falkhash 9 年之前
falkhash-macho64.o ff0eab7f6c add falkhash.asm from https://github.com/gamozolabs/falkhash 9 年之前
falkhash.asm ff0eab7f6c add falkhash.asm from https://github.com/gamozolabs/falkhash 9 年之前
farmhash-c-test.cc 271d1b3e7b add farmhash (C99) 8 年之前
farmhash-c.c fe3f367e04 smhasher: fix MSVC2015 x64 build. 7 年之前
farmhash-c.h 271d1b3e7b add farmhash (C99) 8 年之前
farmhash.cc fe3f367e04 smhasher: fix MSVC2015 x64 build. 7 年之前
farmhash.h 79df5459ca update farmhash (1.1) 8 年之前
fasthash.cpp fe3f367e04 smhasher: fix MSVC2015 x64 build. 7 年之前
fasthash.h aa7c4d9c37 add fasthash 9 年之前
fhtw.asm b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 年之前
fhtw.h b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 年之前
halfsiphash.c 6654927581 HalfSipHash, SipHash13, fix crc32_hw1 7 年之前
hasshe2.c b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 年之前
jody_hash.c 1a2fecf496 Added to the root dir the jodyhash .c and .h by convention of the project. 7 年之前
jody_hash.h 1a2fecf496 Added to the root dir the jodyhash .c and .h by convention of the project. 7 年之前
log.hashes fa740201ff Just adding the hashes results to te repo to be done with it. 7 年之前
log.speed fa740201ff Just adding the hashes results to te repo to be done with it. 7 年之前
lookup3.cpp b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 年之前
main.cpp 1a2fecf496 Added to the root dir the jodyhash .c and .h by convention of the project. 7 年之前
md5.cpp 3a8aac7ba9 Fixes for issue #24 (#25) 7 年之前
metrohash.h c063c385e8 add metrohash64crc, do crc hw detection for metro 9 年之前
metrohash128.cpp 84af152c2a add metrohash 9 年之前
metrohash128crc.cpp 84af152c2a add metrohash 9 年之前
metrohash64.cpp 84af152c2a add metrohash 9 年之前
metrohash64crc.cpp c063c385e8 add metrohash64crc, do crc hw detection for metro 9 年之前
mum.cc f8fbccfffc t1ha: adds MUM hash into SMHasher. 8 年之前
mum.h 6654927581 HalfSipHash, SipHash13, fix crc32_hw1 7 年之前
opt_cmetrohash.h ba0f53885b Added my optimized cmetrohash64, noop oaat read (speed reference) 9 年之前
opt_cmetrohash64_1.c fe3f367e04 smhasher: fix MSVC2015 x64 build. 7 年之前
os_dependent_stuff.asm b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 年之前
sha1.cpp 93169294cd Fix sha1 on BIG_ENDIAN 7 年之前
sha1.h 3a8aac7ba9 Fixes for issue #24 (#25) 7 年之前
siphash.c 6654927581 HalfSipHash, SipHash13, fix crc32_hw1 7 年之前
siphash.h 6654927581 HalfSipHash, SipHash13, fix crc32_hw1 7 年之前
siphash_impl.h 1563d5ada2 add SipHash 10 年之前
siphash_sse2.c 6654927581 HalfSipHash, SipHash13, fix crc32_hw1 7 年之前
siphash_ssse3.c 6654927581 HalfSipHash, SipHash13, fix crc32_hw1 7 年之前
speed.sh 6654927581 HalfSipHash, SipHash13, fix crc32_hw1 7 年之前
speedall.sh 1a94b0b8c2 add speedall.sh 9 年之前
split.pl b6f22125e2 initial extension of https://code.google.com/p/smhasher 10 年之前
t1ha.c 6cfc20e8db t1ha: update for mainstream (fixes for MSVC and GCC 4.4) (#27) 7 年之前
t1ha.h 6cfc20e8db t1ha: update for mainstream (fixes for MSVC and GCC 4.4) (#27) 7 年之前
testall.sh 6bf9739e99 testall.sh: use cmake in build dir 9 年之前
testspeed.sh ff0eab7f6c add falkhash.asm from https://github.com/gamozolabs/falkhash 9 年之前
xxhash.c be3c952c60 XXH_256: harmonize int types 9 年之前
xxhash.h 0a1aba751a add xxHash 9 年之前

README.md

SMhasher

Hash function MiB/sec cycles/hash Quality problems
donothing32 26868545.30 6.11 test NOP
donothing64 18810836.72 6.09 test NOP
donothing128 24694622.36 5.11 test NOP
NOP_OAAT_read64 2791.35 21.28 test NOP
BadHash 468.31 105.42 test FAIL
sumhash 11058.06 25.41 test FAIL
sumhash32 26601.48 15.61 test FAIL
--------------------------------------
crc32 352.29 143.32 insecure, 8589.93x collisions, distrib
md5_32a 277.56 716.47 8589.93x collisions, distrib
sha1_32a 324.16 1427.58 collisions, 36.6% distrib
hasshe2 1639.48 103.19 insecure,100% bias, collisions, distrib
crc32_hw 8103.92 32.36 insecure,100% bias, collisions, distrib
crc64_hw 7624.95 35.12 insecure,100% bias, collisions, distrib
crc32_hw1 22610.60 36.80 insecure,100% bias, collisions, distrib
FNV1a 791.84 69.21 zeros,100% bias, collisions, distrib
FNV1a_YT 8328.60 27.78 100% bias, collisions, distrib
FNV64 791.84 70.17 100% bias, collisions, distrib
bernstein 715.21 74.00 100% bias, collisions, distrib
sdbm 791.84 66.51 100% bias, collisions, distrib
x17 637.20 87.19 99.98% bias, collisions, distrib
JenkinsOOAT 412.39 154.66 53.5% bias, collisions, distrib
JenkinsOOAT_pl 452.50 118.25 1.5-11.5% bias, 7.2x collisions
MicroOAAT 689.16 83.45 100% bias, distrib
lookup3 1735.25 48.78 28% bias, collisions, 30% distr
superfast 2045.98 52.16 91% bias, 5273.01x collisions, 37% distr
MurmurOAAT 465.93 110.78 collisions, 99.998% distr
Crap8 2844.03 37.48 2.42% bias, collisions, 2% distrib
Murmur2 3147.50 40.85 1.7% bias, 81x coll, 1.7% distrib
Murmur2A 2843.03 47.78 12.7% bias
Murmur2B 6155.72 57.48 1.8% bias, collisions, 3.4% distrib
Murmur2C 3633.24 46.88 91% bias, collisions, distr
HalfSipHash 587.04 145.03 zeroes
--------------------------------------
GoodOAAT 929.59 79.06
SipHash 951.02 145.97
SipHash13 1678.76 115.11 0.9% bias
PMurHash32 2436.65 60.15
Murmur3A 2364.40 52.83
Murmur3C 2468.70 79.44
Murmur3F 4376.29 55.03
fasthash32 4881.26 55.11
fasthash64 5404.17 46.73
City32 3523.77 53.76
City64 8728.22 54.72 2 minor collisions
City128 9769.19 63.38
CityCrc128 13730.97 85.88
FarmHash64 8711.25 58.16 machine-specific
FarmHash128 9738.29 79.91 machine-specific
FarmHash32 24831.45 24.99 disabled. too machine-specific
farmhash32_c 24647.21 25.36
farmhash64_c 7886.79 64.48
farmhash128_c 9770.40 79.59
Spooky32 9944.88 60.12
Spooky64 9943.72 60.16
Spooky128 9936.26 60.10
xxHash32 4914.62 64.17 collisions with 4bit diff
xxHash64 8474.87 61.57
metrohash64_1 8177.73 56.05
metrohash64_2 9064.45 50.83
metrohash128_1 7931.50 65.88
metrohash128_2 8779.11 59.36
metrohash64crc_1 15827.71 55.72 cyclic collisions 8 byte
metrohash64crc_2 16072.41 56.79 cyclic collisions 8 byte
metrohash128crc_1 15468.70 66.62
metrohash128crc_2 14100.80 71.89
cmetrohash64_1_o 9054.06 50.66
cmetrohash64_1 8135.24 55.96
cmetrohash64_2 9046.35 50.95
falkhash 19888.45 173.92
t1ha 15480.28 26.41
t1ha_64be 5203.00 53.69
t1ha_32le 8930.90 29.79
t1ha_32be 6931.84 34.17
t1ha_crc 16757.73 28.69
t1ha_aes 37299.66 25.68 machine-specific
MUM 7763.68 41.28 machine-specific

Summary

I added some SSE assisted hashes and fast intel/arm CRC32-C and AES HW variants, but not the fastest crcutil yet. See our crcutil results. See also the old https://code.google.com/p/smhasher/w/list.

So the fastest hash functions on x86_64 without quality problems are:

  • falkhash (macho64 and elf64 nasm only, with HW AES extension)
  • t1ha + mum (machine specific, mum: different arch results)
  • FarmHash (not portable, too machine specific: 64 vs 32bit, old gcc, ...)
  • Metro (but not 64crc yet, WIP)
  • Spooky32
  • xxHash64
  • fasthash
  • City (deprecated)

Hash functions for symbol tables or hash tables typically use 32 bit hashes, for databases, file systems and file checksums typically 64 or 128bit, for crypto now starting with 256 bit.

Typical median key size in perl5 is 20, the most common 4. See github.com/rurban/perl-hash-stats

When used in a hash table the instruction cache will usually beat the CPU and throughput measured here. In my tests the smallest FNV1A beats the fastest crc32_hw1 with Perl 5 hash tables. Even if those worse hash functions will lead to more collisions, the overall speed advantage beats the slightly worse quality. See e.g. A Seven-Dimensional Analysis of Hashing Methods and its Implications on Query Processing for a concise overview of the best hash table strategies, confirming that the simpliest Mult hashing (bernstein, FNV*, x17, sdbm) always beat "better" hash functions (Tabulation, Murmur, Farm, ...) when used in a hash table.

The fast hash functions tested here are recommendable as fast for file digests and maybe bigger databases, but not for 32bit hash tables. The "Quality problems" lead to less uniform distribution, i.e. more collisions and worse performance, but are rarely related to real security attacks, just the 2nd sanity test against \0 invariance is security relevant.

Other

TODO

Some popular SSE-improved FNV1 (sanmayce) variants, fletcher (ZFS), ... and slower cryptographic hashes or more secure hashes are still missing. BLAKE2, SHA-2, SHA-3 (Keccak), Grøstl, JH, Skein, ...

SECURITY

The hash table attacks described in SipHash against City, Murmur or Perl JenkinsOAAT or at Hash Function Lounge are not included here.

Such an attack avoidance cannot not be the problem of the hash function, but the hash table collision resolution scheme. You can attack every single hash function, even the best and most secure if you detect the seed, e.g. from collision timings and independly the sort-order, so you need to protect your collision handling scheme from the worst-case O(n), i.e. separate chaining with linked lists. Linked lists chaining allows high load factors, but is very cache-unfriendly. The only recommendable linked list scheme is inlining the key or hash into the array. Nowadays everybody uses fast open addressing, even if the load factor needs to be ~50%, unless you use Cuckoo Hashing.

I.e. the usage of SipHash for their hash table in Python 3.4, ruby, rust, systemd, OpenDNS, Haskell and OpenBSD is pure security theatre. SipHash is not secure enough for security purposes and not fast enough for general usage. Brute-force generation of ~32k collisions need 2-4m for all these hashes. siphash being the slowest needs max 4m, other typically max 2m30s, with <10s for practical 16k collision attacks with all hash functions. Using Murmur is usually slower than a simple Mult, even in the worst case. Provable secure is only uniform hashing, i.e. 2-5 independent Mult or Tabulation, or using a guaranteed logarithmic collision scheme (a tree) or a linear collision scheme, such as Robin Hood or Cockoo hashing with collision counting.

One more note regarding security: Nowadays even SHA1 can be solved in a solver, like Z3 (or faster ones) for practical hash table collision attacks (i.e. 14-20 bits). So all hash functions with less than 256 bits tested here cannot be considered "secure" at all.

The '\0' vulnerability attack with binary keys is tested in the 2nd Sanity test.