fork of smhasher from https://github.com/rurban/smhasher

vaeringjar fa740201ff Just adding the hashes results to te repo to be done with it. преди 7 години
doc 6654927581 HalfSipHash, SipHash13, fix crc32_hw1 преди 7 години
.gitignore 1a2fecf496 Added to the root dir the jodyhash .c and .h by convention of the project. преди 7 години
.travis.yml fa3d00541f make VERBOSE=1 преди 7 години
AvalancheTest.cpp b6f22125e2 initial extension of https://code.google.com/p/smhasher преди 10 години
AvalancheTest.h b6f22125e2 initial extension of https://code.google.com/p/smhasher преди 10 години
Bitslice.cpp b6f22125e2 initial extension of https://code.google.com/p/smhasher преди 10 години
Bitvec.cpp b6f22125e2 initial extension of https://code.google.com/p/smhasher преди 10 години
Bitvec.h b6f22125e2 initial extension of https://code.google.com/p/smhasher преди 10 години
CMakeLists.txt 30531c5697 CMakeLists.txt: todo #10 преди 7 години
City.cpp 2b28880295 Successfully builds on FreeBSD 10.2 преди 8 години
City.h c00a6f1993 Add experimental CityHash32WithSeed преди 9 години
CityCrc.h b6f22125e2 initial extension of https://code.google.com/p/smhasher преди 10 години
CityTest.cpp c00a6f1993 Add experimental CityHash32WithSeed преди 9 години
DifferentialTest.cpp b6f22125e2 initial extension of https://code.google.com/p/smhasher преди 10 години
DifferentialTest.h b6f22125e2 initial extension of https://code.google.com/p/smhasher преди 10 години
FarmTest.cc 0414c0c2fe add Google FarmHash, deprecating CityHash преди 9 години
Hashes.cpp 1a2fecf496 Added to the root dir the jodyhash .c and .h by convention of the project. преди 7 години
Hashes.h 1a2fecf496 Added to the root dir the jodyhash .c and .h by convention of the project. преди 7 години
KeysetTest.cpp 9c9619c3be FNV32a_YoshimitsuTRIAD: Fix AppendedZeroes преди 9 години
KeysetTest.h b6f22125e2 initial extension of https://code.google.com/p/smhasher преди 10 години
MurmurHash1.cpp b6f22125e2 initial extension of https://code.google.com/p/smhasher преди 10 години
MurmurHash1.h b6f22125e2 initial extension of https://code.google.com/p/smhasher преди 10 години
MurmurHash2.cpp b6f22125e2 initial extension of https://code.google.com/p/smhasher преди 10 години
MurmurHash2.h b6f22125e2 initial extension of https://code.google.com/p/smhasher преди 10 години
MurmurHash3.cpp b6f22125e2 initial extension of https://code.google.com/p/smhasher преди 10 години
MurmurHash3.h b6f22125e2 initial extension of https://code.google.com/p/smhasher преди 10 години
PMurHash.c 7308eb413a add JenkinsOOAT_perl (an old bad artefact, used in perl until 5.16) преди 10 години
PMurHash.h f834feeb52 fixes for Windows and building with MSVC. преди 8 години
Platform.cpp 2b28880295 Successfully builds on FreeBSD 10.2 преди 8 години
Platform.h f834feeb52 fixes for Windows and building with MSVC. преди 8 години
README.md 00d5c185e9 Fix Typos in README summary table #34 преди 7 години
Random.cpp b6f22125e2 initial extension of https://code.google.com/p/smhasher преди 10 години
Random.h b6f22125e2 initial extension of https://code.google.com/p/smhasher преди 10 години
SpeedTest.cpp 9903df95bf Serialize invocations of hash functions преди 8 години
SpeedTest.h 237a0eca39 Add Average results to SpeedTest преди 9 години
Spooky.cpp b6f22125e2 initial extension of https://code.google.com/p/smhasher преди 10 години
Spooky.h b6f22125e2 initial extension of https://code.google.com/p/smhasher преди 10 години
SpookyTest.cpp b6f22125e2 initial extension of https://code.google.com/p/smhasher преди 10 години
Stats.cpp b6f22125e2 initial extension of https://code.google.com/p/smhasher преди 10 години
Stats.h b6f22125e2 initial extension of https://code.google.com/p/smhasher преди 10 години
SuperFastHash.cpp b6f22125e2 initial extension of https://code.google.com/p/smhasher преди 10 години
Types.cpp b6f22125e2 initial extension of https://code.google.com/p/smhasher преди 10 години
Types.h b6f22125e2 initial extension of https://code.google.com/p/smhasher преди 10 години
build.sh 93169294cd Fix sha1 on BIG_ENDIAN преди 7 години
build32.sh 1237a2fd5e minor fixes преди 8 години
cmetrohash.h b2d8e49c0b Added C implementation of MetroHash64 преди 9 години
cmetrohash64.c 978496eaac make cmetrohash optimized variant standalone преди 9 години
crc.cpp b6f22125e2 initial extension of https://code.google.com/p/smhasher преди 10 години
crc32-generated-constants.cpp b6f22125e2 initial extension of https://code.google.com/p/smhasher преди 10 години
crc32_hw.c b6f22125e2 initial extension of https://code.google.com/p/smhasher преди 10 години
crc32_hw1.c b6f22125e2 initial extension of https://code.google.com/p/smhasher преди 10 години
crc32c.cpp b6f22125e2 initial extension of https://code.google.com/p/smhasher преди 10 години
crc32c.h b6f22125e2 initial extension of https://code.google.com/p/smhasher преди 10 години
falkhash-elf64.o ff0eab7f6c add falkhash.asm from https://github.com/gamozolabs/falkhash преди 9 години
falkhash-macho64.o ff0eab7f6c add falkhash.asm from https://github.com/gamozolabs/falkhash преди 9 години
falkhash.asm ff0eab7f6c add falkhash.asm from https://github.com/gamozolabs/falkhash преди 9 години
farmhash-c-test.cc 271d1b3e7b add farmhash (C99) преди 8 години
farmhash-c.c fe3f367e04 smhasher: fix MSVC2015 x64 build. преди 7 години
farmhash-c.h 271d1b3e7b add farmhash (C99) преди 8 години
farmhash.cc fe3f367e04 smhasher: fix MSVC2015 x64 build. преди 7 години
farmhash.h 79df5459ca update farmhash (1.1) преди 8 години
fasthash.cpp fe3f367e04 smhasher: fix MSVC2015 x64 build. преди 7 години
fasthash.h aa7c4d9c37 add fasthash преди 9 години
fhtw.asm b6f22125e2 initial extension of https://code.google.com/p/smhasher преди 10 години
fhtw.h b6f22125e2 initial extension of https://code.google.com/p/smhasher преди 10 години
halfsiphash.c 6654927581 HalfSipHash, SipHash13, fix crc32_hw1 преди 7 години
hasshe2.c b6f22125e2 initial extension of https://code.google.com/p/smhasher преди 10 години
jody_hash.c 1a2fecf496 Added to the root dir the jodyhash .c and .h by convention of the project. преди 7 години
jody_hash.h 1a2fecf496 Added to the root dir the jodyhash .c and .h by convention of the project. преди 7 години
log.hashes fa740201ff Just adding the hashes results to te repo to be done with it. преди 7 години
log.speed fa740201ff Just adding the hashes results to te repo to be done with it. преди 7 години
lookup3.cpp b6f22125e2 initial extension of https://code.google.com/p/smhasher преди 10 години
main.cpp 1a2fecf496 Added to the root dir the jodyhash .c and .h by convention of the project. преди 7 години
md5.cpp 3a8aac7ba9 Fixes for issue #24 (#25) преди 7 години
metrohash.h c063c385e8 add metrohash64crc, do crc hw detection for metro преди 9 години
metrohash128.cpp 84af152c2a add metrohash преди 9 години
metrohash128crc.cpp 84af152c2a add metrohash преди 9 години
metrohash64.cpp 84af152c2a add metrohash преди 9 години
metrohash64crc.cpp c063c385e8 add metrohash64crc, do crc hw detection for metro преди 9 години
mum.cc f8fbccfffc t1ha: adds MUM hash into SMHasher. преди 8 години
mum.h 6654927581 HalfSipHash, SipHash13, fix crc32_hw1 преди 7 години
opt_cmetrohash.h ba0f53885b Added my optimized cmetrohash64, noop oaat read (speed reference) преди 9 години
opt_cmetrohash64_1.c fe3f367e04 smhasher: fix MSVC2015 x64 build. преди 7 години
os_dependent_stuff.asm b6f22125e2 initial extension of https://code.google.com/p/smhasher преди 10 години
sha1.cpp 93169294cd Fix sha1 on BIG_ENDIAN преди 7 години
sha1.h 3a8aac7ba9 Fixes for issue #24 (#25) преди 7 години
siphash.c 6654927581 HalfSipHash, SipHash13, fix crc32_hw1 преди 7 години
siphash.h 6654927581 HalfSipHash, SipHash13, fix crc32_hw1 преди 7 години
siphash_impl.h 1563d5ada2 add SipHash преди 10 години
siphash_sse2.c 6654927581 HalfSipHash, SipHash13, fix crc32_hw1 преди 7 години
siphash_ssse3.c 6654927581 HalfSipHash, SipHash13, fix crc32_hw1 преди 7 години
speed.sh 6654927581 HalfSipHash, SipHash13, fix crc32_hw1 преди 7 години
speedall.sh 1a94b0b8c2 add speedall.sh преди 9 години
split.pl b6f22125e2 initial extension of https://code.google.com/p/smhasher преди 10 години
t1ha.c 6cfc20e8db t1ha: update for mainstream (fixes for MSVC and GCC 4.4) (#27) преди 7 години
t1ha.h 6cfc20e8db t1ha: update for mainstream (fixes for MSVC and GCC 4.4) (#27) преди 7 години
testall.sh 6bf9739e99 testall.sh: use cmake in build dir преди 9 години
testspeed.sh ff0eab7f6c add falkhash.asm from https://github.com/gamozolabs/falkhash преди 9 години
xxhash.c be3c952c60 XXH_256: harmonize int types преди 9 години
xxhash.h 0a1aba751a add xxHash преди 9 години

README.md

SMhasher

Hash function MiB/sec cycles/hash Quality problems
donothing32 26868545.30 6.11 test NOP
donothing64 18810836.72 6.09 test NOP
donothing128 24694622.36 5.11 test NOP
NOP_OAAT_read64 2791.35 21.28 test NOP
BadHash 468.31 105.42 test FAIL
sumhash 11058.06 25.41 test FAIL
sumhash32 26601.48 15.61 test FAIL
--------------------------------------
crc32 352.29 143.32 insecure, 8589.93x collisions, distrib
md5_32a 277.56 716.47 8589.93x collisions, distrib
sha1_32a 324.16 1427.58 collisions, 36.6% distrib
hasshe2 1639.48 103.19 insecure,100% bias, collisions, distrib
crc32_hw 8103.92 32.36 insecure,100% bias, collisions, distrib
crc64_hw 7624.95 35.12 insecure,100% bias, collisions, distrib
crc32_hw1 22610.60 36.80 insecure,100% bias, collisions, distrib
FNV1a 791.84 69.21 zeros,100% bias, collisions, distrib
FNV1a_YT 8328.60 27.78 100% bias, collisions, distrib
FNV64 791.84 70.17 100% bias, collisions, distrib
bernstein 715.21 74.00 100% bias, collisions, distrib
sdbm 791.84 66.51 100% bias, collisions, distrib
x17 637.20 87.19 99.98% bias, collisions, distrib
JenkinsOOAT 412.39 154.66 53.5% bias, collisions, distrib
JenkinsOOAT_pl 452.50 118.25 1.5-11.5% bias, 7.2x collisions
MicroOAAT 689.16 83.45 100% bias, distrib
lookup3 1735.25 48.78 28% bias, collisions, 30% distr
superfast 2045.98 52.16 91% bias, 5273.01x collisions, 37% distr
MurmurOAAT 465.93 110.78 collisions, 99.998% distr
Crap8 2844.03 37.48 2.42% bias, collisions, 2% distrib
Murmur2 3147.50 40.85 1.7% bias, 81x coll, 1.7% distrib
Murmur2A 2843.03 47.78 12.7% bias
Murmur2B 6155.72 57.48 1.8% bias, collisions, 3.4% distrib
Murmur2C 3633.24 46.88 91% bias, collisions, distr
HalfSipHash 587.04 145.03 zeroes
--------------------------------------
GoodOAAT 929.59 79.06
SipHash 951.02 145.97
SipHash13 1678.76 115.11 0.9% bias
PMurHash32 2436.65 60.15
Murmur3A 2364.40 52.83
Murmur3C 2468.70 79.44
Murmur3F 4376.29 55.03
fasthash32 4881.26 55.11
fasthash64 5404.17 46.73
City32 3523.77 53.76
City64 8728.22 54.72 2 minor collisions
City128 9769.19 63.38
CityCrc128 13730.97 85.88
FarmHash64 8711.25 58.16 machine-specific
FarmHash128 9738.29 79.91 machine-specific
FarmHash32 24831.45 24.99 disabled. too machine-specific
farmhash32_c 24647.21 25.36
farmhash64_c 7886.79 64.48
farmhash128_c 9770.40 79.59
Spooky32 9944.88 60.12
Spooky64 9943.72 60.16
Spooky128 9936.26 60.10
xxHash32 4914.62 64.17 collisions with 4bit diff
xxHash64 8474.87 61.57
metrohash64_1 8177.73 56.05
metrohash64_2 9064.45 50.83
metrohash128_1 7931.50 65.88
metrohash128_2 8779.11 59.36
metrohash64crc_1 15827.71 55.72 cyclic collisions 8 byte
metrohash64crc_2 16072.41 56.79 cyclic collisions 8 byte
metrohash128crc_1 15468.70 66.62
metrohash128crc_2 14100.80 71.89
cmetrohash64_1_o 9054.06 50.66
cmetrohash64_1 8135.24 55.96
cmetrohash64_2 9046.35 50.95
falkhash 19888.45 173.92
t1ha 15480.28 26.41
t1ha_64be 5203.00 53.69
t1ha_32le 8930.90 29.79
t1ha_32be 6931.84 34.17
t1ha_crc 16757.73 28.69
t1ha_aes 37299.66 25.68 machine-specific
MUM 7763.68 41.28 machine-specific

Summary

I added some SSE assisted hashes and fast intel/arm CRC32-C and AES HW variants, but not the fastest crcutil yet. See our crcutil results. See also the old https://code.google.com/p/smhasher/w/list.

So the fastest hash functions on x86_64 without quality problems are:

  • falkhash (macho64 and elf64 nasm only, with HW AES extension)
  • t1ha + mum (machine specific, mum: different arch results)
  • FarmHash (not portable, too machine specific: 64 vs 32bit, old gcc, ...)
  • Metro (but not 64crc yet, WIP)
  • Spooky32
  • xxHash64
  • fasthash
  • City (deprecated)

Hash functions for symbol tables or hash tables typically use 32 bit hashes, for databases, file systems and file checksums typically 64 or 128bit, for crypto now starting with 256 bit.

Typical median key size in perl5 is 20, the most common 4. See github.com/rurban/perl-hash-stats

When used in a hash table the instruction cache will usually beat the CPU and throughput measured here. In my tests the smallest FNV1A beats the fastest crc32_hw1 with Perl 5 hash tables. Even if those worse hash functions will lead to more collisions, the overall speed advantage beats the slightly worse quality. See e.g. A Seven-Dimensional Analysis of Hashing Methods and its Implications on Query Processing for a concise overview of the best hash table strategies, confirming that the simpliest Mult hashing (bernstein, FNV*, x17, sdbm) always beat "better" hash functions (Tabulation, Murmur, Farm, ...) when used in a hash table.

The fast hash functions tested here are recommendable as fast for file digests and maybe bigger databases, but not for 32bit hash tables. The "Quality problems" lead to less uniform distribution, i.e. more collisions and worse performance, but are rarely related to real security attacks, just the 2nd sanity test against \0 invariance is security relevant.

Other

TODO

Some popular SSE-improved FNV1 (sanmayce) variants, fletcher (ZFS), ... and slower cryptographic hashes or more secure hashes are still missing. BLAKE2, SHA-2, SHA-3 (Keccak), Grøstl, JH, Skein, ...

SECURITY

The hash table attacks described in SipHash against City, Murmur or Perl JenkinsOAAT or at Hash Function Lounge are not included here.

Such an attack avoidance cannot not be the problem of the hash function, but the hash table collision resolution scheme. You can attack every single hash function, even the best and most secure if you detect the seed, e.g. from collision timings and independly the sort-order, so you need to protect your collision handling scheme from the worst-case O(n), i.e. separate chaining with linked lists. Linked lists chaining allows high load factors, but is very cache-unfriendly. The only recommendable linked list scheme is inlining the key or hash into the array. Nowadays everybody uses fast open addressing, even if the load factor needs to be ~50%, unless you use Cuckoo Hashing.

I.e. the usage of SipHash for their hash table in Python 3.4, ruby, rust, systemd, OpenDNS, Haskell and OpenBSD is pure security theatre. SipHash is not secure enough for security purposes and not fast enough for general usage. Brute-force generation of ~32k collisions need 2-4m for all these hashes. siphash being the slowest needs max 4m, other typically max 2m30s, with <10s for practical 16k collision attacks with all hash functions. Using Murmur is usually slower than a simple Mult, even in the worst case. Provable secure is only uniform hashing, i.e. 2-5 independent Mult or Tabulation, or using a guaranteed logarithmic collision scheme (a tree) or a linear collision scheme, such as Robin Hood or Cockoo hashing with collision counting.

One more note regarding security: Nowadays even SHA1 can be solved in a solver, like Z3 (or faster ones) for practical hash table collision attacks (i.e. 14-20 bits). So all hash functions with less than 256 bits tested here cannot be considered "secure" at all.

The '\0' vulnerability attack with binary keys is tested in the 2nd Sanity test.