Problem
The hashes returned by most hash functions tend to use a lot of memory. For instance, a length-0 Vector{Int64} (e.g. as returned by LpHash) is 40 bytes:
julia> Base.summarysize(Vector{Int64}(undef, 0))
40
Moreover, using these hashes as keys in a database or hash table is difficult, since in general the database or table may not understand the datatype being used for the key.
Proposed solution
The solution I'm proposing is to add a function compress_hash that accepts a Vector{<:Integer} or BitArray{1} and converts it into a UInt32, UInt64, or Vector{UInt8}.
- For instance, we could use Julia's built-in hash function and simply let compress_hash(x) = hash(x), which returns UInt64.
- Alternatively, we could reinterpret x as an Array{UInt8} and use sha256(x), which returns Vector{UInt8}.
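A minimal sketch of the two options above, assuming the SHA standard library is used for sha256. The name compress_hash_sha256 is hypothetical and only serves to keep the two variants apart; the integer method also assumes a plain bits element type such as Int64, since reinterpret does not work on types like BigInt.

```julia
using SHA  # standard library; provides sha256

# Option 1: Julia's built-in hash (returns a UInt, i.e. UInt64 on 64-bit systems).
compress_hash(x::Union{Vector{<:Integer},BitVector}) = hash(x)

# Option 2: view the elements as raw bytes and hash them with SHA-256,
# which returns a 32-byte Vector{UInt8}.
# Assumes a bits-type element such as Int64 (reinterpret fails for BigInt).
compress_hash_sha256(x::Vector{<:Integer}) = sha256(collect(reinterpret(UInt8, x)))

# For a BitVector, expand each bit into one byte before hashing so that
# the length of the input is reflected in the digest.
compress_hash_sha256(x::BitVector) = sha256(UInt8.(x))
```

Option 1 is the cheapest and already yields a fixed-width key. Option 2 costs more per call but gives a digest that does not depend on Julia's internal hash implementation.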
Notes
- It's worth considering whether or not compress_hash needs to be cryptographically secure. I suspect that it should be, in order to be on the safe side for various potential applications of this package. In that case, we will need to define a type such as:
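using SHA  # provides sha256
struct HashCompressor
    salt :: Vector{UInt8}
end
# Prepend the salt to the input bytes (vcat, since hcat would build a matrix), then hash
(hashfn::HashCompressor)(x::Vector{UInt8}) = vcat(hashfn.salt, x) |> sha256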
- Adding on to the last bullet point: it may be worth looking at the new BLAKE3 as a fast alternative to sha256, though it's unlikely that we'll be hashing anything large enough to justify going to great lengths in order to do this.
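As a rough illustration of the salted approach from the first note, here is a self-contained sketch. SaltedCompressor is a hypothetical stand-in for the HashCompressor type above, and the 16-byte salt length and the use of RandomDevice are assumptions made for the example, not a recommendation from this proposal.

```julia
using SHA     # standard library; provides sha256
using Random  # standard library; provides RandomDevice

# Hypothetical stand-in for the HashCompressor type described in the notes.
struct SaltedCompressor
    salt::Vector{UInt8}
end

# Draw a 16-byte salt from the OS entropy source (the length is an arbitrary choice).
SaltedCompressor() = SaltedCompressor(rand(RandomDevice(), UInt8, 16))

# Prepend the salt to the raw bytes, then hash with SHA-256.
(c::SaltedCompressor)(x::Vector{UInt8}) = sha256(vcat(c.salt, x))

# Integer hashes are first viewed as raw bytes (assumes a bits type such as Int64).
(c::SaltedCompressor)(x::Vector{<:Integer}) = c(collect(reinterpret(UInt8, x)))
```

Two compressors built with different salts map the same input to different digests, so the salt has to be stored alongside the table if the keys must be reproducible later.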