3 Commits

Author SHA1 Message Date
6a5b40c097 docs: replaced instances of "bucket" with "table"
- Removed instances of `growthFactor`, as it is unexported.
- Typo in `HashTable.String()`.
2026-04-13 20:49:33 -04:00
395a3560c7 refactor: constructors, update docs
- NewCustomTable -> NewCustom
- NewTableBy -> NewBy
- NewTable -> New
2026-04-04 12:27:53 +02:00
2fd9da973b refactor: bucket -> table, Table -> HashTable 2026-04-04 12:22:42 +02:00
12 changed files with 334 additions and 937 deletions


@@ -114,9 +114,6 @@ linters:
# Reports uses of functions with replacement inside the testing package.
- usetesting
# Reports mixed receiver types in structs/interfaces.
- recvcheck
settings:
revive:
rules:


@@ -1,542 +0,0 @@
# Designing an Idiomatic API Interface
We (the maintainers) built `go-cuckoo`'s API without deliberate design intent.
Until now, we paid more attention to implementing the underlying cuckoo hashing functionality.
With the fundamentals of the algorithm built, we should revisit the interface.
It should align more closely with the following principles:
- **Congruency**
A `go-cuckoo` table should have the same core functionality as Go's built-in map.
- **Familiarity**
A `go-cuckoo` table should behave similarly to Go's standard map, so users will intuitively know how to use it.
In effect, its users will carry less cognitive load.
## Current State
### Interface of the built-in Map
Listed below is every interface provided by Go to the built-in map object.
Also included are the functions from the `maps` package in the standard library.
<details>
<summary>Interfaces</summary>
| # | built-in Interface | Description |
| --- | ---------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------- |
| 1 | `m := make(map[K]V)` | Returns an empty map using the built-in `make()` function. |
| 2 | `m := make(map[K]V, hint)` | Returns an empty map using `make()`, with a capacity 'hint'. This hint is how many items the map expects to hold, _not_ a measure of how large it is. |
| 3 | `m := map[K]V{...}` | Returns a map, which may be filled with entries in the ellipsis (optional). |
| 4 | `var m map[K]V` | Defines an empty _variable_ that holds a map. This differs from #1 because `m` is uninitialized (nil) here. |
| 5 | `m[k] = v` | Assigns the value `v` to the key `k`. |
| 6 | `v := m[k]` | Returns the value of `k` if it exists. Otherwise, `v` is the zero value of `V`. |
| 7 | `v, ok := m[k]` | Similar to #6, except `ok` reports whether `k` is present. This is comma-ok notation. |
| 8 | `for k, v := range m` | Iterates over every key-value pair in `m`. The order is random. |
| 9 | `delete(m, k)` | Removes the key `k` and its value. Returns no value. |
| 10 | `clear(m)` | Removes all entries from `m`. Returns no value. |
| 11 | `n := len(m)` | Returns the number of entries in `m`. Returns 0 if `m` is nil. |
| 12 | `m2 := maps.Clone(m)` | Returns a copy of `m`. |
| 13 | `maps.Copy(dst, src)` | Assigns every entry of `src` in `dst`. |
| 14 | `ok := maps.Equal(m1, m2)` | Returns true iff `m1` and `m2` contain the same entries. |
| 15 | `ok := maps.EqualFunc(m1, m2, fn)` | Like #14, but with a custom comparator for non-comparable values. |
| 16 | `maps.DeleteFunc(m, fn)` | Removes every entry in `m` which satisfies `fn`. Returns no value. |
| 17 | `it2 := maps.All(m)` | Returns a 2D iterator over every key-value pair. |
| 18 | `it := maps.Keys(m)` | Returns an iterator over every key. |
| 19 | `it := maps.Values(m)` | Returns an iterator over every value. There can be duplicates. |
| 20 | `m := maps.Collect(seq)` | Returns a map, with every entry defined in a 2D iterator over key-value pairs. |
| 21 | `maps.Insert(m, seq)` | Assigns to `m` all key-value pairs in 2D iterator `seq`. Returns no value. |
</details>
### Interface of `go-cuckoo`
On the other hand, here is the current contract for `go-cuckoo`.
<details>
<summary>Interfaces</summary>
| # | `go-cuckoo` Interface | Description |
| --- | -------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------- |
| 1 | `m := New(opts...)` | Creates a table using the default hash and equal function. The options configure its behavior. Confined to comparable keys. |
| 2 | `m := NewBy(keyFunc, opts...)` | Like #1, but allows any key type. A `keyFunc` is used to derive a comparable key. |
| 3 | `m := NewCustom(hashA, hashB, equalFunc, opts...)` | Like #1, but allows control over the hashes used to allow any key type. An `equalFunc` determines key equality. |
| 4 | `seq := m.Entries()` | Returns an unordered 2D iterator of all key-value pairs in the table. |
| 5 | `v := m.Find(k)` | Returns the value for `k`, without reporting whether `k` exists. |
| 6 | `v, ok := m.Get(k)` | Returns the value for `k` in the table. Also returns true if `k` exists, otherwise false. When false, `v` is undefined. |
| 7 | `ok := m.Has(k)` | Returns true if `k` is in the table. |
| 8 | `err := m.Put(k, v)` | Sets value `v` for key `k`. Returns an error if the value cannot be set. |
| 9 | `n := m.Size()` | Returns the number of items in `m`. |
| 10 | `str := m.String()` | Returns `m` as a string in the format "table[k1:v1 k2:v2 ...]". |
| 11 | `cap := m.TotalCapacity()` | Returns how many slots `m` has allocated. |
| 12 | `ok := m.Drop(k)` | Removes `k` from the table. Returns whether the key had existed. |
</details>
### Determining Congruency
So, how does the core functionality compare?
Listed below is an analysis of every interface in Go's standard map.
Each is compared against what `go-cuckoo` offers, and categorized into the following groups:
- ✅ Covered: an analog exists.
- ⚠️ Partial: workaround available.
- ❌ Gap: no analog yet; addressed in [Target State](#solving-congruency).
Specifically, here we are checking for functionality.
Is there functionality that the built-in map offers which `go-cuckoo` does not?
We are checking accessibility, but not discoverability.
The latter will be considered later.
<details>
<summary>✅ <code>m := make(map[K]V)</code></summary>
The analog is `m := New()`.
</details>
<details>
<summary>⚠️ <code>m := make(map[K]V, hint)</code></summary>
This has no simple analog.
It is close to `m := New(Capacity(hint))`, but it assigns starting capacity, not expected size.
For the built-in map, these are two separate things.
- Capacity is an internal measure, used to optimize space/speed.
It is hidden from the user because it depends on the underlying implementation, which may change.
- Expected size is the number of items the map must be able to hold before resizing.
This is tangible and agnostic to implementation, which is why it is exposed to the user.
In short, this interface defines expected size, but `Capacity()` defines capacity.
</details>
<details>
<summary>❌ <code>m := map[K]V{...}</code></summary>
This has no simple analog, the closest being:
```go
m := New[K, V]()
for k, v := range startingEntries {
m.Put(k, v)
}
```
It is idiomatic, but far less ergonomic.
</details>
<details>
<summary>✅ <code>var m map[K]V</code></summary>
The analog is `var m Table[K, V]`.
</details>
<details>
<summary>✅ <code>m[k] = v</code></summary>
The analog is `err := m.Put(k, v)`.
</details>
<details>
<summary>✅ <code>v := m[k]</code></summary>
The analog is `v := m.Find(k)`.
</details>
<details>
<summary>✅ <code>v, ok := m[k]</code></summary>
The analog is `v, ok := m.Get(k)`.
</details>
<details>
<summary>✅ <code>for k, v := range m</code></summary>
The analog is `for k, v := range m.Entries()`.
</details>
<details>
<summary>✅ <code>delete(m, k)</code></summary>
The analog is `ok := m.Drop(k)`.
</details>
<details>
<summary>❌ <code>clear(m)</code></summary>
There is no analog.
The easiest way to do this is to delete all items individually:
```go
for k := range m.Entries() {
m.Drop(k)
}
```
</details>
<details>
<summary>✅ <code>n := len(m)</code></summary>
The analog is `n := m.Size()`.
</details>
<details>
<summary>❌ <code>m2 := maps.Clone(m)</code></summary>
There is no analog.
The easiest way to do this currently is to make a new map, and manually add the items.
```go
m2 := cuckoo.New[K, V]()
for k, v := range m.Entries() {
m2.Put(k, v)
}
```
This gets complicated by the various options available to the user.
Furthermore, any custom `EqualFunc`, `keyFunc` or `Hash` is not transferred.
</details>
<details>
<summary>❌ <code>maps.Copy(dst, src)</code></summary>
There is no analog.
The simplest way to do this is with a for-loop.
```go
for k, v := range src.Entries() {
dst.Put(k, v)
}
```
</details>
<details>
<summary>❌ <code>ok := maps.Equal(m1, m2)</code></summary>
There is no analog.
Users have to manually check the key-value pairs to determine equality.
</details>
<details>
<summary>❌ <code>ok := maps.EqualFunc(m1, m2, fn)</code></summary>
There is no analog.
Users have to manually check the key-value pairs to determine equality.
</details>
<details>
<summary>❌ <code>maps.DeleteFunc(m, fn)</code></summary>
There is no analog.
Users have to manually delete keys.
</details>
<details>
<summary>✅ <code>it2 := maps.All(m)</code></summary>
The analog is `it2 := m.Entries()`.
</details>
<details>
<summary>⚠️ <code>it := maps.Keys(m)</code></summary>
There is no simple analog.
A close neighbor is `it2 := m.Entries()`.
Users can use this in a for-loop, and pick out just the keys:
```go
for k := range m.Entries() {
// ...
}
```
</details>
<details>
<summary>⚠️ <code>it := maps.Values(m)</code></summary>
There is no simple analog.
A close neighbor is `it2 := m.Entries()`.
Users can use this in a for-loop, and pick out just the values:
```go
for _, v := range m.Entries() {
// ...
}
```
</details>
<details>
<summary>❌ <code>m := maps.Collect(seq)</code></summary>
There is no analog.
</details>
<details>
<summary>❌ <code>maps.Insert(m, seq)</code></summary>
There is no analog.
</details>
## Target State
### Solving Congruency
We should make the following changes to accommodate congruency:
<details>
<summary><code>ok := maps.EqualFunc(m1, m2, fn)</code></summary>
We should implement a new function:
```go
func EqualFunc[K, V1, V2 any](t1 *Table[K, V1], t2 *Table[K, V2], eq func(V1, V2) bool) bool
```
This function is free, and not bound as a receiver function.
(It is called as `cuckoo.EqualFunc(t1, t2, eq)`, not `t1.EqualFunc(t2, eq)`.)
The latter implies `t1` has authority, when in fact neither table does.
We define equality as:
1. Neither table has a key the other doesn't.
2. Each key has the same value in each table.
Parameter `eq` determines this equality.
Custom `EqualFunc`s complicate this, as they modulate key identity in tables.
If the two tables disagree on whether two keys are distinct, this function might break.
So, we must assume that:
- Both tables have `EqualFunc`s which 'agree' on the identity of the keys present in the tables.
Agreement is defined as: if two keys are distinct in one table, they are distinct in the other.
The name `EqualFunc` is already taken by `EqualFunc[K]`: an alias for `func(a, b K) bool`.
Inlining `EqualFunc[K]` would solve this problem.
We will move the documentation attached to it to `DefaultEqualFunc`.
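The two equality rules above can be sketched with plain Go maps standing in for tables. This is an illustration only, not the proposed implementation: the name `equalFunc` and the use of `map` are assumptions, and key identity here is Go's built-in `==` rather than a custom `EqualFunc`.

```go
package main

import "fmt"

// equalFunc mirrors the two rules above, with plain maps standing in for
// tables: (1) neither side has a key the other lacks, and (2) eq accepts
// the two values stored under every shared key.
func equalFunc[K comparable, V1, V2 any](m1 map[K]V1, m2 map[K]V2, eq func(V1, V2) bool) bool {
	// With equal sizes, "every key of m1 is in m2" implies the converse.
	if len(m1) != len(m2) {
		return false
	}
	for k, v1 := range m1 {
		v2, ok := m2[k]
		if !ok || !eq(v1, v2) {
			return false
		}
	}
	return true
}

func main() {
	ints := map[string]int{"a": 1, "b": 2}
	strs := map[string]string{"a": "1", "b": "2"}
	sameText := func(n int, s string) bool { return fmt.Sprint(n) == s }

	fmt.Println(equalFunc(ints, strs, sameText)) // true
	delete(strs, "b")
	fmt.Println(equalFunc(ints, strs, sameText)) // false
}
```

The size check is what makes a single pass sufficient. A table-based version would need the agreement assumption above for the same shortcut to hold.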
</details>
<details>
<summary><code>ok := maps.Equal(m1, m2)</code></summary>
We should implement a new function, to conform with the standard library:
```go
func Equal[K any, V comparable](t1, t2 *Table[K, V]) bool
```
It uses the same equality check as in `EqualFunc`.
Once again, the function is free because it is symmetric.
</details>
<details>
<summary><code>maps.Insert(m, seq)</code></summary>
We should implement a new receiver for the table:
```go
func (t *Table[K, V]) Insert(seq iter.Seq2[K, V]) error
```
A receiver fits better even though `maps.Insert` is a free function, because inserting is asymmetric.
Table `t` receives entries from `seq`.
It's only free because Go's standard map is built into the language, and so cannot have receivers.
In terms of naming, `t.Extend` is more accurate, and has precedent in [Python](https://docs.python.org/3/tutorial/datastructures.html#more-on-lists) and [Rust](https://doc.rust-lang.org/std/iter/trait.Extend.html).
When [adding iterator functions](https://github.com/golang/go/issues/61900) to the `maps` package, the Go team chose to frame it as 'sources' and 'sinks'.
With this model, `maps.Insert` made more sense than `maps.Extend`.
Ultimately, `t.Insert()` is a better choice to be consistent with `maps`.
</details>
<details>
<summary><code>maps.Copy(dst, src)</code></summary>
We should implement a new receiver for the table:
```go
func (t *Table[K, V]) Copy(src *Table[K, V]) error
```
Its functionality should match that of `t.Insert()`.
A receiver fits better even though `maps.Copy` is a free function, because copying is asymmetric: `src` is written into `dst`.
It is only free because Go's standard map is built into the language, and so cannot have receivers.
The name `t.Merge()` might be more accurate, but it does not work because:
- `t.Copy()` matches Go's built-in `copy()`, and `io.Copy()`. The Go team used [the same logic](https://github.com/golang/go/discussions/47330#discussioncomment-1167799) to name `maps.Copy()`.
In this case, `t.Merge()` would be an outlier.
- `t.Merge()` implies some sort of conflict-resolution, when there is not.
It simply overwrites the values.
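The overwrite behavior being matched can be seen in the standard library itself. In this small sketch, the helper `merged` is a hypothetical name, and only `maps.Clone` and `maps.Copy` come from the `maps` package:

```go
package main

import (
	"fmt"
	"maps"
)

// merged copies src into a clone of dst, so both inputs stay untouched.
// dst must be non-nil: maps.Clone preserves nil, and writing to a nil
// map panics.
func merged(dst, src map[string]int) map[string]int {
	out := maps.Clone(dst)
	maps.Copy(out, src) // on a shared key, the value from src wins
	return out
}

func main() {
	dst := map[string]int{"a": 1, "b": 2}
	src := map[string]int{"b": 20, "c": 30}
	fmt.Println(merged(dst, src)) // map[a:1 b:20 c:30]
}
```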
</details>
<details>
<summary><code>maps.DeleteFunc(m, fn)</code></summary>
We should implement a new receiver for the table:
```go
func (t *Table[K, V]) DeleteFunc(del func(K, V) bool)
```
It would have the same functionality as `maps.DeleteFunc`.
A free function could work here, but `t` has clear authority over `del`.
Other than being consistent with the `maps` package, `t.DeleteFunc` follows the Go convention of appending `Func` to higher-order equivalents of functions.
This trumps names like `t.DeleteIf`, which lean more toward [Java](https://docs.oracle.com/javase/8/docs/api/java/util/ArrayList.html#removeIf-java.util.function.Predicate-) or [C++](https://en.cppreference.com/cpp/algorithm/remove).
The word `Delete` is also convention, tying back to the built-in `delete()`.
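For comparison, this is how the standard-library counterpart behaves on a built-in map. The helper `dropNegative` is a hypothetical name used only for this sketch:

```go
package main

import (
	"fmt"
	"maps"
)

// dropNegative removes every entry whose value is below zero, in place.
func dropNegative(m map[string]int) {
	maps.DeleteFunc(m, func(_ string, v int) bool { return v < 0 })
}

func main() {
	m := map[string]int{"a": 1, "b": -2, "c": 3, "d": -4}
	dropNegative(m)
	fmt.Println(m) // map[a:1 c:3]
}
```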
</details>
<details>
<summary><code>m := maps.Collect(seq)</code></summary>
We should implement a new constructor.
```go
func Collect[K comparable, V any](seq iter.Seq2[K, V]) (*Table[K, V], error)
```
It would create a `New()` table, and insert all entries in `seq`.
This constructor only supports the standard table, with comparable keys.
It is tempting to add `CollectBy` or `CollectCustom` to support all table types, but doing so would pollute the public interface.
It would be just one more line to initialize the table and then call `t.Insert` directly:
```go
t := // ...
err := t.Insert(seq)
```
</details>
<details>
<summary><code>m := map[K]V{...}</code></summary>
We should make a new constructor, because entries are generic.
So, creating an option with initialized entries doesn't work.
With the previous additions, users have a few options.
If they want to use a `New()` table, `cuckoo.Collect` matches well:
```go
t, err := cuckoo.Collect(func(yield func(K, V) bool) {
yield(key1, val1)
yield(key2, val2)
})
```
For `NewCustom()` or `NewBy()` tables, users can call `t.Insert` after initialization:
```go
t := // ...
err := t.Insert(func(yield func(K, V) bool) {
yield(key1, val1)
yield(key2, val2)
})
```
It is one more line.
But, the alternative is polluting the public interface with corresponding `*WithEntries` constructors.
</details>
<details>
<summary><code>m := make(map[K]V, hint)</code></summary>
We should add a new option:
```go
func ExpectedSize(n int) Option
```
When fed to a table, it will allocate enough space to hold `n` entries without a resize.
</details>
<details>
<summary><code>clear(m)</code></summary>
We should implement a new receiver:
```go
func (t *Table[K, V]) Clear()
```
It will remove all entries from the table.
</details>
<details>
<summary><code>m2 := maps.Clone(m)</code></summary>
We should implement a matching function:
```go
func (t *Table[K, V]) Clone() *Table[K, V]
```
Also, it will copy the hash, equality function, and options used in the table.
</details>
<details>
<summary><code>it := maps.Keys(m)</code></summary>
We should implement a matching function:
```go
func (t *Table[K, V]) Keys() iter.Seq[K]
```
It is tempting to just have `All()`, but it returns a `Seq2`, not a `Seq`.
There is no iterator adaptor between `Seq2` and `Seq`, and there will not be one for the foreseeable future.
This function, while it feels superfluous, is required.
</details>
<details>
<summary><code>it := maps.Values(m)</code></summary>
We should implement a matching function:
```go
func (t *Table[K, V]) Values() iter.Seq[V]
```
For the same reason we need `Keys()`, we also need `Values()`.
</details>


@@ -1,11 +1,11 @@
package cuckoo
// An EqualFunc determines whethers two keys are 'equal'. Keys that are 'equal'
// are teated as the same by the [Table]. A good EqualFunc is pure,
// are teated as the same by the [HashTable]. A good EqualFunc is pure,
// deterministic, and fast. By default, [New] uses [DefaultEqualFunc].
//
// This function MUST NOT return true if the [Hash] digest of two keys
// are different: the [Table] will not work.
// are different: the [HashTable] will not work.
type EqualFunc[K any] = func(a, b K) bool
// DefaultEqualFunc compares two keys by strict equality. Returns true if the


@@ -68,22 +68,21 @@ func FuzzInsertLookup(f *testing.F) {
for _, step := range scenario.steps {
if step.drop {
ok := actual.Drop(step.key)
_, has := expected[step.key]
assert.Equal(ok, has)
err := actual.Drop(step.key)
assert.NoError(err)
delete(expected, step.key)
_, ok = actual.Get(step.key)
assert.False(ok)
_, err = actual.Get(step.key)
assert.Error(err)
} else {
err := actual.Put(step.key, step.value)
assert.NoError(err)
expected[step.key] = step.value
found, ok := actual.Get(step.key)
assert.True(ok)
found, err := actual.Get(step.key)
assert.NoError(err)
assert.Equal(step.value, found)
}


@@ -108,12 +108,12 @@ func TestGetMany(t *testing.T) {
}
for i := range 2_000 {
value, ok := table.Get(i)
value, err := table.Get(i)
if i < 1_000 {
assert.True(ok)
assert.NoError(err)
assert.Equal(value, true)
} else {
assert.False(ok)
assert.Error(err)
}
}
}
@@ -124,9 +124,9 @@ func TestDropExistingItem(t *testing.T) {
table := cuckoo.New[int, bool]()
(table.Put(key, value))
had := table.Drop(key)
err := table.Drop(key)
assert.True(had)
assert.NoError(err)
assert.Equal(0, table.Size())
assert.False(table.Has(key))
}
@@ -136,9 +136,9 @@ func TestDropNoItem(t *testing.T) {
key := 0
table := cuckoo.New[int, bool]()
had := table.Drop(key)
err := table.Drop(key)
assert.False(had)
assert.NoError(err)
assert.Equal(0, table.Size())
assert.False(table.Has(key))
}
@@ -152,9 +152,10 @@ func TestDropItemCapacity(t *testing.T) {
)
startingCapacity := table.TotalCapacity()
table.Drop(key)
err := table.Drop(key)
endingCapacity := table.TotalCapacity()
assert.NoError(err)
assert.Equal(0, table.Size())
assert.Equal(uint64(128), startingCapacity)
assert.Equal(uint64(64), endingCapacity)
@@ -202,9 +203,9 @@ func TestDropResizeCapacity(t *testing.T) {
err1 := table.Put(0, true)
err2 := table.Put(1, true)
table.Drop(1)
err3 := table.Drop(1)
assert.NoError(errors.Join(err1, err2))
assert.NoError(errors.Join(err1, err2, err3))
assert.Equal(uint64(20), table.TotalCapacity())
}

3
doc.go

@@ -5,8 +5,5 @@
// a table with any key type using [NewCustom]. Custom [Hash] functions and
// key comparison are also supported.
//
// NOTE: The [Table] is a look-up structure, and not a source of truth. If
// [ErrBadHash] occurs, the data cannot be restored.
//
// See more: https://en.wikipedia.org/wiki/Cuckoo_hashing
package cuckoo


@@ -14,19 +14,19 @@ func Example_basic() {
fmt.Println("Put error:", err)
}
if item, ok := table.Get(1); !ok {
fmt.Println("Not Found 1!")
if item, err := table.Get(1); err != nil {
fmt.Println("Error:", err)
} else {
fmt.Println("Found 1:", item)
}
if item, ok := table.Get(0); !ok {
fmt.Println("Not Found 0!")
if item, err := table.Get(0); err != nil {
fmt.Println("Error:", err)
} else {
fmt.Println("Found 0:", item)
}
// Output:
// Found 1: Hello, World!
// Not Found 0!
// Error: key '0' not found
}


@@ -7,9 +7,9 @@ import (
// A Hash function maps any data to a fixed-length value (in this case, a
// [uint64]).
//
// It is used by the [Table] to evenly distribute values
// It is used by the [HashTable] to evenly distribute values
// amongst its slots. A good hash function is uniform, [chaotic], and
// deterministic. [Table] uses [NewDefaultHash] by default, which is built on
// deterministic. [HashTable] uses [NewDefaultHash] by default, which is built on
// [maphash.Comparable].
//
// [chaotic]: https://en.wikipedia.org/wiki/Avalanche_effect

237
hash_table.go Normal file

@@ -0,0 +1,237 @@
package cuckoo
import (
"fmt"
"iter"
"math/bits"
"strings"
)
// A HashTable which uses cuckoo hashing to resolve collision. Create
// one with [New]. Or if you want more granularity, use [NewBy] or
// [NewCustom].
type HashTable[K, V any] struct {
tableA, tableB table[K, V]
growthFactor uint64
minLoadFactor float64
}
// TotalCapacity returns the number of slots allocated for the [HashTable]. To get the
// number of slots filled, look at [HashTable.Size].
func (t *HashTable[K, V]) TotalCapacity() uint64 {
return t.tableA.capacity + t.tableB.capacity
}
// Size returns how many slots are filled in the [HashTable].
func (t *HashTable[K, V]) Size() int {
return int(t.tableA.size + t.tableB.size)
}
func log2(n uint64) (m int) {
return max(0, bits.Len64(n)-1)
}
func (t *HashTable[K, V]) maxEvictions() int {
return 3 * log2(t.TotalCapacity())
}
func (t *HashTable[K, V]) load() float64 {
// When there are no slots in the table, we still treat the load as 100%.
// Every slot in the table is full.
if t.TotalCapacity() == 0 {
return 1.0
}
return float64(t.Size()) / float64(t.TotalCapacity())
}
// resize clears all tables, changes the sizes of them to a specific capacity,
// and fills them back up again. It is a helper function for [HashTable.grow] and
// [HashTable.shrink]; use them instead.
func (t *HashTable[K, V]) resize(capacity uint64) error {
entries := make([]entry[K, V], 0, t.Size())
for k, v := range t.Entries() {
entries = append(entries, entry[K, V]{k, v})
}
t.tableA.resize(capacity)
t.tableB.resize(capacity)
for _, entry := range entries {
if err := t.Put(entry.key, entry.value); err != nil {
return err
}
}
return nil
}
// grow increases the table's capacity by the growth factor. If the
// capacity is 0, it increases it to 1.
func (t *HashTable[K, V]) grow() error {
var newCapacity uint64
if t.TotalCapacity() == 0 {
newCapacity = 1
} else {
newCapacity = t.tableA.capacity * t.growthFactor
}
return t.resize(newCapacity)
}
// shrink reduces the table's capacity by the growth factor. It may
// reduce it down to 0.
func (t *HashTable[K, V]) shrink() error {
return t.resize(t.tableA.capacity / t.growthFactor)
}
// Get fetches the value for a key in the [HashTable]. Returns an error if no value
// is found.
func (t *HashTable[K, V]) Get(key K) (value V, err error) {
if item, ok := t.tableA.get(key); ok {
return item, nil
}
if item, ok := t.tableB.get(key); ok {
return item, nil
}
return value, fmt.Errorf("key '%v' not found", key)
}
// Has returns true if a key has a value in the table.
func (t *HashTable[K, V]) Has(key K) (exists bool) {
_, err := t.Get(key)
return err == nil
}
// Put sets the value for a key. Returns error if its value cannot be set.
func (t *HashTable[K, V]) Put(key K, value V) (err error) {
if t.tableA.update(key, value) {
return nil
}
if t.tableB.update(key, value) {
return nil
}
entry, eviction := entry[K, V]{key, value}, false
for range t.maxEvictions() {
if entry, eviction = t.tableA.evict(entry); !eviction {
return nil
}
if entry, eviction = t.tableB.evict(entry); !eviction {
return nil
}
}
if t.load() < t.minLoadFactor {
return fmt.Errorf("bad hash: resize on load %d/%d = %f", t.Size(), t.TotalCapacity(), t.load())
}
if err := t.grow(); err != nil {
return err
}
return t.Put(entry.key, entry.value)
}
// Drop removes a value for a key in the table. Returns an error if its value
// cannot be removed.
func (t *HashTable[K, V]) Drop(key K) (err error) {
t.tableA.drop(key)
t.tableB.drop(key)
if t.load() < t.minLoadFactor {
return t.shrink()
}
return nil
}
// Entries returns an unordered sequence of all key-value pairs in the table.
func (t *HashTable[K, V]) Entries() iter.Seq2[K, V] {
return func(yield func(K, V) bool) {
for _, slot := range t.tableA.slots {
if slot.occupied {
if !yield(slot.key, slot.value) {
return
}
}
}
for _, slot := range t.tableB.slots {
if slot.occupied {
if !yield(slot.key, slot.value) {
return
}
}
}
}
}
// String returns the entries of the table as a string in the format:
// "table[k1:v1 k2:v2 ...]".
func (t *HashTable[K, V]) String() string {
var sb strings.Builder
sb.WriteString("table[")
first := true
for k, v := range t.Entries() {
if !first {
sb.WriteString(" ")
}
fmt.Fprintf(&sb, "%v:%v", k, v)
first = false
}
sb.WriteString("]")
return sb.String()
}
// NewCustom creates a [HashTable] with custom [Hash] and [EqualFunc]
// functions, along with any [Option] the user provides.
func NewCustom[K, V any](hashA, hashB Hash[K], compare EqualFunc[K], options ...Option) *HashTable[K, V] {
settings := &settings{
growthFactor: DefaultGrowthFactor,
bucketSize: DefaultCapacity,
minLoadFactor: defaultMinimumLoad,
}
for _, option := range options {
option(settings)
}
return &HashTable[K, V]{
growthFactor: settings.growthFactor,
minLoadFactor: settings.minLoadFactor,
tableA: newTable[K, V](settings.bucketSize, hashA, compare),
tableB: newTable[K, V](settings.bucketSize, hashB, compare),
}
}
func pipe[X, Y, Z any](a func(X) Y, b func(Y) Z) func(X) Z {
return func(x X) Z { return b(a(x)) }
}
// NewBy creates a [HashTable] for any key type by using keyFunc to derive a
// comparable key. Two keys with the same derived key are treated as equal.
func NewBy[K, V any, C comparable](keyFunc func(K) C, options ...Option) *HashTable[K, V] {
return NewCustom[K, V](
pipe(keyFunc, NewDefaultHash[C]()),
pipe(keyFunc, NewDefaultHash[C]()),
func(a, b K) bool { return keyFunc(a) == keyFunc(b) },
options...,
)
}
// New creates a [HashTable] using the default [Hash] and [EqualFunc]. Use
// the [Option] functions to configure its behavior. Note that this constructor
// is only provided for comparable keys. For arbitrary keys, consider
// [NewBy] or [NewCustom].
func New[K comparable, V any](options ...Option) *HashTable[K, V] {
return NewCustom[K, V](NewDefaultHash[K](), NewDefaultHash[K](), DefaultEqualFunc[K], options...)
}


@@ -2,39 +2,34 @@ package cuckoo
import "fmt"
// DefaultCapacity is the initial capacity of a [Table]. It is inspired from
// DefaultCapacity is the initial capacity of a [HashTable]. It is inspired from
// Java's [HashMap] implementation, which also uses 16.
//
// [HashMap]: https://docs.oracle.com/javase/8/docs/api/java/util/HashMap.html#HashMap--
const DefaultCapacity uint64 = 16
// DefaultGrowthFactor is the standard resize multiplier for a [Table]. Most
// DefaultGrowthFactor is the standard resize multiplier for a [HashTable]. Most
// implementations use 2.
const DefaultGrowthFactor uint64 = 2
// defaultMinimumLoad is the default lowest acceptable occupancy of a [Table].
// The higher the minimum load, the more likely that a [Table.Put] will not
// defaultMinimumLoad is the default lowest acceptable occupancy of a [HashTable].
// The higher the minimum load, the more likely that a [HashTable.Put] will not
// succeed. The value of 5% is taken from [libcuckoo].
//
// [libcuckoo]: https://github.com/efficient/libcuckoo/blob/656714705a055df2b7a605eb3c71586d9da1e119/libcuckoo/cuckoohash_config.hh#L21
const defaultMinimumLoad float64 = 0.05
// defaultGrowthLimit is the maximum number of times a [Table] can grow in a
// single [Table.Put], before the library infers it will lead to a stack
// overflow. The value of '64' was chosen arbirarily.
const defaultGrowthLimit uint64 = 64
type settings struct {
growthFactor uint64
minLoadFactor float64
bucketSize uint64
}
// An Option modifies the settings of a [Table]. It is used in its constructors
// An Option modifies the settings of a [HashTable]. It is used in its constructors
// like [New], for example.
type Option func(*settings)
// Capacity modifies the starting capacity of each subtable of the [Table]. The
// Capacity modifies the starting capacity of each table of the [HashTable]. The
// value must be non-negative.
func Capacity(value int) Option {
if value < 0 {
@@ -44,7 +39,7 @@ func Capacity(value int) Option {
return func(s *settings) { s.bucketSize = uint64(value) }
}
// GrowthFactor controls how much the capacity of the [Table] multiplies when
// GrowthFactor controls how much the capacity of the [HashTable] multiplies when
// it must resize. The value must be greater than 1.
func GrowthFactor(value int) Option {
if value < 2 {


@@ -1,107 +0,0 @@
package cuckoo
// An entry is a key-value pair.
type entry[K, V any] struct {
key K
value V
}
type slot[K, V any] struct {
entry[K, V]
occupied bool
}
type subtable[K, V any] struct {
hash Hash[K]
slots []slot[K, V]
capacity, size uint64
compare EqualFunc[K]
}
// location determines where in the subtable a certain key would be placed. If
// the capacity is 0, this will panic.
func (t *subtable[K, V]) location(key K) uint64 {
return t.hash(key) % t.capacity
}
func (t *subtable[K, V]) get(key K) (value V, found bool) {
if t.capacity == 0 {
return
}
slot := t.slots[t.location(key)]
return slot.value, slot.occupied && t.compare(slot.key, key)
}
func (t *subtable[K, V]) drop(key K) (occupied bool) {
if t.capacity == 0 {
return
}
slot := &t.slots[t.location(key)]
if slot.occupied && t.compare(slot.key, key) {
slot.occupied = false
t.size--
return true
}
return false
}
func (t *subtable[K, V]) resized(capacity uint64) *subtable[K, V] {
return &subtable[K, V]{
slots: make([]slot[K, V], capacity),
capacity: capacity,
hash: t.hash,
compare: t.compare,
}
}
func (t *subtable[K, V]) update(key K, value V) (updated bool) {
if t.capacity == 0 {
return
}
slot := &t.slots[t.location(key)]
if slot.occupied && t.compare(slot.key, key) {
slot.value = value
return true
}
return false
}
func (t *subtable[K, V]) insert(insertion entry[K, V]) (evicted entry[K, V], eviction bool) {
if t.capacity == 0 {
return insertion, true
}
slot := &t.slots[t.location(insertion.key)]
if !slot.occupied {
slot.entry = insertion
slot.occupied = true
t.size++
return
}
if t.compare(slot.key, insertion.key) {
slot.value = insertion.value
return
}
insertion, slot.entry = slot.entry, insertion
return insertion, true
}
func newSubtable[K, V any](capacity uint64, hash Hash[K], compare EqualFunc[K]) *subtable[K, V] {
return &subtable[K, V]{
hash: hash,
capacity: capacity,
compare: compare,
size: 0,
slots: make([]slot[K, V], capacity),
}
}

300
table.go

@@ -1,283 +1,103 @@
package cuckoo
import (
"errors"
"fmt"
"iter"
"math/bits"
"strings"
)
// ErrBadHash occurs when the hashes given to a [Table] cause too many key
// collisions. Discard the old table, rebuild it from your source data, and try:
//
// 1. Different hash seeds. Equal seeds produce equal hash functions, which
// always cycle.
// 2. A different [Hash] algorithm.
var ErrBadHash = errors.New("bad hash")
// A Table which uses cuckoo hashing to resolve collision. Create
// one with [New]. Or if you want more granularity, use [NewBy] or
// [NewCustom].
type Table[K, V any] struct {
tableA, tableB *subtable[K, V]
growthFactor uint64
minLoadFactor float64
type entry[K, V any] struct {
key K
value V
}
// TotalCapacity returns the number of slots allocated for the [Table]. To get the
// number of slots filled, look at [Table.Size].
func (t *Table[K, V]) TotalCapacity() uint64 {
return t.tableA.capacity + t.tableB.capacity
type slot[K, V any] struct {
entry[K, V]
occupied bool
}
// Size returns how many slots are filled in the [Table].
func (t *Table[K, V]) Size() int {
return int(t.tableA.size + t.tableB.size)
type table[K, V any] struct {
hash Hash[K]
slots []slot[K, V]
capacity, size uint64
compare EqualFunc[K]
}
func log2(n uint64) (m int) {
return max(0, bits.Len64(n)-1)
// location determines where in the table a certain key would be placed. If the
// capacity is 0, this will panic.
func (t table[K, V]) location(key K) uint64 {
return t.hash(key) % t.capacity
}
func (t *Table[K, V]) maxEvictions() int {
return 3 * log2(t.TotalCapacity())
}
func (t *Table[K, V]) load() float64 {
// When there are no slots in the table, we still treat the load as 100%.
// Every slot in the table is full.
if t.TotalCapacity() == 0 {
return 1.0
}
return float64(t.Size()) / float64(t.TotalCapacity())
}
// insert attempts to put/update an entry in the table, without modifying the
// size of the table. Returns a displaced entry and 'homeless = true' if an
// entry could not be placed after exhausting evictions.
func (t *Table[K, V]) insert(entry entry[K, V]) (displaced entry[K, V], homeless bool) {
if t.tableA.update(entry.key, entry.value) {
func (t table[K, V]) get(key K) (value V, found bool) {
if t.capacity == 0 {
return
}
if t.tableB.update(entry.key, entry.value) {
return
}
for range t.maxEvictions() {
if entry, homeless = t.tableA.insert(entry); !homeless {
return
}
if entry, homeless = t.tableB.insert(entry); !homeless {
return
}
}
return entry, true
slot := t.slots[t.location(key)]
return slot.value, slot.occupied && t.compare(slot.key, key)
}
// resized creates an empty copy of the table, with a new capacity for each
// bucket.
func (t *Table[K, V]) resized(capacity uint64) *Table[K, V] {
return &Table[K, V]{
growthFactor: t.growthFactor,
minLoadFactor: t.minLoadFactor,
tableA: t.tableA.resized(capacity),
tableB: t.tableB.resized(capacity),
}
}
// resize creates a new [Table.resized] with 'capacity', inserts all items into
// the array, and replaces the current table. It is a helper function for
// [Table.grow] and [Table.shrink]; use them instead.
func (t *Table[K, V]) resize(capacity uint64) bool {
updated := t.resized(capacity)
for k, v := range t.Entries() {
if _, failed := updated.insert(entry[K, V]{k, v}); failed {
return false
}
func (t *table[K, V]) drop(key K) (occupied bool) {
if t.capacity == 0 {
return
}
*t = *updated
slot := &t.slots[t.location(key)]
if slot.occupied && t.compare(slot.key, key) {
slot.occupied = false
t.size--
return true
}
// grow increases the table's capacity by the growth factor. If the
// capacity is 0, it increases it to 1.
func (t *Table[K, V]) grow() bool {
var newCapacity uint64
if t.TotalCapacity() == 0 {
newCapacity = 1
} else {
newCapacity = t.tableA.capacity * t.growthFactor
}
return t.resize(newCapacity)
return false
}
// shrink reduces the table's capacity by the growth factor. It may
// reduce it down to 0.
func (t *Table[K, V]) shrink() bool {
return t.resize(t.tableA.capacity / t.growthFactor)
func (t *table[K, V]) resize(capacity uint64) {
t.slots = make([]slot[K, V], capacity)
t.capacity = capacity
t.size = 0
}
// Get fetches the value for a key in the [Table]. Matches the comma-ok pattern
// of a builtin map; see [Table.Find] for plain indexing.
func (t *Table[K, V]) Get(key K) (value V, ok bool) {
if item, ok := t.tableA.get(key); ok {
return item, true
}
if item, ok := t.tableB.get(key); ok {
return item, true
}
return
}
// Find fetches the value of a key. Matches direct indexing of a builtin map;
// see [Table.Get] for a comma-ok pattern.
func (t *Table[K, V]) Find(key K) (value V) {
value, _ = t.Get(key)
return
}
// Has returns true if a key has a value in the table.
func (t *Table[K, V]) Has(key K) (exists bool) {
_, exists = t.Get(key)
return
}
// Put sets the value for a key. If it cannot be set, an error is returned.
func (t *Table[K, V]) Put(key K, value V) (err error) {
var (
entry = entry[K, V]{key, value}
homeless bool
)
for range defaultGrowthLimit {
if entry, homeless = t.insert(entry); !homeless {
func (t table[K, V]) update(key K, value V) (updated bool) {
if t.capacity == 0 {
return
}
// Both this and the growth limit are necessary: this catches bad hashes
// early when the table is sparse, while the latter catches cases where
// growing never helps.
if t.load() < t.minLoadFactor {
return fmt.Errorf("hash functions produced a cycle at load %d/%d: %w", t.Size(), t.TotalCapacity(), ErrBadHash)
slot := &t.slots[t.location(key)]
if slot.occupied && t.compare(slot.key, key) {
slot.value = value
return true
}
// It is theoretically possible to have a table with a larger capacity
// that is valid. But this chance is astronomically small, so we ignore
// it in this implementation.
if grew := t.grow(); !grew {
return fmt.Errorf("could not redistribute entries into larger table: %w", ErrBadHash)
}
}
return fmt.Errorf("could not place entry after %d resizes: %w", defaultGrowthLimit, ErrBadHash)
return false
}
// Drop removes a value for a key in the table. Returns whether the key had
// existed.
func (t *Table[K, V]) Drop(key K) bool {
occupied := t.tableA.drop(key) || t.tableB.drop(key)
if t.load() < t.minLoadFactor {
// The error is not handled here, because table-shrinking is an internal
// optimization.
t.shrink()
func (t *table[K, V]) evict(insertion entry[K, V]) (evicted entry[K, V], eviction bool) {
if t.capacity == 0 {
return insertion, true
}
return occupied
}
slot := &t.slots[t.location(insertion.key)]
// Entries returns an unordered sequence of all key-value pairs in the table.
func (t *Table[K, V]) Entries() iter.Seq2[K, V] {
return func(yield func(K, V) bool) {
for _, slot := range t.tableA.slots {
if slot.occupied {
if !yield(slot.key, slot.value) {
if !slot.occupied {
slot.entry = insertion
slot.occupied = true
t.size++
return
}
}
}
for _, slot := range t.tableB.slots {
if slot.occupied {
if !yield(slot.key, slot.value) {
if t.compare(slot.key, insertion.key) {
slot.value = insertion.value
return
}
}
}
}
insertion, slot.entry = slot.entry, insertion
return insertion, true
}
// String returns the entries of the table as a string in the format:
// "table[k1:v1 k2:v2 ...]".
func (t *Table[K, V]) String() string {
var sb strings.Builder
sb.WriteString("table[")
first := true
for k, v := range t.Entries() {
if !first {
sb.WriteString(" ")
}
fmt.Fprintf(&sb, "%v:%v", k, v)
first = false
}
sb.WriteString("]")
return sb.String()
}
// NewCustom creates a [Table] with custom [Hash] and [EqualFunc]
// functions, along with any [Option] the user provides.
func NewCustom[K, V any](hashA, hashB Hash[K], compare EqualFunc[K], options ...Option) *Table[K, V] {
settings := &settings{
growthFactor: DefaultGrowthFactor,
bucketSize: DefaultCapacity,
minLoadFactor: defaultMinimumLoad,
}
for _, option := range options {
option(settings)
}
return &Table[K, V]{
growthFactor: settings.growthFactor,
minLoadFactor: settings.minLoadFactor,
tableA: newSubtable[K, V](settings.bucketSize, hashA, compare),
tableB: newSubtable[K, V](settings.bucketSize, hashB, compare),
func newTable[K, V any](capacity uint64, hash Hash[K], compare EqualFunc[K]) table[K, V] {
return table[K, V]{
hash: hash,
capacity: capacity,
compare: compare,
size: 0,
slots: make([]slot[K, V], capacity),
}
}
func pipe[X, Y, Z any](a func(X) Y, b func(Y) Z) func(X) Z {
return func(x X) Z { return b(a(x)) }
}
// NewBy creates a [Table] for any key type by using keyFunc to derive a
// comparable key. Two keys with the same derived key are treated as equal.
func NewBy[K, V any, C comparable](keyFunc func(K) C, options ...Option) *Table[K, V] {
return NewCustom[K, V](
pipe(keyFunc, NewDefaultHash[C]()),
pipe(keyFunc, NewDefaultHash[C]()),
func(a, b K) bool { return keyFunc(a) == keyFunc(b) },
options...,
)
}
// New creates a [Table] using the default [Hash] and [EqualFunc]. Use
// the [Option] functions to configure its behavior. Note that this constructor
// is only provided for comparable keys. For arbitrary keys, consider
// [NewBy] or [NewCustom].
func New[K comparable, V any](options ...Option) *Table[K, V] {
return NewCustom[K, V](NewDefaultHash[K](), NewDefaultHash[K](), DefaultEqualFunc[K], options...)
}
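
As a rough sketch of the bounded eviction loop that the two-subtable `insert` in this diff performs, the standalone program below alternates between two subtables until an entry settles or the `3 * log2(capacity)` bound runs out. It is written under assumptions: `sub`, `newSub`, and the free function `insert` are hypothetical names, keys are plain `uint64`, and the "update in either subtable first" pre-check from the diff is omitted, so it is only safe for distinct keys.

```go
package main

import (
	"fmt"
	"math/bits"
)

type entry struct{ key, value uint64 }

type slot struct {
	entry
	occupied bool
}

// sub is a hypothetical stand-in for one of the two subtables.
type sub struct {
	slots []slot
	hash  func(uint64) uint64
}

func newSub(capacity int, hash func(uint64) uint64) *sub {
	return &sub{slots: make([]slot, capacity), hash: hash}
}

func (s *sub) get(key uint64) (uint64, bool) {
	sl := s.slots[s.hash(key)%uint64(len(s.slots))]
	return sl.value, sl.occupied && sl.key == key
}

// insert places e in its slot, returning the previous occupant if one was evicted.
func (s *sub) insert(e entry) (entry, bool) {
	sl := &s.slots[s.hash(e.key)%uint64(len(s.slots))]
	if !sl.occupied {
		sl.entry, sl.occupied = e, true
		return entry{}, false
	}
	if sl.key == e.key {
		sl.value = e.value
		return entry{}, false
	}
	e, sl.entry = sl.entry, e
	return e, true
}

// maxEvictions follows the 3*log2(total capacity) bound used in the diff.
func maxEvictions(total uint64) int {
	if total == 0 {
		return 0
	}
	return 3 * (bits.Len64(total) - 1)
}

// insert bounces the entry between a and b until it settles or the bound is hit.
func insert(a, b *sub, e entry) (displaced entry, homeless bool) {
	total := uint64(len(a.slots) + len(b.slots))
	for i := 0; i < maxEvictions(total); i++ {
		if e, homeless = a.insert(e); !homeless {
			return e, false
		}
		if e, homeless = b.insert(e); !homeless {
			return e, false
		}
	}
	return e, true
}

func main() {
	identity := func(k uint64) uint64 { return k }
	quarter := func(k uint64) uint64 { return k / 4 }

	// Distinct hash functions: keys that collide in a spread out in b.
	a, b := newSub(4, identity), newSub(4, quarter)
	for _, k := range []uint64{0, 4, 8} {
		_, homeless := insert(a, b, entry{k, k * 10})
		fmt.Println(k, homeless) // homeless is false for all three keys
	}

	// Identical hash functions: the third colliding key can never settle.
	c, d := newSub(4, identity), newSub(4, identity)
	insert(c, d, entry{0, 0})
	insert(c, d, entry{4, 40})
	_, homeless := insert(c, d, entry{8, 80})
	fmt.Println(homeless) // true
}
```

The second half of `main` illustrates the remark in the `ErrBadHash` comment that equal hash functions always cycle: with only two candidate slots shared by three keys, every attempt evicts another entry until the bound is exhausted.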