Files
go-cuckoo/adr/001_interface_design.md
2026-05-04 19:16:42 -04:00

478 lines
18 KiB
Markdown

# Designing an Idiomatic Interface
Currently, the contract for package was built without design.
More attention was paid to implementing the underlying functionality of the cuckoo hashing.
With the fundamentals of the algorithm built, our API contract should be revisited.
It should align closer to the following principles:
- **Congruency to the builtin map.**
Our cuckoo table should have the same core functionality as Go's built-in map.
- **Familiarity to the builtin map.**
If our cuckoo table behaves similarly to Go's standard map, our user will intuitively know how to use it.
This lowers the cognitive load our developers must carry.
## Current State
### Interface of the Builtin Map
Listed below is every interface provided by Go to the built-in map object.
Also included, are the functions from the package `maps` in the standard library.
<details>
<summary>Interfaces</summary>
| # | Builtin Interface | Description |
| --- | ---------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------- |
| 1 | `m := make(map[K]V)` | Returns an empty map using the built-in `make()` function. |
| 2 | `m := make(map[K]V, hint)` | Returns an empty map using `make()`, with a capacity 'hint'. This hint is how many items the map expects to hold, _not_ a measure of how large it is. |
| 3 | `m := map[K]V{...}` | Returns a map, which may be filled with entries in the ellipsis (optional). |
| 4 | `var m map[K]V` | Defines an empty _variable_ that holds a map. This differs from #1 because `m` is uninitialized (nil) here. |
| 5 | `m[k] := v` | Assigns the value of `k` to `v`. |
| 6 | `v := m[k]` | Returns the value of `k` if it exists. Otherwise, `v` is uninitialized. |
| 7 | `v, ok := m[k]` | Similar to #6, except `ok` is equal to whether `v` is initialized. This is comma-ok notation. |
| 8 | `for k, v := range m` | Iterates over every key-value pair in `m`. The order is random. |
| 9 | `delete(m, k)` | Unassigns the value `k`. Returns no value. |
| 10 | `clear(m)` | Unassigns all keys in `m`. Returns no value. |
| 11 | `n := len(m)` | Returns the number of entries in `m`. If nil, `m` returns 0. |
| 12 | `m2 := maps.Clone(m)` | Returns a copy of `m`. |
| 13 | `maps.Copy(dst, src)` | Assigns every entry of `src` in `dst`. |
| 14 | `ok := maps.Equal(m1, m2)` | Returns true iff `m1` and `m2` the same entries. |
| 15 | `ok := maps.EqualFunc(m1, m2, fn)` | Like #14, but with a custom comparator for non-comparable values. |
| 16 | `maps.DeleteFunc(m, fn)` | Removes every entry in `m` which satisfies `fn`. Returns no value. |
| 17 | `it2 := maps.All(m)` | Returns an 2D iterator over every key-value pair. |
| 18 | `it := maps.Keys(m)` | Returns an iterator over every key. |
| 19 | `it := maps.Values(m)` | Returns an iterator over every value. There can be duplicates. |
| 20 | `m := maps.Collect(seq)` | Returns a map, with every entry defined in a 2D iterator over key-value pairs. |
| 21 | `maps.Insert(m, seq)` | Assigns to `m` all key-value pairs in 2D iterator `seq`. Returns no value. |
</details>
### Interface of `go-cuckoo`
On the other hand, here is the current contract for `go-cuckoo`.
<details>
<summary>Interfaces</summary>
| # | Builtin Interface | Description |
| --- | -------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------- |
| 1 | `m := New(opts...)` | Creates a table using the default hash and equal function. The options configure its behavior. Confined to comparable keys. |
| 2 | `m := NewBy(keyFunc, opts...)` | Like #1, but allows any key type. A `keyFunc` is used to derive a comparable key. |
| 3 | `m := NewCustom(hashA, hashB, equalFunc, opts...)` | Like #1, but allows control over the hashes used to allow any key type. An `equalFunc` determines key equality. |
| 4 | `seq := m.Entries()` | Returns an unordered 2D iterator of all key-value pairs in the table. |
| 5 | `v := m.Find(k)` | Removes the value for `k`. Returns true if `k` existed. |
| 6 | `v, ok := m.Get(k)` | Returns the value for `k` in the table. Also, returns true if the `k` exists, otherwise false. When false, `v` is undefined. |
| 7 | `ok := m.Has(k)` | Returns true if `k` is in the table. |
| 8 | `err := m.Put(k, v)` | Sets value `v` for key `k`. Otherwise, returns error. |
| 9 | `n := m.Size()` | Returns the number of items in `m`. |
| 10 | `str := m.String()` | Returns `m` as a string in the format "table[k1:v1 k2:v2 ...]". |
| 11 | `cap := m.TotalCapacity()` | Returns how many slots `m` has allocated. |
| 12 | `ok := m.Drop(k)` | Removes `k` from the table. Returns whether the key had existed. |
</details>
### Determining Congruency
So, how does the core functionality compare?
Listed below is an analysis of every interface in Go's standard map.
Each is compared against what `go-cuckoo` offers, and categorized into the following groups:
- ✅ Covered: an analog exists.
- ⚠️ Partial: workaround available.
- ❌ Gap: no analog yet; addressed in [Target State](#solving-congruency).
Specifically, here we are checking for functionality.
Is there functionality that this offers which `go-cuckoo` does not?
We are checking accessibility, but not discoverability.
The latter will be considered later.
<details>
<summary>✅ <code>m := make(map[K]V)</code></summary>
The analog is `m := New()`.
</details>
<details>
<summary>⚠️ <code>m := make(map[K]V, hint)</code></summary>
This has no simple analog.
It is close to `m := New(Capacity(hint))`, but it assigns starting capacity, not expected size.
For the built-in map, these are two separate things.
- Capacity is an internal measure, used to optimize space/speed.
It is hidden from the user because it depends on the underlying implementation, which may change.
- Expected size requires the map must hold a number of items before resizing.
This is tangeable and agnostic to implementation, hence why it is given to the user.
In short, this interface defines expected size, but `Capacity()` defines capacity.
</details>
<details>
<summary>❌ <code>m := map[K]V{...}</code></summary>
This has no simple analog, the closest being:
```go
m := New[K, V]()
for k, v := range startingEntries {
m.Put(k, v)
}
```
It is idiomatic, but far less ergonomic.
</details>
<details>
<summary>✅ <code>var m map[K]V</code></summary>
The analog is `var m Table[K, V]`.
</details>
<details>
<summary>✅ <code>m[k] := v</code></summary>
The analog is `err := m.Put(k, v)`.
</details>
<details>
<summary>✅ <code>v := m[k]</code></summary>
The analog is `v := m.Find(k)`.
</details>
<details>
<summary>✅ <code>v, ok := m[k]</code></summary>
The analog is `v, ok := m.Get(k)`.
</details>
<details>
<summary>✅ <code>for k, v := range m</code></summary>
The analog is `for k, v := range m.Entries()`.
</details>
<details>
<summary>✅ <code>delete(m, k)</code></summary>
The analog is `ok := m.Drop(k)`.
</details>
<details>
<summary>❌ <code>clear(m)</code></summary>
There is no analog.
The easiest may to do this is to delete all items individually:
```go
for k := range m.Entries() {
m.Drop(k)
}
```
</details>
<details>
<summary>✅ <code>n := len(m)</code></summary>
The analog is `n := m.Size()`.
</details>
<details>
<summary>❌ <code>m2 := maps.Clone(m)</code></summary>
There is no analog.
The easiest way to do this currently is to make a new map, and manually add the items.
```go
m2 := cuckoo.Table[K, V]()
for k, v := range m.Entries() {
m2.Put(k, v)
}
```
This gets complicated by the various options available to the user.
Furthermore, any custom `EqualFunc`, `keyFunc` or `Hash` is not transferred.
</details>
<details>
<summary>❌ <code>maps.Copy(dst, src)</code></summary>
There is no analog.
The simplest way to do this is with a for-loop.
```go
for k, v := range src.Entries() {
dst.Put(k, v)
}
```
</details>
<details>
<summary>❌ <code>ok := maps.Equal(m1, m2)</code></summary>
There is no analog.
Users have to manually check the key-value pairs to determine equality.
</details>
<details>
<summary>❌ <code>ok := maps.EqualFunc(m1, m2, fn)</code></summary>
There is no analog.
Users have to manually check the key-value pairs to determine equality.
</details>
<details>
<summary>❌ <code>maps.DeleteFunc(m, fn)</code></summary>
There is no analog.
Users have to manually delete keys.
</details>
<details>
<summary>✅ <code>it2 := maps.All(m)</code></summary>
The analog is `it2 := m.Entries()`.
</details>
<details>
<summary>⚠️ <code>it := maps.Keys(m)</code></summary>
There is no simple analog.
A close neighbor is `it2 := m.Entries()`.
Users can use this in a for-loop, and pick out just the keys:
```go
for k := range m.Entries() {
// ...
}
```
</details>
<details>
<summary>⚠️ <code>it := maps.Values(m)</code></summary>
There is no simple analog.
A close neighbor is `it2 := m.Entries()`.
Users can use this in a for-loop, and pick out just the values:
```go
for _, v := range m.Entries() {
// ...
}
```
</details>
<details>
<summary>❌ <code>m := maps.Collect(seq)</code></summary>
There is no analog.
</details>
<details>
<summary>❌ <code>maps.Insert(m, seq)</code></summary>
There is no analog.
</details>
## Target State
### Solving Congruency
The following changes will be made to accomodate for congruency:
<details>
<summary><code>ok := maps.EqualFunc(m1, m2, fn)</code></summary>
To solve this, we need a new function:
```go
func EqualFunc[K, V1, V2 any](t1 *Table[K, V1], t2 *Table[K, V2], eq func(V1, V2) bool) bool {
if t1.Size() != t2.Size() {
return false
}
for k, v1 := range t1.Entries() {
if v2, ok := t2.Get(k); !ok || eq(v1, v2) {
return false
}
}
return true
}
```
This function is free, and not bound as a receiver function.
(It is called `cuckoo.Equal(t1, t2)`, not `t1.Equals(t2)`.)
The latter implies `t1` has authority, when in fact neither do.
Equality will be defined as:
1. Neither table has a key the other doesn't.
2. Each key has the same value in each table.
Parameter `eq` determines this equality.
Custom `EqualFunc`'s complicate this, as they modulate key identity in tables.
If two tables may differ on whether two keys are different, this function might break.
So, we must assume that:
- Both tables have `EqualFunc`'s which 'agree' on the identity of the keys present in the tables.
Agreement is defined as: if two keys are distinct in one table, they are distinct in the other.
The name `EqualFunc` is already taken by `EqualFunc[K, V]`: an alias for `func(a, b K) bool`.
Inlining `EqualFunc[K, V]` would solve this problem.
The documentation attached to it would be moved to `DefaultEqualFunc`.
</details>
<details>
<summary><code>ok := maps.Equal(m1, m2)</code></summary>
The addition of `cuckoo.EqualFunc` makes an implementation trivial:
```go
func Equal[K any, V comparable](t1, t2 *Table[K, V]) bool {
return EqualFunc(t1, t2, DefaultEqualFunc[V])
}
```
To conform with the standard library, a new function should be added.
Once again, the function is free because it is symmetric.
</details>
<details>
<summary><code>maps.Insert(m, seq)</code></summary>
This functionality requires a new receiver:
```go
func (t *Table[K, V]) Insert(seq iter.Seq2[K, V]) error {
for k, v := range seq {
if err := t.Put(k, v); err != nil {
return err
}
}
return nil
}
```
A receiver fits better even though `maps.Insert` is a free function, because copying it is asymmetric.
Map `dst` receives entries from map `src`.
It's only free because Go's standard map is built into the language, and so cannot have receivers.
In terms of naming, `t.Extend` is more accurate, and has precedent in [Python](docs.python.org/3/tutorial/datastructures.html#more-on-lists) and [Rust](https://doc.rust-lang.org/std/iter/trait.Extend.html).
When [adding iterator function](https://github.com/golang/go/issues/61900) to the `maps` package, the Go team chose to frame it as 'sources' and 'sinks'.
With this model, `maps.Insert` made more sense than `maps.Extend`.
Ultimately, `t.Insert()` is a better choice to be consistent with `maps`.
</details>
<details>
<summary><code>maps.Copy(dst, src)</code></summary>
To solve this, we must implement a new receiver.
Luckily, `t.Insert` makes it trivial:
```go
func (t *Table[K, V]) Copy(src *Table[K, V]) error {
return t.Insert(src.Entries())
}
```
A receiver fits better even though `maps.Copy` is a free function, 'copying' it is asymmetric: `dst` is writen into by `src`.
It is only free because Go's standard map is built into the language, and so cannot have receivers.
The name `t.Merge()` might be more accurate, but it does work because:
- `t.Copy()` matches Go's builtin `copy()`, and `io.Copy()`. The Go team used [the same logic](https://github.com/golang/go/discussions/47330#discussioncomment-1167799) to name `maps.Copy()`.
In this case, `t.Merge()` would be an outlier.
- `t.Merge()` implies some sort of conflict-resolution, when there is not.
It simply overwrites the values.
</details>
<details>
<summary><code>maps.DeleteFunc(m, fn)</code></summary>
A few function can fill this gap:
```go
func (t *Table[K, V]) DeleteFunc(del func(K, V) bool) {
for k, v := range t.Entries() {
if del(k, v) {
t.Drop(k)
}
}
}
```
It would have the same functionality as `maps.DeleteFunc`.
A free function could work here, but `t` has clear authority over `del`.
Other than being consistent with the `maps` package, `t.DeleteFunc` follows the Go convention of appending `Func` to higher-order equivalents of functions.
This trumps names like `t.DeleteIf`, which lend more to [Java](https://docs.oracle.com/javase/8/docs/api/java/util/ArrayList.html#removeIf-java.util.function.Predicate-) or [C++](https://en.cppreference.com/cpp/algorithm/remove).
The word `Delete` is also convention, tying back to the built-in `delete()`.
</details>
<details>
<summary><code>m := maps.Collect(seq)</code></summary>
This functionality would benefit from a new constructor.
Luckily, `t.Insert` makes this easy:
```go
func Collect[K comparable, V any](seq iter.Seq2[K, V]) (*Table[K, V], error) {
t := New[K, V]()
err := t.Insert(seq)
return t, err
}
```
</details>
<details>
<summary><code>m := map[K]V{...}</code></summary>
This functionality is complicated, because entries are generic; their addition cannot be through table options.
A new constructor must support this functionality.
Should it support options or custom hashes or `keyFunc`'s?
No, because
</details>