find_near_duplicates tool

Read-only diagnostic for memory cleanup. Returns clusters of active
memories connected by pairs whose embeddings have a cosine similarity
of at least threshold. Clusters are formed by union-find over the
qualifying pairs, so transitively similar rows (a~b and b~c) land in
the same group even when a and c are not themselves above the cutoff.
Clusters are sorted strongest-first by the highest pair similarity
within each one. Soft-deleted and expired rows are excluded.
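A minimal Python sketch of the clustering step follows. It assumes the qualifying pairs have already been fetched as (id_a, id_b, similarity) tuples. The function name cluster_pairs and that tuple shape are hypothetical, and the code illustrates union-find grouping with strongest-first ordering rather than reproducing the tool's own implementation.

```python
from collections import defaultdict

def cluster_pairs(pairs):
    """Group ids linked by qualifying pairs and sort clusters strongest-first.

    pairs: iterable of (id_a, id_b, similarity) tuples already above the threshold
           (hypothetical input shape).
    Returns a list of (sorted_ids, max_similarity) tuples.
    """
    parent = {}

    def find(x):
        # Register unseen ids as their own root, then walk up with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    pairs = list(pairs)
    for a, b, _ in pairs:
        union(a, b)

    # Collect members and the strongest pair per root after all unions are done.
    members = defaultdict(set)
    best = defaultdict(float)
    for a, b, sim in pairs:
        root = find(a)
        members[root].update((a, b))
        best[root] = max(best[root], sim)

    clusters = [(sorted(ids), best[root]) for root, ids in members.items()]
    # Highest max pair similarity first; ties broken by smallest member id.
    clusters.sort(key=lambda c: (-c[1], c[0][0]))
    return clusters
```

For example, cluster_pairs([(1, 2, 0.93), (2, 3, 0.95), (4, 5, 0.984)]) yields [([4, 5], 0.984), ([1, 2, 3], 0.95)]: rows 1 and 3 share a cluster only through row 2.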
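Candidate pairs are looked up through pgvector’s HNSW index with a
fixed top-10 nearest-neighbor query per row. At the cutoffs typical
for cleanup (threshold >= 0.85) this is usually ample, but because the
search is approximate and capped, a row with more than ten qualifying
neighbors can have some of its pairs missed.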
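To show what a per-row neighbor cap does to the candidate set, here is a Python sketch that generates pairs with a top-k limit. It uses exact brute-force cosine similarity as a stand-in for pgvector's approximate HNSW search, so it models only the effect of the cap. The names cosine and candidate_pairs and the dict-of-vectors input are hypothetical, and vectors are assumed to be non-zero.

```python
import math

def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def candidate_pairs(embeddings, threshold, k=10):
    """Return qualifying pairs found with a per-row top-k neighbor cap.

    embeddings: dict of id -> vector for active rows only (hypothetical shape).
    Exact search here stands in for the approximate HNSW lookup.
    """
    pairs = {}
    ids = list(embeddings)
    for i in ids:
        # Score every other row, then keep only the k most similar,
        # mimicking a nearest-neighbor query limited to k results.
        scored = sorted(
            ((cosine(embeddings[i], embeddings[j]), j) for j in ids if j != i),
            reverse=True,
        )[:k]
        for sim, j in scored:
            if sim >= threshold:
                # Store each unordered pair once, as (smaller id, larger id).
                pairs[(min(i, j), max(i, j))] = sim
    return [(a, b, sim) for (a, b), sim in sorted(pairs.items())]
```

With three identical vectors and k=1, each row keeps a single neighbor (ties are broken toward the larger id in this sketch), so the pair (1, 2) is never reported even though it is a perfect match; rows 1 and 2 still end up in one cluster through row 3.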
Parameters:
threshold: number, 0.5–1.0. Optional, default 0.92. Cosine-similarity cutoff for candidate pairs; 0.95 and above is typical for “effectively identical” rows.
max_groups: integer. Optional, default 50, hard cap 500. Maximum number of clusters returned.
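The defaults and ranges above can be checked with a small validation step, sketched here in Python. The InvalidInput class is a hypothetical stand-in for the INVALID_INPUT error. Two details are assumptions not stated in the parameter list: max_groups must be at least 1, and values above the 500 cap are rejected rather than silently clamped.

```python
class InvalidInput(ValueError):
    """Hypothetical stand-in for the tool's INVALID_INPUT error."""

def validate_params(threshold=None, max_groups=None):
    """Apply documented defaults, then enforce the documented ranges."""
    if threshold is None:
        threshold = 0.92
    if max_groups is None:
        max_groups = 50
    # bool is a subclass of int in Python, so exclude it explicitly.
    # NaN fails the range comparison and is rejected too.
    if (isinstance(threshold, bool) or not isinstance(threshold, (int, float))
            or not 0.5 <= threshold <= 1.0):
        raise InvalidInput("threshold must be a number in [0.5, 1.0]")
    # Assumption: lower bound of 1, and values over 500 are rejected, not clamped.
    if (isinstance(max_groups, bool) or not isinstance(max_groups, int)
            or not 1 <= max_groups <= 500):
        raise InvalidInput("max_groups must be an integer in [1, 500]")
    return float(threshold), max_groups
```

Calling validate_params() with no arguments returns (0.92, 50).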
Example response:

{
"ok": true,
"threshold": 0.92,
"groups": [
{
"ids": [4, 5],
"max_similarity": 0.984,
"pair_count": 1,
"memories": [
{ "id": 4, "text": "...", "tags": ["family"], ... },
{ "id": 5, "text": "...", "tags": ["family"], ... }
]
}
],
"summary": { "groups": 5, "pairs": 8 }
}
pair_count is the number of qualifying pairs within that cluster
(always at least 1; a higher count means a denser cluster). In the
summary, pairs is the total raw pair count across all returned
clusters.
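A Python sketch of how pair_count and the summary could be derived from clusters and their pairs follows. It assumes clusters arrive as (ids, max_similarity) tuples already sorted strongest-first, and that pairs are (id_a, id_b, similarity) tuples. The function name summarize and both input shapes are hypothetical, the memories field is omitted, and treating summary.groups as the number of clusters actually returned is an assumption.

```python
def summarize(clusters, pairs, max_groups):
    """Build the "groups" and "summary" parts of a response (memories omitted).

    clusters: list of (ids, max_similarity), sorted strongest-first.
    pairs: list of (id_a, id_b, similarity) qualifying pairs.
    """
    returned = clusters[:max_groups]  # keep only the strongest max_groups clusters
    groups = []
    for ids, max_sim in returned:
        members = set(ids)
        # A pair belongs to a cluster when both of its endpoints are members.
        pair_count = sum(1 for a, b, _ in pairs if a in members and b in members)
        groups.append({"ids": list(ids), "max_similarity": max_sim, "pair_count": pair_count})
    # Assumption: the summary covers only the clusters actually returned.
    summary = {"groups": len(groups), "pairs": sum(g["pair_count"] for g in groups)}
    return {"groups": groups, "summary": summary}
```

Truncating before counting keeps summary.pairs consistent with the clusters the caller actually sees.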
Errors: INVALID_INPUT (threshold or max_groups out of range),
DATABASE_ERROR.