
Implement VarOptItemsSketch for Go #98

@Fengzdadi


Overview

This issue tracks the implementation of VarOptItemsSketch (Variance Optimal Sampling) for the Go library, following the Java and C++ implementations.

VarOpt provides weighted sampling with optimal variance for subset-sum estimation, which makes it useful for network traffic monitoring, ad analytics, and log sampling.
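For reference, here is the standard (offline) VarOpt formulation. Once more than k items have been seen, a threshold τ is chosen so that the inclusion probabilities sum to k. An item is sampled with probability p_i and, if sampled, carries the adjusted weight w_i / p_i. Heavy items (w_i ≥ τ) are always kept with their true weight, which roughly corresponds to the H region. Every light item that survives carries τ, which roughly corresponds to the R region. The resulting estimate for a predicate P is unbiased:

```latex
\sum_{i=1}^{n} \min\!\left(1, \frac{w_i}{\tau}\right) = k, \qquad
p_i = \min\!\left(1, \frac{w_i}{\tau}\right), \qquad
\hat{w}_i =
\begin{cases}
  \max(w_i, \tau) & \text{if } i \text{ is sampled} \\
  0 & \text{otherwise}
\end{cases}

\hat{S}(P) = \sum_{i \in \text{sample},\ P(i)} \hat{w}_i, \qquad
\mathbb{E}\big[\hat{S}(P)\big] = \sum_{i:\ P(i)} w_i
```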

Proposed Implementation Plan

We plan to split this into multiple PRs for easier review:

PR 1: Core VarOptItemsSketch (basic functionality)

  • Add VarOptItems family (ID=13) to internal/family.go
  • Implement VarOptItemsSketch[T] struct with fields: k (sample size), n (items seen), h, m, r (item counts in the H, M, and R regions), totalWtR (total weight of the R region), data, weights
  • Implement warmup mode (n ≤ k): store all items
  • Implement estimation mode (n > k): weighted sampling with tau threshold
  • Min-heap operations for H region
  • Basic unit tests

PR 2: Serialization & Compatibility

  • Implement Java-compatible serialization (equivalent to encodeVarOptItemsSketch + VarOptItemsSketchEncoder.Encode())
  • Implement deserialization via NewVarOptItemsSketchFromSlice()
  • Add cross-language compatibility tests
  • Generate test data files for validation

PR 3: Subset Sum Estimation

  • Implement EstimateSubsetSum(predicate) returning the estimate along with lower and upper bounds
  • Pseudo-hypergeometric confidence intervals
  • Tests for estimation accuracy
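The point estimate is simple. H items contribute their exact weights. The R region contributes its total weight scaled by the fraction of R items that match the predicate, which is the same as counting each matching R item at τ. A minimal sketch follows, with hypothetical names `estimateSubsetSum`, `SubsetSummary`, and `FractionBounds`. The pseudo-hypergeometric interval is left as an injected function rather than reimplemented. The placeholder in `main` returns the trivial bounds [0, 1].

```go
package main

import "fmt"

// SubsetSummary is a hypothetical result type: bounds, estimate, and total sketch weight.
type SubsetSummary struct {
	LowerBound, Estimate, UpperBound, TotalSketchWeight float64
}

// FractionBounds is a hypothetical hook for the pseudo-hypergeometric bounds on the
// fraction of R-region weight matching the predicate; it is not implemented here.
type FractionBounds func(trueCount, r int) (lo, hi float64)

// estimateSubsetSum: H items count with exact weights; R contributes totalWtR times
// the matching fraction of its items.
func estimateSubsetSum[T any](hItems []T, hWeights []float64, rItems []T, totalWtR float64,
	pred func(T) bool, bounds FractionBounds) SubsetSummary {
	var hTrue, hTotal float64
	for i, it := range hItems {
		hTotal += hWeights[i]
		if pred(it) {
			hTrue += hWeights[i]
		}
	}
	if len(rItems) == 0 { // only heavy items: the answer is exact
		return SubsetSummary{LowerBound: hTrue, Estimate: hTrue, UpperBound: hTrue, TotalSketchWeight: hTotal}
	}
	trueCount := 0
	for _, it := range rItems {
		if pred(it) {
			trueCount++
		}
	}
	r := len(rItems)
	lo, hi := bounds(trueCount, r)
	return SubsetSummary{
		LowerBound:        hTrue + totalWtR*lo,
		Estimate:          hTrue + totalWtR*float64(trueCount)/float64(r),
		UpperBound:        hTrue + totalWtR*hi,
		TotalSketchWeight: hTotal + totalWtR,
	}
}

func main() {
	trivial := func(trueCount, r int) (float64, float64) { return 0, 1 } // placeholder bounds
	in := map[string]bool{"a": true, "c": true, "d": true}
	s := estimateSubsetSum([]string{"a", "b"}, []float64{10, 20},
		[]string{"c", "d", "e", "f"}, 8, func(x string) bool { return in[x] }, trivial)
	fmt.Printf("%+v\n", s) // {LowerBound:10 Estimate:14 UpperBound:18 TotalSketchWeight:38}
}
```

Injecting the bounds keeps the estimator testable with exact numbers. The real interval code can then be ported and tested on its own.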

PR 4: VarOptItemsUnion (optional, can be deferred)

  • Implement union with gadget/marks mechanism
  • Serialization for union state
