Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Splay tree amortized time issues #923

Open
matthewsot opened this issue Feb 22, 2025 · 3 comments
Open

Splay tree amortized time issues #923

matthewsot opened this issue Feb 22, 2025 · 3 comments

Comments

@matthewsot
Copy link

Hello,

Thanks for your work on this package !

The documentation at https://juliacollections.github.io/DataStructures.jl/stable/splay_tree/#Splay-Tree-1 indicates the splay tree implementation should guarantee

Operations such as search, insert and delete can be done in O(log n) amortized time, where n is the number of nodes in the SplayTree

But I think this can be violated by the current implementation because it doesn't splay on unsuccessful searches, or on redundant insertions.

Here's a short test script to show the behavior; I have a particularly slow computer so you might need to increase the "10000" to see useful timing info:

using DataStructures

function test(N)
    if true
        t = SplayTree{Int}()
    else
        t = AVLTree{Int}()
    end

    for i in 1:N
        push!(t, i)
    end

    for i in 1:N
        if true
            push!(t, 1)
        else
            push!(t, -i)
        end
    end
end

for i in 1:10
    @time test(i * 10000)
end

In its default configuration, this:

  1. Pushes 1, 2, 3, ..., N, which (correctly) creates a splay tree that's just one long list (root is N, left child N-1, left child N-2, etc.).
  2. Tries to repeatedly re-push 1, which is already in the data structure. This causes it to traverse the entire length-N chain, but since the implementation doesn't splay on redundant pushes no reorganization happens and so this expensive operation happens every single time we search.

The result (at least on my machine) is very poor scaling behavior.

Notably, it is significantly faster if an AVL tree is used or if we push nonredundant elements, so I think the explanation above of the cause of the slowdown is correct. And I think it also affects, e.g., sequences of unsuccessful haskeys (replace push!(t, 1) with haskey(t, 0)).

I think the basic fix is to have search_node always splay the very last node it encounters, as suggested in the splay tree paper. Alternatively, if this is expected behavior, it would be nice IMO to document that. If this package is still maintained + there's interest in either of those two changes I'm happy to send a PR.

Also sorry if I've misunderstood something about the implementation or use of the package; I'm new to Julia.

@StephenVavasis
Copy link
Contributor

Just for your information: search, delete, and insert in SortedSet, SortedDict, and SortedMultiDict are all O(log n) operations. However, these data structures do not have the splay-tree property of faster access time to items that are accessed more frequently, so this doesn't directly fix your issue.

@eulerkochy
Copy link
Member

Great catch, btw!

I think the basic fix is to have search_node always splay the very last node it encounters, as suggested in the splay tree paper

Yeah, go ahead and implement this change. It's been a while since I last looked at this implementation in this much depth. Will be happy to review it. Make sure to change the function signature to search_node!(...) since you'll be modifying the structure of the tree.

On a related note, if you have some time on your hand, you can also add a benchmark script for different kind of trees in the benchmark/ folder.

@matthewsot
Copy link
Author

Thanks for the fast reply; I'll send something tomorrow!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants