using key in list for performance needs more qualification #70
Comments
Yes, you are right, we need an additional discussion of the pros and cons of lists vs. sets. I will write it within the next week. It would be perfect if you could be available as a reviewer :) By the way: I just tried to reproduce your numbers with a slightly different test case:

def f_a():
    # Build a 500-element list, then test membership once.
    my_list = list()
    for i in range(500):
        my_list.append(i)
    bool(3 in my_list)

def f_b():
    # Build a 500-element set, then test membership once.
    my_set = set()
    for i in range(500):
        my_set.add(i)
    bool(3 in my_set)

%timeit f_a()
# 10000 loops, best of 3: 27.6 µs per loop
%timeit f_b()
# 10000 loops, best of 3: 33 µs per loop

This time the difference in execution speed is not that big...
Yes, ping me in the PR. [edit because I am wrong]
Python sets are implemented with hash tables, so their creation is still O(n). Although there's more constant-time overhead for sets vs. lists - as well as for adding/appending vs. pre-allocating - it's all asymptotically O(n). And the timing examples are combining the creation time with just 1 membership test. Naturally if you already have a list, and are only testing membership once, then there's no point in converting to a set. But I agree there's a general anti-pattern here; it just needs more context, such as repeated membership testing in a loop. In mentoring Pythoneers, I have definitely seen the pattern of over list-ifying, instead of using the right data structure. It seems to stem from not trusting duck-typing and iteration as a protocol.
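To make the point about mixing creation time with a single membership test concrete, here is a minimal sketch that times the two costs separately. The helper name `measure`, the size `N = 500`, and the choice of a missing value (`-1`, the worst case for a list scan) are assumptions made for illustration and are not taken from the article. Absolute numbers will vary by machine and Python version.

```python
import timeit

N = 500
my_list = list(range(N))
my_set = set(my_list)


def measure(stmt, number=10_000):
    """Return the average seconds per evaluation of stmt (hypothetical helper)."""
    return timeit.timeit(stmt, globals=globals(), number=number) / number


# Time construction and lookup separately instead of lumping them together.
timings = {
    "build list": measure("list(range(N))"),
    "build set": measure("set(range(N))"),
    "miss in list": measure("-1 in my_list"),  # scans all N elements
    "miss in set": measure("-1 in my_set"),    # one hash probe on average
}

for label, seconds in timings.items():
    print(f"{label:>12}: {seconds * 1e6:.3f} µs")
```

Separating the measurements this way should show whether the set's higher construction cost is recovered once membership is tested more than a handful of times.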
Yeah, I am wrong. Should not post comments before coffee. A bit more context was all I was asking for.
Ok, we will provide more context in the article. It would be great to have some links concerning the time complexities. I only know https://wiki.python.org/moin/TimeComplexity. But this page does not contain any information about the time needed for creation of
I found this issue having just read and tested this anti-pattern, so I agree some further clarification would be good. Perhaps simply changing the title to 'repeatedly using key in list', with an explanation of the complexities and practicalities below (O(n) set creation + O(1) checks vs O(n) checks)?
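The trade-off described here (O(n) set creation plus O(1) average checks, versus O(n) per check against a list) could be illustrated roughly as follows. The function names `common_items_list` and `common_items_set` and the sample data are hypothetical, and the set version assumes the elements are hashable.

```python
def common_items_list(candidates, known):
    """Each `in` scans the list: O(len(candidates) * len(known)) overall."""
    return [c for c in candidates if c in known]


def common_items_set(candidates, known):
    """Pay O(len(known)) once to build the set, then O(1) average per check."""
    known_set = set(known)
    return [c for c in candidates if c in known_set]


known = list(range(0, 1000, 2))   # even numbers 0..998
candidates = [3, 4, 998, 1001]

print(common_items_list(candidates, known))  # [4, 998]
print(common_items_set(candidates, known))   # [4, 998]
```

Both functions return the same result; the set version only pays off when the number of membership checks is large enough to amortize building the set.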
http://docs.quantifiedcode.com/python-anti-patterns/performance/using_key_in_list_to_check_if_key_is_contained_in_a_list.html
gives
It is only worth paying the cost of creating a set (O(n log n)) to get the O(log n) lookup speed if you are going to be doing it many times.