-
-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve docstrings for unique() #20800
Conversation
base/multidimensional.jl
Outdated
Returns an array containing only the unique elements of the iterable `itr`, in | ||
the order that the first of each set of equivalent elements originally appears. | ||
If `dim` is specified, returns unique regions of the array `itr` along `dim`. | ||
Return an array containing only the unique elements of array `A`, as determined by |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A
being identified as an AbstractArray
in the docstring signature, perhaps nix the redundant "array" just before A
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO it doesn't hurt, can still be useful if you didn't read the signature carefully.
base/multidimensional.jl
Outdated
the order that the first of each set of equivalent elements originally appears. | ||
If `dim` is specified, returns unique regions of the array `itr` along `dim`. | ||
Return an array containing only the unique elements of array `A`, as determined by | ||
[`isequal`](@ref). If `dim` is specified, returns unique regions of `A` along |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the second sentence be imperative as well?
I'm reluctant not to guarantee the ordering property since it is quite useful. cc @JeffBezanson |
@StefanKarpinski Then I hadn't understood what you meant. I'm fine with keeping that guarantee (though I've yet to find a case where it's really useful), but then we need another solution to get the unique values of |
How about this... Elements in the collection returned by
|
That makes sense, but there's still something which bothers me: the order of appearance is also defined for categorical arrays, so if there are cases in which that ordering is indeed useful, not following it is a problem. Conversely, when working with categorical data (e.g. in StatsModels), we would like to get either the levels in the user-defined custom ordering (for categorical arrays), or the levels in their lexical ordering (for So I'd rather take a more consistent approach. If we think order of appearance can be useful, let's always follow it when it exists (i.e. all the time except for sets, where order is arbitrary); else let's just drop it completely. Then, is the idea of a "natural ordering" of arrays something that should live in Base? If so, we could add a boolean argument to |
@StefanKarpinski Your final word? :-) |
FWIW Pandas's categorical type preserves the order of appearance with |
Fix the signature of the AbstractArray method, add a mention of isequal() and of ordering, and use imperative form.
I've abandoned the idea of changing the definition of Though, since we've spent some time reviewing the docstrings, I have pushed a commit with some minor changes, which should be uncontroversial. |
Remove any mention of ordering of the result, since for some custom array
types an order can be more natural, useful or efficient than the order of
appearance. Also fix the signature of the AbstractArray method, and add
a mention of isequal() to it. Use imperative form.