This repository was archived by the owner on May 4, 2019. It is now read-only.
Change unique() to return values in the same ordering as levels for PDAs #237
While the generic unique() method says it preserves the order of appearance,
the ordering of levels is more likely to be useful. In particular, it will
allow StatsModels to use unique() to get levels present in the data in the
user-defined order, with the first level as reference by default.
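The difference between the two orderings can be shown with a small sketch. It uses a simplified model of a pooled array, which is an assumption and not the DataArrays implementation: `refs` holds 0-based integer codes indexing into a user-chosen `levels` list, and there are no missing values. The function names are hypothetical.

```python
def unique_by_appearance(refs, levels):
    """Distinct values in order of first appearance (the generic unique() contract)."""
    seen = set()
    out = []
    for r in refs:
        if r not in seen:
            seen.add(r)
            out.append(levels[r])
    return out


def unique_by_levels(refs, levels):
    """Distinct values ordered as in `levels`, keeping only levels present in the data."""
    present = set(refs)
    return [lev for i, lev in enumerate(levels) if i in present]


# Hypothetical pooled column: the user chose the level order, and "very high" is unused.
levels = ["low", "medium", "high", "very high"]
refs = [2, 0, 2, 1, 0]  # high, low, high, medium, low

print(unique_by_appearance(refs, levels))  # ['high', 'low', 'medium']
print(unique_by_levels(refs, levels))      # ['low', 'medium', 'high']
```

With level ordering, the first element is the first user-defined level present in the data. A consumer such as StatsModels could then treat that element as the default reference level.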
The new code (inspired by CategoricalArrays) is also more efficient in the
common case where all levels are encountered well before the end of the array:
it periodically checks whether every level has been seen and, if so, stops scanning.
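The early-exit idea can be sketched as follows. This uses the same simplified model as a sketch of pooled storage: 0-based integer codes in `refs` and a `levels` list, with no missing values. It is an illustration, not the code from this PR. The `check_every` parameter and the function names are hypothetical. The inner loop only marks codes as seen, without per-element bookkeeping. The more expensive "have all levels been seen?" test runs once every `check_every` elements.

```python
def unique_codes_by_levels(refs, nlevels, check_every=1000):
    """Return the distinct codes in `refs`, in level order, scanning as little as possible."""
    seen = [False] * nlevels
    for i, r in enumerate(refs, start=1):
        # Cheap per-element work: just mark this code as present.
        seen[r] = True
        # Periodic short-circuit: once every level has been found, the rest of
        # the array cannot contribute anything new, so stop scanning.
        if i % check_every == 0 and all(seen):
            break
    # Walking `seen` by index yields codes in level order, not appearance order.
    return [code for code in range(nlevels) if seen[code]]


def unique_by_levels_fast(refs, levels, check_every=1000):
    """Map the present codes back to their level values."""
    return [levels[c] for c in unique_codes_by_levels(refs, len(levels), check_every)]


levels = ["low", "medium", "high"]
# All three levels appear in the first few elements, so the scan stops at
# element 1000 instead of visiting all 1,000,003 entries.
refs = [2, 0, 1] + [0] * 1_000_000
print(unique_by_levels_fast(refs, levels))  # ['low', 'medium', 'high']
```

The check is periodic rather than per-element because `all(seen)` costs O(number of levels). Amortizing it over `check_every` elements keeps the common path cheap. Choosing `check_every` is a trade-off between how soon the scan can stop and how often the check is paid for.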
See JuliaStats/StatsModels.jl#13 (comment). The current behavior was chosen after discussion at #92, but it looks like the issues there were mainly about `DataArray`, not `PooledDataArray`. It's slightly annoying to deviate from the standard behavior of `unique`, but I can't think of cases where the order of appearance would be more useful than the ordering of levels, which should be carefully chosen (otherwise using a PDA doesn't make much sense). The other solution is to provide a separate function for this, but that sounds like overkill.