RFC: Preparation for Julia 0.6 #1164
Too bad I just made the same changes in DataTables (JuliaData/DataTables.jl#9, JuliaData/DataTables.jl#8; FWIW the latter has a few more fixes). Maintaining the two repos in parallel leads to some waste of time... One issue is whether StatsModels is going to be able to support both DataFrames and DataTables at the same time. We likely need the three of them to use a common AbstractTable interface for that. Until then, I'd rather keep the modeling functions in DataFrames. Is the Juno dep a problem on 0.6?
src/subdataframe/subdataframe.jl
Outdated
@@ -51,7 +51,7 @@ immutable SubDataFrame{T <: AbstractVector{Int}} <: AbstractDataFrame
 parent::DataFrame
 rows::T # maps from subdf row indexes to parent row indexes

-function SubDataFrame(parent::DataFrame, rows::T)
+@compat function SubDataFrame{T}(parent::DataFrame, rows::T) where T
Is T really needed here?
Nevermind, I just found the notes in https://github.com/JuliaLang/julia/pull/20308/files.
Yet, shouldn't T<:AbstractVector{Int}? Not sure what would happen without it, probably an error a bit later. In his PR Jeff added these (duplicated) constraints.
Not sure. I just followed the deprecation warning. Shouldn't harm to make that restriction though.
I agree. That is why I've been asking all the questions about
I concluded that it wasn't really feasible because
Yes. If you want to use the REPL 😄 (see my last comment). Juno is not sufficiently tested to capture these bugs so it is up to us to detect that
That's cool, but we still need to define the functions in a separate interface package so that StatsModels can use them without DataFrames and DataTables conflicting. AbstractTables is the most logical candidate, but I'm not sure how stable its current API is. I'm not really in favor of keeping the Juno code, I just wanted to know whether it was one of the reasons for the test failures on 0.6.
test/join.jl
Outdated
-left = outer[!isna(outer[:Name]), :]
-inner = left[!isna(left[:Job]), :]
+right = outer[.!isna(outer[:Job]), [:Name, :ID, :Job]]
+left = outer[.!isna(outer[:Name]), :]
Will that work on 0.5? I would try with (!).(...).
No, it won't. (!).(...) is indeed necessary. Just realized that while fixing another package.
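A minimal sketch of the negation form settled on above, assuming a made-up boolean vector `mask` in place of the result of `isna(...)`. Per the comments above, `.!x` is not accepted on 0.5, whereas `(!).(x)` treats `!` as an ordinary function and applies it with the dot-call syntax:

```julia
# Hypothetical mask standing in for isna(outer[:Name]); not taken from the test suite.
mask = [true, false, true]

# Apply `!` elementwise through the f.(x) dot-call syntax.
negated = (!).(mask)
```

The result can then be used as a row index in the same way as the `.!isna(...)` expressions in the diff.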
Updated. I've dropped 0.4 support, deleted the statsmodels and formula code but kept the contrasts code. However, the contrasts tests rely on
src/subdataframe/subdataframe.jl
Outdated
@@ -51,14 +51,14 @@ immutable SubDataFrame{T <: AbstractVector{Int}} <: AbstractDataFrame
 parent::DataFrame
 rows::T # maps from subdf row indexes to parent row indexes

-@compat function SubDataFrame{T}(parent::DataFrame, rows::T) where T
+function (::Type{SubDataFrame{T}}){T}(parent::DataFrame, rows::T)
Is this the new recommended syntax for inner constructors? I'm still confused about this, and the manual doesn't help.
I'm not sure it is recommended, but it works without warnings across 0.5 and 0.6, so I think it is the right solution until we drop 0.5 support.
 ```

 """
-complete_cases!(df::AbstractDataFrame) = deleterows!(df, find(!complete_cases(df)))
+completecases!(df::AbstractDataFrame) = deleterows!(df, find(!completecases(df)))
Better do find(!, completecases(df)) as in DataTables to avoid allocating a copy.
src/abstractdataframe/io.jl
Outdated
@@ -181,7 +181,7 @@ end
 write(io, "</tr>")
 write(io, "</thead>")
 write(io, "<tbody>")
-tty_rows, tty_cols = _displaysize(io)
+tty_rows, tty_cols = Base.displaysize(io)
No need for the Base. prefix here and below.
@@ -292,7 +292,6 @@ function contrasts_matrix(C::HelmertCoding, baseind, n)
mat = mat[[baseind; 1:(baseind-1); (baseind+1):end], :]
return mat
end

?
Removes trailing whitespace. Note that the diff shows that this line had the whitespace removed in favor of an empty line below.
src/abstractdataframe/show.jl
Outdated
end

@render Inline df::AbstractDataFrame _render(df)
# using Juno
Just remove this altogether, that's easy to find in the git history if needed.
I'd remove all of the code related to modeling, or keep all of it. I don't like commenting out code. Other than that, looks good to me.
test/runtests.jl
Outdated
-"statsmodel.jl",
-"contrasts.jl"]
+"show.jl"#,
+# "contrasts.jl"
Remove this line and I would say it's good to go.
(And remove the corresponding file...)
Until StatsModels works flawlessly with DataFrames, I think we should just copy the definition of
The right solution might then be to make an update for 0.5 before these fixes where we deprecate the use of
That sounds good to me. We should start an issue to track everything that's needed to get StatsModels working generically, or at least across DataFrames and DataTables.
src/deprecated.jl
Outdated
@deprecate DataArray(df::AbstractDataFrame, T::DataType) convert(DataArray{T}, df)

@deprecate read_rda(args...) FileIO.load(args...)

@deprecate complete_cases(df) complete_cases(df)
@deprecate complete_cases(df) completecases(df)
src/deprecated.jl
Outdated
@deprecate DataArray(df::AbstractDataFrame, T::DataType) convert(DataArray{T}, df)

@deprecate read_rda(args...) FileIO.load(args...)

@deprecate complete_cases(df) complete_cases(df)
@deprecate complete_cases!(df) complete_cases!(df)
@deprecate complete_cases!(df) completecases!(df)
given JuliaData/DataTables.jl#6 (comment) and linked issues could this be changed to deprecate complete_cases!(df) to dropna!(df)?
would need dropna too I guess JuliaData/DataTables.jl#6 (comment)
Heh, whoops. Missed that. Thanks!
Bump, what is the status here? Would be great if we had a tagged version of DataFrames that works on julia 0.6. I'm trying to move ExcelReaders (and then Query) to 0.6, but my CI builds won't pass until there is a tagged version of DataFrames that works on julia 0.6.
DataArrays master works on 0.6 so once that's tagged, I think that version of DataArrays + this PR = a complete working DataFrames on 0.6.
Though model formula support still isn't there AFAIK. I guess we could still tag a beta version without it to fix packages which don't need formulas (most of them).
I've been traveling and busy but I hope to have time to work a little on this again next week.
I think it would be great if there was a version tagged with a REQUIRE line
I've dealt with the formulas situation temporarily in #1170 by copying over the requisite definitions from StatsModels. That way we have something here that will work in the interim while StatsModels is made to work with DataFrames.
head and tail from DataArrays since they are no longer defined there.
Disable Formula tests
Comment out statmodels tests
Remove statsmodels and formula
The plan is to remove the formula and modeling functionality, and hopefully StatsModels will be able to handle that part. For now, I've commented out the formula and modeling related tests. To make this easier, I've loosened the requirement on the Vector input to the DataFrame constructors such that the element type doesn't have to be Any. With that change, the ModelFrame method in StatsModels works.
There are also various routine syntax updates that should be uncontroversial.
Finally, I plan to remove the dependency on Juno. It was controversial when added and has caused problems ever since. E.g., right now Juno breaks the REPL and this is not captured by Juno's tests. Getting nice printing of DataFrames in Juno must be handled elsewhere.