Commit a4c0ea1

Merge pull request #282 from queryverse/naoperators
Add @dropna, @replacena and @dissallowna
2 parents 198c050 + 642d1e9 commit a4c0ea1

7 files changed: +244 -6 lines changed


.github/workflows/jlpkgbutler-ci-master-workflow.yml

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,7 @@ jobs:
     runs-on: ${{ matrix.os }}
     strategy:
       matrix:
-        julia-version: [1.0.5, 1.1.1, 1.2.0, 1.3.0]
+        julia-version: [1.3.0]
         julia-arch: [x64, x86]
         os: [ubuntu-latest, windows-latest, macOS-latest]
         exclude:

.github/workflows/jlpkgbutler-ci-pr-workflow.yml

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@ jobs:
     runs-on: ${{ matrix.os }}
     strategy:
       matrix:
-        julia-version: [1.0.5, 1.1.1, 1.2.0, 1.3.0]
+        julia-version: [1.3.0]
         julia-arch: [x64, x86]
         os: [ubuntu-latest, windows-latest, macOS-latest]
         exclude:

Project.toml

Lines changed: 3 additions & 3 deletions
@@ -16,10 +16,10 @@ DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
 
 [compat]
 IterableTables = "0.8.2, 0.9, 0.10, 0.11, 1"
-julia = "1"
+julia = "1.3"
 QueryOperators = "0.9.1"
-DataValues = "0.4.4"
-MacroTools = "0.4.4"
+DataValues = "0.4.4"
+MacroTools = "0.4.4, 0.5"
 
 [targets]
 test = ["Statistics", "Test", "DataFrames"]

docs/src/standalonequerycommands.md

Lines changed: 165 additions & 0 deletions
@@ -365,3 +365,168 @@ println(q)
 │ 2 │ Banana │ 6 │ 10.0 │ false │
 │ 3 │ Cherry │ 1000 │ 1000.8 │ false │
 ```
+
+## The `@dropna` command
+
+The `@dropna` command has the form `source |> @dropna(columns...)`. `source` can be any source that can be queried and that has a table structure. If `@dropna()` is called without any arguments, it will drop any row from `source` that has a missing `NA` value in _any_ of its columns. Alternatively, one can pass a list of column names to `@dropna`, in which case it will only drop rows that have an `NA` value in one of those columns.
+
+Our first example uses the simple version of `@dropna()` that drops rows that have a missing value in any column:
+
+```jldoctest
+using Query, DataFrames
+
+df = DataFrame(a=[1,2,3], b=[4,missing,5])
+
+q = df |> @dropna() |> DataFrame
+
+println(q)
+
+# output
+
+2×2 DataFrame
+│ Row │ a     │ b     │
+│     │ Int64 │ Int64 │
+├─────┼───────┼───────┤
+│ 1   │ 1     │ 4     │
+│ 2   │ 3     │ 5     │
+```
+
+The next example only drops rows that have a missing value in the `b` column:
+
+```jldoctest
+using Query, DataFrames
+
+df = DataFrame(a=[1,2,3], b=[4,missing,5])
+
+q = df |> @dropna(:b) |> DataFrame
+
+println(q)
+
+# output
+
+2×2 DataFrame
+│ Row │ a     │ b     │
+│     │ Int64 │ Int64 │
+├─────┼───────┼───────┤
+│ 1   │ 1     │ 4     │
+│ 2   │ 3     │ 5     │
+```
+
+We can specify as many columns as we want:
+
+```jldoctest
+using Query, DataFrames
+
+df = DataFrame(a=[1,2,3], b=[4,missing,5])
+
+q = df |> @dropna(:b, :a) |> DataFrame
+
+println(q)
+
+# output
+
+2×2 DataFrame
+│ Row │ a     │ b     │
+│     │ Int64 │ Int64 │
+├─────┼───────┼───────┤
+│ 1   │ 1     │ 4     │
+│ 2   │ 3     │ 5     │
+```
+
+## The `@dissallowna` command
+
+The `@dissallowna` command has the form `source |> @dissallowna(columns...)`. `source` can be any source that can be queried and that has a table structure. If `@dissallowna()` is called without any arguments, it will check that there are no missing `NA` values in any column in any row of the input table and convert the element type of each column to one that cannot hold missing values. Alternatively, one can pass a list of column names to `@dissallowna`, in which case it will only check for `NA` values in those columns, and only convert those columns to a type that cannot hold missing values.
+
+Our first example uses the simple version of `@dissallowna()` that makes sure there are no missing values anywhere in the table. Note how the column type for column `a` is changed to `Int64` in this example, i.e. an element type that does not support missing values:
+
+```jldoctest
+using Query, DataFrames
+
+df = DataFrame(a=[1,missing,3], b=[4,5,6])
+
+q = df |> @filter(!isna(_.a)) |> @dissallowna() |> DataFrame
+
+println(q)
+
+# output
+
+2×2 DataFrame
+│ Row │ a     │ b     │
+│     │ Int64 │ Int64 │
+├─────┼───────┼───────┤
+│ 1   │ 1     │ 4     │
+│ 2   │ 3     │ 6     │
+```
+
+The next example only checks the `b` column for missing values:
+
+```jldoctest
+using Query, DataFrames
+
+df = DataFrame(a=[1,2,missing], b=[4,missing,5])
+
+q = df |> @filter(!isna(_.b)) |> @dissallowna(:b) |> DataFrame
+
+println(q)
+
+# output
+
+2×2 DataFrame
+│ Row │ a       │ b     │
+│     │ Int64⍰  │ Int64 │
+├─────┼─────────┼───────┤
+│ 1   │ 1       │ 4     │
+│ 2   │ missing │ 5     │
+```
+
+## The `@replacena` command
+
+The `@replacena` command has a simple and a full version.
+
+The simple form is `source |> @replacena(replacement_value)`. `source` can be any source that can be queried and that has a table structure. In this case all missing `NA` values in the source table will be replaced with `replacement_value`. Note that this version only works properly if all columns that contain missing values have the same element type.
+
+The full version has the form `source |> @replacena(replacement_specifier...)`. `source` can again be any source that can be queried and that has a table structure. Each `replacement_specifier` should be a `Pair` of the form `column_name => replacement_value`. For example, `:b => 3` means that all missing values in column `b` should be replaced with the value 3. One can specify as many `replacement_specifier`s as one wishes.
+
+The first example uses the simple form:
+
+```jldoctest
+using Query, DataFrames
+
+df = DataFrame(a=[1,missing,3], b=[4,5,6])
+
+q = df |> @replacena(0) |> DataFrame
+
+println(q)
+
+# output
+
+3×2 DataFrame
+│ Row │ a     │ b     │
+│     │ Int64 │ Int64 │
+├─────┼───────┼───────┤
+│ 1   │ 1     │ 4     │
+│ 2   │ 0     │ 5     │
+│ 3   │ 3     │ 6     │
+```
+
+The next example uses a different replacement value for columns `a` and `b`:
+
+```jldoctest
+using Query, DataFrames
+
+df = DataFrame(a=[1,2,missing], b=["One",missing,"Three"])
+
+q = df |> @replacena(:b=>"Unknown", :a=>0) |> DataFrame
+
+println(q)
+
+# output
+
+3×2 DataFrame
+│ Row │ a     │ b       │
+│     │ Int64 │ String  │
+├─────┼───────┼─────────┤
+│ 1   │ 1     │ One     │
+│ 2   │ 2     │ Unknown │
+│ 3   │ 0     │ Three   │
+```
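
The documented behaviour (`@dissallowna` fails on a missing value, `@replacena` substitutes a fallback) comes down to an unwrap-or-default step applied to every field of a row. The sketch below illustrates that idea in plain Julia, without depending on Query.jl or DataValues.jl. `ToyValue` and `toy_get` are hypothetical stand-ins invented for this illustration and are not part of either package.

```julia
# Hypothetical stand-in for a value that may be missing.
# It is NOT DataValues.DataValue; it only mimics the idea for illustration.
struct ToyValue{T}
    value::Union{T,Nothing}
end

# One-argument form: plain values pass through, wrapped values are unwrapped,
# and a missing wrapped value raises an error.
toy_get(x) = x
toy_get(x::ToyValue) = x.value === nothing ? error("value is missing") : x.value

# Two-argument form: a missing wrapped value yields the fallback instead.
toy_get(x, fallback) = x
toy_get(x::ToyValue, fallback) = x.value === nothing ? fallback : x.value

# A row is modelled as a named tuple; `map` over a named tuple keeps the field names.
complete_row = (a = ToyValue{Int}(1), b = 2.0)
partial_row = (a = ToyValue{Int}(nothing), b = 2.0)

unwrapped = map(toy_get, complete_row)            # every field unwrapped
replaced = map(v -> toy_get(v, 0), partial_row)   # missing field replaced by 0

println(unwrapped)
println(replaced)
```

Calling `map(toy_get, partial_row)` would raise an error in this sketch, which mirrors the documented requirement to filter out missing values before applying `@dissallowna`.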

src/Query.jl

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,7 @@ export @from, @query, @count, Grouping, key
 export @map, @filter, @groupby, @orderby, @orderby_descending, @unique,
     @thenby, @thenby_descending, @groupjoin, @join, @mapmany, @take, @drop
 
-export @select, @rename, @mutate
+export @select, @rename, @mutate, @dissallowna, @dropna, @replacena
 
 export isna, NA
 

src/table_query_macros.jl

Lines changed: 37 additions & 0 deletions
@@ -186,3 +186,40 @@ macro mutate(args...)
 
     return :( Query.@map( $prev ) ) |> esc
 end
+
+our_get(x) = x
+our_get(x::DataValue) = get(x)
+
+our_get(x, y) = x
+our_get(x::DataValue, y) = get(x, y)
+
+macro dissallowna()
+    return :( Query.@map(map(our_get, _)) )
+end
+
+macro dissallowna(columns...)
+    return :( Query.@mutate( $( ( :( $(columns[i].value) = our_get(_.$(columns[i].value)) ) for i=1:length(columns) )... ) ) )
+end
+
+macro dropna()
+    return :( i-> i |> Query.@filter(!any(isna, _)) |> Query.@dissallowna() )
+end
+
+macro dropna(columns...)
+    return :( i-> i |> Query.@filter(!any(($((:(isna(_.$(columns[i].value))) for i in 1:length(columns) )...),))) |> Query.@dissallowna($(columns...)) )
+end
+
+macro replacena(arg, args...)
+    if length(args)==0 && !(arg isa Expr && arg.head==:call && length(arg.args)==3 && arg.args[1]==:(=>))
+        return :( Query.@map(map(i->our_get(i, $arg), _)) )
+    else
+        args = [arg; args...]
+
+        all(i isa Expr && i.head==:call && length(i.args)==3 && i.args[1]==:(=>) for i in args) || error("Invalid syntax.")
+
+        columns = map(i->i.args[2].value, args)
+        replacement_values = map(i->i.args[3], args)
+
+        return :( Query.@mutate( $( ( :( $(columns[i]) = our_get(_.$(columns[i]), $(replacement_values[i])) ) for i=1:length(columns) )... ) ) )
+    end
+end
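
`@replacena` chooses between its simple and full form by inspecting the syntax of its first argument. The following sketch uses only Base Julia to show the expression shape that this check relies on. `is_pair_expr` is a hypothetical helper name introduced here for illustration and is not part of the commit.

```julia
# Hypothetical helper mirroring the condition used inside `@replacena`:
# a `lhs => rhs` argument reaches a macro as a call expression with three args.
is_pair_expr(ex) = ex isa Expr && ex.head == :call && length(ex.args) == 3 && ex.args[1] == :(=>)

spec = :(:b => "Unknown")   # what a macro receives for `:b => "Unknown"`
plain = :(0)                # what a macro receives for a bare literal `0`

println(is_pair_expr(spec))
println(is_pair_expr(plain))

# The quoted column name arrives as a QuoteNode, so `.value` extracts the bare symbol.
column = spec.args[2].value
replacement = spec.args[3]
println(column, " => ", replacement)
```

This is why the macro bodies read `columns[i].value` and `i.args[2].value`: the column arguments are quoted symbols at macro-expansion time, not evaluated `Symbol` values.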

test/test_macros.jl

Lines changed: 36 additions & 0 deletions
@@ -49,3 +49,39 @@ end
     closure_val = 1
     @test DataFrame(df |> @mutate(foo = closure_val)) == DataFrame(foo=[1,1,1], bar=[3.,2.,1.], bat=["a","b","c"])
 end
+
+@testset "@dropna" begin
+
+    df = DataFrame(a=[1,missing,3], b=[1.,2.,3.])
+
+    @test df |> @dropna() |> collect == [(a=1,b=1.), (a=3, b=3.)]
+    @test df |> @filter(!any(isna, _)) |> @dropna() |> collect == [(a=1,b=1.), (a=3, b=3.)]
+    @test df |> @select(:b) |> @dropna() |> collect == [(b=1.,),(b=2.,),(b=3.,)]
+
+    @test df |> @dropna(:a) |> collect == [(a=1,b=1.), (a=3, b=3.)]
+    @test df |> @dropna(:b) |> collect == [(a=DataValue(1),b=1.), (a=DataValue{Int}(),b=2.),(a=DataValue(3), b=3.)]
+    @test df |> @dropna(:a, :b) |> collect == [(a=1,b=1.), (a=3, b=3.)]
+end
+
+@testset "@replacena" begin
+
+    df = DataFrame(a=[1,missing,3], b=[1.,2.,3.])
+
+    @test df |> @replacena(2) |> collect == [(a=1,b=1.), (a=2, b=2.), (a=3, b=3.)]
+    @test df |> @dropna() |> @replacena(2) |> collect == [(a=1,b=1.), (a=3, b=3.)]
+    @test df |> @select(:b) |> @replacena(2) |> collect == [(b=1.,),(b=2.,),(b=3.,)]
+
+    @test df |> @replacena(:a=>2) |> collect == [(a=1,b=1.), (a=2, b=2.), (a=3, b=3.)]
+    @test df |> @replacena(:b=>2) |> collect == [(a=DataValue(1),b=1.), (a=DataValue{Int}(),b=2.),(a=DataValue(3), b=3.)]
+    @test df |> @replacena(:a=>2, :b=>8) |> collect == [(a=1,b=1.), (a=2, b=2.), (a=3, b=3.)]
+end
+
+@testset "@dissallowna" begin
+
+    df = DataFrame(a=[1,missing,3], b=[1.,2.,3.])
+
+    @test_throws DataValueException df |> @dissallowna() |> collect
+    @test df |> @filter(!any(isna, _)) |> @dissallowna() |> collect == [(a=1,b=1.), (a=3, b=3.)]
+    @test_throws DataValueException df |> @dissallowna(:a) |> collect
+    @test df |> @dissallowna(:b) |> collect == [(a=DataValue(1),b=1.), (a=DataValue{Int}(),b=2.),(a=DataValue(3), b=3.)]
+end
