
Commit 1fec7ef

Update collapse cheat sheet to v2.0.3.
1 parent fd98cd8 commit 1fec7ef


4 files changed: +47 -34 lines changed


collapse.pdf

-6.84 KB
Binary file not shown.

latex/collapse/collapse_cheat_sheet.Rnw

+47 -34
@@ -113,7 +113,7 @@ iris2 <- copyv(iris, NA, NA)
 
 {
 {\fontsize{22}{30}\selectfont \textcolor{Gray}{Advanced and Fast Data Transformation with \emph{collapse}}}{\Huge\ \textcolor{darkgray}{: : CHEAT SHEET}} %\\%\small{by Sebastian Krantz} %
-\vspace{2mm}
+% \vspace{2mm}
 }
 
 %\begin{adjustbox}{totalheight=0.5\textheight} % -2\baselineskip
@@ -128,9 +128,9 @@ iris2 <- copyv(iris, NA, NA)
 %\colorbox{gray}{
 \textbf{\emph{collapse}} is a C/C++ based package supporting advanced (grouped, weighted, time series, panel data and recursive) statistical operations in R, with very efficient low-level vectorizations across both groups and columns. \\ [0.8em]
 
-It also offers a flexible, class-agnostic, approach to data transformation in R: handling matrix and data frame based objects in a uniform, attribute preserving, way, and ensuring seamless compatibility with \emph{dplyr} / (grouped) \emph{tibble}, \emph{data.table}, \emph{xts}, \emph{sf} and \emph{plm} classes for panel data ('pseries', 'pdata.frame'). \\ [0.8em]
+It also offers a flexible, class-agnostic approach to data transformation in R, handling matrix- and data frame-based objects in a uniform, attribute-preserving way and ensuring seamless compatibility with base R, \emph{dplyr} / (grouped) \emph{tibble}, \emph{data.table}, \emph{xts/zoo}, \emph{sf}, and \emph{plm} classes for panel data. \\ [0.8em]
 
-\emph{collapse} provides full control to the user for statistical programming - with several ways to reach the same outcome and rich optimization possibilities. Its default is \code{na.rm = TRUE}, and implemented at very low cost at the algorithm level. \\ [0.8em]
+\emph{collapse} provides full control to the user for statistical programming, with several ways to reach the same outcome and rich optimization possibilities. It is globally configurable using \code{set\_collapse()}, which controls algorithm defaults, multithreading, and the exported namespace (see below). \\ [0.8em]
 
 Calling \code{help("collapse-documentation")} brings up detailed documentation, which is also available \href{https://sebkrantz.github.io/collapse/reference/index.html}{online}. See also the \href{https://fastverse.github.io/fastverse/}{\emph{fastverse}} package/project for a recommended set of complementary packages and easy package management.
 %}
@@ -195,7 +195,7 @@ Sweeping out Statistics (by Reference)}
 \itxt{Fast functions to perform column–wise grouped and weighted computations on matrix-like objects}
 \newline
 
-\quad \code{fmean, fmedian, fmode, fsum, fprod, fsd, fvar} \\
+\quad \code{fmean, fmedian, fmode, fsum, fprod, fsd, fvar,} \\
 \quad \code{fmin, fmax, fnth, ffirst, flast, fnobs, fndistinct} \newline
 
 \textbf{Syntax} \newline
@@ -221,25 +221,32 @@ Sweeping out Statistics (by Reference)}
 fmean(AirPassengers) # Vector
 fmean(AirPassengers, w = cycle(AirPassengers)) # Weighted mean
 fmean(EuStockMarkets) # Matrix
-fmean(EuStockMarkets, drop = FALSE) # Don't drop dimensions
-fmean(airquality) # Data Frame (can also use drop = FALSE)
+fmean(airquality) # Data Frame (use drop = FALSE to keep frame)
 fmean(iris[1:4], g = iris$Species) # Grouped
 X = iris[1:4]; g = iris$Species; w <- abs(rnorm(nrow(X)))
 fmean(X, g, w) # Grouped and weighted (random weights)
 ## Transformations: here centering data on the weighted group median
-TRA(X, fmedian(X, g, w), "-", g) |> head(3)
-fmedian(X, g, w, TRA = "-") |> head(3) # Same thing: more compact
+TRA(X, fmedian(X, g, w), "-", g) |> head(2)
+fmedian(X, g, w, TRA = "-") |> head(2) # Same thing: more compact
 fmedian(X, g, w, "-", set = TRUE) # Modify in-place (same as setTRA())
-head(iris, 3) # Changed iris too, as X = iris[1:4] did a shallow copy
 @
-
 % \begin{addmargin}[2em]{0em}
 % \code{fmean(data[3:5], data\$grp1, data\$weights)\\
 % data \%>\% fgroup\_by(grp1) \%>\% fmean(weights)\\
 % TRA(mat, fmedian(mat, g), "-", g)\\
 % fmedian(mat, g, TRA = "-") \# same thing
 % }
 % \end{addmargin}
+\vspace{-1mm}
+\hrrule
+\section{Other Statistical Functions}
+% \vspace{1mm}
+\itxt{Fast (weighted) sample quantiles, range, and distances}\\
+\setstretch{1.5}
+\code{fquantile(x, probs, w, o, na.rm = TRUE, type = 7)}\\
+\code{frange(x, na.rm = TRUE)} \\
+\code{fdist(x, v, method = "euclidean", nthreads = 1)}\\
+\setstretch{1}
 
 <<echo=FALSE, include=FALSE>>=
 iris <- iris2
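The entries added under "Other Statistical Functions" (fquantile(), frange(), fdist()) can be exercised with a minimal R sketch. It assumes collapse 2.0.x is installed, uses made-up toy data, and relies only on the defaults listed in the hunk above (na.rm = TRUE, type = 7, method = "euclidean").

```r
library(collapse)

# Toy vector with one missing value (dropped because na.rm = TRUE by default)
x <- c(5, 1, 4, 2, 3, NA)

# Type-7 sample quantiles, the same default as stats::quantile()
q <- fquantile(x, probs = c(0.25, 0.5, 0.75))

# Minimum and maximum in a single pass
r <- frange(x)

# Euclidean distance between the rows (0, 0) and (3, 4) of a 2 x 2 matrix
m <- rbind(c(0, 0), c(3, 4))
d <- fdist(m)  # a 'dist' object, comparable to stats::dist(m)

print(q); print(r); print(d)
```

With the NA removed, the remaining values are 1 to 5, so the quartiles are 2, 3 and 4 and the range is 1 to 5.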
@@ -260,7 +267,7 @@ iris <- iris2
 
 \section{Grouping and Ordering}
 % \vspace{1mm}
-\itxt{Optimized functions for grouping, ordering, unique values, splitting \& recombining, and dealing with factors}
+\itxt{Optimized functions for grouping, ordering, unique values, matching, splitting, and dealing with factors}
 \newline
 
 \code{GRP()} - create a grouping object (class 'GRP'): pass to \code{g} arg. %\newline
@@ -275,11 +282,11 @@ fndistinct(iris[1:4], g) # Computation without grouping overhead
 mtcars |> fgroup_by(cyl, vs, am) |> ss(1:2)
 # Group Stats: [N. groups | mean (sd) min-max of group sizes]
 # Fast Functions also have a grouped_df method: here wt-weighted medians
-mtcars |> fgroup_by(cyl, vs, am) |> fmedian(wt) |> head(3)
+mtcars |> fgroup_by(cyl, vs, am) |> fmedian(wt) |> head(2)
 @
 %\qquad {\scriptsize \textcolor{darkgray}{\emph{Group Stats:} N. groups $|$ Mean (Std. Dev.) Min-Max of group sizes}} \newline
 
-\code{GRPN(), fgroup\_vars(), fungroup()} - get group count,\\ \qquad grouping columns/variables, and ungroup data\\ [0.5em]
+\code{GRPN(), fcount[v](), fgroup\_vars(), fungroup()} - get group count, grouping columns, and ungroup data\\ [0.5em]
 \code{qF(), qG()} - quick \code{as.factor}, and vector grouping object\\ \qquad of class 'qG': a factor-light without levels attribute\\
 \setstretch{1.5}
 \code{group()} - (multivariate) group id ('qG') in appearance order\\
@@ -289,7 +296,8 @@ mtcars |> fgroup_by(cyl, vs, am) |> fmedian(wt) |> head(3)
 \code{radixorder[v]()} - (multivariate) radix-based ordering\\
 \code{finteraction()} - fast factor interactions (or return 'qG')\\
 \code{fdroplevels()} - fast removal of unused factor levels\\
-\code{f[n]unique()} - fast unique values / rows (by columns)\\
+\code{f[n]unique(), fduplicated()} - fast unique values / rows\\
+\code{fmatch(), \%[!][i]in\%} - fast matching of values / rows\\
 \code{gsplit()} - fast splitting vector based on 'GRP' objects\\
 \code{greorder()} - efficiently reorder \code{y = unlist(gsplit(x, g))}\\ \qquad such that \code{identical(greorder(y, g), x)}
 \setstretch{1}
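The new matching and counting helpers listed above (fmatch(), the %[!][i]in% operators, fduplicated() and fcount()) can be tried with a short R sketch. It assumes collapse 2.0.x, treats %iin% as an index-returning variant of %in% (behaving like which(x %in% table)), and uses made-up toy vectors plus the built-in mtcars data.

```r
library(collapse)

# Made-up lookup data
x     <- c("b", "z", "a", "b")
table <- c("a", "b", "c")

# Position of each element of x in table, NA where there is no match (like base::match())
pos <- fmatch(x, table)

# Index-returning membership tests, like which(x %in% table) and its complement
in_idx  <- x %iin% table
out_idx <- x %!iin% table

# Flag repeated values: first occurrences are FALSE (like base::duplicated())
dup <- fduplicated(x)

# One row per 'cyl' value with the group size in column 'N'
cnt <- fcount(mtcars, cyl)

print(pos); print(in_idx); print(out_idx); print(dup); print(cnt)
```

Returning indices rather than a logical vector lets %iin% feed directly into subsetting without an extra which() call.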
@@ -303,11 +311,11 @@ f <- qF(v, na.exclude = FALSE) # Adds 'na.included' class: no NA checks
 gv <- group(v) # 'qG' object: first appearance order, with 'na.included'
 microbenchmark(fmode(X, v), fmode(X, f), fmode(X, gv), fmode(X, g))
 @
-% \vspace{-2mm}
+\vspace{-1mm}
 
 \hrrule
+\vspace{-2mm}
 \section{Quick Conversions}
-% \vspace{1mm}
 \itxt{Fast and exact conversion of common data objects} \\ [0.5em]
 \code{qM(), qDF(), qDT(), qTBL()} - convert vectors, arrays, data.frames or lists to matrix, data.frame, data.table or tibble\\ [0.5em]
 \code{m[r|c]tl()} - matrix rows/cols to list, data.frame or data.table\\ [0.5em]
@@ -333,7 +341,9 @@ microbenchmark(fmode(X, v), fmode(X, f), fmode(X, gv), fmode(X, g))
 \code{get\_vars[<-]()} - select/replace columns (standard eval.)\\ [0.5em]
 \setstretch{1}
 \code{[num|cat|char|fact|logi|date]\_vars[<-]()} - select/\\ \qquad replace columns by data type or retrieve names/indices\\ [0.5em]
-\code{add\_vars[<-]()} - add or column-bind columns \newline
+\code{add\_vars[<-]()} - add or column-bind columns\\
+\code{rowbind()} - row-bind lists / data frame-like objects\\ [0.5em]
+\code{join(), pivot()} - join and reshape data frame-like objects \newline
 
 \textbf{Examples}
 <<>>=
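The new rowbind(), join() and pivot() entries can be illustrated with a minimal R sketch. It assumes collapse 2.0.x and two made-up data frames sharing an integer key; verbose = 0 is passed only to silence the join summary that join() prints by default.

```r
library(collapse)

# Two small made-up tables sharing an integer key 'id'
a <- data.frame(id = 1:3, x = c(10, 20, 30))
b <- data.frame(id = 2:4, y = c("b", "c", "d"))

# Row-bind: stack a on top of itself (6 rows, same columns)
stacked <- rowbind(a, a)

# Left join on 'id': keeps every row of a, y is NA where b has no match
joined <- join(a, b, on = "id", how = "left", verbose = 0)

# Reshape a to long format: one row per (id, variable) pair
long <- pivot(a, ids = "id", how = "longer")

print(stacked); print(joined); print(long)
```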
@@ -353,16 +363,15 @@ mtcars %>% ftransform(fselect(., hp:qsec) %>% fmedian(cyl, TRA = 1) %>%
 fsum(TRA = "/", set = TRUE)) %>% i()
 # Aggregation: weighted standard deviations
 mtcars |> fgroup_by(vs) |> fsummarise(across(disp:drat, fsd, w = wt))
-# Grouped linear models: .apply = FALSE applies functions to DF subset
-qTBL(mtcars) |> fgroup_by(vs) |> fsummarise(across(disp:drat,
-function(x) list(models = list(lm(disp ~., x))), .apply = FALSE))
+# Grouped linear models (one way of doing it)
+qTBL(mtcars) |> fgroup_by(vs) |> fsummarise(reg = list(lm(mpg ~ carb)))
 # Adding some columns. Use ftransform<- to also replace existing ones
 add_vars(iris) <- num_vars(iris) |> fsum(TRA = '%') |> add_stub("perc_")
 @
 <<echo=FALSE, include=FALSE>>=
 iris <- iris2
 @
-% \vspace{-2mm}
+\vspace{-2mm}
 
 
 \hrrule
@@ -378,17 +387,21 @@ iris <- iris2
 # Population weighted mean (PCGDP, LIFEEX) & mode (country), and sum(POP)
 collap(wlddev, country + PCGDP + LIFEEX ~ income, w = ~ POP)
 @
+%\vspace{-2mm}
 \end{multicols} % \vspace{-20mm}
+%\vspace{20mm}
 
 % \end{adjustbox}
 %}
 % \hrrule
 \vspace{-5mm}
 \textcolor{lightgray}{\hrulefill}\\
 {\scriptsize \vspace{-0.5mm}
-Page 1 of 2 \hfill \href{https://creativecommons.org/licenses/by-sa/4.0/}{CC-BY-SA}\ Sebastian Krantz\ \textbullet\ Learn more at \href{https://sebkrantz.github.io/collapse/}{sebkrantz.github.io/collapse}\ \textbullet\ Source code at \href{https://github.com/SebKrantz/collapse}{github.com/SebKrantz/collapse}\ \textbullet\ Updates announced at \href{https://twitter.com/collapse\_R}{twitter.com/collapse\_R} - \#rcollapse\ \textbullet\ Cheatsheet created for \emph{collapse} version 1.8.8\ \textbullet\ Updated: 2022-08
+Page 1 of 2 \hfill \href{https://creativecommons.org/licenses/by-sa/4.0/}{CC-BY-SA}\ Sebastian Krantz\ \textbullet\ Learn more at \href{https://sebkrantz.github.io/collapse/}{sebkrantz.github.io/collapse}\ \textbullet\ Source code at \href{https://github.com/SebKrantz/collapse}{github.com/SebKrantz/collapse}\ \textbullet\ Updates announced at \href{https://twitter.com/collapse\_R}{twitter.com/collapse\_R} - \#rcollapse\ \textbullet\ Cheatsheet created for \emph{collapse} version 2.0.3\ \textbullet\ Updated: 2023-10
 }
 
+
+
 \newpage
 
 % ------------------------------------------------------------------
@@ -600,7 +613,7 @@ nest_coef |> unlist2d(c("vs", "am"), row.names = "variable") |> head(2)
 \section{(Memory) Efficient Programming}
 \itxt{Functions for (memory) efficient R programming}\\ [0.5em]
 
-\code{any|all[v|NA]}, \code{which[v|NA]}, \code{\%[=|!]=\%}, \code{copyv}, \code{setv}, \code{alloc} \code{missing\_cases}, \code{na\_[insert|rm|omit]}, \code{vlengths}, \code{vtypes}, \code{vgcd}, \code{frange}, \code{fnlevels}, \code{fn[row|col]}, \code{fdim}, \code{seq\_[row|col]}\\
+\code{any|all[v|NA]}, \code{which[v|NA]}, \code{\%[=|!]=\%}, \code{copyv}, \code{setv}, \code{alloc}, \code{missing\_cases}, \code{na\_[insert|rm|omit]}, \code{vlengths}, \code{vtypes}, \code{vgcd}, \code{fnlevels}, \code{fn[row|col]}, \code{fdim}, \code{seq\_[row|col]}, \code{vec}\\
 <<eval = FALSE>>=
 fsubset(wlddev, year %==% 2010) # 2x faster than fsubset(wlddev, year == 2010)
 attach(mtcars) # Efficient sub-assignment by reference, various options...
@@ -620,7 +633,7 @@ setv(am, 0, vs); setv(am, 1:10, vs); setv(am, 1:10, vs[10:20])
 \hrrule
 \section{Small (Helper) Functions}
 \itxt{Functions for (meta-)programming and attributes}\\ [0.5em]
-\code{.c}, \code{massign}, \code{\%=\%}, \code{vlabels[<-]}, \code{setLabels}, \code{vclasses}, \code{namlab}, \code{[add|rm]\_stub}, \code{\%!in\%}, \code{ckmatch}, \code{all\_identical}, \code{all\_obj\_equal}, \code{all\_funs}, \code{set[Dim|Row|Col]names}, \code{unattrib}, \code{setAttrib}, \code{copyAttrib}, \code{copyMostAttrib} %, \code{is\_categorical}, \code{is\_date}
+\code{.c}, \code{massign}, \code{\%=\%}, \code{vlabels[<-]}, \code{setLabels}, \code{vclasses}, \code{namlab}, \code{[add|rm]\_stub}, \code{all\_identical}, \code{all\_obj\_equal}, \code{all\_funs}, \code{set[Dim|Row|Col]names}, \code{unattrib}, \code{setAttrib}, \code{copyAttrib}, \code{copyMostAttrib}, \code{is\_categorical}, \code{is\_date}
 
 <<include=FALSE, echo=FALSE>>=
 wlddev <- wlddev2
@@ -635,23 +648,23 @@ namlab(wlddev[c(2, 9)], N = TRUE, Ndist = TRUE, class = TRUE)
 
 
 \hrrule
-\section{API Extensions}
+\section{API Extensions and Global Options}
 \itxt{Shorthands for frequently used functions}\\ [0.5em]
 \code{fselect -> slt, fsubset -> sbt, fmutate -> mtt, [f/set]transform[v] -> [set]tfm[v], fsummarise -> smr,
 across -> acr, fgroup\_by -> gby, finteraction -> itn, findex\_by -> iby, findex -> ix, frename -> rnm, get\_vars -> gv, num\_vars -> nv,
 add\_vars -> av} \newline
 
-\itxt{Namespace masking}\\ [0.5em]
-Can set \code{option(collpse\_mask = c(...))} with a vector of functions starting with f-, to export versions without f-, masking base R or \emph{dplyr}. A few keywords exist to mask multiple functions, see \code{help("collapse-options")}. This allows clean \& fast code, but poses additional namespace challenges:
+\itxt{Namespace masking and other global options}\\ [0.5em]
+Use \code{set\_collapse(mask = c(...))} with a vector of functions starting with f- to export versions without the f- prefix, masking base R and/or \emph{dplyr} functions. A few keywords exist to mask multiple functions; see \code{help("collapse-options")}. Many other global defaults and optimizations can also be controlled with \code{set\_collapse(...)}, and the current options are retrieved with \code{get\_collapse()}.
 
 <<eval = FALSE>>=
-# Masking all f- functions and specials n = GRPN and table = qtab
-options(collapse_mask = "all")
+# Masking all (f-)functions and changing some defaults (=optimizing)
 library(collapse)
-# The folowing is 100% collapse code, apart from the base pipe
-
+set_collapse(mask = "all", na.rm = FALSE, sort = FALSE, nthreads = 4)
+# The following is now 100% collapse code and executed without regard for
+# missing values, using unsorted grouping and 4 threads (where applicable)
 wlddev |>
-subset(year >= 1990) |>
+subset(year >= 1990 & is.finite(GINI)) |>
 group_by(year) |>
 summarise(n = n(), across(PCGDP:GINI, mean, w = POP))
 
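Because global settings now go through set_collapse(), they can also be changed temporarily and restored. A minimal R sketch, assuming collapse 2.0.x and that set_collapse() returns the previous values of the changed options invisibly; mask is deliberately left alone here so the exported namespace is not altered.

```r
library(collapse)

# Change two global defaults; the previous settings are returned invisibly
old <- set_collapse(na.rm = FALSE, sort = FALSE)

# With na.rm = FALSE, the missing value propagates to the result
m1 <- fmean(c(1, NA, 3))

# Inspect the current value of a single option
print(get_collapse("na.rm"))

# Restore the saved settings: NA is skipped again, so the mean of 1 and 3 is returned
set_collapse(old)
m2 <- fmean(c(1, NA, 3))
print(c(m1, m2))
```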
@@ -671,14 +684,14 @@ wlddev |>
 PCGDP_growth = growth(PCGDP)) |> unindex()
 
 @
-The best way to set this option is inside an \code{.Rprofile} file placed in the user or project directory. Use it carefully.
+
 
 \end{multicols}
 
 \vspace{-5.5mm}
 \textcolor{lightgray}{\hrulefill}\\
 {\scriptsize \vspace{-0.5mm}
-Page 2 of 2 \hfill \href{https://creativecommons.org/licenses/by-sa/4.0/}{CC-BY-SA}\ Sebastian Krantz\ \textbullet\ Learn more at \href{https://sebkrantz.github.io/collapse/}{sebkrantz.github.io/collapse}\ \textbullet\ Source code at \href{https://github.com/SebKrantz/collapse}{github.com/SebKrantz/collapse}\ \textbullet\ Updates announced at \href{https://twitter.com/collapse\_R}{twitter.com/collapse\_R} - \#rcollapse\ \textbullet\ Cheatsheet created for \emph{collapse} version 1.8.8\ \textbullet\ Updated: 2022-08
+Page 2 of 2 \hfill \href{https://creativecommons.org/licenses/by-sa/4.0/}{CC-BY-SA}\ Sebastian Krantz\ \textbullet\ Learn more at \href{https://sebkrantz.github.io/collapse/}{sebkrantz.github.io/collapse}\ \textbullet\ Source code at \href{https://github.com/SebKrantz/collapse}{github.com/SebKrantz/collapse}\ \textbullet\ Updates announced at \href{https://twitter.com/collapse\_R}{twitter.com/collapse\_R} - \#rcollapse\ \textbullet\ Cheatsheet created for \emph{collapse} version 2.0.3\ \textbullet\ Updated: 2023-10
 }
 
 \end{document}
-6.84 KB
Binary file not shown.

pngs/collapse.png

1.13 MB
