Commit e0a7e84

first commit
1 parent a5793bc commit e0a7e84

160 files changed, +9844 -1 lines changed

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
1+
.DS_Store

README.md

Lines changed: 22 additions & 1 deletion
@@ -1,2 +1,23 @@
1-
# Developing_Data_Products
1+
# Developing Data Products
2+
23
Developing Data Products Course from the Johns Hopkins Data Science Lab
4+
5+
Course Site: http://datasciencespecialization.github.io/Developing_Data_Products
6+
7+
### Contributors
8+
9+
* Brian Caffo
10+
* Jeff Leek
11+
* Roger Peng
12+
* Sean Kross
13+
14+
### License
15+
16+
<a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.
17+
18+
### Building this Site
19+
20+
1. `cd` to this repository.
21+
2. `Rscript build.R`
22+
23+
`Rscript build.R clean` to start from scratch.

RPackages/example.R

Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
#' Building a Model with Top Ten Features
#'
#' This function develops a prediction algorithm based on the top 10 features
#' in 'x' that are most predictive of 'y'.
#'
#' @param x an n x p matrix of n observations and p predictors
#' @param y a vector of length n representing the response
#' @return a vector of coefficients from the linear model fit to the top 10 predictors
#' @author Roger Peng
#' @details
#' This function runs a univariate regression of y on each predictor in x and
#' calculates the p-value indicating the significance of the association. The
#' final set of 10 predictors is taken from the 10 smallest p-values.
#' @seealso \code{lm}
#' @import stats
#' @export

topten <- function(x, y) {
        p <- ncol(x)
        if(p < 10)
                stop("there are fewer than 10 predictors")
        pvalues <- numeric(p)
        for(i in seq_len(p)) {
                fit <- lm(y ~ x[, i])
                summ <- summary(fit)
                ## Row 2 is the predictor; column 4 is its p-value
                pvalues[i] <- summ$coefficients[2, 4]
        }
        ## Keep only the 10 predictors with the smallest p-values
        ord <- order(pvalues)
        ord <- ord[1:10]
        x10 <- x[, ord]
        fit <- lm(y ~ x10)
        coef(fit)
}

#' Prediction with Top Ten Features
#'
#' This function takes a set of coefficients produced by the \code{topten}
#' function and makes a prediction for each of the values provided in the
#' input 'X' matrix.
#'
#' @param X an n x 10 matrix containing n observations
#' @param b a vector of coefficients obtained from the \code{topten} function
#' @return a numeric vector containing the predicted values

predict10 <- function(X, b) {
        ## Prepend a column of 1s for the intercept
        X <- cbind(1, X)
        drop(X %*% b)
}
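
The screening procedure described in the roxygen comments above (a univariate regression per predictor, then a joint fit on the 10 smallest p-values) can be sketched on simulated data. This is a minimal illustration and not part of the commit: the data, the seed, and the variable names `top`, `x10`, and `pred` are assumptions made for the example.

```r
# Simulated data: 200 observations, 20 predictors (all values are made up).
set.seed(1)
n <- 200
p <- 20
x <- matrix(rnorm(n * p), nrow = n, ncol = p)
# Only columns 3 and 7 carry signal in this toy example.
y <- 3 * x[, 3] - 3 * x[, 7] + rnorm(n)

# Univariate screening: p-value of the slope from regressing y on each column.
pvalues <- vapply(seq_len(p), function(i) {
    summary(lm(y ~ x[, i]))$coefficients[2, 4]
}, numeric(1))

# Keep the 10 columns with the smallest p-values and refit them jointly.
top <- order(pvalues)[1:10]
x10 <- x[, top]
fit <- lm(y ~ x10)
b <- coef(fit)

# Predict by hand: an intercept column plus the 10 selected predictors.
pred <- drop(cbind(1, x10) %*% b)
```

The hand-computed predictions should agree with `fitted(fit)`, which is a quick way to check that the coefficient vector and the column order line up.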

RPackages/rpackages.Rmd

Lines changed: 320 additions & 0 deletions
@@ -0,0 +1,320 @@
---
title       : Building R Packages
author      : Roger D. Peng, Professor of Biostatistics
output:
  ioslides_presentation:
    logo: ../../img/bloomberg_shield.png
  beamer_presentation: default
always_allow_html: yes
---
10+
11+
## What is an R Package?
12+
13+
- A mechanism for extending the basic functionality of R
14+
- A collection of R functions, or other (data) objects
15+
- Organized in a systematic fashion to provide a minimal amount of consistency
16+
- Written by users/developers everywhere
17+
18+
19+
20+
## Where are These R Packages?
21+
22+
- Primarily available from CRAN and Bioconductor
23+
24+
- Also available from GitHub, Bitbucket, Gitorious, etc. (and elsewhere)
25+
26+
- Packages from CRAN/Bioconductor can be installed with `install.packages()`
27+
28+
- Packages from GitHub can be installed using `install_github()` from
29+
the <b>devtools</b> package
30+
31+
You do not have to put a package on a central repository, but doing so
32+
makes it easier for others to install your package.
33+
34+
35+
36+
## What's the Point?
37+
38+
- "Why not just make some code available?"
39+
- Documentation / vignettes
40+
- Centralized resources like CRAN
41+
- Minimal standards for reliability and robustness
42+
- Maintainability / extension
43+
- Interface definition / clear API
44+
- Users know that it will at least load properly
45+
46+
47+
48+
## Package Development Process
49+
50+
- Write some code in an R script file (.R)
51+
- Want to make code available to others
52+
- Incorporate R script file into R package structure
53+
- Write documentation for user functions
54+
- Include some other material (examples, demos, datasets, tutorials)
55+
- Package it up!
56+
57+
58+
59+
## Package Development Process
60+
61+
- Submit package to CRAN or Bioconductor
62+
- Push source code repository to GitHub or other source code sharing web site
63+
- People find all kinds of problems with your code
64+
- Scenario #1: They tell you about those problems and expect you to fix it
65+
- Scenario #2: They fix the problem for you and show you the changes
66+
- You incorporate the changes and release a new version
67+
68+
69+
70+
## R Package Essentials
71+
72+
- An R package is started by creating a directory with the name of the R package
73+
- A DESCRIPTION file which has info about the package
74+
- R code! (in the R/ sub-directory)
75+
- Documentation (in the man/ sub-directory)
76+
- NAMESPACE
77+
- Full requirements in Writing R Extensions
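
As a rough sketch of the layout listed above, the pieces can be created by hand in a temporary directory. The package name `demopkg`, the function `hello`, and all field values are hypothetical, and no documentation is written into `man/` here, so this would not yet pass a check.

```r
# Hypothetical package created under a temporary directory.
root <- tempfile("pkgs")
pkg_dir <- file.path(root, "demopkg")

# The package directory with its R/ and man/ sub-directories.
dir.create(file.path(pkg_dir, "R"), recursive = TRUE)
dir.create(file.path(pkg_dir, "man"))

# A minimal DESCRIPTION file (field values are placeholders).
writeLines(c(
    "Package: demopkg",
    "Title: A Hypothetical Demonstration Package",
    "Version: 0.1-0"
), file.path(pkg_dir, "DESCRIPTION"))

# A NAMESPACE file exporting one function, and the code for that function.
writeLines('export("hello")', file.path(pkg_dir, "NAMESPACE"))
writeLines('hello <- function() "hello"', file.path(pkg_dir, "R", "hello.R"))

# Show the resulting files.
list.files(pkg_dir, recursive = TRUE)
```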
78+
79+
80+
81+
## The DESCRIPTION File
82+
83+
- <b>Package</b>: Name of package (e.g. library(name))
84+
- <b>Title</b>: Full name of package
85+
- <b>Description</b>: Longer description of package in one sentence (usually)
86+
- <b>Version</b>: Version number (usually M.m-p format)
87+
- <b>Author</b>, <b>Authors@R</b>: Name of the original author(s)
88+
- <b>Maintainer</b>: Name + email of person who fixes problems
89+
- <b>License</b>: License for the source code
90+
91+
92+
93+
## The DESCRIPTION File
94+
95+
These fields are optional but commonly used
96+
97+
- <b>Depends</b>: R packages that your package depends on
98+
- <b>Suggests</b>: Optional R packages that users may want to have installed
99+
- <b>Date</b>: Release date in YYYY-MM-DD format
100+
- <b>URL</b>: Package home page
101+
- <b>Other</b> fields can be added
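
A DESCRIPTION file is plain text in "field: value" form, so one way to inspect it from R is `read.dcf()`. The sketch below writes a made-up DESCRIPTION to a temporary file and reads it back; the package name, author, and email address are placeholders, not a real package.

```r
# Write a hypothetical DESCRIPTION file to a temporary location.
desc_file <- tempfile("DESCRIPTION")
writeLines(c(
    "Package: demopkg",
    "Title: A Hypothetical Demonstration Package",
    "Description: Illustrates the fields of a DESCRIPTION file.",
    "Version: 0.1-0",
    "Author: A. N. Author",
    "Maintainer: A. N. Author <author@example.org>",
    "License: GPL-2",
    "Depends: R (>= 2.14.0), methods",
    "Suggests: splines",
    "Date: 2013-04-01"
), desc_file)

# read.dcf() returns a character matrix with one column per field.
desc <- read.dcf(desc_file)
colnames(desc)
desc[1, "Version"]

# Only selected fields can be requested as well.
read.dcf(desc_file, fields = c("Package", "Depends"))
```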
102+
103+
104+
105+
## DESCRIPTION File: `gpclib`
106+
107+
<b>Package</b>: gpclib<br />
108+
<b>Title</b>: General Polygon Clipping Library for R<br />
109+
<b>Description</b>: General polygon clipping routines for R based on Alan Murta's C library.<br />
110+
<b>Version</b>: 1.5-5<br />
111+
<b>Author</b>: Roger D. Peng <[email protected]> with contributions from Duncan Murdoch and Barry Rowlingson; GPC library by Alan Murta<br />
112+
<b>Maintainer</b>: Roger D. Peng <[email protected]><br />
113+
<b>License</b>: file LICENSE<br />
114+
<b>Depends</b>: R (>= 2.14.0), methods<br />
115+
<b>Imports</b>: graphics<br />
116+
<b>Date</b>: 2013-04-01<br />
117+
<b>URL</b>: http://www.cs.man.ac.uk/~toby/gpc/, http://github.com/rdpeng/gpclib
118+
119+
120+
121+
## R Code
122+
123+
- Copy R code into the R/ sub-directory
124+
- There can be any number of files in this directory
125+
- Usually separate out files into logical groups
126+
- Code for all functions should be included here and not anywhere else in the package
127+
128+
129+
130+
## The NAMESPACE File
131+
132+
- Used to indicate which functions are <b>exported</b>
133+
- Exported functions can be called by the user and are considered the public API
134+
- Non-exported functions cannot be called directly by the user (but the code can be viewed)
135+
- Hides implementation details from users and makes a cleaner package interface
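
The split between exported and non-exported objects can be seen in any installed package. As a small illustration, the sketch below uses the `stats` package that ships with R; the variable names are chosen for the example only.

```r
# The namespace environment holds everything defined by the package.
ns <- asNamespace("stats")

# The exports are the public API.
exported <- getNamespaceExports("stats")

# Anything in the namespace that is not exported is internal.
internal <- setdiff(ls(ns, all.names = TRUE), exported)

"lm" %in% exported   # lm() is part of the public API
length(internal)     # number of objects kept internal

# Internal functions still exist in the namespace, so their code can be viewed.
is_fun <- vapply(internal, function(nm) is.function(get(nm, envir = ns)),
                 logical(1))
first_internal <- internal[is_fun][1]
first_internal
```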
136+
137+
138+
139+
## The NAMESPACE File
140+
141+
- You can also indicate what functions you <b>import</b> from other packages
142+
- This lets your package use other packages without making them visible to the user
143+
- Importing a function loads the package but does not attach it to the search list
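
The difference between loading and attaching can be checked interactively. A minimal sketch, using the `splines` package that ships with R (the choice of package is only for illustration):

```r
# Load the namespace without attaching the package, roughly what happens
# when another package imports from it.
loadNamespace("splines")

# The namespace is now loaded...
loaded <- isNamespaceLoaded("splines")
loaded

# ...but whether it appears on the search list depends on whether
# library(splines) was called earlier in the session.
attached <- "package:splines" %in% search()
attached

# Functions from a loaded namespace are reachable with the :: operator.
is.function(splines::bs)
```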
144+
145+
146+
147+
## The NAMESPACE File
148+
149+
Key directives
150+
151+
- export("\<function>")
152+
- import("\<package>")
153+
- importFrom("\<package>", "\<function>")
154+
155+
Also important
156+
157+
- exportClasses("\<class>")
158+
- exportMethods("\<generic>")
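
To see how R reads these directives, a NAMESPACE file can be parsed with `parseNamespaceFile()` from base R. The sketch below writes a small NAMESPACE for a hypothetical package `demopkg` in a temporary directory; the directives are borrowed from the `mvtsplot` example and nothing is installed.

```r
# parseNamespaceFile() expects <package.lib>/<package>/NAMESPACE.
root <- tempfile("pkgs")
dir.create(file.path(root, "demopkg"), recursive = TRUE)

writeLines(c(
    'export("mvtsplot")',
    'import(splines)',
    'importFrom("stats", "lm", "predict")'
), file.path(root, "demopkg", "NAMESPACE"))

# Parse the directives without loading or installing anything.
ns_info <- parseNamespaceFile("demopkg", package.lib = root)

ns_info$exports   # names listed in export()
ns_info$imports   # packages and functions listed in import()/importFrom()
```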
159+
160+
161+
162+
## NAMESPACE File: `mvtsplot` package
163+
```r
export("mvtsplot")
import(splines)
import(RColorBrewer)
importFrom("grDevices", "colorRampPalette", "gray")
importFrom("graphics", "abline", "axis", "box", "image",
           "layout", "lines", "par", "plot", "points",
           "segments", "strwidth", "text", "Axis")
importFrom("stats", "complete.cases", "lm", "na.exclude",
           "predict", "quantile")
```
175+
176+
177+
178+
## NAMESPACE File: `gpclib` package
179+
```r
export("read.polyfile", "write.polyfile")

importFrom(graphics, plot)

exportClasses("gpc.poly", "gpc.poly.nohole")

exportMethods("show", "get.bbox", "plot", "intersect", "union",
              "setdiff", "[", "append.poly", "scale.poly",
              "area.poly", "get.pts", "coerce", "tristrip",
              "triangulate")
```
192+
193+
194+
195+
## Documentation
196+
197+
- Documentation files (.Rd) placed in man/ sub-directory
198+
- Written in a specific markup language
199+
- Required for every exported function
200+
- Another reason to limit exported functions
201+
- You can document other things like concepts, package overview
202+
203+
204+
205+
## Help File Example: `line` Function
206+
```
\name{line}
\alias{line}
\alias{residuals.tukeyline}
\title{Robust Line Fitting}
\description{
  Fit a line robustly as recommended in \emph{Exploratory Data Analysis}.
}
```
216+
217+
218+
219+
## Help File Example: `line` Function
220+
```
\usage{
line(x, y)
}
\arguments{
  \item{x, y}{the arguments can be any way of specifying x-y pairs.  See
    \code{\link{xy.coords}}.}
}
```
230+
231+
232+
233+
## Help File Example: `line` Function
234+
```
\details{
  Cases with missing values are omitted.

  Long vectors are not supported.
}
\value{
  An object of class \code{"tukeyline"}.

  Methods are available for the generic functions \code{coef},
  \code{residuals}, \code{fitted}, and \code{print}.
}
```
248+
249+
250+
251+
## Help File Example: `line` Function
252+
```
\references{
  Tukey, J. W. (1977).
  \emph{Exploratory Data Analysis},
  Reading Massachusetts: Addison-Wesley.
}
```
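
An `.Rd` file like the one above can be parsed and checked from R with the `tools` package. The following is a minimal sketch that writes a small help file for a hypothetical function `hello` to a temporary file; the file contents are invented for the example.

```r
# Write a small, hypothetical Rd file to a temporary location.
rd_file <- tempfile(fileext = ".Rd")
writeLines(c(
    "\\name{hello}",
    "\\alias{hello}",
    "\\title{Say Hello}",
    "\\description{",
    "  Returns a greeting.",
    "}",
    "\\usage{",
    "hello()",
    "}",
    "\\value{",
    "  A character string.",
    "}"
), rd_file)

# Parse the file into an object of class "Rd".
rd <- tools::parse_Rd(rd_file)

# Each top-level element carries its section name in the "Rd_tag" attribute.
tags <- vapply(rd, function(el) attr(el, "Rd_tag"), character(1))
setdiff(tags, "TEXT")

# Check the parsed file for problems.
tools::checkRd(rd)
```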
260+
261+
262+
263+
## Building and Checking
264+
265+
- R CMD build is a command-line program that creates a package archive
266+
file (`.tar.gz`)
267+
268+
- R CMD check runs a battery of tests on the package
269+
270+
- You can run R CMD build or R CMD check from the command-line using a
271+
terminal or command-shell application
272+
273+
- You can also run them from R using the system() function
274+
```r
system("R CMD build newpackage")
system("R CMD check newpackage")
```
279+
280+
281+
282+
## Checking
283+
284+
- R CMD check runs a battery of tests
285+
- Documentation exists
286+
- Code can be loaded, no major coding problems or errors
287+
- Run examples in documentation
288+
- Check docs match code
289+
- All tests must pass to put package on CRAN
290+
291+
292+
293+
294+
## Getting Started
295+
296+
- The `package.skeleton()` function in the utils package creates a "skeleton" R package
297+
- Directory structure (R/, man/), DESCRIPTION file, NAMESPACE file, documentation files
298+
- If there are functions visible in your workspace, it writes R code files to the R/ directory
299+
- Documentation stubs are created in man/
300+
- You need to fill in the rest!
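
A call to `package.skeleton()` might look like the sketch below. The package name `demopkg` and the function `hello` are hypothetical, and the skeleton is written to a temporary directory so nothing in the working directory is touched.

```r
# Put the function to be packaged in its own environment (hypothetical example).
env <- new.env()
env$hello <- function(name = "world") paste0("Hello, ", name, "!")

# Create the skeleton under a temporary directory.
root <- tempfile("pkgs")
dir.create(root)
package.skeleton(name = "demopkg", list = "hello",
                 environment = env, path = root)

# Inspect what was generated: DESCRIPTION, NAMESPACE, R/ and man/ stubs.
list.files(file.path(root, "demopkg"), recursive = TRUE)
```

The generated documentation files are stubs and still need to be edited by hand before the package can be built and checked.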
301+
302+
303+
304+
## Summary
305+
306+
- R packages provide a systematic way to make R code available to others
307+
- Standards ensure that packages have a minimal amount of documentation and robustness
308+
- Obtained from CRAN, Bioconductor, GitHub, etc.
309+
310+
311+
312+
## Summary
313+
314+
- Create a new directory with R/ and man/ sub-directories (or just use package.skeleton())
315+
- Write a DESCRIPTION file
316+
- Copy R code into the R/ sub-directory
317+
- Write documentation files in man/ sub-directory
318+
- Write a NAMESPACE file with exports/imports
319+
- Build and check
320+
