3.3 Programming
3.3.1 Operator
Operation | Description |
---|---|
x + y | Addition |
x - y | Subtraction |
x * y | Multiplication |
x / y | Division |
x ^ y | Exponentiation |
x %% y | Modular arithmetic (remainder) |
x %/% y | Integer division |
x == y | Test for equality |
x <= y | Test for less than or equal to |
x >= y | Test for greater than or equal to |
x && y | Boolean AND for scalars |
x || y | Boolean OR for scalars |
x & y | Boolean AND for vectors |
x | y | Boolean OR for vectors |
!x | Boolean negation |
source: (Matloff 2011)
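A few of the operators in the table can be sketched as follows. The variable names (rem, quo, elementwise, in_range) are chosen here for illustration only.

```r
# Modulo and integer division split a number into remainder and quotient
rem <- 7 %% 3     # 1
quo <- 7 %/% 3    # 2

# & and | compare vectors element by element
elementwise <- c(TRUE, FALSE, TRUE) & c(TRUE, TRUE, FALSE)   # TRUE FALSE FALSE

# && and || expect single values, which suits if() conditions
x <- 5
in_range <- x > 0 && x < 10   # TRUE
```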
3.3.2 If else
if (1 > 0) {
print("result: if")
} else {
print("result: else")
}
## [1] "result: if"
# {} brackets can be used to combine multiple expressions
# They can be skipped for a single-expression if-else statement.
if (1 > 2) print("result: if") else print("result: else")
## [1] "result: else"
ifelse(c(1,2,3) > 2, 1, -1) # element-wise: returns 1 where the test is TRUE and -1 otherwise
## [1] -1 -1 1
Sys.time()
## [1] "2018-06-26 17:07:55 CDT"
time <- Sys.time()
hour <- as.integer(format(time, "%H")) # extract the hour (00-23) as an integer
# sequential if-else statements
if (hour > 8 && hour < 12) {
print("morning")
} else if (hour < 18) {
print("afternoon")
} else {
print("private time")
}
## [1] "afternoon"
3.3.3 Loop
for (i in 1:3) {
print(i)
}
## [1] 1
## [1] 2
## [1] 3
for (i in c(1,3,5)) print(i)
## [1] 1
## [1] 3
## [1] 5
i <- 1
while (i < 5) {
print(i)
i <- i + 1
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
3.3.4 Function
We can avoid repeating similar lines of code by turning them into a function. A function bundles a series of tasks that can be applied to different objects, such as vectors, matrices, characters, data frames, lists, and other functions.
A function consists of input arguments, tasks (R expressions), and an output object (e.g., a vector, matrix, character, data frame, list, or function). It can be named or remain anonymous; anonymous functions are typically passed to another function such as lapply().
## name_to_be_assigned <- function(input args) {
## tasks
## }
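As a minimal sketch of the difference between a named and an anonymous function (the names square and squares are made up for this example):

```r
# Named function: stored under a name and reused
square <- function(x) x^2
square(4)   # 16

# Anonymous function: defined in place as an argument to sapply()
squares <- sapply(1:3, function(x) x^2)   # 1 4 9
```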
Aside from anything it prints, saves, or exports, the output of a function is the value of its last evaluated expression. If a variable result is created inside the function, placing result on the last line returns it as the output. When multiple objects are created, it is often convenient to return them as a list. Use return() to return a specific item partway through the function and skip the remaining evaluations. To check for errors and halt evaluation, use stop() or stopifnot().
myfun1 <- function() print("Hello world") # prints "Hello world" (and returns it invisibly)
myfun1()
## [1] "Hello world"
myfun2 <- function(var) var^2
myfun2(var = 3)
## [1] 9
myfun3 <- function(var = 1) ifelse(var>0, log(var), var)
myfun3() # default argument is var = 1
## [1] 0
myfun3(2)
## [1] 0.6931472
myfun4 <- function(x1, x2) {
if (!is.numeric(x1) || !is.numeric(x2)) stop('demo of error: numeric args needed')
x1*x2
}
# try(myfun4(1, "a"))
myfun4(4, 3)
## [1] 12
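The examples above do not cover early return(), returning a list, or stopifnot(). One possible sketch follows; summarise_vec is a hypothetical helper written for this illustration.

```r
# Hypothetical helper: summarise a numeric vector and return several results as a list
summarise_vec <- function(x) {
  stopifnot(is.numeric(x))           # halt with an error if x is not numeric
  if (length(x) == 0) return(NULL)   # early exit; the lines below are skipped
  result <- list(mean = mean(x), range = range(x))
  result                             # the last expression is the return value
}

out <- summarise_vec(c(2, 4, 9))
out$mean                    # 5
out$range                   # 2 9
summarise_vec(numeric(0))   # NULL
```

Returning a list keeps related results together, and each item can be pulled out by name with $.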
3.3.5 Environment
A function, formally known as a closure, consists of its arguments (called formals), a body, and an environment. The environment of a function is the environment in which it was created; it holds the R objects that the function can refer to by name. Functions created at the top level have .GlobalEnv as their environment (R may also print it as R_GlobalEnv).
environment() # .GlobalEnv (or R_GlobalEnv) is the top-level environment
## <environment: R_GlobalEnv>
f1 <- function(arg1) environment()
formals(f1) # arguments of f1()
## $arg1
body(f1) # body of f1()
## environment()
environment(f1) # environment of f1(), which is .GlobalEnv
## <environment: R_GlobalEnv>
f1() # each call to f1() runs in its own environment
## <environment: 0x7fcfc4851118>
A function can access the objects in its environment (i.e., global to the function) as well as those defined inside it (i.e., local to the function), and ordinary assignment inside a function does not overwrite global objects. This makes it safe to reuse common names such as “x1” or “var1” inside functions; those objects are accessible only within the function.
a <- NULL # object named "a" in .GlobalEnv
f2 <- function() {
a <- 1 # object named "a" in an environment inside f2
print(a)
environment()
}
f2() # one instance creating an environment
## [1] 1
## <environment: 0x7fcfc49a2260>
f2() # another instance creating another environment
## [1] 1
## <environment: 0x7fcfc3e0dc70>
a # stays NULL
## NULL
ls() # ls() shows all objects of an environment (here .GlobalEnv)
## [1] "a" "ans1" "df" "df1"
## [5] "df2" "f1" "f2" "factor1"
## [9] "factor2" "factor3" "hour" "i"
## [13] "idx_num" "idx_num2" "mat1" "matrix1"
## [17] "matrix2" "matrix3" "mydf1" "mydf2"
## [21] "myfun1" "myfun2" "myfun3" "myfun4"
## [25] "mylist1" "mymat1" "set_to_na" "tbl_operator"
## [29] "time" "vector1" "vector2" "vector3"
## [33] "vector4"
rm(list = ls()) # rm() removes items of an environment (here .GlobalEnv)
ls() # all gone in GlobalEnv
## character(0)
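Using the super-assignment operator <<- (often called global assignment), one can bend this general rule of not affecting global objects. The operator assigns to an existing variable of that name in the nearest enclosing environment, or creates one in .GlobalEnv if none is found. This can be useful when it is desirable to make certain objects accessible across multiple functions without explicitly passing them through arguments.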
a <- NULL
b <- NULL
f1 <- function() {
a <<- 1 # global assignment
# another way to assign to GlobalEnv
assign("b", 2, envir = .GlobalEnv)
}
f1()
a
## [1] 1
b
## [1] 2
a <- 2
f2 <- function() {
# Since there is no "a" local to f2, R looks for "a"
# in a parent environment, or .GlobalEnv
print(a)
# g() assigns a number to "a" in an enclosing environment, not in g()'s own environment
g <- function() a <<- 5
a <- 0 # object called "a" local to f2
print(a)
# g() updates only the "a" local to f2(), not the "a" in .GlobalEnv:
# <<- searches outward from g()'s enclosing environment and stops at the first "a"
g()
print(a)
}
a <- 3
# the first "a" is in .GlobalEnv when f2() is called
# the second "a" is local to an instance of f2()
# the third "a" is the updated version of the local "a" by g()
f2()
## [1] 3
## [1] 0
## [1] 5
a # object "a" in GlobalEnv: unchanged by g()
## [1] 3
It is convenient to use <<- if you are sure about which object it will overwrite. Otherwise, the use of <<- should be avoided.
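One case where the target of <<- is unambiguous is a function that keeps private state in its enclosing environment. The sketch below assumes a hypothetical make_counter() factory; it is an illustration, not a recommended pattern for every situation.

```r
# Hypothetical counter factory: each call creates a new enclosing environment
make_counter <- function() {
  count <- 0               # lives in the environment of this make_counter() call
  function() {
    count <<- count + 1    # updates "count" in the enclosing environment, not .GlobalEnv
    count
  }
}

counter <- make_counter()
counter()   # 1
counter()   # 2

other <- make_counter()   # a separate counter with its own "count"
other()     # 1
```

Because count is created inside make_counter(), the search performed by <<- stops there, so each counter updates only its own copy.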
3.3.6 Debugging
browser() and traceback() are common debugging tools. A debugging session starts where browser() is inserted and allows line-by-line execution from that point on. Putting browser() inside a loop or function is useful because it gives access to the objects in that environment at a particular moment of execution. After an error, running traceback() shows the sequence of calls that led to it. Other tools include debug(), debugger(), and stopifnot().
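A rough sketch of where these tools fit is shown below. safe_log is a hypothetical function, and browser() is left commented out because it only makes sense in an interactive session.

```r
# Hypothetical function used only to illustrate where the tools fit
safe_log <- function(x) {
  # browser()   # uncomment to pause here and inspect x interactively
  stopifnot(is.numeric(x), all(x > 0))   # fail early if the input is invalid
  log(x)
}

res <- try(safe_log(-1), silent = TRUE)   # try() captures the error instead of halting
inherits(res, "try-error")                # TRUE

# In an interactive session, calling traceback() right after an uncaught
# error prints the sequence of calls that led to it.
```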
3.3.7 Stat func.
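R names its distribution functions with a one-letter prefix followed by the distribution name: d for the density (or pmf), p for the cdf, q for quantiles, and r for random draws.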
Distribution | Density / pmf | cdf | Quantiles | Random draw |
---|---|---|---|---|
Normal | dnorm( ) | pnorm( ) | qnorm( ) | rnorm( ) |
Chi square | dchisq( ) | pchisq( ) | qchisq( ) | rchisq( ) |
Binomial | dbinom( ) | pbinom( ) | qbinom( ) | rbinom( ) |
source: (Matloff 2011)
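The functions in the table can be tried out as follows. This is a small sketch; the argument values and the seed are arbitrary choices for illustration.

```r
dnorm(0)                          # standard normal density at 0: about 0.3989
pnorm(1.96)                       # P(Z <= 1.96): about 0.975
qnorm(0.975)                      # 97.5th percentile: about 1.96
dbinom(2, size = 5, prob = 0.5)   # P(X = 2) for Binomial(5, 0.5): 0.3125

set.seed(1)                       # fix the seed so the random draws are reproducible
draws <- rnorm(5, mean = 0, sd = 1)
length(draws)                     # 5
```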
3.3.8 String func.
R has built-in string manipulation functions. They are commonly used for:
- detecting a certain pattern in a vector (grep() returns a vector of matching indices; grepl() returns a logical vector)
- replacing a certain pattern with another (gsub())
- counting the number of characters in a string (nchar())
- concatenating characters and numbers into a string (paste(), paste0(), sprintf())
- extracting a segment of a string by character position range (substr())
- splitting a string at a particular pattern (strsplit())
- finding the character position of a pattern in a string (regexpr())
oasis <- c("Liam Gallagher", "Noel Gallagher", "Paul Arthurs", "Paul McGuigan", "Tony McCarroll")
grep(pattern = "Paul", oasis)
## [1] 3 4
grepl(pattern = "Gall", oasis)
## [1] TRUE TRUE FALSE FALSE FALSE
gsub("Gallagher", "Gallag.", oasis)
## [1] "Liam Gallag." "Noel Gallag." "Paul Arthurs" "Paul McGuigan"
## [5] "Tony McCarroll"
nchar(oasis)
## [1] 14 14 12 13 14
paste(oasis)
## [1] "Liam Gallagher" "Noel Gallagher" "Paul Arthurs" "Paul McGuigan"
## [5] "Tony McCarroll"
paste(oasis, collapse=", ")
## [1] "Liam Gallagher, Noel Gallagher, Paul Arthurs, Paul McGuigan, Tony McCarroll"
sprintf("%s from %d to %d", "Oasis", 1991, 2009)
## [1] "Oasis from 1991 to 2009"
substr(oasis, 1, 6)
## [1] "Liam G" "Noel G" "Paul A" "Paul M" "Tony M"
strsplit(oasis, split=" ") # split by a blank space
## [[1]]
## [1] "Liam" "Gallagher"
##
## [[2]]
## [1] "Noel" "Gallagher"
##
## [[3]]
## [1] "Paul" "Arthurs"
##
## [[4]]
## [1] "Paul" "McGuigan"
##
## [[5]]
## [1] "Tony" "McCarroll"
regexpr("ll", oasis[1])[1]
## [1] 8
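The examples above do not show paste0() or the sep argument of paste(). A brief sketch follows; the vector first and the result names are made up for this example.

```r
first <- c("Liam", "Noel")
spaced   <- paste(first, "Gallagher")              # "Liam Gallagher" "Noel Gallagher"
joined   <- paste(first, "Gallagher", sep = "_")   # "Liam_Gallagher" "Noel_Gallagher"
no_space <- paste0(first, "G")                     # "LiamG" "NoelG"
```

paste() separates its pieces with a space by default, while paste0() uses no separator.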
Common regular expressions used in R include:
- "[char]" (matches any string containing “c”, “h”, “a”, or “r”)
- "a.c" (matches any string containing “a” followed by any single character followed by “c”)
- "\\." (matches any string containing a literal “.”)
grepl("[is]", oasis)
## [1] TRUE FALSE TRUE TRUE FALSE
grepl("P..l", oasis)
## [1] FALSE FALSE TRUE TRUE FALSE
grepl("\\.", c("Liam", "Noel", "Paul A.", "Paul M.", "Tony"))
## [1] FALSE FALSE TRUE TRUE FALSE
3.3.9 Set func.
The functions for common set operations include union(), intersect(), setdiff(), and setequal(). The most commonly used tool is the %in% operator; X %in% Y returns a logical vector indicating whether each element of X is a member of Y.
c(1,2,3,4,5) %in% c(3,2,5)
## [1] FALSE TRUE TRUE FALSE TRUE
c("a","b","t","s") %in% c("t","a","a")
## [1] TRUE FALSE TRUE FALSE
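The other set functions can be sketched in the same way. The vectors x and y are arbitrary examples.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(3, 2, 5, 7)
union(x, y)       # 1 2 3 4 5 7 (elements in either)
intersect(x, y)   # 2 3 5       (elements in both)
setdiff(x, y)     # 1 4         (elements of x not in y)
setequal(c(1, 2, 3), c(3, 2, 1, 1))   # TRUE: order and duplicates are ignored
```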
References
Matloff, Norman. 2011. The Art of R Programming.