R Object-oriented Programming
Discrete data types

One of the features of the R environment is the rich collection of data types that are available. Here, we briefly list some of the built-in data types that describe discrete data. The four data types discussed are the integer, logical, character, and factor data types. We also introduce the idea of a vector, which is the default data structure for any variable. A list of the commands discussed here is given in Table 2 and Table 3.

Note that R stores a number as a double precision value by default. Text can also be stored in more than one way, usually as either a character string or a factor. Always check that R is storing your data in the format that you intend.

Integer

The first discrete data type examined is the integer type. Values are 32-bit integers. In most circumstances, a number must be explicitly cast as being an integer, as the default type in R is a double precision number. There are a variety of commands used to cast integers as well as allocate space for integers. The integer command takes a number for an argument and will return a vector of integers whose length is given by the argument:

> bubba <- integer(12)
> bubba
 [1] 0 0 0 0 0 0 0 0 0 0 0 0
> bubba[1]
[1] 0
> bubba[2]
[1] 0
> bubba[[4]]
[1] 0
> bubba[4] <- 15
> bubba
 [1] 0 0 0 15 0 0 0 0 0 0 0 0

In the preceding example, a vector of twelve integers was defined. The default values are zero, and the individual entries in the vector are accessed using square brackets. The first entry in the vector has index 1, so in this example, bubba[1] refers to the initial entry in the vector. Note that there are two ways to access an element in the vector: single versus double brackets. For a vector, the two methods are nearly the same, but when we explore the use of lists as opposed to vectors, the meaning will change. In short, double brackets return a single element itself, while single brackets return an object of the same type as the variable being indexed. For example, using single brackets on a list will return a list, while double brackets may return a vector.
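The difference between the two bracket forms is easiest to see by comparing a vector with a list. The following is a minimal sketch; the variable names (v, lst, and so on) are chosen here for illustration and are not from the original example:

```r
# A vector: both bracket forms give a length-one integer vector.
v <- integer(3)
v[2] <- 7L             # the L suffix keeps v an integer; assigning 7 would make it a double
single_v <- v[2]
double_v <- v[[2]]

# A list: single brackets keep the list wrapper, double brackets unwrap it.
lst <- list(first = 1:3, second = "text")
single_l <- lst[1]     # a list of length 1 that holds the vector 1:3
double_l <- lst[[1]]   # the integer vector 1:3 itself

print(typeof(single_l))   # "list"
print(typeof(double_l))   # "integer"
```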

A number can be cast as an integer using the as.integer command. A variable's type can be checked using the typeof command. The typeof command indicates how R stores the object. It is different from the class command, which reports an attribute that you can query or change:

> as.integer(13.2)
[1] 13
> thisNumber <- as.integer(8/3)
> typeof(thisNumber)
[1] "integer"
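As a hedged illustration of these commands, the sketch below shows that as.integer drops the fractional part rather than rounding, that the L suffix (not used elsewhere in this section) writes an integer literal directly, and that changing the class attribute leaves the result of typeof unchanged. The class name "temperature" is made up for this example:

```r
# as.integer truncates toward zero; it does not round.
truncated <- as.integer(c(13.2, -13.8, 8/3))
print(truncated)          # 13 -13 2

# The L suffix creates an integer without a cast.
five <- 5L
print(typeof(five))       # "integer"
print(typeof(5))          # "double"

# class is an attribute that can be changed; typeof reports the storage.
class(five) <- "temperature"   # hypothetical class name
print(class(five))        # "temperature"
print(typeof(five))       # still "integer"
```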

Note that a sequence of numbers can be automatically created using either the : operator or the seq command:

> 1:5
[1] 1 2 3 4 5
> myNum <- as.integer(1:5)
> myNum[1]
[1] 1
> myNum[3]
[1] 3

> seq(4,11,by=2)
[1] 4 6 8 10
> otherNums <- seq(4,11,by=2)
> otherNums[3]
[1] 8
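The seq command has other forms that are often useful. The sketch below uses the length.out argument and the seq_len command, neither of which appears in the examples above, and shows one common pitfall of the : operator when the upper limit is zero:

```r
# Ask for a number of evenly spaced points instead of a step size.
grid <- seq(0, 1, length.out = 5)
print(grid)                 # 0.00 0.25 0.50 0.75 1.00

# seq_len(n) counts from 1 to n and gives an empty vector when n is 0.
n <- 0
print(length(seq_len(n)))   # 0

# The : operator counts down instead, so 1:n is not empty when n is 0.
print(1:n)                  # 1 0
```

For this reason, seq_len is often a safer choice than 1:n when n comes from the length of some other object.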

A common task is to determine whether or not a variable is of a certain type. For integers, the is.integer command is used to determine whether or not a variable has an integer type:

> a <- 1.2
> typeof(a)
[1] "double"
> is.integer(a)
[1] FALSE

> a <- as.integer(1.2)
> typeof(a)
[1] "integer"
> is.integer(a)
[1] TRUE
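Note that is.integer tests how a value is stored, not whether it happens to be a whole number. The sketch below makes the distinction explicit; is_whole is a hypothetical helper written for this example and is not part of base R:

```r
print(is.integer(5))    # FALSE: 5 is stored as a double
print(is.integer(5L))   # TRUE

# Hypothetical helper: TRUE when a numeric value is (nearly) a whole number.
is_whole <- function(x, tol = sqrt(.Machine$double.eps)) {
  abs(x - round(x)) < tol
}

print(is_whole(5))      # TRUE
print(is_whole(1.2))    # FALSE
```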

Logical

Logical data consists of variables that are either true or false. The words TRUE and FALSE are used to designate the two possible values of a logical variable. (The TRUE value can also be abbreviated to T, and the FALSE value can be abbreviated to F.) The basic commands associated with logical variables are similar to the commands for integers discussed in the previous subsection. The logical command is used to allocate a vector of Boolean values. In the following example, a logical vector of length 10 is created. The default value is FALSE, and the Boolean not operator is used to flip the values to evaluate to TRUE:

> b <- logical(10)
> b
 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> b[3]
[1] FALSE
> !b
 [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
> !b[5]
[1] TRUE
> typeof(b)
[1] "logical"
> mode(b)
[1] "logical"
> storage.mode(b)
[1] "logical"
>  b[3] <- TRUE
> b
 [1] FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

To cast a value to a logical type, you can use the as.logical command. Note that zero is mapped to a value of FALSE and other numbers are mapped to a value of TRUE:

> a <- -1:1
> a
[1] -1 0 1
> as.logical(a)
[1] TRUE FALSE TRUE
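The as.logical command also accepts character input, but only a few spellings are recognized, and anything else becomes NA. The following sketch shows this, along with the reverse conversion from logical values to integers; the variable names are illustrative only:

```r
# Character input: unrecognized strings, including "yes" and "0", give NA.
from_text <- as.logical(c("TRUE", "true", "T", "yes", "0"))
print(from_text)            # TRUE TRUE TRUE NA NA

# Going the other way, TRUE becomes 1 and FALSE becomes 0,
# so sum() counts the TRUE values.
flags <- c(TRUE, FALSE, TRUE)
print(as.integer(flags))    # 1 0 1
print(sum(flags))           # 2
```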

To determine whether or not a value has a logical type, you use the is.logical command:

> b <- logical(4)
> b
[1] FALSE FALSE FALSE FALSE
> is.logical(b)
[1] TRUE

The standard operators for logical operations are available, and a list of some of the more common operations is given in Table 1. Note that there is a difference between operations such as & and &&. A single & performs an and operation on each pair of elements of two vectors, while the double && compares single values and returns a single logical result. Older versions of R silently used only the first element of each vector when && or || was given longer vectors; R 4.3.0 and later treat this as an error, so the first elements are selected explicitly here:

> l1 <- c(TRUE,FALSE)
> l2 <- c(TRUE,TRUE)
> l1&l2
[1] TRUE FALSE
> l1[1]&&l2[1]
[1] TRUE
> l1|l2
[1] TRUE TRUE
> l1[1]||l2[1]
[1] TRUE
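In practice, the double forms are mostly used in if conditions, where a single TRUE or FALSE is needed, and they stop evaluating as soon as the result is known. The following is a minimal sketch that uses the all and any commands (not covered in this section) to reduce a vector to one value:

```r
l1 <- c(TRUE, FALSE)
l2 <- c(TRUE, TRUE)

# Reduce an element-wise result to a single value for use in if().
every_pair <- all(l1 & l2)    # FALSE: the second pair fails
some_pair  <- any(l1 & l2)    # TRUE: the first pair passes

# && stops at the first FALSE, so the right-hand side is never evaluated.
x <- NULL
safe <- !is.null(x) && x > 0  # FALSE, and x > 0 is not attempted

print(c(every_pair, some_pair, safe))
```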

Tip

You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com. An additional source for the examples in this book can be found at https://github.com/KellyBlack/R-Object-Oriented-Programming. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

The following table shows various logical operators and their description:

Table 1 – list of operators for logical variables

Character

One common way to store information is to save data as characters or strings. Character data is defined using either single or double quotes:

> a <- "hello"
> a
[1] "hello"
> b <- 'there'
> b
[1] "there"
> typeof(a)
[1] "character"

The character command can be used to allocate a vector of character-valued strings, as follows:

> many <- character(3)
> many
[1] "" "" ""
> many[2] <- "this is the second"
> many[3] <- 'yo, third!'
> many[1] <- "and the first"
> many
[1] "and the first" "this is the second" "yo, third!" 

A value can be cast as a character using the as.character command, as follows:

> a <- 3.0
> a
[1] 3
> b <- as.character(a)
> b
[1] "3"

Finally, the is.character command takes a single argument, and it returns a value of TRUE if the argument is a string:

> a <- as.character(4.5)
> a
[1] "4.5"
> is.character(a)
[1] TRUE
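A few other commands are commonly used with character data. The sketch below uses nchar, paste, and as.numeric, none of which is discussed in this section, to show the difference between the length of a character vector and the length of a string, and what happens when text cannot be cast back to a number:

```r
words <- c("hello", "there")

print(length(words))   # 2: the number of strings in the vector
print(nchar(words))    # 5 5: the number of characters in each string

# paste joins strings; collapse merges the vector into one string.
sentence <- paste(words, collapse = " ")
print(sentence)        # "hello there"

# Casting back to a number; text that is not a number becomes NA.
value <- as.numeric("4.5")
bad   <- suppressWarnings(as.numeric("abc"))
print(c(value, bad))   # 4.5 NA
```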

Factors

Another common way to record data is to provide a discrete set of levels. For example, the results of an individual trial in an experiment may be denoted by a value of a, b, or c. Categorical data of this kind is referred to as a factor in R. The commands and ideas are roughly parallel to the data types described previously. There are some subtle differences with factors, though. Factors are used to designate different levels and can be considered ordered or unordered. There are a large number of options, and it is wise to consult the help pages for factors using the help(factor) command. One thing to note, though, is that the typeof command for a factor will return "integer", because the levels are stored internally as integer codes.

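Factors can be defined using the factor command. By default, the levels are sorted alphabetically, and the levels argument can be used to set their order explicitly, as follows: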

> lev <- factor(x=c("one","two","three","one"))
> lev
[1] one two three one 
Levels: one three two
> levels(lev)
[1] "one" "three" "two" 
> sort(lev)
[1] one one three two
Levels: one three two

>  lev <- factor(x=c("one","two","three","one"),levels=c("one","two","three"))
> lev
[1] one two three one 
Levels: one two three
> levels(lev)
[1] "one" "two" "three"
> sort(lev)
[1] one one two three
Levels: one two three

The techniques used to cast a variable to a factor or test whether a variable is a factor are similar to the previous examples. A variable can be cast as a factor using the as.factor command. Also, the is.factor command can be used to determine whether or not a variable has a type of factor.
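The as.factor and is.factor commands can be sketched as follows. The example also shows the integer codes behind a factor and one pitfall when the labels look like numbers. The data values and variable names are invented for illustration, and the ordered argument is one of the options described on the help page:

```r
sizes <- c("small", "large", "medium", "small")

f <- as.factor(sizes)
print(is.factor(f))       # TRUE
print(is.factor(sizes))   # FALSE: still a character vector
print(levels(f))          # "large" "medium" "small"
print(as.integer(f))      # 3 1 2 3: the codes behind the labels

# An ordered factor supports comparisons between levels.
ord <- factor(sizes, levels = c("small", "medium", "large"), ordered = TRUE)
print(ord[1] < ord[2])    # TRUE: small comes before large

# For labels that look like numbers, go through as.character first.
num_f <- factor(c("10", "20", "10"))
print(as.integer(num_f))                 # 1 2 1 (codes, not values)
print(as.numeric(as.character(num_f)))   # 10 20 10
```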

Continuous data types

The data types for continuous data are given here: the double and complex data types. A list of the commands discussed here is given in Table 2 and Table 3.

Double

The default numeric data type in R is a double precision number. The commands are similar to those of the integer data type discussed previously. The double command can be used to allocate a vector of double precision numbers, and the numbers within the vector are accessed using square brackets:

> d <- double(8)
> d
[1] 0 0 0 0 0 0 0 0
> typeof(d)
[1] "double"
> d[3] <- 17
> d
[1] 0 0 17 0 0 0 0 0

The techniques used to cast a variable to a double precision number and test whether a variable is a double precision number are similar to the examples seen previously. A variable can be cast as a double precision number using the as.double command. Also, to determine whether a variable is a double precision number, the is.double command can be used.
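A short sketch of the as.double and is.double commands follows, together with a common pitfall of double precision arithmetic. The all.equal command is not discussed in this section and is included as one possible way to compare values within a tolerance:

```r
n <- 5L
d <- as.double(n)
print(is.double(n))   # FALSE: n is stored as an integer
print(is.double(d))   # TRUE

# Double precision values are approximations, so exact comparison can fail.
total <- 0.1 + 0.2
print(total == 0.3)                    # FALSE
print(isTRUE(all.equal(total, 0.3)))   # TRUE: equal within a small tolerance
```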

Complex

Arithmetic for complex numbers is supported in R, and most math functions will react properly when given a complex number. You can append i to the end of a number to force it to be the imaginary part of a complex number, as follows:

> 1i
[1] 0+1i
> 1i*1i
[1] -1+0i
> z <- 3+2i
> z
[1] 3+2i
> z*z
[1] 5+12i
> Mod(z)
[1] 3.605551
> Re(z)
[1] 3
> Im(z)
[1] 2
> Arg(z)
[1] 0.5880026
> Conj(z)
[1] 3-2i

The complex command can also be used to define a vector of complex numbers. There are a number of options for the complex command, so a quick check of the help page, help(complex), is recommended:

> z <- complex(3)
> z
[1] 0+0i 0+0i 0+0i
> typeof(z)
[1] "complex"
> z <- complex(real=c(1,2),imag=c(3,4))
> z
[1] 1+3i 2+4i
> Re(z)
[1] 1 2

The techniques to cast a variable to a complex number and to test whether or not a variable is a complex number are similar to the methods seen previously. A variable can be cast as complex using the as.complex command. Also, to test whether or not a variable is a complex number, the is.complex command can be used.
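One situation where the cast matters is the square root of a negative number. The following sketch, with illustrative variable names, shows that the result depends on whether the input is stored as a double or as a complex number:

```r
x <- -1
print(is.complex(x))      # FALSE: x is a double

# For a double, the square root of a negative number is NaN (with a warning).
root_real <- suppressWarnings(sqrt(x))
print(root_real)          # NaN

# After the cast, the same command returns a complex result.
z <- as.complex(x)
print(is.complex(z))      # TRUE
root_complex <- sqrt(z)
print(root_complex)       # 0+1i
```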

Special data types

Two other common data types are also important: NA and NULL. We discuss them briefly here and add a note about objects. These comments are brief because the topics recur many times later.

The first is the constant NA, which indicates a missing value. A variable can be tested for missing values using the is.na command, as follows:

> n <- c(NA,2,3,NA,5)
> n
[1] NA 2 3 NA 5
> is.na(n)
[1] TRUE FALSE FALSE TRUE FALSE
> n[!is.na(n)]
[1] 2 3 5
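Missing values propagate through most calculations, which is why is.na is needed rather than a comparison with ==. The sketch below illustrates this; the na.rm argument and the NA_integer_ constant are not mentioned in the text above and are shown only as examples:

```r
n <- c(NA, 2, 3, NA, 5)

# Comparing with NA gives NA, so == cannot be used to find missing values.
print(NA == NA)                 # NA

# is.na returns a logical vector, so sum() counts the missing entries.
print(sum(is.na(n)))            # 2

# Most summaries return NA unless the missing values are removed.
print(mean(n))                  # NA
print(mean(n, na.rm = TRUE))    # 3.333333

# NA is logical by default, and typed versions exist for other types.
print(typeof(NA))               # "logical"
print(typeof(NA_integer_))      # "integer"
```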

Another special type is the NULL type. It plays a role similar to that of the NULL pointer in the C language. NULL is an object with no value and zero length, and it is commonly used to indicate that an object is absent or undefined:

> a <- NULL
> typeof(a)
[1] "NULL"
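The following sketch shows how NULL differs from NA, using the is.null command and a small list. The list named settings and its entries are invented for illustration:

```r
a <- NULL
print(is.null(a))         # TRUE
print(length(a))          # 0: NULL holds nothing
print(length(NA))         # 1: NA is a value that happens to be missing

# NULL disappears when combined with other values.
combined <- c(1, NULL, 2)
print(length(combined))   # 2

# Assigning NULL to a list entry removes that entry.
settings <- list(alpha = 1, beta = 2)   # hypothetical list
settings$alpha <- NULL
print(names(settings))    # "beta"
```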

Finally, we'll quickly explore the term objects. The variables that we defined in all of the preceding examples are treated as objects within the R environment. When we start writing functions and creating classes, it will be important to realize that they are treated like variables. The names used to assign variables are just a shortcut for R to determine where an object is located.

For example, the complex command is used to allocate a vector of complex values. The command is defined to be a set of instructions, and there is an object called complex that points to those instructions:

> complex
function (length.out = 0L, real = numeric(), imaginary = numeric(),
    modulus = 1, argument = 0)
{
    if (missing(modulus) && missing(argument)) {
        .Internal(complex(length.out, real, imaginary))
    }
    else {
        n <- max(length.out, length(argument), length(modulus))
        rep_len(modulus, n) * exp((0+1i) * rep_len(argument,
            n))
    }
}
<bytecode: 0x2489c80>
<environment: namespace:base>

There is a difference between calling the complex() function, which runs the instructions, and referring to complex without parentheses, which returns the set of instructions itself.
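Because a function is an object, it can be given another name or passed to another function. A minimal sketch follows; make_complex is a hypothetical alias introduced for this example, and sapply is used only to show a function being passed as an argument:

```r
print(is.function(complex))   # TRUE
print(typeof(complex))        # "closure"

# A second name for the same function object; nothing is called here.
make_complex <- complex       # hypothetical alias
print(identical(make_complex, complex))   # TRUE

# Parentheses call the function through either name.
z <- make_complex(real = 1, imaginary = 2)
print(z)                      # 1+2i

# Functions can be passed to other functions like any other object.
roots <- sapply(c(4, 9), sqrt)
print(roots)                  # 2 3
```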

Notes on the as and is functions

Two common tasks are to determine whether a variable is of a given type and to cast a variable to a different type. The commands to determine whether a variable is of a given type generally start with the is prefix, and the commands to cast a variable to a different type generally start with the as prefix. The list of commands to determine whether a variable is of a given type is given in the following table:

Table 2 – commands to determine whether a variable is of a particular type

The commands in the previous table test what type a variable has. The commands used to cast a variable to a different type are given in Table 3. These commands take a single argument and return a variable of the given type. For example, the as.character command can be used to convert a number to a string.

Table 3 – commands to cast a variable into a particular type
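The is and as commands can be combined, as in the sketch below. The checks list and the describe function are hypothetical helpers written for this example. They cover only some of the types discussed in this chapter and are not a substitute for typeof:

```r
# Hypothetical lookup of test functions, keyed by type name.
checks <- list(
  integer   = is.integer,
  double    = is.double,
  logical   = is.logical,
  character = is.character,
  complex   = is.complex
)

# Return the names of every test that x passes.
describe <- function(x) {
  passed <- vapply(checks, function(test) test(x), logical(1))
  names(checks)[passed]
}

print(describe(5L))                 # "integer"
print(describe(as.double(5L)))      # "double"
print(describe(as.character(5L)))   # "character"
print(describe(as.logical(5L)))     # "logical"
print(describe(as.complex(5L)))     # "complex"
```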