Let’s get started with dplyr

The dplyr package by Hadley Wickham provides “A Grammar of Data Manipulation”. It aims to simplify common data manipulation tasks by providing “verbs”, i.e. functions that correspond to the most common operations. Have fun playing with dplyr in the exercises below!
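To get a feel for the verb style before diving in, here is a minimal sketch on a small made-up data frame (the `toy` data and its columns are invented for illustration, not part of the exercises):

```r
library(dplyr)

# A tiny made-up data frame to illustrate three dplyr "verbs"
toy <- data.frame(id = 1:4, score = c(10, 25, 17, 31))

high <- filter(toy, score > 15)          # filter() keeps rows matching a condition
high <- select(high, id, score)          # select() keeps columns by name
high <- mutate(high, double = score * 2) # mutate() adds a column derived from others
high
```

Each verb takes a data frame as its first argument and returns a new data frame, which is why the calls chain together so naturally.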

Answers to the exercises are available here.

If you obtained a different (correct) answer than those listed on the solutions page, please feel free to post your answer as a comment on that page.

Exercise 1
Install and load the dplyr package. Given the metadata for the Theoph data set:

Wt: weight of the subject (kg).

Dose: dose of theophylline administered orally to the subject (mg/kg).

Time: time since drug administration when the sample was drawn (hr).

conc: theophylline concentration in the sample (mg/L).

Copy and paste this code to get df:

library(dplyr)
df <- data.frame(Theoph)

Exercise 2

Use the names() function to get the column names of df.

Exercise 3

Let’s practice using the select() function. This allows you to work with column names instead of indices.
a) Select only the columns from Subject through Dose.
b) Now select only the Wt and Dose columns.

Learn more about dplyr in Section 5, Using dplyr on one and multiple Datasets, of the online course R Data Pre-Processing & Data Management – Shape your Data!

Exercise 4

Let’s look at the samples with Dose greater than 5 mg/kg. Use the filter() function to return the rows of df with Dose > 5.

Exercise 5

Great. Now use filter() to return the rows of df with Dose > 5 and Time greater than the mean Time.

Exercise 6

Now let’s try sorting the data. Use the arrange() function to
1) arrange df by weight (Wt, descending)
2) arrange df by weight (Wt, ascending)
3) arrange df by weight (Wt, ascending) and Time (descending)

Exercise 7

The mutate() function allows you to create a new column using conditions and data derived from other columns. Use mutate() to create a new column called trend that equals Time - mean(Time). This tells you how far each Time value is from the mean Time. Set na.rm = TRUE in mean().

Exercise 8

Given the metadata

76.2 kg Super-middleweight
72.57 kg Middleweight
69.85 kg Light-middleweight
66.68 kg Welterweight

Use the mutate() function to classify the weights using the information above. For the purposes of this exercise, treat anything below 69.85 kg as Welterweight, from 69.85 kg up to 72.57 kg as Light-middleweight, from 72.57 kg up to 76.2 kg as Middleweight, and anything above 76.2 kg as Super-middleweight. Store the classifications in a column called weight_cat, and store the result back into df. Hint: use the ifelse() function inside mutate() to achieve this.

Exercise 9

Use the group_by() function to group df by weight_cat. This allows us to use aggregation functions, similar to GROUP BY in SQL. Store the result in a data frame called weight_group.

Exercise 10

Use the summarize() command on weight_group, created in Exercise 9, to find the mean Time and the sum of Dose received by each weight category.




Let’s get started with dplyr Solution

Below are the solutions to these exercises on dplyr.


###############
#             #
# Exercise 1  #
#             #
###############

library(dplyr)
df <- data.frame(Theoph)


###############
#             #
# Exercise 2  #
#             #
###############
names(df)
## [1] "Subject" "Wt"      "Dose"    "Time"    "conc"
###############
#             #
# Exercise 3  #
#             #
###############

select(df,Subject:Dose)
##     Subject   Wt Dose
## 1         1 79.6 4.02
## 2         1 79.6 4.02
## 3         1 79.6 4.02
## 4         1 79.6 4.02
## 5         1 79.6 4.02
## 6         1 79.6 4.02
## 7         1 79.6 4.02
## 8         1 79.6 4.02
## 9         1 79.6 4.02
## 10        1 79.6 4.02
## 11        1 79.6 4.02
## 12        2 72.4 4.40
## 13        2 72.4 4.40
## 14        2 72.4 4.40
## 15        2 72.4 4.40
## 16        2 72.4 4.40
## 17        2 72.4 4.40
## 18        2 72.4 4.40
## 19        2 72.4 4.40
## 20        2 72.4 4.40
## 21        2 72.4 4.40
## 22        2 72.4 4.40
## 23        3 70.5 4.53
## 24        3 70.5 4.53
## 25        3 70.5 4.53
## 26        3 70.5 4.53
## 27        3 70.5 4.53
## 28        3 70.5 4.53
## 29        3 70.5 4.53
## 30        3 70.5 4.53
## 31        3 70.5 4.53
## 32        3 70.5 4.53
## 33        3 70.5 4.53
## 34        4 72.7 4.40
## 35        4 72.7 4.40
## 36        4 72.7 4.40
## 37        4 72.7 4.40
## 38        4 72.7 4.40
## 39        4 72.7 4.40
## 40        4 72.7 4.40
## 41        4 72.7 4.40
## 42        4 72.7 4.40
## 43        4 72.7 4.40
## 44        4 72.7 4.40
## 45        5 54.6 5.86
## 46        5 54.6 5.86
## 47        5 54.6 5.86
## 48        5 54.6 5.86
## 49        5 54.6 5.86
## 50        5 54.6 5.86
## 51        5 54.6 5.86
## 52        5 54.6 5.86
## 53        5 54.6 5.86
## 54        5 54.6 5.86
## 55        5 54.6 5.86
## 56        6 80.0 4.00
## 57        6 80.0 4.00
## 58        6 80.0 4.00
## 59        6 80.0 4.00
## 60        6 80.0 4.00
## 61        6 80.0 4.00
## 62        6 80.0 4.00
## 63        6 80.0 4.00
## 64        6 80.0 4.00
## 65        6 80.0 4.00
## 66        6 80.0 4.00
## 67        7 64.6 4.95
## 68        7 64.6 4.95
## 69        7 64.6 4.95
## 70        7 64.6 4.95
## 71        7 64.6 4.95
## 72        7 64.6 4.95
## 73        7 64.6 4.95
## 74        7 64.6 4.95
## 75        7 64.6 4.95
## 76        7 64.6 4.95
## 77        7 64.6 4.95
## 78        8 70.5 4.53
## 79        8 70.5 4.53
## 80        8 70.5 4.53
## 81        8 70.5 4.53
## 82        8 70.5 4.53
## 83        8 70.5 4.53
## 84        8 70.5 4.53
## 85        8 70.5 4.53
## 86        8 70.5 4.53
## 87        8 70.5 4.53
## 88        8 70.5 4.53
## 89        9 86.4 3.10
## 90        9 86.4 3.10
## 91        9 86.4 3.10
## 92        9 86.4 3.10
## 93        9 86.4 3.10
## 94        9 86.4 3.10
## 95        9 86.4 3.10
## 96        9 86.4 3.10
## 97        9 86.4 3.10
## 98        9 86.4 3.10
## 99        9 86.4 3.10
## 100      10 58.2 5.50
## 101      10 58.2 5.50
## 102      10 58.2 5.50
## 103      10 58.2 5.50
## 104      10 58.2 5.50
## 105      10 58.2 5.50
## 106      10 58.2 5.50
## 107      10 58.2 5.50
## 108      10 58.2 5.50
## 109      10 58.2 5.50
## 110      10 58.2 5.50
## 111      11 65.0 4.92
## 112      11 65.0 4.92
## 113      11 65.0 4.92
## 114      11 65.0 4.92
## 115      11 65.0 4.92
## 116      11 65.0 4.92
## 117      11 65.0 4.92
## 118      11 65.0 4.92
## 119      11 65.0 4.92
## 120      11 65.0 4.92
## 121      11 65.0 4.92
## 122      12 60.5 5.30
## 123      12 60.5 5.30
## 124      12 60.5 5.30
## 125      12 60.5 5.30
## 126      12 60.5 5.30
## 127      12 60.5 5.30
## 128      12 60.5 5.30
## 129      12 60.5 5.30
## 130      12 60.5 5.30
## 131      12 60.5 5.30
## 132      12 60.5 5.30
# The column is named Wt, not Weight; head() is used here to keep the output short
head(select(df, Wt, Dose))
##     Wt Dose
## 1 79.6 4.02
## 2 79.6 4.02
## 3 79.6 4.02
## 4 79.6 4.02
## 5 79.6 4.02
## 6 79.6 4.02
###############
#             #
# Exercise 4  #
#             #
###############

filter(df,Dose>5)
##    Subject   Wt Dose  Time  conc
## 1        5 54.6 5.86  0.00  0.00
## 2        5 54.6 5.86  0.30  2.02
## 3        5 54.6 5.86  0.52  5.63
## 4        5 54.6 5.86  1.00 11.40
## 5        5 54.6 5.86  2.02  9.33
## 6        5 54.6 5.86  3.50  8.74
## 7        5 54.6 5.86  5.02  7.56
## 8        5 54.6 5.86  7.02  7.09
## 9        5 54.6 5.86  9.10  5.90
## 10       5 54.6 5.86 12.00  4.37
## 11       5 54.6 5.86 24.35  1.57
## 12      10 58.2 5.50  0.00  0.24
## 13      10 58.2 5.50  0.37  2.89
## 14      10 58.2 5.50  0.77  5.22
## 15      10 58.2 5.50  1.02  6.41
## 16      10 58.2 5.50  2.05  7.83
## 17      10 58.2 5.50  3.55 10.21
## 18      10 58.2 5.50  5.05  9.18
## 19      10 58.2 5.50  7.08  8.02
## 20      10 58.2 5.50  9.38  7.14
## 21      10 58.2 5.50 12.10  5.68
## 22      10 58.2 5.50 23.70  2.42
## 23      12 60.5 5.30  0.00  0.00
## 24      12 60.5 5.30  0.25  1.25
## 25      12 60.5 5.30  0.50  3.96
## 26      12 60.5 5.30  1.00  7.82
## 27      12 60.5 5.30  2.00  9.72
## 28      12 60.5 5.30  3.52  9.75
## 29      12 60.5 5.30  5.07  8.57
## 30      12 60.5 5.30  7.07  6.59
## 31      12 60.5 5.30  9.03  6.11
## 32      12 60.5 5.30 12.05  4.57
## 33      12 60.5 5.30 24.15  1.17
###############
#             #
# Exercise 5  #
#             #
###############

filter(df,Dose>5 & Time>mean(Time))
##    Subject   Wt Dose  Time conc
## 1        5 54.6 5.86  7.02 7.09
## 2        5 54.6 5.86  9.10 5.90
## 3        5 54.6 5.86 12.00 4.37
## 4        5 54.6 5.86 24.35 1.57
## 5       10 58.2 5.50  7.08 8.02
## 6       10 58.2 5.50  9.38 7.14
## 7       10 58.2 5.50 12.10 5.68
## 8       10 58.2 5.50 23.70 2.42
## 9       12 60.5 5.30  7.07 6.59
## 10      12 60.5 5.30  9.03 6.11
## 11      12 60.5 5.30 12.05 4.57
## 12      12 60.5 5.30 24.15 1.17
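As a side note, filter() also accepts multiple conditions separated by commas, which it combines with a logical AND; a small sketch (assuming the same df built from Theoph) checking that the comma form matches the &-version used above:

```r
library(dplyr)
df <- data.frame(Theoph)  # Theoph ships with R's datasets package

# Comma-separated conditions in filter() are ANDed together
identical(filter(df, Dose > 5, Time > mean(Time)),
          filter(df, Dose > 5 & Time > mean(Time)))
## [1] TRUE
```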
###############
#             #
# Exercise 6  #
#             #
###############


arrange(df,desc(Wt))
##     Subject   Wt Dose  Time  conc
## 1         9 86.4 3.10  0.00  0.00
## 2         9 86.4 3.10  0.30  7.37
## 3         9 86.4 3.10  0.63  9.03
## 4         9 86.4 3.10  1.05  7.14
## 5         9 86.4 3.10  2.02  6.33
## 6         9 86.4 3.10  3.53  5.66
## 7         9 86.4 3.10  5.02  5.67
## 8         9 86.4 3.10  7.17  4.24
## 9         9 86.4 3.10  8.80  4.11
## 10        9 86.4 3.10 11.60  3.16
## 11        9 86.4 3.10 24.43  1.12
## 12        6 80.0 4.00  0.00  0.00
## 13        6 80.0 4.00  0.27  1.29
## 14        6 80.0 4.00  0.58  3.08
## 15        6 80.0 4.00  1.15  6.44
## 16        6 80.0 4.00  2.03  6.32
## 17        6 80.0 4.00  3.57  5.53
## 18        6 80.0 4.00  5.00  4.94
## 19        6 80.0 4.00  7.00  4.02
## 20        6 80.0 4.00  9.22  3.46
## 21        6 80.0 4.00 12.10  2.78
## 22        6 80.0 4.00 23.85  0.92
## 23        1 79.6 4.02  0.00  0.74
## 24        1 79.6 4.02  0.25  2.84
## 25        1 79.6 4.02  0.57  6.57
## 26        1 79.6 4.02  1.12 10.50
## 27        1 79.6 4.02  2.02  9.66
## 28        1 79.6 4.02  3.82  8.58
## 29        1 79.6 4.02  5.10  8.36
## 30        1 79.6 4.02  7.03  7.47
## 31        1 79.6 4.02  9.05  6.89
## 32        1 79.6 4.02 12.12  5.94
## 33        1 79.6 4.02 24.37  3.28
## 34        4 72.7 4.40  0.00  0.00
## 35        4 72.7 4.40  0.35  1.89
## 36        4 72.7 4.40  0.60  4.60
## 37        4 72.7 4.40  1.07  8.60
## 38        4 72.7 4.40  2.13  8.38
## 39        4 72.7 4.40  3.50  7.54
## 40        4 72.7 4.40  5.02  6.88
## 41        4 72.7 4.40  7.02  5.78
## 42        4 72.7 4.40  9.02  5.33
## 43        4 72.7 4.40 11.98  4.19
## 44        4 72.7 4.40 24.65  1.15
## 45        2 72.4 4.40  0.00  0.00
## 46        2 72.4 4.40  0.27  1.72
## 47        2 72.4 4.40  0.52  7.91
## 48        2 72.4 4.40  1.00  8.31
## 49        2 72.4 4.40  1.92  8.33
## 50        2 72.4 4.40  3.50  6.85
## 51        2 72.4 4.40  5.02  6.08
## 52        2 72.4 4.40  7.03  5.40
## 53        2 72.4 4.40  9.00  4.55
## 54        2 72.4 4.40 12.00  3.01
## 55        2 72.4 4.40 24.30  0.90
## 56        3 70.5 4.53  0.00  0.00
## 57        3 70.5 4.53  0.27  4.40
## 58        3 70.5 4.53  0.58  6.90
## 59        3 70.5 4.53  1.02  8.20
## 60        3 70.5 4.53  2.02  7.80
## 61        3 70.5 4.53  3.62  7.50
## 62        3 70.5 4.53  5.08  6.20
## 63        3 70.5 4.53  7.07  5.30
## 64        3 70.5 4.53  9.00  4.90
## 65        3 70.5 4.53 12.15  3.70
## 66        3 70.5 4.53 24.17  1.05
## 67        8 70.5 4.53  0.00  0.00
## 68        8 70.5 4.53  0.25  3.05
## 69        8 70.5 4.53  0.52  3.05
## 70        8 70.5 4.53  0.98  7.31
## 71        8 70.5 4.53  2.02  7.56
## 72        8 70.5 4.53  3.53  6.59
## 73        8 70.5 4.53  5.05  5.88
## 74        8 70.5 4.53  7.15  4.73
## 75        8 70.5 4.53  9.07  4.57
## 76        8 70.5 4.53 12.10  3.00
## 77        8 70.5 4.53 24.12  1.25
## 78       11 65.0 4.92  0.00  0.00
## 79       11 65.0 4.92  0.25  4.86
## 80       11 65.0 4.92  0.50  7.24
## 81       11 65.0 4.92  0.98  8.00
## 82       11 65.0 4.92  1.98  6.81
## 83       11 65.0 4.92  3.60  5.87
## 84       11 65.0 4.92  5.02  5.22
## 85       11 65.0 4.92  7.03  4.45
## 86       11 65.0 4.92  9.03  3.62
## 87       11 65.0 4.92 12.12  2.69
## 88       11 65.0 4.92 24.08  0.86
## 89        7 64.6 4.95  0.00  0.15
## 90        7 64.6 4.95  0.25  0.85
## 91        7 64.6 4.95  0.50  2.35
## 92        7 64.6 4.95  1.02  5.02
## 93        7 64.6 4.95  2.02  6.58
## 94        7 64.6 4.95  3.48  7.09
## 95        7 64.6 4.95  5.00  6.66
## 96        7 64.6 4.95  6.98  5.25
## 97        7 64.6 4.95  9.00  4.39
## 98        7 64.6 4.95 12.05  3.53
## 99        7 64.6 4.95 24.22  1.15
## 100      12 60.5 5.30  0.00  0.00
## 101      12 60.5 5.30  0.25  1.25
## 102      12 60.5 5.30  0.50  3.96
## 103      12 60.5 5.30  1.00  7.82
## 104      12 60.5 5.30  2.00  9.72
## 105      12 60.5 5.30  3.52  9.75
## 106      12 60.5 5.30  5.07  8.57
## 107      12 60.5 5.30  7.07  6.59
## 108      12 60.5 5.30  9.03  6.11
## 109      12 60.5 5.30 12.05  4.57
## 110      12 60.5 5.30 24.15  1.17
## 111      10 58.2 5.50  0.00  0.24
## 112      10 58.2 5.50  0.37  2.89
## 113      10 58.2 5.50  0.77  5.22
## 114      10 58.2 5.50  1.02  6.41
## 115      10 58.2 5.50  2.05  7.83
## 116      10 58.2 5.50  3.55 10.21
## 117      10 58.2 5.50  5.05  9.18
## 118      10 58.2 5.50  7.08  8.02
## 119      10 58.2 5.50  9.38  7.14
## 120      10 58.2 5.50 12.10  5.68
## 121      10 58.2 5.50 23.70  2.42
## 122       5 54.6 5.86  0.00  0.00
## 123       5 54.6 5.86  0.30  2.02
## 124       5 54.6 5.86  0.52  5.63
## 125       5 54.6 5.86  1.00 11.40
## 126       5 54.6 5.86  2.02  9.33
## 127       5 54.6 5.86  3.50  8.74
## 128       5 54.6 5.86  5.02  7.56
## 129       5 54.6 5.86  7.02  7.09
## 130       5 54.6 5.86  9.10  5.90
## 131       5 54.6 5.86 12.00  4.37
## 132       5 54.6 5.86 24.35  1.57
arrange(df,Wt)
##     Subject   Wt Dose  Time  conc
## 1         5 54.6 5.86  0.00  0.00
## 2         5 54.6 5.86  0.30  2.02
## 3         5 54.6 5.86  0.52  5.63
## 4         5 54.6 5.86  1.00 11.40
## 5         5 54.6 5.86  2.02  9.33
## 6         5 54.6 5.86  3.50  8.74
## 7         5 54.6 5.86  5.02  7.56
## 8         5 54.6 5.86  7.02  7.09
## 9         5 54.6 5.86  9.10  5.90
## 10        5 54.6 5.86 12.00  4.37
## 11        5 54.6 5.86 24.35  1.57
## 12       10 58.2 5.50  0.00  0.24
## 13       10 58.2 5.50  0.37  2.89
## 14       10 58.2 5.50  0.77  5.22
## 15       10 58.2 5.50  1.02  6.41
## 16       10 58.2 5.50  2.05  7.83
## 17       10 58.2 5.50  3.55 10.21
## 18       10 58.2 5.50  5.05  9.18
## 19       10 58.2 5.50  7.08  8.02
## 20       10 58.2 5.50  9.38  7.14
## 21       10 58.2 5.50 12.10  5.68
## 22       10 58.2 5.50 23.70  2.42
## 23       12 60.5 5.30  0.00  0.00
## 24       12 60.5 5.30  0.25  1.25
## 25       12 60.5 5.30  0.50  3.96
## 26       12 60.5 5.30  1.00  7.82
## 27       12 60.5 5.30  2.00  9.72
## 28       12 60.5 5.30  3.52  9.75
## 29       12 60.5 5.30  5.07  8.57
## 30       12 60.5 5.30  7.07  6.59
## 31       12 60.5 5.30  9.03  6.11
## 32       12 60.5 5.30 12.05  4.57
## 33       12 60.5 5.30 24.15  1.17
## 34        7 64.6 4.95  0.00  0.15
## 35        7 64.6 4.95  0.25  0.85
## 36        7 64.6 4.95  0.50  2.35
## 37        7 64.6 4.95  1.02  5.02
## 38        7 64.6 4.95  2.02  6.58
## 39        7 64.6 4.95  3.48  7.09
## 40        7 64.6 4.95  5.00  6.66
## 41        7 64.6 4.95  6.98  5.25
## 42        7 64.6 4.95  9.00  4.39
## 43        7 64.6 4.95 12.05  3.53
## 44        7 64.6 4.95 24.22  1.15
## 45       11 65.0 4.92  0.00  0.00
## 46       11 65.0 4.92  0.25  4.86
## 47       11 65.0 4.92  0.50  7.24
## 48       11 65.0 4.92  0.98  8.00
## 49       11 65.0 4.92  1.98  6.81
## 50       11 65.0 4.92  3.60  5.87
## 51       11 65.0 4.92  5.02  5.22
## 52       11 65.0 4.92  7.03  4.45
## 53       11 65.0 4.92  9.03  3.62
## 54       11 65.0 4.92 12.12  2.69
## 55       11 65.0 4.92 24.08  0.86
## 56        3 70.5 4.53  0.00  0.00
## 57        3 70.5 4.53  0.27  4.40
## 58        3 70.5 4.53  0.58  6.90
## 59        3 70.5 4.53  1.02  8.20
## 60        3 70.5 4.53  2.02  7.80
## 61        3 70.5 4.53  3.62  7.50
## 62        3 70.5 4.53  5.08  6.20
## 63        3 70.5 4.53  7.07  5.30
## 64        3 70.5 4.53  9.00  4.90
## 65        3 70.5 4.53 12.15  3.70
## 66        3 70.5 4.53 24.17  1.05
## 67        8 70.5 4.53  0.00  0.00
## 68        8 70.5 4.53  0.25  3.05
## 69        8 70.5 4.53  0.52  3.05
## 70        8 70.5 4.53  0.98  7.31
## 71        8 70.5 4.53  2.02  7.56
## 72        8 70.5 4.53  3.53  6.59
## 73        8 70.5 4.53  5.05  5.88
## 74        8 70.5 4.53  7.15  4.73
## 75        8 70.5 4.53  9.07  4.57
## 76        8 70.5 4.53 12.10  3.00
## 77        8 70.5 4.53 24.12  1.25
## 78        2 72.4 4.40  0.00  0.00
## 79        2 72.4 4.40  0.27  1.72
## 80        2 72.4 4.40  0.52  7.91
## 81        2 72.4 4.40  1.00  8.31
## 82        2 72.4 4.40  1.92  8.33
## 83        2 72.4 4.40  3.50  6.85
## 84        2 72.4 4.40  5.02  6.08
## 85        2 72.4 4.40  7.03  5.40
## 86        2 72.4 4.40  9.00  4.55
## 87        2 72.4 4.40 12.00  3.01
## 88        2 72.4 4.40 24.30  0.90
## 89        4 72.7 4.40  0.00  0.00
## 90        4 72.7 4.40  0.35  1.89
## 91        4 72.7 4.40  0.60  4.60
## 92        4 72.7 4.40  1.07  8.60
## 93        4 72.7 4.40  2.13  8.38
## 94        4 72.7 4.40  3.50  7.54
## 95        4 72.7 4.40  5.02  6.88
## 96        4 72.7 4.40  7.02  5.78
## 97        4 72.7 4.40  9.02  5.33
## 98        4 72.7 4.40 11.98  4.19
## 99        4 72.7 4.40 24.65  1.15
## 100       1 79.6 4.02  0.00  0.74
## 101       1 79.6 4.02  0.25  2.84
## 102       1 79.6 4.02  0.57  6.57
## 103       1 79.6 4.02  1.12 10.50
## 104       1 79.6 4.02  2.02  9.66
## 105       1 79.6 4.02  3.82  8.58
## 106       1 79.6 4.02  5.10  8.36
## 107       1 79.6 4.02  7.03  7.47
## 108       1 79.6 4.02  9.05  6.89
## 109       1 79.6 4.02 12.12  5.94
## 110       1 79.6 4.02 24.37  3.28
## 111       6 80.0 4.00  0.00  0.00
## 112       6 80.0 4.00  0.27  1.29
## 113       6 80.0 4.00  0.58  3.08
## 114       6 80.0 4.00  1.15  6.44
## 115       6 80.0 4.00  2.03  6.32
## 116       6 80.0 4.00  3.57  5.53
## 117       6 80.0 4.00  5.00  4.94
## 118       6 80.0 4.00  7.00  4.02
## 119       6 80.0 4.00  9.22  3.46
## 120       6 80.0 4.00 12.10  2.78
## 121       6 80.0 4.00 23.85  0.92
## 122       9 86.4 3.10  0.00  0.00
## 123       9 86.4 3.10  0.30  7.37
## 124       9 86.4 3.10  0.63  9.03
## 125       9 86.4 3.10  1.05  7.14
## 126       9 86.4 3.10  2.02  6.33
## 127       9 86.4 3.10  3.53  5.66
## 128       9 86.4 3.10  5.02  5.67
## 129       9 86.4 3.10  7.17  4.24
## 130       9 86.4 3.10  8.80  4.11
## 131       9 86.4 3.10 11.60  3.16
## 132       9 86.4 3.10 24.43  1.12
arrange(df,Wt,desc(Time))
##     Subject   Wt Dose  Time  conc
## 1         5 54.6 5.86 24.35  1.57
## 2         5 54.6 5.86 12.00  4.37
## 3         5 54.6 5.86  9.10  5.90
## 4         5 54.6 5.86  7.02  7.09
## 5         5 54.6 5.86  5.02  7.56
## 6         5 54.6 5.86  3.50  8.74
## 7         5 54.6 5.86  2.02  9.33
## 8         5 54.6 5.86  1.00 11.40
## 9         5 54.6 5.86  0.52  5.63
## 10        5 54.6 5.86  0.30  2.02
## 11        5 54.6 5.86  0.00  0.00
## 12       10 58.2 5.50 23.70  2.42
## 13       10 58.2 5.50 12.10  5.68
## 14       10 58.2 5.50  9.38  7.14
## 15       10 58.2 5.50  7.08  8.02
## 16       10 58.2 5.50  5.05  9.18
## 17       10 58.2 5.50  3.55 10.21
## 18       10 58.2 5.50  2.05  7.83
## 19       10 58.2 5.50  1.02  6.41
## 20       10 58.2 5.50  0.77  5.22
## 21       10 58.2 5.50  0.37  2.89
## 22       10 58.2 5.50  0.00  0.24
## 23       12 60.5 5.30 24.15  1.17
## 24       12 60.5 5.30 12.05  4.57
## 25       12 60.5 5.30  9.03  6.11
## 26       12 60.5 5.30  7.07  6.59
## 27       12 60.5 5.30  5.07  8.57
## 28       12 60.5 5.30  3.52  9.75
## 29       12 60.5 5.30  2.00  9.72
## 30       12 60.5 5.30  1.00  7.82
## 31       12 60.5 5.30  0.50  3.96
## 32       12 60.5 5.30  0.25  1.25
## 33       12 60.5 5.30  0.00  0.00
## 34        7 64.6 4.95 24.22  1.15
## 35        7 64.6 4.95 12.05  3.53
## 36        7 64.6 4.95  9.00  4.39
## 37        7 64.6 4.95  6.98  5.25
## 38        7 64.6 4.95  5.00  6.66
## 39        7 64.6 4.95  3.48  7.09
## 40        7 64.6 4.95  2.02  6.58
## 41        7 64.6 4.95  1.02  5.02
## 42        7 64.6 4.95  0.50  2.35
## 43        7 64.6 4.95  0.25  0.85
## 44        7 64.6 4.95  0.00  0.15
## 45       11 65.0 4.92 24.08  0.86
## 46       11 65.0 4.92 12.12  2.69
## 47       11 65.0 4.92  9.03  3.62
## 48       11 65.0 4.92  7.03  4.45
## 49       11 65.0 4.92  5.02  5.22
## 50       11 65.0 4.92  3.60  5.87
## 51       11 65.0 4.92  1.98  6.81
## 52       11 65.0 4.92  0.98  8.00
## 53       11 65.0 4.92  0.50  7.24
## 54       11 65.0 4.92  0.25  4.86
## 55       11 65.0 4.92  0.00  0.00
## 56        3 70.5 4.53 24.17  1.05
## 57        8 70.5 4.53 24.12  1.25
## 58        3 70.5 4.53 12.15  3.70
## 59        8 70.5 4.53 12.10  3.00
## 60        8 70.5 4.53  9.07  4.57
## 61        3 70.5 4.53  9.00  4.90
## 62        8 70.5 4.53  7.15  4.73
## 63        3 70.5 4.53  7.07  5.30
## 64        3 70.5 4.53  5.08  6.20
## 65        8 70.5 4.53  5.05  5.88
## 66        3 70.5 4.53  3.62  7.50
## 67        8 70.5 4.53  3.53  6.59
## 68        3 70.5 4.53  2.02  7.80
## 69        8 70.5 4.53  2.02  7.56
## 70        3 70.5 4.53  1.02  8.20
## 71        8 70.5 4.53  0.98  7.31
## 72        3 70.5 4.53  0.58  6.90
## 73        8 70.5 4.53  0.52  3.05
## 74        3 70.5 4.53  0.27  4.40
## 75        8 70.5 4.53  0.25  3.05
## 76        3 70.5 4.53  0.00  0.00
## 77        8 70.5 4.53  0.00  0.00
## 78        2 72.4 4.40 24.30  0.90
## 79        2 72.4 4.40 12.00  3.01
## 80        2 72.4 4.40  9.00  4.55
## 81        2 72.4 4.40  7.03  5.40
## 82        2 72.4 4.40  5.02  6.08
## 83        2 72.4 4.40  3.50  6.85
## 84        2 72.4 4.40  1.92  8.33
## 85        2 72.4 4.40  1.00  8.31
## 86        2 72.4 4.40  0.52  7.91
## 87        2 72.4 4.40  0.27  1.72
## 88        2 72.4 4.40  0.00  0.00
## 89        4 72.7 4.40 24.65  1.15
## 90        4 72.7 4.40 11.98  4.19
## 91        4 72.7 4.40  9.02  5.33
## 92        4 72.7 4.40  7.02  5.78
## 93        4 72.7 4.40  5.02  6.88
## 94        4 72.7 4.40  3.50  7.54
## 95        4 72.7 4.40  2.13  8.38
## 96        4 72.7 4.40  1.07  8.60
## 97        4 72.7 4.40  0.60  4.60
## 98        4 72.7 4.40  0.35  1.89
## 99        4 72.7 4.40  0.00  0.00
## 100       1 79.6 4.02 24.37  3.28
## 101       1 79.6 4.02 12.12  5.94
## 102       1 79.6 4.02  9.05  6.89
## 103       1 79.6 4.02  7.03  7.47
## 104       1 79.6 4.02  5.10  8.36
## 105       1 79.6 4.02  3.82  8.58
## 106       1 79.6 4.02  2.02  9.66
## 107       1 79.6 4.02  1.12 10.50
## 108       1 79.6 4.02  0.57  6.57
## 109       1 79.6 4.02  0.25  2.84
## 110       1 79.6 4.02  0.00  0.74
## 111       6 80.0 4.00 23.85  0.92
## 112       6 80.0 4.00 12.10  2.78
## 113       6 80.0 4.00  9.22  3.46
## 114       6 80.0 4.00  7.00  4.02
## 115       6 80.0 4.00  5.00  4.94
## 116       6 80.0 4.00  3.57  5.53
## 117       6 80.0 4.00  2.03  6.32
## 118       6 80.0 4.00  1.15  6.44
## 119       6 80.0 4.00  0.58  3.08
## 120       6 80.0 4.00  0.27  1.29
## 121       6 80.0 4.00  0.00  0.00
## 122       9 86.4 3.10 24.43  1.12
## 123       9 86.4 3.10 11.60  3.16
## 124       9 86.4 3.10  8.80  4.11
## 125       9 86.4 3.10  7.17  4.24
## 126       9 86.4 3.10  5.02  5.67
## 127       9 86.4 3.10  3.53  5.66
## 128       9 86.4 3.10  2.02  6.33
## 129       9 86.4 3.10  1.05  7.14
## 130       9 86.4 3.10  0.63  9.03
## 131       9 86.4 3.10  0.30  7.37
## 132       9 86.4 3.10  0.00  0.00
###############
#             #
# Exercise 7  #
#             #
###############
mutate(df,trend=Time-mean(Time,na.rm = TRUE))
##     Subject   Wt Dose  Time  conc      trend
## 1         1 79.6 4.02  0.00  0.74 -5.8946212
## 2         1 79.6 4.02  0.25  2.84 -5.6446212
## 3         1 79.6 4.02  0.57  6.57 -5.3246212
## 4         1 79.6 4.02  1.12 10.50 -4.7746212
## 5         1 79.6 4.02  2.02  9.66 -3.8746212
## 6         1 79.6 4.02  3.82  8.58 -2.0746212
## 7         1 79.6 4.02  5.10  8.36 -0.7946212
## 8         1 79.6 4.02  7.03  7.47  1.1353788
## 9         1 79.6 4.02  9.05  6.89  3.1553788
## 10        1 79.6 4.02 12.12  5.94  6.2253788
## 11        1 79.6 4.02 24.37  3.28 18.4753788
## 12        2 72.4 4.40  0.00  0.00 -5.8946212
## 13        2 72.4 4.40  0.27  1.72 -5.6246212
## 14        2 72.4 4.40  0.52  7.91 -5.3746212
## 15        2 72.4 4.40  1.00  8.31 -4.8946212
## 16        2 72.4 4.40  1.92  8.33 -3.9746212
## 17        2 72.4 4.40  3.50  6.85 -2.3946212
## 18        2 72.4 4.40  5.02  6.08 -0.8746212
## 19        2 72.4 4.40  7.03  5.40  1.1353788
## 20        2 72.4 4.40  9.00  4.55  3.1053788
## 21        2 72.4 4.40 12.00  3.01  6.1053788
## 22        2 72.4 4.40 24.30  0.90 18.4053788
## 23        3 70.5 4.53  0.00  0.00 -5.8946212
## 24        3 70.5 4.53  0.27  4.40 -5.6246212
## 25        3 70.5 4.53  0.58  6.90 -5.3146212
## 26        3 70.5 4.53  1.02  8.20 -4.8746212
## 27        3 70.5 4.53  2.02  7.80 -3.8746212
## 28        3 70.5 4.53  3.62  7.50 -2.2746212
## 29        3 70.5 4.53  5.08  6.20 -0.8146212
## 30        3 70.5 4.53  7.07  5.30  1.1753788
## 31        3 70.5 4.53  9.00  4.90  3.1053788
## 32        3 70.5 4.53 12.15  3.70  6.2553788
## 33        3 70.5 4.53 24.17  1.05 18.2753788
## 34        4 72.7 4.40  0.00  0.00 -5.8946212
## 35        4 72.7 4.40  0.35  1.89 -5.5446212
## 36        4 72.7 4.40  0.60  4.60 -5.2946212
## 37        4 72.7 4.40  1.07  8.60 -4.8246212
## 38        4 72.7 4.40  2.13  8.38 -3.7646212
## 39        4 72.7 4.40  3.50  7.54 -2.3946212
## 40        4 72.7 4.40  5.02  6.88 -0.8746212
## 41        4 72.7 4.40  7.02  5.78  1.1253788
## 42        4 72.7 4.40  9.02  5.33  3.1253788
## 43        4 72.7 4.40 11.98  4.19  6.0853788
## 44        4 72.7 4.40 24.65  1.15 18.7553788
## 45        5 54.6 5.86  0.00  0.00 -5.8946212
## 46        5 54.6 5.86  0.30  2.02 -5.5946212
## 47        5 54.6 5.86  0.52  5.63 -5.3746212
## 48        5 54.6 5.86  1.00 11.40 -4.8946212
## 49        5 54.6 5.86  2.02  9.33 -3.8746212
## 50        5 54.6 5.86  3.50  8.74 -2.3946212
## 51        5 54.6 5.86  5.02  7.56 -0.8746212
## 52        5 54.6 5.86  7.02  7.09  1.1253788
## 53        5 54.6 5.86  9.10  5.90  3.2053788
## 54        5 54.6 5.86 12.00  4.37  6.1053788
## 55        5 54.6 5.86 24.35  1.57 18.4553788
## 56        6 80.0 4.00  0.00  0.00 -5.8946212
## 57        6 80.0 4.00  0.27  1.29 -5.6246212
## 58        6 80.0 4.00  0.58  3.08 -5.3146212
## 59        6 80.0 4.00  1.15  6.44 -4.7446212
## 60        6 80.0 4.00  2.03  6.32 -3.8646212
## 61        6 80.0 4.00  3.57  5.53 -2.3246212
## 62        6 80.0 4.00  5.00  4.94 -0.8946212
## 63        6 80.0 4.00  7.00  4.02  1.1053788
## 64        6 80.0 4.00  9.22  3.46  3.3253788
## 65        6 80.0 4.00 12.10  2.78  6.2053788
## 66        6 80.0 4.00 23.85  0.92 17.9553788
## 67        7 64.6 4.95  0.00  0.15 -5.8946212
## 68        7 64.6 4.95  0.25  0.85 -5.6446212
## 69        7 64.6 4.95  0.50  2.35 -5.3946212
## 70        7 64.6 4.95  1.02  5.02 -4.8746212
## 71        7 64.6 4.95  2.02  6.58 -3.8746212
## 72        7 64.6 4.95  3.48  7.09 -2.4146212
## 73        7 64.6 4.95  5.00  6.66 -0.8946212
## 74        7 64.6 4.95  6.98  5.25  1.0853788
## 75        7 64.6 4.95  9.00  4.39  3.1053788
## 76        7 64.6 4.95 12.05  3.53  6.1553788
## 77        7 64.6 4.95 24.22  1.15 18.3253788
## 78        8 70.5 4.53  0.00  0.00 -5.8946212
## 79        8 70.5 4.53  0.25  3.05 -5.6446212
## 80        8 70.5 4.53  0.52  3.05 -5.3746212
## 81        8 70.5 4.53  0.98  7.31 -4.9146212
## 82        8 70.5 4.53  2.02  7.56 -3.8746212
## 83        8 70.5 4.53  3.53  6.59 -2.3646212
## 84        8 70.5 4.53  5.05  5.88 -0.8446212
## 85        8 70.5 4.53  7.15  4.73  1.2553788
## 86        8 70.5 4.53  9.07  4.57  3.1753788
## 87        8 70.5 4.53 12.10  3.00  6.2053788
## 88        8 70.5 4.53 24.12  1.25 18.2253788
## 89        9 86.4 3.10  0.00  0.00 -5.8946212
## 90        9 86.4 3.10  0.30  7.37 -5.5946212
## 91        9 86.4 3.10  0.63  9.03 -5.2646212
## 92        9 86.4 3.10  1.05  7.14 -4.8446212
## 93        9 86.4 3.10  2.02  6.33 -3.8746212
## 94        9 86.4 3.10  3.53  5.66 -2.3646212
## 95        9 86.4 3.10  5.02  5.67 -0.8746212
## 96        9 86.4 3.10  7.17  4.24  1.2753788
## 97        9 86.4 3.10  8.80  4.11  2.9053788
## 98        9 86.4 3.10 11.60  3.16  5.7053788
## 99        9 86.4 3.10 24.43  1.12 18.5353788
## 100      10 58.2 5.50  0.00  0.24 -5.8946212
## 101      10 58.2 5.50  0.37  2.89 -5.5246212
## 102      10 58.2 5.50  0.77  5.22 -5.1246212
## 103      10 58.2 5.50  1.02  6.41 -4.8746212
## 104      10 58.2 5.50  2.05  7.83 -3.8446212
## 105      10 58.2 5.50  3.55 10.21 -2.3446212
## 106      10 58.2 5.50  5.05  9.18 -0.8446212
## 107      10 58.2 5.50  7.08  8.02  1.1853788
## 108      10 58.2 5.50  9.38  7.14  3.4853788
## 109      10 58.2 5.50 12.10  5.68  6.2053788
## 110      10 58.2 5.50 23.70  2.42 17.8053788
## 111      11 65.0 4.92  0.00  0.00 -5.8946212
## 112      11 65.0 4.92  0.25  4.86 -5.6446212
## 113      11 65.0 4.92  0.50  7.24 -5.3946212
## 114      11 65.0 4.92  0.98  8.00 -4.9146212
## 115      11 65.0 4.92  1.98  6.81 -3.9146212
## 116      11 65.0 4.92  3.60  5.87 -2.2946212
## 117      11 65.0 4.92  5.02  5.22 -0.8746212
## 118      11 65.0 4.92  7.03  4.45  1.1353788
## 119      11 65.0 4.92  9.03  3.62  3.1353788
## 120      11 65.0 4.92 12.12  2.69  6.2253788
## 121      11 65.0 4.92 24.08  0.86 18.1853788
## 122      12 60.5 5.30  0.00  0.00 -5.8946212
## 123      12 60.5 5.30  0.25  1.25 -5.6446212
## 124      12 60.5 5.30  0.50  3.96 -5.3946212
## 125      12 60.5 5.30  1.00  7.82 -4.8946212
## 126      12 60.5 5.30  2.00  9.72 -3.8946212
## 127      12 60.5 5.30  3.52  9.75 -2.3746212
## 128      12 60.5 5.30  5.07  8.57 -0.8246212
## 129      12 60.5 5.30  7.07  6.59  1.1753788
## 130      12 60.5 5.30  9.03  6.11  3.1353788
## 131      12 60.5 5.30 12.05  4.57  6.1553788
## 132      12 60.5 5.30 24.15  1.17 18.2553788
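The same call can also be written with the pipe operator %>% (re-exported by dplyr), which passes df into mutate() as its first argument; a minimal sketch:

```r
library(dplyr)
df <- data.frame(Theoph)

# df %>% mutate(...) is equivalent to mutate(df, ...)
piped <- df %>% mutate(trend = Time - mean(Time, na.rm = TRUE))
identical(piped, mutate(df, trend = Time - mean(Time, na.rm = TRUE)))
## [1] TRUE
```

Piping becomes especially handy once several verbs are chained, e.g. filter, then mutate, then arrange.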
###############
#             #
# Exercise 8  #
#             #
###############

mutate(df, weight_cat = ifelse(Wt < 69.85, "Welterweight",
                        ifelse(Wt < 72.57, "Light-middleweight",
                        ifelse(Wt < 76.2,  "middleweight",
                        ifelse(Wt > 76.2,  "Super-Middleweight", "Unknown")))))
##     Subject   Wt Dose  Time  conc         weight_cat
## 1         1 79.6 4.02  0.00  0.74 Super-Middleweight
## 2         1 79.6 4.02  0.25  2.84 Super-Middleweight
## 3         1 79.6 4.02  0.57  6.57 Super-Middleweight
## 4         1 79.6 4.02  1.12 10.50 Super-Middleweight
## 5         1 79.6 4.02  2.02  9.66 Super-Middleweight
## 6         1 79.6 4.02  3.82  8.58 Super-Middleweight
## 7         1 79.6 4.02  5.10  8.36 Super-Middleweight
## 8         1 79.6 4.02  7.03  7.47 Super-Middleweight
## 9         1 79.6 4.02  9.05  6.89 Super-Middleweight
## 10        1 79.6 4.02 12.12  5.94 Super-Middleweight
## 11        1 79.6 4.02 24.37  3.28 Super-Middleweight
## 12        2 72.4 4.40  0.00  0.00 Light-middleweight
## 13        2 72.4 4.40  0.27  1.72 Light-middleweight
## 14        2 72.4 4.40  0.52  7.91 Light-middleweight
## 15        2 72.4 4.40  1.00  8.31 Light-middleweight
## 16        2 72.4 4.40  1.92  8.33 Light-middleweight
## 17        2 72.4 4.40  3.50  6.85 Light-middleweight
## 18        2 72.4 4.40  5.02  6.08 Light-middleweight
## 19        2 72.4 4.40  7.03  5.40 Light-middleweight
## 20        2 72.4 4.40  9.00  4.55 Light-middleweight
## 21        2 72.4 4.40 12.00  3.01 Light-middleweight
## 22        2 72.4 4.40 24.30  0.90 Light-middleweight
## 23        3 70.5 4.53  0.00  0.00 Light-middleweight
## 24        3 70.5 4.53  0.27  4.40 Light-middleweight
## 25        3 70.5 4.53  0.58  6.90 Light-middleweight
## 26        3 70.5 4.53  1.02  8.20 Light-middleweight
## 27        3 70.5 4.53  2.02  7.80 Light-middleweight
## 28        3 70.5 4.53  3.62  7.50 Light-middleweight
## 29        3 70.5 4.53  5.08  6.20 Light-middleweight
## 30        3 70.5 4.53  7.07  5.30 Light-middleweight
## 31        3 70.5 4.53  9.00  4.90 Light-middleweight
## 32        3 70.5 4.53 12.15  3.70 Light-middleweight
## 33        3 70.5 4.53 24.17  1.05 Light-middleweight
## 34        4 72.7 4.40  0.00  0.00       middleweight
## 35        4 72.7 4.40  0.35  1.89       middleweight
## 36        4 72.7 4.40  0.60  4.60       middleweight
## 37        4 72.7 4.40  1.07  8.60       middleweight
## 38        4 72.7 4.40  2.13  8.38       middleweight
## 39        4 72.7 4.40  3.50  7.54       middleweight
## 40        4 72.7 4.40  5.02  6.88       middleweight
## 41        4 72.7 4.40  7.02  5.78       middleweight
## 42        4 72.7 4.40  9.02  5.33       middleweight
## 43        4 72.7 4.40 11.98  4.19       middleweight
## 44        4 72.7 4.40 24.65  1.15       middleweight
## 45        5 54.6 5.86  0.00  0.00       Welterweight
## 46        5 54.6 5.86  0.30  2.02       Welterweight
## 47        5 54.6 5.86  0.52  5.63       Welterweight
## 48        5 54.6 5.86  1.00 11.40       Welterweight
## 49        5 54.6 5.86  2.02  9.33       Welterweight
## 50        5 54.6 5.86  3.50  8.74       Welterweight
## 51        5 54.6 5.86  5.02  7.56       Welterweight
## 52        5 54.6 5.86  7.02  7.09       Welterweight
## 53        5 54.6 5.86  9.10  5.90       Welterweight
## 54        5 54.6 5.86 12.00  4.37       Welterweight
## 55        5 54.6 5.86 24.35  1.57       Welterweight
## 56        6 80.0 4.00  0.00  0.00 Super-Middleweight
## 57        6 80.0 4.00  0.27  1.29 Super-Middleweight
## 58        6 80.0 4.00  0.58  3.08 Super-Middleweight
## 59        6 80.0 4.00  1.15  6.44 Super-Middleweight
## 60        6 80.0 4.00  2.03  6.32 Super-Middleweight
## 61        6 80.0 4.00  3.57  5.53 Super-Middleweight
## 62        6 80.0 4.00  5.00  4.94 Super-Middleweight
## 63        6 80.0 4.00  7.00  4.02 Super-Middleweight
## 64        6 80.0 4.00  9.22  3.46 Super-Middleweight
## 65        6 80.0 4.00 12.10  2.78 Super-Middleweight
## 66        6 80.0 4.00 23.85  0.92 Super-Middleweight
## 67        7 64.6 4.95  0.00  0.15       Welterweight
## 68        7 64.6 4.95  0.25  0.85       Welterweight
## 69        7 64.6 4.95  0.50  2.35       Welterweight
## 70        7 64.6 4.95  1.02  5.02       Welterweight
## 71        7 64.6 4.95  2.02  6.58       Welterweight
## 72        7 64.6 4.95  3.48  7.09       Welterweight
## 73        7 64.6 4.95  5.00  6.66       Welterweight
## 74        7 64.6 4.95  6.98  5.25       Welterweight
## 75        7 64.6 4.95  9.00  4.39       Welterweight
## 76        7 64.6 4.95 12.05  3.53       Welterweight
## 77        7 64.6 4.95 24.22  1.15       Welterweight
## 78        8 70.5 4.53  0.00  0.00 Light-middleweight
## 79        8 70.5 4.53  0.25  3.05 Light-middleweight
## 80        8 70.5 4.53  0.52  3.05 Light-middleweight
## 81        8 70.5 4.53  0.98  7.31 Light-middleweight
## 82        8 70.5 4.53  2.02  7.56 Light-middleweight
## 83        8 70.5 4.53  3.53  6.59 Light-middleweight
## 84        8 70.5 4.53  5.05  5.88 Light-middleweight
## 85        8 70.5 4.53  7.15  4.73 Light-middleweight
## 86        8 70.5 4.53  9.07  4.57 Light-middleweight
## 87        8 70.5 4.53 12.10  3.00 Light-middleweight
## 88        8 70.5 4.53 24.12  1.25 Light-middleweight
## 89        9 86.4 3.10  0.00  0.00 Super-Middleweight
## 90        9 86.4 3.10  0.30  7.37 Super-Middleweight
## 91        9 86.4 3.10  0.63  9.03 Super-Middleweight
## 92        9 86.4 3.10  1.05  7.14 Super-Middleweight
## 93        9 86.4 3.10  2.02  6.33 Super-Middleweight
## 94        9 86.4 3.10  3.53  5.66 Super-Middleweight
## 95        9 86.4 3.10  5.02  5.67 Super-Middleweight
## 96        9 86.4 3.10  7.17  4.24 Super-Middleweight
## 97        9 86.4 3.10  8.80  4.11 Super-Middleweight
## 98        9 86.4 3.10 11.60  3.16 Super-Middleweight
## 99        9 86.4 3.10 24.43  1.12 Super-Middleweight
## 100      10 58.2 5.50  0.00  0.24       Welterweight
## 101      10 58.2 5.50  0.37  2.89       Welterweight
## 102      10 58.2 5.50  0.77  5.22       Welterweight
## 103      10 58.2 5.50  1.02  6.41       Welterweight
## 104      10 58.2 5.50  2.05  7.83       Welterweight
## 105      10 58.2 5.50  3.55 10.21       Welterweight
## 106      10 58.2 5.50  5.05  9.18       Welterweight
## 107      10 58.2 5.50  7.08  8.02       Welterweight
## 108      10 58.2 5.50  9.38  7.14       Welterweight
## 109      10 58.2 5.50 12.10  5.68       Welterweight
## 110      10 58.2 5.50 23.70  2.42       Welterweight
## 111      11 65.0 4.92  0.00  0.00       Welterweight
## 112      11 65.0 4.92  0.25  4.86       Welterweight
## 113      11 65.0 4.92  0.50  7.24       Welterweight
## 114      11 65.0 4.92  0.98  8.00       Welterweight
## 115      11 65.0 4.92  1.98  6.81       Welterweight
## 116      11 65.0 4.92  3.60  5.87       Welterweight
## 117      11 65.0 4.92  5.02  5.22       Welterweight
## 118      11 65.0 4.92  7.03  4.45       Welterweight
## 119      11 65.0 4.92  9.03  3.62       Welterweight
## 120      11 65.0 4.92 12.12  2.69       Welterweight
## 121      11 65.0 4.92 24.08  0.86       Welterweight
## 122      12 60.5 5.30  0.00  0.00       Welterweight
## 123      12 60.5 5.30  0.25  1.25       Welterweight
## 124      12 60.5 5.30  0.50  3.96       Welterweight
## 125      12 60.5 5.30  1.00  7.82       Welterweight
## 126      12 60.5 5.30  2.00  9.72       Welterweight
## 127      12 60.5 5.30  3.52  9.75       Welterweight
## 128      12 60.5 5.30  5.07  8.57       Welterweight
## 129      12 60.5 5.30  7.07  6.59       Welterweight
## 130      12 60.5 5.30  9.03  6.11       Welterweight
## 131      12 60.5 5.30 12.05  4.57       Welterweight
## 132      12 60.5 5.30 24.15  1.17       Welterweight
###############
#             #
# Exercise 9  #
#             #
###############

# Store the Exercise 8 result first so that it can be grouped
df_2=mutate(df, weight_cat = ifelse(Wt < 69.85, "Welterweight",
                             ifelse(Wt < 72.57, "Light-middleweight",
                             ifelse(Wt < 76.2,  "middleweight",
                             ifelse(Wt > 76.2,  "Super-Middleweight", "Unknown")))))
weight_group=group_by(df_2,weight_cat)
###############
#             #
# Exercise 10 #
#             #
###############



summarize(weight_group,Avg_Time=mean(Time),Total_Dose=sum(Dose))
## # A tibble: 4 × 3
##           weight_cat Avg_Time Total_Dose
##                <chr>    <dbl>      <dbl>
## 1 Light-middleweight 5.888788     148.06
## 2       middleweight 5.940000      48.40
## 3 Super-Middleweight 5.902121     122.32
## 4       Welterweight 5.884545     291.83
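As a usage note, the mutate/group_by/summarize steps above can also be chained with dplyr's %>% pipe operator, which avoids naming intermediate variables. This is just a sketch of the same computation:

df %>%
  mutate(weight_cat = ifelse(Wt < 69.85, "Welterweight",
                      ifelse(Wt < 72.57, "Light-middleweight",
                      ifelse(Wt < 76.2,  "middleweight", "Super-Middleweight")))) %>%
  group_by(weight_cat) %>%
  summarize(Avg_Time = mean(Time), Total_Dose = sum(Dose))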



Tidy the data up!

The tidyr package makes it easier to tidy your data. Common uses include separating a column on a separator, uniting two columns into one, and stacking or unstacking column categories (reshaping between long and wide formats).
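As a quick sketch of the two reshaping verbs (assuming tidyr is loaded; the data here is made up for illustration):

wide = data.frame(name = c("A", "B"), y2014 = c(1, 2), y2015 = c(3, 4))
long = gather(wide, year, followers, y2014:y2015)  # wide -> long: columns name, year, followers
spread(long, year, followers)                      # long -> back to the original wide shape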

Answers to the exercises are available here.

If you obtained a different (correct) answer than those listed on the solutions page, please feel free to post your answer as a comment on that page.

Learn more about tidyr in the online course R Data Pre-Processing & Data Management – Shape your Data! This course has 5 lectures on tidyr, in addition to lectures on importing data, filtering and querying using dplyr, and SQL.

Exercise 1
Install and load the package tidyr. Copy and paste this code to get df

First=c("Josh","Emily","Clark","Barack")
Last=c("Blah","Choo","Kent","Obama")
age=c(21,24,30,50)
jobs=c("programmer/Athlete","Blogger/Doctor","Reporter/Crime Fighter","President/commander in chief")
hometown=c("san-antonio,texas","corona,california","smallville,Kansas","chicago,illinois")

df= data.frame(First,Last,age,jobs,hometown)

 

Exercise 2

Use the separate() function to separate jobs into First_job and Second_job using the appropriate separator. Store the result back into df.

Exercise 3

Combine the First and Last columns, with one space between them, to form the column "Name". Use the unite() function. Store the result back into df.

Exercise 4
Use the separate() function to separate the hometown column into City and State using the appropriate separator. Store the result back into df.

Exercise 5

Add this code

 

name=c("Josh","Emily","Clark","Barack")
x2014=c("32131","41231","51341","12312")
x2015=c("2132131","12341231","12351341","12312312")
x2016=c("3122132131","512341231","12312351341","4412312312")
inst_followers=data.frame(name,x2014,x2015,x2016)

 

View the inst_followers data frame using the str() command.

Exercise 6

Stack the year columns of inst_followers from wide to long format using the gather() command. The data frame should now have only the columns name, year and followers. Store the result back in inst_followers.

Exercise 7

Unstack the columns and spread the year categories of the new dataframe out into columns, bringing it back to the original wide format. Use the spread() command.

Exercise 8

Type this code:

Month=c("Jan","Feb","Mar","Apr")
stock_A=c(1854.8,1850.3,1812.1,1835.3)
stock_B=c(1854.8,1850.3,1812.1,1835.3)
stocks=data.frame(Month,stock_A,stock_B)
stocks

Exercise 9

Use the gather() command to gather stock_A and stock_B. The columns should now look like | Month | Stock | Earnings |.
Store this in stocks.

Exercise 10

Use the spread() command to bring the stocks data frame back to its original shape.




Tidy the Data Up Solutions

Below are the solutions to these exercises on tidying data with tidyr.

 

###############
#             #
# Exercise 1  #
#             #
###############

library(tidyr)
First=c("Josh","Emily","Clark","Barack")
Last=c("Blah","Choo","Kent","Obama")
age=c(21,24,30,50)
jobs=c("programmer/Athlete","Blogger/Doctor","Reporter/Crime Fighter","President/commander in chief")
hometown=c("san-antonio,texas","corona,california","smallville,Kansas","chicago,illinois")

df= data.frame(First,Last,age,jobs,hometown)


###############
#             #
# Exercise 2  #
#             #
###############
df=separate(df,jobs,c("First_job","Second_job"),sep="/")

###############
#             #
# Exercise 3  #
#             #
###############

df=unite(df,"Name",c(First,Last),sep=" ")

###############
#             #
# Exercise 4  #
#             #
###############

df=separate(df,hometown,c("City","State"),sep=",")

###############
#             #
# Exercise 5  #
#             #
###############

name=c("Josh","Emily","Clark","Barack")
x2014=c("32131","41231","51341","12312")
x2015=c("2132131","12341231","12351341","12312312")
x2016=c("3122132131","512341231","12312351341","4412312312")
inst_followers=data.frame(name,x2014,x2015,x2016)
str(inst_followers)
## 'data.frame':	4 obs. of  4 variables:
##  $ name : Factor w/ 4 levels "Barack","Clark",..: 4 3 2 1
##  $ x2014: Factor w/ 4 levels "12312","32131",..: 2 3 4 1
##  $ x2015: Factor w/ 4 levels "12312312","12341231",..: 4 2 3 1
##  $ x2016: Factor w/ 4 levels "12312351341",..: 2 4 1 3
###############
#             #
# Exercise 6  #
#             #
###############
inst_followers=gather(inst_followers,year,followers,x2014:x2016)
## Warning: attributes are not identical across measure variables; they will
## be dropped
###############
#             #
# Exercise 7  #
#             #
###############
spread(inst_followers,year,followers)
##     name x2014    x2015       x2016
## 1 Barack 12312 12312312  4412312312
## 2  Clark 51341 12351341 12312351341
## 3  Emily 41231 12341231   512341231
## 4   Josh 32131  2132131  3122132131
###############
#             #
# Exercise 8  #
#             #
###############

Month=c("Jan","Feb","Mar","Apr")
stock_A=c(1854.8,1850.3,1812.1,1835.3)
stock_B=c(1854.8,1850.3,1812.1,1835.3)
stocks=data.frame(Month,stock_A,stock_B)
stocks
##   Month stock_A stock_B
## 1   Jan  1854.8  1854.8
## 2   Feb  1850.3  1850.3
## 3   Mar  1812.1  1812.1
## 4   Apr  1835.3  1835.3
###############
#             #
# Exercise 9  #
#             #
###############

stocks=gather(stocks,stock,earnings,stock_A:stock_B)


###############
#             #
# Exercise 10 #
#             #
###############

spread(stocks,stock,earnings)
##   Month stock_A stock_B
## 1   Apr  1835.3  1835.3
## 2   Feb  1850.3  1850.3
## 3   Jan  1854.8  1854.8
## 4   Mar  1812.1  1812.1



Intermediate Tree 2

This is a continuation of the intermediate decision tree exercise.

Answers to the exercises are available here.

If you obtained a different (correct) answer than those listed on the solutions page, please feel free to post your answer as a comment on that page.

 

Exercise 1
Use the predict() command to make predictions on the Test data. Set the type argument to "class", which returns classifications instead of probability scores. Store this prediction in pred_dec.

Exercise 2
Print out the confusion matrix.

Exercise 3
What is the accuracy of the model? Use the confusion matrix.

Exercise 4
What is the misclassification error rate? Refer to the Basic Decision Tree exercise for the formula.

Exercise 5
Let's say we want a baseline model against which to compare our prediction improvement. We create a base model using this code:


length(Test$class)
base=rep(1,3183)

Use the table() command to create a confusion matrix between the base and Test$class.

Learn more about decision trees in the online courses

Exercise 6
What is the difference in accuracy between the dec model and the base model? Use the confusion matrices.

Exercise 7

Remember the predict() command from question 1. We will use the same model and the same command, except we will set the type to "prob". This gives us probability estimates. Store these in pred_dec_reg.

Exercise 8
Load the ROCR package.

Use the prediction(), performance() and plot() command to print the ROC curve. Use pred_dec_reg variable from Q7. You can also refer to the previous exercise to see the code.

Exercise 9
plot() the same ROC curve but set colorize=TRUE

Exercise 10
Comment on your findings using the ROC curve and accuracy. Is it a good model? Did you notice that the ROCR prediction() command only takes probability predictions as one of its arguments? Why is that so?




Intermediate Tree 2

Below are the solutions to these intermediate exercises on decision trees (part 2).

df=read.csv("C:/Contract/adult.csv",header=FALSE)
library(caTools)
colnames(df)=c("age","workclass","fnlwgt","education","education-num","marital-status","occupation","relationhip","race","sex","capital-gain","capital-loss","hours-per-week","native-country","class")
df$class=ifelse(df$class==" >50K", 1, 0)
df$class=as.factor(df$class)
set.seed(1000)

split=sample.split(df$class, SplitRatio=0.8)

Train=df[split==TRUE,]

Test=df[split==FALSE,]
library(rpart)
library(rpart.plot)
## Warning: package 'rpart.plot' was built under R version 3.3.2
dec=rpart(class~., data=Train)
par(mar = rep(2, 4))



###############
#             #
# Exercise 1  #
#             #
###############

pred_dec=predict(dec,newdata = Test,type="class")


###############
#             #
# Exercise 2  #
#             #
###############
table(Test$class,pred_dec)
##    pred_dec
##        0    1
##   0 2345   74
##   1  400  364
###############
#             #
# Exercise 3  #
#             #
###############

(2345+364)/(2345+364+400+74)
## [1] 0.8510839
###############
#             #
# Exercise 4  #
#             #
###############

mean(Test$class!=pred_dec)
## [1] 0.1489161
###############
#             #
# Exercise 5  #
#             #
###############

length(Test$class)
## [1] 3183
base=rep(1,3183)
table(Test$class,base)
##    base
##        1
##   0 2419
##   1  764
###############
#             #
# Exercise 6  #
#             #
###############

# dec accuracy: (2345+364)/3183 = 0.851; base accuracy: 764/3183 = 0.240
(2345+364)/3183 - 764/3183
## [1] 0.6110588
###############
#             #
# Exercise 7  #
#             #
###############

pred_dec_reg=predict(dec,newdata = Test,type="prob")

###############
#             #
# Exercise 8  #
#             #
###############

library(ROCR)  # needed for prediction() and performance()
pred=prediction(pred_dec_reg[,2],Test$class)
perf=performance(pred,"tpr","fpr")
plot(perf)
###############
#             #
# Exercise 9  #
#             #
###############

plot(perf,colorize=TRUE)
###############
#             #
# Exercise 10 #
#             #
###############

#It is a good model. The initial accuracy is 0.85, which is pretty good. The ROC curve also leans towards the true-positive side, which is another good sign. The ROCR prediction() command takes probability score predictions because it is used to give a visual representation of a range of threshold values. We can also use the ROC curve to decide which threshold value to choose, trading off true positives against false positives based on the problem at hand. That is for another exercise.#
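As a hedged illustration of that last point, ROCR can also compute accuracy across all cutoffs, which is one way to pick a threshold. This sketch reuses the pred object from Exercise 8; the variable names here are made up:

acc_perf = performance(pred, "acc")         # accuracy at every cutoff
best = which.max(acc_perf@y.values[[1]])    # index of the highest accuracy
acc_perf@x.values[[1]][best]                # the cutoff that achieves it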



ROC curves

Every dataset comes with unique problems, but the essence is understanding how to evaluate models. Today we will briefly touch on ROC curves.

Answers to the exercises are available here.

If you obtained a different (correct) answer than those listed on the solutions page, please feel free to post your answer as a comment on that page.

Learn more about evaluating the performance of classification models in the online course Machine Learning A-Z™: Hands-On Python & R In Data Science. It includes 5 lectures (32 mins in total) on false positives/false negatives, the confusion matrix, accuracy paradox, and CAP curve analysis.

Exercise 1

ROC is an interesting topic. The fastest way to learn ROC is to look at different models and see how their ROC curves differ. First we are going to load the ChickWeight dataset, and then we are going to run different models to compare the change in the ROC curve. If you do not have the packages, use the install.packages() command to install them. Note the steps as well. We are essentially trying to predict whether a chicken was fed diet 1 or not. We do a 70:30 split and then use a glm logistic regression model to make the prediction.

Write this code in R:


library(ROCR)
library(caTools)
attach(ChickWeight)
df=ChickWeight
df$Diet=ifelse(df$Diet==1,1,0)
set.seed(100)
spl=sample.split(df$Diet,SplitRatio = 0.7)
Train=df[spl==TRUE,]
Test=df[spl==FALSE,]
model=glm(Diet~.,data=Train,family = binomial)
pred=predict(model,newdata = Test)

Exercise 2

Use the table() command to print the confusion matrix. Hint: use >0.5 to convert the predictions into classes. What is the accuracy of the model?

Exercise 3

We want to plot the ROC curve now. Write this code down in R.

ROCRpred1=prediction(pred,Test$Diet)
ROCRperf1=performance(ROCRpred1,"tpr","fpr")

Now use the plot() command to plot ROCRperf1.

Exercise 4

What are your observations of the curve? Hint: explain in terms of tpr and fpr and the direction the curve follows.

Exercise 5

Create a baseline model:

a. Use the nrow() command to see the length of the Test variable. Store this in a variable 'a'.
b. Use the rep() command to create a list of 1's; the number of instances should be equal to your answer in a.
c. Store the result of b in the variable rd.

What this essentially does is create a variable that has roughly a 50/50 chance of getting the answer right, or slightly better or worse depending on the distribution of the Diet variable. This is equivalent to a model with an accuracy of about 50 percent, almost by random chance.

Exercise 6

Using the same code structure as Q3, add the changes needed for the baseline and plot the baseline curve. Hint: use rd instead of pred.

Exercise 7

What do you observe from the curve and explain your findings?

Exercise 8
model1 had a high accuracy rate because the chicken number was 100 percent correlated with diet. You can see that by pure observation of the dataset. But what if we created an average model? We can do this by using weight as the only predictor.

Write the code below to create model2


model2=glm(Diet~weight,data=Train,family = binomial)
pred2=predict(model2,newdata = Test)

Exercise 9
Using the table() function again, create a confusion matrix between Test$Diet and pred2. Note the accuracy.

Exercise 10
Using the same code structure as Q3, write down code to plot the ROC curve of model2, and set add=TRUE as one of the arguments so that it plots the new curve on top of the baseline. How does it compare with model1 and the baseline model? Explain in terms of tpr and fpr.




ROC curves

Below are the solutions to these exercises on ROC curves.

###############
#             #
# Exercise 1  #
#             #
###############
library(ROCR)
library(caTools)
attach(ChickWeight)
df=ChickWeight
df$Diet=ifelse(df$Diet==1,1,0)
set.seed(100)
spl=sample.split(df$Diet,SplitRatio = 0.7)
Train=df[spl==TRUE,]
Test=df[spl==FALSE,]
model=glm(Diet~.,data=Train,family = binomial)
## Warning: glm.fit: algorithm did not converge
pred=predict(model,newdata = Test)



###############
#             #
# Exercise 2  #
#             #
###############

table(Test$Diet,pred>0.5)
##    
##     FALSE TRUE
##   0   107    0
##   1     0   66
##100 percent accuracy##

###############
#             #
# Exercise 3  #
#             #
###############

ROCRpred1=prediction(pred,Test$Diet)
ROCRperf1=performance(ROCRpred1,"tpr","fpr")
plot(ROCRperf1)
###############
#             #
# Exercise 4  #
#             #
###############

##Ideally we want the maximum true positive rate and the minimum false positive rate. As this is a 100 percent accurate model, we expect the curve's slope to be as vertical as possible##

###############
#             #
# Exercise 5  #
#             #
###############


a=nrow(Test)
rd=rep(1,a)


###############
#             #
# Exercise 6  #
#             #
###############
table(Test$Diet,rd)
##    rd
##       1
##   0 107
##   1  66
ROCRpredbase=prediction(rd,Test$Diet)
ROCRperfbase=performance(ROCRpredbase,"tpr","fpr")
plot(ROCRperfbase)
###############
#             #
# Exercise 7  #
#             #
###############

##For any model with 50 percent accuracy, or a 50/50 chance of getting it right, we expect to see a curve that runs diagonally between the tpr and fpr axes. In other words, when the tpr is 0.50, the fpr is also 0.50##

###############
#             #
# Exercise 8  #
#             #
###############

model2=glm(Diet~weight+Time,data=Train,family = binomial)
pred2=predict(model2,newdata = Test)



###############
#             #
# Exercise 9  #
#             #
###############

table(Test$Diet,pred2>0.5)
##    
##     FALSE TRUE
##   0   106    1
##   1    54   12
###############
#             #
# Exercise 10 #
#             #
###############

ROCRpred2=prediction(pred2,Test$Diet)
ROCRperf2=performance(ROCRpred2,"tpr","fpr")
plot(ROCRperf2,add=TRUE)


##For a tpr less than 0.2, the model performs worse than the baseline model. For a tpr greater than 0.2, the model appears to perform slightly better than the baseline. When tpr and fpr are both 0.5, the model performs better than the baseline model with an accuracy of 0.61. However, the model is worse than the first model.##



Intermediate Tree 1

If you followed through the Basic Decision Tree exercise, this should be useful for you. This is like a continuation, but we add so much more. We are working with a bigger and badder dataset. We will also be using techniques we learned from model evaluation, working with ROC, accuracy and other metrics.

Answers to the exercises are available here.

If you obtained a different (correct) answer than those listed on the solutions page, please feel free to post your answer as a comment on that page.

Exercise 1
Read in the adult.csv file with header=FALSE. Store this in df. Use the str() command to see the data frame. Download the data from here.

Exercise 2
You are given the metadata that goes with the CSV, which you can download here. Use it to add the column names to your data frame. Notice that df's columns are ordered V1, V2, V3, and so on. As a side note, it is always best practice to use the metadata to check that all the columns were read in correctly.

Exercise 3
Use the table command and print out the distribution of the class feature.

Exercise 4
Change the class column to binary.

Learn more about decision trees in the online courses

Exercise 5
Use the cor() command to see the correlation of all the numeric and integer columns, including the class column. Remember that numbers close to 1 mean high correlation and numbers close to 0 mean low correlation. This will give you a rough idea for feature selection.

Exercise 6
Split the dataset into Train and Test samples. You may use sample.split() with a ratio of 0.8, and set the seed to 1000. Make sure to install and load the caTools package.

Exercise 7
Check the number of rows of Train
Check the number of rows of Test

Exercise 8
We are ready to use a decision tree on our dataset. Load the packages "rpart" and "rpart.plot". If they are not installed, use the install.packages() command.

Exercise 9
Use rpart to build the decision tree on the Train set. Include all features. Store this model in dec.

Exercise 10
Use the prp() function to plot the decision tree. If you get an error, run this code before the prp() command:

par(mar = rep(2, 4))




Intermediate Tree 1

Below are the solutions to these intermediate exercises on decision trees.

Learn more about decisions tree’s in the online courses Regression Machine Learning with R and Machine Learning A-Z™: Hands-On Python & R In Data Science

###############
#             #
# Exercise 1  #
#             #
###############

df=read.csv("C:/Contract/adult.csv",header=FALSE)
str(df)
## 'data.frame':	15916 obs. of  15 variables:
##  $ V1 : int  39 50 38 53 28 37 49 52 31 42 ...
##  $ V2 : Factor w/ 9 levels " ?"," Federal-gov",..: 8 7 5 5 5 5 5 7 5 5 ...
##  $ V3 : int  77516 83311 215646 234721 338409 284582 160187 209642 45781 159449 ...
##  $ V4 : Factor w/ 16 levels " 10th"," 11th",..: 10 10 12 2 10 13 7 12 13 10 ...
##  $ V5 : int  13 13 9 7 13 14 5 9 14 13 ...
##  $ V6 : Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 3 1 3 3 3 4 3 5 3 ...
##  $ V7 : Factor w/ 15 levels " ?"," Adm-clerical",..: 2 5 7 7 11 5 9 5 11 5 ...
##  $ V8 : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 1 2 1 6 6 2 1 2 1 ...
##  $ V9 : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 5 3 3 5 3 5 5 5 ...
##  $ V10: Factor w/ 2 levels " Female"," Male": 2 2 2 2 1 1 1 2 1 2 ...
##  $ V11: int  2174 0 0 0 0 0 0 0 14084 5178 ...
##  $ V12: int  0 0 0 0 0 0 0 0 0 0 ...
##  $ V13: int  40 13 40 40 40 40 16 45 50 40 ...
##  $ V14: Factor w/ 41 levels " ?"," Cambodia",..: 39 39 39 39 6 39 23 39 39 39 ...
##  $ V15: Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 1 1 2 2 2 ...
###############
#             #
# Exercise 2  #
#             #
###############

colnames(df)=c("age","workclass","fnlwgt","education","education-num","marital-status","occupation","relationhip","race","sex","capital-gain","capital-loss","hours-per-week","native-country","class")

###############
#             #
# Exercise 3  #
#             #
###############

table(df$class)
## 
##  <=50K   >50K 
##  12097   3819
###############
#             #
# Exercise 4  #
#             #
###############

df$class=ifelse(df$class==" >50K", 1, 0)

###############
#             #
# Exercise 5  #
#             #
###############

cor(df[,c(1,3,5,11,12,13,15)])
##                        age       fnlwgt education-num capital-gain
## age             1.00000000 -0.079506361    0.02668698  0.066466487
## fnlwgt         -0.07950636  1.000000000   -0.04671504  0.000653693
## education-num   0.02668698 -0.046715043    1.00000000  0.117453069
## capital-gain    0.06646649  0.000653693    0.11745307  1.000000000
## capital-loss    0.06176551 -0.012139341    0.08090257 -0.031685331
## hours-per-week  0.05659864 -0.012345724    0.14528405  0.075715672
## class           0.22920766 -0.013067759    0.32856870  0.221049951
##                capital-loss hours-per-week       class
## age              0.06176551     0.05659864  0.22920766
## fnlwgt          -0.01213934    -0.01234572 -0.01306776
## education-num    0.08090257     0.14528405  0.32856870
## capital-gain    -0.03168533     0.07571567  0.22104995
## capital-loss     1.00000000     0.05439109  0.15366554
## hours-per-week   0.05439109     1.00000000  0.22544319
## class            0.15366554     0.22544319  1.00000000
###############
#             #
# Exercise 6  #
#             #
###############

library(caTools)  # provides sample.split()
set.seed(1000)

split=sample.split(df$class, SplitRatio=0.8)

Train=df[split==TRUE,]

Test=df[split==FALSE,]
###############
#             #
# Exercise 7  #
#             #
###############

nrow(Train)
## [1] 12733
nrow(Test)
## [1] 3183
###############
#             #
# Exercise 8  #
#             #
###############

library(rpart)
library(rpart.plot)
## Warning: package 'rpart.plot' was built under R version 3.3.2
###############
#             #
# Exercise 9  #
#             #
###############

dec=rpart(class~., data=Train)
###############
#             #
# Exercise 10 #
#             #
###############

par(mar = rep(2, 4))

prp(dec)