Experimental Design Exercises

In this set of exercises we shall follow the practice of conducting an experimental study. A researcher wants to see whether working out has any influence on body mass. Three groups of subjects with similar eating and exercise habits were included in the experiment, and each group was subjected to a different set of exercises. Body mass was measured before and after the workout program. The focus of the research is the difference in body mass between groups, measured after working out. To examine these effects, we shall use the paired t test, the t test for independent samples, one-way and two-way analysis of variance, and analysis of covariance.

You can download the dataset here. The data is fictitious.

Answers to the exercises are available here.

If you have a different solution, feel free to post it.

Exercise 1

Load the data. Calculate descriptive statistics and test the normality of both the initial and final measurements, for the whole sample and for each group.

Exercise 2

Is there an effect of the exercises, and what is the size of that effect for each group? (Tip: You should use the paired t test.)
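As a rough sketch of this tip, a paired t test and an effect size can be computed along the following lines. The `before` and `after` vectors are simulated for illustration and are not the exercise data. Eta squared is taken here as t^2 / (t^2 + df), which is one common way to derive it from a t statistic.

```r
# Simulated before/after measurements for one hypothetical group of 20 subjects
set.seed(42)
before <- rnorm(20, mean = 68, sd = 7)
after  <- before - rnorm(20, mean = 2, sd = 1.5)  # assumed average loss of about 2 kg

# Paired t test on the within-subject differences
res <- t.test(before, after, paired = TRUE)

# Eta squared derived from the t statistic and its degrees of freedom
t_val  <- unname(res$statistic)
df_val <- unname(res$parameter)
eta_sq <- t_val^2 / (t_val^2 + df_val)

cat(sprintf("t = %.3f, df = %d, p = %.4f, eta^2 = %.3f\n",
            t_val, as.integer(df_val), res$p.value, eta_sq))
```

With real data, the same steps would be repeated once per group.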

Exercise 3

Is the variance of body mass at the final measurement the same in each of the three groups? (Tip: Use Levene’s test for homogeneity of variances.)
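To see what Levene's test does, it can help to compute it by hand. Centred on the group means, the test is a one-way ANOVA on the absolute deviations of each observation from its group mean. The sketch below uses simulated data and hypothetical names (`toy`, `abs_dev`) and needs only base R.

```r
# Simulated body mass for three hypothetical groups of 10 subjects each
set.seed(1)
toy <- data.frame(
  group = factor(rep(1:3, each = 10)),
  mass  = rnorm(30, mean = 65, sd = 8)
)

# Absolute deviation of each observation from its own group mean
toy$abs_dev <- abs(toy$mass - ave(toy$mass, toy$group))

# One-way ANOVA on the absolute deviations; the F test for `group`
# corresponds to Levene's test centred on the group means
levene_by_hand <- anova(lm(abs_dev ~ group, data = toy))
print(levene_by_hand)
```

A large p-value in the `group` row gives no evidence against equal variances.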

Exercise 4

Is there a difference between groups at the final measurement, and what is the effect size? (Tip: Use one-way ANOVA.)

Exercise 5

Between which groups does the difference in body mass appear after working out? (Tip: Conduct a post-hoc test.)

Exercise 6

What is the impact of age and workout program on body mass at the final measurement? (Tip: Use two-way between-groups ANOVA.)

Exercise 7

What is the origin of the effect of the workout program between subjects of different ages? (Tip: You should conduct a post-hoc test.)

Exercise 8

Is there a linear relationship between the initial and final measurements of body mass in each group?

Exercise 9

Is there a significant difference in body mass on final measurement between groups, while controlling for initial measurement?

Exercise 10

How much of the variance is explained by the independent variable? How much of the variance is explained by the covariate?




Experimental Design Solutions

Below are the solutions to these exercises on experimental design.

####################
#                  #
#    Exercise 1    #
#                  #
####################

data <- read.csv("experimental-design.csv")
data$group <- as.factor(data$group)
data$age <- as.factor(data$age)
summary(data$initial_mass)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   53.50   62.18   68.90   67.70   72.27   86.00
summary(data$final_mass)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   50.60   60.08   64.45   65.44   72.00   81.30
shapiro.test(data$initial_mass)
## 
## 	Shapiro-Wilk normality test
## 
## data:  data$initial_mass
## W = 0.98306, p-value = 0.5053
shapiro.test(data$final_mass)
## 
## 	Shapiro-Wilk normality test
## 
## data:  data$final_mass
## W = 0.97517, p-value = 0.2073
sapply(split(data$initial_mass, data$group), summary)
##             1     2     3
## Min.    53.50 53.50 54.50
## 1st Qu. 63.40 62.00 62.18
## Median  68.20 69.65 68.25
## Mean    67.24 68.26 67.58
## 3rd Qu. 70.00 73.00 72.72
## Max.    86.00 82.00 81.00
sapply(split(data$final_mass, data$group), summary)
##             1     2     3
## Min.    50.60 52.00 51.00
## 1st Qu. 60.08 60.25 61.00
## Median  62.95 70.50 65.00
## Mean    62.43 68.64 65.25
## 3rd Qu. 64.62 75.00 72.00
## Max.    81.30 81.00 77.00
sapply(split(data$initial_mass, data$group), shapiro.test)
##           1                             2                            
## statistic 0.9561618                     0.9735745                    
## p.value   0.415793                      0.7920937                    
## method    "Shapiro-Wilk normality test" "Shapiro-Wilk normality test"
## data.name "X[[i]]"                      "X[[i]]"                     
##           3                            
## statistic 0.9777148                    
## p.value   0.8768215                    
## method    "Shapiro-Wilk normality test"
## data.name "X[[i]]"
sapply(split(data$final_mass, data$group), shapiro.test)
##           1                             2                            
## statistic 0.9447748                     0.9231407                    
## p.value   0.2479696                     0.08832153                   
## method    "Shapiro-Wilk normality test" "Shapiro-Wilk normality test"
## data.name "X[[i]]"                      "X[[i]]"                     
##           3                            
## statistic 0.9453135                    
## p.value   0.2543017                    
## method    "Shapiro-Wilk normality test"
## data.name "X[[i]]"
####################
#                  #
#    Exercise 2    #
#                  #
####################


invisible(sapply(split(data, data$group), function(x) {
  # paired t test of initial vs. final body mass within one group
  t <- t.test(x$initial_mass, x$final_mass, paired = TRUE)
  # eta squared from the t statistic and its degrees of freedom: t^2 / (t^2 + df)
  cat(sprintf("Group %s\nstatistic=%.3f\ndf=%d\np=%.3f\neta^2=%.3f\n\n",
              unique(x$group), t$statistic, as.integer(t$parameter), t$p.value,
              t$statistic^2 / (t$statistic^2 + t$parameter)))
}))
## Group 1
## statistic=7.474
## df=21
## p=0.000
## eta^2=0.727
## 
## Group 2
## statistic=-0.687
## df=21
## p=0.500
## eta^2=0.022
## 
## Group 3
## statistic=4.372
## df=21
## p=0.000
## eta^2=0.477
## 

####################
#                  #
#    Exercise 3    #
#                  #
####################

library("car")
leveneTest(data$final_mass, data$group, center=mean)
## Levene's Test for Homogeneity of Variance (center = mean)
##       Df F value Pr(>F)
## group  2   1.232 0.2986
##       63
####################
#                  #
#    Exercise 4    #
#                  #
####################

print(summary(aov(final_mass ~ group, data)) -> f)
##             Df Sum Sq Mean Sq F value Pr(>F)  
## group        2    425  212.64   3.626 0.0323 *
## Residuals   63   3694   58.64                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
ss = f[[1]]$'Sum Sq'
paste("eta squared=", round(ss[1] / (ss[1]+ss[2]), 3))
## [1] "eta squared= 0.103"
####################
#                  #
#    Exercise 5    #
#                  #
####################

summary(f <- aov(final_mass ~ group, data))
##             Df Sum Sq Mean Sq F value Pr(>F)  
## group        2    425  212.64   3.626 0.0323 *
## Residuals   63   3694   58.64                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(f, "group")
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = final_mass ~ group, data = data)
## 
## $group
##          diff        lwr       upr     p adj
## 2-1  6.209091  0.6669343 11.751248 0.0245055
## 3-1  2.818182 -2.7239748  8.360338 0.4455950
## 3-2 -3.390909 -8.9330657  2.151248 0.3128050
# a significant difference appears only between the 1st and 2nd groups (p < 0.05)

####################
#                  #
#    Exercise 6    #
#                  #
####################

options(contrasts = c("contr.helmert", "contr.poly"))
m.lm <- lm(final_mass ~ age + group + age*group, data=data)
print(m.anova <- Anova(m.lm, type=3))
## Anova Table (Type III tests)
## 
## Response: final_mass
##             Sum Sq Df   F value    Pr(>F)    
## (Intercept) 267773  1 8536.0541 < 2.2e-16 ***
## age           1388  2   22.1282 7.725e-08 ***
## group          186  2    2.9678  0.059415 .  
## age:group      564  4    4.4981  0.003152 ** 
## Residuals     1788 57                        
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# partial eta squared for age, group and age:group: SS_effect / (SS_effect + SS_residuals)
m.anova[[1]][2:4] / (m.anova[[1]][2:4]+m.anova[[1]][5])
## [1] 0.43707242 0.09431139 0.23992294
####################
#                  #
#    Exercise 7    #
#                  #
####################

m.aov <- aov(final_mass ~ age + group +  age*group, data)
TukeyHSD(x=m.aov, "age")
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = final_mass ~ age + group + age * group, data = data)
## 
## $age
##                       diff        lwr       upr     p adj
## old-middle-age     6.69775   2.382683 11.012817 0.0012494
## young-middle-age  -5.75200  -9.564155 -1.939845 0.0017300
## young-old        -12.44975 -16.764817 -8.134683 0.0000000
# there is a significant difference between all pairs of age groups

####################
#                  #
#    Exercise 8    #
#                  #
####################

library(lattice)
xyplot(initial_mass ~ final_mass | group, data = data, panel=function(x, y, ...)
  {
  panel.xyplot(x, y, ...)
  panel.lmline(x, y, ...)
})
# [Figure: Linearity between initial and final mass]
####################
#                  #
#    Exercise 9    #
#                  #
####################

model.1 = lm(final_mass~initial_mass, data=data)
model.2 = lm(final_mass~initial_mass+group, data=data)
anova(model.1, model.2)
## Analysis of Variance Table
## 
## Model 1: final_mass ~ initial_mass
## Model 2: final_mass ~ initial_mass + group
##   Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
## 1     64 746.24                                  
## 2     62 442.63  2    303.62 21.264 9.286e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# there is a difference between groups when controlling for the initial measurement (p < 0.05)

####################
#                  #
#    Exercise 10   #
#                  #
####################

model.3 = lm(final_mass~initial_mass+group, data=data)
library(heplots)
etasq(model.3, anova=TRUE)
## Anova Table (Type II tests)
## 
## Response: final_mass
##              Partial eta^2 Sum Sq Df F value    Pr(>F)    
## initial_mass       0.88019 3251.8  1 455.495 < 2.2e-16 ***
## group              0.40686  303.6  2  21.264 9.286e-08 ***
## Residuals                   442.6 62                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# partial eta squared: the independent variable (group) explains about 41% and the
# covariate (initial_mass) about 88% of the variance not accounted for by the other term



Data Structures Exercises

There are five important basic data structures in R: vector, matrix, array, list and data frame. They can be one-dimensional (vector and list), two-dimensional (matrix and data frame) or multidimensional (array). They also differ in the homogeneity of the elements they can contain: all elements of a vector, matrix or array must be of the same type, whereas a list or data frame can contain multiple types.
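The distinction between homogeneous and heterogeneous structures is easiest to see by mixing types on purpose. The following is a minimal sketch with made-up objects; the names `v`, `m`, `a`, `l` and `d` are only illustrative and are unrelated to the objects the exercises ask you to create.

```r
# Atomic structures hold a single type: mixing types coerces every element
v <- c(1, "a", TRUE)
class(v)                               # "character"

m <- matrix(1:6, nrow = 2)             # two-dimensional, one type
a <- array(1:24, dim = c(2, 3, 4))     # three-dimensional, one type

# Lists and data frames may hold several types at once
l <- list(1, "a", TRUE)
d <- data.frame(id = 1:2, name = c("x", "y"), stringsAsFactors = FALSE)

sapply(l, class)                       # "numeric" "character" "logical"
sapply(d, class)                       # one class per column

dim(m)                                 # 2 3
dim(a)                                 # 2 3 4
dim(d)                                 # 2 2
```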

In this set of exercises we shall practice casting between these data structures, together with some basic operations on them. You can find more about data structures on the Advanced R “Data structures” page.

Answers to the exercises are available here.

If you have a different solution, feel free to post it.

Exercise 1

Create a vector named v which contains 10 random integer values between -100 and +100.

Exercise 2

Create a two-dimensional 5×5 array named a consisting of a sequence of even integers greater than 25.

Create a list named s containing a sequence of 20 capital letters, starting with ‘C’.

Exercise 3

Create a list named l and put all previously created objects in it. Name them a, b and c respectively. How many elements are there in the list? Show the structure of the list. Count all elements recursively.

Exercise 4

Without running commands in R, answer the following questions:

  1. What is the result of l[[3]]?
  2. How would you access a randomly chosen letter in the list element c?
  3. If you convert list l to a vector, what will be the type of its elements?
  4. Can this list be converted to an array? What will be the data type of the elements in the array?

Check the results with R.

Exercise 5

Remove letters from the list l. Convert the list l to a vector and check its class. Compare it with the result from exercise 4, question #3.

Exercise 6

Find the difference between the elements in l[["a"]] and l[["b"]]. Find the intersection between them. Is the number 33 in their union?

Exercise 7

Create a 5×5 matrix named m and fill it with random numeric values rounded to two decimal places, ranging from 1.00 to 100.00.

Exercise 8

Answer the following question without running any R commands, then check the result.

What will be the class of data structure if you convert matrix m to:

  • vector
  • list
  • data frame
  • array?

Exercise 9

Transpose array l$b and then convert it to a matrix.

Exercise 10

Get the union of matrix m and all elements in list l and sort it in ascending order.




Data Structures Solutions

Below are the solutions to these exercises on data structures.

####################
#                  #
#    Exercise 1    #
#                  #
####################

v <- sample(-100:100, 10, replace=TRUE)

####################
#                  #
#    Exercise 2    #
#                  #
####################

a <- array(seq(from = 26, length.out = 25, by = 2), c(5, 5))
s <- LETTERS[match("C", LETTERS):(match("C", LETTERS)+19)]

####################
#                  #
#    Exercise 3    #
#                  #
####################

l <- list(a = v, b = a, c = s)
length(l)
## [1] 3
str(l)
## List of 3
##  $ a: int [1:10] -83 72 -44 71 -54 -17 -40 -76 22 58
##  $ b: num [1:5, 1:5] 26 28 30 32 34 36 38 40 42 44 ...
##  $ c: chr [1:20] "C" "D" "E" "F" ...
length(unlist(l))
## [1] 55
####################
#                  #
#    Exercise 4    #
#                  #
####################

l[[3]]
##  [1] "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S"
## [18] "T" "U" "V"
l[[3]][sample(1:length(l[[3]]), 1)]
## [1] "O"
class(unlist(l))
## [1] "character"
x <- array(l)
class(x[1])
## [1] "list"
####################
#                  #
#    Exercise 5    #
#                  #
####################

l$c <- NULL
class(unlist(l))
## [1] "numeric"
####################
#                  #
#    Exercise 6    #
#                  #
####################

setdiff(l$a, l$b)
## [1] -83 -44  71 -54 -17 -40 -76  22
intersect(l$a, l$b)
## [1] 72 58
33 %in% union(l$a, l$b)
## [1] FALSE
####################
#                  #
#    Exercise 7    #
#                  #
####################

# draw from [1, 100] so that the rounded values stay within 1.00 to 100.00
m <- matrix(data = round(runif(5*5, 1.00, 100.00), 2), nrow = 5)

####################
#                  #
#    Exercise 8    #
#                  #
####################

class(as.vector(m))
## [1] "numeric"
class(as.list(m))
## [1] "list"
class(as.data.frame(m))
## [1] "data.frame"
class(as.array(m))
## [1] "matrix"
# in R 4.0.0 and later this prints "matrix" "array"
####################
#                  #
#    Exercise 9    #
#                  #
####################

as.matrix(aperm(l$b))
##      [,1] [,2] [,3] [,4] [,5]
## [1,]   26   28   30   32   34
## [2,]   36   38   40   42   44
## [3,]   46   48   50   52   54
## [4,]   56   58   60   62   64
## [5,]   66   68   70   72   74
####################
#                  #
#    Exercise 10   #
#                  #
####################

sort(union(as.vector(m), unlist(l)))
##  [1] -83.00 -76.00 -54.00 -44.00 -40.00 -17.00   8.02   9.58  10.41  10.46
## [11]  10.51  16.28  20.85  22.00  22.33  25.58  25.66  26.00  27.96  28.00
## [21]  28.07  30.00  32.00  34.00  36.00  37.02  38.00  38.36  40.00  42.00
## [31]  44.00  45.22  46.00  48.00  50.00  52.00  53.18  54.00  56.00  58.00
## [41]  60.00  62.00  64.00  66.00  67.11  68.00  70.00  71.00  72.00  73.88
## [51]  74.00  74.64  83.52  89.62  89.72  91.35  99.19  99.45



Student’s Achievement Research Project – Exercises

In this set of exercises we shall follow the standard practice of conducting a research project. The goal of the research is to find the relationship between students’ preparation and their achievement on the final exam. Preparation is measured by the amount of time a student spends in preparatory classes and by the score in mathematics achieved in the final year of school.

Here is the data set.

Answers to the exercises are available here.

If you have a different solution, feel free to post it.

Exercise 1

Load the data and check whether the sample size is large enough for conducting multiple linear regression. (Tip: The sample size is “large enough” if it is greater than 50 + 8 * m, where m is the number of predictor variables.)
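The rule of thumb in this tip is simple arithmetic, and it can be wrapped in a small helper. The function names `min_sample_size` and `is_large_enough` are hypothetical, and the sample sizes passed in are only illustrative values.

```r
# Rule of thumb from the tip: n should be greater than 50 + 8 * m,
# where m is the number of predictor variables
min_sample_size <- function(m) 50 + 8 * m          # hypothetical helper name

is_large_enough <- function(n, m) n > min_sample_size(m)

min_sample_size(2)        # threshold for two predictors: 66
is_large_enough(81, 2)    # TRUE
is_large_enough(60, 2)    # FALSE
```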

Exercise 2

Calculate descriptive statistics for the criterion variable. Was the final test appropriate for the students’ level of knowledge? (Tip: Check the skewness of the distribution; we expect the distribution to be symmetric.)
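Skewness can also be computed by hand, which makes the sign convention easier to remember. The sketch below uses the moment-based definition (third central moment divided by the second central moment raised to the power 1.5). The function name `skewness_by_hand` and the two toy vectors are assumptions made for illustration.

```r
# Moment-based skewness: m3 / m2^1.5, using central moments
skewness_by_hand <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

symmetric <- c(1, 2, 3, 4, 5)     # toy scores, evenly spread around the mean
left_tail <- c(1, 7, 8, 9, 10)    # toy scores with one very low value

skewness_by_hand(symmetric)       # 0: symmetric distribution
skewness_by_hand(left_tail)       # negative: long tail on the left
```

A negative value means most scores sit at the high end, which is usually read as a sign that a test was on the easy side.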

Exercise 3

Do the students with good scores in mathematics in the final year differ from those with bad scores in their results on the final exam? Did the students with good scores in mathematics in the final year attend preparatory classes more than those with bad scores?

Exercise 4

Calculate the correlation matrix for the three variables included in the model. Can we expect a multicollinearity problem? Does the correlation between the predictor variables justify conducting multiple regression?
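One way to judge multicollinearity from a correlation is the variance inflation factor (VIF). With exactly two predictors, both share the same VIF, 1 / (1 - r^2), where r is the correlation between them. The helper name below is hypothetical; 0.43 is the rounded predictor correlation reported in the solutions, and 0.95 is a made-up contrast value.

```r
# VIF for a model with exactly two predictors correlated at r
vif_two_predictors <- function(r) 1 / (1 - r^2)    # hypothetical helper name

vif_two_predictors(0.43)   # about 1.23, far below the common cut-offs of 5 or 10
vif_two_predictors(0.95)   # about 10.3, a level usually read as a warning sign
```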

Exercise 5

Create a multiple linear regression model m to check whether the number of preparatory classes and the score in mathematics in the final year can explain the result on the final test.

Exercise 6

Find and eliminate outliers from the data.

Exercise 7

Using a scatter plot, check the linearity of the residuals of model m.

Exercise 8

Test the normality of the residuals of model m.

Exercise 9

  1. Is model m statistically significant on the level of 0.05?
  2. Which predictor variables significantly contribute to the explanation of criterion variable?

Exercise 10

Does the introduction of gender as a predictor variable add to the explanatory power of the model?




Student’s Achievement Research Project – Solutions

Below are the solutions to these exercises on conducting a research project in a school.

####################
#                  #
#    Exercise 1    #
#                  #
####################

data <- read.csv2("school-research.csv")
nrow(data) > 50 + 8 * 2
## [1] TRUE
####################
#                  #
#    Exercise 2    #
#                  #
####################

summary(data$final_result)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   71.00   82.00   87.00   86.42   92.00   99.00
library("moments")
skewness(data$final_result)
## [1] -0.2864368
# the distribution is mildly skewed to the left, which means that the test was a bit too easy

####################
#                  #
#    Exercise 3    #
#                  #
####################

# maths is treated here as a 0/1 indicator (assumed coding: 1 = good score, 0 = bad score),
# so it defines the two groups being compared
t.test(final_result ~ maths, data = data, alternative = "two.sided")
# one-sided test: did students with a good maths score attend more preparatory classes?
t.test(data$preparation[data$maths == 1], data$preparation[data$maths == 0],
       alternative = "greater")
####################
#                  #
#    Exercise 4    #
#                  #
####################

cor(data[c(1, 2, 4)], method="pearson")
##              final_result preparation     maths
## final_result    1.0000000   0.3900393 0.3629136
## preparation     0.3900393   1.0000000 0.4323024
## maths           0.3629136   0.4323024 1.0000000
# 1. no, the correlation between the predictors (0.43) is too low to expect multicollinearity
# 2. yes, since the correlation is moderate

####################
#                  #
#    Exercise 5    #
#                  #
####################

m <- lm(data$final_result ~ data$preparation+data$maths)

####################
#                  #
#    Exercise 6    #
#                  #
####################

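# boxplot() stores the points it flags as outliers in the `out` component;
# an empty result, numeric(0), means that none were flagged
boxplot(data[c(1, 2)])$out
# [Figure: Outliers]
# there are no outliers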

####################
#                  #
#    Exercise 7    #
#                  #
####################

plot(scale(m$fitted.values), scale(m$residuals))
# [Figure: Residual plot]
# since there is no pattern, we conclude that the relationship is linear

####################
#                  #
#    Exercise 8    #
#                  #
####################

shapiro.test(scale(m$residuals))$p.value > 0.05
## [1] TRUE
####################
#                  #
#    Exercise 9    #
#                  #
####################

f <- summary(m)$fstatistic
pf(f[1], f[2], f[3], lower.tail = F) < 0.05
## value 
##  TRUE
summary(m)$coefficients[c(2,3), 4] < 0.05
## data$preparation       data$maths 
##             TRUE             TRUE
####################
#                  #
#    Exercise 10   #
#                  #
####################

n <- lm(data$final_result ~ data$preparation+data$maths+data$gender)
f <- summary(n)$fstatistic
(summary(m)$adj.r.squared < summary(n)$adj.r.squared) && (pf(f[1], f[2], f[3], lower.tail = F) < 0.05)
## [1] TRUE



String Manipulation – Exercises

In this set of exercises we will practice functions that enable us to manipulate strings. You can find more about string manipulation functions in the Handling and Processing Strings in R e-book.

Answers to the exercises are available here.

If you have a different solution, feel free to post it.

Exercise 1

Load the text from the file and print it on screen. The text file contains an excerpt from the novel “Gambler” by Fyodor Dostoyevsky.

Exercise 2

How many paragraphs are there in the excerpt?

Exercise 3

How many characters are there in the excerpt?

Exercise 4

Collapse paragraphs into one and display it on the screen (un-list it).

Exercise 5

Convert the text to uppercase and save it to a new file, “gambler-upper.txt”.

Exercise 6

Change all letters ‘a’ and ‘t’ to ‘A’ and ‘T’.

Exercise 7

Does the text contain the word ‘lucky’?

Exercise 8

How many words are there in the excerpt, assuming that words are sub-strings separated by space or new line character?
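The definition of a word used here maps directly onto a regular-expression split. The sketch below uses a short made-up string (`txt`) rather than the excerpt, to show why the separator has to cover both spaces and newline characters.

```r
txt <- "one two\nthree four five"   # toy text, not the excerpt

# Split on either a space or a newline character
words <- strsplit(txt, "[ \n]")[[1]]
length(words)                                   # 5

# Splitting on spaces alone leaves "two\nthree" glued together
space_only <- strsplit(txt, " ", fixed = TRUE)[[1]]
length(space_only)                              # 4
```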

Exercise 9

How many times is the word ‘money’ repeated in the excerpt?

Exercise 10

Ask the user to input two numbers, divide them and display both numbers and the result on the screen, each of them formatted to 2 decimal places.




String Manipulation – Solutions

Below are the solutions to these exercises on functions that are used to manipulate strings.

####################
#                  #
#    Exercise 1    #
#                  #
####################

gambler <- readLines("http://www.r-exercises.com/wp-content/uploads/2016/11/gambler.txt")
noquote(gambler)
## [1] At length I returned from two weeks leave of absence to find that my patrons had arrived three days ago in Roulettenberg. I received from them a welcome quite different to that which I had expected. The General eyed me coldly, greeted me in rather haughty fashion, and dismissed me to pay my respects to his sister. It was clear that from SOMEWHERE money had been acquired. I thought I could even detect a certain shamefacedness in the General's glance. Maria Philipovna, too, seemed distraught, and conversed with me with an air of detachment. Nevertheless, she took the money which I handed to her, counted it, and listened to what I had to tell. To luncheon there were expected that day a Monsieur Mezentsov, a French lady, and an Englishman; for, whenever money was in hand, a banquet in Muscovite style was always given. Polina Alexandrovna, on seeing me, inquired why I had been so long away. Then, without waiting for an answer, she departed. Evidently this was not mere accident, and I felt that I must throw some light upon matters. It was high time that I did so.                                                                                     
## [2] I was assigned a small room on the fourth floor of the hotel (for you must know that I belonged to the General's suite). So far as I could see, the party had already gained some notoriety in the place, which had come to look upon the General as a Russian nobleman of great wealth. Indeed, even before luncheon he charged me, among other things, to get two thousand-franc notes changed for him at the hotel counter, which put us in a position to be thought millionaires at all events for a week! Later, I was about to take Mischa and Nadia for a walk when a summons reached me from the staircase that I must attend the General. He began by deigning to inquire of me where I was going to take the children; and as he did so, I could see that he failed to look me in the eyes. He WANTED to do so, but each time was met by me with such a fixed, disrespectful stare that he desisted in confusion. In pompous language, however, which jumbled one sentence into another, and at length grew disconnected, he gave me to understand that I was to lead the children altogether away from the Casino, and out into the park. Finally his anger exploded, and he added sharply:
## [3] "I suppose you would like to take them to the Casino to play roulette? Well, excuse my speaking so plainly, but I know how addicted you are to gambling. Though I am not your mentor, nor wish to be, at least I have a right to require that you shall not actually compromise me."                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  
## [4] "I have no money for gambling," I quietly replied.
####################
#                  #
#    Exercise 2    #
#                  #
####################

length(gambler)
## [1] 4
####################
#                  #
#    Exercise 3    #
#                  #
####################

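# number of characters in each paragraph
nchar(gambler)
## [1] 1073 1158  276   50
# total number of characters in the excerpt
sum(nchar(gambler))
## [1] 2557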
####################
#                  #
#    Exercise 4    #
#                  #
####################

t <- paste(gambler, collapse="\n")
cat(t)
## At length I returned from two weeks leave of absence to find that my patrons had arrived three days ago in Roulettenberg. I received from them a welcome quite different to that which I had expected. The General eyed me coldly, greeted me in rather haughty fashion, and dismissed me to pay my respects to his sister. It was clear that from SOMEWHERE money had been acquired. I thought I could even detect a certain shamefacedness in the General's glance. Maria Philipovna, too, seemed distraught, and conversed with me with an air of detachment. Nevertheless, she took the money which I handed to her, counted it, and listened to what I had to tell. To luncheon there were expected that day a Monsieur Mezentsov, a French lady, and an Englishman; for, whenever money was in hand, a banquet in Muscovite style was always given. Polina Alexandrovna, on seeing me, inquired why I had been so long away. Then, without waiting for an answer, she departed. Evidently this was not mere accident, and I felt that I must throw some light upon matters. It was high time that I did so.
## I was assigned a small room on the fourth floor of the hotel (for you must know that I belonged to the General's suite). So far as I could see, the party had already gained some notoriety in the place, which had come to look upon the General as a Russian nobleman of great wealth. Indeed, even before luncheon he charged me, among other things, to get two thousand-franc notes changed for him at the hotel counter, which put us in a position to be thought millionaires at all events for a week! Later, I was about to take Mischa and Nadia for a walk when a summons reached me from the staircase that I must attend the General. He began by deigning to inquire of me where I was going to take the children; and as he did so, I could see that he failed to look me in the eyes. He WANTED to do so, but each time was met by me with such a fixed, disrespectful stare that he desisted in confusion. In pompous language, however, which jumbled one sentence into another, and at length grew disconnected, he gave me to understand that I was to lead the children altogether away from the Casino, and out into the park. Finally his anger exploded, and he added sharply:
## "I suppose you would like to take them to the Casino to play roulette? Well, excuse my speaking so plainly, but I know how addicted you are to gambling. Though I am not your mentor, nor wish to be, at least I have a right to require that you shall not actually compromise me."
## "I have no money for gambling," I quietly replied.
####################
#                  #
#    Exercise 5    #
#                  #
####################

# write one paragraph per line to the file named in the exercise
cat(toupper(gambler), file="gambler-upper.txt", sep="\n")

####################
#                  #
#    Exercise 6    #
#                  #
####################

chartr("at", "AT", gambler)
## [1] "AT lengTh I reTurned from Two weeks leAve of Absence To find ThAT my pATrons hAd Arrived Three dAys Ago in RouleTTenberg. I received from Them A welcome quiTe differenT To ThAT which I hAd expecTed. The GenerAl eyed me coldly, greeTed me in rATher hAughTy fAshion, And dismissed me To pAy my respecTs To his sisTer. IT wAs cleAr ThAT from SOMEWHERE money hAd been Acquired. I ThoughT I could even deTecT A cerTAin shAmefAcedness in The GenerAl's glAnce. MAriA PhilipovnA, Too, seemed disTrAughT, And conversed wiTh me wiTh An Air of deTAchmenT. NeverTheless, she Took The money which I hAnded To her, counTed iT, And lisTened To whAT I hAd To Tell. To luncheon There were expecTed ThAT dAy A Monsieur MezenTsov, A French lAdy, And An EnglishmAn; for, whenever money wAs in hAnd, A bAnqueT in MuscoviTe sTyle wAs AlwAys given. PolinA AlexAndrovnA, on seeing me, inquired why I hAd been so long AwAy. Then, wiThouT wAiTing for An Answer, she depArTed. EvidenTly This wAs noT mere AccidenT, And I felT ThAT I musT Throw some lighT upon mATTers. IT wAs high Time ThAT I did so."                                                                                     
## [2] "I wAs Assigned A smAll room on The fourTh floor of The hoTel (for you musT know ThAT I belonged To The GenerAl's suiTe). So fAr As I could see, The pArTy hAd AlreAdy gAined some noTorieTy in The plAce, which hAd come To look upon The GenerAl As A RussiAn noblemAn of greAT weAlTh. Indeed, even before luncheon he chArged me, Among oTher Things, To geT Two ThousAnd-frAnc noTes chAnged for him AT The hoTel counTer, which puT us in A posiTion To be ThoughT millionAires AT All evenTs for A week! LATer, I wAs AbouT To TAke MischA And NAdiA for A wAlk when A summons reAched me from The sTAircAse ThAT I musT ATTend The GenerAl. He begAn by deigning To inquire of me where I wAs going To TAke The children; And As he did so, I could see ThAT he fAiled To look me in The eyes. He WANTED To do so, buT eAch Time wAs meT by me wiTh such A fixed, disrespecTful sTAre ThAT he desisTed in confusion. In pompous lAnguAge, however, which jumbled one senTence inTo AnoTher, And AT lengTh grew disconnecTed, he gAve me To undersTAnd ThAT I wAs To leAd The children AlTogeTher AwAy from The CAsino, And ouT inTo The pArk. FinAlly his Anger exploded, And he Added shArply:"
## [3] "\"I suppose you would like To TAke Them To The CAsino To plAy rouleTTe? Well, excuse my speAking so plAinly, buT I know how AddicTed you Are To gAmbling. Though I Am noT your menTor, nor wish To be, AT leAsT I hAve A righT To require ThAT you shAll noT AcTuAlly compromise me.\""                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
## [4] "\"I hAve no money for gAmbling,\" I quieTly replied."
####################
#                  #
#    Exercise 7    #
#                  #
####################

'lucky' %in% gambler
## [1] FALSE
####################
#                  #
#    Exercise 8    #
#                  #
####################

w <- strsplit(t, " ")
length(w[[1]])
## [1] 470
####################
#                  #
#    Exercise 9    #
#                  #
####################

sum(w[[1]] == 'money')
## [1] 4
####################
#                  #
#    Exercise 10   #
#                  #
####################

numbers <- scan(n=2)
sprintf("%.2f / %.2f = %.2f", numbers[1], numbers[2], numbers[1]/numbers[2])
## [1] "1.00 / 6.00 = 0.17"
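
Because scan(n=2) waits for console input, the solution above cannot run unattended. The following is a minimal sketch of a non-interactive variant, assuming the two numbers arrive as a string; the `input` variable is illustrative and not part of the original solution.

```r
# Hypothetical input; in the original solution the values are typed at the console
input <- "1 6"

# scan() can parse a character string through its text argument
numbers <- scan(text = input, n = 2, quiet = TRUE)

result <- sprintf("%.2f / %.2f = %.2f", numbers[1], numbers[2], numbers[1] / numbers[2])
result
```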



Nonparametric Tests Exercises


In this set of exercises you will be presented with real-life problems in marketing. Your task is to choose an appropriate nonparametric statistical technique for each problem and solve it using the appropriate R functions.

Answers to the exercises are available here.

Exercise 1

A company wants to learn whether sales are equally distributed among its stores. To test this, 8 stores were randomly selected. The sales figures are: 102, 300, 102, 100, 205, 105, 71 and 92 units of product.

Are the sales equally distributed among the stores, at the 5% level of significance (95% confidence)?

Exercise 2

A company sells the same product in two types of stores: classical and self-service stores. The data about income earned in each type of store are as follows:

Classical stores: 50, 50, 60, 70, 75, 80, 90, 85

Self-service: 55, 75, 80, 90, 105, 65

At the 5% level of significance (95% confidence), is there a difference in income between the two types of stores?

Exercise 3

Accounting data for sales showed that in 15 randomly selected stores the quantities of products sold were:

509, 517, 502, 629, 830, 911, 847, 803, 727, 853, 757, 730, 774, 718, 904

Unsatisfied with those results, the company decided to start an advertising campaign. After the campaign finished, the quantities of products sold in the same stores were:

517, 508, 523, 730, 821, 940, 818, 821, 842, 842, 709, 688, 787, 780, 901

Did the advertising campaign produce statistically significant results?

Exercise 4

One product is produced in three colors: white, blue and red. Five stores were randomly selected in order to test, with a 5% risk of error, whether the color influences the number of products sold. Data about sales are given in the following table:

Store  White  Blue  Red
1.     510    925   730
2.     720    735   745
3.     930    753   875
4.     754    685   610
5.     105    -     -

Exercise 5

A TV station conducted surveys in March, April, May and June, asking a number of its viewers about their satisfaction with the program in the previous month. The same viewers participated in all four surveys. You can download survey data here.

Did the viewers’ satisfaction change during the four months?

Tip: in order to conduct this test, you’ll need to install and load the CVST package.

Exercise 6

A company conducted a survey in order to learn about customer satisfaction with the company’s service. Then, after improving the service, the company conducted another survey of the same customers. The summary of the two surveys is given in the following table:

Survey              Satisfied  Not satisfied
Before improvement  32         68
After improvement   48         52

Is there a significant change in customers’ satisfaction due to the improvement of the service?

Exercise 7

A company conducted a survey in order to examine whether the frequency of usage of the company’s service depends on the size of the city where its clients live. The summary of the survey is given in the following table:

City size  Frequency of service usage
           Always  Sometimes  Never
Small      151     252        603
Medium     802     603        405
Large      753     55         408

Does the frequency of usage of the company’s service depend on the size of the city?

Exercise 8

A company produces product A. It expects demand for product B to rise. In order to make a production plan, it wants to obtain data about the consumption of the two products and find the association between them. It therefore conducted a survey, asking 100 randomly chosen consumers about the quantities of the two products they consume. The data can be downloaded here.

How strong is the association between consumption of products A and B?

Exercise 9

A company produces several models of the same product. A survey was conducted which included 200 buyers, who were asked about the factor that had the strongest influence on their decision to buy a product. The following data summarizes the survey:

Characteristics  Male  Female
Price            301   502
Design           353   155
Color            558   153

At the 5% level of significance (95% confidence), is there a difference between genders with regard to the characteristics of the product?

Exercise 10

Using data from the previous exercise, calculate the contingency coefficient as a measure of association between gender and product characteristics.




Nonparametric Tests Solutions

Below are the solutions to these exercises on nonparametric tests.

####################
#                  #
#    Exercise 1    #
#                  #
####################

# Chi-square goodness of fit
# H0: f1=f2=f3=f4=f5=f6=f7=f8
# H1: at least two of the frequencies differ

chisq.test(c(102, 300, 102, 100, 205, 105, 71, 92))
## 
## 	Chi-squared test for given probabilities
## 
## data:  c(102, 300, 102, 100, 205, 105, 71, 92)
## X-squared = 314.74, df = 7, p-value < 2.2e-16
# p < 0.05 => H0 is rejected, i.e. sales are not equally distributed among the stores
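
As a rough check on the result above, the goodness-of-fit statistic can be recomputed by hand. This is only a sketch; the names `sales`, `expected`, `stat` and `p_value` are illustrative and do not appear in the original solution.

```r
# Observed sales from Exercise 1
sales <- c(102, 300, 102, 100, 205, 105, 71, 92)

# Under H0 every store is expected to sell the same amount
expected <- rep(mean(sales), length(sales))

# Pearson's statistic: sum of (observed - expected)^2 / expected
stat <- sum((sales - expected)^2 / expected)

# Upper-tail p-value with k - 1 degrees of freedom
p_value <- pchisq(stat, df = length(sales) - 1, lower.tail = FALSE)

stat
p_value
```

The value of `stat` should agree with the X-squared reported by chisq.test() for the same vector.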


####################
#                  #
#    Exercise 2    #
#                  #
####################

# Mann-Whitney test
# H0: Me1=Me2
# H1: Me1!=Me2

x <- c(50, 50, 60, 70, 75, 80, 90, 85)
y <- c(55, 75, 80, 90, 105, 65)

wilcox.test(x, y, correct=FALSE, paired=FALSE)
## Warning in wilcox.test.default(x, y, correct = FALSE, paired = FALSE):
## cannot compute exact p-value with ties
## 
## 	Wilcoxon rank sum test
## 
## data:  x and y
## W = 17.5, p-value = 0.3993
## alternative hypothesis: true location shift is not equal to 0
# p > 0.05 => H0 can't be rejected, i.e. the income doesn't depend on the type of store
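
To see where the W statistic comes from, it can be rebuilt from the pooled ranks. The sketch below assumes the same `x` and `y` as in the solution; `r`, `n1` and `w_stat` are illustrative names.

```r
x <- c(50, 50, 60, 70, 75, 80, 90, 85)
y <- c(55, 75, 80, 90, 105, 65)

# Rank the pooled sample; tied values share their average rank
r <- rank(c(x, y))

# W is the rank sum of the first sample minus its smallest possible value
n1 <- length(x)
w_stat <- sum(r[seq_len(n1)]) - n1 * (n1 + 1) / 2
w_stat
```

The tied values (50, 75, 80 and 90 each occur twice) are what trigger the warning about the exact p-value in the output above.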

####################
#                  #
#    Exercise 3    #
#                  #
####################

# Wilcoxon's paired test
# H0: the median of the paired differences is 0
# H1: the median of the paired differences is not 0 (two-sided, as tested below)

x <- c(509, 517, 502, 629, 830, 911, 847, 803, 727, 853, 757, 730, 774, 718, 904)
y <- c(517, 508, 523, 730, 821, 940, 818, 821, 842, 842, 709, 688, 787, 780, 901)

wilcox.test(x, y, correct=FALSE, paired=TRUE)
## Warning in wilcox.test.default(x, y, correct = FALSE, paired = TRUE):
## cannot compute exact p-value with ties
## 
## 	Wilcoxon signed rank test
## 
## data:  x and y
## V = 45.5, p-value = 0.41
## alternative hypothesis: true location shift is not equal to 0
# p > 0.05 => H0 can't be rejected, i.e. the campaign was not successful
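
The V statistic of the signed rank test can likewise be rebuilt from the paired differences. This is a minimal sketch using the same `x` and `y` as in the solution; `d`, `r` and `v_stat` are illustrative names.

```r
x <- c(509, 517, 502, 629, 830, 911, 847, 803, 727, 853, 757, 730, 774, 718, 904)
y <- c(517, 508, 523, 730, 821, 940, 818, 821, 842, 842, 709, 688, 787, 780, 901)

# Paired differences; zero differences would be dropped
d <- x - y
d <- d[d != 0]

# Rank the absolute differences; ties share their average rank
r <- rank(abs(d))

# V is the sum of the ranks that belong to positive differences
v_stat <- sum(r[d > 0])
v_stat
```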

####################
#                  #
#    Exercise 4    #
#                  #
####################

# Kruskal Wallis
# H0: Me1=Me2=Me3
# H1: at least one median differs from the others

x <- c(510, 720, 930, 754, 105)
y <- c(925, 735, 753, 685)
z <- c(730, 745, 875, 610)

kruskal.test(list(x, y, z))
## 
## 	Kruskal-Wallis rank sum test
## 
## data:  list(x, y, z)
## Kruskal-Wallis chi-squared = 0.47473, df = 2, p-value = 0.7887
# p > 0.05 => H0 can't be rejected, i.e. color doesn't influence the sales
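
The same test can be run on data in long format through the formula interface of kruskal.test(), which is often more convenient when the data already sit in a data frame. In this sketch the data frame `sales` and its columns `units` and `color` are illustrative names; the assignment of the three vectors to white, blue and red follows the order of the table in the exercise.

```r
# One row per store and color; white has five observations, blue and red have four
sales <- data.frame(
  units = c(510, 720, 930, 754, 105,
            925, 735, 753, 685,
            730, 745, 875, 610),
  color = factor(rep(c("white", "blue", "red"), times = c(5, 4, 4)))
)

res <- kruskal.test(units ~ color, data = sales)
res$statistic
res$p.value
```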

####################
#                  #
#    Exercise 5    #
#                  #
####################

# Cochran's Q
# H0: Pi1=Pi2=Pi3=Pi4
# H1: at least two of the proportions differ

library("CVST")

data <- read.csv("http://www.r-exercises.com/wp-content/uploads/2016/11/tv-station.csv")
cochranq.test(data)
## 
## 	Cochran's Q Test
## 
## data:  data
## Cochran's Q = 1.2, df = 3, p-value = 0.753
# p > 0.05 => H0 can't be rejected, i.e. there are no differences in satisfaction among the four surveys
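
For readers who cannot install CVST, Cochran's Q can also be computed in base R from the row and column totals of a 0/1 matrix. The sketch below uses a small made-up matrix rather than the survey file, and `cochran_q` is a hypothetical helper, not a function from any package.

```r
# Hypothetical 0/1 answers: rows are respondents, columns are repeated surveys
answers <- matrix(c(1, 1, 0,
                    1, 0, 0,
                    1, 1, 1,
                    0, 1, 0,
                    1, 0, 0),
                  ncol = 3, byrow = TRUE)

cochran_q <- function(m) {
  k <- ncol(m)             # number of repeated measurements
  col_tot <- colSums(m)    # successes per measurement
  row_tot <- rowSums(m)    # successes per respondent
  n <- sum(m)              # total number of successes
  q <- (k - 1) * (k * sum(col_tot^2) - n^2) / (k * n - sum(row_tot^2))
  p <- pchisq(q, df = k - 1, lower.tail = FALSE)
  c(Q = q, df = k - 1, p.value = p)
}

res <- cochran_q(answers)
res
```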

####################
#                  #
#    Exercise 6    #
#                  #
####################

# McNemar's test
# H0: Pi1=Pi2
# H1: Pi1!=Pi2

satisfaction <- matrix(c(32, 68, 48, 52), nrow=2)
mcnemar.test(satisfaction)
## 
## 	McNemar's Chi-squared test with continuity correction
## 
## data:  satisfaction
## McNemar's chi-squared = 3.1121, df = 1, p-value = 0.07771
# p > 0.05 => H0 can't be rejected, i.e. there is no difference in satisfaction before and after improvement
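
The statistic reported by mcnemar.test() depends only on the two off-diagonal cells of the matrix it is given. The sketch below recomputes it by hand for the matrix used in the solution; `b`, `c_off`, `stat` and `p_value` are illustrative names.

```r
satisfaction <- matrix(c(32, 68, 48, 52), nrow = 2)

# Off-diagonal cells of the 2 x 2 matrix (the matrix is filled column by column)
b <- satisfaction[1, 2]
c_off <- satisfaction[2, 1]

# McNemar's chi-squared with continuity correction
stat <- (abs(b - c_off) - 1)^2 / (b + c_off)
p_value <- pchisq(stat, df = 1, lower.tail = FALSE)

stat
p_value
```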

####################
#                  #
#    Exercise 7    #
#                  #
####################

# Chi-square test for homogeneity 
# H0: oij=eij for all cells 

# Note: the exercise table lists 405 for Medium/Never; the output below was produced with 404
usage <- matrix(c(151, 802, 753, 252, 603, 55, 603, 404, 408), nrow=3)

chisq.test(usage)
## 
## 	Pearson's Chi-squared test
## 
## data:  usage
## X-squared = 822.12, df = 4, p-value < 2.2e-16
# p < 0.05 => H0 is rejected, i.e. the frequency of service usage depends on the size of the city
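
To see which cells drive the result, one can inspect the expected counts and the standardized residuals. This is a sketch under the same matrix as in the solution; the dimnames and the names `expected_manual` and `stat_manual` are added here for illustration only.

```r
usage <- matrix(c(151, 802, 753, 252, 603, 55, 603, 404, 408), nrow = 3,
                dimnames = list(city = c("Small", "Medium", "Large"),
                                usage = c("Always", "Sometimes", "Never")))

res <- chisq.test(usage)

# Expected count in each cell: row total * column total / grand total
expected_manual <- outer(rowSums(usage), colSums(usage)) / sum(usage)

# Pearson's statistic recomputed from the expected counts
stat_manual <- sum((usage - expected_manual)^2 / expected_manual)

round(expected_manual, 1)
# Standardized residuals: large absolute values mark the cells far from expectation
round(res$stdres, 1)
```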

####################
#                  #
#    Exercise 8    #
#                  #
####################

data <- read.csv("http://www.r-exercises.com/wp-content/uploads/2016/11/ab-consumption.csv")
shapiro.test(as.numeric(data$A))
## 
## 	Shapiro-Wilk normality test
## 
## data:  as.numeric(data$A)
## W = 0.97147, p-value = 0.02867
shapiro.test(as.numeric(data$B))
## 
## 	Shapiro-Wilk normality test
## 
## data:  as.numeric(data$B)
## W = 0.97673, p-value = 0.07367
# since the Shapiro-Wilk test rejects normality for A (p < 0.05), we use Spearman's correlation coefficient

cor(data, method="spearman")
##            A          B
## A 1.00000000 0.03736654
## B 0.03736654 1.00000000
# practically, there is no correlation between consumption of products A and B
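
cor() returns only the coefficient. If a significance test of the association is also wanted, cor.test() with method = "spearman" is one option. The sketch below uses simulated stand-in data, because the survey file is not reproduced here; `a` and `b` are hypothetical vectors and the numbers they produce say nothing about the real survey.

```r
# Simulated placeholder data, NOT the survey data from the exercise
set.seed(1)
a <- rnorm(100)
b <- rnorm(100)

# Spearman's rank correlation together with a test of H0: rho = 0
res <- cor.test(a, b, method = "spearman")

res$estimate
res$p.value
```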

####################
#                  #
#    Exercise 9    #
#                  #
####################

# Chi-square test for independence
# H0: oij=eij for all cells

m <- matrix(data = c(301, 353, 558, 502, 155, 153), nrow = 3)
chisq.test(m)
## 
## 	Pearson's Chi-squared test
## 
## data:  m
## X-squared = 289.71, df = 2, p-value < 2.2e-16
# p < 0.05 => H0 is rejected, i.e. the most influential product characteristic differs between genders

####################
#                  #
#    Exercise 10   #
#                  #
####################

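# Pearson's contingency coefficient: C = sqrt(chi2 / (chi2 + n))
h <- chisq.test(m)$statistic  # chi-square statistic from Exercise 9
n <- sum(m)                   # total number of observations
cobs <- sqrt(h/(h+n))
cobs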
## X-squared 
## 0.3540099
# Rescale C by the maximum value used here for an r x c table
r <- NROW(m)
c <- NCOL(m)
cmax <- ((r-1)/r*(c-1)/c)^(1/4)
cobs/cmax
## X-squared 
## 0.4659032
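
The steps above can be wrapped in a small function so that they are reusable for other tables. This is only a sketch: `contingency_coef` is a hypothetical helper, and it uses the same formula for the maximum as the solution above. It also avoids assigning to `c`, which would mask the base function of that name.

```r
m <- matrix(c(301, 353, 558, 502, 155, 153), nrow = 3)

contingency_coef <- function(tab) {
  chi2 <- unname(chisq.test(tab)$statistic)  # Pearson's chi-square statistic
  n <- sum(tab)                              # total number of observations
  n_row <- nrow(tab)
  n_col <- ncol(tab)
  c_obs <- sqrt(chi2 / (chi2 + n))           # observed contingency coefficient
  c_max <- ((n_row - 1) / n_row * (n_col - 1) / n_col)^(1/4)
  c(observed = c_obs, maximum = c_max, corrected = c_obs / c_max)
}

res <- contingency_coef(m)
round(res, 4)
```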