Data Structures Exercises

There are five important basic data structures in R: vector, matrix, array, list and data frame. They can be one-dimensional (vector and list), two-dimensional (matrix and data frame) or multidimensional (array). They also differ in the homogeneity of the elements they can contain: all elements of a vector, matrix or array must be of the same type, while a list or data frame can contain elements of different types.
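The sketch below illustrates the homogeneity rule described above. The object names are illustrative and unrelated to the exercise objects v, a and s. Combining types in an atomic vector silently coerces them, while lists and data frames keep each element's type.

```r
# Atomic structures (vector, matrix, array) hold one type only:
# mixing types coerces every element to the most general type
# (logical < integer < double < character).
mixed <- c(TRUE, 2L, 3.5, "a")
class(mixed)                          # "character"
class(c(TRUE, 2L))                    # "integer"

mat <- matrix(1:6, nrow = 2)          # 2 dimensions, all integers
arr <- array(1:24, dim = c(2, 3, 4))  # 3 dimensions, all integers

# Lists and data frames keep each element's (or column's) own type.
lst <- list(num = 1.5, chr = "a", lgl = TRUE)
sapply(lst, class)                    # numeric, character, logical

df <- data.frame(id = 1:3, name = c("x", "y", "z"),
                 stringsAsFactors = FALSE)
sapply(df, class)                     # integer, character
```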

In this set of exercises we shall practice converting between these data structures, together with some basic operations on them. You can find more about data structures on the Advanced R – Data structures page.

Answers to the exercises are available here.

If you have a different solution, feel free to post it.

Exercise 1

Create a vector named v which contains 10 random integer values between -100 and +100.

Exercise 2

Create a two-dimensional 5×5 array named a consisting of a sequence of even integers greater than 25.

Create a list named s containing a sequence of 20 capital letters, starting with ‘C’.

Exercise 3

Create a list named l and put all previously created objects in it. Name them a, b and c respectively. How many elements are there in the list? Show the structure of the list. Count all elements recursively.

Exercise 4

Without running commands in R, answer the following questions:

  1. What is the result of l[[3]]?
  2. How would you access a randomly chosen letter in the list element c?
  3. If you convert list l to a vector, what will be the type of its elements?
  4. Can this list be converted to an array? What will be the data type of the elements in the array?

Check the results with R.

Exercise 5

Remove letters from the list l. Convert the list l to a vector and check its class. Compare it with the result from exercise 4, question #3.

Exercise 6

Find the set difference between the elements of l[["a"]] and l[["b"]]. Find their intersection. Is the number 33 in their union?

Exercise 7

Create a 5×5 matrix named m and fill it with random numeric values, rounded to two decimal places, ranging from 1.00 to 100.00.

Exercise 8

Answer the following question without running any R commands, then check the result.

What will be the class of data structure if you convert matrix m to:

  • vector
  • list
  • data frame
  • array?

Exercise 9

Transpose the array l$b and then convert it to a matrix.

Exercise 10

Get the union of matrix m and all elements in list l, and sort it in ascending order.




Data Science for Doctors – Part 4 : Inferential Statistics (1/5)

Data science enhances people’s decision making. Doctors and researchers make critical decisions every day, so it is essential for them to have some basic knowledge of data science. This series aims to help people working in the medical field enhance their data science skills.

We will work with a health-related database, the famous “Pima Indians Diabetes Database”. It was generously donated by Vincent Sigillito from Johns Hopkins University. Please find further information regarding the dataset here.

This is the fourth part of the series, and it partially covers the subject of inferential statistics. Researchers rarely have the capacity to test many patients, or to try a new treatment on many patients, so making inferences from a sample is a necessary skill. This is where inferential statistics comes into play.

Before proceeding, it might be helpful to look over the help pages for sample, mean, sd, sort and pnorm. It is also crucial to be familiar with the Central Limit Theorem.

You may also need to install and load the moments package.
install.packages("moments")
library(moments)

Please run the code below in order to load the data set and transform it into a proper data frame format:

url <- "https://archive.ics.uci.edu/ml/machine-learning-databases/pima-indians-diabetes/pima-indians-diabetes.data"
data <- read.table(url, fileEncoding="UTF-8", sep=",")
names <- c('preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class')
colnames(data) <- names
data <- data[data$mass != 0, ]  # drop rows with an impossible BMI of 0

Answers to the exercises are available here.

If you obtained a different (correct) answer than those listed on the solutions page, please feel free to post your answer as a comment on that page.

Exercise 1

Generate a sampling distribution of the mean of the mass variable, using samples of size 50 and 10000 iterations.

You are encouraged to experiment with different sample sizes and numbers of iterations to see their impact on the distribution (its standard deviation, skewness and kurtosis). You can also plot the distributions to get a better sense of what you are working on.

Exercise 2

Find the mean and standard error (standard deviation) of the sampling distribution.

You are encouraged to use the values from the original distribution (data$mass) to understand how the mean and standard deviation of the sampling distribution are derived, and how much the sample size matters.
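Here is a minimal sketch of the simulation in Exercises 1 and 2. It assumes a hypothetical stand-in population: population below is generated with rgamma and is not the real data$mass, so swap in data$mass to reproduce the exercise. The sketch compares the simulated standard error with the Central Limit Theorem prediction sigma / sqrt(n). Sampling is done with replacement so that the draws are independent, which is the setting the sigma / sqrt(n) formula assumes.

```r
set.seed(42)
# Hypothetical stand-in for data$mass: a right-skewed, BMI-like population.
population <- 20 + rgamma(750, shape = 4, scale = 3)

n <- 50              # sample size
iterations <- 10000  # number of simulated samples

# Each iteration draws a sample of size n (with replacement) and keeps its mean.
sample_means <- replicate(iterations,
                          mean(sample(population, n, replace = TRUE)))

# CLT: the sampling distribution is centred on the population mean,
# with standard error sigma / sqrt(n).
c(mean_of_means = mean(sample_means), population_mean = mean(population))
c(se_simulated = sd(sample_means), se_theory = sd(population) / sqrt(n))
```

Rerunning with a larger n shrinks the standard error roughly in proportion to 1 / sqrt(n).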

Exercise 3

Find the skewness and kurtosis of the distribution you generated before.

Exercise 4

Suppose that we ran an experiment in which a sample of 50 people from the population followed an organic food diet. Their average mass was 30.5. What is the z-score for a mean of 30.5?

Exercise 5

What is the probability of drawing a sample of 50 with a mean less than 30.5? Use the z-table if you feel you need to.

Exercise 6

Suppose that you repeated the experiment with a larger sample of 150 people and found the average mass to be 31. Compute the z-score for this mean.

Exercise 7

What is the probability of drawing a sample of 150 with mean less than 31?

Exercise 8

Suppose everybody adopted the diet from the experiment. Find the margin of error that covers 95% of sample means.

Exercise 9

What would be the interval estimate that is 95% likely to contain the population mean if everyone in the population adopted the organic diet?

Exercise 10

Find the interval estimates for 98% and 99% confidence.
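The formulas behind Exercises 4 to 10 can be sketched with a few hypothetical helper functions. The z-score of a sample mean is (xbar - mu) / (sigma / sqrt(n)). The margin of error is the critical z value times that standard error. The values mu = 32 and sigma = 7 below are placeholders, not the actual statistics of data$mass, so substitute mean(data$mass) and sd(data$mass) when solving the exercises.

```r
# Hypothetical helper functions (not from the original exercises).
# z-score of a sample mean xbar: (xbar - mu) / (sigma / sqrt(n))
z_score <- function(xbar, mu, sigma, n) {
  (xbar - mu) / (sigma / sqrt(n))
}

# Margin of error: critical z value times the standard error.
margin_of_error <- function(sigma, n, level = 0.95) {
  z_crit <- qnorm(1 - (1 - level) / 2)  # about 1.96 for level = 0.95
  z_crit * sigma / sqrt(n)
}

# Interval estimate: sample mean plus/minus the margin of error.
interval_estimate <- function(xbar, sigma, n, level = 0.95) {
  me <- margin_of_error(sigma, n, level)
  c(lower = xbar - me, upper = xbar + me)
}

# Placeholder population values, NOT the real data$mass statistics.
mu <- 32
sigma <- 7

z <- z_score(30.5, mu, sigma, n = 50)
p_below <- pnorm(z)   # P(sample mean < 30.5) under the population model
ci95 <- interval_estimate(30.5, sigma, n = 50, level = 0.95)
```

Raising level to 0.98 or 0.99 increases the critical value, so the interval widens.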




Building Shiny App exercises part 7

Connect widgets & plots

In the seventh part of our journey we are ready to connect more of the widgets we created before with our k-means plot, so that we fully control its output. Of course, we will also reshape the plot itself so that it becomes a real k-means plot.
Read the examples below to understand the logic of what we are going to do, and then test your skills with the exercise set we prepared for you. Let’s begin!

Answers to the exercises are available here.

If you obtained a different (correct) answer than those listed on the solutions page, please feel free to post your answer as a comment on that page.

First of all, let’s move the widgets we are going to use from the sidebarPanel into the mainPanel, specifically under our plot.

Learn more about Shiny in the online course R Shiny Interactive Web Apps – Next Level Data Visualization. In this course you will learn how to create advanced Shiny web apps; embed video, pdfs and images; add focus and zooming tools; and many other functionalities (30 lectures, 3hrs.).

Exercise 1

Remove the textInput from your server.R file. Then place the checkboxGroupInput and the selectInput in the same row as the sliderInput. Name them “Variable X” and “Variable Y” respectively. HINT: Use fluidRow and column.

Create a reactive expression

Reactive expressions are expressions that can read reactive values and call other reactive expressions. Whenever a reactive value changes, any reactive expressions that depended on it are marked as “invalidated” and will automatically re-execute if necessary. If a reactive expression is marked as invalidated, any other reactive expressions that recently called it are also marked as invalidated. In this way, invalidations ripple through the expressions that depend on each other.
A reactive expression is created like this: example <- reactive({ })
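Here is a minimal sketch of this invalidation chain. It assumes the shiny package is installed and can run at the R console, because isolate() allows reactive values to be read outside a running app. The names values and label are hypothetical.

```r
library(shiny)

# A reactive value (the source) and a reactive expression that reads it.
values <- reactiveValues(k = 3)
label <- reactive({
  paste("You have selected", values$k, "clusters")
})

# isolate() reads a reactive outside a reactive context (e.g. the console).
isolate(label())   # "You have selected 3 clusters"

# Changing the source invalidates `label`; the next read re-executes it.
values$k <- 5
isolate(label())   # "You have selected 5 clusters"
```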

Exercise 2

Place a reactive expression in server.R, anywhere except inside output$All, and name it “Data”. HINT: Use reactive.

Connect your dataset’s variables with your widgets.

Now let’s connect your selectInput with the variables of your dataset, as in the example below.

#ui.R
library(shiny)
shinyUI(fluidPage(
  titlePanel("Shiny App"),

  sidebarLayout(
    sidebarPanel(h2("Menu"),
      selectInput("ycol", "Y Variable", names(iris))
    ),
    mainPanel(h1("Main"))
  )
))
#server.R
shinyServer(function(input, output) {
  # Keep only the column chosen in the selectInput
  example <- reactive({
    iris[, c(input$ycol)]
  })
})

Exercise 3

Put the variables of the iris dataset as inputs in your selectInput “Variable Y”. HINT: Use names.

Exercise 4

Do the same for checkboxGroupInput and “Variable X”. HINT: Use names.

Select the fourth variable as the default, as in the example below.

#ui.R
library(shiny)
shinyUI(fluidPage(
  titlePanel("Shiny App"),

  sidebarLayout(
    sidebarPanel(h2("Menu"),
      checkboxGroupInput("xcol", "Variable X", names(iris),
                         selected = names(iris)[[4]]),
      selectInput("ycol", "Y Variable", names(iris),
                  selected = names(iris)[[4]])
    ),
    mainPanel(h1("Main"))
  )
))
#server.R
shinyServer(function(input, output) {
  # Keep the columns chosen in both widgets
  example <- reactive({
    iris[, c(input$xcol, input$ycol)]
  })
})

Exercise 5

Make the second variable the default choice for both widgets. HINT: Use selected.

Now follow the example below to create a new reactive expression that runs the k-means calculation.

#ui.R
library(shiny)
shinyUI(fluidPage(
  titlePanel("Shiny App"),

  sidebarLayout(
    sidebarPanel(h2("Menu"),
      checkboxGroupInput("xcol", "Variable X", names(iris),
                         selected = names(iris)[[4]]),
      selectInput("ycol", "Y Variable", names(iris),
                  selected = names(iris)[[4]])
    ),
    mainPanel(h1("Main"))
  )
))
#server.R
shinyServer(function(input, output) {
  example <- reactive({
    iris[, c(input$xcol, input$ycol)]
  })
  # kmeans() requires the number of clusters; 3 is a fixed placeholder here
  example2 <- reactive({
    kmeans(example(), centers = 3)
  })
})

Exercise 6

Create the reactive function Clusters and put in there the function kmeans which will be applied on the function Data. HINT: Use reactive.

Connect your plot with the widgets.

It is time to connect your plot with the widgets.

Exercise 7

Inside renderPlot, use Data() as the first argument of the plot function, replacing the data you have plotted until now. Also delete xlab and ylab.

Improve your k-means visualization.

You can change the colours of your clusters automatically by pasting this code at the beginning of renderPlot, before the plot function:

palette(c("#E41A1C", "#377EB8", "#4DAF4A", "#984EA3",
"#FF7F00", "#FFFF33", "#A65628", "#F781BF", "#999999"))

We will allow up to nine clusters, so we choose nine colours.

Exercise 8

Set min of your sliderInput to 1, max to 9 and value to 4 and use the palette function to give colours.

This is how you give different colours to your clusters. To activate these colours, add this argument to your plot function:

col = Clusters()$cluster,

Exercise 9

Activate the palette function.

To make your clusters easy to spot, you can draw them as large filled points by adding this to the plot function:
pch = 20, cex = 3
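Put together, the plotting code inside renderPlot comes down to ordinary base graphics. The sketch below runs outside Shiny, with these assumptions:

- The data frame selected and the object clusters are hypothetical stand-ins for Data() and Clusters().
- The choice of Petal.Length/Petal.Width and centers = 4 is arbitrary.
- The points() call marking the cluster centres is an optional extra that the exercises do not require.

```r
# Colour palette from the text: up to nine clusters, one colour each.
palette(c("#E41A1C", "#377EB8", "#4DAF4A", "#984EA3",
          "#FF7F00", "#FFFF33", "#A65628", "#F781BF", "#999999"))

selected <- iris[, c("Petal.Length", "Petal.Width")]  # stand-in for Data()
set.seed(1)
clusters <- kmeans(selected, centers = 4)             # stand-in for Clusters()

pdf(NULL)  # off-screen device; in Shiny, renderPlot() provides the device
plot(selected,
     col = clusters$cluster,  # cluster i is drawn with palette colour i
     pch = 20, cex = 3)       # large filled points
points(clusters$centers, pch = 4, cex = 4, lwd = 4)  # mark the centres
invisible(dev.off())

palette("default")  # restore R's default palette
```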

Exercise 10

Fully color the points of your plot.




Data Science for Doctors – Part 3 : Distributions

Data science enhances people’s decision making. Doctors and researchers make critical decisions every day, so it is essential for them to have some basic knowledge of data science. This series aims to help people working in the medical field enhance their data science skills.

This is the third part of the series; it covers the main distributions you will use most of the time. It is meant to make sure that you have (or will have, after solving this set of exercises) the knowledge needed for the parts to come. The distributions we will see are:

1) Binomial distribution: The binomial distribution fits repeated trials, each with a dichotomous outcome such as success-failure, healthy-disease or heads-tails.

2) Normal distribution: It is the best-known distribution; it is also assumed for many gene expression values.

3) t-distribution: The t-distribution has many useful applications in testing hypotheses when the sample size is small (fewer than thirty, as a common rule of thumb) and the population standard deviation is unknown.

4) Chi-squared distribution: The chi-squared distribution plays an important role in testing hypotheses about frequencies.

5) F-distribution: The F-distribution is important for testing the equality of two variances.
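R uses one naming pattern for all of these distributions: a prefix followed by the distribution name (binom, norm, t, chisq, f). The prefix is d (density or probability mass), p (cumulative probability), q (quantile) or r (random generation). The short sketch below uses illustrative parameters that are not the ones from the exercises. Note the discrete-variable detail for P(X ≥ k).

```r
# Binomial(10, 0.5): d = exact probability, p = cumulative probability
p3    <- dbinom(3, size = 10, prob = 0.5)      # P(X = 3)
p_le3 <- pbinom(3, size = 10, prob = 0.5)      # P(X <= 3)
# For a discrete variable, P(X >= 3) = 1 - P(X <= 2), not 1 - P(X <= 3)
p_ge3 <- 1 - pbinom(2, size = 10, prob = 0.5)

# Normal: p and q are inverses of each other
pnorm(1.96)                 # about 0.975
qnorm(0.975)                # about 1.96

# t, chi-squared and F follow the same pattern, with degrees of freedom
qt(0.975, df = 8)           # larger than qnorm(0.975): heavier tails
pchisq(2, df = 5)           # P(X < 2) with 5 degrees of freedom
pf(2, df1 = 6, df2 = 3)     # P(F < 2)

# r generates random observations, e.g. for the plotting exercises
set.seed(1)
draws <- rbinom(100, size = 20, prob = 0.3)
```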

Before proceeding, it might be helpful to look over the help pages for choose, dbinom, pbinom, rbinom, qbinom, pnorm, qnorm, rnorm, dnorm, pt, qt, rt, pchisq, qchisq, dchisq, df, pf and qf.

Answers to the exercises are available here.

If you obtained a different (correct) answer than those listed on the solutions page, please feel free to post your answer as a comment on that page.

Exercise 1

Let X be binomially distributed with n = 100 and p = 0.3. Compute the following:
a) P(X = 34), P(X ≥ 34), and P(X ≤ 34)
b) P(30 ≤ X ≤ 60)
c) The quantiles x_0.025 and x_0.975

Exercise 2

Let X be normally distributed with mean = 3 and standard deviation = 1. Compute the following:
a) P(X < 2), P(X > 2), P(2 ≤ X ≤ 4)
b) The quantiles x_0.025, x_0.5 and x_0.975.

Exercise 3

Let T_8 follow a t-distribution with 8 degrees of freedom. Compute the following:
a) P(T_8 < 1), P(T_8 > 2), P(-1 < T_8 < 1).
b) The quantiles t_0.025, t_0.5 and t_0.975. Can you justify the values of the quantiles?

Exercise 4

Compute the following for the chi-squared distribution with 5 degrees of freedom (χ²_5):
a) P(χ²_5 < 2), P(χ²_5 > 4), P(4 < χ²_5 < 6).
b) The quantiles g_0.025, g_0.5 and g_0.975.

Exercise 5

Compute the following for the F distribution with 6 and 3 degrees of freedom (F_6,3):
a) P(F_6,3 < 2), P(F_6,3 > 3), P(1 < F_6,3 < 4).
b) The quantiles f_0.025, f_0.5 and f_0.975.

Exercise 6

Generate 100 observations following a binomial distribution and plot them (if possible, on the same plot):
a) n = 20, p = 0.3
b) n = 20, p = 0.5
c) n = 20, p = 0.7

Exercise 7

Generate 100 observations following a normal distribution and plot them (if possible, on the same plot):
a) standard normal distribution (N(0, 1))
b) mean = 0, sd = 3
c) mean = 0, sd = 7

Exercise 8

Generate 100 observations following a t-distribution and plot them (if possible, on the same plot):
a) df = 5
b) df = 10
c) df = 25

Exercise 9

Generate 100 observations following a chi-squared distribution and plot them (if possible, on the same plot):
a) df = 5
b) df = 10
c) df = 25

Exercise 10

Generate 100 observations following an F distribution and plot them (if possible, on the same plot):
a) df1 = 3, df2 = 9
b) df1 = 9, df2 = 3
c) df1 = 15, df2 = 15




Building Shiny App exercises part 6

RENDER FUNCTIONS

In the sixth part of our series we will talk about the renderPlot and renderUI functions, and then we will be ready to create our first visualization. (Find parts 1-5 here.)
We are going to create a simple interactive scatterplot that will help us see the clusters that are created when we run the k-means algorithm on our dataset. Read the examples below to understand how to use the renderPlot function, and then test your skills with the exercise set we prepared for you. Let’s begin!

Answers to the exercises are available here.

If you obtained a different (correct) answer than those listed on the solutions page, please feel free to post your answer as a comment on that page.

DESCRIPTIVE STATISTICS

As in every statistical application, it is wise to compute descriptive statistics for your dataset and present them to the user in an easy-to-read way. So, first of all, we will place a Data Table inside the “SUMMARY” tabPanel. The example below can be your guide.

#ui.R
library(shiny)
shinyUI(fluidPage(
  sidebarLayout(
    sidebarPanel(),
    mainPanel(
      dataTableOutput("Table")
    )
  )
))

#server.R
shinyServer(function(input, output, session) {
  # summary() returns a table; convert it so renderDataTable can display it
  iris_summary <- as.data.frame.array(summary(iris))
  output$Table <- renderDataTable(iris_summary)
})


Exercise 1

Create a Data Table (“Table2”) with the descriptive statistics of your dataset. HINT: Use summary, as.data.frame.array and renderDataTable.

renderPlot

The renderPlot function renders a reactive plot that is suitable for assigning to an output slot. Its general form is shown below:

renderPlot(expr, width = "auto", height = "auto", res = 72, ...,
env = parent.frame(), quoted = FALSE, execOnResize = FALSE,
outputArgs = list())

The example below shows you how to create a simple scatterplot of two variables of the iris dataset (“Sepal Length” and “Sepal Width”).

# ui.R
library(shiny)
shinyUI(fluidPage(
  sidebarLayout(
    sidebarPanel(),
    mainPanel(
      plotOutput("plot1")
    )
  )
))

#server.R
shinyServer(function(input, output, session) {
  output$plot1 <- renderPlot({
    plot(iris$Sepal.Length, iris$Sepal.Width)
  })
})

First, remove renderImage and radioButtons from the tabPanel “K Means”.

Exercise 2

Add a scatterplot inside the tabPanel “K Means” between two variables of the iris dataset.

INTERACTIVE PLOTS

Shiny has built-in support for interacting with static plots generated by R’s base graphics functions, which makes it easy to add features like selecting points and regions, as well as zooming in and out of images.
To get the position of the mouse when a plot is clicked, you simply need to use the click option of plotOutput. For example, this app will print the x and y coordinates of the mouse cursor when a click occurs.

#ui.R
library(shiny)
shinyUI(fluidPage(
  sidebarLayout(
    sidebarPanel(),
    mainPanel(
      plotOutput("plot1", click = "plot_click"),
      verbatimTextOutput("info")
    )
  )
))

#server.R
shinyServer(function(input, output, session) {
  output$plot1 <- renderPlot({
    plot(iris$Sepal.Length, iris$Sepal.Width)
  })
  output$info <- renderText({
    # input$plot_click is NULL until the first click
    paste0("x=", input$plot_click$x, "\ny=", input$plot_click$y)
  })
})

Exercise 3

Add click inside the plotOutput you just created. Name it “mouse”.

Exercise 4

Add a verbatimTextOutput inside the “K Means” tabPanel, under the plotOutput you created before. Name it “coord”.

Exercise 5

Make the “x” and “y” coordinates appear in the pre tag you just created. HINT: Use renderText and paste0, and do not forget to activate it with the submitButton.

Exercise 6

Set height = “auto” and width = “auto”.

PLOT ANNOTATION

The title function can be used to add labels to a plot. Its first four principal arguments (main, sub, xlab and ylab) can also be used as arguments in most high-level plotting functions. They must be of type character or expression. In the latter case, quite a bit of mathematical notation is available, such as sub- and superscripts, Greek letters, fractions, etc.
title(main = NULL, sub = NULL, xlab = NULL, ylab = NULL,
line = NA, outer = FALSE, ...)

Look at the example below:
# ui.R
library(shiny)
shinyUI(fluidPage(
  sidebarLayout(
    sidebarPanel(),
    mainPanel(
      plotOutput("plot1", click = "plot_click"),
      verbatimTextOutput("info")
    )
  )
))

#server.R
shinyServer(function(input, output, session) {
  output$plot1 <- renderPlot({
    plot(iris$Sepal.Length, iris$Sepal.Width,
         main = "SCATTER PLOT", sub = "K Means",
         xlab = "Sepal Length", ylab = "Sepal Width")
  })
  output$info <- renderText({
    paste0("x=", input$plot_click$x, "\ny=", input$plot_click$y)
  })
})
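Outside Shiny, the same annotations can be added after plotting with title(). In this minimal sketch:

- The output goes to a temporary PDF file, so it runs without a screen.
- ann = FALSE suppresses the default axis labels.
- The plotmath subtitle (mu[k]) only illustrates the expression support mentioned above.

```r
out <- tempfile(fileext = ".pdf")
pdf(out)

# Draw the points without any annotation, then add it with title().
plot(iris$Petal.Length, iris$Petal.Width, ann = FALSE)
title(main = "K-Means",
      sub = expression(paste("Cluster centres ", mu[k])),  # Greek letter + subscript
      xlab = "Petal Length", ylab = "Petal Width")

invisible(dev.off())
```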

Exercise 7

Set scatterplot title to “K-Means”, the X-axis label to “Petal Length” and the Y-axis label to “Petal Width”. HINT: Use main,xlab,ylab.

You can also set other graphical parameters related to the title and subtitle, as in the example below:

# ui.R
library(shiny)
shinyUI(fluidPage(
  sidebarLayout(
    sidebarPanel(),
    mainPanel(
      plotOutput("plot1", click = "plot_click"),
      verbatimTextOutput("info")
    )
  )
))

#server.R
shinyServer(function(input, output, session) {
  output$plot1 <- renderPlot({
    plot(iris$Sepal.Length, iris$Sepal.Width,
         main = "SCATTER PLOT", sub = "K Means",
         xlab = "Sepal Length", ylab = "Sepal Width",
         # font: 1 plain, 2 bold, 3 italic, 4 bold italic, 5 symbol
         cex.main = 3, font.main = 5, col.main = "green",
         cex.sub = 0.65, font.sub = 4, col.sub = "orange")
  })
  output$info <- renderText({
    paste0("x=", input$plot_click$x, "\ny=", input$plot_click$y)
  })
})

Exercise 8

Give values to the remaining graphical parameters of the title, as in the example above, and get used to them. HINT: Use cex.main, font.main and col.main.

renderUI

renderUI(expr, env = parent.frame(), quoted = FALSE, outputArgs = list())

renderUI makes a reactive version of a function that generates HTML using the Shiny UI library. As you can see in the example below, this expression returns a tag object.

# ui.R
library(shiny)
shinyUI(fluidPage(
  sidebarLayout(
    sidebarPanel(uiOutput("Controls")),
    mainPanel(
      plotOutput("plot1", click = "plot_click"),
      verbatimTextOutput("info")
    )
  )
))

#server.R
shinyServer(function(input, output, session) {
  output$plot1 <- renderPlot({
    plot(iris$Sepal.Length, iris$Sepal.Width,
         main = "SCATTER PLOT", sub = "K Means",
         xlab = "Sepal Length", ylab = "Sepal Width",
         cex.main = 2, font.main = 4, col.main = "blue",
         cex.sub = 0.75, font.sub = 3, col.sub = "red")
  })
  output$info <- renderText({
    paste0("x=", input$plot_click$x, "\ny=", input$plot_click$y)
  })
  # Build the controls on the server and send them to uiOutput("Controls")
  output$Controls <- renderUI({
    tagList(
      sliderInput("n", "N", 1, 1000, 500),
      textInput("label", "Label")
    )
  })
})

Exercise 9

Put a uiOutput inside the tabPanel “K Means” and name it “All”. Then create its output in server.R with a tagList inside it. HINT: Use uiOutput, renderUI and tagList.

Exercise 10

Remove the submitButton and move the sliderInput and the textOutput from the ui.R into the tagList.




Data Science for Doctors – Part 2 : Descriptive Statistics

Data science enhances people’s decision making. Doctors and researchers make critical decisions every day, so it is essential for them to have some basic knowledge of data science. This series aims to help people working in the medical field enhance their data science skills.

We will work with a health-related database, the famous “Pima Indians Diabetes Database”. It was generously donated by Vincent Sigillito from Johns Hopkins University. Please find further information regarding the dataset here.

This is the second part of the series; it covers the main descriptive statistics measures you will use most of the time. These measures are divided into measures of central tendency and measures of spread. Most of the exercises can be solved with built-in functions, but I would encourage you to solve them “by hand”, because once you know the mechanics of the measures, you become much more confident in using them. The “solutions” page shows both methods, so even if you didn’t solve them by hand, it is worth checking them out.

Before proceeding, it might be helpful to look over the help pages for mean, median, sort, unique, tabulate, sd, var, IQR, mad, abs, cov, cor, summary, str and rcorr.

You also may need to load the Hmisc library.
install.packages('Hmisc')
library(Hmisc)

In case you haven’t solved Part 1, run the following script to load the prerequisites for this part.

Answers to the exercises are available here.

If you obtained a different (correct) answer than those listed on the solutions page, please feel free to post your answer as a comment on that page.

Exercise 1

Find the mean of the mass variable.

Exercise 2

Find the median of the mass variable.

Exercise 3

Find the mode of the mass variable.

Exercise 4

Find the standard deviation of the age variable.

Learn more about descriptive statistics in the online courses Learn by Example: Statistics and Data Science in R (including 8 lectures specifically on descriptive statistics), and Introduction to R.

Exercise 5

Find the variance of the mass variable.

Unlike the popular mean/standard deviation combination, the interquartile range and the median absolute deviation (MAD) are not sensitive to the presence of outliers. The MAD is often preferred because, multiplied by a scaling constant (1.4826, the default in R’s mad()), it estimates the standard deviation when the data are normally distributed.
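A small sketch with made-up ages illustrates the point. Adding one extreme value inflates the standard deviation several times over, while the MAD does not change at all in this example. Passing constant = 1 to mad() gives the raw, unscaled median absolute deviation.

```r
x     <- c(21, 23, 24, 25, 26, 27, 29)  # illustrative ages, no outliers
x_out <- c(x, 95)                       # same data plus one extreme value

# Compare the three spread measures with and without the outlier.
sapply(list(clean = x, with_outlier = x_out), function(v)
  c(sd = sd(v), IQR = IQR(v), MAD = mad(v)))

mad(x, constant = 1)  # raw median absolute deviation: 2
```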

Exercise 6

Find the interquartile range of the age variable.

Exercise 7

Find the median absolute deviation of the age variable. Assume that age follows a normal distribution.

Exercise 8

Find the covariance of the variables age and mass.

Exercise 9

Find the Spearman and Pearson correlations of the variables age and mass.

Exercise 10

Print the summary statistics and the structure of the data set. Also construct the correlation matrix of the data set.




Multipanel Graphics in R (part 1)

In many situations, we need to place several plots in the same figure as subplots. R has various ways of doing this. Base Graphics offers three different ways to draw subplots, namely mfrow, layout and split.screen, in increasing order of complexity and, at the same time, of control over the plot elements. This example introduces the mfrow and mfcol graphical parameters and the layout function in Base Graphics. We use the familiar iris dataset for the illustrations.
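Below is a minimal sketch of both approaches. It draws to a temporary PDF so it runs without a screen.

- par(mfrow = ...) fills the grid row by row. mfcol would fill it column by column.
- In layout(), each number in the matrix is a plot index, so a repeated number makes that plot span several cells.

The specific plots are illustrative and deliberately differ from the exercise arrangements.

```r
out <- tempfile(fileext = ".pdf")
pdf(out, width = 9, height = 6)

# mfrow: a 1 x 2 grid, filled row by row
old_par <- par(mfrow = c(1, 2))
hist(iris$Sepal.Length, main = "Sepal.Length")
hist(iris$Petal.Length, main = "Petal.Length")
par(old_par)  # restore the previous setting

# layout: plot 1 spans the whole top row; plots 2 and 3 share the bottom row
n_figs <- layout(matrix(c(1, 1,
                          2, 3), nrow = 2, byrow = TRUE))
plot(iris$Sepal.Length, iris$Sepal.Width,  main = "a")
plot(iris$Sepal.Length, iris$Petal.Length, main = "b")
plot(iris$Sepal.Length, iris$Petal.Width,  main = "c")

invisible(dev.off())
```

layout() invisibly returns the number of figure regions it defined, 3 in this case.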

Answers to the exercises are available here. If you obtained a different (correct) answer than those listed on the solutions page, please feel free to post your answer as a comment on that page.

Exercise 1
Using the iris dataset, draw the following scatterplots: a) Sepal.Length vs Sepal.Width, b) Sepal.Length vs Petal.Length, and c) Sepal.Length vs Petal.Width. Annotate each scatterplot with a title. Use separate colors and plotting characters for each plot.

Exercise 2
Plot the three scatterplots in the same figure as subplots arranged in one row. Use mfrow.

Exercise 3
Plot the three scatterplots in the same figure as subplots arranged in one column. Use mfrow.

Exercise 4
Repeat the same scatterplots. Partition the figure so that the first row contains plots a and b, and the second row contains plot c. Use mfrow.

Exercise 5
Repeat Exercise 2 with mfcol.

Exercise 6
Repeat Exercise 3 with mfcol.

Exercise 7
Repeat Exercise 4 with mfcol.

Exercise 8
Repeat Exercise 2 with layout.

Exercise 9
Repeat Exercise 3 with layout.

Exercise 10
Repeat Exercise 4 with layout. In this case, let scatterplot c occupy the second row completely.




Data Science for Doctors – Part 1 : Data Display

Data science enhances people’s decision making. Doctors and researchers make critical decisions every day, so it is essential for them to have some basic knowledge of data science. This series aims to help people working in the medical field enhance their data science skills.

We will work with a health-related database, the famous “Pima Indians Diabetes Database”. It was generously donated by Vincent Sigillito from Johns Hopkins University. Please find further information regarding the dataset here.

This is the first part of the series; it is about data display.

Before proceeding, it might be helpful to look over the help pages for table, pie, geom_bar, coord_polar, barplot, stripchart, geom_jitter, density, geom_density, hist, geom_histogram, boxplot, geom_boxplot, qqnorm, qqline, geom_point and plot.

You also may need to load the ggplot2 library.
install.packages('ggplot2')
library(ggplot2)

Please run the code below in order to load the data set and transform it into a proper data frame format:

url <- "https://archive.ics.uci.edu/ml/machine-learning-databases/pima-indians-diabetes/pima-indians-diabetes.data"
data <- read.table(url, fileEncoding="UTF-8", sep=",")
names <- c('preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class')
colnames(data) <- names

Answers to the exercises are available here.

If you obtained a different (correct) answer than those listed on the solutions page, please feel free to post your answer as a comment on that page.

Exercise 1

Create a frequency table of the class variable.

Exercise 2

class.fac <- factor(data[['class']],levels=c(0,1), labels= c("Negative","Positive"))

Create a pie chart of the class.fac variable.

Exercise 3

Create a bar plot for the age variable.

Exercise 4

Create a strip chart for the mass against class.fac.

Exercise 5

Create a density plot for the preg variable.

Exercise 6

Create a histogram for the preg variable.

Exercise 7

Create a boxplot for the age against class.fac.

Exercise 8

Create a normal QQ plot and a line which passes through the first and third quartiles.

Exercise 9

Create a scatter plot of the age variable against the mass variable.

Exercise 10

Create scatter plots of every variable of the data set against every other variable, in a single window.
Hint: it is quite simple; don’t overthink it.




Building Shiny App Exercises (part 5)

RENDER FUNCTIONS
In the fourth part of our series we just “scratched the surface” of reactivity by analyzing some of the properties of the renderTable function.
Now it is time to go deeper and learn how to use the rest of the render functions that Shiny provides. As you were told in part 4, these are:

renderImage
renderPlot
renderPrint
renderText
renderUI

Below you will see how three of them work (renderImage, renderText and renderPrint). In the next parts we will use the ones that match our needs, just as we did with the widgets, to give our application its specific form. As you will probably understand while reading this part, our aim is to perform several statistical analyses on our dataset. We will start by creating a K-Means tabPanel.

Follow the examples to understand the logic of the tools you are going to use, and then enhance the app you started creating in part 1 by practising with the exercise set we prepared for you. Let’s begin!

Answers to the exercises are available here.

If you obtained a different (correct) answer than those listed on the solutions page, please feel free to post your answer as a comment on that page.

renderImage
Sending pre-rendered images with renderImage.
These are images saved as source files. If your Shiny app has pre-rendered images saved in a subdirectory, you can send them using renderImage. Suppose the images are in the subdirectory “www/” and are named “image1.png”, “image2.png”, and so on. The following code sends the appropriate image, depending on the value of input$n:

# ui.R
library(shiny)
shinyUI(fluidPage(
  titlePanel("RenderImage"),
  sidebarLayout(
    sidebarPanel(
      radioButtons("n", label = h4("Radio Buttons"),
                   choices = list("Choice 1" = 1, "Choice 2" = 2),
                   selected = 2)
    ),
    mainPanel(
      imageOutput("Image")
    )
  )
))
#server.R

shinyServer(function(input, output, session) {
  # Send a pre-rendered image, and don't delete the image after sending it
  output$Image <- renderImage({
    # When input$n is 2, filename is ./www/image2.png
    filename <- normalizePath(file.path("./www",
                              paste("image", input$n, ".png", sep = "")))

    # Return a list containing the filename
    list(src = filename)
  }, deleteFile = FALSE)
})

Now let’s break down what the code above does. First of all, as we saw in part 1, you should save your images in a subdirectory called “www” inside your working directory. Let’s say you save 2 images and name them “image1.png” and “image2.png”. As you can see, we use radioButtons here to select which of the two we want to display. The filename variable contains the exact path of the selected image, while the returned list contains the filename and can also contain other values, such as width and height.
In this example, deleteFile is FALSE because we don’t want Shiny to delete the image after sending it.


Exercise 1

Place a tabPanel in the tabsetPanel of your Shiny App. Name it “K Means”.

Exercise 2

Move the radioButtons from the sidebarPanel inside the tabPanel “K Means” you just created and name it “Select Image”. Also, move the submitButton from the sidebarPanel to the tabPanel “K Means” without title.

Exercise 3

Place an imageOutput named “Image” inside the tabPanel “K Means” (ui.R), and add its reactive function in server.R. Nothing is displayed yet. HINT: Use renderImage.

Create a subdirectory named “images” inside your working directory and put two images in it, named “pic1.png” and “pic2.png”.

Exercise 4

Now create the filename. Follow the example above to create the right path. Do not forget to connect it with the radioButtons. Two steps left.

Exercise 5

Now it is time to set deleteFile = FALSE (a logical value, so without quotes).

Exercise 6

Create the list that contains the filename.

Exercise 7

Add width = 300 and height = 200 to the list.

renderText-renderPrint

The example below shows how renderText works.

# ui.R
library(shiny)

shinyUI(fluidPage(
  titlePanel("RenderImage"),
  sidebarLayout(
    sidebarPanel(
      sliderInput("slider1", label = h4("Sliders"),
                  min = 3, max = 10, value = 3)
    ),
    mainPanel(
      textOutput("text1")
    )
  )
))

# server.R
library(shiny)

shinyServer(function(input, output, session) {
  output$text1 <- renderText({
    # Combine the fixed text with the current slider value
    paste("You have selected", input$slider1, "clusters")
  })
})

The code above takes the numeric value from the sliderInput, inserts it into the sentence, and displays the result in the mainPanel.
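What paste() builds inside renderText can be checked in plain R. In this sketch a hypothetical variable clusters stands in for input$slider1.

```r
# Hypothetical stand-in for input$slider1
clusters <- 3

# paste() joins its arguments with a single space by default
msg <- paste("You have selected", clusters, "clusters")
print(msg)  # "You have selected 3 clusters"
```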

Before proceeding to the next exercise, move the sliderInput from the sidebarPanel to just after the imageOutput in the tabPanel “K Means”. Then change its label to “Clusters”, its min to 3, its max to 10 and its value to 3.

Exercise 8

Put the textOutput named “text1” inside your tabPanel directly after the sliderInput, then place its reactive function inside server.R using renderText.

Exercise 9

Display the reactive output by building the sentence “You have selected (?) clusters.” inside the renderText function, where (?) is the value chosen with the slider. HINT: Use paste.

Exercise 10

Follow exactly the same steps but this time instead of renderText use renderPrint and note the difference.
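A rough preview of the difference, without running Shiny: according to the Shiny documentation, renderText passes its value to cat(), while renderPrint captures the output of print(). This sketch imitates both with base R's capture.output(); the variable names are illustrative only.

```r
msg <- paste("You have selected", 3, "clusters")

# renderText()-style: the value goes through cat(), so only the bare string appears
as_text <- capture.output(cat(msg))

# renderPrint()-style: what print() shows, including the [1] index and the quotes
as_print <- capture.output(print(msg))

writeLines(as_text)   # You have selected 3 clusters
writeLines(as_print)  # [1] "You have selected 3 clusters"
```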




Let’s get started with dplyr

The dplyr package by Hadley Wickham is a very useful package that provides “A Grammar of Data Manipulation”. It aims to simplify common data manipulation tasks, and provides “verbs”, i.e. functions that correspond to the most common data manipulation tasks. Have fun playing with dplyr in the exercises below!

Answers to the exercises are available here.

If you obtained a different (correct) answer than those listed on the solutions page, please feel free to post your answer as a comment on that page.

Exercise 1
Install and load the dplyr package. The exercises use the Theoph dataset, described by the following metadata:

Wt: weight of the subject (kg).

Dose: dose of theophylline administered orally to the subject (mg/kg).

Time: time since drug administration when the sample was drawn (hr).

conc: theophylline concentration in the sample (mg/L).

Copy and paste this code to load dplyr and create df:

# install.packages("dplyr")  # run once if dplyr is not installed yet
library(dplyr)
df <- data.frame(Theoph)

Exercise 2

Use the names() function to get the column names of df.

Exercise 3

Let’s practice using the select() function. This allows you to work with just column names instead of indices.
a) Select only the columns from Subject to Dose.
b) Now select only the Wt and Dose columns.
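To contrast selecting by position with selecting by name, here is a small sketch on the built-in mtcars dataset. mtcars is used instead of Theoph only so the exercise is left for you; the object names are hypothetical.

```r
library(dplyr)

# Base R: columns picked by position, so you need to know where they sit
by_index <- mtcars[, 1:3]

# dplyr: the same columns picked by name, as a range from mpg to disp
by_range <- select(mtcars, mpg:disp)

# dplyr: individual columns listed by name
by_name <- select(mtcars, wt, hp)

names(by_range)  # "mpg" "cyl" "disp"
names(by_name)   # "wt" "hp"
```

The range form mpg:disp keeps working if columns are added elsewhere, while 1:3 silently changes meaning when the column order changes.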


Exercise 4
Let’s look at the samples with Dose greater than 5 mg/kg. Use the filter() command to return the rows of df with Dose > 5.

Exercise 5

Great. Now use filter() to return the rows of df with Dose > 5 and Time greater than the mean Time.
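A sketch of filter() with one and with two conditions, again on mtcars rather than Theoph (an assumption made so the exercises stay open). Conditions separated by commas are combined with AND, and a summary such as mean(mpg) is computed over all rows of the data frame.

```r
library(dplyr)

# One condition: keep only the 6-cylinder cars
six_cyl <- filter(mtcars, cyl == 6)

# Two conditions combined with AND: 6-cylinder cars with above-average mpg
efficient_six <- filter(mtcars, cyl == 6, mpg > mean(mpg))

nrow(six_cyl)        # 7
nrow(efficient_six)  # 3
```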

Exercise 6

Now let’s try sorting the data. Use the arrange() function to
1) arrange df by weight (descending)
2) arrange df by weight (ascending)
3) arrange df by weight (ascending) and Time (descending)
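arrange() sorts in ascending order by default; wrapping a column in desc() reverses it, and extra columns break ties. A minimal sketch on a few mtcars columns (hypothetical object names, not the exercise answer):

```r
library(dplyr)

cars_small <- select(mtcars, cyl, mpg, wt)

# Heaviest cars first
heaviest_first <- arrange(cars_small, desc(wt))

# Sort by cyl (ascending), then by mpg (descending) within each cyl value
by_cyl_then_mpg <- arrange(cars_small, cyl, desc(mpg))

head(heaviest_first, 3)
head(by_cyl_then_mpg, 3)
```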

Exercise 7

The mutate() command allows you to create a new column using conditions and data derived from other columns. Use mutate() to create a new column called trend that is equal to Time - mean(Time). This tells you how far each time value is from the mean. Pass na.rm = TRUE to mean() so that missing values are ignored.
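The same pattern on mtcars, as a sketch: the new column wt_dev (a hypothetical name) holds each car's distance from the mean weight. Note that na.rm = TRUE goes inside mean(); mtcars has no missing values, but the argument matters for data that does.

```r
library(dplyr)

# New column: deviation of each weight from the mean weight
with_dev <- mutate(mtcars, wt_dev = wt - mean(wt, na.rm = TRUE))

head(with_dev[, c("wt", "wt_dev")], 3)
```

Because the column measures deviation from the mean, its values always average out to (almost exactly) zero.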

Exercise 8

Given the meta-data

76.2 kg Super-middleweight
72.57 kg Middleweight
69.85 kg Light-middleweight
66.68 kg Welterweight

Use the mutate() function to classify each subject's weight using the information above. For the purpose of this exercise, consider anything above 76.2 kg to be Super-middleweight and anything below 66.68 kg to be Welterweight; anything above 72.57 kg and up to 76.2 kg is Middleweight, and anything from 66.68 kg up to 72.57 kg is Light-middleweight. Store the classifications in a new column called weight_cat and save the result back into df. Hint: Use the ifelse() function with mutate() to achieve this.
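Nested ifelse() calls check the thresholds from the heaviest class downwards, so each weight falls into the first class whose condition it meets. A base R sketch with a hypothetical helper classify_weight() and made-up weights; it assumes the boundaries are read as > 76.2, > 72.57 and >= 66.68. Inside mutate(), the same nested expression would be applied to the Wt column.

```r
# Hypothetical helper: classify weights (kg) with nested ifelse()
classify_weight <- function(wt) {
  ifelse(wt > 76.2, "Super-middleweight",
    ifelse(wt > 72.57, "Middleweight",
      ifelse(wt >= 66.68, "Light-middleweight", "Welterweight")))
}

# Made-up weights, one per class
classify_weight(c(80, 74, 70, 60))
# "Super-middleweight" "Middleweight" "Light-middleweight" "Welterweight"
```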

Exercise 9

Use the group_by() command to group df by weight_cat. This lets you apply aggregate functions to each group, similar to GROUP BY in SQL. Store the result in a data frame called weight_group.

Exercise 10

Use the summarize() command on the weight_group created in Exercise 9 to find the mean Time and the sum of Dose received by each weight category.
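group_by() on its own leaves the rows unchanged; it only marks the groups, and summarise() then returns one row per group. A sketch on mtcars grouped by cylinder count, with hypothetical column names mean_mpg and total_wt:

```r
library(dplyr)

# Mark the groups; later verbs now work per cylinder count
by_cyl <- group_by(mtcars, cyl)

# One summary row per group: mean mpg and total weight
cyl_summary <- summarise(by_cyl, mean_mpg = mean(mpg), total_wt = sum(wt))

cyl_summary
```

The result has three rows, one each for 4, 6 and 8 cylinders.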