mapply() is a multivariate version of sapply(): it applies a function to the corresponding elements of a set of vector or list arguments. mapply() also simplifies the output where possible. Structure of the mapply() function: mapply(FUN, ..., MoreArgs = NULL, SIMPLIFY = TRUE, USE.NAMES = TRUE) Answers to the exercises are available here. Exercise 1 Beginning level Required dataframe: PersonnelData <- data.frame(Representative=c(1:4), Sales=c(95,110,115,90), […]
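As a rough illustration of the signature above, here is a minimal sketch in base R. The vectors `base` and `exponent` and the argument name `factor` are made up for this example; they are not taken from the exercise data.

```r
# Hypothetical inputs, not part of the exercise's PersonnelData
base <- c(1, 2, 3)
exponent <- c(2, 3, 4)

# FUN is called once per position: FUN(base[i], exponent[i])
powers <- mapply(function(b, e) b^e, base, exponent)
print(powers)

# MoreArgs holds arguments that stay the same for every call
scaled <- mapply(function(x, y, factor) (x + y) * factor,
                 1:3, 4:6, MoreArgs = list(factor = 10))
print(scaled)

# SIMPLIFY = FALSE keeps the result as a list instead of a vector or matrix
repeated <- mapply(function(x, n) rep(x, n), c("a", "b"), 2:1,
                   SIMPLIFY = FALSE)
print(repeated)
```

With SIMPLIFY = TRUE (the default), results of equal length are collapsed into a vector or matrix; otherwise a list is returned.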

# data manipulation

## Let’s get started with dplyr

The dplyr package by Hadley Wickham is a very useful package that provides “A Grammar of Data Manipulation”. It aims to simplify common data manipulation tasks, and provides “verbs”, i.e. functions that correspond to the most common data manipulation tasks. Have fun playing with dplyr in the exercises below! Answers to the exercises are available […]

## Hierarchical Clustering exercises (beginner)

Grouping objects into clusters is a frequent task in data analysis. In this set of exercises we will use hierarchical clustering to cluster European capitals based on their latitude and longitude. Before trying out this exercise, please make sure that you are familiar with the following functions: dist, hclust, cutree, rect.hclust. We will be using […]
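The four functions named above fit together roughly as follows. This is only a sketch: the six capitals and their approximate coordinates are chosen for illustration and are not the exercise's dataset, and Euclidean distance on raw degrees is a simplification of real geographic distance.

```r
# Illustrative data: approximate coordinates, not the exercise's dataset
capitals <- data.frame(
  latitude  = c(48.86, 52.52, 40.42, 41.90, 51.51, 37.98),
  longitude = c(2.35, 13.40, -3.70, 12.50, -0.13, 23.73),
  row.names = c("Paris", "Berlin", "Madrid", "Rome", "London", "Athens")
)

# Pairwise Euclidean distances between the rows
distances <- dist(capitals)

# Agglomerative clustering (complete linkage is the default method)
tree <- hclust(distances)

# Cut the tree into k groups; returns a named integer vector
groups <- cutree(tree, k = 3)
print(groups)

# Draw the dendrogram and box the k clusters (only in an interactive session)
if (interactive()) {
  plot(tree)
  rect.hclust(tree, k = 3)
}
```

Changing `k`, or passing a different `method` to hclust(), is a quick way to see how sensitive the grouping is to those choices.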

## Select and Query Exercise

In this exercise we cover the basics of selecting and extracting data using queries, along with a few related topics. This should prepare you for the next exercise: Basic Decision Tree. The purpose of this is to give you the 20 percent of the tools to get 80 percent of work done. You […]

## Descriptive Analytics-Part 4 : Data Manipulation

Descriptive Analytics is the examination of data or content, usually manually performed, to answer the question “What happened?”. To be able to solve this set of exercises, you should have solved part 0, part 1, part 2, and part 3 of this series, and you should also run this script which contains […]

## Sampling Exercise Part 1

In this exercise, we will quickly go through some basic sampling methods. Follow along with this series to use these methods later in our decision tree modelling exercise. We will sample using the packages caTools and caret. This is a beginner level exercise. Please refer to the help section for the set.seed(), sample.split(), createDataPartition(), and createFolds() functions. You […]

## Optimize Data Exploration With Sapply() – Exercises

The apply() functions in R implement the Split-Apply-Combine strategy for data analysis, and are a concise alternative to writing loops. The sapply() function applies a function to each element of a vector or list (for a dataframe, to each column), and simplifies the output. Structure of the sapply() function: sapply(data, function, ...) The dataframe used for these exercises: dataset1 <- […]
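A short sketch of that behaviour in base R follows. The dataframe `scores` is a made-up stand-in, since the exercise's own `dataset1` is not shown in full here.

```r
# Hypothetical dataframe, standing in for the exercise's dataset1
scores <- data.frame(
  mathScore    = c(70, 85, 90),
  readingScore = c(65, 80, 95)
)

# A dataframe is a list of columns, so the function runs once per column;
# the result is simplified to a named numeric vector
column_means <- sapply(scores, mean)
print(column_means)

# On a plain vector, the function runs once per element
squares <- sapply(c(1, 2, 3), function(x) x^2)
print(squares)

# lapply() does the same work but always returns a list (no simplification)
means_as_list <- lapply(scores, mean)
print(means_as_list)
```

Comparing `column_means` with `means_as_list` shows what "simplifies the output" means in practice: same values, different container.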

## Efficient Processing With Apply() Exercises

The apply() function is an alternative to writing loops: it applies a function to the columns, rows, or individual values of an array or matrix. The structure of the apply() function is: apply(X, MARGIN, FUN, ...) The matrix variable used for the exercises is: dataset1 <- cbind(observationA = 16:8, observationB = c(20:19, 6:12)) Answers to the […]
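Using the matrix defined above, the MARGIN argument can be sketched as follows. The result variable names are chosen here for illustration and do not come from the exercises.

```r
# The matrix from the exercise description
dataset1 <- cbind(observationA = 16:8, observationB = c(20:19, 6:12))

# MARGIN = 1: apply the function to each row
row_sums <- apply(dataset1, 1, sum)
print(row_sums)

# MARGIN = 2: apply the function to each column
col_means <- apply(dataset1, 2, mean)
print(col_means)

# MARGIN = c(1, 2): apply the function to every individual cell
doubled <- apply(dataset1, c(1, 2), function(x) x * 2)
print(doubled)
```

The column-wise result keeps the column names, while the cell-wise result has the same dimensions as the input matrix.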

## Reshape 2 Exercises

The Reshape 2 package is based on differentiating between identification variables and measurement variables. The functions of the Reshape 2 package then “melt” datasets from wide to long format, and “cast” datasets from long to wide format. Required package: library(reshape2) Answers to the exercises are available here. Exercise 1 Set a variable called “moltenMtcars”, by […]