Welcome to another episode of our Data Digest. In this episode, we introduce data analytics using R.
Instructor: Paulina Boadiwaah Mensah
The name R comes from the first names of its core developers, Robert Gentleman and Ross Ihaka. It is also a play on the name of its parent language, S. R’s semantics, however, are closer to those of Scheme, a functional programming language.
R is a functional programming language. Functions are first-class objects. This means that you can do anything with functions as with any other R object. As a result, you can assign R functions to variables, store them in lists, pass them as arguments, and return them as a result of a function. Everything that happens in R happens via a function call. Even assignment is really a function.
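Here is a minimal sketch of those points in base R. The names square, apply_twice, and make_adder are made up for illustration.

```r
# A function is an ordinary object: assign it to a variable.
square <- function(x) x^2

# Store functions in a list and call one by name.
ops <- list(sq = square, root = sqrt)
ops$sq(4)                      # 16

# Pass a function as an argument.
apply_twice <- function(f, x) f(f(x))
apply_twice(square, 3)         # 81

# Return a function from a function.
make_adder <- function(n) function(x) x + n
add_five <- make_adder(5)
add_five(10)                   # 15

# Assignment is itself a function call: this is the same as y <- 42.
`<-`(y, 42)
y                              # 42
```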
However, R is not a “pure” functional language. It also has features of imperative programming languages, such as loops and assignments that change a variable's value.
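To see both styles side by side, here is a small sketch that computes the same sum of squares twice, once with a loop and once with function calls. The variable names are illustrative only.

```r
values <- c(1, 2, 3, 4)

# Imperative style: a loop that updates a variable step by step.
total <- 0
for (v in values) {
  total <- total + v^2
}
total                          # 30

# Functional style: apply a function to every element, then sum.
squares <- sapply(values, function(v) v^2)
sum(squares)                   # 30
```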
These are some of the many benefits that R offers you:
Here are some of the opportunities open to you as an R expert:
Below are some of the companies that use R in their tech stack along with some other software.
Variables are simply names or tags given to values stored in the memory of the computer that is executing the code.
R has rules for naming variables. You should ensure that your variable:
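As a quick check, here is a sketch based on the standard R naming rules: a name may contain letters, digits, periods, and underscores, must start with a letter (or a period not followed by a digit), and must not be a reserved word. The helper is_valid_name is hypothetical and relies on the base function make.names(), which rewrites any invalid name.

```r
# Hypothetical helper: a name is syntactically valid if make.names()
# leaves it unchanged.
is_valid_name <- function(x) x == make.names(x)

is_valid_name("total_sales")   # TRUE
is_valid_name("yrs.service")   # TRUE
is_valid_name("2nd_value")     # FALSE: starts with a digit
is_valid_name("my value")      # FALSE: contains a space
is_valid_name("if")            # FALSE: reserved word
```

Remember that R is case-sensitive, so salary and Salary are two different variables.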
The R programming language has six main data types. The following are the data types with examples.
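You can check the type of any value with class(). This sketch assumes the six types meant here are the basic atomic types of base R: logical, integer, numeric, complex, character, and raw.

```r
class(TRUE)            # "logical"
class(42L)             # "integer" (the L suffix marks an integer)
class(3.14)            # "numeric" (stored as a double)
class(2 + 3i)          # "complex"
class("salary")        # "character"
class(charToRaw("R"))  # "raw"
```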
R has multiple operators that allow you to perform a variety of tasks.
Examples of each of the operators.
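Here is a short sketch of the common operator families in base R, using two made-up values a and b.

```r
a <- 7               # assignment
b <- 2

a + b                # arithmetic: 9
a %/% b              # integer division: 3
a %% b               # remainder: 1
a ^ b                # exponent: 49

a > b                # relational: TRUE
a == b               # relational: FALSE

(a > 5) & (b > 5)    # logical AND: FALSE
(a > 5) | (b > 5)    # logical OR: TRUE
!(a > 5)             # logical NOT: FALSE

b %in% c(1, 2, 3)    # membership: TRUE
1:5                  # sequence: 1 2 3 4 5
```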
A data structure is a particular way of organizing data in a computer so that it can be used effectively. The idea is to reduce the space and time complexities of different tasks. The R programming language has six data structures:
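The sketch below creates one of each, assuming the six structures meant here are the ones usually listed for R: vector, list, matrix, array, factor, and data frame. All values are made up.

```r
v  <- c(10, 20, 30)                      # vector: one type, one dimension
l  <- list(label = "x", value = 1.5)     # list: elements of mixed types
m  <- matrix(1:6, nrow = 2)              # matrix: one type, two dimensions
a  <- array(1:24, dim = c(2, 3, 4))      # array: one type, any number of dimensions
f  <- factor(c("A", "B", "A"))           # factor: categorical values
df <- data.frame(id = 1:3, score = v)    # data frame: a table of equal-length columns

dim(m)        # 2 3
levels(f)     # "A" "B"
nrow(df)      # 3
```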
These are some of the platforms where you can learn more about R and practice using it for data analytics.
Let’s get our hands dirty and begin analyzing data in R!
In this hands-on session, it is recommended that you
Let’s follow the steps
Step I: Launch RStudio from the RStudio shortcut on your desktop.
You can press Ctrl+L to clear the console.
The console is where you write your code and see its output.
Step II: Set your working directory
The working directory is the folder that R loads your data from and saves your files to. Set it to a folder of your own rather than copying the path below, which only illustrates the syntax.
#setwd('filepath')
setwd("C:/Users/pauli/OneDrive/Desktop")
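setwd() stops with an error if the folder does not exist, so it can help to check first. Here is a minimal sketch that uses tempdir() as a stand-in for your own folder and then switches back.

```r
old_dir <- getwd()             # remember where we started

target <- tempdir()            # stand-in for your own folder
if (dir.exists(target)) {
  setwd(target)                # switch only if the folder exists
}
getwd()                        # confirm the current working directory
list.files()                   # see which files R can find here

setwd(old_dir)                 # switch back when done
```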
Step III: Load the carData package
You can load the carData package by running the code below in your console. Package names are case-sensitive, so it must be written as carData. When you load a package, all its contents, whether functions or datasets, are made available to you.
library(carData)
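If the package is not installed yet, library() will fail. A common pattern, sketched here, is to install it only when it is missing and then load it. It assumes you have an internet connection and a CRAN mirror configured.

```r
# Install carData only if it is not already available, then load it.
if (!requireNamespace("carData", quietly = TRUE)) {
  install.packages("carData")
}
library(carData)

# List the datasets that the package provides.
data(package = "carData")
```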
Step IV: Read the data.
As noted earlier, the Salaries data from the carData package will be used for this analysis.
dataset <- Salaries
View(dataset)
View() opens the dataset as a table in a new tab.
You can also use head() or tail() to print the first or last six rows of your dataset.
head(dataset)
tail(dataset)
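Both functions take an n argument if you want a different number of rows. Here is a short sketch with a few other base R functions for a first look at the data.

```r
library(carData)
dataset <- Salaries

head(dataset, n = 3)   # first 3 rows instead of the default 6
tail(dataset, n = 3)   # last 3 rows
dim(dataset)           # number of rows and columns
names(dataset)         # column names
str(dataset)           # compact overview: type and first values of each column
```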
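Step V: Obtain the descriptive statistics of the Salaries dataset.
summary() reports the minimum, quartiles, mean, and maximum of each numeric variable, and the count of each category for the other variables.
summary(dataset)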
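If you only want the numeric variables, you can select them first. This is a minimal sketch in base R; numeric_cols is an illustrative name.

```r
library(carData)
dataset <- Salaries

# Keep only the numeric columns, then summarise them.
numeric_cols <- dataset[sapply(dataset, is.numeric)]
summary(numeric_cols)

# summary() does not report the standard deviation, so compute it per column.
sapply(numeric_cols, sd)
```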
Step VI: Let’s check the class of each variable (column) in this dataset.
class(dataset$discipline)
class(dataset$rank)
class(dataset$yrs.since.phd)
class(dataset$yrs.service)
class(dataset$sex)
class(dataset$salary)
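Rather than checking one column at a time, you can get the class of every column in a single call. A small sketch:

```r
library(carData)
dataset <- Salaries

# Class of every column in one call.
sapply(dataset, class)

# For the factor columns, list the categories they contain.
levels(dataset$rank)
levels(dataset$sex)
```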
Step VII: Load the tidyverse package.
library(tidyverse)
The tidyverse is a collection of packages used for importing, tidying, manipulating, and visualizing data.
Step VIII: Replace the elements in column discipline, changing A to Theoretical Department and B to Applied Department
dataset <- dataset %>% mutate(discipline=fct_recode(discipline, "Theoretical Department" ="A", "Applied Department" ="B"))
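To see what fct_recode() does on its own, here is a sketch on a small made-up factor that stands in for the discipline column. The new label goes on the left and the existing level on the right.

```r
library(forcats)

# Toy factor standing in for the discipline column.
discipline <- factor(c("A", "B", "B", "A"))

# New label on the left, existing level on the right.
recoded <- fct_recode(discipline,
                      "Theoretical Department" = "A",
                      "Applied Department"     = "B")

levels(recoded)   # "Theoretical Department" "Applied Department"
table(recoded)    # counts per new label
```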
Step IX: Now, let’s rename some column names.
We will be capitalizing the column names and also changing the third column name to "YEARS since PHD"
dataset <- dataset %>% rename_with(toupper)
dataset <- dataset %>% rename("YEARS since PHD" = YRS.SINCE.PHD)
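The same two steps can be tried on a tiny made-up data frame to see the effect on the column names. Note that a name containing spaces has to be wrapped in backticks when you refer to it later.

```r
library(dplyr)

# Toy data frame with the same style of column names (made-up values).
toy <- data.frame(yrs.since.phd = c(5, 12), salary = c(90000, 120000))

toy <- toy %>%
  rename_with(toupper) %>%                     # YRS.SINCE.PHD, SALARY
  rename("YEARS since PHD" = YRS.SINCE.PHD)    # new name = old name

names(toy)               # "YEARS since PHD" "SALARY"
toy$`YEARS since PHD`    # names with spaces need backticks
```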
Step X: Let’s get a summary table of the dataset and adjust how the values are displayed
To do this, load the gtsummary and scales packages.
library(gtsummary)
library(scales)
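# label_number_si() is deprecated in recent scales releases; scale_cut replaces it
dollar_fxn <- label_number(accuracy = 0.1, prefix = "$", scale_cut = cut_short_scale())
dataset %>% tbl_summary(digits = starts_with("SALARY") ~ dollar_fxn) %>%
  modify_caption("General Overview of the Salaries Dataset") %>%
  bold_labels()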
Step XI: Create a scatterplot
This plot displays the years since PhD versus the salary, colored by sex, with a fitted straight line for each group.
library(ggthemes)  # provides theme_wsj()
dataset %>%
  ggplot(aes(x = `YEARS since PHD`, y = SALARY, color = SEX)) +
  geom_point() + theme_wsj() + scale_color_manual(values = c("red", "black")) +
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "Resagratia R Session", subtitle = "Dataset from carData package")
Step XII: Now let’s create a report of what we have done so far using R Markdown
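In RStudio, the usual route is File > New File > R Markdown, then the Knit button. As an alternative, here is a minimal sketch that renders a small R script straight to an HTML report with rmarkdown::render(). The file name is hypothetical, and the built-in cars data is only a stand-in for your own analysis. It assumes the rmarkdown package and pandoc are installed, and skips the rendering step if they are not.

```r
# Write a small R script; lines starting with #' become report text.
script <- file.path(tempdir(), "example_report.R")   # hypothetical file name
writeLines(c(
  "#' # Example report",
  "#' Summary of the built-in cars data, used here as a stand-in.",
  "summary(cars)"
), script)

# Render to HTML only if rmarkdown and pandoc are installed.
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  report <- rmarkdown::render(script, output_format = "html_document",
                              quiet = TRUE)
  report   # path to the generated .html file
}
```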
We hope you found this episode of the Data Digest series exciting and insightful. For more content like this, sign up on the Resa platform.
Thank you for learning with us!
Copyright 2025 Resagratia. All Rights Reserved.