CSV File Manipulation and Visualization in R – Part 1

Hello and welcome, friends. This is Ravindu from mathblog93. Today I am going to show you how to do some .csv file manipulation and a very simple data visualization.

Often, we come across instances where we need to extract data from several .csv files. In that case, I advise you to gather those .csv files and put them into one folder.

In my case, I have created a folder named “bankData” in my working directory and put my .csv files into it. There are 4 .csv files in it.

First of all, I am going to read all the .csv files in my folder into R. To get their names, I use the “list.files” function.

filenames <- list.files("bankData",full.names=TRUE)

Here, I have stored the paths of all 4 of my files in the character vector “filenames”. Because “full.names” is TRUE, each entry includes the folder name as well as the file name. If you type “str(filenames)” in the console, you will see that “filenames” is a character vector of length 4.
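If the folder might also hold files that are not .csv files, the “pattern” argument of “list.files” can restrict the match. The following is only a small sketch: the temporary folder and the file names in it are made up for illustration and are not the real “bankData” contents.

```r
# Build a throwaway folder with three empty files (hypothetical names)
demo_dir <- file.path(tempdir(), "bankData_demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, c("jan.csv", "feb.csv", "notes.txt"))))

# Keep only names ending in ".csv"; full.names = TRUE prepends the folder path
csv_only <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

str(csv_only)        # a character vector with one entry per .csv file
basename(csv_only)   # the file names without the folder part
```

The pattern is a regular expression, so the dot is escaped and “$” anchors the match to the end of the name.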

After that, we are going to create a list of 4 data frames using the function “lapply” and an anonymous function. This takes the vector of file names that was returned before and imports each of those files into R using the “read.csv” function.


csv_files <- lapply(filenames,function(i){
  read.csv(i, header=FALSE, stringsAsFactors = FALSE, skip=4)
})

Here you can see that we have set the argument “stringsAsFactors” to FALSE. In versions of R before 4.0.0, character columns read into a data frame were converted into factors by default; from R 4.0.0 onward the default is already FALSE. We need to avoid this conversion as we are going to do some manipulations with this data, so we set the argument explicitly and the code behaves the same way in either case.

The argument “skip” specifies the number of lines of the data file to skip before R starts reading the data. The .csv files we have here contain title text in the first 4 rows, therefore the argument “skip” is set to 4. Since the files have no header row after those titles, “header” is set to FALSE and R names the columns V1, V2, V3 and so on.
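To see what “skip” and “header” do, here is a minimal sketch on a made-up file. The title lines and the two data rows below are invented for illustration; they are not taken from the real bank files.

```r
# Write a small hypothetical file: 4 title lines followed by 2 data rows
tmp <- tempfile(fileext = ".csv")
writeLines(c("Bank report",
             "Branch: demo",
             "Period: demo",
             "Generated for illustration",
             "2018-01-01,100",
             "2018-01-02,40"), tmp)

# skip = 4 drops the title lines; header = FALSE gives the default names V1, V2
demo <- read.csv(tmp, header = FALSE, stringsAsFactors = FALSE, skip = 4)

str(demo)   # V1 stays a character column, V2 is read as a number
```

Without “skip = 4”, the first title line would be read as data and the column layout would be wrong.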

Once this is done we have all our data in our working environment.

Now if you type “csv_files” in the console, you will see that there is a column named “V3” which only rarely contains any text. We need to get rid of this column as it is of no use to our analysis.

The following code does that for us:

csv_files<-lapply(csv_files, function(x) { x["V3"] <- NULL; x })

Here, the function “lapply” is applied with the list of data frames and an anonymous function as arguments. The anonymous function assigns NULL to the column “V3”, which removes that column from the data frame, and then returns the modified data frame.
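The same idea can be tried on a toy list first. This is just a sketch; the two small data frames are made up and only stand in for the imported files.

```r
# Two small hypothetical data frames standing in for the imported files
toy <- list(
  data.frame(V1 = 1:2, V2 = c(10, 20), V3 = c("", "note"), stringsAsFactors = FALSE),
  data.frame(V1 = 3:4, V2 = c(30, 40), V3 = c("", ""), stringsAsFactors = FALSE)
)

# Assigning NULL to a column drops it; the trailing x returns the modified copy
toy <- lapply(toy, function(x) { x["V3"] <- NULL; x })

lapply(toy, names)   # the remaining column names of each data frame
```

The trailing “x” matters: without it the anonymous function would return the result of the assignment (NULL) instead of the data frame.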

The next step is to combine the 4 data frames in the list “csv_files” into a single data frame.

df <- do.call(rbind.data.frame, csv_files)

The above code uses “row bind” to stack the 4 data frames on top of each other in a single data frame. The function “do.call” constructs and executes a function call from a function and a list of arguments to be passed to it, so here every data frame in the list becomes one argument to “rbind.data.frame”. (See the R documentation: https://www.rdocumentation.org/packages/base/versions/3.5.1/topics/do.call)
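A small sketch of what “do.call” is doing, again with made-up data frames rather than the real bank data:

```r
# Two hypothetical data frames with the same columns
parts <- list(
  data.frame(V1 = 1:2, V2 = c(10, 20)),
  data.frame(V1 = 3:4, V2 = c(30, 40))
)

# do.call(f, list(a, b)) builds and runs the call f(a, b)
combined <- do.call(rbind.data.frame, parts)

nrow(combined)   # 4 rows: 2 from each data frame
combined
```

Row binding only works cleanly when all data frames have the same columns, which is one more reason to drop the unwanted column from every data frame before combining them.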

Since this post is becoming very lengthy, we will discuss the rest of the analysis in the next tutorial.

Thanks for reading this, guys! Stay tuned for the latest updates!

Cheers!

Image credits : http://systemicresult.com/index.php/data-science/data-analysis


Plotting with ggplot2 in R

ggplot2 is an R package for statistical graphics. Using ggplot2 you can create stunning visualizations of your data. Posted here is an introductory ggplot2 project which visualizes the “iris” data set that ships with R itself (in the built-in “datasets” package).

The following code produces the plot shown in the image.


# install ggplot2 (only needed once)
install.packages("ggplot2")
library(ggplot2)
# scatter plot of sepal length vs petal length
qplot(Sepal.Length, Petal.Length, data = iris)
# colour the points by species
qplot(Sepal.Length, Petal.Length, data = iris, color = Species)
# size of each point denotes petal width
qplot(Sepal.Length, Petal.Length, data = iris, color = Species, size = Petal.Width)
# make the points slightly transparent
qplot(Sepal.Length, Petal.Length, data = iris, color = Species, size = Petal.Width,
      alpha = I(0.7))
# add axis labels and a title
qplot(Sepal.Length, Petal.Length, data = iris, color = Species, size = Petal.Width,
      xlab = "Sepal Length", ylab = "Petal Length", main = "Sepal vs Petal Length")

The above code is inspired by https://www.r-bloggers.com/quick-introduction-to-ggplot2/ . We sincerely thank them for their contribution.
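One thing to keep in mind: “qplot” is marked as deprecated in recent ggplot2 releases (as of version 3.4.0), so it may print a warning. As a rough sketch, the same kind of plot can be written with the full “ggplot” syntax. This assumes ggplot2 is already installed, and it combines the transparency and the labels in one plot, which is a choice made here for illustration.

```r
library(ggplot2)

# Map the columns to aesthetics once, then add the point layer and the labels
p <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length,
                      color = Species, size = Petal.Width)) +
  geom_point(alpha = 0.7) +   # fixed transparency, so it is set outside aes()
  labs(x = "Sepal Length", y = "Petal Length", title = "Sepal vs Petal Length")

print(p)
```

Here “alpha = 0.7” sits outside “aes()” because it is a constant and not a column of the data, which is what “I(0.7)” achieves in “qplot”.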

Basic Probability Theory


Here, we are going to teach you the basics of probability theory through a series of well-formed problems. Please keep in mind that this lecture series assumes you already have some basic knowledge of the fundamentals of probability and statistics. Nevertheless, if you have any questions about basic concepts in probability theory and statistics, please drop a comment or email us at mathblog93@gmail.com.

Thanks!!!

The wonderful teacher who taught us about life…

“Will you carefully go over all the work I did today and come back having done the assigned exercises? Will you, for certain? Whatever obstacles and problems come up, will you pass the 2012 exam with top marks? Will you, for certain? Will you pass all your exams with top marks, hold the highest jobs in this country, become one of its wealthiest people, and be an honourable citizen who lifts this country up and ends its poverty? Will you, for certain?”
I still remember these words today! If 1000 children with this way of thinking stepped into society every year, our country would be a paradise today… We were fortunate enough to learn from such an honourable teacher… Our forever beloved teacher, who did not run headlong after money, who did not market vulgarity, and who was a true teacher among the tuition-class crowd…
