How to get started with R.


R is a flexible and powerful open-source language with extensive statistical and graphing capabilities. Its syntax is compact and reasonably easy to pick up for everyday data tasks. The large and fast-growing community around R has contributed to its value both as a programming language and as a data analysis environment.


1) Install R and RStudio (IDE)

Install R first, then RStudio, which needs an existing R installation to run.

2) Install packages

Much of R's power comes from its packages, and installing one takes a single command (replace the placeholder with the name of the package you want):

install.packages("package name")
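
Once a package is installed, it still has to be loaded in each session with library(). Below is a minimal sketch of a common install-if-missing pattern. The package name used, stats, is only a stand-in chosen because it ships with R, so the snippet runs without a network connection:

```r
# "stats" ships with R; it is a stand-in here so the snippet runs offline.
pkg <- "stats"

# Install only when the package is not already available.
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg)
}

# Attach the package for the current session.
library(pkg, character.only = TRUE)
```

For a package from CRAN, swap the value of pkg for its name, for example "dplyr".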

 

3) RStudio overview

RStudio is one of the most popular R code editors, and it runs on Windows, macOS, and Linux. Its window is divided into four panes:

  1. Script pane – to write and save scripts
  2. Console pane – where code is executed
  3. Environment/History pane – shows the objects created and the commands run in the current session
  4. Helper pane – contains tabs to install and list packages, view plots, and locate files

4) The Workspace

The workspace is your current R working environment and includes any user-defined objects (vectors, matrices, data frames, lists, functions).
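
A short sketch of inspecting and tidying the workspace with base R functions. The object names scores and greet are made up for illustration:

```r
# Create two objects in the workspace (hypothetical names).
scores <- c(90, 85, 77)
greet <- function(name) paste("Hello,", name)

# List the objects that currently exist.
ls()

# Remove one object, then check whether it is still defined.
rm(scores)
exists("scores")

# The working directory is where R reads and writes files by default.
getwd()
```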

5) Entering Commands

R is a command-line driven program. The user enters commands at the prompt (> by default), and each command is executed one at a time.
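
As an illustration, here are a few commands as they might be typed at the prompt, one per line. The variable names radius and area are arbitrary:

```r
# Each line is one command; R evaluates it and prints the result.
1 + 2
sqrt(16)

# A result can be stored and reused in a later command.
radius <- 3
area <- pi * radius^2
area
```

An assignment prints nothing by itself; typing the variable name on its own line displays its value.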

6) Data Types in R


A vector is an ordered set of elements that all have the same mode (numeric, character, logical, and so on); it is the closest thing R has to a plain variable. A factor is a categorical variable. An array is a table with k dimensions, and a matrix is the particular case of an array with k = 2. The elements of an array or of a matrix are all of the same mode. A data frame is a table composed of one or several vectors and/or factors, all of the same length but possibly of different modes.
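
The structures described above can be sketched in a few lines of base R. All object names and values here are invented for the example:

```r
# Vector: elements of a single mode (numeric here).
heights <- c(1.62, 1.75, 1.80)

# Factor: a categorical variable with a fixed set of levels.
group <- factor(c("a", "b", "a"))

# Matrix: an array with k = 2 dimensions; all elements share one mode.
m <- matrix(1:6, nrow = 2, ncol = 3)

# Array: here with k = 3 dimensions (2 x 3 x 2).
arr <- array(1:12, dim = c(2, 3, 2))

# Data frame: columns of equal length that may differ in mode.
df <- data.frame(height = heights, group = group)

# Show the structure of the data frame, column by column.
str(df)
```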

7) Variable assignment (<- or =)

variable <- 10

Extracting elements: the [ operator can select one or more elements from a vector, list, or data frame. The [[ and $ operators extract a single element, for example one component of a list or one column of a data frame.
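
A small sketch of assignment and the three extraction operators. The objects v, lst, and df are made up for the example:

```r
# Both forms assign at the top level; <- is the more common convention.
x <- 10
y = 20

# [ on a named vector keeps a sub-vector (two elements here).
v <- c(a = 1, b = 2, c = 3)
v[c("a", "c")]

lst <- list(id = 7, tags = c("r", "stats"))
lst["tags"]      # [ on a list returns a list of length 1
lst[["tags"]]    # [[ returns the element itself, a character vector
lst$tags         # $ is shorthand for [[ with a name

df <- data.frame(n = 1:3, s = c("u", "v", "w"))
df[df$n > 1, ]   # rows where n > 1, all columns
df$s             # one column as a vector
```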

8)  Getting Help

Once R is installed, there is a comprehensive built-in help system. At the program’s command prompt you can use any of the following:

help("data.frame")
?data.frame
?getwd
?"$"
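
When the exact function name is not known, base R can also search by pattern. The sketch below uses apropos() and args(); the pattern and the variable name hits are chosen only for illustration:

```r
# Find attached functions whose names start with "read.".
hits <- apropos("^read\\.")
hits

# Show a function's arguments without opening its help page.
args(read.csv)
```

At the interactive prompt, ??keyword runs a wider search across the installed help pages.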

9) Books

  - A statistics text by Tibshirani (the cover image does not name the title)
  - Hands-On Programming with R
  - The Art of R Programming
  - R for Data Science
  - Machine Learning with R
  - Advanced R

Important Packages

To load data

RMySQL, RPostgreSQL, RSQLite – to read in data from a database.

XLConnect, xlsx – to read and write Microsoft Excel files from R.

foreign – to read a SAS/SPSS data set into R

R can handle plain text files – no package required. Just use the functions read.csv, read.table, and read.fwf.
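
A minimal sketch of reading a plain text file with base R. The file contents and the variable names path, scores, and same are invented so the example is self-contained:

```r
# Write a small CSV to a temporary file so the example needs no outside data.
path <- tempfile(fileext = ".csv")
writeLines(c("name,score", "ann,90", "bob,85"), path)

# read.csv() assumes a header row and comma separators.
scores <- read.csv(path)
str(scores)

# read.table() is the general form; header and separator are given explicitly.
same <- read.table(path, header = TRUE, sep = ",")
str(same)

# Clean up the temporary file.
unlink(path)
```

read.fwf() works along the same lines for fixed-width files, with a widths argument giving the column widths.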

To manipulate data

dplyr – A go-to package for fast data manipulation.

tidyr – Tools for changing the layout of your data sets.

stringr – Easy-to-learn tools for regular expressions and character strings.

lubridate – Tools that make working with dates and times easier.

To visualize data

ggplot2 – R’s famous package for making beautiful graphics.

ggvis – Interactive, web-based graphics built with the grammar of graphics.

rgl – Interactive 3D visualizations with R.

googleVis – Lets you use Google Chart tools to visualize data in R.

To model data

car – car’s Anova function is popular for making type II and type III Anova tables.

mgcv – Generalized Additive Models

lme4/nlme – Linear and Non-linear mixed effects models

randomForest – Random forest methods from machine learning

multcomp – Tools for multiple comparison testing

vcd – Visualization tools and tests for categorical data

glmnet – Lasso and elastic-net regression methods with cross validation

survival – Tools for survival analysis

caret – Tools for training regression and classification models

To report results

shiny – Easily make interactive web apps with R.

R Markdown – The perfect workflow for reproducible reporting.

For Spatial data

sp, maptools – Tools for loading and using spatial data including shapefiles.

maps – Easy to use map polygons for plots.

ggmap – Download street maps straight from Google maps and use them as a background in your ggplots.

For Time Series and Financial data

zoo – Provides the most popular format for saving time series objects in R.

xts – Very flexible tools for manipulating time series data sets.

quantmod – Tools for downloading financial data, plotting common charts, and doing technical analysis.

To write high performance R code

Rcpp – Write R functions that call C++ code for lightning-fast speed.

data.table – An alternative way to organize data sets for very, very fast operations.

parallel – Use parallel processing in R to speed up your code or to crunch large data sets.

To work with the web

XML – Read and create XML documents with R.

jsonlite – Read and create JSON data tables with R.

httr – A set of useful tools for working with HTTP connections.

 

