
Introduction to R and "how to" techniques in R

  • Thread starter: Yuriy

Yuriy

In this thread, I will be posting interesting R examples that will give you a flavor of what R is. You can download R for free at http://www.r-project.org (there was a discussion about downloading R or S-Plus elsewhere on the forum).

One good thing about R is that it is free, and if you need to do something, chances are that someone has already done something similar and you can find open source code for it somewhere on the Internet.

R is an interactive console application: you can type commands at the prompt, where each complete command is processed as soon as you enter it, or write a script and run it later. All variables and functions that you create in your code become objects and can be used at any time later on (but once you close R you lose your objects unless you save the workspace).
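
To see the "everything is an object" idea in action, here is a small sketch. The variable name rate and the temporary file are made up for illustration; save() and load() are one way to keep objects between sessions.

```r
# Assumption: 'rate' and the temp file are illustrative names, not from the thread
rate = 0.05                        # create an object
ls()                               # list the objects currently in the workspace
f = tempfile(fileext = ".RData")   # hypothetical location for the saved object
save(rate, file = f)               # write the object to disk
rm(rate)                           # remove it from the workspace
load(f)                            # read it back; 'rate' exists again
rate
```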

Comments in R:
=> Comments in R are preceded by # on a given line. For example,
# This line is a comment and will be ignored

Variables:
=> To create variables and assign values to them, you do the following:
x = 5
# variable x is now equal to 5, to output the value of x on the screen type
x

Vectors:
=> To create vectors do the following
y = c(1,3,5,7,9,11)
# vector y is now [1, 3, 5, 7, 9, 11]
y

Matrices:
=> To create a matrix do the following
z = matrix(y,3,2,byrow=TRUE)
z
# this creates a 3 by 2 matrix from vector y packing the numbers row by row
z = matrix(y,3,2,byrow=FALSE)
# this creates a 3 by 2 matrix from vector y packing the numbers column by column (T and F are shorthand for TRUE and FALSE)
z
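
Once you have a matrix, you will usually want to look at its size and pull out elements. A quick sketch using the same numbers as above (square-bracket indexing is standard R, the particular elements picked are just for illustration):

```r
y = c(1, 3, 5, 7, 9, 11)
z = matrix(y, nrow = 3, ncol = 2, byrow = FALSE)  # packed column by column
dim(z)     # number of rows and columns: 3 2
z[2, 1]    # element in row 2, column 1: 3
z[, 2]     # the whole second column: 7 9 11
z[3, ]     # the whole third row: 5 11
```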

Help:
=> Every built-in function is documented in the help system. To open the help page for a function, type
help("matrix")

Matrix Transpose:
=> A transpose of a matrix is
z1 = t(z)
z1

Matrix Multiplication:
=> Multiplication of matrices is done using %*%
z1=z # make a copy of matrix z
z2=t(z) # set z2 equal to the transpose of z
product = z1%*%z2 # multiply the two matrices
product
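
A common slip is to use * instead of %*%. The plain * multiplies element by element, which is a different operation. A small sketch with two made-up 2 by 2 matrices a and b:

```r
# Assumption: a and b are illustrative matrices, not from the thread
a = matrix(c(1, 2, 3, 4), 2, 2, byrow = TRUE)
b = matrix(c(0, 1, 1, 0), 2, 2, byrow = TRUE)
elementwise = a * b      # multiplies matching entries
matprod = a %*% b        # true matrix product (rows of a times columns of b)
elementwise
matprod
```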

Inverse of a Matrix:
=> To invert a matrix use the function solve()
A = matrix(c(1,0,2,1),2,2,T) # you can pass the vector directly to matrix()
B = solve(A)
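
To convince yourself that solve() really returned the inverse, multiply it back and compare with the identity matrix. The second half of this sketch uses solve() with a right-hand side to solve a linear system directly; the vector rhs is made up for illustration.

```r
A = matrix(c(1, 0, 2, 1), 2, 2, byrow = TRUE)
B = solve(A)                 # inverse of A
check = A %*% B              # should be the 2 by 2 identity matrix
check
# solve(A, rhs) solves A x = rhs without forming the inverse explicitly
rhs = c(1, 4)                # hypothetical right-hand side
x = solve(A, rhs)
x
```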

Initialize a Matrix:
=> To initialize a matrix use, for example, the command rep() that will repeat a value a certain number of times
C = matrix(rep(0,9),3,3)
# if you don't specify how you want to pack the matrix, the default 'by column' is used
C
# you get a 3 by 3 matrix of zeros
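
A couple of shorter ways to get the same kind of starting matrix, as a sketch: matrix() recycles a single value to fill the whole matrix, and diag() builds an identity matrix. The names C0 and I3 are just for illustration.

```r
C0 = matrix(0, 3, 3)   # a single 0 is recycled to fill all 9 cells
I3 = diag(3)           # 3 by 3 identity matrix
C0
I3
```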

Creating a Sequence:
=> Sequences of numbers can be created in different ways, one way is
s = seq(0,1,0.1)
# a sequence from 0 to 1 with step size 0.1
s
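
A few other ways to build sequences, as a sketch (the variable names are made up): the colon operator for integer steps, length.out when you know how many points you want rather than the step size, and a negative step for counting down.

```r
s1 = 1:5                          # integers 1 through 5
s2 = seq(0, 1, length.out = 11)   # 11 evenly spaced points from 0 to 1
s3 = seq(10, 1, by = -3)          # counts down: 10 7 4 1
s1
s2
s3
```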


Now to generating random numbers...
 
R can generate random numbers from many distributions.

For example, to generate random numbers from the lognormal distribution with a given mean and standard deviation, do the following. Here mu and sigma are the mean and standard deviation of the LOGNORMAL variable itself, while m and s are the mean and standard deviation of its logarithm, which are the parameters R's lognormal functions expect (meanlog and sdlog).

# First set up mu, sigma, m, and s
mu = 0.1 # say mu is 0.1
sigma = 0.04 # say sigma is 0.04
m = log(sqrt((mu^2)/(1+(sigma/mu)^2)))
# OR, m = log(mu/(sqrt(1+(sigma/mu)^2)))
s = sqrt(log(1+(sigma/mu)^2))

#To generate 10000 lognormally distributed random numbers, type
y = rlnorm(10000,m,s)

#You can check the mean of the generated numbers by typing
mean(y)
#And standard deviation by typing
sd(y)

#Given the left-tail probability, you can find the corresponding percentile (quantile)
qlnorm(0.95,m,s)
#gives 0.1749754, meaning that 95% of the area (of PDF) is to the left of 0.1749754

#Given the percentile (quantile), you can find the corresponding probability
plnorm(0.34,m,s)
#gives 0.999623, meaning that the area to the left of 0.34 (of PDF) is 99.9623%
1-plnorm(0.34,m,s)
#gives the right tail probability of 0.0003769674
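
If you want to double check the conversion from mu and sigma to m and s, you can plug m and s back into the standard lognormal moment formulas and see that you recover mu and sigma. This sketch also fixes the random seed (the seed value 1 is arbitrary) so that the simulated sample is reproducible.

```r
mu = 0.1
sigma = 0.04
m = log(mu / sqrt(1 + (sigma / mu)^2))
s = sqrt(log(1 + (sigma / mu)^2))

# Theoretical mean and standard deviation of a lognormal with parameters m, s
mean_check = exp(m + s^2 / 2)
sd_check = sqrt((exp(s^2) - 1) * exp(2 * m + s^2))
mean_check   # recovers mu
sd_check     # recovers sigma

set.seed(1)                  # arbitrary seed, only for reproducibility
y = rlnorm(10000, m, s)
mean(y)                      # close to mu, up to sampling error
sd(y)                        # close to sigma, up to sampling error
```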

Similarly you can generate random numbers from other distributions, see help.
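
The naming pattern is the same for every distribution: r for random numbers, q for quantiles, p for probabilities (and d for the density). As a sketch, here is the normal and the uniform distribution; the sample sizes and seed are arbitrary.

```r
set.seed(42)                      # arbitrary seed, only for reproducibility
xn = rnorm(5, mean = 0, sd = 1)   # 5 standard normal random numbers
xu = runif(5, min = 0, max = 1)   # 5 uniform random numbers on [0, 1]
q = qnorm(0.975)                  # 97.5% quantile of the standard normal, about 1.96
p = pnorm(0)                      # probability to the left of 0, exactly 0.5
q
p
```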
 
Input/Output From/To a File

There are several ways you can import and export data to and from R. These are the ones I use.

If you have an Excel table you can save it as a text file with TABs separating columns and use the following command to import data to R.

#The text file is called stocks.txt and is saved on C: drive
stocks=read.table(file='C:/stocks.txt',header=T,row.names=1)
#This creates an object 'stocks' holding the table
#header=T tells R that the first line of the file holds the column headings
#row.names=1 tells R that the first column holds the row names
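
If you don't have a stocks.txt handy, here is a self-contained sketch that writes a tiny TAB-separated file to a temporary location and reads it back. The tickers AAA and BBB, the dates, and the prices are all made up; sep="\t" is added to state explicitly that columns are separated by TABs.

```r
# Assumption: file contents are invented purely for illustration
f = tempfile(fileext = ".txt")
writeLines(c("Date\tAAA\tBBB",
             "2005-01-03\t10.0\t20.0",
             "2005-01-04\t10.5\t19.5",
             "2005-01-05\t10.2\t20.4"), f)
stocks = read.table(file = f, header = TRUE, row.names = 1, sep = "\t")
stocks                        # 3 rows (dates) by 2 columns (tickers)
stocks["2005-01-04", "AAA"]   # look up one price by row and column name
```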

If you have written a function that you often use but don't want to type it in every time, you can import it as an object and call it later on. The same way you can import functions written by someone else.
#The function is saved in the text file 'myfunction.q' and stored on drive C:
source("C:/myfunction.q")
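
A self-contained sketch of the same idea: write a one-line function to a temporary file, then source() it and call it. The function name square and the temp file are hypothetical stand-ins for your own myfunction.q.

```r
# Assumption: 'square' is an invented example function
f = tempfile(fileext = ".R")
writeLines("square = function(v) v^2", f)   # the 'saved' function definition
source(f)                                   # runs the file, creating 'square'
square(4)                                   # 16
```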

Many times you don't want the output to appear on the screen but want it saved to a file. Here is what does the trick.
#The output will be saved in the file named "out.txt" on drive C:
sink("c:/out.txt")
#To send output back to the screen, call sink() with no arguments
sink()
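
Here is a complete round trip as a sketch, using a temporary file instead of C:/out.txt: redirect the output, print something, switch the output back to the screen with sink(), and read the file to confirm what was captured.

```r
out = tempfile(fileext = ".txt")   # hypothetical output file
sink(out)                          # from here on, output goes to the file
print(summary(c(1, 2, 3, 4)))
sink()                             # output goes back to the screen
captured = readLines(out)          # what ended up in the file
captured
```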


You can also import data from the Internet or save it on your hard drive. R has excellent help resources on this topic.
 
Functions in R

Functions in R are easy to write. As mentioned before, R processes commands line by line. However, if you put { on a line, R will not process your commands until it sees } . Functions utilize this technique.

For example we can write a function calculating mean and standard deviation for a dataset.

#Let's say you have a vector x
x = c(0,1,2,3,4)

#Function starts on the next line, first goes the function name
mystats = function(dataset){
#Variable dataset is your input to the function, also note the opening brace {
y = c(mean(dataset),sd(dataset))
#y is a vector holding 2 values: the mean and standard deviation of the dataset
y
#The line above tells R what to return after the function's end
}
#Closing brace indicates function's end

#Now use the function on the dataset x
z = mystats(x)
z
#Will output 2 values: the mean and standard deviation of x
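
A variation you may find handy, sketched under the assumption that you want labelled results: return a named list instead of a plain vector, and give the function an optional argument with a default value. The name mystats2 and the na.rm option are additions for illustration, not part of the function above.

```r
# Assumption: mystats2 is a hypothetical variant of mystats
mystats2 = function(dataset, na.rm = FALSE) {
  # return a list so each value can be picked out by name
  list(mean = mean(dataset, na.rm = na.rm),
       sd = sd(dataset, na.rm = na.rm))
}
res = mystats2(c(0, 1, 2, 3, 4))
res$mean   # 2
res$sd     # square root of 2.5, about 1.58
```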
 
If - Else statement

Very simple to write

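if(something==1) { # other comparisons such as <, >, <=, >= and != also work
variable1=2
variable2=4
} else { # 'else' must be on the same line as the closing }, otherwise the console reports an error
variable1=0
variable2=0
}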
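
Two related forms, as a sketch with made-up values: chaining several conditions with else if, and ifelse(), which applies a test to every element of a vector at once.

```r
something = 3                      # illustrative value
if (something == 1) {
  label = "one"
} else if (something > 1) {
  label = "more than one"
} else {
  label = "less than one"
}
label                              # "more than one"

# ifelse() works element by element on a whole vector
signs = ifelse(c(-2, 0, 5) > 0, "positive", "non-positive")
signs                              # "non-positive" "non-positive" "positive"
```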


For Loop

x = rep(0,10) # simple initialization: a vector of 10 zeros
for(i in 1:10){
x[i]=i # store i in the i-th element of x
}
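
Many loops in R can be replaced by a single vectorized expression, which is usually shorter and faster. A sketch comparing the two on a made-up task (squaring the numbers 1 to 10):

```r
n = 10
squares = numeric(n)               # vector of n zeros to hold the results
for (i in seq_len(n)) {
  squares[i] = i^2                 # fill one element per pass
}
squares_vec = (1:n)^2              # same result without a loop
squares
all.equal(squares, squares_vec)    # TRUE
```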
 
Sometimes when you import a dataset as a table, R will not let you perform certain operations on it as it is. In this case, the table needs to be transformed to a matrix.

Let's say we imported the table 'stocks' above and want to calculate means, standard deviations, and covariance/correlation matrices. This is what you do.

#If 'stocks' is stock prices and not returns, insert the following line to calculate log-returns
stocks=apply(log(stocks),2,diff)
#here R is taking natural logs of all stock prices first, and then computing differences
#differences are computed 2nd - 1st, 3rd - 2nd, ... Nth - N-1th
#so you need to sort your data by date in increasing order
#'2' means perform differences on columns, '1' would mean differences on rows

#Now assuming 'stocks' holds stock returns
sr=as.matrix(stocks) #transform into a matrix
returns=apply(sr,2,mean) # average return of each stock in the table
volatility=apply(sr,2,sd) # standard deviation of returns of each stock in the table
Covarsr=cov(sr) # covariance matrix of the stock returns
Corrsr=cor(sr) # correlation matrix of the stock returns
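
To try the whole pipeline without a data file, here is a sketch on a tiny made-up price table (tickers AAA and BBB and their prices are invented). It goes from prices to log-returns and then to the summary statistics.

```r
# Assumption: prices are invented, rows are in increasing date order
prices = data.frame(AAA = c(100, 102, 101, 105),
                    BBB = c(50, 49, 51, 52))
logret = apply(log(prices), 2, diff)   # 4 prices give 3 log-returns per stock
sr = as.matrix(logret)
avg = apply(sr, 2, mean)               # average log-return per stock
vol = apply(sr, 2, sd)                 # standard deviation per stock
covm = cov(sr)                         # 2 by 2 covariance matrix
corm = cor(sr)                         # 2 by 2 correlation matrix
avg
vol
covm
corm
```

Note that the average log-return of AAA equals log(105/100)/3, because the intermediate prices cancel out when log differences are summed.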
 
There are a million other things you can do in R, like regression and time series analysis (including ARCH and GARCH). I will see if I can make R price options :)
 
Yuriy, great work

There are some contributed packages; tseries (time series) is one of them and it is really nice. Nothing beats S-Plus's FinMetrics module, though, when it comes to serious time series analysis :)
 
But R is free for everybody to use.
 
Yuriy, this is great!! Thanks a lot for all the quick info about R.
 
John, I know about S-Plus and I did a little bit in FinMetrics, but I'm not sure if all of us here have access to S-Plus and FinMetrics :)
 