
A linear regression model about MMF admission

Thread starter: Eric.Z (joined 12/13/11)
For the moment, I have a linear regression model (with GPA as the independent variable; I will try to add more variables when more data are available) for applicants to the Boston MMF who were either accepted or rejected and who reported a GPA between 0 and 4. (Waitlisted applicants are treated as rejected.) I am trying to build logit and probit models in Excel, which I believe would be more accurate for models like this one, where the dependent variable is a dummy. I will post the results as soon as possible.

Admission = -0.4822 + .2838 * GPA (Admission is 1 if accepted and zero if rejected.)
P-value of the slope is 0.04502
R-square = 0.0303
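To make the fitted line concrete, here is a small sketch that evaluates the posted equation at a few GPAs and finds where it crosses zero. It assumes, as a linear probability model does, that the fitted value is read as an admission probability; the GPA values looped over are arbitrary illustration points.

```python
# Fitted values from the linear regression posted above:
# Admission = -0.4822 + 0.2838 * GPA
INTERCEPT = -0.4822
SLOPE = 0.2838


def fitted_admission(gpa: float) -> float:
    """OLS fitted value, read as an admission probability (linear probability model)."""
    return INTERCEPT + SLOPE * gpa


# GPA at which the fitted line crosses zero; below it the "probability" is negative.
zero_crossing = -INTERCEPT / SLOPE

if __name__ == "__main__":
    for gpa in (1.0, 2.0, 3.0, 3.5, 4.0):
        print(f"GPA {gpa:.1f}: fitted value {fitted_admission(gpa):+.4f}")
    print(f"Fitted value is negative for GPA below {zero_crossing:.3f}")
```

Even at a 4.0 GPA the line implies only about a 0.65 chance of admission, and for GPAs below roughly 1.70 it predicts a negative probability, which is one of the limitations raised later in the thread.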

The p-value does not surprise me at all. I believe that with more observations and more independent variables in the model, the R-square would be much higher. From personal experience, admission results should be fairly predictable if GPA, work experience, major, and a few other pieces of information are available.



This is initial work; I will post more results as soon as I get more data. Please help if you have any ideas on the following questions. It will be much appreciated.


1. I managed to import the data from the tracker into Excel. However, Excel fails to recognize the tracker as a table; everything on the webpage ends up in a single column. I now have thousands of entries in that column. Does anyone know a way to import the tracker data into Excel as a table, so that I can include more data in the regression analysis?

2. Any ideas on how to convert non-US GPAs to the 4.0 scale?

3. I am thinking of adding a dummy variable for international applicants. Would you expect it to be statistically significant?
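On question 2, there is no single standard conversion. One naive option, assuming grading scales are linearly comparable (an assumption that often fails, since systems differ in pass marks and grade inflation), is min-max rescaling onto 0 to 4. The function name `to_four_point` and the example grades are hypothetical.

```python
def to_four_point(grade: float, scale_min: float, scale_max: float) -> float:
    """Linearly map a grade on [scale_min, scale_max] onto the 0-4 scale.

    Naive assumption: the two grading scales are linearly comparable.
    Real conversion tables for 100-point or 10-point systems are often nonlinear.
    """
    if scale_max <= scale_min:
        raise ValueError("scale_max must exceed scale_min")
    if not scale_min <= grade <= scale_max:
        raise ValueError("grade lies outside the stated scale")
    return 4.0 * (grade - scale_min) / (scale_max - scale_min)


if __name__ == "__main__":
    # Hypothetical examples: 85 on a 0-100 scale, 8.0 on a 0-10 scale.
    print(to_four_point(85, 0, 100))   # 3.4
    print(to_four_point(8.0, 0, 10))   # 3.2
```

Because this mapping treats every scale the same way, it is best used only as a rough placeholder until a proper conversion table is available.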
 
1. I'd love access to this data once you figure it out.
2. A linear regression makes absolutely no sense for a success/failure model. Your R-square is a good indicator of this.
3. I can *try* writing a program in Stata that parses the information to build the table from the column.
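The parsing idea can be sketched in Python instead of Stata. This assumes, hypothetically, that every tracker entry occupies the same fixed number of consecutive cells in the single column and always in the same order; `FIELDS_PER_RECORD`, the header names, and the sample cells are all made up for illustration.

```python
import csv
import io

# Hypothetical assumption: each tracker entry spans exactly this many
# consecutive cells in the single Excel column, always in the same order.
FIELDS_PER_RECORD = 4
HEADER = ["program", "result", "gpa", "gre_q"]  # hypothetical field names


def column_to_rows(cells, width):
    """Split a flat list of cells into consecutive rows of `width` cells."""
    if len(cells) % width != 0:
        raise ValueError("column length is not a multiple of the record width")
    return [cells[i:i + width] for i in range(0, len(cells), width)]


def rows_to_csv(rows, header):
    """Serialize a header plus rows to CSV text that Excel can open as a table."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Made-up cells standing in for the exported column.
    column = ["MMF", "Accepted", "3.7", "790",
              "MMF", "Rejected", "3.2", "760"]
    rows = column_to_rows(column, FIELDS_PER_RECORD)
    print(rows_to_csv(rows, HEADER), end="")
```

If entries do not all have the same number of cells, this fixed-width split will misalign records; a marker-based split, such as starting a new row at each program name, would be needed instead.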
 
1. I've been thinking about these kinds of models for a while (not just for MFE but for college, law school, etc.). There is definitely existing literature on this, though.
2. Linear regression doesn't make sense. Use logistic regression for these types of models.
3. Ditch Excel. Use R (it's free, and it can do this in under 5 seconds).
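Logistic regression has no closed form like OLS; it is fitted by maximizing the log-likelihood iteratively. Here is a minimal sketch of a one-predictor logit fitted by Newton-Raphson in plain Python. The GPA and admission values are made-up toy data, not tracker data.

```python
import math

# Hypothetical toy data (made up for illustration): GPA and a 0/1 admission flag.
GPA = [2.8, 3.0, 3.2, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9, 4.0]
ADMIT = [0, 0, 1, 0, 0, 1, 0, 1, 1, 1]


def sigmoid(t):
    """Logistic CDF: maps the linear index b0 + b1*x to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-t))


def log_likelihood(b0, b1, x, y):
    """Bernoulli log-likelihood of the logit model at (b0, b1)."""
    total = 0.0
    for xi, yi in zip(x, y):
        p = sigmoid(b0 + b1 * xi)
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total


def fit_logit(x, y, tol=1e-10, max_iter=50):
    """Fit P(y=1|x) = sigmoid(b0 + b1*x) by Newton-Raphson maximum likelihood.

    Returns (b0, b1, history); history holds the log-likelihood at the start
    of each iteration.
    """
    b0, b1 = 0.0, 0.0
    history = []
    for _ in range(max_iter):
        history.append(log_likelihood(b0, b1, x, y))
        p = [sigmoid(b0 + b1 * xi) for xi in x]
        # Score vector (gradient of the log-likelihood).
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum(xi * (yi - pi) for xi, yi, pi in zip(x, y, p))
        # Information matrix X'WX with weights p(1-p), inverted by hand (2x2).
        w = [pi * (1.0 - pi) for pi in p]
        a = sum(w)
        b = sum(wi * xi for wi, xi in zip(w, x))
        c = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = a * c - b * b
        step0 = (c * g0 - b * g1) / det
        step1 = (a * g1 - b * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if max(abs(step0), abs(step1)) < tol:
            break
    return b0, b1, history


if __name__ == "__main__":
    b0, b1, history = fit_logit(GPA, ADMIT)
    for i, ll in enumerate(history):
        print(f"Iteration {i}: log likelihood = {ll:.6f}")
    print(f"intercept = {b0:.4f}, slope = {b1:.4f}")
```

The printed log-likelihood sequence is comparable in spirit to an iteration log from a statistics package. In R, `glm(..., family = binomial)` fits the same model by iteratively reweighted least squares, which for the logit link amounts to the same Newton steps.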
 
Agree. I am really surprised by how many people on this site blindly apply econometric models. In this case, the independent variable is continuous while the dependent variable is a 0/1 indicator.
 

Yeah, definitely let me know if you get the data. I would be more than happy to discuss what we can learn from it. :)
 
I certainly agree that probit and logit models would fit better here. However, I believe a linear model is also informative as long as its limitations are recognized.
For this problem, if you are concerned about predicted values below zero or above one, you can set negative predictions to 0 and predictions above 1 to 1. The OLS estimators should be unbiased, if I am not mistaken. However, heteroscedasticity is definitely a problem we should be aware of.
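Both points can be illustrated with the coefficients posted at the top of the thread. The sketch below truncates the linear prediction to [0, 1] as suggested, then computes the conditional variance p(1 - p) of a 0/1 outcome. Because that variance changes with GPA, the error term of a linear probability model is heteroscedastic by construction. The function names are hypothetical.

```python
# Coefficients from the linear regression posted at the top of the thread.
INTERCEPT = -0.4822
SLOPE = 0.2838


def clipped_prediction(gpa: float) -> float:
    """Linear fitted value with predictions truncated to [0, 1]."""
    raw = INTERCEPT + SLOPE * gpa
    return min(1.0, max(0.0, raw))


def error_variance(gpa: float) -> float:
    """Var(y | GPA) = p(1 - p) for a 0/1 outcome, using the clipped fitted p."""
    p = clipped_prediction(gpa)
    return p * (1.0 - p)


if __name__ == "__main__":
    for gpa in (1.0, 2.0, 3.0, 4.0):
        print(f"GPA {gpa:.1f}: p = {clipped_prediction(gpa):.4f}, "
              f"Var(y|GPA) = {error_variance(gpa):.4f}")
```

The variance is zero where the prediction is clipped and largest near p = 0.5, so standard errors that assume constant variance are unreliable here. Heteroscedasticity-robust (White) standard errors are the usual remedy when a linear probability model is kept.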
 
You could use a tool to read the tracker RSS feed into a CSV file; Python can do this. When I get around to it, I will post a dump of the tracker data for you with the username field stripped out.
In any case, since we don't have GPA and GRE data for every entry, I think your sample size for Boston is too small.
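Reading an RSS feed into a CSV file needs only the Python standard library. The sketch below parses RSS 2.0 `<item>` elements and writes their `title`, `pubDate`, and `description` fields to CSV. The feed text is a made-up sample: the real tracker feed's URL and the layout of its description text are not known here, so both are assumptions.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Made-up RSS 2.0 sample; the real tracker feed's URL and field layout are assumptions.
SAMPLE_FEED = """<rss version="2.0"><channel>
  <title>Admissions tracker (hypothetical)</title>
  <item><title>Boston MMF - Accepted</title>
        <pubDate>Mon, 05 Mar 2012 10:00:00 GMT</pubDate>
        <description>GPA 3.7</description></item>
  <item><title>Boston MMF - Rejected</title>
        <pubDate>Tue, 06 Mar 2012 11:00:00 GMT</pubDate>
        <description>GPA 3.1</description></item>
</channel></rss>"""

FIELDS = ("title", "pubDate", "description")


def feed_to_csv(xml_text: str) -> str:
    """Convert every RSS <item> into one CSV row of the standard fields."""
    root = ET.fromstring(xml_text)
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(FIELDS)
    for item in root.iter("item"):
        # findtext returns None for a missing child, so fall back to "".
        writer.writerow([(item.findtext(tag) or "").strip() for tag in FIELDS])
    return buffer.getvalue()


if __name__ == "__main__":
    print(feed_to_csv(SAMPLE_FEED), end="")
```

For a live feed, the XML text could be fetched with `urllib.request.urlopen(url).read().decode("utf-8")` before calling `feed_to_csv`. Splitting the description into separate GPA and GRE columns would depend on the tracker's actual format.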
 
Yes, you are right. The sample size is certainly not large, but I think it should be okay. For schools like CMU, NYU, or Columbia, there will definitely be more observations. For BU, I am not aware of any other way to handle this than increasing the sample size.
You can take multiple years.
It's a good idea to take multiple years. How can I get the data from previous years?
 
. logit result greq grev if greq>200

Iteration 0:   log likelihood = -13.460233
Iteration 1:   log likelihood =  -11.59538
Iteration 2:   log likelihood = -11.584696
Iteration 3:   log likelihood = -11.584694
Iteration 4:   log likelihood = -11.584694

Logistic regression                               Number of obs   =         20
                                                  LR chi2(2)      =       3.75
                                                  Prob > chi2     =     0.1533
Log likelihood = -11.584694                       Pseudo R2       =     0.1393

------------------------------------------------------------------------------
      result |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        greq |    .030872   .0384086     0.80   0.422    -.0444075    .1061515
        grev |  -.0105618   .0060417    -1.75   0.080    -.0224034    .0012798
       _cons |  -16.86222   29.26458    -0.58   0.564    -74.21974    40.49529
------------------------------------------------------------------------------
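The z, P>|z|, and confidence-interval columns of a logit table follow directly from each coefficient and its standard error: z = coef / se, the two-sided p-value is 2(1 - Φ(|z|)), and the 95% interval is coef ± 1.96 se. This sketch recomputes them from the numbers in the output above. The function name `wald_summary` is hypothetical.

```python
import math

# Coefficients and standard errors copied from the logit output above.
ESTIMATES = {
    "greq": (0.030872, 0.0384086),
    "grev": (-0.0105618, 0.0060417),
    "_cons": (-16.86222, 29.26458),
}
Z_975 = 1.959963984540054  # 97.5th percentile of the standard normal


def wald_summary(coef: float, se: float):
    """Return (z, two-sided p-value, 95% CI) for a Wald test of coef = 0."""
    z = coef / se
    # 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value, (coef - Z_975 * se, coef + Z_975 * se)


if __name__ == "__main__":
    for name, (coef, se) in ESTIMATES.items():
        z, p, (lo, hi) = wald_summary(coef, se)
        print(f"{name:>6}: z = {z:6.2f}  p = {p:.3f}  95% CI = [{lo:.7g}, {hi:.7g}]")
```

With only 20 observations, neither GRE coefficient is significant at the 5% level (p = 0.422 and 0.080). The likelihood-ratio test in the header (Prob > chi2 = 0.1533) likewise does not reject the hypothesis that both slopes are zero.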

I didn't bother with GPA because it's so nonstandard, and I only included old GRE scores.
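A fitted logit turns into a predicted admission probability through the logistic function, P = 1 / (1 + exp(-(b0 + b1·greq + b2·grev))). This sketch plugs in the coefficients reported above. Because the model was estimated only on old-scale GRE scores (the `if greq>200` filter), it rejects new-scale inputs. The applicant scores in the usage line are hypothetical.

```python
import math

# Coefficients from the logit output above (old-scale GRE scores only).
B_CONS, B_GREQ, B_GREV = -16.86222, 0.030872, -0.0105618


def admit_probability(greq: float, grev: float) -> float:
    """Predicted P(admit) = 1 / (1 + exp(-(b0 + b1*greq + b2*grev)))."""
    if greq <= 200:
        raise ValueError("model was fitted only on old-scale GRE (greq > 200)")
    eta = B_CONS + B_GREQ * greq + B_GREV * grev
    return 1.0 / (1.0 + math.exp(-eta))


if __name__ == "__main__":
    # Hypothetical applicant: 780 quantitative, 600 verbal (old scale).
    print(f"{admit_probability(780, 600):.3f}")
```

Given the small sample and the insignificant coefficients, a prediction like this is best read as an illustration of how the model works, not as a reliable admission forecast.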

I found this humorous, though:
. logit result greq grev greawa if greq>100&greq<200
note: greawa != 4 predicts success perfectly
greawa dropped and 6 obs not used

outcome = greq > 163 predicts data perfectly
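The "predicts data perfectly" messages indicate complete separation: some cutoff on a predictor splits accepted and rejected applicants with no overlap. In that case the log-likelihood keeps increasing as the coefficient grows, so no finite maximum-likelihood estimate exists. This sketch checks one predictor for that condition. The function name and the sample scores are made up for illustration.

```python
def separating_threshold(x, y):
    """Return t if y == 1 exactly when x > t (complete separation), else None.

    Checks one direction only: every success lies strictly above every failure.
    """
    failures = [xi for xi, yi in zip(x, y) if yi == 0]
    successes = [xi for xi, yi in zip(x, y) if yi == 1]
    if not failures or not successes:
        return None
    if max(failures) < min(successes):
        return max(failures)
    return None


if __name__ == "__main__":
    # Made-up new-scale GRE quant scores and outcomes.
    separated = separating_threshold([150, 155, 158, 161, 166, 169], [0, 0, 0, 1, 1, 1])
    overlapping = separating_threshold([150, 158, 161, 166], [0, 1, 0, 1])
    print(separated, overlapping)  # 158 None
```

With a handful of new-scale observations, separation like this is easy to hit by chance. That is why the second model could not be estimated, rather than evidence that the cutoff is real.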
 
Interesting! Would you mind sending me the data you used?
 