Project ideas for students going into the quant field

Andy Nguyen · 3/8/23

Whenever anyone asks how to get into a field involving #programming or #machinelearning, the advice is always "do projects and upload them to GitHub." The same applies to #quant investing, but what #quant projects would impress a prospective employer?

Well, I can tell you what would impress me. Other quants can comment on what would be good for them.

1) Code in a commonly used language like Python, C++, Java, Julia, or C#. Do not use Excel for this. Excel is not how production-level systems are deployed, and the difference between implementing in Excel and implementing in code is massive. You CAN use Jupyter notebooks as a way to integrate code and presentation of results, but keep in mind that isn't where production code lives.

2) Ideally, you want to work off of real world datasets. If you don't already have access to them, you can find them at QuantConnect or Kaggle.

3) Build some sensible features. They don't need to be mind-blowing but put some thought into it. If you are using futures, make sure carry and momentum are in there somewhere. If you are looking at stock data, make sure you get a decent sample of the McLean and Pontiff (2016) signals (https://buff.ly/3OX2KvD). It is better to have too many features here than too few. If you don't know what to put, just make stationary (!) ratios, diffs, and percentage diffs where appropriate. Even using historical returns and share turnover over various periods as predictors is reasonable.

4) Implement various return prediction models using a linear model (e.g., linear ridge), a tree-based model (e.g., gradient boosting), and a neural network model (here, feed forward is fine). Don't worry about getting a large dataset. Of course, we know that if you want feed forward neural networks to be effective, we need a large dataset, but the point isn't to get an incredible model. It is to demonstrate skill.

5) Build a portfolio and calculate backtested returns. You can use mean-variance optimization (meaning you need to calculate covariance) or take a Kelly criterion approach if you're sizing bets, with or without leverage. I personally don't care whether the model is any good. I just want to see that you know how to put the pieces together.

6) Create a pretty graph of the growth of a dollar and a performance table compared to a benchmark (if appropriate).
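
To make steps 3 to 6 concrete, here is a minimal sketch of how the pieces could fit together. It is only an illustration under stated assumptions, not the original poster's method: the data is synthetic, only the linear ridge model is shown (the tree-based and neural network models are left out), and the names `make_features`, `ridge_fit`, and `perf_stats` are made up for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic daily returns stand in for a real dataset (assumption).
n_days, n_assets = 750, 5
returns = rng.normal(0.0003, 0.01, size=(n_days, n_assets))

LOOKBACKS = (5, 21, 63)

def make_features(returns, lookbacks=LOOKBACKS):
    """Trailing mean return per lookback: simple, stationary momentum-style features."""
    n_days, n_assets = returns.shape
    csum = np.vstack([np.zeros((1, n_assets)), np.cumsum(returns, axis=0)])
    feats = []
    for lb in lookbacks:
        f = np.full((n_days, n_assets), np.nan)
        f[lb - 1:] = (csum[lb:] - csum[:-lb]) / lb  # mean of returns[t-lb+1 .. t]
        feats.append(f)
    return np.stack(feats, axis=-1)  # shape (n_days, n_assets, n_features)

def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge without intercept: solve (X'X + alpha*I) beta = X'y."""
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

def perf_stats(r, periods=252):
    """Growth of a dollar plus a few summary statistics for a return series."""
    growth = np.cumprod(1.0 + r)
    vol = r.std(ddof=1)
    return {
        "ann_return": growth[-1] ** (periods / len(r)) - 1.0,
        "ann_vol": vol * np.sqrt(periods),
        "sharpe": r.mean() / vol * np.sqrt(periods),
        "max_drawdown": (growth / np.maximum.accumulate(growth) - 1.0).min(),
    }

# Features known at the close of day t predict the return on day t+1.
feats = make_features(returns)
start = max(LOOKBACKS) - 1
X, y = feats[start:-1], returns[start + 1:]
split = len(X) // 2  # first half trains, second half is out of sample

# Standardize features and demean targets using training data only.
k = X.shape[-1]
X_train = X[:split].reshape(-1, k)
f_mean, f_std = X_train.mean(axis=0), X_train.std(axis=0)
y_mean = y[:split].mean()
beta = ridge_fit((X_train - f_mean) / f_std, y[:split].reshape(-1) - y_mean)
mu_hat = y_mean + ((X[split:] - f_mean) / f_std) @ beta  # (n_test, n_assets)

# Mean-variance weights w ~ inv(Sigma) @ mu, rescaled to unit gross exposure.
cov = np.cov(y[:split], rowvar=False)
raw = np.linalg.solve(cov, mu_hat.T).T
weights = raw / np.abs(raw).sum(axis=1, keepdims=True)

port = (weights * y[split:]).sum(axis=1)
bench = y[split:].mean(axis=1)  # equal-weight benchmark

for name, r in (("strategy", port), ("benchmark", bench)):
    row = "  ".join(f"{key}={val: .3f}" for key, val in perf_stats(r).items())
    print(f"{name:<10}{row}")
```

On random data the strategy should show no real edge, which is fine: the point is the plumbing. The cumulative product inside `perf_stats` is the growth-of-a-dollar series, so it can be handed to any plotting library for the graph in step 6.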

Make sure the code is clean and elegant. Input, model, and output should be separated and should each be runnable as one line of code. No absolute paths should exist in the input or output--only in a config file, stored as a variable. In fact, it should be the only thing a user would have to change to run the code (as long as they have the data).
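
One way to read the advice above, as a rough sketch only: keep every machine-specific path in a single config object and let each stage be a single call. The names `Config`, `load_inputs`, `run_model`, and `write_outputs`, and the file names `prices.csv` and `returns.csv`, are hypothetical. A temporary directory stands in here for the absolute paths a real config file would hold.

```python
import csv
import tempfile
from dataclasses import dataclass
from pathlib import Path

@dataclass(frozen=True)
class Config:
    """The only place where machine-specific paths live."""
    data_dir: Path
    output_dir: Path

def load_inputs(cfg):
    """Input stage: read closing prices from data_dir (hypothetical file name)."""
    with open(cfg.data_dir / "prices.csv", newline="") as f:
        return [float(row["close"]) for row in csv.DictReader(f)]

def run_model(prices):
    """Model stage: a placeholder that just computes simple returns."""
    return [b / a - 1.0 for a, b in zip(prices, prices[1:])]

def write_outputs(rets, cfg):
    """Output stage: write results under output_dir and return the file path."""
    cfg.output_dir.mkdir(parents=True, exist_ok=True)
    out = cfg.output_dir / "returns.csv"
    with open(out, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["ret"])
        writer.writerows([r] for r in rets)
    return out

# In a real project this block would be config.py with absolute paths;
# a temp directory is used so the sketch runs anywhere.
root = Path(tempfile.mkdtemp())
CONFIG = Config(data_dir=root / "data", output_dir=root / "out")
CONFIG.data_dir.mkdir()
(CONFIG.data_dir / "prices.csv").write_text("close\n100\n110\n99\n")

# Each stage is runnable as one line.
prices = load_inputs(CONFIG)
rets = run_model(prices)
out_path = write_outputs(rets, CONFIG)
```

With this layout, someone cloning the repository only edits the two paths in `CONFIG`; none of the stage functions mention a location on disk.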

Source: Vivek Viswanathan

Daniel Duffy · 3/8/23

Excel is still important, as is its interfacing with C, C++, and C#.
Traders use(d) Excel a lot!

CrossGamma · 3/9/23

Daniel Duffy said:
Excel is still important, as is its interfacing with C, C++, and C#.
Traders use(d) Excel a lot!

These days it's almost a red flag if a company heavily uses Excel for trading. Maybe some banks still do. The world moved to Python 5+ years ago for more ad-hoc analysis and scripting.

Quasar Chunawala · 3/9/23

CrossGamma said:
These days it's almost a red flag if a company heavily uses Excel for trading. Maybe some banks still do. The world moved to Python 5+ years ago for more ad-hoc analysis and scripting.

I mean, Python is used for rapid prototyping and ad-hoc analysis, but there are still places where you need a GUI and interactivity for a variety of things, and that could be written in .NET, Java, and in some cases just good ol' Excel.

CrossGamma · 3/9/23

Quasar Chunawala said:
I mean, Python is used for rapid prototyping and ad-hoc analysis, but there are still places where you need a GUI and interactivity for a variety of things, and that could be written in .NET, Java, and in some cases just good ol' Excel.

The point is about Excel, not .NET or Java. It's pretty easy to build and deploy simple Python (web-)GUIs. I haven't seen Excel heavily used in trading for 5+ years and would probably be very hesitant to join a shop that still does.

Daniel Duffy · 3/9/23

CrossGamma said:
The point is about Excel, not .NET or Java. It's pretty easy to build and deploy simple Python (web-)GUIs. I haven't seen Excel heavily used in trading for 5+ years and would probably be very hesitant to join a shop that still does.

Excel in finance has a long history. Let's put things in perspective. I'm not talking about "programming in Excel" with macros but Excel interop with VBA, C++ (xll etc.), and DLLs that have to deal with legacy systems. I am neutral on which to use, but AFAIK traders do use it.

How would you move a C++ pricing library that is used from Excel over to Python? Many stakeholders use and want Excel, or are stuck with Excel, because no way will you rewrite a legacy system just to interface it with Python, I reckon.

Your market making niche is different, I agree. Horses for courses, I suppose.

Daniel Duffy · 3/9/23

CrossGamma said:
These days it's almost a red flag if a company heavily uses Excel for trading. Maybe some banks still do. The world moved to Python 5+ years ago for more ad-hoc analysis and scripting.

Why is it a red flag? Just curious.

Quasar Chunawala · 3/9/23

Daniel Duffy said:
Excel in finance has a long history. Let's put things in perspective. I'm not talking about "programming in Excel" with macros but Excel interop with VBA, C++ (xll etc.), and DLLs that have to deal with legacy systems. I am neutral on which to use, but AFAIK traders do use it.

Your market making niche is different, I agree. Horses for courses, I suppose.

And most sell-side shops are anything but monolithic in how their software/quantitative models are consumed. It could be legacy Excel tools or newer web tools. In short, it's just a nice-to-know thing: C++ <-> Excel interop.

CrossGamma · 3/9/23

Daniel Duffy said:
Why is it a red flag? Just curious.

It shows they are using legacy tech, are probably fairly manual, ... A bit like how you wouldn't want to work for a company that is still on Python 2.7, C++11 (98?) or Java 7. I can see how banks have more legacy systems to maintain and that Excel is a good tool for less technical people like sales, structurers, ... For trading companies it's not a great signal, as the market is very competitive and the best people typically want to work with the newest tech.

Daniel Duffy · 3/9/23

CrossGamma said:
It shows they are using legacy tech, are probably fairly manual, ... A bit like how you wouldn't want to work for a company that is still on Python 2.7, C++11 (98?) or Java 7. I can see how banks have more legacy systems to maintain and that Excel is a good tool for less technical people like sales, structurers, ... For trading companies it's not a great signal, as the market is very competitive and the best people typically want to work with the newest tech.

Good sales pitch. Not everyone wants new tech.
I'm sure most quants can make up their own minds. Many have skills beyond programming. Many are interested in math and finance as well.

Today's newest tech is tomorrow's legacy, unless software systems have a short shelf life. C++11 is 90% of what is needed. 20/80 Pareto rule.

// When I was a developer at Comprimo Amsterdam (where the BNR building is now) 45 years ago, I wrote an enterprise P&L manhour control on an Apple II in Pascal. It lived for 19 years after having been ported to a minicomputer. No one cared what the tech was. It was all about the business. There was no Excel in those days.

Daniel Duffy · 3/9/23

On "new tech": most of the cool features in C++20 were developed 30-40 years ago:

generics (ML and Ada)
futures/promises/tasks
parallel/async programming
coroutines
tuples
lambdas and functional programming (1930s)
etc.

Daniel Duffy · 3/9/23

Just saw a post on LinkedIn

My dream as a 30-years old: to work as a trader in a room packed with people, feel the adrenaline of the noise and be in front of 6 mega screens.
My dream as a 40-years old: to run my business anywhere in the world, preferably in a quiet place and in front of a small screen laptop.

CrossGamma · 3/9/23

Daniel Duffy said:
Good sales pitch. Not everyone wants new tech.
I'm sure most quants can make up their own minds. Many have skills beyond programming. Many are interested in math and finance as well.

Today's newest tech is tomorrow's legacy, unless software systems have a short shelf life. C++11 is 90% of what is needed. 20/80 Pareto rule.

// When I was a developer at Comprimo Amsterdam (where the BNR building is now) 45 years ago, I wrote an enterprise P&L manhour control on an Apple II in Pascal. It lived for 19 years after having been ported to a minicomputer. No one cared what the tech was. It was all about the business. There was no Excel in those days.

How is it a sales pitch if I have no product to sell (as opposed to others here *cough*)?

Daniel Duffy · 3/9/23

It feels a bit like preaching, speaking for myself. It might be a cultural/language thing.
It's the way I read it.

Daniel Duffy · 3/22/23

https://media.licdn.com/dms/image/D4E22AQEFaVoNzoN7ig/feedshare-shrink_800/0/1679504866944?e=1682553600&v=beta&t=cDEPlYpFcWI3mZV_Bbnj3G3oCgSVaFGlcjmU7u7WgLc

Andy Nguyen · 3/22/23

Daniel Duffy said:
https://media.licdn.com/dms/image/D4E22AQEFaVoNzoN7ig/feedshare-shrink_800/0/1679504866944?e=1682553600&v=beta&t=cDEPlYpFcWI3mZV_Bbnj3G3oCgSVaFGlcjmU7u7WgLc

Daniel,
You can use the Image button to upload a picture, or enter the URL of an image you found, so that it displays properly.

Dimitri Vulis · 3/26/23

Download QuantLib.

Look at all the open issues (Issues · lballabio/QuantLib) and see if you can fix or add something.

Or, make reference implementations for calculations that should be there but aren't yet, e.g. post-LIBOR RFR curves for various currencies, correct cash flows and price/yield (i.e. matching Bloomberg) for various types of bonds...

Try to get your contributions accepted. They would certainly look good on your CV.

AppliedMath · 5/30/23

In terms of GitHub, the projects are useless without decent READMEs. The code you write is proof of what you explain in an interview or write in a README.md. Therefore, if your README is 1-2 paragraphs, you've "shot yourself in the foot" with the project.

Daniel Duffy · 5/30/23

Developers have a dismal reputation for documenting their code. Most people are not interested in reading someone else's code.
There should be a story:

Fin model -> math model -> numerical model -> C++/Python -> Conclusions
Make sure you understand it A-Z.

At least, if you want to win friends and influence people.

Blogs - MSc Theses on Machine Learning and Computational Finance 2019 :: Datasim (www.datasim.nl)

This had 55K views and 25 reposts on LinkedIn, recently.

// On my LI entry I have posted about 30 MSc theses I supervised.

Paul Lopez · 11/14/23

Lovely thread!
