## BIG DATA ANALYSIS FOR ECONOMICS AND FINANCE

## Syllabus

### Learning Objectives

LEARNING OUTCOMES: The course covers big data analysis for empirical work in economics, finance, and business.

KNOWLEDGE AND UNDERSTANDING: Statistical and machine learning tools for supervised and unsupervised learning with big data are presented in detail.

APPLYING KNOWLEDGE AND UNDERSTANDING: The main applications are discussed, with particular attention to text mining. The R software is used throughout.

MAKING JUDGEMENTS: Students must be able to choose the most appropriate statistical method, interpret its results, and be aware of its limitations.

COMMUNICATION SKILLS: Students must be able to communicate results through graphs, tables, and written comments.

LEARNING SKILLS: Students are made aware of the strengths and limitations of the methods and are pointed to further reading.

## ALESSIO FARCOMENI

### Prerequisites

Attending students must have a good knowledge of linear and logistic regression models.

For students of the LM in Economics, it is advisable to have at least attended the following courses: “Statistics”, “Econometrics”, and possibly also “Microeconometrics”.

For students of the LM in Finance and Banking, it is advisable to have passed “Mathematics” and to have at least attended “Statistics” and possibly also “Time series and econometrics”.

### Program

Characteristics of big data. Sources of big data and motivating applications: web scraping, social media, Google. Architectures for big data collection, analysis, and storage. Working with a large sample size. Principles of prediction and tuning parameter choice.

Unsupervised learning: k-means, PAM (partitioning around medoids), trimmed k-means.

Supervised learning: regularization and feature selection for linear and non-linear regression models. Ridge regression, LASSO, elastic net. Machine learning approaches for supervised learning: k-nearest-neighbors, classification and regression trees, random forests. An overview of neural networks and deep learning.

Images, sounds, and text as sources of information. Text mining: natural language processing, latent Dirichlet allocation, sentiment analysis.
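
As an illustration of how the three regularization methods named in the program relate to one another, the elastic net for a linear model can be written as a single penalized least-squares problem. The notation below is introduced here and is not taken from the course material: $n$ observations $(x_i, y_i)$, intercept $\beta_0$, coefficient vector $\beta$, penalty weight $\lambda \ge 0$, and mixing parameter $\alpha \in [0, 1]$. The scaling follows the parameterization used by the R package glmnet; other texts scale the terms differently.

```latex
\[
(\hat{\beta}_0, \hat{\beta}) =
\arg\min_{\beta_0,\, \beta}\;
\frac{1}{2n} \sum_{i=1}^{n} \left( y_i - \beta_0 - x_i^{\top} \beta \right)^2
+ \lambda \left[ \alpha \, \lVert \beta \rVert_1
+ \frac{1 - \alpha}{2} \, \lVert \beta \rVert_2^2 \right]
\]
```

Setting $\alpha = 1$ gives the LASSO and $\alpha = 0$ gives ridge regression. Both $\lambda$ and $\alpha$ are tuning parameters, typically chosen by cross-validation.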

### Books

Boehmke B., Greenwell B. (2019). Hands-On Machine Learning with R. Chapman & Hall/CRC Press.

Hastie T., Tibshirani R., Friedman J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition. Springer, Springer Series in Statistics.

### Teaching methods

The course consists of lectures and practical sessions. Techniques are introduced through examples and described with mathematical formulas. The focus is on the practical use of each technique and on the interpretation of the results.

### Exam Rules

The exam is written, with a mix of open-ended and closed-form questions. Questions cover the entire course material. Some questions present R code or R output and ask for its interpretation.

Students must register for the exam in advance. Students who have not registered will not be allowed to take it.

Students must demonstrate that they can choose the most appropriate statistical methodology, that they know its limitations and strengths, and that they can implement each technique and interpret the results.
