Syllabus

Learning Objectives




LEARNING OUTCOMES: The course covers big data analysis for empirical applications
in economics, finance, and business.
KNOWLEDGE AND UNDERSTANDING: Statistical and machine learning tools for
supervised and unsupervised learning with big data are presented in detail.
APPLYING KNOWLEDGE AND UNDERSTANDING: The main applications are discussed,
with particular attention to text mining. The R software is used throughout.
MAKING JUDGEMENTS: Students must be able to choose the most appropriate
statistical method, interpret its results, and be aware of its limitations.
COMMUNICATION SKILLS: Students must be able to communicate results
through graphs, tables, and written comments.
LEARNING SKILLS: Students are made aware of the strengths and limitations of the
methods and are pointed to further reading.

Prerequisites

Attending students must have a good knowledge of linear and logistic regression models.
Students of the LM in Economics are advised to have at least attended the following
courses: “Statistics”, “Econometrics”, and possibly also “Microeconometrics”.
Students of the LM in Finance and Banking are advised to have passed “Mathematics”
and to have at least attended “Statistics” and possibly also “Time series and econometrics”.

Program

Characteristics of big data. Sources of big data and motivating applications: web scraping,
social media, Google. Architectures for big data collection, analysis, and storage. Working
with a large sample size. (4 hours)
Unsupervised learning: k-means, PAM, trimmed k-means. (4 hours)
Supervised learning: regularization and feature selection for linear and non-linear
regression models. Ridge regression, LASSO, elastic net. (10 hours)
Machine learning approaches for supervised learning: k-nearest-neighbors, classification
and regression trees, random forests. Principles of prediction and tuning parameter choice.
An overview of neural networks and deep learning. (10 hours)
Images, sounds, and text as sources of information. Text mining: natural language
processing, latent Dirichlet allocation, sentiment analysis. (8 hours)

Books

Boehmke B., Greenwell B. (2019). Hands-On Machine Learning with R. Chapman
& Hall/CRC Press.
Hastie T., Tibshirani R., Friedman J. (2009). The Elements of Statistical Learning: Data
Mining, Inference, and Prediction, Second Edition. Springer, Springer Series in Statistics.

Teaching methods

The course consists of lectures and practical sessions. Techniques are introduced
through examples and described with mathematical formulas. The focus is on the practical
use of each technique and the interpretation of the results.

Exam Rules

The exam is written, with a mix of open-ended and closed-ended questions. Questions cover the entire course material. Some questions present R code or R output and ask for its interpretation.

Students must register for the exam in advance. Students who have not registered will not be allowed to take the exam.

Students must demonstrate that they can choose the most appropriate statistical methodology, know its limitations and strengths, implement each technique, and interpret the results.

The maximum mark is 30 out of 30; the minimum mark to pass the exam is 18 out of 30. In more detail:

o Not passed: substantial gaps in the comprehension or knowledge of the material.

o 18-20: sufficient comprehension and knowledge of the topics. Possible minor mistakes. Good ability to summarise the contents.

o 21-23: working comprehension and knowledge of the topics. Fair ability to summarise and coherent reasoning.

o 24-26: fairly good knowledge and comprehension of the topics; good ability to summarise and rigorous reasoning.

o 27-29: complete knowledge of the topics and comprehension, very good ability to summarise. Ability to reason autonomously on the topics of the course.

o 30-30L: excellent knowledge of the topics and comprehension, excellent ability to summarise and ability to reason autonomously on the topics of the course.