Syllabus

Learning Objectives

LEARNING OUTCOMES:
The course provides an introduction to Statistical Learning and Data Mining.
Advances in information technology have made very rich data sets available, often generated automatically as a by-product of the main institutional activity of a firm or business unit. Most organizations today produce an electronic record of essentially every transaction in which they are involved. Firms collect terabytes of transaction data (e.g. credit card records) over their operating periods. Most often these data are collected as secondary data, without a specific sampling design or research question in mind.

The course offers an insight into the main statistical methodologies for the visualisation and the analysis of business and market data, providing the information requirements for specific tasks such as credit scoring, prediction and classification, market segmentation and product positioning. Emphasis will be given to empirical applications using modern software tools (RStudio).

The course has the following intended learning outcomes:
- to provide a thorough knowledge of data mining methods and statistical learning techniques;
- to provide the expertise to manage complexity in information and to be able to distil the stylized facts that are relevant for interpretation;
- to be able to predict business outcomes;
- to be able to select a predictive method among those available;
- to be able to communicate the statistical findings to a non-expert audience;
- to be able to perform sophisticated statistical analyses with the appropriate software;
- to critically appraise the potential and the limitations of the available methodologies.

KNOWLEDGE AND UNDERSTANDING:
The course covers the modern statistical methodologies for the visualisation and the analysis of business and market data that are relevant for making decisions in a complex and rapidly changing business environment.
The fundamental theme is supervised statistical learning, which deals with the prediction of quantitative and qualitative outcomes using a potentially large set of inputs. The two problems, regression and classification, constitute the core of the course.
Emphasis is given to the problem of variable and model selection and to the generalizability of a prediction method outside the training sample, via the optimization of the trade-off between model complexity and the in-sample goodness of fit.
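
The trade-off between model complexity and in-sample fit can be illustrated with a minimal sketch. It is written in Python with NumPy rather than the course software R, and everything in it is an assumption made for illustration: the simulated data, the polynomial model family, and the helper names `in_sample_mse` and `kfold_cv_mse`. The in-sample error can only decrease as the polynomial degree grows, whereas the cross-validated error estimates the error outside the training sample and is used to select the degree.

```python
import numpy as np


def in_sample_mse(x, y, degree):
    """Mean squared error of a polynomial fit, evaluated on the data used to fit it."""
    coefs = np.polyfit(x, y, degree)
    return float(np.mean((y - np.polyval(coefs, x)) ** 2))


def kfold_cv_mse(x, y, degree, k=5, seed=0):
    """Average held-out mean squared error over k cross-validation folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), k)
    errors = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        # Fit on the training folds only, then predict the held-out fold.
        coefs = np.polyfit(x[train], y[train], degree)
        pred = np.polyval(coefs, x[test])
        errors.append(np.mean((y[test] - pred) ** 2))
    return float(np.mean(errors))


# Simulated data (an assumption for illustration): smooth signal plus noise.
rng = np.random.default_rng(42)
x = np.sort(rng.uniform(-1, 1, 60))
y = np.sin(3 * x) + rng.normal(0, 0.3, 60)

degrees = list(range(1, 11))
train_err = [in_sample_mse(x, y, d) for d in degrees]
cv_err = [kfold_cv_mse(x, y, d) for d in degrees]

# Select the complexity (degree) with the smallest cross-validated error.
best_degree = degrees[int(np.argmin(cv_err))]

for d, tr, cv in zip(degrees, train_err, cv_err):
    print(f"degree={d:2d}  in-sample MSE={tr:.4f}  CV MSE={cv:.4f}")
print("degree selected by cross-validation:", best_degree)
```

Selecting the degree by the in-sample error alone would always favour the most complex model; the cross-validated error penalises complexity that does not generalize.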


APPLYING KNOWLEDGE AND UNDERSTANDING:

The methodologies presented during the course are applied to real-life datasets and case studies, dealing with the prediction of sales, credit scoring and the pricing of goods.

Two hours per week are dedicated to tutorials in which statistical analyses are conducted in the Laboratory and implemented in RStudio.

Students are expected to perform their statistical analyses in a group assignment.

MAKING JUDGEMENTS:

The prediction of an outcome is an informed decision based on the knowledge of covariates and antecedents. An important supervised learning problem is classification. We discuss the Bayes classification rule and how to select the prediction rule that is optimal for a particular target variable. The student is expected to be able to draw conclusions on the basis of the statistical evidence and to validate those conclusions on validation or test samples drawn from the same target population.
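
As a reminder, the Bayes classification rule can be written as follows in standard textbook notation. The symbols are assumptions not used elsewhere in this syllabus: Y is the class label taking values 1, ..., K, X is the vector of covariates, \pi_k is the prior probability of class k and f_k is the density of X within class k.

```latex
\hat{C}(x) = \arg\max_{k \in \{1,\dots,K\}} \Pr(Y = k \mid X = x),
\qquad
\Pr(Y = k \mid X = x) = \frac{\pi_k \, f_k(x)}{\sum_{l=1}^{K} \pi_l \, f_l(x)} .
```

Under 0-1 loss, assigning each observation to the class with the largest posterior probability minimises the expected misclassification rate.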

COMMUNICATION SKILLS:

Particular attention is dedicated to the ability to communicate the statistical evidence in a systematic and concise way, using graphs and summaries, to a non-specialist target audience.

The software used in the tutorials is oriented towards graphical displays and visualization of data. The student is asked to report on the statistical analysis carried out for a particular purpose in the individual assignments.

LEARNING SKILLS:

Students develop their learning skills by comparing the teaching material provided by the instructor and presented in the lectures with the readings suggested each week. The software tutorials and the analysis of case studies in the assignments will help build their applied skills and their autonomous progress towards the intended learning outcomes.

Prerequisites

Preliminary courses in Maths and Statistics

Program

1. Introduction to data mining. Tools for data analysis, visualisation and description.

2. The linear regression model.

3. Model selection and evaluation: bias-variance trade-off, model complexity and goodness of fit. Cross-validation. Selection using information criteria.

4. Regularization and shrinkage methods: ridge regression, lasso, forward stagewise regression. Principal components regression.

5. Linear methods for classification: Bayes classification rule. Discriminant analysis. Canonical variates. Logistic regression.

6. Semiparametric regression: regression splines and smoothing splines.

7. Kernel smoothing methods: local polynomial regression. Density estimation. Nearest neighbor classification.

8. Additive models and tree-based methods: generalized additive models (GAM), regression and classification trees. Boosting.

Books

The textbook for the course is the following:

G James, D Witten, T Hastie and R Tibshirani. An Introduction to Statistical Learning with Applications in R. Springer, Springer Texts in Statistics, 2013.
Downloadable at http://www-bcf.usc.edu/~gareth/ISL/


The course material will be made available on the course website: slides, suggested readings, datasets and supplementary materials (Matlab, R and SAS scripts).

Additional useful references:
• T Hastie, R Tibshirani and J Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition. Springer, Springer Series in Statistics, 2009. Website: http://www-stat.stanford.edu/ElemStatLearn/
• G Bekes and G Kezdi. Data Analysis for Business, Economics, and Policy.

Teaching methods

• Lectures
• Classes
• Exercises
• Tutorials (Matlab, R)

Exam Rules

30% Individual and Group Assignments
70% Final Exam

The assignments aim to assess the ability to process and analyse data using statistical models, as well as the ability to communicate the relevant findings. Students are expected to produce a technical report.

The final exam is a 120-minute written test that assesses knowledge of the course program.