Educational Objectives
LEARNING OBJECTIVES: The course will discuss big data analysis for empirical exercises
in economics, business, and finance.
KNOWLEDGE AND UNDERSTANDING: Statistical and computational tools
for supervised and unsupervised classification, including regularization
and predictor selection. A brief introduction to neural networks.
APPLYING KNOWLEDGE AND UNDERSTANDING: The main modern applications
will be discussed. The R software will be used.
MAKING JUDGEMENTS: Students must be able to choose the most useful tool
and to interpret its results and limitations.
COMMUNICATION SKILLS: Students must be able to communicate the
results of their analyses through graphs, tables, and short summary texts.
LEARNING SKILLS: Students will be aware of the advantages and limitations of the
techniques introduced, and will be able to study them further in dedicated texts.
Learning Objectives
LEARNING OUTCOMES: The course will discuss big data analysis for empirical exercises
in economics, finance, and business.
KNOWLEDGE AND UNDERSTANDING: Statistical and machine learning tools for
supervised and unsupervised learning with big data will be detailed.
APPLYING KNOWLEDGE AND UNDERSTANDING: We will discuss the main applications,
with some more attention given to text mining. The R software will be used throughout.
MAKING JUDGEMENTS: Students must be able to choose the most appropriate
statistical method, interpret its results, and be aware of its limitations.
COMMUNICATION SKILLS: Students must be able to communicate the results
through graphs, tables, and comments.
LEARNING SKILLS: The students will be made aware of strengths and limitations of the
methods, and pointed to further reading.
Prerequisite Courses
The courses “Statistics”, “Econometrics”, and “Microeconometrics” for the LM in Economics.
The courses “Mathematics”, “Statistics”, and “Time series and econometrics” for the LM in Finance and
Banking.
Prerequisites
Attending students must have a good knowledge of linear and logistic regression models.
For students of the LM in Economics, it is advised to have at least attended the following
courses: “Statistics”, “Econometrics”, and possibly also “Microeconometrics”.
For students of the LM in Finance and Banking, it is advised to have passed “Mathematics”,
and at least attended “Statistics” and possibly also “Time series and econometrics”.
Syllabus
Characteristics of big data. Sources of big data. Architectures for big data. (4 hours)
Unsupervised learning: k-means, PAM, trimmed k-means. (4 hours)
Supervised learning: screening, regularization. Ridge regression, LASSO, elastic net. (10 hours)
Computational methods: k-NN, C&RT, random forests, neural networks. Principles of prediction and choice of tuning parameters. (10 hours)
Main applications: analysis of text, images, and sounds. Latent Dirichlet Allocation,
sentiment analysis. (8 hours)
Program
Characteristics of Big data. Sources of big data and motivating applications: web scraping,
social media, Google. Architectures for big data collection, analysis, and storage. Working
with a large sample size. (4 hours)
Unsupervised Learning: k-means, PAM, trimmed k-means. (4 hours)
Supervised learning: regularization and feature selection for linear and non-linear regression
models. Ridge regression, LASSO, elastic net. (10 hours)
Machine learning approaches for supervised learning: k-nearest-neighbors, classification and
regression trees, random forests. Principles of prediction and tuning parameter choice. An
overview of neural networks and deep learning. (10 hours)
Images, sounds, text, as sources of information. Text mining: natural language processing,
latent Dirichlet allocation, sentiment analysis. (8 hours)
Adopted Texts
Boehmke B., Greenwell B. (2019). Hands-On Machine Learning with R. Chapman
& Hall/CRC Press.
Hastie T., Tibshirani R., Friedman J. (2009). The Elements of Statistical Learning: Data
Mining, Inference, and Prediction, Second Edition. Springer, Springer Series in Statistics.
Course Format
Lectures and practical sessions. Techniques will be introduced through examples and described
through formulas. The focus will be on the practical use of each technique and on the interpretation
of the results.
Teaching methods
The course consists of lectures and practical sessions. Techniques will be introduced
through examples and described with mathematical formulas. The focus will be on the practical
use of each technique and on the interpretation of the results.
Exam Regulations
The exam will be written, with a mix of open and closed questions. The questions will cover the course material and will include R code and R output, which students will be asked to interpret.
To sit the exam, students must have booked in time on Delphi.
Students must demonstrate that they can choose the statistical technique best suited to the problem at hand, that they know its limitations and advantages, and that they can implement the technique and interpret its results.
The mark is expressed out of thirty according to the following scale:
o Fail: major gaps and/or inaccuracies in knowledge and understanding of the topics; limited ability to analyse and summarise, frequent generalisations.
o 18-20: barely sufficient knowledge and understanding of the topics, with possible imperfections; sufficient ability to analyse and summarise, and sufficient independence of judgement.
o 21-23: routine knowledge and understanding of the topics; sound analysis and synthesis with logically coherent argumentation.
o 24-26: fair knowledge and understanding of the topics; good ability to analyse and summarise, with rigorously expressed arguments.
o 27-29: complete knowledge and understanding of the topics; remarkable ability to analyse and summarise. Good independence of judgement.
o 30-30L: excellent knowledge and understanding of the topics. Remarkable ability to analyse and summarise, and remarkable independence of judgement. Arguments expressed in an original way.
Exam Rules
The exam will be written, with a mix of open and closed-form questions. Questions will cover the entire course material. Some questions will show R code or R output and ask for its interpretation.
Students must book for the exam. Students not booked in advance will not be allowed to take the exam.
Students must demonstrate that they can choose the most appropriate statistical methodology, that they know its limitations and strengths, and that they can implement each technique and interpret the results.
The maximum mark is 30 out of 30; the minimum mark to pass the exam is 18 out of 30. In more detail:
o Not passed: substantial gaps in the comprehension or knowledge of the material.
o 18-20: sufficient comprehension and knowledge of the topics. Possible minor mistakes. Good ability to summarise the contents.
o 21-23: working comprehension and knowledge of the topics. Fair ability to summarise and coherent reasoning.
o 24-26: fair knowledge and comprehension of the topics; good ability to summarise and rigorous reasoning.
o 27-29: complete knowledge and comprehension of the topics, very good ability to summarise. Ability to reason autonomously on the topics of the course.
o 30-30L: excellent knowledge and comprehension of the topics, excellent ability to summarise, and ability to reason autonomously on the topics of the course.
ALESSIO FARCOMENI
Syllabus
Characteristics of big data. Sources of big data. Architectures for big data. Principles of
prediction and choice of tuning parameters.
Unsupervised learning: k-means, PAM, trimmed k-means.
Supervised learning: regularization. Ridge regression, LASSO, elastic net.
Computational methods: k-NN, C&RT, random forests, neural networks.
Main applications: analysis of text, images, and sounds. Latent Dirichlet Allocation,
sentiment analysis.
Program
Characteristics of Big data. Sources of big data and motivating applications: web scraping,
social media, Google. Architectures for big data collection, analysis, and storage. Working
with a large sample size. Principles of prediction and tuning parameter choice.
Unsupervised Learning: k-means, PAM, trimmed k-means. Supervised learning:
regularization and feature selection for linear and non-linear regression models. Ridge
regression, LASSO, elastic net. Machine learning approaches for supervised learning:
k-nearest-neighbors, classification and regression trees, random forests. An overview of
neural networks and deep learning. Images, sounds, text, as sources of information. Text
mining: natural language processing, latent Dirichlet allocation, sentiment analysis.
Exam Rules
Students who fail or withdraw from the exam may take it again in the same exam session.
Updated A.Y. 2021-2022
The course will discuss big data analysis for empirical exercises in economics, finance, and business. We will start with an overview of applications of big data analytics and data sources, and then clarify the main characteristics, advantages, and limitations of big data. Statistical and machine learning tools for supervised and unsupervised learning with big data will be detailed, including regularization methods for shrinkage and feature selection. A brief introduction to neural networks and deep neural networks will also be given. Finally, we will discuss the main applications, with some more attention given to text mining. The R software for statistical computing will be used throughout.
Schedule of Topics
Characteristics of Big data. Advantages, limitations and opportunities. Sources of big data and motivating applications: web scraping, social media, Google. Architectures for big data collection, analysis, and storage. Working with a large sample size: sub-sampling, batching. Principles of prediction and tuning parameter choice.
Unsupervised Learning: k-means, PAM, trimmed k-means. Supervised learning: regularization and feature selection for linear and non-linear regression models. Ridge regression, LASSO, elastic net.
Machine learning approaches for supervised learning: k-nearest-neighbors, classification and regression trees, random forests. An overview of neural networks and deep learning. An introduction to the idea of transfer learning.
Images, sounds, text, as sources of information. Text mining: natural language processing, latent Dirichlet allocation, sentiment analysis.
Other topics might be mentioned, including market basket analysis.
Exam rules
The final evaluation is based on a written examination. There will be no mid-term exam.
The exam will cover the entire program. Students must book for the exam. Students not booked in advance will not be allowed to take the exam. Final marks are uploaded to the Delphi system, and each candidate receives their mark individually by email.