Machine Learning

Content

  • Machine Learning and Economics
  • Intro to Python
    • Variables (text, digits)
    • Multiline strings
    • Lists
    • Tuples
    • Dictionaries
    • Sets
    • If statements
    • For loops
    • Functions
  • Matplotlib package
  • Seaborn package
  • NumPy package
  • Pandas package
  • Scikit-learn package
    • Linear regression
      • Accuracy score (R^2)
      • train_test_split()
      • Complexity of the model
      • Lasso regression
      • Ridge regression
    • Logistic regression
    • Support Vector Machine
    • Decision trees
    • Random forest
    • XGBoost
    • K-Fold cross validation

About the course

Machine learning is used when you need to solve a class of problems for which it is difficult to write an explicit algorithm, but for which many examples with correct answers are available. For instance, it is practically impossible to write by hand an algorithm that distinguishes a photo of a cat from a photo of a dog, but given enough pictures of both, machine learning can build such an algorithm automatically. In this course, students will learn the principles and algorithms for turning training data into effective automated predictions. We will cover, among other topics, how to predict poverty scores and health outcomes.

Instructor: Ilias Suvanov. Lectures will be held on Fridays at 08:00.

Beneficial links

Seminars

Instructor | Schedule
Ilias Suvanov | Lectures will be held on Fridays at 08:00

Course News

TBA

Lectures/Seminars

No. | Date | Title | Materials / View / Additional Materials
L1 | 11th January | What is Data, Why Python, Data-driven policy making. An introduction to machine learning. Basic terms, problem statements and application examples. | Slides
S1 | 11th January | Variables, Strings, Multiline Strings, If Statement, List, Dictionary, Set, For Loop Statement in Python | Seminar 1, nbviewer, youtube tutorials
S2 | 18th January | pandas, NumPy and Matplotlib libraries | Seminar 2, nbviewer, Video, youtube tutorials
S3 | 25th January | Guest lecture by Ilya Schurov. Intro to Machine Learning | Video
S4 | 1st February | Linear Regression | Theory, Code, Code
S5 | 8th February | Train/Test Split | Code
S6 | 15th February | Presentation: Machine Learning for Everyone |
S7 | 22nd February | Logistic Regression |
S8 | 1st March | Logistic Regression | Theory, Code, Code
S9 | 15th March | Support Vector Machine (SVM) | SVM Theory, SVM Code
S10 | 29th March | K-Fold Cross Validation. L1 and L2 regularization. | K-fold, K-fold
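
As a rough illustration of the K-fold cross validation topic from seminar S10, the sketch below splits sample indices into k folds and uses each fold once as the test set. It is written from scratch in plain Python and is not the seminar's own code; the function name `k_fold_indices` and the sample count are assumptions made for this example. In practice, scikit-learn's `KFold` class in `sklearn.model_selection` covers the same idea.

```python
def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size.

    Returns a list of (train_indices, test_indices) pairs, one per fold.
    Hypothetical helper written for illustration only.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")

    indices = list(range(n_samples))
    # The first (n_samples % k) folds receive one extra sample each.
    base, extra = divmod(n_samples, k)

    folds = []
    start = 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(indices[start:start + size])
        start += size

    splits = []
    for i, test in enumerate(folds):
        # Training set: every index that is not in the current test fold.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


# Example with 10 hypothetical samples and 3 folds.
for train, test in k_fold_indices(10, 3):
    print("train:", train, "test:", test)
```

Each sample appears in exactly one test fold, so averaging a model's score over the k folds uses all of the data for evaluation without ever testing on samples the model was trained on.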

How to download an .ipynb file from GitHub?

Grading Policy

The final grade is determined by the number of completed homework assignments and the FINAL exam. Additional points are awarded for class participation, presentations given in class, and similar contributions.

Control Work

Midterm

There will be NO MIDTERM exam for this course.

Oral Examination

The instructor reserves the right to require any course participant to sit an individual oral examination (with webcam and microphone turned on) before submitting the final grade to the registrar. Refusing to sit the oral examination may result in failing the course. The instructor will notify the participants who may be called for an oral examination in mid-April.

Final

For the FINAL exam, you need to carry out an individual analysis (group work is not allowed) on the Kaggle platform, using any dataset available there. Your analysis should include exploratory data analysis, a linear regression model, a logistic regression model, and accuracy scores. You will present your analysis in class; the presentation should take no more than 5 minutes. The presentation schedule is available at https://docs.google.com/spreadsheets/d/1LqXnvP_nJlfSS4z8DBPtloJnbC1tX3rMAP0JZATS6WQ/edit#gid=0
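
To make the "accuracy scores" requirement more concrete, here is a minimal sketch of the two scores such an analysis would typically report. It assumes the R^2 score for the linear regression model and the fraction of correct predictions for the logistic regression model; this reading, the function names `r_squared` and `accuracy`, and all numbers below are assumptions made for illustration, not official grading criteria. scikit-learn provides equivalent functions as `r2_score` and `accuracy_score` in `sklearn.metrics`.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are equal")
    return 1 - ss_res / ss_tot


def accuracy(labels_true, labels_pred):
    """Fraction of predicted class labels that match the true labels."""
    correct = sum(t == p for t, p in zip(labels_true, labels_pred))
    return correct / len(labels_true)


# Hypothetical regression targets and predictions on a held-out test set.
y_test = [3.0, 5.0, 7.0, 9.0]
y_hat = [2.5, 5.0, 7.5, 9.0]
print("R^2:", r_squared(y_test, y_hat))

# Hypothetical binary class labels and predictions.
labels_test = [1, 0, 1, 1, 0]
labels_hat = [1, 0, 0, 1, 0]
print("Accuracy:", accuracy(labels_test, labels_hat))
```

Both scores are most informative when computed on data the model did not see during training, for example the test part of a train/test split.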

Learning Python Materials

Foundational books

  1. Ben Stephenson - The Python Workbook: A Brief Introduction with Exercises and Solutions
  2. Nicola Lacey - Python by Example: Learning to Program in 150 Challenges
  3. Python documentation
  4. Jake VanderPlas - Python Data Science Handbook: Essential Tools for Working with Data
  5. Wes McKinney - Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython
  6. Joel Grus - Data Science from Scratch: First Principles with Python

Machine Learning Materials

Online courses

  1. Ilya Schurov - Machine Learning at HSE (in Russian)
  2. MIT Introduction to Deep Learning
  3. Andrew Ng - Machine learning on Coursera
  4. Google - AI Adventures

Books

  1. Mathematics for Machine Learning - a book offering a mathematical introduction to machine learning. You might be especially interested in the chapters on probability theory.

Learning Statistics Materials

Online courses

  1. OpenIntro Statistics
  2. Khan Academy Statistics and probability

Books

  1. Christopher Barr, David M. Diez, and Mine Çetinkaya-Rundel - OpenIntro Statistics

Miscellaneous beneficial links

  1. Online practical exercises