Data Science with R

About the course

This course is planned for AUCA students in the spring semester of the 2020/2021 academic year. It introduces programming and computational tools useful for a future career as a data scientist. Students will set up their own R programming environment; learn to write, run, and modify R code and scripts; load data sets into R; create effective numerical and graphical summaries; use R to perform common statistical analyses; and apply programming techniques such as loops, conditionals, and functions to solve the practical and analytical problems that data scientists encounter when working with data.

Instructor: Ilias Suvanov

Lectures are held on Tuesdays and Thursdays at 08:00.

Content

  • Introduction
  • Differences between R, RStudio, and Kaggle notebooks (analogy: Notepad vs MS Word vs Google Docs)
  • Strings
  • Numbers
  • Vectors
  • Factors
  • Data frames
    • Selecting columns using $ operator
    • Selecting sub-table using [ , ] operator
    • summary()
    • table()
    • Importing data
    • Working with missing values
  • Tibbles
  • For loops
  • If statements
  • Functions
  • R Markdown
  • ggplot2 package (visualization)
    • Scatter Plot
    • Line Graph
    • Bar Graph
    • Histogram
    • Density Plot
    • 2d Density Plot
    • Correlation Plot
    • Box Plot
    • facet_grid()
    • facet_wrap()
  • Plotly (interactive visualizations)
  • Base R graphics
    • plot()
    • hist()
  • dplyr package (data manipulation)
    • Pipe operator %>%
    • select()
    • mutate()
    • filter()
    • group_by()
    • rename()
  • tidyr package (reshaping data between wide and long formats)
    • pivot_longer()
    • pivot_wider()
  • Merging two data frames
    • cbind()
    • rbind()
    • merge() (left join, right join, outer join, self join)
  • Causal inference
    • Linear Regression
      • Interpreting regression table
      • R squared
      • Standard errors
      • t-statistics
      • Heteroskedasticity vs homoskedasticity
      • Robust standard errors
      • Classical approach to linear regression
      • Modern approach to linear regression
    • Probit and logit models
      • Interpreting coefficients
  • LaTeX
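
As a small preview of the merging topic listed above, the sketch below shows how the join types map onto arguments of base R's merge(). The two tables (students, grades) and their columns are hypothetical, invented only for illustration; they are not course data.

```r
# Hypothetical toy tables, made up for this illustration
students <- data.frame(id = c(1, 2, 3), name = c("A", "B", "C"))
grades   <- data.frame(id = c(2, 3, 4), grade = c(90, 85, 70))

# Inner join (default): keep only ids present in both tables
inner <- merge(students, grades, by = "id")

# Left join: keep every row of students; missing grades become NA
left <- merge(students, grades, by = "id", all.x = TRUE)

# Right join: keep every row of grades; missing names become NA
right <- merge(students, grades, by = "id", all.y = TRUE)

# Outer (full) join: keep every id from either table
outer <- merge(students, grades, by = "id", all = TRUE)

print(outer)
```

A self join follows the same pattern, with the same data frame passed as both arguments.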

Learning resources

  1. Resources for Learning R
  2. Resources for Learning ggplot2
  3. Resources for Learning Plotly
  4. Resources for Learning dplyr
  5. Resources for Learning R Markdown
  6. Resources for Learning Econometrics and Data Science