Advanced Statistics (CE) - Fall 2025
Topic outline
Class meets from 1:30 to 3:00 in Z138, except on 11/12/2025, when class will meet in Z104.
Please review the materials and complete all prep work and downloads before class starts.
- Basic NumPy operations
- Visualizing distributions with histograms and KDEs
- Statistical summaries and outlier detection
- Distribution transformations
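The topics above can be sketched in a few lines of NumPy. This is a minimal illustration, not the course's own notebook: the log-normal sample, the 1.5 * IQR outlier rule, the hand-written Gaussian KDE with Silverman's rule-of-thumb bandwidth, and the helper names `gaussian_kde` and `skewness` are all assumptions chosen for the example. The histogram and KDE are computed as arrays only; passing `edges`/`density` and `grid`/`kde` to a plotting library of your choice would draw them.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Synthetic right-skewed sample (an assumption; substitute your own data).
x = rng.lognormal(mean=0.0, sigma=0.75, size=1000)

# Statistical summaries.
mean, median, std = x.mean(), np.median(x), x.std(ddof=1)
q1, q3 = np.percentile(x, [25, 75])
iqr = q3 - q1

# Outlier detection with the common 1.5 * IQR fence rule.
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
is_outlier = (x < lower) | (x > upper)

# Histogram normalized to a density (bar areas sum to 1).
density, edges = np.histogram(x, bins=30, density=True)


def gaussian_kde(sample, grid, bandwidth):
    """Average of Gaussian bumps centered on each sample point (hypothetical helper)."""
    z = (grid[:, None] - sample[None, :]) / bandwidth
    norm = sample.size * bandwidth * np.sqrt(2.0 * np.pi)
    return np.exp(-0.5 * z**2).sum(axis=1) / norm


# Silverman's rule-of-thumb bandwidth.
bandwidth = 0.9 * min(std, iqr / 1.34) * x.size ** (-1 / 5)
grid = np.linspace(x.min(), x.max(), 200)
kde = gaussian_kde(x, grid, bandwidth)


def skewness(a):
    """Sample skewness: third central moment over variance ** 1.5 (hypothetical helper)."""
    c = a - a.mean()
    return (c**3).mean() / (c**2).mean() ** 1.5


# Distribution transformation: a log transform pulls in the long right tail.
log_x = np.log(x)

print("mean, median, std:", mean, median, std)
print("outliers flagged:", int(is_outlier.sum()))
print("skewness before and after log:", skewness(x), skewness(log_x))
```

The median and IQR are used for the outlier fences because, unlike the mean and standard deviation, they are not pulled around by the very points being flagged.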
Opened: Wednesday, December 10, 2025, 12:00 AM
Due: Friday, December 12, 2025, 11:59 PM
Your goal is to train a model on the train split of the dataset and use it to predict the target variable on the test split.
- Examine the origins of your datasets. Are your features interpretable? Do you have any hypothesis about their relative importance? Do the features tend to be correlated or not?
- Do you expect the dataset to be homogeneous or heterogeneous? How can you best visualize the variance of the dataset?
- Do you expect to have a significant portion of outliers? How would you filter them?
- Does the target variable look like it can be explained by the predictor variables? Can you visualize it?
- Once you have trained a model, how can you assess its performance? Where does it perform better, where does it perform worse? Which features ended up being important for the model?
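Several of the questions above (feature correlation, performance assessment, and feature importance) can be explored with a short script. The following is only a sketch under stated assumptions: the data are synthetic, ordinary least squares stands in for whatever model you choose, and standardized coefficient magnitude is used as a rough importance measure. None of these choices is prescribed by the assignment.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic stand-in for the assignment data (an assumption).
n = 500
X = rng.normal(size=(n, 3))
X[:, 1] = 0.8 * X[:, 0] + 0.6 * rng.normal(size=n)  # feature 1 correlates with feature 0
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=0.5, size=n)

# Train/test split by shuffled index.
idx = rng.permutation(n)
train, test = idx[:400], idx[400:]
X_train, y_train = X[train], y[train]
X_test, y_test = X[test], y[test]

# Are the features correlated? Pairwise Pearson correlations on the train split.
corr = np.corrcoef(X_train, rowvar=False)

# Standardize with train statistics only, so nothing leaks from the test split.
mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)
Z_train = (X_train - mu) / sigma
Z_test = (X_test - mu) / sigma

# Fit ordinary least squares with an intercept column.
A_train = np.column_stack([np.ones(len(Z_train)), Z_train])
coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)

# Predict on the test split and assess performance.
A_test = np.column_stack([np.ones(len(Z_test)), Z_test])
y_pred = A_test @ coef
residuals = y_test - y_pred
rmse = np.sqrt(np.mean(residuals**2))
r2 = 1.0 - np.sum(residuals**2) / np.sum((y_test - y_test.mean()) ** 2)

# Where does the model do better or worse? RMSE within each quartile of the target.
quartile = np.digitize(y_test, np.percentile(y_test, [25, 50, 75]))
rmse_by_quartile = [np.sqrt(np.mean(residuals[quartile == q] ** 2)) for q in range(4)]

# Rough importance: coefficient magnitude on standardized features.
importance = np.abs(coef[1:])

print("feature correlation matrix:\n", corr)
print("test RMSE:", rmse, "test R^2:", r2)
print("RMSE by target quartile:", rmse_by_quartile)
print("importance per feature:", importance)
```

Coefficient magnitudes are only comparable because the features were standardized first, and they become harder to interpret when features are strongly correlated, which is one reason to inspect `corr` before reading them.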