Dataset Modeling
Opened: Wednesday, December 10, 2025, 12:00 AM
Due: Friday, December 12, 2025, 11:59 PM
Your goal is to train a model on the train split of the dataset and use it to predict the target variable on the test split.
- Examine the origins of your dataset. Are your features interpretable? Do you have any hypotheses about their relative importance? Do the features tend to be correlated with each other?
- Do you expect the dataset to be homogeneous or heterogeneous? How can you best visualize the variance of the dataset?
- Do you expect to have a significant portion of outliers? How would you filter them?
- Does the target variable look like it can be explained by the predictor variables? Can you visualize it?
- Once you have trained a model, how can you assess its performance? Where does it perform better, and where does it perform worse? Which features ended up being important for the model?
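
As one possible starting point, the sketch below walks through several of these questions on synthetic data: feature-target correlation, outlier filtering, fitting on the train split, scoring on the test split, and ranking features. Everything in it is an assumption made for illustration, not part of the assignment: the generated features, the injected outliers, the 1.5 x IQR rule, the linear model, and the helper names (pearson, iqr_mask, fit_linear, permutation_importance). It uses only the Python standard library; with a real dataset you would likely use a dataframe and modeling library instead.

```python
import random
import statistics

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (v - my) for x, v in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((v - my) ** 2 for v in ys) ** 0.5
    return cov / (sx * sy)

def iqr_mask(values, k=1.5):
    """True for values inside [Q1 - k*IQR, Q3 + k*IQR], False for outliers."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [low <= v <= high for v in values]

def fit_linear(X, y):
    """Least-squares fit with intercept, via the normal equations."""
    A = [[1.0] + list(row) for row in X]
    p = len(A[0])
    # Augmented matrix [A^T A | A^T y]
    M = [[sum(a[i] * a[j] for a in A) for j in range(p)]
         + [sum(a[i] * t for a, t in zip(A, y))] for i in range(p)]
    for c in range(p):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(c, p), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(p):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [M[i][p] / M[i][i] for i in range(p)]  # [intercept, w1, w2, ...]

def predict(w, X):
    return [w[0] + sum(wi * xi for wi, xi in zip(w[1:], row)) for row in X]

def rmse(y_true, y_pred):
    return (sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)) ** 0.5

def permutation_importance(w, X, y, rng):
    """Increase in RMSE when one feature column is shuffled at a time."""
    base = rmse(y, predict(w, X))
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        rng.shuffle(col)
        X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
        scores.append(rmse(y, predict(w, X_perm)) - base)
    return scores

rng = random.Random(0)  # fixed seed so the run is repeatable

# Synthetic data (assumption): x1 matters most, x2 a little, x3 not at all.
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(300)]
y = [3.0 * a + 0.5 * b + rng.gauss(0, 0.1) for a, b, _ in X]
X_train, y_train, X_test, y_test = X[:240], y[:240], X[240:], y[240:]
for i in (5, 50, 100, 150, 200):  # inject a few gross outliers into the train target
    y_train[i] += 50.0

# Filter outliers on the train split only; the test split is left untouched.
mask = iqr_mask(y_train)
X_fit = [row for row, keep in zip(X_train, mask) if keep]
y_fit = [t for t, keep in zip(y_train, mask) if keep]

# Can the target be explained by the predictors? Check each feature's correlation.
corr = [pearson([row[j] for row in X_fit], y_fit) for j in range(3)]

# Train on the filtered train split, assess on the test split.
w = fit_linear(X_fit, y_fit)
test_rmse = rmse(y_test, predict(w, X_test))
importance = permutation_importance(w, X_test, y_test, rng)

print("rows removed as outliers:", mask.count(False))
print("feature-target correlations:", [round(c, 3) for c in corr])
print("test RMSE:", round(test_rmse, 3))
print("permutation importance:", [round(s, 3) for s in importance])
```

Permutation importance is used here because it works with any model; for a linear model on standardized features, the coefficient magnitudes would be a simpler alternative. To see where the model performs better or worse, the same rmse helper can be applied to subsets of the test split, for example rows grouped by ranges of one feature.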