Project Overview: This project developed a multiple regression model to predict diamond prices from attributes such as carat weight, cut quality, color grade, clarity, and physical dimensions.
Key Achievements:
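Model Development: Used one-hot encoding to convert the categorical variables (cut, color, and clarity) into numeric indicator columns. The final model achieved an adjusted R² of 0.92, meaning it explains about 92% of the variance in diamond prices after accounting for the number of predictors.
Performance Metrics: Achieved a mean absolute error (MAE) of $732.62 and a root mean squared error (RMSE) of $1,133.33. On average, predictions deviate from the actual price by about $733. Because RMSE weights large deviations more heavily, the gap between the two figures indicates that errors vary in size, with some predictions missing by considerably more than the average.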
Impact: This model serves as a valuable tool for stakeholders in the diamond industry, providing insights that can aid in strategic pricing and inventory management.
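The encoding step and the reported metrics can be sketched in plain Python. This is a minimal illustration, not the project's actual code: the function names are hypothetical, the category list is only an example, and the prices in the usage lines are made-up numbers. The `drop_first` option turns the first category into an all-zeros baseline, which is the usual way to avoid perfect collinearity with the intercept in a linear model.

```python
from __future__ import annotations

import math
from collections.abc import Sequence


def one_hot(values: Sequence[str], categories: Sequence[str], drop_first: bool = False) -> list[list[int]]:
    """Encode each categorical value as a 0/1 indicator row over `categories`.

    With drop_first=True the first category becomes the all-zeros baseline.
    """
    kept = list(categories)[1:] if drop_first else list(categories)
    rows = []
    for v in values:
        if v not in categories:
            raise ValueError(f"unknown category: {v!r}")
        rows.append([1 if v == c else 0 for c in kept])
    return rows


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def adjusted_r2(r2_value: float, n: int, p: int) -> float:
    """Adjust R^2 for p predictors fitted on n observations.

    p counts every model column, so each one-hot indicator adds 1.
    """
    return 1 - (1 - r2_value) * (n - 1) / (n - p - 1)


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error: average size of the residuals."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error: squares residuals, so large misses weigh more."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


if __name__ == "__main__":
    cuts = ["Fair", "Good", "Very Good", "Premium", "Ideal"]  # illustrative category list
    print(one_hot(["Ideal", "Fair"], cuts, drop_first=True))  # [[0, 0, 0, 1], [0, 0, 0, 0]]
    actual = [1000.0, 2000.0, 3000.0]     # made-up prices
    predicted = [1100.0, 1900.0, 3300.0]  # made-up predictions
    print(round(mae(actual, predicted), 2))   # 166.67
    print(round(rmse(actual, predicted), 2))  # 191.49
```

Because squaring amplifies large residuals, RMSE is always at least as large as MAE; in the toy example the single $300 miss is what pulls RMSE above MAE.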
Project Overview: In this assignment, I developed predictive models for the heating load of buildings using the EnergyUse-Heating dataset. I implemented and compared three regularization techniques (LASSO, Ridge Regression, and Elastic Net) to address multicollinearity and overfitting and thereby improve predictive performance.
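The three techniques differ only in the penalty added to the least-squares loss. In one common textbook parameterization, with λ ≥ 0 controlling penalty strength and α in [0, 1] mixing the two penalties, the objectives are as follows; libraries scale these terms and name the parameters differently, so this is a reference sketch rather than the exact setup used in this assignment.

```latex
\begin{aligned}
\text{Ridge:}\quad & \hat{\beta} = \arg\min_{\beta}\; \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_2^2 \\
\text{LASSO:}\quad & \hat{\beta} = \arg\min_{\beta}\; \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_1 \\
\text{Elastic Net:}\quad & \hat{\beta} = \arg\min_{\beta}\; \lVert y - X\beta \rVert_2^2 + \lambda \left( \alpha \lVert \beta \rVert_1 + (1 - \alpha) \lVert \beta \rVert_2^2 \right)
\end{aligned}
```

The L1 norm can drive some coefficients exactly to zero, which is why LASSO doubles as feature selection. The squared L2 norm shrinks coefficients without zeroing them: Ridge has the closed-form solution β̂ = (XᵀX + λI)⁻¹Xᵀy, and adding λI keeps the matrix invertible even when correlated predictors make XᵀX nearly singular, which is how it handles multicollinearity. In practice the predictors are standardized first and the intercept is left unpenalized.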
Key Achievements:
Model Selection and Evaluation: I used LASSO for feature selection, Ridge Regression to handle multicollinearity, and Elastic Net to combine LASSO's feature selection with Ridge's coefficient shrinkage. Ridge Regression performed best, with an adjusted R² of 0.91, meaning it explains about 91% of the variance in heating load after accounting for the number of predictors.
Outlier Handling: Applied Tukey’s method (interquartile-range fences) to remove outliers, which improved the robustness and accuracy of the predictive models.
Impact: This analysis provided a reliable model for predicting heating load, crucial for energy efficiency and cost control in building operations, making it highly valuable for stakeholders in the energy management sector.
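Tukey's fences flag any value below Q1 − k·IQR or above Q3 + k·IQR, where IQR = Q3 − Q1. The plain-Python sketch below assumes the conventional k = 1.5 (the multiplier used in the project is not stated) and applies the fences column by column, dropping a row if any checked column falls outside its fences. The function names and data are hypothetical, and the quartile convention (`method="inclusive"` in `statistics.quantiles`) is one choice among several that give slightly different cut-offs.

```python
from __future__ import annotations

import statistics
from collections.abc import Sequence


def tukey_fences(values: Sequence[float], k: float = 1.5) -> tuple[float, float]:
    """Return (lower, upper) = (Q1 - k*IQR, Q3 + k*IQR)."""
    # "inclusive" treats the data as the whole population; other quartile
    # conventions shift the fences slightly.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_outlier_rows(
    rows: list[dict[str, float]], columns: Sequence[str], k: float = 1.5
) -> list[dict[str, float]]:
    """Keep only rows whose value lies within the Tukey fences for every listed column."""
    fences = {c: tukey_fences([row[c] for row in rows], k) for c in columns}
    return [
        row for row in rows
        if all(lo <= row[c] <= hi for c, (lo, hi) in fences.items())
    ]


if __name__ == "__main__":
    # Made-up data: column "x" has one extreme value, column "y" is constant.
    rows = [{"x": float(v), "y": 0.0} for v in [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]]
    print(tukey_fences([r["x"] for r in rows]))       # (-3.5, 14.5)
    print(len(drop_outlier_rows(rows, ["x", "y"])))  # 9
```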
Project Overview: This project focused on developing a Logistic Regression model to classify different types of wheat based on their physical characteristics. The goal was to create a model that could accurately differentiate between wheat types, providing valuable insights for agricultural research and crop management.
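Logistic regression extends to more than two classes in more than one way, and the project does not say which variant was used. One common choice is the multinomial (softmax) formulation, sketched below in plain Python: each class gets its own weight vector and bias, the linear scores are turned into probabilities with softmax, and the predicted class is the most probable one. The weights in the usage lines are made-up placeholders rather than fitted values, and all names are hypothetical.

```python
from __future__ import annotations

import math
from collections.abc import Sequence


def softmax(scores: Sequence[float]) -> list[float]:
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(
    x: Sequence[float], weights: Sequence[Sequence[float]], biases: Sequence[float]
) -> list[float]:
    """Multinomial logistic regression: one linear score per class, then softmax."""
    scores = [sum(w_j * x_j for w_j, x_j in zip(w, x)) + b for w, b in zip(weights, biases)]
    return softmax(scores)


def predict(x: Sequence[float], weights: Sequence[Sequence[float]], biases: Sequence[float]) -> int:
    """Return the index of the most probable class."""
    probs = predict_proba(x, weights, biases)
    return max(range(len(probs)), key=probs.__getitem__)


if __name__ == "__main__":
    # Placeholder weights for 3 classes and 2 features (not fitted values).
    W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
    b = [0.0, 0.0, 0.0]
    print(predict([2.0, 0.5], W, b))  # 0
```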
Key Achievements:
High Accuracy: The Logistic Regression model achieved an overall accuracy of 100%, with perfect precision, recall, and F1 scores for every class; it classified every sample in the evaluation set correctly.
Generalization Capability: Training and cross-validation scores converged closely, indicating a good balance between bias and variance: the model showed no sign of overfitting or underfitting, which suggests it should generalize well to new samples drawn from the same population.
Impact: The analysis provided a robust solution for classifying wheat types based on physical characteristics, with potential applications in improving crop classification and agricultural practices. The insights gained from this model can help optimize crop management and quality control.
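Per-class precision, recall, and F1, and the k-fold splits behind cross-validation scores, can be sketched in plain Python as follows. This is an illustrative sketch with hypothetical function names, not the project's evaluation code; the splitter uses contiguous, unshuffled folds for simplicity, whereas in practice the data would usually be shuffled and stratified by class before splitting.

```python
from __future__ import annotations

from collections.abc import Hashable, Sequence


def per_class_metrics(
    y_true: Sequence[Hashable], y_pred: Sequence[Hashable]
) -> dict[Hashable, tuple[float, float, float]]:
    """Return {class: (precision, recall, f1)}, treating each class one-vs-rest.

    Labels must be mutually comparable so they can be sorted.
    """
    metrics = {}
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        metrics[c] = (precision, recall, f1)
    return metrics


def k_fold_indices(n: int, k: int) -> list[tuple[list[int], list[int]]]:
    """Split indices 0..n-1 into k contiguous folds; return (train, validation) pairs."""
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and n")
    # The first n % k folds get one extra sample so every index is used exactly once.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    splits, start = [], 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        splits.append((train, val))
        start += size
    return splits


if __name__ == "__main__":
    print(per_class_metrics(["a", "a", "b", "b"], ["a", "b", "b", "b"])["a"][:2])  # (1.0, 0.5)
    for train, val in k_fold_indices(10, 3):
        print(len(train), len(val))  # 6 4, then 7 3, then 7 3
```

Averaging the training score and the validation score over the folds is what supports the bias-variance reading above: a small gap between the two suggests the model is not overfitting, and high values for both suggest it is not underfitting.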