Welcome to my personal profile website, where you can explore my resume, experience, projects, education, and more. It showcases my skills and achievements in a professional and engaging way. Take a look around and get to know me better, and feel free to reach out via the Connect tab.
Check out my latest blog posts on Medium, where I share insights and articles on data science, technology, and my personal projects. Click the button below to read more.
We analyzed the data with random forest and LASSO regression models, using linear regression as our baseline, and concluded that alcohol, sulphates, and volatile acidity were the most important predictors of wine quality. We chose random forest for its ease of use, its flexibility for both classification and regression, and its consistently higher predictive accuracy than bagging in our models. LASSO regression let us identify which variables should be considered significant to wine quality and interpret the results easily for management or business purposes. This knowledge could help producers craft higher-quality wine and increase their profitability.
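A minimal sketch of this modeling approach in scikit-learn, using synthetic data in place of the actual wine dataset (the feature names and coefficients below are illustrative stand-ins):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the wine data (the project used the real dataset).
rng = np.random.default_rng(42)
features = ["alcohol", "sulphates", "volatile acidity", "pH", "density"]
X = rng.normal(size=(500, len(features)))
# Quality driven mostly by the first three columns, mirroring our findings.
y = 0.8 * X[:, 0] + 0.5 * X[:, 1] - 0.6 * X[:, 2] + rng.normal(scale=0.3, size=500)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

baseline = LinearRegression().fit(X_tr, y_tr)
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
lasso = Lasso(alpha=0.1).fit(X_tr, y_tr)

print("baseline R^2:", round(baseline.score(X_te, y_te), 3))
print("random forest R^2:", round(rf.score(X_te, y_te), 3))
# LASSO zeroes out weak predictors, leaving only the significant ones.
kept = [f for f, c in zip(features, lasso.coef_) if abs(c) > 1e-6]
print("LASSO-selected features:", kept)
```

The LASSO step is what makes the result easy to explain to management: variables with zero coefficients can simply be dropped from the conversation.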
I used AlphaFold and ESMFold to predict the 3D structure of the ELOV7_HUMAN protein, which is vital for the elongation of fatty acids and plays crucial roles in lipid metabolism and membrane function. When comparing the two models, AlphaFold demonstrates higher overall accuracy, particularly in modeling side chains and conserved regions, due to its combination of evolutionary data and deep learning techniques. However, it is computationally intensive and may struggle with regions lacking evolutionary information. In contrast, ESMFold, which employs language model-based techniques, offers faster and less resource-intensive predictions with good accuracy, especially for proteins with limited evolutionary data. While ESMFold's predictions align well with the UniProt 3D structure, particularly in core regions, they are slightly less precise in peripheral or less conserved areas compared to AlphaFold. Ultimately, AlphaFold's predicted structure closely matches the UniProt 3D structure with high confidence, whereas ESMFold provides a reasonably accurate, quicker alternative. The choice between these models depends on the study's specific requirements, balancing accuracy and computational efficiency.
I conducted an in-depth analysis of McDonald's Corporation for the Chapman University Investment Group, focusing on its financial performance, franchise business model, and investment potential. Using Bloomberg terminals, I performed market research and built comprehensive financial models, including discounted cash flow (DCF) analysis and comparable company analysis. I managed a $45,000 endowment fund, overseeing budgeting and financial operations, and presented my findings to key stakeholders, recommending a buy on McDonald's based on its robust financial health, strong gross profit and EBITDA margins, and strategic initiatives to expand franchised restaurants and improve digital technology. My analysis highlighted potential risks, such as shifts toward healthy eating and supply chain interruptions, but emphasized the company's resilience and growth potential.
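The core of a DCF model can be sketched in a few lines of Python; the cash flows, discount rate, and terminal growth rate below are purely illustrative, not McDonald's actual figures:

```python
def dcf_value(free_cash_flows, discount_rate, terminal_growth):
    """Discount projected free cash flows plus a Gordon-growth terminal value."""
    pv_fcf = sum(
        fcf / (1 + discount_rate) ** t
        for t, fcf in enumerate(free_cash_flows, start=1)
    )
    terminal = (
        free_cash_flows[-1] * (1 + terminal_growth)
        / (discount_rate - terminal_growth)
    )
    pv_terminal = terminal / (1 + discount_rate) ** len(free_cash_flows)
    return pv_fcf + pv_terminal

# Hypothetical 5-year FCF projection in $ millions (not MCD's real numbers).
projected_fcf = [7000, 7350, 7700, 8100, 8500]
enterprise_value = dcf_value(projected_fcf, discount_rate=0.08, terminal_growth=0.025)
print(f"Implied enterprise value: ${enterprise_value:,.0f}M")
```

Most of the implied value comes from the terminal value, which is why the discount and growth assumptions dominate the sensitivity analysis.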
For my final class project in MGSC 410, I conducted a comprehensive analysis of subscriber data for Rosetta Stone to identify the key characteristics and behaviors that define different customer segments. I merged subscriber information with app activity data, resolving null values through data cleaning and transformation. This process enabled a detailed examination of customer lifetime value (LTV) and engagement patterns.
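The merge-and-clean step might look like the following pandas sketch; the column names and values are hypothetical stand-ins for the real subscriber and activity data:

```python
import pandas as pd

# Hypothetical stand-ins for the two data sources.
subscribers = pd.DataFrame({
    "user_id": [1, 2, 3, 4],
    "plan": ["lifetime", "limited", "limited", "lifetime"],
    "country": ["US", "DE", None, "US"],
})
app_activity = pd.DataFrame({
    "user_id": [1, 2, 4],
    "sessions": [42, 7, 19],
})

# Left-merge keeps every subscriber, even those with no logged app activity.
merged = subscribers.merge(app_activity, on="user_id", how="left")

# Nulls arise both from the merge (no activity) and the source (missing country).
merged["sessions"] = merged["sessions"].fillna(0).astype(int)
merged["country"] = merged["country"].fillna("unknown")
print(merged)
```

Treating merge-induced nulls as "zero activity" rather than dropping those rows is what keeps low-engagement subscribers visible in the LTV analysis.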
Through my analysis, I discovered that app users are 2.23 times more likely to become lifetime subscribers compared to website users, while individual and European users are less likely to obtain lifetime subscriptions. Additionally, engaged customers significantly drive cross-sell and upsell revenue, highlighting the importance of customer engagement.
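A likelihood ratio like the 2.23x figure typically falls out of a logistic regression, where exponentiating a coefficient gives the odds ratio. This is a hedged sketch on synthetic data, not the project's actual model or numbers:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic data: app users (1) convert to lifetime plans more often than
# website users (0). Conversion rates here are invented for illustration.
rng = np.random.default_rng(0)
is_app_user = rng.integers(0, 2, size=2000)
p_convert = np.where(is_app_user == 1, 0.45, 0.27)
became_lifetime = rng.random(2000) < p_convert

model = LogisticRegression().fit(is_app_user.reshape(-1, 1), became_lifetime)
odds_ratio = np.exp(model.coef_[0, 0])
print(f"App users' odds of a lifetime subscription: {odds_ratio:.2f}x web users'")
```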
I segmented the user base into four clusters: Regular Engagers, Moderate Engagers, Super Engaged Users, and High-Spending Users, each with distinct spending habits and engagement levels. This segmentation provided insights into which customers are most valuable and how to enhance their engagement.
My analysis revealed that lifetime subscribers have a significantly higher LTV ($173.21) compared to limited subscribers ($81.40). Based on these findings, I recommended promoting lifetime subscriptions, extending trial periods to improve engagement, and targeting engaged demographics to maximize revenue from limited subscribers. This project demonstrated my ability to perform market research, analyze data, and provide strategic recommendations to improve business outcomes.
For my class project in MGSC 410, my team and I conducted an extensive analysis of the food scene around Chapman University. Our study aimed to identify key factors influencing the success of local restaurants. We examined various variables such as restaurant type, years in operation, cultural origin of cuisine, specialty, average meal price, proximity to Chapman University, seating capacity, ratings and reviews, competitor density, health inspection ratings, and alcohol availability.
Using a supervised linear regression model, we predicted the years of operation for these restaurants. We found that proximity to Chapman University significantly impacts a restaurant's success. Through clustering analysis with K-means and Gaussian Mixtures, we identified distinct clusters of restaurants, revealing that restaurants closer to Chapman with higher ratings tend to perform better.
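The clustering step can be sketched with scikit-learn's KMeans; the distance, rating, and price features below are synthetic, illustrative stand-ins for our collected restaurant data:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic restaurants: [miles from Chapman, avg rating, avg meal price $].
rng = np.random.default_rng(1)
near_high = rng.normal([0.5, 4.5, 12], [0.2, 0.2, 3], size=(40, 3))
far_low = rng.normal([4.0, 3.2, 18], [0.8, 0.4, 5], size=(40, 3))
X = np.vstack([near_high, far_low])

# Scale first so distance, rating, and price contribute comparably.
X_scaled = StandardScaler().fit_transform(X)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_scaled)
print("cluster sizes:", np.bincount(labels))
```

Standardizing before clustering matters here: without it, the dollar-valued price column would dominate the Euclidean distances.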
Our study also highlighted the influence of cuisine culture and specialty on customer ratings, with more casual, finger-food-oriented offerings proving more successful. Additionally, we used a gradient boosting tree model to analyze how meal price and specialty affect ratings, and a random forest model to determine the features that most influence average meal prices.
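A sketch of the gradient-boosting piece on synthetic data; the price and specialty effects below are invented for illustration, not fitted to our dataset:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic stand-in: meal price and a finger-food flag driving ratings.
rng = np.random.default_rng(7)
price = rng.uniform(8, 30, size=300)
is_finger_food = rng.integers(0, 2, size=300)
rating = 4.0 - 0.03 * price + 0.4 * is_finger_food + rng.normal(scale=0.2, size=300)

X = np.column_stack([price, is_finger_food])
gbt = GradientBoostingRegressor(random_state=0).fit(X, rating)
# feature_importances_ shows how much each feature drove the fitted trees.
for name, imp in zip(["avg_meal_price", "finger_food"], gbt.feature_importances_):
    print(f"{name}: {imp:.2f}")
```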
Our findings provided valuable insights for targeted marketing, expansion planning, and dynamic pricing strategies, demonstrating my ability to perform complex data analysis and provide actionable business recommendations.
Our project focused on image captioning using a subset of the MS-COCO 2017 dataset of roughly 100,000 images with descriptive captions. The goal was to develop a model that learns image features and generates captions, which matters for producing alt text that aids accessibility for people with disabilities. The full dataset includes 118,000 training images, 40,700 test images, and 5,000 validation images; each image was resized to 224 x 224 pixels and paired with five captions.
Initially, we considered using VGG16 and GPT-2 but switched to EfficientNetB0 for feature extraction because of its efficiency, which significantly reduced training time. We used an LSTM model for text generation because of its effectiveness at capturing long-range dependencies in caption text. Our final model performed well, generating relevant captions with high accuracy, particularly for the core objects within images. Its output closely aligned with the provided captions, though it occasionally struggled with less common objects.
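The architecture described above can be sketched in Keras; the vocabulary size, caption length, and the choice to seed the LSTM state with the image features are illustrative assumptions, and `weights=None` keeps the sketch offline rather than loading the ImageNet weights a real run would use:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import Model, layers
from tensorflow.keras.applications import EfficientNetB0

VOCAB_SIZE, MAX_LEN, EMBED_DIM = 5000, 20, 256  # illustrative choices

# Feature extractor: EfficientNetB0 without its classification head.
cnn = EfficientNetB0(include_top=False, weights=None, pooling="avg",
                     input_shape=(224, 224, 3))

image_in = layers.Input(shape=(224, 224, 3))
img_feat = layers.Dense(EMBED_DIM)(cnn(image_in))

caption_in = layers.Input(shape=(MAX_LEN,))
emb = layers.Embedding(VOCAB_SIZE, EMBED_DIM, mask_zero=True)(caption_in)

# Image features seed the LSTM's initial state; the LSTM decodes tokens.
lstm_out = layers.LSTM(EMBED_DIM)(emb, initial_state=[img_feat, img_feat])
next_token = layers.Dense(VOCAB_SIZE, activation="softmax")(lstm_out)

model = Model([image_in, caption_in], next_token)
print(model.output_shape)  # (None, 5000)
```

At inference time a loop would feed the generated tokens back in one at a time until an end-of-caption token is produced.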
Through this project, we learned the importance of choosing efficient pre-trained models for feature extraction to handle large datasets effectively. We also came to appreciate the iterative nature of training neural networks, which requires fine-tuning different architectures and regularization techniques to achieve the desired results. Future improvements include experimenting with alternative RNN architectures or transformers for potentially better accuracy, trying different regularization methods to address overfitting, and adding object detection to improve caption relevance. Overall, this project provided valuable insights into using neural networks for image captioning and balancing accuracy with computational efficiency.