Welcome to my personal profile website, where you can explore my resume, experience, projects, education, and more. It showcases my skills and achievements in a professional and engaging way. Take a look around and get to know me better, and feel free to reach out via the Connect tab.
Check out my latest blog posts on Medium, where I share insights and articles on data science, technology, and my personal projects. Click the button below to read more.
We analyzed the data with random forest and LASSO regression models, using linear regression as our baseline, and concluded that alcohol, sulphates, and volatile acidity were the most important predictors of wine quality. We chose random forest for its ease of use, its flexibility for both classification and regression, and its consistently higher predictive accuracy than bagging in our models. LASSO regression let us identify which variables should be considered significant to wine quality and interpret the results easily for management or business purposes. This knowledge could help producers craft higher-quality wine and increase their profitability.
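A minimal sketch of this modeling approach in scikit-learn, using synthetic data in place of the actual wine dataset (the feature names and coefficients below are illustrative stand-ins):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the wine data (the project used the real dataset).
rng = np.random.default_rng(42)
features = ["alcohol", "sulphates", "volatile acidity", "pH", "density"]
X = rng.normal(size=(500, len(features)))
# Quality driven mostly by the first three columns, mirroring our findings.
y = 0.8 * X[:, 0] + 0.5 * X[:, 1] - 0.6 * X[:, 2] + rng.normal(scale=0.3, size=500)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

baseline = LinearRegression().fit(X_tr, y_tr)
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
lasso = Lasso(alpha=0.1).fit(X_tr, y_tr)

print("baseline R^2:", round(baseline.score(X_te, y_te), 3))
print("random forest R^2:", round(rf.score(X_te, y_te), 3))
# LASSO zeroes out weak predictors, leaving only the significant ones.
kept = [f for f, c in zip(features, lasso.coef_) if abs(c) > 1e-6]
print("LASSO-selected features:", kept)
```

The LASSO step is what makes the result easy to explain to management: variables with zero coefficients can simply be dropped from the conversation.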
I used AlphaFold and ESMFold to predict the 3D structure of the ELOV7_HUMAN protein, which is vital for the elongation of fatty acids and plays crucial roles in lipid metabolism and membrane function. When comparing the two models, AlphaFold demonstrates higher overall accuracy, particularly in modeling side chains and conserved regions, due to its combination of evolutionary data and deep learning techniques. However, it is computationally intensive and may struggle with regions lacking evolutionary information. In contrast, ESMFold, which employs language model-based techniques, offers faster and less resource-intensive predictions with good accuracy, especially for proteins with limited evolutionary data. While ESMFold's predictions align well with the UniProt 3D structure, particularly in core regions, they are slightly less precise in peripheral or less conserved areas compared to AlphaFold. Ultimately, AlphaFold's predicted structure closely matches the UniProt 3D structure with high confidence, whereas ESMFold provides a reasonably accurate, quicker alternative. The choice between these models depends on the study's specific requirements, balancing accuracy and computational efficiency.
I conducted an in-depth analysis of McDonald's Corporation for the Chapman University Investment Group, focusing on its financial performance, franchise business model, and investment potential. Using Bloomberg terminals, I performed market research and built comprehensive financial models, including discounted cash flow (DCF) analysis and comparable company analysis. I managed a $45,000 endowment fund, overseeing budgeting and financial operations, and presented my findings to key stakeholders, recommending a buy on McDonald's based on its robust financial health, strong gross profit and EBITDA margins, and strategic initiatives to expand franchised restaurants and improve digital technology. My analysis highlighted potential risks, such as shifts toward healthy eating and supply chain interruptions, but emphasized the company's resilience and growth potential.
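The core of a DCF model can be sketched in a few lines of Python; the cash flows, discount rate, and terminal growth rate below are purely illustrative, not McDonald's actual figures:

```python
def dcf_value(free_cash_flows, discount_rate, terminal_growth):
    """Discount projected free cash flows plus a Gordon-growth terminal value."""
    pv_fcf = sum(
        fcf / (1 + discount_rate) ** t
        for t, fcf in enumerate(free_cash_flows, start=1)
    )
    terminal = (
        free_cash_flows[-1] * (1 + terminal_growth)
        / (discount_rate - terminal_growth)
    )
    pv_terminal = terminal / (1 + discount_rate) ** len(free_cash_flows)
    return pv_fcf + pv_terminal

# Hypothetical 5-year FCF projection in $ millions (not MCD's real numbers).
projected_fcf = [7000, 7350, 7700, 8100, 8500]
enterprise_value = dcf_value(projected_fcf, discount_rate=0.08, terminal_growth=0.025)
print(f"Implied enterprise value: ${enterprise_value:,.0f}M")
```

Most of the implied value comes from the terminal value, which is why the discount and growth assumptions dominate the sensitivity analysis.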
For my final class project in MGSC 410, I conducted a comprehensive analysis of subscriber data for Rosetta Stone to identify the key characteristics and behaviors that define different customer segments. I merged subscriber information with app activity data, resolving null values through data cleaning and transformation. This process enabled a detailed examination of customer lifetime value (LTV) and engagement patterns.
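The merge-and-clean step might look like the following pandas sketch; the column names and values are hypothetical stand-ins for the real subscriber and activity data:

```python
import pandas as pd

# Hypothetical stand-ins for the two data sources.
subscribers = pd.DataFrame({
    "user_id": [1, 2, 3, 4],
    "plan": ["lifetime", "limited", "limited", "lifetime"],
    "country": ["US", "DE", None, "US"],
})
app_activity = pd.DataFrame({
    "user_id": [1, 2, 4],
    "sessions": [42, 7, 19],
})

# Left-merge keeps every subscriber, even those with no logged app activity.
merged = subscribers.merge(app_activity, on="user_id", how="left")

# Nulls arise both from the merge (no activity) and the source (missing country).
merged["sessions"] = merged["sessions"].fillna(0).astype(int)
merged["country"] = merged["country"].fillna("unknown")
print(merged)
```

Treating merge-induced nulls as "zero activity" rather than dropping those rows is what keeps low-engagement subscribers visible in the LTV analysis.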
Through my analysis, I discovered that app users are 2.23 times more likely to become lifetime subscribers compared to website users, while individual and European users are less likely to obtain lifetime subscriptions. Additionally, engaged customers significantly drive cross-sell and upsell revenue, highlighting the importance of customer engagement.
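A likelihood ratio like the 2.23x figure typically falls out of a logistic regression, where exponentiating a coefficient gives the odds ratio. This is a hedged sketch on synthetic data, not the project's actual model or numbers:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic data: app users (1) convert to lifetime plans more often than
# website users (0). Conversion rates here are invented for illustration.
rng = np.random.default_rng(0)
is_app_user = rng.integers(0, 2, size=2000)
p_convert = np.where(is_app_user == 1, 0.45, 0.27)
became_lifetime = rng.random(2000) < p_convert

model = LogisticRegression().fit(is_app_user.reshape(-1, 1), became_lifetime)
odds_ratio = np.exp(model.coef_[0, 0])
print(f"App users' odds of a lifetime subscription: {odds_ratio:.2f}x web users'")
```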
I segmented the user base into four clusters: Regular Engagers, Moderate Engagers, Super Engaged Users, and High-Spending Users, each with distinct spending habits and engagement levels. This segmentation provided insights into which customers are most valuable and how to enhance their engagement.
My analysis revealed that lifetime subscribers have a significantly higher LTV ($173.21) compared to limited subscribers ($81.40). Based on these findings, I recommended promoting lifetime subscriptions, extending trial periods to improve engagement, and targeting engaged demographics to maximize revenue from limited subscribers. This project demonstrated my ability to perform market research, analyze data, and provide strategic recommendations to improve business outcomes.
For my class project in MGSC 410, my team and I conducted an extensive analysis of the food scene around Chapman University. Our study aimed to identify key factors influencing the success of local restaurants. We examined various variables such as restaurant type, years in operation, cultural origin of cuisine, specialty, average meal price, proximity to Chapman University, seating capacity, ratings and reviews, competitor density, health inspection ratings, and alcohol availability.
Using a supervised linear regression model, we predicted the years of operation for these restaurants. We found that proximity to Chapman University significantly impacts a restaurant's success. Through clustering analysis with K-means and Gaussian Mixtures, we identified distinct clusters of restaurants, revealing that restaurants closer to Chapman with higher ratings tend to perform better.
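The clustering step can be sketched with scikit-learn's KMeans; the distance, rating, and price features below are synthetic, illustrative stand-ins for our collected restaurant data:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic restaurants: [miles from Chapman, avg rating, avg meal price $].
rng = np.random.default_rng(1)
near_high = rng.normal([0.5, 4.5, 12], [0.2, 0.2, 3], size=(40, 3))
far_low = rng.normal([4.0, 3.2, 18], [0.8, 0.4, 5], size=(40, 3))
X = np.vstack([near_high, far_low])

# Scale first so distance, rating, and price contribute comparably.
X_scaled = StandardScaler().fit_transform(X)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_scaled)
print("cluster sizes:", np.bincount(labels))
```

Standardizing before clustering matters here: without it, the dollar-valued price column would dominate the Euclidean distances.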
Our study also highlighted the influence of cuisine culture and specialty on customer ratings, with more casual, finger-food-oriented offerings proving more successful. Additionally, we used a gradient boosting tree model to analyze how meal price and specialty affect ratings, and a random forest model to determine the features that most influence average meal prices.
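A sketch of the gradient-boosting piece on synthetic data; the price and specialty effects below are invented for illustration, not fitted to our dataset:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic stand-in: meal price and a finger-food flag driving ratings.
rng = np.random.default_rng(7)
price = rng.uniform(8, 30, size=300)
is_finger_food = rng.integers(0, 2, size=300)
rating = 4.0 - 0.03 * price + 0.4 * is_finger_food + rng.normal(scale=0.2, size=300)

X = np.column_stack([price, is_finger_food])
gbt = GradientBoostingRegressor(random_state=0).fit(X, rating)
# feature_importances_ shows how much each feature drove the fitted trees.
for name, imp in zip(["avg_meal_price", "finger_food"], gbt.feature_importances_):
    print(f"{name}: {imp:.2f}")
```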
Our findings provided valuable insights for targeted marketing, expansion planning, and dynamic pricing strategies, demonstrating my ability to perform complex data analysis and provide actionable business recommendations.
Our project focused on image captioning using a subset of the MS-COCO 2017 dataset of roughly 100,000 images with descriptive captions. The goal was to develop a model that learns image features and generates captions, which matters for producing alt text that aids accessibility for people with disabilities. The full dataset includes 118,000 training images, 40,700 test images, and 5,000 validation images; each image was resized to 224 x 224 pixels and paired with five captions.
Initially, we considered using VGG16 and GPT-2 but switched to EfficientNetB0 for feature extraction because of its efficiency, which significantly reduced training time. We used an LSTM model for text generation because of its effectiveness at capturing long-range dependencies in caption text. Our final model performed well, generating relevant captions with high accuracy, particularly for the core objects within images. Its output closely aligned with the provided captions, though it occasionally struggled with less common objects.
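The architecture described above can be sketched in Keras; the vocabulary size, caption length, and the choice to seed the LSTM state with the image features are illustrative assumptions, and `weights=None` keeps the sketch offline rather than loading the ImageNet weights a real run would use:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import Model, layers
from tensorflow.keras.applications import EfficientNetB0

VOCAB_SIZE, MAX_LEN, EMBED_DIM = 5000, 20, 256  # illustrative choices

# Feature extractor: EfficientNetB0 without its classification head.
cnn = EfficientNetB0(include_top=False, weights=None, pooling="avg",
                     input_shape=(224, 224, 3))

image_in = layers.Input(shape=(224, 224, 3))
img_feat = layers.Dense(EMBED_DIM)(cnn(image_in))

caption_in = layers.Input(shape=(MAX_LEN,))
emb = layers.Embedding(VOCAB_SIZE, EMBED_DIM, mask_zero=True)(caption_in)

# Image features seed the LSTM's initial state; the LSTM decodes tokens.
lstm_out = layers.LSTM(EMBED_DIM)(emb, initial_state=[img_feat, img_feat])
next_token = layers.Dense(VOCAB_SIZE, activation="softmax")(lstm_out)

model = Model([image_in, caption_in], next_token)
print(model.output_shape)  # (None, 5000)
```

At inference time a loop would feed the generated tokens back in one at a time until an end-of-caption token is produced.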
Through this project, we learned the importance of choosing efficient pre-trained models for feature extraction to handle large datasets effectively. We also came to appreciate the iterative nature of training neural networks, which requires fine-tuning different architectures and regularization techniques to achieve the desired results. Future improvements include experimenting with alternative RNN architectures or transformers for potentially better accuracy, trying different regularization methods to address overfitting, and adding object detection to improve caption relevance. Overall, this project provided valuable insights into using neural networks for image captioning and balancing accuracy with computational efficiency.