About Me

👨‍🎓 Educational Background:

I am a proud alumnus of Georgia Tech, where I completed my Master's in Computational Data Science, and IIT Kharagpur, where I earned my Bachelor's in Mechanical Engineering. My academic journey through these prestigious institutions has ingrained in me a profound understanding of data science and technology, blending theoretical knowledge with practical application.

👨‍💻 Professional Experience:

I have recently embarked on an exciting journey as a Senior Data Scientist in the Chief Data Office at Prudential Financial. My background in Data Science spans Recommender Systems, NLP, Graph Machine Learning, and Statistical Modeling. Before this, as a Senior Data Associate at Innovaccer Inc., I developed innovative solutions in healthcare analytics. My internship at Prudential Financial was a pivotal experience that strengthened my skills in Graph Machine Learning and NLP.

⚽ Hobbies:

Beyond work, I am an avid sports enthusiast who enjoys soccer, tennis, cricket, and more. I am deeply fascinated by global cultures and eager to explore different countries and their unique stories. I also closely follow tech and entrepreneurship podcasts, always seeking insights into the art of value creation in business.

Skills Summary

Programming Languages

  • Python
  • R
  • SQL
  • Cypher
  • C++
  • PySpark
  • C
  • PostgreSQL

Modeling Algorithms

  • Generative AI (LLMs, Multimodal)
  • CV
  • NLP
  • Regression
  • Neural Networks
  • Time Series
  • Recommender Systems
  • Traditional ML Models
  • Optimization
  • Graphs

Tools

  • AWS
  • Azure
  • GCP
  • Git
  • Power BI
  • Visual Studio Code
  • ChatGPT
  • MongoDB
  • Elasticsearch
  • Streamlit
  • Excel

Frameworks/Packages

  • scikit-learn
  • NLTK
  • pandas, NumPy
  • PyTorch
  • TensorFlow
  • spaCy
  • Docker
  • LangChain
  • LlamaIndex
  • Neo4j
  • Streamlit

Work Experience

Jun 2023 - Aug 2023

Prudential Financial

Graduate Data Science Intern (Chief Data Office)
Exploring the Synergies between Graph ML and Text Modality
  • Built an Information Extraction pipeline leveraging BERT-based NLP utilities such as NER, Topic Modeling, and Coreference Resolution to transform raw, unstructured text corpora into a structured Knowledge Graph (KG)
  • Designed hypotheses and applied centrality measures and Graph ML algorithms such as Louvain, Node2Vec, GCN, and Personalized PageRank to isolate suspicious communities and POIs. Augmenting graph embeddings with NLP attributes yielded a 37.5% increase in POI segmentation with a significant drop in the False Positive Rate
  • Designed an approach to integrate an enterprise-wide KG plugin with LLMs such as ChatGPT, streamlining organization design and enabling robust Pru-wide conversational agents built on the Retrieval Augmented Generation (RAG) framework
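The Personalized PageRank technique named above can be illustrated with a short power-iteration sketch. A random walk keeps restarting from a seed set, so nodes tightly connected to already-flagged entities receive the highest scores. The toy graph, node names, and seed choice are hypothetical. This is not the pipeline used during the internship, whose implementation details are not described here.

```python
# Minimal Personalized PageRank via power iteration (pure Python sketch).
def personalized_pagerank(graph, seeds, alpha=0.85, tol=1e-10, max_iter=200):
    """Score nodes by proximity to a seed set.

    graph: dict mapping each node to a list of neighbours (undirected edges listed both ways).
    seeds: nodes the random walk restarts from, e.g. already-flagged entities.
    alpha: probability of following an edge instead of restarting at a seed.
    """
    nodes = list(graph)
    # Restart distribution: uniform over the seed nodes, zero elsewhere
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(restart)
    for _ in range(max_iter):
        new_rank = {n: (1 - alpha) * restart[n] for n in nodes}
        for n in nodes:
            neighbours = graph[n]
            if neighbours:
                # Spread this node's score evenly over its neighbours
                share = alpha * rank[n] / len(neighbours)
                for m in neighbours:
                    new_rank[m] += share
            else:
                # Dangling node: return its mass to the seeds
                for s in seeds:
                    new_rank[s] += alpha * rank[n] / len(seeds)
        converged = sum(abs(new_rank[n] - rank[n]) for n in nodes) < tol
        rank = new_rank
        if converged:
            break
    return rank


# Hypothetical toy graph: two triangles joined by the edge C-D
graph = {
    "A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"],
    "D": ["C", "E", "F"], "E": ["D", "F"], "F": ["D", "E"],
}
scores = personalized_pagerank(graph, seeds={"A"})
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked[:4])  # ['A', 'C', 'B', 'D']
```

In practice, `networkx.pagerank` accepts a `personalization` dictionary that serves the same purpose on real graphs.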

Jun 2019 - Jun 2022

Innovaccer Inc.

Senior Data Associate/Decision Scientist-I
Value Realization Project 1: Implemented a care management strategy to identify and manage diabetic patients at high risk of readmission, using ML algorithms together with the SMOTE and ADASYN oversampling techniques
  • Applied feature engineering to identify the features most predictive of readmission and achieved an overall recall of 93% using SMOTE with a Gradient Boosting Classifier/XGBoost
  • Enabled Mercy ACO to reduce readmission rates from 12.1% to 8.3% for patients with a successful protocol, and created an opportunity to save more than $2M in readmission costs in FY 2020 through automated alerts
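The SMOTE step credited above balances the classes by synthesizing new minority-class (readmitted) examples rather than duplicating existing ones. Below is a minimal pure-Python sketch of basic SMOTE interpolation. The two-feature samples are hypothetical, and the project's actual library and parameters are not stated.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Basic SMOTE: create synthetic minority-class points by interpolation.

    Each synthetic point lies on the segment between a random minority sample
    and one of its k nearest minority-class neighbours.
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` among the other minority samples
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # random position along the segment, in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical 2-feature minority samples (e.g., a scaled lab value and prior admissions)
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote(minority, n_synthetic=6, k=2)
print(len(new_points))  # 6
```

Oversampling should be applied to the training split only, so synthetic points do not leak into the evaluation set and inflate recall. The imbalanced-learn package provides production versions as `SMOTE` and `ADASYN` in `imblearn.over_sampling`.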
Value Realization Project 2: Built a Nursing Facility Recommendation Engine for post-acute care network expansion at a health system in Iowa, leveraging Matrix Factorization and Neural Collaborative Filtering
  • Generated ratings and rankings for patient-SNF (skilled nursing facility) interactions from a weighted score of features such as ED utilization, LOS, staffing-related metrics, and explicit user feedback, and planned patient referrals using geocode mapping
  • Enabled Mercy ACO (Accountable Care Organization) to reduce the average length of stay at partnered SNFs from 21.9 to 18 days in FY 2021, achieving a mean NDCG of 0.92 and an RMSE of 0.943
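The NDCG and RMSE figures reported above measure ranking quality and rating-prediction error, respectively. The sketch below shows one standard way to compute both. It uses the linear-gain form of DCG; a common alternative uses a gain of 2^rel - 1, and the project does not state which variant it used. The relevance grades and ratings are hypothetical.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain with linear gain: sum of rel_i / log2(i + 1), ranks from 1."""
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(relevances, start=1))


def ndcg(relevances_in_ranked_order, k=None):
    """NDCG@k: DCG of the produced ranking divided by the DCG of the ideal ordering."""
    ranked = relevances_in_ranked_order[:k] if k else relevances_in_ranked_order
    ideal = sorted(relevances_in_ranked_order, reverse=True)[: len(ranked)]
    ideal_dcg = dcg(ideal)
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0


def rmse(predicted, actual):
    """Root mean squared error between predicted and observed ratings."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


# Hypothetical graded relevance of five recommended facilities, in the order shown to a patient
print(round(ndcg([3, 2, 3, 0, 1]), 3))  # 0.972
print(round(rmse([3.5, 4.0, 2.0], [3.0, 4.0, 2.5]), 3))  # 0.408
```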
Additional Key Responsibilities
  • Organized end-to-end ETL scripts and data models and optimized the overall workflow to support migration from on-premises infrastructure to the AWS cloud platform, leveraging AWS services such as S3, Redshift, Aurora RDS, and DMS
  • Developed an automated framework using Python, MongoDB, the Incare API, and Power BI that enabled Mercy ACO to track the daily productivity of all Acute and Ambulatory health coaches

May 2018 - Jul 2018

WNS Global Services

Data Science Intern
  • Created an image-tagging algorithm for beverage-related images posted on social media, enabling content cross-promotion, by implementing the YOLOv2 convolutional neural network object detector
  • Built a multiclass sentiment classification model for social media text posts using deep learning architectures such as feed-forward neural networks and bidirectional RNNs/LSTMs

Latest Works

Numerical Reasoning on Financial Reports with LLMs

Abstract

Financial reports offer critical insights into a company's operations but require costly manual review. To address this, we fine-tuned Large Language Models (LLMs) to extract key indicators and operational metrics from these reports in response to user questions. We used the FinQA dataset to fine-tune both Llama-2 7B and T5 models for customized question answering. On the final numerical answer, our models achieved results comparable to the baseline, showing competitive accuracy in numerical reasoning and calculation.
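FinQA annotates each question with a short reasoning program. One common way to score numerical reasoning is to have the model generate that program and evaluate it deterministically, so the LLM's own arithmetic slips do not affect the final answer. The abstract does not say whether the fine-tuned Llama-2 7B and T5 models emitted programs or final answers directly, so treat the following as an illustrative sketch. It supports only arithmetic steps with `#n` back-references. The full FinQA program language also includes table operations and named constants, which are left out here. The example program and numbers are hypothetical.

```python
import re

# Binary operations from a FinQA-style reasoning program (subset; table operations omitted)
OPS = {
    "add": lambda a, b: a + b,
    "subtract": lambda a, b: a - b,
    "multiply": lambda a, b: a * b,
    "divide": lambda a, b: a / b,
    "exp": lambda a, b: a ** b,
    "greater": lambda a, b: a > b,
}

# Matches one step such as "divide(#0,5735)" once spaces are removed
STEP = re.compile(r"(\w+)\(([^,()]+),([^,()]+)\)")


def execute_program(program):
    """Evaluate steps like 'subtract(5829, 5735), divide(#0, 5735)'.

    '#n' refers to the result of step n (0-based); other arguments are numbers.
    Returns the result of the final step.
    """
    results = []

    def value(token):
        if token.startswith("#"):
            return results[int(token[1:])]
        return float(token)

    for op, arg1, arg2 in STEP.findall(program.replace(" ", "")):
        results.append(OPS[op](value(arg1), value(arg2)))
    return results[-1]


# Hypothetical question: "What was the percentage change in revenue?"
answer = execute_program("subtract(5829, 5735), divide(#0, 5735)")
print(round(answer * 100, 2))  # 1.64
```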

Business Value $$$

For FinTech, this offers a tool to quickly interpret complex financial documents, aiding rapid and informed decision-making. It can enhance real-time analysis for investors, credit analysts, and regulatory compliance officers, providing a competitive edge in the fast-paced financial sector.

Transform Financial Analysis: Instant Insight with AI-Driven Document Interpretation - Numerical Reasoning Pipeline
Extractive QA/Numerical Reasoning over Financial Reports

Vision-Meets-Text: Elevating E-Commerce with AI-Driven Multimodal Search

Abstract

This project explores enhancing the reasoning and question-answering capabilities of Large Language Models (LLMs) by integrating Chain-of-Thought (CoT) reasoning with Visual Question Answering (VQA) techniques. Using the TextVQA and ScienceQA datasets, it assesses how effectively combining text and visual embedding methods improves LLMs, focusing on solving multiple-choice questions and strengthening reasoning abilities.

Business Value $$$

Implementing this multimodal reasoning framework in the retail and e-commerce sector can significantly enhance customer search experience and product discoverability. By effectively integrating text and visual data, this approach offers more accurate and contextually relevant search results, driving sales and improving customer engagement.
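The business case above rests on combining text and visual signals when ranking products. One simple way to do this is late fusion: score each product by a weighted sum of its query-to-text and query-to-image similarities. This is a hypothetical sketch of that idea, not the CoT and VQA pipeline studied in the project. It assumes a shared embedding space such as CLIP-style encoders produce, and the embeddings and product names are made up.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def fused_search(query_emb, catalog, alpha=0.5, top_k=3):
    """Rank products by a weighted sum of text and image similarity (late fusion).

    Assumes the query, product text, and product image embeddings share one
    vector space, as CLIP-style encoders provide. alpha weights the text side.
    """
    scored = []
    for product_id, (text_emb, image_emb) in catalog.items():
        score = alpha * cosine(query_emb, text_emb) + (1 - alpha) * cosine(query_emb, image_emb)
        scored.append((score, product_id))
    scored.sort(reverse=True)
    return [product_id for _, product_id in scored[:top_k]]


# Hypothetical 3-d embeddings: (text embedding, image embedding) per product
catalog = {
    "red_sneaker": ([0.9, 0.1, 0.0], [0.8, 0.2, 0.1]),
    "blue_sneaker": ([0.7, 0.0, 0.6], [0.2, 0.1, 0.9]),
    "red_handbag": ([0.1, 0.9, 0.2], [0.7, 0.6, 0.0]),
}
query = [1.0, 0.0, 0.0]  # stands in for an encoded query such as "red sneakers"
print(fused_search(query, catalog, alpha=0.5))  # ['red_sneaker', 'blue_sneaker', 'red_handbag']
print(fused_search(query, catalog, alpha=0.2))  # ['red_sneaker', 'red_handbag', 'blue_sneaker']
```

Lowering `alpha` shifts weight toward visual similarity. That is why the visually similar red handbag overtakes the blue sneaker in the second call.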

arXiv GitHub
Multimodal QA Workflow!
Clarifying Vision: AI's Mastery in Interpreting Image Questions!

SynthetiQ: Innovating Data Landscapes with AI-Generated Insights

Abstract

This project demonstrates the innovative application of Large Language Models (LLMs) to generating synthetic medical and pharmacy claims data. The approach converts tabular data into text prompts and fine-tunes LLMs to recreate realistic data patterns. This method addresses limitations of conventional generative models by capturing time dependencies and complex relationships within healthcare data.
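The abstract describes converting tabular claims into text before fine-tuning. Below is a minimal sketch of one common serialization. Each row becomes a sentence of "column is value" clauses, and a parser maps generated sentences back to rows. A member's claims are also concatenated in date order so that time dependencies appear in the training text. The column names, values, and separators are hypothetical; SynthetiQ's exact prompt format is not specified.

```python
def row_to_text(row):
    """Serialize one tabular record as 'column is value' clauses."""
    return ", ".join(f"{col} is {val}" for col, val in row.items())


def text_to_row(text):
    """Parse a generated sentence back into a record (inverse of row_to_text).

    Assumes values contain neither ', ' nor ' is '.
    """
    row = {}
    for clause in text.split(", "):
        col, _, val = clause.partition(" is ")
        row[col] = val
    return row


def history_to_text(claims):
    """Join one member's claims in service-date order, so a model fine-tuned on
    these strings sees the temporal sequence rather than isolated rows."""
    ordered = sorted(claims, key=lambda c: c["service_date"])  # ISO dates sort correctly as strings
    return "; ".join(row_to_text(c) for c in ordered)


# Hypothetical pharmacy claims for one member (all values are strings)
claims = [
    {"member_id": "M001", "service_date": "2023-03-02", "claim_type": "pharmacy", "paid_amount": "18.40"},
    {"member_id": "M001", "service_date": "2023-01-15", "claim_type": "pharmacy", "paid_amount": "42.10"},
]
print(row_to_text(claims[0]))
# member_id is M001, service_date is 2023-03-02, claim_type is pharmacy, paid_amount is 18.40
print(history_to_text(claims).split("; ")[0])
# member_id is M001, service_date is 2023-01-15, claim_type is pharmacy, paid_amount is 42.10
```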

Business Value $$$

Using Large Language Models for synthetic data generation provides a strategic advantage across enterprises. It helps preserve data privacy and support compliance, enriches data for robust analysis, and supports innovative solutions in fields like finance, retail, and healthcare. This technology enables deeper insights and better decision-making while protecting sensitive information.

Details GitHub
Project Workflow!
Enhancing Data Depth: AI-Powered Qualitative Breakthroughs (Pros & Cons)

RoboChef

Abstract

RoboChef merges image recognition with personalized recommendations. It uses ResNet-based CNNs for food classification and a recommendation engine built on collaborative filtering and matrix factorization. It draws on the Food-101 and Food.com datasets for accurate, user-tailored suggestions and optimizes the recommender with methods such as SVD and NMF. Performance is measured by error rate and accuracy, both improved through advanced training techniques.
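The abstract pairs collaborative filtering with matrix factorization and mentions SVD and NMF. The sketch below shows the core idea behind SVD-style recommenders (often called Funk SVD). It learns a latent vector for each user and each item by stochastic gradient descent, so that their dot product reproduces known ratings, and then uses those vectors to score unseen pairs. It omits the user and item bias terms that most library implementations add. The users, recipes, and hyperparameters are hypothetical, not taken from RoboChef.

```python
import random


def train_mf(ratings, n_factors=2, lr=0.05, reg=0.02, epochs=300, seed=0):
    """Matrix factorization trained with stochastic gradient descent.

    ratings: list of (user, item, rating) triples.
    Learns latent vectors so that dot(P[user], Q[item]) approximates each rating.
    """
    rng = random.Random(seed)
    # Sorted for reproducible initialization regardless of hash ordering
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    P = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in users}
    Q = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)] for i in items}
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            err = r - predict(P, Q, u, i)
            for f in range(n_factors):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step with L2 regularization on both factors
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def predict(P, Q, user, item):
    """Predicted rating: dot product of the user and item latent vectors."""
    return sum(p * q for p, q in zip(P[user], Q[item]))


# Hypothetical 1-5 ratings of recipes
ratings = [
    ("alice", "pad_thai", 5), ("alice", "ramen", 4), ("alice", "salad", 1),
    ("bob", "pad_thai", 4), ("bob", "ramen", 5), ("bob", "salad", 2),
    ("cara", "salad", 5), ("cara", "pad_thai", 1),
]
P, Q = train_mf(ratings)
# Score an unseen pair: how much would cara like ramen?
print(round(predict(P, Q, "cara", "ramen"), 2))
```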

Business Value $$$

RoboChef offers a scalable solution to enhance customer experience in the food and health sectors by providing personalized meal suggestions. It takes into account individual preferences and dietary restrictions, which could lead to increased customer satisfaction and loyalty. The adaptability of the system to various user constraints makes it a valuable tool for businesses aiming to cater to the personalized nutrition market.

Details GitHub
Snap & Savor: Click your cravings, get instant recipes with RoboChef!
Project Workflow!

EXPLORE: Explainable Song Recommender System

Abstract

This project builds a hybrid music recommendation system from Spotify's API and MLHD data, combining data extraction, machine learning algorithms, and a Tableau dashboard to deliver finely tuned, user-centric music suggestions.

Business Value $$$

The hybrid recommendation approach, blending user behavior with content attributes, provides a more nuanced and accurate recommendation, crucial for platforms seeking to enhance user engagement and satisfaction. The explainability aspect of recommendations fosters user trust and transparency, a key differentiator in competitive markets. Additionally, the interactive dashboard can serve as a model for user-friendly interfaces, further improving user experience and increasing platform loyalty.
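The hybrid approach described above blends behavioral and content signals. Its explainability can be illustrated by reporting which signal drove each pick. The collaborative scores, feature vectors (labeled with Spotify-style audio features such as danceability, energy, and valence), weight `alpha`, and explanation strings in this minimal weighted-hybrid sketch are all hypothetical. EXPLORE's actual models and explanation method are not described here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def hybrid_recommend(cf_scores, track_features, liked_tracks, alpha=0.6, top_k=2):
    """Blend collaborative and content scores, and attach a simple explanation.

    cf_scores: track -> collaborative-filtering score, assumed scaled to [0, 1].
    track_features: track -> audio-feature vector.
    liked_tracks: tracks the user already likes; their mean forms a taste profile.
    """
    dims = len(next(iter(track_features.values())))
    profile = [sum(track_features[t][d] for t in liked_tracks) / len(liked_tracks) for d in range(dims)]
    results = []
    for track, cf in cf_scores.items():
        if track in liked_tracks:
            continue
        behavior = alpha * cf
        content = (1 - alpha) * cosine(profile, track_features[track])
        # The explanation names whichever signal contributed more to the blended score
        reason = "listeners like you enjoy it" if behavior >= content else "it sounds like tracks you like"
        results.append((behavior + content, track, reason))
    results.sort(reverse=True)
    return results[:top_k]


# Hypothetical features: [danceability, energy, valence]
track_features = {
    "t1": [0.8, 0.7, 0.6], "t2": [0.7, 0.8, 0.5],  # already liked
    "t3": [0.75, 0.75, 0.55], "t4": [0.1, 0.2, 0.9], "t5": [0.2, 0.1, 0.3],
}
cf_scores = {"t3": 0.2, "t4": 0.9, "t5": 0.1}
for score, track, reason in hybrid_recommend(cf_scores, track_features, liked_tracks=["t1", "t2"]):
    print(f"{track}: {score:.2f} because {reason}")
```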

arXiv GitHub
Revolutionize Your Music Journey: Unveil the 'Why' Behind Every Beat with Our Explainable Recommender System!
Find Your Musical Match: Experience personalized connections through shared rhythms and beats

Forecast Fusion: Revolutionizing Slow-Selling SKUs Sales Predictions

Abstract

The "Best Buy Project" presentation showcases a sales forecasting model for slow-selling SKUs. Built with Croston's method and XGBoost, the model aims to forecast sales accurately one week in advance. It addresses challenges such as data sparsity and variable sales patterns, improving forecast accuracy for products with intermittent demand.
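Croston's method, named in the abstract, handles intermittent demand without smoothing a series dominated by zeros. Instead, it forecasts how much sells when a sale happens and how often sales happen. This is a minimal sketch of the classic formulation with one common initialization (others exist). The sales series and smoothing constant are hypothetical, and how the project combined Croston with XGBoost is not described here.

```python
def croston(demand, alpha=0.1):
    """Croston's method for intermittent demand (one-step-ahead forecast).

    Separately applies simple exponential smoothing to (a) the size of non-zero
    demands and (b) the number of periods between them; the per-period forecast
    is smoothed size / smoothed interval.
    """
    nonzero = [t for t, d in enumerate(demand) if d > 0]
    if not nonzero:
        return 0.0
    first = nonzero[0]
    size = demand[first]      # initialize with the first non-zero demand
    interval = first + 1      # periods from the series start to that demand
    periods_since = 0
    for d in demand[first + 1:]:
        periods_since += 1
        if d > 0:
            # Update both estimates only in periods with a sale
            size += alpha * (d - size)
            interval += alpha * (periods_since - interval)
            periods_since = 0
    return size / interval


# Hypothetical weekly unit sales for a slow-moving SKU
weekly_sales = [0, 0, 3, 0, 0, 0, 2, 0, 4, 0]
print(round(croston(weekly_sales), 3))  # 1.007
```

The Syntetos-Boylan Approximation (SBA) variant multiplies this forecast by (1 - alpha/2) to reduce the known upward bias of the classic method.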

Business Value $$$

This model offers e-commerce chains significant benefits, such as optimized inventory management, reduced costs from overstocking, and improved customer satisfaction through better product availability. Its ability to forecast demand for slow-moving items supports strategic decision-making, enabling more efficient supply chain operations and potentially boosting sales through targeted marketing and pricing strategies.

Details GitHub
Unveiling Patterns: Capturing the Rhythm of Seasons and Trends in Data
From Data to Decisions: Streamlining Sales in the Slow Lane with Precision and Profit!

Next-Gen Navigator: Pioneering the Future of Item Prediction

Abstract

The report presents recommender systems built on the HG-GNN and ISCON methods. These approaches enhance session-based recommendations by combining current-session data with historical user behavior. Evaluated across different domains, both methods improved accuracy in personalizing user experiences, delivering more relevant and engaging content to users.
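Both HG-GNN and ISCON combine what a user is doing in the current session with that user's longer-term history. The sketch below illustrates only that intuition. It builds a simplified global graph (item-to-item transitions from sessions plus user-to-item edges) and replaces the graph neural network with a count-based scorer. It is not an implementation of either method, and the sessions, item names, and mixing weight `beta` are hypothetical.

```python
from collections import Counter, defaultdict


def build_global_graph(sessions):
    """Build a heterogeneous graph from (user, [items in click order]) sessions.

    Returns weighted item->item transition edges and user->item interaction edges.
    """
    item_edges = defaultdict(Counter)  # item -> Counter of next items
    user_edges = defaultdict(Counter)  # user -> Counter of interacted items
    for user, items in sessions:
        user_edges[user].update(items)
        for current, nxt in zip(items, items[1:]):
            item_edges[current][nxt] += 1
    return item_edges, user_edges


def score_next(item_edges, user_edges, user, session, beta=0.5):
    """Count-based stand-in for a GNN scorer.

    Mixes transitions out of the session's last item (short-term intent) with the
    user's historical item frequencies (long-term preference), weighted by beta.
    """
    transitions = item_edges.get(session[-1], Counter())
    history = user_edges.get(user, Counter())
    total_t = sum(transitions.values()) or 1
    total_h = sum(history.values()) or 1
    candidates = (set(transitions) | set(history)) - set(session)
    scores = {
        item: (1 - beta) * transitions[item] / total_t + beta * history[item] / total_h
        for item in candidates
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical sessions: (user, items clicked in order)
sessions = [
    ("u1", ["phone", "case", "charger"]),
    ("u2", ["phone", "case", "screen_protector"]),
    ("u2", ["laptop", "mouse"]),
    ("u1", ["headphones", "phone", "charger"]),
]
item_edges, user_edges = build_global_graph(sessions)
print(score_next(item_edges, user_edges, "u1", ["phone", "case"]))
# ['charger', 'screen_protector', 'headphones']
print(score_next(item_edges, user_edges, "u2", ["phone", "case"])[0])
# screen_protector
```

The two calls share the same session but rank differently because each user's history shifts the scores. This is the personalization effect described above.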

Business Value $$$

These systems have the potential to transform business engagement strategies. They cater not only to immediate user preferences but also incorporate historical data, providing a highly personalized user experience. This can boost customer retention and spending, as users are more likely to engage with content or products that match their unique interests and past behaviors.

Details GitHub
Next-Level Insights: Graph-Based Innovation in Item Prediction!
Smart Predictions: Revolutionizing Item Selection with Context-Aware Techniques! Paper Link