Get to know me better
I'm a Data Science graduate with expertise in building scalable ML pipelines and statistical models that drive business decisions. I specialize in transforming complex datasets into actionable insights using advanced analytics, LLMs, and cloud-based architectures. My passion lies in solving challenging problems at the intersection of technology and data, where I can leverage cutting-edge AI to create measurable impact.
MS in Data Science
Glassboro, NJ
dinakarreddy2027@gmail.com
+1 (856) 652-3960
The tools and technologies I work with
My data science implementations and research
Built a Databricks pipeline using medallion architecture (Bronze → Silver → Gold) to process billing and resource usage data. Designed dimension and fact tables in Vertabelo using staged ingestion.
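The medallion idea can be illustrated with a minimal plain-Python sketch. It is not Databricks or Delta Lake code and not the project's pipeline; the billing fields (`account_id`, `service`, `usage_hours`, `cost_usd`) and the toy rows are hypothetical. Bronze keeps raw rows as ingested, Silver casts types and drops incomplete or duplicate rows, and Gold aggregates to a business-level table.

```python
from collections import defaultdict

# Bronze: raw records exactly as ingested (strings, possibly bad rows).
# Field names are hypothetical, chosen only for illustration.
bronze = [
    {"account_id": "A1", "service": "compute", "usage_hours": "10", "cost_usd": "2.50"},
    {"account_id": "A1", "service": "storage", "usage_hours": "5", "cost_usd": "0.75"},
    {"account_id": "A2", "service": "compute", "usage_hours": "", "cost_usd": "1.00"},  # incomplete
    {"account_id": "A1", "service": "compute", "usage_hours": "10", "cost_usd": "2.50"},  # duplicate
]

def to_silver(rows):
    """Silver: drop incomplete rows, drop exact duplicates, cast numeric fields."""
    seen, clean = set(), []
    for r in rows:
        if not r["usage_hours"] or not r["cost_usd"]:
            continue  # skip rows with missing measurements
        key = tuple(sorted(r.items()))
        if key in seen:
            continue  # skip exact duplicates of an earlier row
        seen.add(key)
        clean.append({
            "account_id": r["account_id"],
            "service": r["service"],
            "usage_hours": float(r["usage_hours"]),
            "cost_usd": float(r["cost_usd"]),
        })
    return clean

def to_gold(rows):
    """Gold: business-level aggregate (total cost per account)."""
    totals = defaultdict(float)
    for r in rows:
        totals[r["account_id"]] += r["cost_usd"]
    return dict(totals)

silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)  # {'A1': 3.25}
```

In a real lakehouse each layer would be a persisted table rather than an in-memory list; the sketch only shows how responsibilities split across the three layers.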
Engineered a modular PySpark pipeline to clean and standardize timestamped stock trading data with custom transformers.
Engineered a modular PySpark pipeline to clean and standardize timestamped stock trading data. Applied custom transformers and used Nutter for unit testing.
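The "custom transformer" pattern, where each cleaning step is a small object with a `transform` method and a pipeline chains them, can be sketched in plain Python. This is an analogue for illustration only, not `pyspark.ml.Transformer` code and not the project's implementation; the stage names, record fields, and the rule that naive timestamps are UTC are all assumptions.

```python
from datetime import datetime, timezone

class StandardizeFields:
    """Upper-case ticker symbols and cast prices to float."""
    def transform(self, rows):
        return [{**r, "symbol": r["symbol"].strip().upper(), "price": float(r["price"])}
                for r in rows]

class ParseTimestamp:
    """Parse ISO-8601 strings to UTC datetimes (assumption: naive strings are UTC)."""
    def transform(self, rows):
        out = []
        for r in rows:
            ts = datetime.fromisoformat(r["ts"])
            if ts.tzinfo is None:
                ts = ts.replace(tzinfo=timezone.utc)
            out.append({**r, "ts": ts.astimezone(timezone.utc)})
        return out

class DropInvalidPrices:
    """Remove trades with non-positive prices."""
    def transform(self, rows):
        return [r for r in rows if r["price"] > 0]

class DropDuplicates:
    """Keep the first trade for each (symbol, timestamp) pair."""
    def transform(self, rows):
        seen, out = set(), []
        for r in rows:
            key = (r["symbol"], r["ts"])
            if key not in seen:
                seen.add(key)
                out.append(r)
        return out

class Pipeline:
    """Apply stages in order, feeding each stage's output into the next."""
    def __init__(self, stages):
        self.stages = stages

    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows

raw_trades = [  # toy records
    {"symbol": "msft ", "ts": "2024-01-02T09:30:00-05:00", "price": "370.10"},
    {"symbol": "MSFT", "ts": "2024-01-02T14:30:00+00:00", "price": "370.10"},  # same instant
    {"symbol": "MSFT", "ts": "2024-01-02T09:31:00-05:00", "price": "-1"},      # invalid price
]

pipeline = Pipeline([StandardizeFields(), ParseTimestamp(), DropInvalidPrices(), DropDuplicates()])
clean = pipeline.transform(raw_trades)
print(clean)
```

Keeping each step as its own class makes every stage testable in isolation, which is the property a unit-testing framework such as Nutter is meant to exploit on Databricks.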
Forecasted Microsoft (MSFT) stock prices using ARIMA, LSTM, and Linear Regression with LLM-powered sentiment analysis integration.
Forecasted Microsoft (MSFT) stock prices using ARIMA, LSTM, and Linear Regression. Evaluated the models with RMSE and MAPE and visualized actual vs. predicted trends. Integrated LLM-powered sentiment analysis of news articles, built with OpenAI and LangChain, to provide richer market context.
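The two evaluation metrics have simple definitions: RMSE = sqrt(mean((y − ŷ)²)) and MAPE = (100 / n) · Σ |(y − ŷ) / y|. A minimal sketch follows; the price values are toy numbers, not project results.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error, in the same units as the prices."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mape(actual, predicted):
    """Mean absolute percentage error, in percent; undefined when an actual value is 0."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

actual = [100.0, 200.0, 300.0]      # toy prices
predicted = [110.0, 190.0, 300.0]
print(rmse(actual, predicted))  # ≈ 8.165
print(mape(actual, predicted))  # ≈ 5.0
```

RMSE penalizes large misses more heavily, while MAPE is scale-free, which makes it easier to compare across price levels.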
Performed comprehensive statistical analysis on insurance data with LLM-powered insights using RAG and domain knowledge extraction.
Performed univariate, bivariate, and multivariate statistical analysis on insurance data. Applied PCA, regression, and correlation analysis, and validated multivariate normality using Mardia's test. Queried domain knowledge from PDFs via LangChain + RAG to generate LLM-driven insights and summaries.
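Mardia's test checks multivariate normality through two statistics built from d_ik = (x_i − x̄)ᵀ S⁻¹ (x_k − x̄), with S the covariance using divisor n: skewness b1 = (1/n²) Σ_i Σ_k d_ik³ and kurtosis b2 = (1/n) Σ_i d_ii². Under normality, n·b1/6 is approximately chi-square with p(p+1)(p+2)/6 degrees of freedom, and (b2 − p(p+2)) / sqrt(8p(p+2)/n) is approximately standard normal. The stdlib-only sketch below computes those statistics; the data are toy values, and p-values would come from the chi-square and normal distributions (for example via `scipy.stats.chi2.sf`).

```python
import math

def mean_vector(X):
    n = len(X)
    return [sum(row[j] for row in X) / n for j in range(len(X[0]))]

def covariance_mle(X):
    """Covariance matrix with divisor n, as used in Mardia's statistics."""
    n, p = len(X), len(X[0])
    m = mean_vector(X)
    return [[sum((r[i] - m[i]) * (r[j] - m[j]) for r in X) / n for j in range(p)]
            for i in range(p)]

def invert(A):
    """Gauss-Jordan inversion with partial pivoting; fine for small matrices."""
    p = len(A)
    M = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(p)]
         for i, row in enumerate(A)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(M[r][c]))
        if abs(M[piv][c]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        M[c], M[piv] = M[piv], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(p):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[p:] for row in M]

def mardia(X):
    """Mardia's skewness b1, kurtosis b2, and their large-sample test statistics."""
    n, p = len(X), len(X[0])
    m = mean_vector(X)
    S_inv = invert(covariance_mle(X))
    C = [[r[j] - m[j] for j in range(p)] for r in X]  # centered rows
    SC = [[sum(S_inv[a][b] * c[b] for b in range(p)) for a in range(p)] for c in C]
    # D[i][k] = (x_i - mean)' S^-1 (x_k - mean)
    D = [[sum(ci[a] * sk[a] for a in range(p)) for sk in SC] for ci in C]
    b1 = sum(d ** 3 for row in D for d in row) / n ** 2
    b2 = sum(D[i][i] ** 2 for i in range(n)) / n
    skew_stat = n * b1 / 6                        # ~ chi-square(skew_df) under H0
    skew_df = p * (p + 1) * (p + 2) // 6
    kurt_z = (b2 - p * (p + 2)) / math.sqrt(8 * p * (p + 2) / n)  # ~ N(0, 1) under H0
    return {"b1": b1, "b2": b2, "skew_stat": skew_stat, "skew_df": skew_df, "kurt_z": kurt_z}

# Toy bivariate data, symmetric about its mean, so skewness should be ~0.
X = [[1.0, 2.0], [-1.0, -2.0], [2.0, -1.0], [-2.0, 1.0], [0.5, 0.3], [-0.5, -0.3]]
print(mardia(X))
```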
Built a predictive model on 26 clinical features that achieved 92% accuracy in Chronic Kidney Disease (CKD) detection, with comprehensive feature importance analysis.
Built a comprehensive healthcare predictive model using 26 clinical features to assess Chronic Kidney Disease (CKD) risk. Applied hypothesis testing to identify significant relationships between comorbidities and CKD progression, and created actionable visualizations to support clinical decision-making.
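One common hypothesis test for a relationship between a binary comorbidity and CKD status is Pearson's chi-square test of independence on a 2×2 table. The sketch below is an illustration under that assumption, not the project's exact test; the counts are toy numbers. It uses the identity that, for 1 degree of freedom, P(χ² > x) = erfc(sqrt(x / 2)).

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test of independence on a 2x2 contingency table.

    Layout (labels are illustrative):
                          CKD   no CKD
        comorbidity yes    a      b
        comorbidity no     c      d
    No continuity correction is applied.
    """
    observed = [[a, b], [c, d]]
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs at least one observation")
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

stat, p = chi_square_2x2(30, 10, 10, 30)  # toy counts, not study data
print(stat, p)
```

Library routines such as `scipy.stats.chi2_contingency` apply Yates' continuity correction to 2×2 tables by default, so their statistic will be somewhat smaller than this uncorrected version.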
Analyzed insurance data using univariate and multivariate techniques including Box-Cox transformations and 3D visualizations. Conducted a Hotelling's T² test comparing the sample mean vector with a mean vector from the literature, which revealed statistically significant differences.
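Both techniques reduce to short formulas. The Box-Cox transform maps a positive value x to (x^λ − 1)/λ, or log x when λ = 0; λ is usually chosen by maximum likelihood (`scipy.stats.boxcox` does this when `lmbda` is not given). The one-sample Hotelling's T² statistic is T² = n (x̄ − μ0)ᵀ S⁻¹ (x̄ − μ0), and under normality (n − p) / (p(n − 1)) · T² follows F(p, n − p). The sketch below assumes two variables so the 2×2 inverse can be written in closed form; the data and the reference vector `mu0` are toy values.

```python
import math

def box_cox(x, lam):
    """Box-Cox transform of a strictly positive value x for a given lambda."""
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive data")
    return math.log(x) if lam == 0 else (x ** lam - 1) / lam

def hotelling_one_sample_2d(X, mu0):
    """One-sample Hotelling's T^2 for bivariate rows X against reference mean mu0.

    Returns (T^2, F statistic, (df1, df2)). S is the unbiased sample covariance.
    """
    n, p = len(X), 2
    m1 = sum(r[0] for r in X) / n
    m2 = sum(r[1] for r in X) / n
    s11 = sum((r[0] - m1) ** 2 for r in X) / (n - 1)
    s22 = sum((r[1] - m2) ** 2 for r in X) / (n - 1)
    s12 = sum((r[0] - m1) * (r[1] - m2) for r in X) / (n - 1)
    det = s11 * s22 - s12 ** 2
    if det <= 0:
        raise ValueError("sample covariance is singular")
    d1, d2 = m1 - mu0[0], m2 - mu0[1]
    # inverse of [[s11, s12], [s12, s22]] is [[s22, -s12], [-s12, s11]] / det
    quad = (s22 * d1 * d1 - 2 * s12 * d1 * d2 + s11 * d2 * d2) / det
    t2 = n * quad
    f_stat = (n - p) / (p * (n - 1)) * t2
    return t2, f_stat, (p, n - p)

X = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0)]  # toy observations
t2, f_stat, df = hotelling_one_sample_2d(X, mu0=(0.0, 0.0))
print(t2, f_stat, df)
```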
Performed paired Hotelling's T² tests and MANOVA to analyze bone mineral content asymmetry and triathlon performance patterns.
Performed a paired Hotelling's T² test to compare dominant vs. non-dominant bone mineral content in women. Analyzed triathlon performance across age groups using MANOVA, identifying significant differences in the SWIM and BIKE metrics.
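A paired comparison reduces to a one-sample Hotelling's T² test of the within-subject differences against zero. The MANOVA comparison can be sketched with Wilks' lambda, Λ = |W| / |B + W|, where W is the within-group and B the between-group sums-of-squares-and-cross-products (SSCP) matrix. For exactly two response variables (for example SWIM and BIKE times), ((1 − √Λ)/√Λ) · (N − g − 1)/(g − 1) follows an F distribution with (2(g − 1), 2(N − g − 1)) degrees of freedom. The sketch assumes two responses; the group data are toy values.

```python
import math

def mean2(rows):
    n = len(rows)
    return (sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n)

def sscp(rows, center):
    """2x2 sums of squares and cross-products of rows about a given center."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = (r[0] - center[0], r[1] - center[1])
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def wilks_lambda_2d(groups):
    """One-way MANOVA for two responses: returns (Lambda, F, (df1, df2))."""
    all_rows = [r for g in groups for r in g]
    N, g = len(all_rows), len(groups)
    W = [[0.0, 0.0], [0.0, 0.0]]
    for rows in groups:
        s = sscp(rows, mean2(rows))  # within-group SSCP about the group mean
        for i in range(2):
            for j in range(2):
                W[i][j] += s[i][j]
    T = sscp(all_rows, mean2(all_rows))  # total SSCP, equal to B + W
    lam = det2(W) / det2(T)
    f_stat = (1 - math.sqrt(lam)) / math.sqrt(lam) * (N - g - 1) / (g - 1)
    return lam, f_stat, (2 * (g - 1), 2 * (N - g - 1))

same = [[(1.0, 2.0), (2.0, 1.0), (3.0, 3.0)], [(1.0, 2.0), (2.0, 1.0), (3.0, 3.0)]]
shifted = [[(1.0, 2.0), (2.0, 1.0), (3.0, 3.0)], [(11.0, 12.0), (12.0, 11.0), (13.0, 13.0)]]
print(wilks_lambda_2d(same))     # identical groups: Lambda = 1, F = 0
print(wilks_lambda_2d(shifted))  # well-separated groups: Lambda close to 0
```

Small values of Λ mean the group mean vectors differ relative to the within-group spread.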
Applied PCA and Factor Analysis on 23 companies' stock data and conducted MANOVA on insurance groups to identify patterns.
Conducted PCA and Factor Analysis (principal component and maximum likelihood estimation methods) on daily stock prices to extract 23 latent components. Used rotated loadings for clearer interpretation of company groupings in PC space. Compared smoker vs. non-smoker and regional groups in health insurance data using MANOVA and Hotelling's T² simultaneous confidence intervals.
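PCA amounts to an eigendecomposition of the covariance (or correlation) matrix: eigenvectors give the component directions, and each eigenvalue divided by their sum gives the share of variance explained. The stdlib-only sketch below uses cyclic Jacobi rotations for a small symmetric matrix. It is an illustration, not the project's code; in practice `numpy.linalg.eigh` or scikit-learn's `PCA` would be used, and factor rotation (for example varimax) is not shown. The data are toy values lying exactly on a line, so one component should explain all of the variance.

```python
import math

def covariance(X):
    """Sample covariance matrix (divisor n - 1) of a list of observation rows."""
    n, p = len(X), len(X[0])
    m = [sum(r[j] for r in X) / n for j in range(p)]
    return [[sum((r[i] - m[i]) * (r[j] - m[j]) for r in X) / (n - 1) for j in range(p)]
            for i in range(p)]

def matmul(A, B):
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def jacobi_eigen(A, rel_tol=1e-24, max_sweeps=50):
    """Eigenvalues and eigenvectors (columns of V) of a symmetric matrix via Jacobi rotations."""
    p = len(A)
    A = [list(map(float, row)) for row in A]
    V = [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
    for _ in range(max_sweeps):
        off = sum(A[i][j] ** 2 for i in range(p) for j in range(p) if i != j)
        scale = sum(A[i][j] ** 2 for i in range(p) for j in range(p))
        if off <= rel_tol * scale:
            break
        for k in range(p - 1):
            for l in range(k + 1, p):
                if A[k][l] == 0.0:
                    continue
                # Rotation angle that zeroes A[k][l]: tan(2θ) = 2 A[k][l] / (A[k][k] - A[l][l])
                theta = 0.5 * math.atan2(2 * A[k][l], A[k][k] - A[l][l])
                c, s = math.cos(theta), math.sin(theta)
                R = [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
                R[k][k], R[k][l], R[l][k], R[l][l] = c, -s, s, c
                A = matmul(transpose(R), matmul(A, R))
                V = matmul(V, R)
    return [A[i][i] for i in range(p)], V

def pca(X):
    """Component variances, directions, and explained-variance ratios, largest first."""
    vals, V = jacobi_eigen(covariance(X))
    order = sorted(range(len(vals)), key=lambda i: vals[i], reverse=True)
    total = sum(vals)
    variances = [vals[i] for i in order]
    components = [[V[r][i] for r in range(len(V))] for i in order]
    ratios = [v / total for v in variances]
    return variances, components, ratios

X = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]  # toy data on the line y = x
variances, components, ratios = pca(X)
print(ratios)
```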
Investigated chemical contamination in a wildlife preserve using multi-station time-series data from the VAST Challenge 2018.
Investigated chemical contamination in the Boonsong Lekagul Wildlife Preserve using multi-station time-series data from the VAST Challenge 2018. Analyzed pollutant trends (e.g., Methylosmoline, heavy metals, herbicides) and linked contamination patterns to potential sources.
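One simple way to summarize pollutant trends across stations is to fit a least-squares slope of concentration against time for each (station, chemical) series. This sketch covers only that trend step, under the assumption that readings arrive as (station, chemical, time, value) tuples; the station names and values are hypothetical toy data, not VAST Challenge measurements.

```python
from collections import defaultdict

def ols_slope(ts, values):
    """Least-squares slope of values against time (concentration units per time step)."""
    n = len(ts)
    mt, mv = sum(ts) / n, sum(values) / n
    stt = sum((t - mt) ** 2 for t in ts)
    if stt == 0:
        raise ValueError("need at least two distinct time points")
    return sum((t - mt) * (v - mv) for t, v in zip(ts, values)) / stt

def trends(readings):
    """Group (station, chemical, t, value) readings and fit one slope per series."""
    series = defaultdict(lambda: ([], []))
    for station, chemical, t, value in readings:
        ts, vs = series[(station, chemical)]
        ts.append(t)
        vs.append(value)
    return {key: ols_slope(ts, vs) for key, (ts, vs) in series.items() if len(ts) >= 2}

readings = [  # toy values; station names are hypothetical
    ("S1", "Methylosmoline", 0, 1.0), ("S1", "Methylosmoline", 1, 2.0), ("S1", "Methylosmoline", 2, 3.0),
    ("S2", "Methylosmoline", 0, 5.0), ("S2", "Methylosmoline", 1, 5.0), ("S2", "Methylosmoline", 2, 5.0),
]
for (station, chemical), slope in sorted(trends(readings).items()):
    print(station, chemical, slope)
```

A rising slope at one station and flat slopes elsewhere is the kind of pattern that would then be compared against the stations' locations when looking for potential sources.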
My professional journey
Rowan University
GPA: 3.7/4.0. Specialized in statistical modeling, machine learning, and large language models. Relevant coursework includes Multivariate Statistics, Data Mining, Large Language Models, and Big Data Tools.
GlobalEdx.com, Hyderabad
Comprehensive training program covering Python programming for data science, including pandas, NumPy, scikit-learn, and machine learning algorithms. Completed hands-on projects in data analysis, visualization, and predictive modeling.
Verzeo
Worked on machine learning projects using Python, focusing on model development, data preprocessing, and algorithm implementation. Gained practical experience in supervised and unsupervised learning techniques, feature engineering, and model evaluation.
LinkedIn Learning
Completed comprehensive training on AWS cloud services, including S3, EC2, Lambda, and other services used to build scalable data solutions.
LinkedIn Learning
Certified in graph database technologies and graph algorithms for advanced data analysis.
Coursera
Fundamentals of AI, machine learning, and deep learning concepts for business applications.
Feel free to reach out for collaborations or just a friendly hello