Hi, I'm Dinakar Reddy

Data Scientist | ML Engineer | Problem Solver

About Me

Get to know me better

Dinakar Reddy

I'm a Data Science graduate with expertise in building scalable ML pipelines and statistical models that drive business decisions. I specialize in transforming complex datasets into actionable insights using advanced analytics, LLMs, and cloud-based architectures. My passion lies in solving challenging problems at the intersection of technology and data, where I can leverage cutting-edge AI to create measurable impact.

Education

MS in Data Science

Location

Glassboro, NJ

Email

dinakarreddy2027@gmail.com

Phone

+1 (856) 652-3960

Technical Skills

The tools and technologies I work with

Languages

Python, SQL, R, Java, C++, C

Analytics & ML

Scikit-learn, Statsmodels, TensorFlow, PCA, Factor Analysis, NLP, Hotelling's T², MANOVA

Data Engineering

PySpark, Databricks, Delta Lake, ETL/ELT, Parquet

Visualization

Tableau, Power BI, ggplot2

Platforms

Azure Databricks, AWS S3/EC2, Vertabelo

Databases

MySQL, PostgreSQL, Neo4j

Other Tools

OpenAI API, LangChain, FAISS, RAG, RStudio

Featured Projects

My data science implementations and research

Data Pipeline Data Engineering

Billing & Resource Cost Analysis Pipeline

Built a Databricks pipeline using medallion architecture (Bronze → Silver → Gold) to process billing and resource usage data. Designed dimension and fact tables in Vertabelo using staged ingestion.

Key Deliverables:
  • Developed gold-layer analytical tables for Monthly Charge Trends, Customer Spending, and Resource Cost Analysis
  • Ensured data quality via unit testing and delivered actionable insights through clean reports
  • Designed dimension and fact tables in Vertabelo using staged ingestion
Databricks AWS S3 SQL Vertabelo Parquet
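
As a rough illustration of the Bronze → Silver → Gold flow described above, here is a minimal plain-Python sketch. It is not the project's Databricks code: the column names (`customer_id`, `billed_on`, `amount`), the sample rows, and the helper functions `to_silver` and `to_gold_monthly_charges` are all hypothetical, chosen only to show how each layer narrows raw data into a reporting table.

```python
from collections import defaultdict
from datetime import date

# Hypothetical raw billing rows as they might land in the bronze layer.
BRONZE = [
    {"customer_id": "C1", "billed_on": "2024-01-15", "amount": "120.50"},
    {"customer_id": "C1", "billed_on": "2024-01-28", "amount": "79.50"},
    {"customer_id": "C2", "billed_on": "2024-02-03", "amount": "300.00"},
    {"customer_id": "C2", "billed_on": "2024-02-03", "amount": "300.00"},  # duplicate
    {"customer_id": None, "billed_on": "2024-02-10", "amount": "50.00"},   # missing key
    {"customer_id": "C3", "billed_on": "2024-02-11", "amount": "n/a"},     # bad amount
]


def to_silver(bronze_rows):
    """Silver layer: drop invalid rows, cast types, and remove duplicates."""
    seen, silver = set(), []
    for row in bronze_rows:
        if not row.get("customer_id"):
            continue  # rows without a customer key cannot be joined later
        try:
            amount = float(row["amount"])
            billed_on = date.fromisoformat(row["billed_on"])
        except (TypeError, ValueError):
            continue  # skip rows whose amount or date cannot be parsed
        key = (row["customer_id"], billed_on, amount)
        if key in seen:
            continue  # exact duplicate of an earlier row
        seen.add(key)
        silver.append(
            {"customer_id": row["customer_id"], "billed_on": billed_on, "amount": amount}
        )
    return silver


def to_gold_monthly_charges(silver_rows):
    """Gold layer: total charges per calendar month, ready for reporting."""
    totals = defaultdict(float)
    for row in silver_rows:
        totals[row["billed_on"].strftime("%Y-%m")] += row["amount"]
    return dict(sorted(totals.items()))


silver = to_silver(BRONZE)
gold = to_gold_monthly_charges(silver)
print(gold)
```

In a Databricks setting the same steps would typically be expressed as DataFrame transformations writing to separate tables per layer; the sketch only mirrors the shape of that flow.
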
Stock Data Processing Big Data

Scalable Stock Data Processing Pipeline

Engineered a modular PySpark pipeline to clean and standardize timestamped stock trading data with custom transformers.

Engineered a modular PySpark pipeline to clean and standardize timestamped stock trading data. Applied custom transformers and used Nutter for unit testing.

Key Features:
  • Stored outputs in silver layer Delta tables for optimized data access
  • Created gold-level analytics on Apple trends, return rates, and volumes
  • Applied custom transformers for data standardization
  • Implemented comprehensive unit testing with Nutter framework
PySpark Databricks Delta Lake SQL Nutter
Stock Forecasting AI AI/ML

Stock Forecasting + LLM Sentiment Analysis

Forecasted Microsoft (MSFT) stock prices using ARIMA, LSTM, and Linear Regression with LLM-powered sentiment analysis integration.

Forecasted Microsoft (MSFT) stock prices using ARIMA, LSTM, and Linear Regression. Evaluated models with RMSE and MAPE; visualized actual vs predicted trends. Integrated LLM-powered sentiment analysis from news articles using OpenAI and LangChain for enriched market context.

Technical Highlights:
  • Custom sentiment scoring pipeline using OpenAI GPT and LangChain
  • Ensemble modeling combining ARIMA, LSTM, and Linear Regression
  • Real-time news integration via NewsAPI for market sentiment analysis
  • Comprehensive model evaluation using RMSE and MAPE metrics
Python Scikit-learn TensorFlow Statsmodels OpenAI API LangChain Polygon.io NewsAPI
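
For readers unfamiliar with the two evaluation metrics named above, here is a minimal sketch of RMSE and MAPE in plain Python. The price values and the model labels in `forecasts` are made up for illustration and are not results from this project.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error, in the same units as the series."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent. Actual values must be non-zero."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical closing prices and two hypothetical sets of forecasts.
actual = [100.0, 200.0, 400.0]
forecasts = {
    "model_a": [110.0, 190.0, 400.0],
    "model_b": [130.0, 170.0, 360.0],
}

# Score each forecast with both metrics; lower is better for each.
scores = {name: (rmse(actual, pred), mape(actual, pred)) for name, pred in forecasts.items()}
best = min(scores, key=lambda name: scores[name][0])  # pick the lowest RMSE
print(scores, best)
```

RMSE penalizes large misses more heavily, while MAPE expresses error relative to the price level, which is why the two are often reported together.
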
Insurance Data Mining Data Science

Insurance Data Mining & LLM Analysis

Performed comprehensive statistical analysis on insurance data with LLM-powered insights using RAG and domain knowledge extraction.

Performed univariate, bivariate, and multivariate statistical analysis on insurance data. Applied PCA, regression, and correlation analysis; validated normality using Mardia's test. Queried domain knowledge from PDFs via LangChain + RAG to generate LLM-driven insights and summaries.

Key Contributions:
  • Developed custom RAG pipeline with FAISS vector store for domain knowledge
  • Automated statistical report generation with LLM interpretation
  • Applied PCA, regression, and correlation analysis techniques
  • Validated findings using Mardia's test for multivariate normality
Pandas Scikit-learn Statsmodels OpenAI API LangChain FAISS RAG
Medical Research Healthcare Analytics

Chronic Kidney Disease Prediction

Built a predictive model on 26 clinical features that achieved 92% accuracy in CKD detection, with feature importance analysis.

Built a comprehensive healthcare predictive model using 26 clinical features to assess Chronic Kidney Disease risk. Applied hypothesis testing to identify significant relationships between comorbidities and CKD progression, creating actionable visualizations for clinical decision-making.

Model Performance:
  • Achieved 92% accuracy in early CKD detection
  • Identified hemoglobin and blood pressure as top predictive features
  • Developed risk stratification visualizations for clinicians
  • Created high-risk patient profiles for targeted interventions
Python Pandas Seaborn Scikit-learn Matplotlib
Statistical Analysis Statistics

Multivariate Analysis: Insurance Data

Analyzed insurance data using univariate and multivariate techniques, including Box-Cox transformations and 3D visualizations. Conducted a Hotelling's T² test to compare the sample mean vector with mean vectors reported in the literature; the test found statistically significant differences.

Key Findings:
  • Explored variable relationships by group (smoker, region) through pairwise plots
  • Applied Box-Cox transformations for data normalization
  • Conducted comprehensive hypothesis testing for group differences
  • Created 3D visualizations for complex multivariate relationships
R openxlsx ggplot2 MVN scatterplot3d Hotelling's T²
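
The one-sample Hotelling's T² test mentioned above compares a sample mean vector with a reference vector using T² = n (x̄ − μ₀)ᵀ S⁻¹ (x̄ − μ₀), where S is the sample covariance matrix. The project itself was done in R; the following is only a small Python sketch of the statistic, with a made-up three-row dataset and a made-up reference vector `mu0`. The helper names are hypothetical.

```python
def mean_vector(rows):
    """Column-wise means of a list of equal-length observations."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sample_covariance(rows, mean):
    """Unbiased sample covariance matrix (divides by n - 1)."""
    n, p = len(rows), len(mean)
    cov = [[0.0] * p for _ in range(p)]
    for row in rows:
        d = [x - m for x, m in zip(row, mean)]
        for i in range(p):
            for j in range(p):
                cov[i][j] += d[i] * d[j]
    return [[v / (n - 1) for v in r] for r in cov]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def hotelling_t2(rows, mu0):
    """Return (T², F) for H0: population mean vector equals mu0."""
    n, p = len(rows), len(mu0)
    if n <= p:
        raise ValueError("need more observations than variables")
    xbar = mean_vector(rows)
    diff = [x - m for x, m in zip(xbar, mu0)]
    # S^-1 @ diff is obtained by solving S @ w = diff, avoiding an explicit inverse.
    w = solve(sample_covariance(rows, xbar), diff)
    t2 = n * sum(d * wi for d, wi in zip(diff, w))
    f_stat = (n - p) / (p * (n - 1)) * t2  # follows F(p, n - p) under H0
    return t2, f_stat


# Made-up data: three observations of two variables, and a made-up reference mean.
rows = [(6.0, 9.0), (10.0, 6.0), (8.0, 3.0)]
mu0 = [9.0, 5.0]
t2, f_stat = hotelling_t2(rows, mu0)
print(t2, f_stat)
```

Under multivariate normality, the scaled statistic is compared against an F distribution with (p, n − p) degrees of freedom to obtain a p-value, for example with `scipy.stats.f.sf` if SciPy is available.
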
Sports Analytics Statistics

Bone Health & Triathlon Performance

Performed paired Hotelling's T² tests and MANOVA to analyze bone mineral content asymmetry and triathlon performance patterns.

Performed a paired Hotelling's T² test to compare bone mineral content on the dominant versus non-dominant side in women. Analyzed triathlon performance across age groups using MANOVA, identifying significant differences in the SWIM and BIKE metrics.

Notable Results:
  • Constructed simultaneous confidence intervals for group comparisons
  • Validated assumptions of multivariate normality across datasets
  • Identified significant performance differences across age groups
  • Applied advanced multivariate statistical techniques
R openxlsx heplots MVN MANOVA
Financial Analysis Statistics

Stock Trends & Insurance Groups Analysis

Applied PCA and Factor Analysis on 23 companies' stock data and conducted MANOVA on insurance groups to identify patterns.

Conducted PCA and Factor Analysis (PC + MLE) on daily stock prices of 23 companies to extract latent components. Used rotated loadings for clearer interpretation of company groupings in PC space. Compared smoker vs non-smoker and regional groups in health insurance data using MANOVA and Hotelling's T² intervals.

Key Insights:
  • Applied rotated loadings for clearer interpretation of company groupings
  • Confirmed multivariate differences in cost drivers and BMI by region
  • Used both PC and MLE methods for comprehensive factor analysis
  • Implemented advanced multivariate comparison techniques
R psych GPArotation corrplot MANOVA ICSNP
Environmental Data Visual Analytics

Environmental Contamination Analysis

Investigated chemical contamination in wildlife preserve using multi-station time-series data from the VAST Challenge 2018.

Investigated chemical contamination in the Boonsong Lekagul Wildlife Preserve using multi-station time-series data from the VAST Challenge 2018. Analyzed pollutant trends (e.g., Methylosmolene, heavy metals, herbicides) and linked contamination patterns to potential sources.

Visualization Features:
  • Visualized correlations between pollutant spikes and wildlife population decline
  • Developed interactive Tableau dashboards for spatiotemporal anomalies
  • Created environmental policy recommendations based on data insights
  • Analyzed endangered Rose-Crested Blue Pipit population trends
Tableau Visual Analytics Time-Series Geospatial

Experience & Education

My professional journey

Master of Science in Data Science

Rowan University

GPA: 3.7/4.0. Specialized in statistical modeling, machine learning, and large language models. Relevant coursework includes Multivariate Statistics, Data Mining, Large Language Models, and Big Data Tools.

Data Science Using Python

GlobalEdx.com, Hyderabad

Comprehensive training program covering Python programming for data science, including pandas, numpy, scikit-learn, and machine learning algorithms. Completed hands-on projects in data analysis, visualization, and predictive modeling.

Machine Learning Intern

Verzeo

Worked on machine learning projects using Python, focusing on model development, data preprocessing, and algorithm implementation. Gained practical experience in supervised and unsupervised learning techniques, feature engineering, and model evaluation.

AWS Essential Training for Developers

LinkedIn Learning

Completed training on AWS cloud services, including S3, EC2, and Lambda, for building scalable data solutions.

Neo4j Graph Data Science Certification

LinkedIn Learning

Certified in graph database technologies and graph algorithms for advanced data analysis.

AI for Everyone

Coursera

Fundamentals of AI, machine learning, and deep learning concepts for business applications.

Get In Touch

Feel free to reach out for collaborations or just a friendly hello

Contact Information

Location

Glassboro, NJ, USA