Dinakar Reddy Donthireddy

Data-driven graduate student specializing in statistical modeling, machine learning, and LLM integration. Transforming complex datasets into actionable insights through advanced analytics and visualization.

About Me

πŸ“ Contact Information

📧 dinakarreddy2027@gmail.com

📱 +1 (856) 652-3960

🏠 Glassboro, NJ

🎓 MS Data Science @ Rowan University

📊 GPA: 3.70/4.0

I'm a meticulous data science graduate student with a passion for creating statistical models, scaling data pipelines, and leveraging Large Language Models to drive meaningful outcomes. My academic journey has been focused on forecasting, multivariate analysis, end-to-end data processing, and real-time sentiment analysis integration.

What sets me apart is my ability to bridge the gap between complex technical implementation and compelling visual storytelling. I believe in making data-driven decisions by extracting actionable insights from the most complex datasets, whether it's through traditional statistical methods or cutting-edge AI techniques.

My experience ranges from developing user-friendly applications for data structuring to building end-to-end analytical pipelines on modern cloud architectures. I'm particularly excited about the intersection of traditional data science and modern LLM capabilities.

Technical Skills

💻 Programming Languages

Python R SQL Java C C++

🔧 Data Engineering

PySpark Databricks Delta Lake Parquet Medallion Architecture ETL/ELT

🤖 Machine Learning & AI

Scikit-learn TensorFlow OpenAI API LangChain FAISS RAG

📊 Visualization & Analytics

Tableau Power BI ggplot2 scatterplot3d corrplot Excel

πŸ—„οΈ Databases & Cloud

MySQL PostgreSQL Neo4j AWS S3 AWS EC2

📈 Statistical Analysis

Statsmodels Hotelling's T² MANOVA PCA Factor Analysis Time-Series

Featured Projects

πŸ—οΈ Billing & Resource Cost Analysis Pipeline

Enterprise-grade data pipeline with medallion architecture

Built a comprehensive Databricks pipeline using medallion architecture (Bronze-Silver-Gold) to process billing and resource usage data. Designed dimensional models and delivered actionable insights through analytical tables.

Databricks AWS S3 SQL Vertabelo Parquet
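
To illustrate the Bronze-Silver-Gold flow, here is a minimal sketch in plain Python rather than the Databricks code itself. The records, field names, and the `to_silver` / `to_gold` helpers are hypothetical and stand in for the real billing tables: bronze holds raw rows, silver holds cleaned and typed rows, and gold holds an aggregate ready for analysis.

```python
from collections import defaultdict

# Hypothetical raw billing records, as they might land in the bronze layer.
bronze = [
    {"resource": "vm-1", "service": "compute", "cost": "12.50"},
    {"resource": "vm-1", "service": "compute", "cost": "7.50"},
    {"resource": "bucket-a", "service": "storage", "cost": "3.00"},
    {"resource": "vm-2", "service": "compute", "cost": None},  # incomplete row
]


def to_silver(rows):
    """Silver step: drop rows with a missing cost and cast cost to float."""
    return [
        {**row, "cost": float(row["cost"])}
        for row in rows
        if row["cost"] is not None
    ]


def to_gold(rows):
    """Gold step: aggregate cleaned rows into total cost per service."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["service"]] += row["cost"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)
```

In a real pipeline each layer would be a persisted table rather than an in-memory list, but the separation of concerns is the same.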

📦 Scalable Stock Data Processing Pipeline

Modular PySpark pipeline with Delta Lake integration

Engineered a modular PySpark pipeline to clean and standardize timestamped stock trading data. Applied custom transformers and used Nutter for unit testing. Stored outputs in silver layer Delta tables and created gold-level analytics on Apple trends, return rates, and volumes.

PySpark Databricks Delta Lake SQL Nutter
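
As a rough sketch of the kind of gold-level metric mentioned above, the snippet below computes simple daily return rates from closing prices in plain Python. The `daily_returns` helper and the sample prices are hypothetical; the actual pipeline computes its analytics in PySpark on Delta tables, and its exact return definition is not shown here.

```python
def daily_returns(closes):
    """Simple daily returns: close_t / close_{t-1} - 1 for consecutive days."""
    return [curr / prev - 1.0 for prev, curr in zip(closes, closes[1:])]


# Hypothetical closing prices for three consecutive trading days.
closes = [100.0, 110.0, 99.0]
returns = daily_returns(closes)
print([round(r, 4) for r in returns])
```

A list of n prices yields n - 1 returns, since the first day has no prior close to compare against.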

📈 Stock Forecasting + LLM Sentiment Analysis

AI-powered stock prediction with news sentiment integration

Forecasted Microsoft stock prices using ARIMA, LSTM, and Linear Regression models. Integrated LLM-powered sentiment analysis from news articles to enrich market predictions with contextual insights.

Python TensorFlow OpenAI API LangChain Polygon.io

πŸ” Insurance Data Mining & LLM Analysis

Statistical analysis enhanced with RAG-powered insights

Performed comprehensive statistical analysis on insurance data using traditional methods. Implemented RAG (Retrieval-Augmented Generation) to query domain knowledge from PDFs and generate LLM-driven insights.

Pandas Scikit-learn LangChain FAISS RAG

📊 Multivariate Analysis: Insurance Data Project

Advanced statistical modeling and hypothesis testing

Conducted multivariate analysis including PCA, Factor Analysis, MANOVA, and Hotelling's T² tests. Applied Box-Cox transformations and created 3D visualizations for complex insurance datasets.

R ggplot2 scatterplot3d MVN MANOVA
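
For readers unfamiliar with the Box-Cox transformation used in this project, here is a minimal Python sketch of the standard one-parameter form. The `box_cox` helper and the sample values are illustrative only; the project itself was done in R, and the lambda values it used are not reproduced here.

```python
import math


def box_cox(values, lam):
    """One-parameter Box-Cox transform for strictly positive values.

    y = (x**lam - 1) / lam  when lam != 0
    y = log(x)              when lam == 0
    """
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


# Illustrative values; lam = 0.5 behaves like a shifted, scaled square root.
transformed = box_cox([1.0, 4.0, 9.0], 0.5)
print(transformed)
```

In practice lambda is usually chosen by maximum likelihood so that the transformed data look closer to normal.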

📊 Multivariate Analysis: Bone Health & Triathlon Performance

Multivariate normality, Hotelling's T², MANOVA

Compared dominant vs non-dominant bone mineral content using Hotelling's T². Used MANOVA to identify performance differences in triathlon disciplines (SWIM and BIKE) across age groups. Constructed confidence intervals and validated multivariate normality.

R openxlsx heplots MVN MANOVA
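
The sketch below shows how a paired Hotelling's T² statistic can be computed, assuming a paired design with two measurement sites per subject (p = 2). That design, the `paired_hotelling_t2` helper, and the sample differences are assumptions made for illustration; the project's actual analysis was done in R and may have used a different setup.

```python
def paired_hotelling_t2(diffs):
    """Hotelling's T^2 for paired differences with p = 2 variables.

    diffs: list of (d1, d2) tuples, e.g. dominant minus non-dominant side.
    Returns (t2, f_stat), where f_stat follows F(p, n - p) under H0: mean = 0.
    """
    n, p = len(diffs), 2
    if n <= p:
        raise ValueError("need more observations than variables")
    mean = [sum(d[j] for d in diffs) / n for j in range(p)]
    # Unbiased sample covariance matrix of the differences.
    s = [
        [
            sum((d[a] - mean[a]) * (d[b] - mean[b]) for d in diffs) / (n - 1)
            for b in range(p)
        ]
        for a in range(p)
    ]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    if det == 0:
        raise ValueError("covariance matrix is singular")
    # Closed-form inverse of a 2x2 matrix.
    inv = [
        [s[1][1] / det, -s[0][1] / det],
        [-s[1][0] / det, s[0][0] / det],
    ]
    t2 = n * sum(
        mean[a] * inv[a][b] * mean[b] for a in range(p) for b in range(p)
    )
    f_stat = (n - p) / (p * (n - 1)) * t2
    return t2, f_stat


# Made-up paired differences for four subjects at two sites.
t2, f_stat = paired_hotelling_t2([(2, 1), (1, 2), (0, 1), (1, 0)])
print(round(t2, 4), round(f_stat, 4))
```

The F statistic would then be compared against an F distribution with p and n - p degrees of freedom to decide whether the mean difference vector departs from zero.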

📉 Multivariate Analysis: Stock Trends and Insurance Groups

Latent structure discovery using PCA, FA, and MANOVA

Conducted PCA and Factor Analysis on daily stock prices to extract 23 latent factors. Interpreted rotated loadings to group companies. Compared smoker/non-smoker and regional groups in insurance data using MANOVA and Hotelling's T².

R psych GPArotation corrplot MANOVA ICSNP

🌱 Environmental Visualization: VAST Challenge 2018 MC2

Spatiotemporal anomaly detection using visual analytics

Analyzed chemical contamination trends in Boonsong Lekagul Wildlife Preserve using time-series data. Linked pollution events to wildlife impact, and built Tableau dashboards to explore pollutant patterns and support policy insights.

Tableau Visual Analytics Time-Series

Education & Certifications

🎓 Education

Master's in Data Science

Rowan University

GPA: 3.70/4.0


Relevant Coursework:

  • Visual Analytics
  • Multivariate Statistics
  • Data Mining
  • Large Language Models
  • Data Warehousing
  • Big Data Tools & Techniques
  • Database Systems

πŸ† Certifications

  • AWS Essential Training for Developers
    LinkedIn Learning
  • Neo4j Graph Data Science Certification
    LinkedIn Learning
  • AI for Everyone
    Coursera
  • Machine Learning for All
    Coursera