Hi, I'm Dinakar Reddy

Data Scientist | ML Engineer | Problem Solver

About Me

Get to know me better

I'm a passionate Data Science graduate student at Rowan University, specializing in statistical modeling, machine learning, and large language models. My expertise lies in transforming complex data into actionable insights through advanced analytics and compelling visual storytelling.

With hands-on experience in building end-to-end data pipelines and implementing AI solutions, I bridge the gap between technical implementation and business value. My projects span predictive modeling, multivariate analysis, real-time sentiment analysis, and innovative applications of LLMs.

Education

MS in Data Science

Location

Glassboro, NJ

Email

dinakarreddy2027@gmail.com

Phone

+1 (856) 652-3960

Technical Skills

The tools and technologies I work with

Programming

Python, R, SQL, Java, PySpark, C/C++

Data Engineering

Databricks, Delta Lake, ETL/ELT, AWS S3, Medallion Architecture, MySQL/PostgreSQL

ML & AI

Scikit-learn, TensorFlow, OpenAI API, LangChain, RAG, FAISS

Analytics

Tableau, Power BI, ggplot2, Statsmodels, PCA/FA, MANOVA

Featured Projects

My data science implementations and research

Billing Analysis Pipeline | Data Engineering

Billing & Resource Cost Analysis Pipeline

Designed and implemented an enterprise-grade data pipeline processing billing and cloud resource usage data using Databricks' medallion architecture. The pipeline ingests raw data into bronze tables, transforms it through silver layers with data quality checks, and produces gold-layer analytical tables optimized for business intelligence. Created dimensional models in Vertabelo for staged ingestion and implemented unit testing to ensure data integrity.
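The bronze, silver, and gold flow described above can be illustrated with a small plain-Python sketch. It is not the project's Databricks code: the field names (`account`, `service`, `cost`), the sample rows, and the quality rules are all hypothetical, chosen only to show how each layer narrows and reshapes the data.

```python
from collections import defaultdict

# Hypothetical raw billing records (bronze layer): stored as ingested, all strings.
BRONZE = [
    {"account": "a1", "service": "compute", "cost": "12.50"},
    {"account": "a1", "service": "storage", "cost": "3.25"},
    {"account": "a2", "service": "compute", "cost": "-1.00"},  # negative cost
    {"account": "a2", "service": "compute", "cost": "7.00"},
    {"account": None, "service": "storage", "cost": "2.00"},   # missing account
]


def to_silver(bronze_rows):
    """Cast types and keep only rows that pass basic data quality checks."""
    silver_rows = []
    for row in bronze_rows:
        if not row.get("account") or not row.get("service"):
            continue  # required fields must be present
        try:
            cost = float(row["cost"])
        except (KeyError, TypeError, ValueError):
            continue  # cost must be numeric
        if cost < 0:
            continue  # cost must be non-negative
        silver_rows.append(
            {"account": row["account"], "service": row["service"], "cost": cost}
        )
    return silver_rows


def to_gold(silver_rows):
    """Aggregate cleaned rows into a reporting table: total cost per service."""
    totals = defaultdict(float)
    for row in silver_rows:
        totals[row["service"]] += row["cost"]
    return dict(totals)


silver = to_silver(BRONZE)
gold = to_gold(silver)
```

With these sample rows, `silver` keeps three of the five records and `gold` holds one total per service.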

Databricks, AWS S3, SQL, Vertabelo, Parquet

Stock Data Pipeline | Big Data

Scalable Stock Data Processing Pipeline

Developed a high-performance PySpark pipeline for processing and analyzing timestamped stock trading data at scale. Implemented custom transformers for data standardization and quality checks, with Nutter-based unit testing. The pipeline leverages Delta Lake's ACID transactions to maintain data integrity while processing millions of records. Final gold-layer analytics provide insights into Apple stock trends, return rates, and trading volumes with optimized query performance.
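As a rough illustration of the standardization, quality-check, and return-rate steps, here is a plain-Python sketch. It stands in for the PySpark transformers and is not the project's code: the row layout (`ts`, `symbol`, `close`, `volume`) and the prices are hypothetical, not real quotes.

```python
# Hypothetical raw rows: unordered, inconsistent symbol casing, one bad price.
RAW = [
    {"ts": "2024-01-03", "symbol": " aapl", "close": "99.0", "volume": "900"},
    {"ts": "2024-01-01", "symbol": "AAPL", "close": "100.0", "volume": "1000"},
    {"ts": "2024-01-02", "symbol": "aapl ", "close": "110.0", "volume": "1200"},
    {"ts": "2024-01-02", "symbol": "AAPL", "close": "n/a", "volume": "50"},
]


def standardize(rows):
    """Normalize symbols, parse numbers, and drop rows failing quality checks."""
    clean_rows = []
    for row in rows:
        try:
            symbol = row["symbol"].strip().upper()
            close = float(row["close"])
            volume = int(row["volume"])
        except (KeyError, AttributeError, TypeError, ValueError):
            continue  # unparseable or incomplete row
        if close <= 0 or volume < 0:
            continue  # prices must be positive, volumes non-negative
        clean_rows.append(
            {"ts": row["ts"], "symbol": symbol, "close": close, "volume": volume}
        )
    # ISO-8601 date strings sort chronologically.
    return sorted(clean_rows, key=lambda r: r["ts"])


def daily_returns(rows):
    """Simple return per day: (close_t - close_{t-1}) / close_{t-1}."""
    return [
        (cur["ts"], (cur["close"] - prev["close"]) / prev["close"])
        for prev, cur in zip(rows, rows[1:])
    ]


clean = standardize(RAW)
returns = daily_returns(clean)
```

In a Spark setting the same return calculation would typically be expressed with a window function over the timestamp column rather than a Python loop.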

PySpark, Databricks, Delta Lake, SQL, Nutter

Stock Forecasting | AI/ML

Stock Forecasting + LLM Sentiment Analysis

Implemented a hybrid forecasting system combining traditional time-series models (ARIMA) with deep learning (LSTM) and sentiment analysis from news articles. The system fetches real-time stock data via Polygon.io and news via NewsAPI, processes the articles through OpenAI's LLM for sentiment scoring, and integrates these signals into the prediction models. Achieved a 15% improvement in forecast accuracy over baseline models by incorporating the sentiment context.
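One plausible way to feed sentiment into a forecasting model is to average the per-article scores by day and attach the result to each day's price features. The sketch below shows only that joining step and makes several assumptions: scores lie in [-1, 1], the previous day's sentiment is used as the signal, and days without news count as neutral. The data and function names are hypothetical, and no API calls are made.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (date, score) pairs, as an LLM-based scorer might produce.
articles = [
    ("2024-01-02", 0.6),
    ("2024-01-02", 0.2),
    ("2024-01-03", -0.5),
]

# Hypothetical closing prices keyed by date.
closes = {"2024-01-01": 100.0, "2024-01-02": 102.0, "2024-01-03": 101.0}


def daily_sentiment(scored_articles):
    """Average the article scores for each calendar day."""
    by_day = defaultdict(list)
    for day, score in scored_articles:
        by_day[day].append(score)
    return {day: mean(scores) for day, scores in by_day.items()}


def build_features(closes_by_day, sentiment_by_day):
    """Pair each day's target close with the prior day's close and sentiment."""
    days = sorted(closes_by_day)
    rows = []
    for prev_day, day in zip(days, days[1:]):
        rows.append(
            {
                "date": day,
                "prev_close": closes_by_day[prev_day],
                # Assumption: no news on a day means neutral sentiment (0.0).
                "prev_sentiment": sentiment_by_day.get(prev_day, 0.0),
                "target_close": closes_by_day[day],
            }
        )
    return rows


features = build_features(closes, daily_sentiment(articles))
```

Using the prior day's sentiment rather than the same day's keeps the feature from leaking information that would not be available at prediction time.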

Python, TensorFlow, OpenAI API, LangChain, Polygon.io

Insurance Data Mining | Data Science

Insurance Data Mining & LLM Analysis

Conducted comprehensive statistical analysis of insurance data combined with LLM-powered insights. Implemented Retrieval-Augmented Generation (RAG) to query domain knowledge from insurance PDFs, enhancing traditional statistical methods with contextual understanding. The system performs univariate, bivariate, and multivariate analysis including PCA, regression, and correlation analysis, validated using Mardia's test for multivariate normality. LLM integration provides natural language explanations of statistical findings for business stakeholders.
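The retrieval step of RAG can be sketched without any external services. The example below swaps in a simple bag-of-words cosine similarity for the embedding model and FAISS index used in the project, so it only illustrates the idea: rank passages against the question, then place the best matches in the prompt. The passages and function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical passages standing in for chunks extracted from insurance PDFs.
PASSAGES = [
    "Smokers pay higher premiums than non-smokers",
    "The deductible is the amount paid before coverage begins",
    "Regional differences affect claim frequency",
]


def embed(text):
    """Toy 'embedding': lowercase word counts (a stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, passages, k=1):
    """Return the k passages most similar to the query."""
    q = embed(query)
    ranked = sorted(passages, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:k]


def build_prompt(query, passages, k=1):
    """Assemble the retrieved context and the question into one prompt string."""
    context = "\n".join(retrieve(query, passages, k))
    return f"Context:\n{context}\n\nQuestion: {query}"


top = retrieve("what is a deductible", PASSAGES)
prompt = build_prompt("what is a deductible", PASSAGES)
```

The resulting `prompt` would then be sent to the language model, which answers using the retrieved context instead of relying on its own recall.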

Pandas, Scikit-learn, LangChain, FAISS, RAG

Multivariate Analysis | Statistics

Multivariate Analysis: Insurance Data

Performed advanced multivariate statistical analysis on insurance datasets using R. Applied Box-Cox transformations to improve normality, built 3D visualizations with scatterplot3d, and ran Hotelling's T² tests to compare sample means against literature values. The analysis revealed statistically significant differences in insurance metrics across demographic groups and regions. Developed interactive visualizations to explore complex relationships between variables like BMI, smoking status, and insurance costs.
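For readers unfamiliar with the one-sample Hotelling's T² test, the statistic is T² = n (x̄ − μ₀)ᵀ S⁻¹ (x̄ − μ₀), and (n − p) / (p (n − 1)) · T² follows an F distribution with (p, n − p) degrees of freedom under the null hypothesis. The project used R; the Python sketch below is only an illustration for the two-variable case, with toy numbers rather than insurance data.

```python
def hotelling_t2_2d(sample, mu0):
    """One-sample Hotelling's T^2 for two variables.

    Returns (T^2, F statistic); F has (2, n - 2) degrees of freedom.
    """
    n = len(sample)
    p = 2
    mean = [sum(row[j] for row in sample) / n for j in range(p)]

    def cov(i, j):
        # Sample covariance with the usual n - 1 denominator.
        return sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in sample) / (n - 1)

    s11, s12, s22 = cov(0, 0), cov(0, 1), cov(1, 1)
    det = s11 * s22 - s12 * s12  # determinant of the 2x2 covariance matrix

    d0 = mean[0] - mu0[0]
    d1 = mean[1] - mu0[1]
    # Quadratic form d' S^{-1} d, using the closed-form 2x2 inverse.
    quad = (s22 * d0 * d0 - 2 * s12 * d0 * d1 + s11 * d1 * d1) / det

    t2 = n * quad
    f_stat = (n - p) / (p * (n - 1)) * t2
    return t2, f_stat


# Toy observations (two variables) and a hypothesized mean vector.
toy_sample = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0)]
t2, f_stat = hotelling_t2_2d(toy_sample, mu0=(1.5, 2.5))
```

The F statistic would then be compared against the F(2, n − 2) critical value; that lookup is omitted here because the Python standard library has no F distribution.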

R, ggplot2, scatterplot3d, Hotelling's T², MVN

Bone Health Analysis | Statistics

Bone Health & Triathlon Performance

Conducted paired Hotelling's T² tests to compare bone mineral content between the dominant and non-dominant sides in women, revealing significant asymmetries. Analyzed triathlon performance across age groups using MANOVA, identifying key differences in swimming and biking metrics. Validated multivariate normality assumptions and constructed simultaneous confidence intervals for performance metrics. The analysis provided insights into age-related performance patterns and the relationship between bone health and athletic activity.

R, heplots, MANOVA, MVN, openxlsx

Stock Trends Analysis | Statistics

Stock Trends & Insurance Groups

Applied Principal Component Analysis (PCA) and Factor Analysis to daily stock prices of 23 companies, extracting latent market factors. Used varimax rotation for clearer interpretation of sector groupings. Conducted MANOVA on health insurance data, confirming significant multivariate differences between smoker/non-smoker groups and regional variations in cost drivers. Implemented correlation heatmaps and loading plots to visualize complex relationships in high-dimensional financial and insurance data.
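To show what PCA extracts, the sketch below works through the two-variable case in closed form: it computes the sample covariance matrix, its eigenvalues, the first component's loadings, and the share of variance that component explains. The project analyzed 23 companies in R with varimax-rotated factors; this Python example uses only two hypothetical series and no rotation, so it is an illustration rather than a reproduction.

```python
import math


def covariance_2d(rows):
    """Sample variances and covariance (n - 1 denominator) for paired values."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows) / (n - 1)
    syy = sum((y - mean_y) ** 2 for _, y in rows) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows) / (n - 1)
    return sxx, sxy, syy


def pca_2d(rows):
    """Eigenvalues, first-component loadings, and explained variance ratio."""
    sxx, sxy, syy = covariance_2d(rows)
    trace = sxx + syy
    gap = math.sqrt((sxx - syy) ** 2 + 4 * sxy ** 2)
    lam1 = (trace + gap) / 2  # largest eigenvalue
    lam2 = (trace - gap) / 2

    if sxy != 0:
        vec = (lam1 - syy, sxy)  # eigenvector for lam1 of [[sxx, sxy], [sxy, syy]]
    else:
        vec = (1.0, 0.0) if sxx >= syy else (0.0, 1.0)
    norm = math.hypot(*vec)
    loadings = (vec[0] / norm, vec[1] / norm)

    return (lam1, lam2), loadings, lam1 / trace


# Hypothetical paired values for two series (not real prices).
series = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0)]
eigenvalues, loadings, explained = pca_2d(series)
```

Here the two series load equally on the first component, which captures 80% of the total variance. With real price data, the inputs would usually be standardized first so that high-priced stocks do not dominate the components.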

R, psych, GPArotation, corrplot, MANOVA

Environmental Visualization | Visual Analytics

Environmental Visualization

Investigated chemical contamination patterns in the Boonsong Lekagul Wildlife Preserve as part of the VAST Challenge 2018. Developed interactive Tableau dashboards to analyze multi-station time-series data on pollutants like Methylosmoline and heavy metals. Identified spatiotemporal anomalies linking industrial activity to wildlife population decline, particularly the endangered Rose-Crested Blue Pipit. Created visual storytelling techniques to communicate complex environmental relationships to policymakers and stakeholders.

Tableau, Visual Analytics, Time-Series, Geospatial

CKD Prediction | Healthcare Analytics

Chronic Kidney Disease Prediction

Developed a predictive model using 26 clinical features to assess Chronic Kidney Disease (CKD) risk. Applied hypothesis testing to uncover significant relationships between comorbidities (diabetes, hypertension) and CKD progression. Created actionable visualizations identifying high-risk patient profiles (e.g., patients aged 60+ with anemia). The model achieved 92% accuracy in early detection, with feature importance analysis revealing key clinical markers for preventive care strategies.
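The description does not say which hypothesis tests were used. One common choice for relating a comorbidity to CKD status is Pearson's chi-square test of independence on a 2x2 table, sketched below in plain Python. The counts are invented for illustration and are not results from the project.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table (df = 1).

    Returns (statistic, p_value). No continuity correction is applied.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)

    statistic = 0.0
    for i, observed_row in enumerate(table):
        for j, observed in enumerate(observed_row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected

    # For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts: rows = diabetes yes/no, columns = CKD yes/no.
counts = [[30, 10], [20, 40]]
statistic, p_value = chi_square_2x2(counts)
```

A small p-value here would suggest that the comorbidity and CKD status are not independent in the sample. It says nothing about causation or about the size of the effect.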

Python, Pandas, Seaborn, Scikit-learn, Matplotlib

Experience & Education

My professional journey

2023 - 2025 (Expected)

Master of Science in Data Science

Rowan University

GPA: 3.7/4.0. Specialized in statistical modeling, machine learning, and large language models. Relevant coursework includes Multivariate Statistics, Data Mining, Large Language Models, and Big Data Tools.

2023

AWS Essential Training for Developers

LinkedIn Learning

Completed comprehensive training on AWS cloud services including S3, EC2, Lambda, and more for scalable data solutions.

2023

Neo4j Graph Data Science Certification

LinkedIn Learning

Certified in graph database technologies and graph algorithms for advanced data analysis.

2022

AI for Everyone

Coursera

Covered the fundamentals of AI, machine learning, and deep learning for business applications.

Get In Touch

Feel free to reach out for collaborations or just a friendly hello

Contact Information

Location

Glassboro, NJ, USA