Get to know me better
I'm a passionate Data Science graduate student at Rowan University, specializing in statistical modeling, machine learning, and large language models. My expertise lies in transforming complex data into actionable insights through advanced analytics and compelling visual storytelling.
With hands-on experience in building end-to-end data pipelines and implementing AI solutions, I bridge the gap between technical implementation and business value. My projects span predictive modeling, multivariate analysis, real-time sentiment analysis, and innovative applications of LLMs.
MS in Data Science
Glassboro, NJ
dinakarreddy2027@gmail.com
+1 (856) 652-3960
The tools and technologies I work with
My data science implementations and research
Designed and implemented an enterprise-grade data pipeline that processes billing and cloud resource usage data using Databricks' medallion architecture. The pipeline ingests raw data into bronze tables, refines it into silver tables with data quality checks, and produces gold-layer analytical tables optimized for business intelligence. Created dimensional models in Vertabelo for staged ingestion and implemented unit tests to ensure data integrity.
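As an illustration of the bronze, silver, and gold layering described above, here is a minimal sketch in plain Python. It is not the Databricks implementation: in-memory lists of dictionaries stand in for Delta tables, and the field names (`account_id`, `service`, `cost`, `usage_date`) and the quality rules are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical raw billing records, as they might land in the bronze layer.
RAW_BILLING = [
    {"account_id": "A1", "service": "compute", "cost": "12.50", "usage_date": "2024-01-01"},
    {"account_id": "A1", "service": "storage", "cost": "3.20", "usage_date": "2024-01-01"},
    {"account_id": "A2", "service": "compute", "cost": "-1", "usage_date": "2024-01-01"},
    {"account_id": None, "service": "compute", "cost": "5.00", "usage_date": "2024-01-02"},
    {"account_id": "A2", "service": "compute", "cost": "7.30", "usage_date": "2024-01-02"},
]


def to_bronze(raw):
    """Bronze: keep records exactly as ingested, only tagging their source."""
    return [dict(r, _source="billing_export") for r in raw]


def to_silver(bronze):
    """Silver: cast types and drop records that fail basic quality checks."""
    silver = []
    for r in bronze:
        if not r.get("account_id"):
            continue  # required key missing
        try:
            cost = float(r["cost"])
            usage_date = date.fromisoformat(r["usage_date"])
        except (TypeError, ValueError):
            continue  # unparseable value
        if cost < 0:
            continue  # negative cost treated as invalid (assumed rule)
        silver.append({"account_id": r["account_id"], "service": r["service"],
                       "cost": cost, "usage_date": usage_date})
    return silver


def to_gold(silver):
    """Gold: total cost per (account, service), ready for BI dashboards."""
    totals = defaultdict(float)
    for r in silver:
        totals[(r["account_id"], r["service"])] += r["cost"]
    return {key: round(value, 2) for key, value in sorted(totals.items())}


gold = to_gold(to_silver(to_bronze(RAW_BILLING)))
print(gold)
```

In a real medallion pipeline each layer would typically be persisted as its own table, so records rejected at the silver stage remain available in bronze for reprocessing.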
Developed a high-performance PySpark pipeline for processing and analyzing timestamped stock trading data at scale. Implemented custom transformers for data standardization and quality checks, with Nutter-based unit testing. The pipeline leverages Delta Lake's ACID transactions to maintain data integrity while processing millions of records. Final gold-layer analytics provide insights into Apple stock trends, return rates, and trading volumes with optimized query performance.
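The standardization step and the return-rate calculation can be sketched as follows. This is a plain-Python stand-in for a PySpark custom transformer, assuming a hypothetical row layout of (timestamp, symbol, close price, volume); the sample rows are made up and are not real Apple prices.

```python
from datetime import datetime, timezone

# Hypothetical raw rows: (ISO timestamp, symbol, close price, volume), all strings.
RAW_ROWS = [
    ("2024-01-02T14:30:00+00:00", "aapl", "100.00", "1000"),
    ("2024-01-03T14:30:00+00:00", "AAPL ", "102.00", "1200"),
    ("2024-01-04T14:30:00+00:00", "AAPL", "", "900"),  # missing price: dropped
    ("2024-01-05T14:30:00+00:00", "AAPL", "99.96", "1500"),
]


def standardize(row):
    """Cast types and normalize symbol and time zone; return None if a check fails."""
    ts, symbol, close, volume = row
    if not close:
        return None  # quality check: a close price is required
    return {
        "ts": datetime.fromisoformat(ts).astimezone(timezone.utc),
        "symbol": symbol.strip().upper(),
        "close": float(close),
        "volume": int(volume),
    }


def simple_returns(rows):
    """Simple return close_t / close_(t-1) - 1 between consecutive clean rows."""
    rows = sorted(rows, key=lambda r: r["ts"])
    return [(cur["ts"].date(), cur["close"] / prev["close"] - 1)
            for prev, cur in zip(rows, rows[1:])]


clean = [r for r in map(standardize, RAW_ROWS) if r is not None]
for day, ret in simple_returns(clean):
    print(day, f"{ret:+.2%}")
```

Because the row with the missing price is dropped, the second return spans two trading days; a production pipeline would need an explicit policy for such gaps.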
Implemented a hybrid forecasting system combining traditional time-series models (ARIMA) with deep learning (LSTM) and sentiment analysis of news articles. The system fetches real-time stock data via Polygon.io and news via NewsAPI, scores the news sentiment with an OpenAI LLM, and integrates these signals into the prediction models. Incorporating the sentiment context achieved a 15% improvement in forecast accuracy over the baseline models.
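One step such a system needs is aligning daily news sentiment with price data without leaking future information into the features. A minimal sketch, assuming the LLM has already returned a per-article score in [-1, 1]; the LLM and API calls are not shown, and all dates, scores, and prices are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean

# Hypothetical per-article sentiment scores in [-1, 1], e.g. returned by an LLM.
ARTICLE_SCORES = [
    (date(2024, 1, 2), 0.6),
    (date(2024, 1, 2), 0.2),
    (date(2024, 1, 3), -0.5),
]

# Hypothetical daily closing prices.
CLOSES = {
    date(2024, 1, 3): 100.0,
    date(2024, 1, 4): 101.5,
    date(2024, 1, 5): 99.8,
}


def daily_sentiment(scores):
    """Average the scores of all articles published on the same day."""
    by_day = defaultdict(list)
    for day, score in scores:
        by_day[day].append(score)
    return {day: mean(values) for day, values in by_day.items()}


def build_features(closes, sentiment, lag_days=1):
    """Pair each close with the sentiment from lag_days earlier (0.0 if no news),
    so a model only sees news published before the day it predicts."""
    rows = []
    for day in sorted(closes):
        prior = day - timedelta(days=lag_days)
        rows.append({"date": day, "close": closes[day],
                     "sentiment_lag": sentiment.get(prior, 0.0)})
    return rows


for row in build_features(CLOSES, daily_sentiment(ARTICLE_SCORES)):
    print(row)
```

Lagging the sentiment by one day is a design choice made here to avoid look-ahead bias; days without news default to a neutral 0.0.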
Conducted a comprehensive statistical analysis of insurance data, complemented by LLM-powered insights. Implemented Retrieval-Augmented Generation (RAG) to query domain knowledge from insurance PDFs, adding contextual understanding to traditional statistical methods. The system performs univariate, bivariate, and multivariate analysis, including PCA, regression, and correlation analysis, with multivariate normality assessed using Mardia's test. The LLM integration provides natural-language explanations of statistical findings for business stakeholders.
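The retrieval half of RAG can be sketched with the standard library alone. This toy version swaps in bag-of-words cosine similarity where a production system would typically use vector embeddings, skips the actual LLM call, and uses hypothetical text chunks in place of real insurance PDFs.

```python
import math
import re
from collections import Counter

# Hypothetical text chunks extracted from insurance PDFs.
CHUNKS = [
    "Smokers typically pay substantially higher health insurance premiums.",
    "Premiums may vary by region because of differences in healthcare costs.",
    "A deductible is the amount paid out of pocket before coverage begins.",
]


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(chunks, key=lambda c: cosine(q, Counter(tokenize(c))), reverse=True)
    return ranked[:k]


def build_prompt(question, context):
    """Place the retrieved passages in the prompt to ground the LLM's answer."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"


question = "Why do smokers pay higher premiums?"
print(build_prompt(question, retrieve(question, CHUNKS, k=1)))
```

Grounding the prompt in retrieved passages keeps the LLM's explanation tied to the source documents rather than to its general training data.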
Performed advanced multivariate statistical analysis on insurance datasets using R. Implemented Box-Cox transformations for normality, 3D visualizations using scatterplot3d, and Hotelling's T² tests to compare sample means against literature values. The analysis revealed statistically significant differences in insurance metrics across demographic groups and regions. Developed interactive visualizations to explore complex relationships between variables like BMI, smoking status, and insurance costs.
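For reference, the Box-Cox transformation maps a positive value y to (y^λ - 1)/λ when λ ≠ 0 and to ln y when λ = 0, with λ usually chosen to maximize the normal log-likelihood of the transformed data. The original analysis was done in R; the sketch below is a plain-Python illustration of that λ search over a grid, run on synthetic data.

```python
import math


def box_cox(y, lam):
    """Box-Cox transform of a strictly positive value y."""
    if y <= 0:
        raise ValueError("Box-Cox requires strictly positive data")
    return math.log(y) if abs(lam) < 1e-12 else (y ** lam - 1) / lam


def profile_log_likelihood(data, lam):
    """Normal log-likelihood of the transformed data, up to an additive constant."""
    n = len(data)
    z = [box_cox(y, lam) for y in data]
    mu = sum(z) / n
    var = sum((v - mu) ** 2 for v in z) / n  # MLE variance (divides by n)
    # The (lam - 1) * sum(log y) term is the Jacobian of the transformation.
    return -0.5 * n * math.log(var) + (lam - 1) * sum(math.log(y) for y in data)


def best_lambda(data, grid=None):
    """Pick the grid value of lambda that maximizes the profile log-likelihood."""
    grid = grid or [i / 100 for i in range(-200, 201)]
    return max(grid, key=lambda lam: profile_log_likelihood(data, lam))


# Synthetic right-skewed "charges": exponentials of evenly spaced values.
charges = [math.exp(x / 4) for x in range(1, 21)]
print(best_lambda(charges))
```

Because the logarithms of these synthetic values are symmetric, the search settles on λ = 0, i.e. a log transform. In R, `MASS::boxcox` performs a similar profile-likelihood search.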
Conducted paired Hotelling's T² tests comparing dominant and non-dominant bone mineral content in women, revealing significant asymmetries. Analyzed triathlon performance across age groups using MANOVA, identifying key differences in swimming and biking metrics. Validated multivariate normality assumptions and constructed simultaneous confidence intervals for the performance metrics. The analysis provided insights into age-related performance patterns and the relationship between athletic activity and bone health.
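The paired Hotelling's T² test reduces each subject to a vector of differences d_i and tests whether their mean is zero: T² = n · d̄ᵀ S⁻¹ d̄, where S is the sample covariance matrix of the differences, and (n - p)/(p(n - 1)) · T² follows an F(p, n - p) distribution under the null hypothesis when the differences are multivariate normal. A minimal plain-Python sketch for p = 2 measurements; the helper name `paired_hotelling_t2` and the data are hypothetical.

```python
from statistics import mean


def paired_hotelling_t2(first, second):
    """Paired Hotelling's T^2 for p = 2 measurements on the same subjects.

    first, second: lists of (var1, var2) tuples, one per subject.
    Returns (T2, F, df1, df2) with F = (n - p) / (p * (n - 1)) * T2.
    """
    n, p = len(first), 2
    diffs = [(a[0] - b[0], a[1] - b[1]) for a, b in zip(first, second)]
    d1 = mean(d[0] for d in diffs)
    d2 = mean(d[1] for d in diffs)
    # Sample covariance matrix of the differences (divides by n - 1).
    s11 = sum((d[0] - d1) ** 2 for d in diffs) / (n - 1)
    s22 = sum((d[1] - d2) ** 2 for d in diffs) / (n - 1)
    s12 = sum((d[0] - d1) * (d[1] - d2) for d in diffs) / (n - 1)
    det = s11 * s22 - s12 ** 2
    # Quadratic form dbar' S^-1 dbar via the explicit 2x2 inverse.
    quad = (s22 * d1 ** 2 - 2 * s12 * d1 * d2 + s11 * d2 ** 2) / det
    t2 = n * quad
    f_stat = (n - p) / (p * (n - 1)) * t2
    return t2, f_stat, p, n - p


# Hypothetical measurements at two sites for four subjects.
dominant = [(11, 21), (13, 21), (11, 23), (13, 23)]
nondominant = [(10, 20)] * 4
t2, f_stat, df1, df2 = paired_hotelling_t2(dominant, nondominant)
print(f"T2 = {t2:.2f}, F = {f_stat:.2f}, df = ({df1}, {df2})")  # T2 = 24.00, F = 8.00, df = (2, 2)
```

Turning F into a p-value requires the F distribution's survival function, for example `scipy.stats.f.sf(f_stat, df1, df2)`; with more than two measurement sites, a linear-algebra library would replace the explicit 2×2 inverse.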
Applied Principal Component Analysis (PCA) and Factor Analysis to the daily stock prices of 23 companies to extract latent market factors, using varimax rotation to make the sector groupings easier to interpret. Conducted MANOVA on health insurance data, confirming significant multivariate differences between smokers and non-smokers as well as regional variations in cost drivers. Created correlation heatmaps and loading plots to visualize complex relationships in the high-dimensional financial and insurance data.
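Varimax chooses an orthogonal rotation of the loading matrix that maximizes the variance of the squared loadings within each factor, which pushes each variable (here, each of the 23 stocks) toward loading strongly on a single factor. For a p × k loading matrix with entries ℓ_ij (p variables, k factors), the raw varimax criterion is:

```latex
V = \sum_{j=1}^{k} \left[ \frac{1}{p} \sum_{i=1}^{p} \ell_{ij}^{4}
    - \left( \frac{1}{p} \sum_{i=1}^{p} \ell_{ij}^{2} \right)^{2} \right]
```

Kaiser's normalized version, the default in R's `varimax(normalize = TRUE)`, first divides each row of the loading matrix by the square root of that variable's communality, so variables with small communalities still influence the rotation.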
Investigated chemical contamination patterns in the Boonsong Lekagul Wildlife Preserve as part of the VAST Challenge 2018. Developed interactive Tableau dashboards to analyze multi-station time-series data on pollutants such as Methylosmoline and heavy metals. Identified spatiotemporal anomalies linking industrial activity to the decline of wildlife populations, particularly the endangered Rose-Crested Blue Pipit. Used visual storytelling to communicate complex environmental relationships to policymakers and stakeholders.
Developed a predictive model using 26 clinical features to assess Chronic Kidney Disease (CKD) risk. Applied hypothesis testing to uncover significant relationships between comorbidities (diabetes, hypertension) and CKD progression. Created actionable visualizations identifying high-risk patient profiles (e.g., patients aged 60+ with anemia). The model achieved 92% accuracy in early detection, and feature importance analysis revealed key clinical markers for preventive care strategies.
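Testing whether a comorbidity is associated with CKD status can be illustrated with a Pearson chi-square test of independence on a 2×2 table. This is a minimal stdlib sketch, assuming a yes/no comorbidity and a yes/no CKD label; the helper name and the counts are hypothetical and not taken from the project's dataset.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 contingency table.

    table = [[a, b], [c, d]]: rows = comorbidity (yes/no), columns = CKD (yes/no).
    Returns (chi2, p_value) with 1 degree of freedom, no continuity correction.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    chi2 = 0.0
    for i, observed_row in enumerate(table):
        for j, observed in enumerate(observed_row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: 40 patients with hypertension, 40 without.
chi2, p = chi_square_2x2([[30, 10], [10, 30]])
print(f"chi2 = {chi2:.1f}, p = {p:.2e}")
```

With small expected cell counts, Fisher's exact test or a continuity correction is usually preferred over this plain chi-square approximation.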
My professional journey
Rowan University
GPA: 3.7/4.0. Specialized in statistical modeling, machine learning, and large language models. Relevant coursework includes Multivariate Statistics, Data Mining, Large Language Models, and Big Data Tools.
LinkedIn Learning
Completed comprehensive training on AWS cloud services including S3, EC2, Lambda, and more for scalable data solutions.
LinkedIn Learning
Certified in graph database technologies and graph algorithms for advanced data analysis.
Coursera
Fundamentals of AI, machine learning, and deep learning concepts for business applications.
Feel free to reach out for collaborations or just a friendly hello