Rithika Florian Johnson

MSCS at UMass Amherst
Aspiring Data Scientist / Data Engineer

About Me

I hold a Master’s degree in Computer Science from the University of Massachusetts Amherst and am passionate about using data to address real-world challenges. Whether I am building scalable data pipelines, designing predictive models, or interpreting language through NLP, I enjoy turning messy data into meaningful, actionable insights.

Professionally, I have worked across diverse domains, from structured databases to unstructured social content. At S&P Global, I developed scalable data workflows using Python, SQL, Snowflake, and Databricks, reducing processing time for crop science data by 75% and enabling more efficient analytics and machine learning operations. My contributions also extend to award-winning projects, such as a music discovery app that won the People’s Choice Award at TikTok TechJam 2024.

My academic and research work has been published in IEEE Xplore and Springer, covering topics such as GAN-based abstract art generation, LSTM-driven COVID-19 forecasting, and transformer-based social media analysis.

I bring a blend of technical depth, strategic thinking, and adaptability to help organizations make sense of complex data. I value clarity, collaboration, and ownership, and I consistently aim to build reliable, insight-driven solutions that support impactful decision-making.

Work Experience

Data Analyst 1, S&P Global
Aug 2022 – Jul 2023
Technologies: Python, Snowflake, SQL, AWS S3, Tableau, Alteryx Designer, Airflow, Databricks

Implemented scalable ETL pipelines with versioning, cleaning, and structured acquisition from government, vendor, and agri-science data sources, reducing reporting latency by 90 minutes daily.
Automated preprocessing and analytical workflows, accelerating report delivery by 75% and enabling faster, more accurate insights for client-facing outputs.
Designed a predictive modeling pipeline for animal produce time-series forecasting in collaboration with global analysts, achieving 4% MSE and automating daily insights.
Built interactive Tableau dashboards to visualize key metrics and trends, improving communication across stakeholders, senior leadership, and 150+ client firms.

Research & Analysis, Data Analytics Intern, S&P Global
Mar 2022 – Jul 2022
Technologies: SQL, Snowflake, Python, VisualCron

Migrated data from SSMS to Snowflake by creating efficient views and stored procedures, cutting query times by 40% and enhancing downstream visualizations and model pipelines.
Developed Python scripts for structured data extraction from PDFs and automated ingestion using VisualCron, enabling daily pipeline refresh and schema population.

Associate Data Science Intern, Vimana Foundation
Feb 2020 – Aug 2020
Technologies: Python, MS Excel

Conducted EDA on market and component cost data for drone-based autonomous farming systems, optimizing part selection and cutting production costs by 27%.
Applied machine learning segmentation and classification models to drone imagery to detect rice crop diseases, achieving 85% accuracy and aiding precision agriculture deployment.

Projects

Miniature Relational Database Engine

Location: Amherst, USA

Date: May 2025

Developed a custom Java-based relational DBMS from scratch with support for SQL-like query execution, featuring B+ Tree indexing, buffer management, and page-level I/O tracking.

Implemented operators such as Table Scan, Selection, Projection, and Block Nested Loop Join, all integrated into a modular query pipeline with result materialization and CSV export support.

Benchmarked system performance on range queries with 5% selectivity, achieving under 6% I/O deviation compared to PostgreSQL, while supporting custom eviction and pinning policies in the buffer pool.
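
The Block Nested Loop Join operator mentioned above can be illustrated with a minimal Python sketch. This is not the project's Java implementation: the function names, the in-memory lists standing in for pages, and the `block_size` parameter are all assumptions made for illustration. The idea it shows is that the inner relation is scanned once per block of outer rows rather than once per outer row.

```python
from typing import Callable, Iterator, List, Tuple

Row = Tuple  # a row is a plain tuple of column values (illustrative choice)


def blocks(rows: List[Row], block_size: int) -> Iterator[List[Row]]:
    """Yield successive chunks of at most block_size rows."""
    for start in range(0, len(rows), block_size):
        yield rows[start:start + block_size]


def block_nested_loop_join(
    outer: List[Row],
    inner: List[Row],
    predicate: Callable[[Row, Row], bool],
    block_size: int = 2,
) -> List[Row]:
    """Join outer and inner, scanning inner once per block of outer rows."""
    result = []
    for outer_block in blocks(outer, block_size):
        for inner_row in inner:  # one pass over inner per outer block
            for outer_row in outer_block:
                if predicate(outer_row, inner_row):
                    result.append(outer_row + inner_row)
    return result


# Hypothetical sample data: (id, name) joined with (student_id, course).
students = [(1, "Ada"), (2, "Grace"), (3, "Edsger")]
enrollments = [(1, "Databases"), (3, "Algorithms"), (3, "Compilers")]
joined = block_nested_loop_join(
    students, enrollments, lambda s, e: s[0] == e[0]
)
```

With three outer rows and a block size of two, the inner list is scanned twice instead of three times, which is the I/O saving the operator is designed for.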

Prompt Stealing Attack in AI-Driven Assessment Content Generation

Location: Amherst, USA

Date: Dec 2024

Engineered a two-stage machine learning pipeline in Python and PyTorch to extract and reconstruct AI-generated assessment prompts from output responses, achieving 98% prompt recovery accuracy.

Used a Large Language Model (LLM) as an evaluative judge to score prompt reconstruction quality based on semantic similarity and intent preservation, enabling robust validation beyond lexical overlap.

The project exposed vulnerabilities in educational LLM-based systems by demonstrating how adversaries could reverse-engineer prompt content, raising critical concerns for model deployment in assessment settings.

Interactive Music Discovery and Artist Promotion System

Location: Amherst, USA

Date: November 2023

Built a scalable distributed system with MongoDB Atlas, integrating Spotify and YouTube APIs for artist recommendations and interactive music games, impacting over 500,000 users.

Applied audio analysis (pydub, librosa) and a custom popularity score to enhance emerging artist exposure by 80%, using a balanced metric of likes, views, shares, and follower counts.

Developed secure REST APIs in Python, enforced rate limits, used response caching, and maintained a microservices architecture, ensuring over 90% uptime and robust performance.
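
A balanced popularity metric over likes, views, shares, and follower counts could look roughly like the sketch below. The project's actual formula is not given here, so the equal weights, the log scaling, and the function name `popularity_score` are assumptions chosen only to show how such a score might keep one very large signal from dominating the others.

```python
import math
from typing import Sequence


def popularity_score(
    likes: int,
    views: int,
    shares: int,
    followers: int,
    weights: Sequence[float] = (0.25, 0.25, 0.25, 0.25),  # assumed equal weights
) -> float:
    """Weighted sum of log-scaled engagement signals (illustrative only)."""
    signals = (likes, views, shares, followers)
    # log1p compresses large counts, so views in the millions do not swamp
    # shares in the hundreds, and it maps a count of 0 to a contribution of 0.
    return sum(w * math.log1p(s) for w, s in zip(weights, signals))


# Hypothetical counts for an emerging artist.
emerging = popularity_score(likes=120, views=3000, shares=40, followers=800)
```

Log scaling is one common way to balance signals of very different magnitudes; a rank-based or min-max normalization would be an equally plausible choice.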

Semantic Profiling of Mass Shooter Narratives using NLP

Our research develops a fine-tuning strategy for large language models (LLMs) to detect violent tendencies in social media comments, crucial for identifying and mitigating potential mass shooting threats. We utilized the Mistral-7B model from Unsloth and BERT, exploring fine-tuning methodologies and an ensemble model approach to enhance performance in recognizing traits like terrorism, supremacism, and suicidal thoughts.

Using cross-entropy as our loss function and role prompting with Mistral-7B, we improved trait identification. By fine-tuning Mistral-7B on violent tendencies, we created a specialized model capable of detecting subtle cues. Applying zero-shot chain-of-thought prompting to mass shooter manifestos further enhanced the model's ability to draw accurate conclusions from complex information.

Conducting a comparative study using PyTorch, we achieved a 20% enhancement in overall model performance and a 10% improvement in accuracy by implementing an ensemble model, surpassing individual multistage fine-tuned models.
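
One simple way to combine two classifiers into an ensemble is soft voting, which averages their per-class probabilities. The text does not say how the Mistral-7B and BERT outputs were actually combined, so the sketch below is an assumption: the function name `soft_vote`, the equal default weights, and the probability values are all made up for illustration.

```python
from typing import List, Optional, Sequence


def soft_vote(
    model_probs: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> List[float]:
    """Weighted average of per-class probabilities from several models."""
    if weights is None:
        weights = [1.0] * len(model_probs)  # assumed: equal say per model
    total = sum(weights)
    n_classes = len(model_probs[0])
    return [
        sum(w * probs[c] for w, probs in zip(weights, model_probs)) / total
        for c in range(n_classes)
    ]


# Trait names come from the project description; the numbers are invented.
LABELS = ["terrorism", "supremacism", "suicidal thoughts"]
mistral_probs = [0.7, 0.2, 0.1]
bert_probs = [0.4, 0.5, 0.1]

combined = soft_vote([mistral_probs, bert_probs])
predicted = LABELS[max(range(len(combined)), key=combined.__getitem__)]
```

In this made-up example the two models disagree on the top class, and the averaged distribution settles the disagreement in favor of the more confident model.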

Enhancing Content Categorization Systems

Location: Amherst, USA

Date: November 2023

Engineered a personalized content recommendation system by integrating the LaMP architecture with a fine-tuned Flan-T5-base language model.

Curated a diverse dataset by combining LaMP News Categorization, AG News, and Book Depository datasets, optimizing for both diversity and computational efficiency.

Achieved an 8% increase in model accuracy and a 0.1 increase in F1 score by integrating the Book Depository dataset, showing that the model adapts to content categorization across varied domains.

Generating Abstract Art from Hand-Drawn Sketches using GANs

Location: Bangalore, India

Date: June 2022

Implemented CGAN, CycleGAN, and Pix2Pix models to transform hand-drawn sketches into abstract art.

Evaluated image sharpening techniques, including high-boost and Laplacian filters, to enhance model performance.

Developed a filter suitable for platforms like Instagram, offering a new avenue for artistic expression.
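
High-boost filtering, one of the sharpening techniques evaluated above, adds a scaled copy of the detail layer (original minus blurred) back onto the image. The sketch below is a plain-Python illustration on a tiny grayscale grid, not the project's code: the 3x3 box blur, the boost factor `k`, and the sample pixel values are assumptions.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel intensities


def box_blur(image: Image) -> Image:
    """3x3 mean blur; border pixels reuse the nearest edge value."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)  # clamp to the border
                    xx = min(max(x + dx, 0), w - 1)
                    total += image[yy][xx]
            out[y][x] = total / 9.0
    return out


def high_boost(image: Image, k: float = 1.5) -> Image:
    """Return image + k * (image - blurred), clipped to [0, 255]."""
    blurred = box_blur(image)
    return [
        [min(255.0, max(0.0, p + k * (p - b))) for p, b in zip(row, blur_row)]
        for row, blur_row in zip(image, blurred)
    ]


# Hypothetical 3x4 image with a vertical edge between dark and bright halves.
step_edge = [[50.0, 50.0, 150.0, 150.0] for _ in range(3)]
sharpened = high_boost(step_edge)
```

On the step edge, pixels on the dark side of the boundary get darker and pixels on the bright side get brighter, while flat regions are left unchanged. With `k = 1` this reduces to ordinary unsharp masking.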

Publications

Generating Abstract Art from Hand-Drawn Sketches using GANs

Publication: Springer

Date: June 2023

Authors: Chakrabarty, S., Johnson, R.F., Rashmi, M., Raha, R.

This paper explores the use of Generative Adversarial Networks (GANs) to create abstract art from hand-drawn sketches. It was published in the Proceedings of the International Joint Conference on Advances in Computational Intelligence (IJCACI 2022) and provides insights into the intersection of art and artificial intelligence.

Predicting the Number of New Cases of COVID-19 in India

Publication: IEEE Xplore

Date: December 2021

Authors: A. S., Johnson, R. F., R. k. N., M. T R., and V. V.

This study utilizes Survival Analysis and Long Short-Term Memory (LSTM) models to predict the number of new COVID-19 cases in India. Presented at the 2021 Fifth International Conference on I-SMAC (IoT in Social, Mobile, Analytics, and Cloud), it introduces a new metric that significantly enhances the model's accuracy, offering a comprehensive analysis of pandemic trends using advanced computational methods.
