SynthCoT-RAG
2024 | GitHub
A pipeline for improving classification using synthetic chain-of-thought reasoning and retrieval-augmented generation.
This project began as part of my submission for SemEval-2025 Task 11: Bridging the Gap in Text-Based Emotion Detection. The final pipeline included an ensemble with a graph neural network inspired by EmoGraph. Unfortunately, I did not have time to create a proper submission, nor did I have any paper-writing experience at the time. The original competition repository can be found here.
My interest in the project was renewed after Google DeepMind released AlphaEvolve in 2025. AlphaEvolve follows a similar strategy of using LLMs to generate reasoning that improves performance on tasks. However, it implements a MAP-Elites-style evolutionary algorithm to improve model performance.
Overview
- Take a labeled training dataset (CSV format)
- Generate synthetic chain-of-thought explanations for each training example using an LLM
- Encode and store examples + reasoning in a FAISS vector database
- At inference time, retrieve the most semantically similar examples as few-shot context
- Prompt an LLM with retrieved reasoning to predict emotion labels
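
The retrieval and prompting steps above can be sketched in a few lines. This is a minimal illustration, not the project's actual code: to stay dependency-free, a bag-of-words vector and a brute-force cosine search stand in for the sentence encoder and the FAISS index, the training examples and their reasoning strings are hand-written placeholders for LLM output, and the names `TRAIN`, `embed`, `retrieve`, and `build_prompt` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical training rows. In the pipeline these would be loaded from a
# CSV file and the "reasoning" field would be generated by an LLM.
TRAIN = [
    {"text": "I can't stop smiling today", "label": "joy",
     "reasoning": "Smiling signals a positive, happy state."},
    {"text": "I miss her so much it hurts", "label": "sadness",
     "reasoning": "Missing someone and hurting points to grief."},
    {"text": "How dare they lie to me", "label": "anger",
     "reasoning": "Indignation at being lied to indicates anger."},
]


def embed(text):
    """Stand-in encoder: lowercase bag-of-words counts (not a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, k=2):
    """Brute-force nearest-neighbour search (stand-in for a FAISS index lookup)."""
    q = embed(query)
    ranked = sorted(TRAIN, key=lambda ex: cosine(q, embed(ex["text"])), reverse=True)
    return ranked[:k]


def build_prompt(query, k=2):
    """Assemble a few-shot prompt from the retrieved examples and their reasoning."""
    blocks = [
        f"Text: {ex['text']}\nReasoning: {ex['reasoning']}\nLabel: {ex['label']}"
        for ex in retrieve(query, k)
    ]
    # The query comes last, leaving the reasoning and label for the LLM to complete.
    blocks.append(f"Text: {query}\nReasoning:")
    return "\n\n".join(blocks)


if __name__ == "__main__":
    print(build_prompt("they lie to me all the time", k=1))
```

The string returned by `build_prompt` would then be sent to whichever LLM performs the final prediction. Placing the retrieved reasoning before each label encourages the model to produce its own reasoning before committing to a label for the query.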
