← Back to Projects

Early Warning of Ukraine War Escalation via Social Media Analysis

A quantitative study on using grassroots social media discourse as a leading indicator for military conflict escalation, providing predictive power ahead of mainstream news.

Python LLM APIs (OpenAI, Anthropic) Pandas Time-Series Analysis Jupyter

Executive Summary

This project investigated whether grassroots social media discourse could serve as a leading indicator for military escalation in the Ukraine War. The primary hypothesis was that platforms with direct, unfiltered communication from conflict zones would reflect escalatory events before they are processed by traditional media. Using a novel LLM-based scoring system to quantify "escalation" across Telegram and news headlines, the analysis employed time-series techniques to identify lead-lag relationships.

The core finding is that discourse on Telegram leads mainstream news headlines in reporting escalatory events by approximately 24 hours, with a statistically significant peak correlation (r=0.38, p<.01). This suggests that monitoring specific social media channels can provide a critical early warning advantage for humanitarian, journalistic, and governmental organizations.

Key Findings

Telegram Leads the News by 24 Hours

The central hypothesis was confirmed. Cross-correlation analysis revealed that escalatory topics on Telegram, on average, precede their coverage in mainstream news by one full day. This is likely attributable to the direct, unmediated reports from soldiers and civilians on the ground, which bypass traditional editorial gatekeeping.

Escalation Score Over Time

This chart shows the complete 3-year evolution of the calculated escalation scores across all data sources, visualized as 7-day rolling averages.

Codex & Methodology

The project involved a multi-stage data pipeline to collect, process, and analyze the text data. The repository orchestrates LLM-powered scoring of Ukraine war headlines and social-media posts to quantify escalation trends and compare grassroots narratives with mainstream coverage.


# 01_fetch_headlines.ipynb
# Uses NewsAPI to fetch all Ukraine-related headlines day by day.
# Sends each headline through an OpenAI batch job that replies “YES”/“NO” 
# to keep only relevant headlines.

# 02_score_headlines.ipynb
# Builds JSONL tasks of daily headlines and scores them 0–10 using an 
# OpenAI model with a custom rubric.

# 03_analyze_scores.ipynb
# Loads scored CSVs, computes daily and weekly aggregates, and saves figures.
# Also analyzes and visualizes Truth Social scores using similar procedures.

# 05_scrape_telegram.ipynb
# Generates a sampling schedule (baseline plus ±14 days around major events).
# Defines Telegram channels, collects posts via Telethon, and stores them in SQLite.
                    

The full repository contains notebooks for scraping, scoring with multiple LLMs (GPT-4o, Claude family), statistical analysis, and visualization. A key innovation was the multi-dimensional scoring rubric, which evaluated not just escalation but also blame attribution and propaganda intensity.

Project Assets