Research on the Application of Computational Linguistics-Based Authorship Analysis in Military Intelligence
Doctor of Philosophy (PhD) in Linguistics: Successfully Defended
Section Purpose: This section gives a high-level executive summary of the successfully defended doctoral research by Dr. Ang Li (currently PI at Phaenarete ASI Lab). It outlines the core problem, the sponsoring entity, and the primary objective the thesis addressed, and introduces the intelligence gap this research closes.
Abstract Summary
Traditional authorship analysis falls short in military intelligence contexts, where communications are ultra-short and intentionally disguised by adversaries. This thesis developed and validated a novel framework that bridges traditional stylometry and modern deep learning to reliably attribute highly obfuscated short-text intercepts.
Project Metadata
- Final Length: ~52,000 words (Complete)
- Institution: The University of Edinburgh
- Sponsorship: MI6 (Secret Intelligence Service, UK)
- Core Output: Validated Authorship Analysis Framework
The Intelligence Gap
Section Purpose: This section defines the specific problems analysts face when dealing with intercepted communications. It highlights the dual challenge of "Short Texts" and "Adversarial Disguise" and explains why current commercial and academic models fail in operational intelligence environments.
The Short-Text Problem
Military commands, forum posts, and intercepted messages typically run to only 50 to 200 words. Traditional stylometry relies on large corpora to build an "idiolect" profile of each author, so its accuracy degrades rapidly when applied to such micro-texts.
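To see why short texts are hard, consider a common stylometric baseline: a character trigram profile compared by cosine similarity. This is an illustrative sketch that assumes this particular feature set and similarity measure; the thesis may use different ones, and the function names are hypothetical.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams in lower-cased text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two n-gram count profiles (0.0 if either is empty)."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# A short message yields only len(text) - n + 1 trigram tokens, most of them seen once,
# so its profile is a sparse, noisy sample compared with a long reference corpus.
message = "move the package to the north gate at dawn"
profile = char_ngrams(message)
print(sum(profile.values()), len(profile))
```

A message of L characters contributes only L - 2 trigram tokens, so its profile is a thin sample to match against an author profile built from thousands of words.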
Adversarial Disguise
Targets are aware of interception and employ synonym substitution, syntactic alterations, and zero-width characters to spoof or hide their identity. Standard Transformer models trained on clean data overfit to it and degrade severely on disguised inputs.
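The zero-width tactic is purely mechanical and can be neutralised in preprocessing before any model sees the text. A minimal sketch using Python's standard `unicodedata` module; the function name `normalize_intercept` and the chosen set of code points are illustrative assumptions, not taken from the thesis.

```python
import unicodedata

# Illustrative set of invisible code points often used to split or pad tokens:
# ZERO WIDTH SPACE, ZERO WIDTH NON-JOINER, ZERO WIDTH JOINER, WORD JOINER,
# and ZERO WIDTH NO-BREAK SPACE (BOM).
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}


def normalize_intercept(text: str) -> str:
    """Hypothetical preprocessing step: fold compatibility variants, drop zero-width characters."""
    # NFKC maps compatibility forms (e.g. fullwidth letters) to their standard equivalents.
    text = unicodedata.normalize("NFKC", text)
    return "".join(ch for ch in text if ch not in ZERO_WIDTH)


print(normalize_intercept("at\u200btack at \uff24\uff21\uff37\uff2e"))  # attack at DAWN
```

Synonym substitution and syntactic alteration cannot be undone this way, so model-level robustness is still required. Stripping ZWJ and ZWNJ is also lossy for scripts such as Persian and for emoji sequences, where these characters carry meaning, so a deployed filter would need to be script-aware.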
The HSTAR Framework
Section Purpose: This section breaks down the technical solution developed in the thesis: the Hybrid Stylometric-Transformer with Adversarial Resilience (HSTAR). The architecture covers the full model, from feature engineering through to the explainability layer.
HSTAR Architecture Flow
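One simple way to picture a hybrid stylometric-Transformer model is late fusion: score each candidate author by a weighted sum of a stylometric similarity and an embedding similarity. This is a hypothetical sketch, assuming precomputed stylometric vectors and Transformer embeddings and a single mixing weight `alpha`; it is not HSTAR's actual fusion mechanism, which this summary does not specify, and it omits the adversarial-resilience and explainability components.

```python
import math
from typing import Dict, Sequence, Tuple

Vector = Sequence[float]


def vec_cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two dense vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def fused_scores(
    query_style: Vector,
    query_emb: Vector,
    candidates: Dict[str, Tuple[Vector, Vector]],
    alpha: float = 0.5,
) -> Dict[str, float]:
    """Late fusion: alpha * stylometric similarity + (1 - alpha) * embedding similarity.

    `candidates` maps each author to (stylometric feature vector, Transformer embedding).
    """
    return {
        author: alpha * vec_cosine(query_style, style)
        + (1 - alpha) * vec_cosine(query_emb, emb)
        for author, (style, emb) in candidates.items()
    }


# Toy two-dimensional vectors, for illustration only.
known = {"author_a": ([1.0, 0.0], [0.9, 0.1]), "author_b": ([0.0, 1.0], [0.1, 0.9])}
scores = fused_scores([0.8, 0.2], [1.0, 0.0], known)
print(max(scores, key=scores.get))  # author_a
```

In practice `alpha` would be tuned on held-out data. Fusing scores rather than raw features keeps each signal inspectable on its own, at the cost of not learning interactions between them.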
Experimental Results (Simulated)
Section Purpose: This section presents the quantitative evaluation of the thesis, showing how the HSTAR framework outperformed traditional baselines (an SVM and a standard BERT model) across the two main challenge areas: text length and adversarial disguise.
Short-Text Attribution Accuracy
Context: Accuracy comparison across different document lengths.
Model Robustness Under Adversarial Disguise
Context: How much does accuracy drop when targets actively try to hide their identity? A smaller drop indicates greater robustness.
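The summary does not state whether the accuracy drop is absolute or relative. This sketch assumes an absolute drop in percentage points, measured on the same messages before and after disguise; the function names and toy labels are illustrative only, not thesis data.

```python
from typing import Hashable, Sequence


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of messages attributed to the correct author."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("need equal-length, non-empty label sequences")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def robustness_drop(y_true, pred_clean, pred_disguised) -> float:
    """Absolute accuracy drop, in percentage points, from clean to disguised versions
    of the same messages (smaller means more robust)."""
    return 100 * (accuracy(y_true, pred_clean) - accuracy(y_true, pred_disguised))


# Toy labels for four messages, for illustration only.
truth = ["a", "b", "c", "d"]
print(robustness_drop(truth, ["a", "b", "c", "d"], ["a", "b", "x", "d"]))  # 25.0
```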
Operational Integration & Impact
Section Purpose: A successful PhD must demonstrate a novel, significant contribution. This section outlines how Dr. Li's research translates academic theory into practical, deployable tools for MI6 and GCHQ workflows.
Real-Time Triage
HSTAR was validated for automatic flagging of high-probability target matches in streaming, short-burst intercepted communications.
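Triage of this kind can be sketched as a lazy filter over the intercept stream that yields only messages whose best candidate match clears a probability threshold. This is a hypothetical sketch: `score_fn` stands in for a trained attribution model, `stub_model` is a toy, and the 0.9 threshold is arbitrary; a real deployment would calibrate the threshold against an acceptable false-positive rate.

```python
from typing import Callable, Dict, Iterable, Iterator, Tuple

# Hypothetical signature: a trained attribution model mapping a message to
# per-candidate match probabilities.
ScoreFn = Callable[[str], Dict[str, float]]


def triage(
    stream: Iterable[str], score_fn: ScoreFn, threshold: float = 0.9
) -> Iterator[Tuple[str, str, float]]:
    """Lazily yield (message, best_candidate, probability) for high-probability matches."""
    for message in stream:
        scores = score_fn(message)
        if not scores:
            continue
        candidate, prob = max(scores.items(), key=lambda item: item[1])
        if prob >= threshold:
            yield message, candidate, prob


# Stub model for illustration only.
def stub_model(message: str) -> Dict[str, float]:
    return {"target_1": 0.95 if "gate" in message else 0.2, "target_2": 0.1}


for hit in triage(["meet at the north gate", "weather is fine"], stub_model):
    print(hit)  # ('meet at the north gate', 'target_1', 0.95)
```

Because `triage` is a generator, it handles one message at a time and never buffers the whole stream.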
Explainable Intelligence
The SHAP (SHapley Additive exPlanations) layer gave analysts human-readable reasoning for each attribution, which proved essential for building legally sound intelligence dossiers.
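SHAP explains an individual prediction by assigning each input feature a Shapley value: its weighted average marginal contribution to the model output over all coalitions of the other features. A minimal sketch, assuming only a few features and a simple value function that replaces "absent" features with a baseline; the scorer and its three feature names are hypothetical stand-ins, not HSTAR's actual features. Exact enumeration is exponential in the number of features, which is why the shap library relies on approximations or model-specific algorithms such as Kernel SHAP and Tree SHAP.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(
    f: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley values of f at x; features outside a coalition take their baseline value."""
    n = len(x)

    def value(coalition: set) -> float:
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):  # coalition sizes 0 .. n-1 drawn from the other features
            for subset in combinations(others, size):
                s = set(subset)
                weight = factorial(len(s)) * factorial(n - len(s) - 1) / factorial(n)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical linear scorer over three stylometric features
# (mean word length, punctuation rate, function-word rate).
def score(z: Sequence[float]) -> float:
    return 0.5 * z[0] + 2.0 * z[1] - 1.0 * z[2]


phi = shapley_values(score, x=[4.8, 0.12, 0.40], baseline=[4.5, 0.10, 0.45])
print([round(p, 3) for p in phi])  # [0.15, 0.04, 0.05]
```

The values sum to score(x) - score(baseline) (the efficiency property), so each one reads as that feature's share of the score shift; for a linear scorer each value reduces to weight × (x - baseline).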
Ongoing Lab Work
Under Dr. Li's direction at Phaenarete ASI Lab, the framework is being expanded to cross-lingual attribution and wider network metadata analysis.
Thesis Format & Compliance
Section Purpose: Confirms alignment with the University of Edinburgh's strict formatting and submission guidelines, while maintaining the confidentiality required by the research sponsor during the final defense.
- Electronic Submission: In line with University policy, the final approved thesis was submitted digitally via ERA (Edinburgh Research Archive).
- Sensitivity Handling: The final title page bears the mandated "Restricted Access - Approved by University Committee" note. No real classified intelligence data was used; all data was synthetic or open-source.
- Bibliography & Contents: The bibliography uses consistent author-date formatting (APA 7th edition). A full table of contents, including lists of tables and figures, follows the abstract, strictly adhering to the formatting guidance document.