ANEP: Accurate Name Extraction from News Video Graphics

This repository contains the full implementation of ANEP (Accurate Name Extraction Pipeline), a hybrid Deep Learning (DL) and Generative AI (GenAI) system for extracting personal names from graphical overlays in broadcast and social media news videos.

📚 Table of Contents

Project Overview
Key Features
Dataset & Model (Roboflow)
Getting Started
Dissertation
Contribution
License
Acknowledgments
Contact

📌 Project Overview

In today’s fast-paced digital news ecosystem, crucial information, such as the names of individuals featured in stories, is often displayed visually through broadcast graphics rather than spoken aloud. These elements appear in the form of lower-thirds, tickers, headlines, and other on-screen text overlays. However, their inconsistent styles, short display times, and frequent visual clutter make automated extraction of names a highly challenging task.

This project addresses that challenge through a novel, two-pronged solution for accurate name extraction from news video graphics:

ANEP Pipeline
A custom-built pipeline that integrates:
- YOLOv12 for detecting news-related graphical elements (e.g. headlines, tickers, etc.).
- Tesseract OCR with advanced preprocessing (CLAHE, thresholding, de-noising) to extract text from detected regions.
- Transformer-based Named Entity Recognition (NER) models (e.g. BERT, spaCy + GLiNER) to identify and validate personal names in noisy OCR output.
- Clustering techniques to consolidate name variants and generate structured appearance timelines.
GenAI Pipelines
Parallel pipelines built using:
- Google Cloud Vision API for high-accuracy OCR,
- Gemini 1.5 Pro and LLaMA 4 Maverick, two powerful large multimodal models capable of extracting and reasoning over names directly from video frames.
These models are evaluated as alternatives to classical CV-NLP pipelines, with a focus on name extraction accuracy, runtime performance, and robustness to visual noise.

By combining traditional deep learning with GenAI, this project contributes a system for extracting names from video media that can run either as an explainable CV-NLP pipeline or as a faster multimodal one, with direct applications in media monitoring, automated news summarisation, and AI-based fact-checking.
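The step that consolidates name variants can be illustrated with a minimal sketch using only Python's standard library. It assumes a greedy single-pass strategy, a similarity score defined as the larger of a `difflib` character ratio and a token-level Jaccard score, and a threshold of 0.85. None of these choices are taken from the repository, the function names are hypothetical, and the embedding-based similarity listed under Key Features is left out.

```python
from difflib import SequenceMatcher


def normalise(name: str) -> str:
    """Lower-case and collapse whitespace so casing differences do not matter."""
    return " ".join(name.lower().split())


def similarity(a: str, b: str) -> float:
    """Return the larger of a character-level ratio and a token-level Jaccard score."""
    fuzzy = SequenceMatcher(None, a, b).ratio()
    tokens_a, tokens_b = set(a.split()), set(b.split())
    union = tokens_a | tokens_b
    jaccard = len(tokens_a & tokens_b) / len(union) if union else 0.0
    return max(fuzzy, jaccard)


def cluster_names(names: list[str], threshold: float = 0.85) -> list[list[str]]:
    """Greedily assign each name to the first cluster whose seed is similar enough."""
    clusters: list[list[str]] = []
    for name in names:
        for cluster in clusters:
            # Compare against the first member (the seed) of each cluster.
            if similarity(normalise(name), normalise(cluster[0])) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])  # no match: start a new cluster
    return clusters


if __name__ == "__main__":
    ocr_names = ["Maria Borg", "Maria Borq", "MARIA BORG", "John Camilleri"]
    for group in cluster_names(ocr_names):
        print(group)
```

Comparing only against each cluster's seed keeps the sketch short; a production system would more likely compare against every member or use a proper clustering algorithm.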

ANEP Architecture Overview

%%{init: {
  "themeVariables": {
    "fontSize":       "16px",
    "edgeLabelFontSize": "14px",
    "edgeLabelColor": "#37474F"
  }
}}%%

flowchart TB
  %% darker text shades on same fills
  classDef user      fill:#BBDEFB,stroke:#1976D2,stroke-width:2px,color:#0D47A1;
  classDef process   fill:#C8E6C9,stroke:#2E7D32,stroke-width:2px,color:#1B5E20;
  classDef datastore fill:#FFECB3,stroke:#FFA000,stroke-width:2px,color:#EF6C00;

  %% nodes
  User[User]:::user
  SM((Select Model)):::process
  UV((Upload Video)):::process
  D1[(D1: Uploaded Video)]:::datastore
  CS((Confirm Settings)):::process
  RA((Run Analysis)):::process
  Backend[Backend API]:::user
  D3[(D3: NGD)]:::datastore
  D2[(D2: Analysis Results)]:::datastore
  VR((View Results)):::process

  %% flows
  User -->|Model selection| SM
  User -->|Video file| UV

  UV -->|Video + metadata| D1

  D1 -->|Video metadata| CS
  SM -->|Selected model ID| CS

  CS -->|Confirmed settings| RA
  D1 -->|Video file| RA

  RA -->|Video + model ID| Backend

  Backend -->|Training/inference data| D3
  Backend -->|Processed results| D2

  D2 -->|Extracted names,<br>timestamps,<br>confidence scores| VR
  Backend -->|Log/progress stream| VR

  User -->|Downloaded results| VR
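
The flowchart labels the data passed from D2 to View Results as extracted names, timestamps, and confidence scores. As a rough illustration of how such records could be folded into per-name appearance timelines, the sketch below uses a hypothetical `Detection` record and a `max_gap` parameter. Both are assumptions made for illustration and do not describe the repository's actual schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """Hypothetical per-frame result: one name seen at one timestamp."""
    name: str
    timestamp: float   # seconds from the start of the video
    confidence: float  # 0.0 to 1.0


def build_timeline(detections, max_gap=2.0):
    """Merge each name's timestamps into (start, end) intervals.

    Two sightings of the same name belong to one interval when they are
    at most `max_gap` seconds apart.
    """
    by_name = defaultdict(list)
    for det in detections:
        by_name[det.name].append(det.timestamp)

    timeline = {}
    for name, stamps in by_name.items():
        stamps.sort()
        intervals = [[stamps[0], stamps[0]]]
        for t in stamps[1:]:
            if t - intervals[-1][1] <= max_gap:
                intervals[-1][1] = t      # extend the current appearance
            else:
                intervals.append([t, t])  # start a new appearance
        timeline[name] = [tuple(iv) for iv in intervals]
    return timeline
```

With sightings of one name at 1.0 s, 2.0 s, and 10.0 s, the first two merge into a single interval and the third starts a new one.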

🔎 Key Features

Intelligent Frame Sampling & Deduplication
Processes long videos efficiently by using DCT-based perceptual hashing and ORB features to identify and retain only visually distinct frames, reducing redundancy while preserving key content.
YOLOv12-based Graphic Detection
Fine-tuned YOLOv12 model trained on a custom dataset detects six distinct classes of broadcast graphics: Breaking News, Digital On-Screen Graphics, Lower Thirds, Headlines, News Tickers, and Other Graphics.
Custom Annotated Dataset: NGD (News Graphics Dataset)
Purpose-built dataset containing 1,500+ annotated frames sourced from local and international news videos, labelled with six classes: Breaking News, Digital On-Screen Graphics, Lower Thirds, Headlines, News Tickers, and Other Graphics.
OCR with Adaptive Preprocessing
Applies multi-method image preprocessing (CLAHE, thresholding, morphological operations, noise reduction) to maximise text clarity prior to recognition. Tesseract OCR is used with confidence scoring.
Named Entity Recognition (NER)
Combines spaCy with GLiNER (for zero-shot multilingual NER) and a fine-tuned Transformer model to identify real-world person names from noisy OCR text. Includes heuristic and linguistic validation.
Name Clustering & Deduplication
Clusters name variants using fuzzy string matching, token-based distance (Jaccard), and embedding-based cosine similarity to generate accurate, canonical name lists and appearance timelines.
GenAI Integration
Alternative pipelines using:
- Google Cloud Vision API + Gemini 1.5 Pro
- LLaMA 4 Maverick via OpenRouter
These systems extract names directly from video frames using multimodal reasoning and structured prompts.
Survey Dashboard & Evaluation Metrics
Includes a dedicated visualisation dashboard for survey findings on news consumption trends. Evaluation metrics include precision, recall, F1-score, and runtime comparisons across pipelines.
Progressive Web App (PWA)
Fully featured frontend built with React, Tailwind CSS, and Vite. Provides a clean, step-by-step UI for uploading videos, selecting models, and visualising extracted results.
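
To make the frame deduplication feature more concrete, here is a pure-Python sketch of one common DCT-based perceptual hash. It is an illustration under stated assumptions, not the repository's implementation: the function names, the 8×8 hash size, and the `max_distance` threshold are hypothetical, frames are assumed to be square grayscale grids given as nested lists, and ORB is not covered.

```python
import math
from statistics import median


def dct_1d(values):
    """Unnormalised DCT-II of a 1-D sequence."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i, v in enumerate(values))
        for k in range(n)
    ]


def dct_2d(grid):
    """Separable 2-D DCT: transform rows, then columns."""
    rows = [dct_1d(row) for row in grid]
    cols = [dct_1d(col) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def phash(gray, hash_size=8):
    """Hash a square grayscale frame from its low-frequency DCT block."""
    coeffs = dct_2d(gray)
    # Rounding suppresses floating-point noise so flat regions hash consistently.
    low = [round(coeffs[v][u], 6) for v in range(hash_size) for u in range(hash_size)]
    low = low[1:]  # drop the DC term, which only encodes overall brightness
    mid = median(low)
    return [1 if c > mid else 0 for c in low]


def hamming(a, b):
    """Count the positions at which two hashes differ."""
    return sum(x != y for x, y in zip(a, b))


def keep_distinct(frames, max_distance=4):
    """Return indices of frames whose hash differs from every kept frame."""
    kept, hashes = [], []
    for idx, frame in enumerate(frames):
        h = phash(frame)
        if all(hamming(h, other) > max_distance for other in hashes):
            kept.append(idx)
            hashes.append(h)
    return kept
```

In practice each frame would first be converted to grayscale and downscaled (for example to 32×32) before hashing, and a vectorised DCT from a numerical library would replace the nested loops.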

🎯 Object Detection Performance (YOLO Models)

Model	Precision	Recall	mAP@0.5	mAP@0.5:0.95	Epochs	Type
YOLOv12(m) 🥇	`93.9%`	`93.5%`	`95.8%`	`88.7%`	102	Local
YOLOv8(m)	`92.6%`	`86.9%`	`93.7%`	`75.2%`	47	Local
YOLOv12(n) 🥈	`91.6%`	`90.8%`	`93.8%`	`85.4%`	120	Cloud
YOLOv11(n)	`91.2%`	`90.4%`	`93.1%`	`84.9%`	100	Cloud
YOLOv12(n) Reflect	`91.4%`	`85.7%`	`91.8%`	`80.4%`	72	Cloud
YOLO-NAS(n)	`85.1%`	`84.3%`	`91.0%`	`61.0%`	51	Cloud

🔍 Name Extraction Performance

Pipeline	Speed	Status
GVA + Gemini 1.5 🥇	94.68s ⚡	Production
ANEP Pipeline 🥈	542.15s 🐢	Explainable
LLaMA 4 Maverick 🥉	140.18s ⏱️	Experimental
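
Besides runtime, the pipelines are compared on precision, recall, and F1-score. As a reminder of how name-level scores can be computed, the sketch below compares predicted and ground-truth name sets by exact match after case-folding. The matching rule and the function name `name_metrics` are assumptions, not the dissertation's evaluation protocol.

```python
def name_metrics(predicted, expected):
    """Return (precision, recall, f1) for two collections of names."""
    pred = {p.casefold().strip() for p in predicted}
    gold = {g.casefold().strip() for g in expected}
    true_pos = len(pred & gold)  # names found by the pipeline and present in the ground truth
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

For example, if a pipeline returns three names of which two are correct, and the ground truth holds four names, precision is 2/3, recall is 1/2, and F1 is 4/7.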

📈 Performance Overview

graph LR
    A[Speed] -->|94.68s| B[GVA + Gemini]
    C[Accuracy] -->|82.22%| B
    D[Explainability] -->|High| E[ANEP]
    F[Balance] -->|68.10%| E
    G[Simplicity] -->|55.56%| H[LLaMA 4]
    I[Cost] -->|Low| H

    style B fill:#2ECC71,stroke:#27AE60,stroke-width:2px,color:#FFF
    style E fill:#F39C12,stroke:#E67E22,stroke-width:2px,color:#FFF
    style H fill:#E74C3C,stroke:#C0392B,stroke-width:2px,color:#FFF

🔐 Ethics & Data Usage

All data used in the NGD is sourced from publicly available news footage under fair use for research purposes.
No private personal data is collected or stored.
The system is NOT intended for surveillance or use in sensitive political contexts.

📊 Dataset & Model (Roboflow)

Explore the News Graphics Dataset (NGD) and experiment with the fine-tuned YOLOv12 model directly on Roboflow.

🚀 Getting Started

Prerequisites

Python 3.10+
Node.js 12+
CUDA-capable GPU (recommended)

Clone Repository

git clone https://github.com/AFLucas-UOM/Accurate-Name-Extraction
cd Accurate-Name-Extraction

Backend Configuration (`config.json`)

To enable the GenAI-based pipelines, create a `config.json` file inside the `6. GenAI API/` folder:

{
  "google_cloud_vision_api_key": "your-google-vision-api-key",
  "google_gemini_api_key": "your-gemini-api-key",
  "openrouter_api_key": "your-openrouter-api-key"
}

⚠️ Important: Never commit your API keys to GitHub.
Ensure that config.json is added to your .gitignore to keep sensitive credentials secure.
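
For reference, a loader for this file might look like the following. This is a minimal sketch: the function name `load_config` and its validation behaviour are assumptions, while the three key names come from the example above.

```python
import json
from pathlib import Path

# Key names taken from the config.json example in this README.
REQUIRED_KEYS = (
    "google_cloud_vision_api_key",
    "google_gemini_api_key",
    "openrouter_api_key",
)


def load_config(path):
    """Read config.json and fail early if a key is missing or still a placeholder."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    problems = [
        key for key in REQUIRED_KEYS
        if not config.get(key) or str(config[key]).startswith("your-")
    ]
    if problems:
        raise ValueError(f"Missing or placeholder keys in {path}: {', '.join(problems)}")
    return config
```

Failing at load time gives a clearer error than a rejected API request later in a long video analysis run.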

🎓 Dissertation

The full dissertation, containing methodology, evaluation, and survey results, is included in the `7. Documentation/` folder.

📄 Download PDF

📘 Citation

If you use the News Graphic Dataset (NGD) or the ANEP pipeline in your research, please cite the following:

📂 News Graphic Dataset (NGD)

@dataset{news_graphic_dataset,
  title     = {News Graphic Dataset (NGD)},
  type      = {Open Source Dataset},
  author    = {Andrea Filiberto Lucas and Dylan Seychell},
  year      = {2025},
  publisher = {Roboflow},
  howpublished = {\url{https://universe.roboflow.com/ict3909-fyp/news-graphic-dataset}},
  url       = {https://universe.roboflow.com/ict3909-fyp/news-graphic-dataset}
}

🎓 Dissertation

@thesis{lucas2025anep,
  title     = {Accurate Name Extraction from News Video Graphics},
  author    = {Andrea Filiberto Lucas and Dylan Seychell},
  year      = {2025},
  school    = {University of Malta},
  type      = {B.Sc. (Hons.) Dissertation}
}

✨ Contribution

Contributions to improve the code, add new features, or optimize model performance are welcome! Fork the repository, make your changes, and submit a pull request.

🪪 License

This project is licensed under the MIT License. See the LICENSE file for details.

🙏🏻 Acknowledgments

This project was developed as part of the ICT3909 Final Year Project course at the University of Malta, and submitted in partial fulfilment of the requirements for the B.Sc. (Hons.) in Information Technology (Artificial Intelligence). Supervised by Dr. Dylan Seychell.

✉️ Contact

For questions, collaboration, or feedback, please contact Andrea Filiberto Lucas.
