GitHub - notnews/msnbc_transcripts: MSNBC Transcripts: 2003--2022

MSNBC Transcripts: 2010--2022

We scraped https://www.msnbc.com/transcripts to get all the transcripts from 2010--2022. The number of transcripts per year:

year	 n_transcripts
2010      43
2011     115
2012     205
2013     175
2014     217
2015     986
2016     907
2017    1185
2018    1468
2019    1475
2020    1286
2021    1476
2022     131

When I scraped again in 03/2025, I got the following counts by year (so the new data are essentially for 2022):

year	 n_transcripts
2017       2
2020     703
2021    1479
2022    1156
2023      52
2024      48
2025      11
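
To see which years the 03/2025 scrape adds to, the two tables above can be compared directly. This is a minimal sketch that uses only the counts listed above; the variable names are illustrative and are not part of the repository's scripts.

```python
# Counts per year, copied from the two tables above.
earlier = {
    2010: 43, 2011: 115, 2012: 205, 2013: 175, 2014: 217,
    2015: 986, 2016: 907, 2017: 1185, 2018: 1468, 2019: 1475,
    2020: 1286, 2021: 1476, 2022: 131,
}
scrape_2025 = {
    2017: 2, 2020: 703, 2021: 1479, 2022: 1156,
    2023: 52, 2024: 48, 2025: 11,
}

# Keep only the years where the 2025 scrape has more transcripts
# than the earlier scrape, and record the size of the gain.
gains = {
    year: n - earlier.get(year, 0)
    for year, n in scrape_2025.items()
    if n > earlier.get(year, 0)
}

for year, gain in sorted(gains.items()):
    print(year, gain)
```

By this count, 2022 accounts for 1025 of the 1139 additional transcripts. The comparison uses yearly totals only; it does not check whether individual transcripts overlap between the two scrapes.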

Scripts

Scrape
Quick Peek
[Upload to Dataverse](scripts/upload_to_dataverse.ipynb)

Data

The final data posted on the Harvard Dataverse also include 16k transcripts spanning 2003--2014 that were scraped earlier. The data scraped in 2025 are stored in msnbc_transcripts_2022.csv.gz.

The data are posted at: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi%3A10.7910%2FDVN%2FUPJDE1
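
Once a file such as msnbc_transcripts_2022.csv.gz has been downloaded, a per-year tally like the ones above could be reproduced along these lines. This is a sketch under assumptions: the column name `date` is hypothetical (check the file's header row and adjust it), and the year is taken to be the first four-digit run in that column.

```python
import csv
import gzip
import re
from collections import Counter


def count_by_year(path, date_column="date"):
    """Count rows per year in a gzipped CSV file.

    `date_column` is a hypothetical column name, not confirmed by the
    README; pass the actual name found in the file's header.
    """
    # Transcript text can be longer than the csv module's default
    # field size limit, so raise the limit before reading.
    csv.field_size_limit(2**31 - 1)

    counts = Counter()
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        for row in csv.DictReader(handle):
            # Take the first four-digit run in the date value as the year.
            match = re.search(r"\d{4}", row.get(date_column) or "")
            if match:
                counts[int(match.group())] += 1

    # Return the tally ordered by year.
    return dict(sorted(counts.items()))
```

Calling `count_by_year("msnbc_transcripts_2022.csv.gz")` would return a dictionary mapping each year to its number of rows. Rows whose date value contains no four-digit year are skipped rather than counted.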
