Project Category: Partisianship_Messaging_Understanding
Mission: Develop news-source-specific article scrapers.
Developers: Roberto Vargas, PhD
Note: This is a generic template. Source-specific templates are currently private. Please email [email protected] for further inquiries.
Overview:
[Insert .py file]: This .py script is meant to be executed in parallel with other source-specific scrapers as a batch throughout the day. This template is designed to be run every 7 hours.
The first execution creates a temporary "day" file. With each subsequent execution, the day file is amended to include new article information.
The last execution of the .py file exports the day file to a master file for the specific news source.
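The day-file workflow described above could be sketched roughly as follows. This is a minimal illustration, not the private source-specific code: the CSV storage format, the helper names (read_rows, write_rows, update_day_file, export_day_file), and the choice to skip articles whose article_link is already in the day file are all assumptions made for the example.

```python
import csv
import os

# Column names taken from the field list in this README.
FIELDS = ["article_link", "header", "pub_date", "article_text"]


def read_rows(path):
    """Return the rows stored in a CSV file, or an empty list if it is missing."""
    if not os.path.exists(path):
        return []
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def write_rows(path, rows):
    """Overwrite the CSV file at path with a header line and the given rows."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)


def update_day_file(day_path, new_articles):
    """Create the day file on the first run, or amend it on later runs.

    Assumption: an article counts as "new" if its article_link is not
    already present in the day file.
    """
    rows = read_rows(day_path)
    seen = {row["article_link"] for row in rows}
    for article in new_articles:
        if article["article_link"] not in seen:
            rows.append({key: article[key] for key in FIELDS})
            seen.add(article["article_link"])
    write_rows(day_path, rows)
    return rows


def export_day_file(day_path, master_path):
    """On the last run of the day, append the day file to the master file
    and delete the temporary day file."""
    rows = read_rows(master_path) + read_rows(day_path)
    write_rows(master_path, rows)
    if os.path.exists(day_path):
        os.remove(day_path)
```

In this sketch, every scheduled run would call update_day_file with whatever the scraper collected, and only the final run of the day would also call export_day_file. How the script decides which run is the last one is left open here.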
The following information is stored from each article:
article_link: The URL for the article
header: The article header
pub_date: The publication date of the article
article_text: The full scraped text from the article
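As a rough illustration of the four stored fields, one article could be represented as a small record type. The class name Article, the use of plain strings for every field (including pub_date), and the placeholder values are assumptions for this sketch; they are not taken from the private scrapers.

```python
from dataclasses import dataclass, asdict


@dataclass
class Article:
    article_link: str   # URL of the article
    header: str         # article header
    pub_date: str       # publication date, kept as text in this sketch
    article_text: str   # full scraped text


# Placeholder values only; these are not real scraped data.
record = Article(
    article_link="https://example.com/news/placeholder",
    header="Placeholder headline",
    pub_date="2020-01-01",
    article_text="Placeholder body text.",
)

# Convert to a plain dict, e.g. for writing one row to a file.
row = asdict(record)
```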
ExampleOutput: