RobJavVar/PMU_sourceSpecificScraper_publicTemplate

PMU_sourceSpecificScraper_publicTemplate

Project Category: Partisianship_Messaging_Understanding
Mission: Develop news-source-specific article scrapers.
Developers: Roberto Vargas, PhD
Note: This is a generic template. Source-specific templates are currently private. Please email [email protected] for further inquiries.

Overview:
[Insert .py file]: This .py script is meant to be executed in parallel with other source-specific scrapers as a batch throughout the day. This template is designed to be run every 7 hours.

The first execution creates a temporary "day" file. With each subsequent execution, the day file is amended to include new article information.
The last execution of the .py file exports the day file to a master file for the specific news source.
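
The day-file workflow described above could be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the CSV format, the function names `append_to_day_file` and `export_to_master`, de-duplication by `article_link`, and deleting the day file after export are all assumptions made for the example.

```python
import csv
from pathlib import Path

# Column names taken from the README's list of stored fields.
FIELDS = ["article_link", "header", "pub_date", "article_text"]


def append_to_day_file(day_file: Path, articles: list[dict]) -> int:
    """Create the day file on first use, then append only unseen articles.

    Hypothetical helper: skips articles whose link is already in the day file.
    Returns the number of rows actually written.
    """
    seen = set()
    if day_file.exists():
        with day_file.open(newline="", encoding="utf-8") as f:
            seen = {row["article_link"] for row in csv.DictReader(f)}

    new_rows = []
    for article in articles:
        if article["article_link"] not in seen:
            seen.add(article["article_link"])
            new_rows.append(article)

    write_header = not day_file.exists()
    with day_file.open("a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if write_header:
            writer.writeheader()
        writer.writerows(new_rows)
    return len(new_rows)


def export_to_master(day_file: Path, master_file: Path) -> None:
    """Append the day file's rows to the master file, then remove the day file.

    Hypothetical helper: removing the temporary file is an assumption.
    """
    with day_file.open(newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))

    write_header = not master_file.exists()
    with master_file.open("a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if write_header:
            writer.writeheader()
        writer.writerows(rows)
    day_file.unlink()
```

In this sketch, every scheduled run would call `append_to_day_file`, and only the last run of the day would also call `export_to_master`.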

The following information is stored from each article:
article_link: The URL for the article
header: The article header
pub_date: The publication date of the article
article_text: The full scraped text from the article
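
One way to represent a single scraped article with the four fields listed above is a small dataclass. This is only a sketch: the class name `ArticleRecord`, the choice of plain strings for every field (including `pub_date`, whose format the README does not specify), and the sample values are assumptions.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ArticleRecord:
    """Hypothetical container for one scraped article."""
    article_link: str  # the URL for the article
    header: str        # the article header
    pub_date: str      # publication date, kept as a string (format is an assumption)
    article_text: str  # the full scraped text


# Placeholder values for illustration only, not real scraped data.
record = ArticleRecord(
    article_link="https://example.com/news/sample-article",
    header="Sample headline",
    pub_date="2024-01-01",
    article_text="Full text of the sample article.",
)

# asdict() gives a plain dict that can be written out as one row.
row = asdict(record)
```

Keeping the field names identical to the stored column names means a record can be converted to a row without any renaming.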

ExampleOutput:
