This tool analyzes political speeches for patterns using the Fireworks AI API. It currently processes speech transcripts to identify and categorize instances where speakers attempt to undermine the legitimacy of institutions, individuals, or groups. But this can be easily changed by changing the prompt template.
- Analyze individual speeches or entire datasets
- Support for both sample and comprehensive analysis
- Configurable prompt templates via YAML
- Detailed JSON output with confidence scoring
- Robust error handling and logging
- Support for processing multiple CSV files
- Python 3.7+
- Fireworks AI API key
- Required Python packages (install via
pip install -r requirements.txt
):- pandas
- requests
- pyyaml
- tqdm
- Clone the repository:
git clone [repository-url]
cd [repository-name]
- Install dependencies:
pip install pandas requests pyyaml tqdm
- Set up your Fireworks API key:
export FIREWORKS_API_KEY="your-api-key-here"
The script supports the following command-line arguments:
Argument | Default | Description |
---|---|---|
--csv_path |
data/apb/apb_speeches.csv |
Path to CSV file(s) containing speeches |
--prompt_path |
prompts/delegitimatie.yaml |
Path to YAML file with prompt template |
--output_dir |
analysis_output |
Directory to save analysis results |
--sample |
0 |
Number of random speeches to analyze (0 for full analysis) |
--min-length |
50 |
Minimum text length of a speech to analyze |
Run a sample analysis of 10 speeches:
python apb_analysis.py --sample 10 --min-length 100
Analyze all speeches in a specific CSV:
python apb_analysis.py --csv_path path/to/speeches.csv --output_dir results
The analysis generates a JSON file with the following structure:
{
"status": "success",
"metadata": {
"min_length": 50,
"requested_samples": 10,
"successful_analyses": 8
},
"analyses": {
"speech_id": {
"speaker_name": "...",
"speaker_party": "...",
"date": "YYYY-MM-DD",
"text_length": 1234,
"source_file": "speeches_2023.csv",
"analysis": {
"gevonden_delegitimatie": [...],
"samenvatting": {...}
}
}
}
}
The analyzer currently identifies four types of delegitimation:
breakdown_communication
: Abrupt termination of debatediscrediting_information
: Unfounded doubt about facts/expertsdemonisering
: Distortion or exaggeration of positionsbedreiging
: Threats of action against persons/groups
Target groups include:
- Parliament
- Politicians
- Judiciary
- Media
- Civil servants
- Minority groups
The prompt template (delegitimatie.yaml
) defines:
- Analysis instructions
- Delegitimation definitions
- Output format requirements
- Confidence scoring guidelines
Modify the template to adjust analysis criteria or output format.
The FireworksProcessor
class handles API communication with configurable parameters:
- Maximum tokens
- API timeout
- Parse timeout
- Model selection
- Temperature and sampling parameters
The tool includes comprehensive error handling for:
- File loading issues
- API communication errors
- JSON parsing problems
- Timeout scenarios
All errors are logged with detailed information for debugging.
The work is licensed under GPLv3
Open an issue, submit a pull request or contact the developer [email protected]