Commit b17d424

Merge pull request #183 from boudinfl/v2.0
V2.0
2 parents f651015 + 67b46dc commit b17d424

66 files changed, +642 -137444 lines changed

.gitignore

-1
@@ -5,5 +5,4 @@ docs/build/html/_sources/*
 dev/
 .pytest_cache/
 /venv/
-/venv2/
 \.idea/

.travis.yml

-2
@@ -5,8 +5,6 @@ python:
 # command to install dependencies
 install:
 - pip install -r requirements.txt
-- python -m nltk.downloader stopwords
-- python -m nltk.downloader universal_tagset
 - python -m spacy download en_core_web_sm
 - python -m pip install cython
 - python -m pip install git+https://github.com/epfml/sent2vec

README.md

+20 -26
@@ -25,16 +25,6 @@ To pip install `pke` from github:
 pip install git+https://github.com/boudinfl/pke.git
 ```
 
-`pke` also requires external resources that can be obtained using:
-
-```bash
-python -m nltk.downloader stopwords
-python -m nltk.downloader universal_tagset
-python -m spacy download en_core_web_sm # download the english model
-```
-
-As of April 2019, `pke` only supports Python 3.6+.
-
 ## Minimal example
 
 `pke` provides a standardized API for extracting keyphrases from a document.
@@ -47,9 +37,9 @@ import pke
 # initialize keyphrase extraction model, here TopicRank
 extractor = pke.unsupervised.TopicRank()
 
-# load the content of the document, here document is expected to be in raw
-# format (i.e. a simple text file) and preprocessing is carried out using spacy
-extractor.load_document(input='/path/to/input.txt', language='en')
+# load the content of the document, here document is expected to be a simple
+# text string and preprocessing is carried out using spacy
+extractor.load_document(input='text', language='en')
 
 # keyphrase candidate selection, in the case of TopicRank: sequences of nouns
 # and adjectives (i.e. `(Noun|Adj)*`)
@@ -67,29 +57,33 @@ A detailed example is provided in the [`examples/`](examples/) directory.
 
 ## Getting started
 
-Tutorials and code documentation are available at
-[https://boudinfl.github.io/pke/](https://boudinfl.github.io/pke/).
+To get your hands dirty with `pke`, we invite you to try our tutorials out.
+
+| Name | Link |
+| ---------------------------------------------- | ---------- |
+| Getting started with `pke` and keyphrase extraction | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/keyphrasification/hands-on-with-pke/blob/main/part-1-graph-based-keyphrase-extraction.ipynb) |
+| Model parameterization | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/keyphrasification/hands-on-with-pke/blob/main/part-2-parameterization.ipynb) |
+| Benchmarking models | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/keyphrasification/hands-on-with-pke/blob/main/part-3-benchmarking-models.ipynb) |
 
 ## Implemented models
 
 `pke` currently implements the following keyphrase extraction models:
 
 * Unsupervised models
   * Statistical models
-    * TfIdf [[documentation](https://boudinfl.github.io/pke/build/html/unsupervised.html#tfidf)]
-    * KPMiner [[documentation](https://boudinfl.github.io/pke/build/html/unsupervised.html#kpminer), [article by (El-Beltagy and Rafea, 2010)](http://www.aclweb.org/anthology/S10-1041.pdf)]
-    * YAKE [[documentation](https://boudinfl.github.io/pke/build/html/unsupervised.html#yake), [article by (Campos et al., 2020)](https://doi.org/10.1016/j.ins.2019.09.013)]
+    * FirstPhrases
+    * TfIdf
+    * YAKE [(Campos et al., 2020)](https://doi.org/10.1016/j.ins.2019.09.013)
   * Graph-based models
-    * TextRank [[documentation](https://boudinfl.github.io/pke/build/html/unsupervised.html#textrank), [article by (Mihalcea and Tarau, 2004)](http://www.aclweb.org/anthology/W04-3252.pdf)]
-    * SingleRank [[documentation](https://boudinfl.github.io/pke/build/html/unsupervised.html#singlerank), [article by (Wan and Xiao, 2008)](http://www.aclweb.org/anthology/C08-1122.pdf)]
-    * TopicRank [[documentation](https://boudinfl.github.io/pke/build/html/unsupervised.html#topicrank), [article by (Bougouin et al., 2013)](http://aclweb.org/anthology/I13-1062.pdf)]
-    * TopicalPageRank [[documentation](https://boudinfl.github.io/pke/build/html/unsupervised.html#topicalpagerank), [article by (Sterckx et al., 2015)](http://users.intec.ugent.be/cdvelder/papers/2015/sterckx2015wwwb.pdf)]
-    * PositionRank [[documentation](https://boudinfl.github.io/pke/build/html/unsupervised.html#positionrank), [article by (Florescu and Caragea, 2017)](http://www.aclweb.org/anthology/P17-1102.pdf)]
-    * MultipartiteRank [[documentation](https://boudinfl.github.io/pke/build/html/unsupervised.html#multipartiterank), [article by (Boudin, 2018)](https://arxiv.org/abs/1803.08721)]
+    * TextRank [(Mihalcea and Tarau, 2004)](http://www.aclweb.org/anthology/W04-3252.pdf)
+    * SingleRank [(Wan and Xiao, 2008)](http://www.aclweb.org/anthology/C08-1122.pdf)
+    * TopicRank [(Bougouin et al., 2013)](http://aclweb.org/anthology/I13-1062.pdf)
+    * TopicalPageRank [(Sterckx et al., 2015)](http://users.intec.ugent.be/cdvelder/papers/2015/sterckx2015wwwb.pdf)
+    * PositionRank [(Florescu and Caragea, 2017)](http://www.aclweb.org/anthology/P17-1102.pdf)
+    * MultipartiteRank [(Boudin, 2018)](https://arxiv.org/abs/1803.08721)
 * Supervised models
   * Feature-based models
-    * Kea [[documentation](https://boudinfl.github.io/pke/build/html/supervised.html#kea), [article by (Witten et al., 2005)](https://www.cs.waikato.ac.nz/ml/publications/2005/chap_Witten-et-al_Windows.pdf)]
-    * WINGNUS [[documentation](https://boudinfl.github.io/pke/build/html/supervised.html#wingnus), [article by (Nguyen and Luong, 2010)](http://www.aclweb.org/anthology/S10-1035.pdf)]
+    * Kea [(Witten et al., 2005)](https://www.cs.waikato.ac.nz/ml/publications/2005/chap_Witten-et-al_Windows.pdf)
 
 ## Citing pke
 
examples/2.txt

+7
@@ -0,0 +1,7 @@
+Waiting for the wave to crest [wavelength services]
+Wavelength services have been hyped ad nauseam for years. But despite their
+quick turn-up time and impressive margins, such services have yet to
+live up to the industry's expectations. The reasons for this lukewarm
+reception are many, not the least of which is the confusion that still
+surrounds the technology, but most industry observers are still
+convinced that wavelength services will ultimately flourish
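
Reviewer note: the README hunk in this commit describes TopicRank's candidate selection as sequences of nouns and adjectives (`(Noun|Adj)*`). The sketch below illustrates that rule on the first sentence of `examples/2.txt`. It is not `pke`'s implementation: the function name `select_candidates`, the `VALID_POS` tag set, and the hand-assigned part-of-speech tags are all assumptions made for illustration. In `pke` the tags would come from spaCy preprocessing.

```python
from typing import List, Tuple

# Universal POS tags accepted inside a candidate (an assumption for this sketch).
VALID_POS = {"NOUN", "PROPN", "ADJ"}


def select_candidates(tagged: List[Tuple[str, str]]) -> List[str]:
    """Return the longest contiguous runs of tokens tagged as noun or adjective."""
    candidates: List[str] = []
    current: List[str] = []
    for word, pos in tagged:
        if pos in VALID_POS:
            # Extend the current run of nouns/adjectives.
            current.append(word)
        elif current:
            # Any other tag closes the run and emits it as one candidate.
            candidates.append(" ".join(current))
            current = []
    if current:
        # Flush a run that reaches the end of the input.
        candidates.append(" ".join(current))
    return candidates


# Hand-tagged tokens (illustrative only, not produced by a tagger).
sentence = [
    ("Wavelength", "NOUN"), ("services", "NOUN"), ("have", "AUX"),
    ("been", "AUX"), ("hyped", "VERB"), ("for", "ADP"), ("years", "NOUN"),
    (".", "PUNCT"),
]

print(select_candidates(sentence))
```

Emitting only maximal runs keeps `Wavelength services` as a single candidate instead of splitting it into overlapping sub-phrases, which is the behavior the `(Noun|Adj)*` pattern suggests.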
