Correlations
An interesting type of data analysis is identifying dependencies between variables. For two continuous variables we can, for example, compute the correlation between them as an estimate of their dependency. Each concept in Event Registry can be seen as a time series in which the value on a particular day is the number of collected articles that mention the concept. Given any input time series, we can then compute which concepts correlate the most with it.
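To make the idea of correlating two daily count series concrete, here is a minimal sketch that computes the Pearson correlation coefficient over the dates two series share. This is only an illustration: the text above does not say which correlation measure Event Registry uses, so Pearson is an assumption, and the `DailyCounts` type, the `pearson` function and the sample numbers are all made up for this example.

```typescript
// Daily article counts keyed by date in YYYY-MM-DD form (illustrative data only).
type DailyCounts = Record<string, number>;

/** Pearson correlation of two series, computed over the dates present in both. */
function pearson(a: DailyCounts, b: DailyCounts): number {
    const dates = Object.keys(a).filter((d) => d in b);
    const n = dates.length;
    if (n < 2) return NaN; // not enough overlapping days

    const meanA = dates.reduce((sum, d) => sum + a[d], 0) / n;
    const meanB = dates.reduce((sum, d) => sum + b[d], 0) / n;

    let cov = 0;
    let varA = 0;
    let varB = 0;
    for (const d of dates) {
        const da = a[d] - meanA;
        const db = b[d] - meanB;
        cov += da * db;
        varA += da * da;
        varB += db * db;
    }
    // A constant series has zero variance, so the coefficient is undefined.
    if (varA === 0 || varB === 0) return NaN;
    return cov / Math.sqrt(varA * varB);
}

const inputSeries: DailyCounts = { "2015-01-01": 10, "2015-01-02": 20, "2015-01-03": 30 };
const risingSeries: DailyCounts = { "2015-01-01": 5, "2015-01-02": 9, "2015-01-03": 13 };
const fallingSeries: DailyCounts = { "2015-01-01": 30, "2015-01-02": 20, "2015-01-03": 10 };

console.log(pearson(inputSeries, risingSeries));  // series that rise together
console.log(pearson(inputSeries, fallingSeries)); // series that move in opposite directions
```

Values close to 1 mean the two series rise and fall together, values close to -1 mean they move in opposite directions, and values near 0 indicate no linear relationship.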
To compute what in Event Registry correlates the most with a time series, we can use the GetTopCorrelations class.
import { EventRegistry, GetCounts, GetTopCorrelations, QueryArticles } from "eventregistry";
const er = new EventRegistry({apiKey: "YOUR_API_KEY"});
const corr = new GetTopCorrelations(er);
Depending on what you want to use as the input time series, you have three options - (a) loading a time series of a concept/category, (b) loading a time series based on an article query, or (c) providing your own data.
Input time series based on a concept/category from Event Registry
To load a time series of a concept or a category, we can simply use the GetCounts class.
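er.getConceptUri("Obama").then((conceptUri) => {
    // Pass the resolved concept URI so that GetCounts knows which time series to load
    const counts = new GetCounts(conceptUri);
    corr.loadInputDataWithCounts(counts);
});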
Input time series based on an article query
You can also form an article query using a different set of conditions. The resulting set of articles also forms a time series that can be used as the input time series. In the example below we find all articles that mention the keyword "iphone" and use the resulting time series as the input data.
const query = new QueryArticles({ keywords: "iphone" });
corr.loadInputDataWithQuery(query)
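To illustrate how a set of articles "forms a time series", the sketch below buckets articles by publication day and counts them. It is a local illustration of the idea, not what loadInputDataWithQuery() does internally. The `ArticleLike` shape (a single `date` string in YYYY-MM-DD form) and the `dailyCounts` helper are assumptions made for this example.

```typescript
// Hypothetical minimal article shape: only the publication day is needed here.
interface ArticleLike {
    date: string; // assumed to be in YYYY-MM-DD form
}

/** Count articles per day and return [date, count] pairs sorted by date. */
function dailyCounts(articles: ArticleLike[]): Array<[string, number]> {
    const counts: Record<string, number> = {};
    for (const article of articles) {
        counts[article.date] = (counts[article.date] || 0) + 1;
    }
    // YYYY-MM-DD strings sort chronologically when sorted as plain strings.
    return Object.keys(counts)
        .sort()
        .map((date): [string, number] => [date, counts[date]]);
}

const sampleArticles: ArticleLike[] = [
    { date: "2015-01-02" },
    { date: "2015-01-01" },
    { date: "2015-01-02" },
];
console.log(dailyCounts(sampleArticles));
```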
Input time series based on the user's input
The user can also provide their own input data by calling the setCustomInputData() method. The argument is expected to be an array of [date, count] pairs.
corr.setCustomInputData([["2015-01-01", 213], ["2015-01-02", 13], ["2015-01-03", 423], ...]);
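Before handing your own data to setCustomInputData(), it can help to check that every point has the expected shape. The following is a sketch of such a check, not part of the SDK. The `InputPoint` type and the `findInvalidPoints` helper are hypothetical, and the YYYY-MM-DD date format and the non-negative counts are assumptions inferred from the example values above.

```typescript
// A single data point: [date in YYYY-MM-DD form, count for that day].
type InputPoint = [string, number];

// Checks the textual shape only; it does not verify that the day exists in the calendar.
const ISO_DAY = /^\d{4}-\d{2}-\d{2}$/;

/** Return the indexes of points whose date or count does not look valid. */
function findInvalidPoints(data: InputPoint[]): number[] {
    const invalid: number[] = [];
    data.forEach(([date, count], index) => {
        const dateOk = typeof date === "string" && ISO_DAY.test(date);
        const countOk = typeof count === "number" && isFinite(count) && count >= 0;
        if (!dateOk || !countOk) {
            invalid.push(index);
        }
    });
    return invalid;
}

const customData: InputPoint[] = [
    ["2015-01-01", 213],
    ["2015-1-2", 13],    // date is not zero-padded
    ["2015-01-03", -1],  // negative count
];
console.log(findInvalidPoints(customData));
```

An empty result means every point passed the check and the array can be passed on as the input data.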
Once the input data has been provided in one of these ways, we can compute what correlates the most with it. Depending on their interests, the user can compute correlations with either concepts or categories.
To compute the top correlations with concepts, the getTopConceptCorrelations() method can be called:
const conceptInfo = corr.getTopConceptCorrelations({
    conceptType: ["person", "org"],
    exactCount: 10,
    approxCount: 100,
});
The method arguments are as follows:
- candidateConceptsQuery: optional. An instance of QueryArticles that can be used to limit the space of concept candidates
- candidatesPerType: if candidateConceptsQuery is provided, then this number of concepts for each valid type will be returned as candidates
- conceptType: optional. A string or an array containing the concept types that are valid candidates on which to compute top correlations. Valid values are person, org, loc and/or wiki
- exactCount: the number of returned concepts for which the exact value of the correlation is computed
- approxCount: the number of returned concepts for which only an approximate value of the correlation is computed
- returnInfo: specifies the details about the concepts that should be returned in the output result
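Since conceptType accepts either a single string or an array, and only four values are listed as valid, a small guard can catch typos before a request is made. The helper below is a hypothetical sketch and not part of the SDK. The names `VALID_CONCEPT_TYPES` and `normalizeConceptTypes` are invented for this example, and the list of valid values is taken from the argument description above.

```typescript
// Concept types listed as valid in the argument description above.
const VALID_CONCEPT_TYPES = ["person", "org", "loc", "wiki"];

/** Accept a string or an array, return an array, and reject unknown types. */
function normalizeConceptTypes(conceptType: string | string[]): string[] {
    const types = Array.isArray(conceptType) ? conceptType : [conceptType];
    const unknown = types.filter((t) => VALID_CONCEPT_TYPES.indexOf(t) === -1);
    if (unknown.length > 0) {
        throw new Error("Unsupported concept type(s): " + unknown.join(", "));
    }
    return types;
}

console.log(normalizeConceptTypes("org"));
console.log(normalizeConceptTypes(["person", "loc"]));
```

The returned array could then be passed as the conceptType argument, for example together with exactCount and approxCount.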
Alternatively, one can compute the list of categories that correlate the most with the input data. For this purpose, the getTopCategoryCorrelations() method should be called:
const categoryInfo = corr.getTopCategoryCorrelations({
    exactCount: 10,
    approxCount: 100,
});
The method arguments are as follows:
- exactCount: the number of returned categories for which the exact value of the correlation is computed
- approxCount: the number of returned categories for which only an approximate value of the correlation is computed
- returnInfo: specifies the details about the categories that should be returned in the output result