Posts Tagged

Feature selection

Estimating the Effect of Feature Selection in Computational Text Analysis

Niels Goet / April 26, 2017 /

Below, I discuss and analyse pre-processing decisions in relation to an often-used application of text analysis: scaling. Here, I’ll be using a new tool, called preText (for R statistical software), to investigate the potential effect of different pre-processing options on our estimates. Replication material for this post may be found on my GitHub page.

Feature Selection and Scaling

Scaling algorithms rely on the bag-of-words (BoW) assumption, i.e. the idea that we can reduce text to individual words and sample them independently from a “bag” and still get some meaningful insights from the relative distribution of words across a corpus. For the demonstration below, I’ll be using the same selection of campaign speeches from one of my earlier blog posts, in which I used a …
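To make the bag-of-words assumption concrete, here is a minimal sketch in Python (not the R/preText workflow the post itself uses). The two toy "speeches", the function names and the stopword list are all invented for illustration and are not taken from the campaign speeches mentioned above. It shows that word order is discarded, and that pre-processing choices such as lowercasing or stopword removal change the set of features a scaling model would see.

```python
import re
from collections import Counter


def bag_of_words(text, lowercase=True, stopwords=frozenset()):
    """Reduce a text to unordered word counts; word order is discarded."""
    if lowercase:
        text = text.lower()
    tokens = re.findall(r"[A-Za-z']+", text)  # crude tokeniser (assumption)
    return Counter(t for t in tokens if t not in stopwords)


def relative_frequencies(counts):
    """Turn raw counts into each word's share of all tokens in the document."""
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


# Hypothetical toy documents: same words, different order.
docs = {
    "speech_a": "We will cut taxes and we will create jobs",
    "speech_b": "Jobs we will create and taxes we will cut",
}

bows = {name: bag_of_words(text) for name, text in docs.items()}

# Under the BoW assumption the two documents are indistinguishable.
same_bag = bows["speech_a"] == bows["speech_b"]

# Pre-processing decisions change the feature set.
n_lower = len(bag_of_words(docs["speech_a"]))                  # "We" and "we" merged
n_cased = len(bag_of_words(docs["speech_a"], lowercase=False)) # "We" and "we" kept apart
n_nostop = len(bag_of_words(docs["speech_a"],
                            stopwords={"we", "will", "and"}))  # illustrative stopword list

print(same_bag, n_lower, n_cased, n_nostop)
print(relative_frequencies(bows["speech_a"]))
```

Even in this tiny example, the number of distinct features depends on which pre-processing steps are applied, which is the kind of sensitivity the post sets out to examine.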

OxPol blog to discontinue from the start of the 2024-2025 academic year

Social Media: The Creative Destruction of Pakistani Politics

Mexico’s President-elect Claudia Sheinbaum and migration: What will change and what will remain the same?

The 2015 EU “Refugee Crisis” : Analysing the IOM’s Information Campaign in Senegal

Academic Publishing Guidance for Early Career Scholars: Insights from an Editors’ Roundtable

Inclusive Democracy: Labour Party’s Election Manifesto for Reducing the Voting Age to 16

From Research to Policy: A 5-Step Guide to Effective Engagement with Policymakers and Increased Visibility of Research

Small States shaping EU Policies: Latvia and the Disinformation Threat
