RavenPack Data

Publish Date: Jun 10, 2018    Last Update: Dec 1, 2018

1. Introduction

RavenPack is a leading provider of media data. This post does not attempt a comprehensive description of the data; for that, I refer readers to the product page or the data manuals on WRDS. Instead, it provides summary statistics to help readers replicate the empirical results in my research, “Beyond attention: the causal effect of media on information production”.

2. Data validation

2.1 Raw data

The first step is to make sure that you have parsed the raw data from RavenPack correctly. In my research, I use two datasets from RavenPack: the Dow Jones (DJ) Edition and the PR Edition. The DJ Edition covers 2000 to 2017, while the PR Edition covers 2004 to 2017. Table I reports the number of observations in each raw dataset by year.

Table I - number of observations by year and database

year    DJ Edition    PR Edition
2000     4,491,341             -
2001     6,508,892             -
2002     4,730,005             -
2003     5,300,516             -
2004     8,692,589     1,573,859
2005     6,726,687     1,838,488
2006     7,199,716     1,998,733
2007     8,520,039     2,151,973
2008     8,979,586     2,403,572
2009     8,276,870     2,256,803
2010     8,051,297     2,781,074
2011     8,383,248     3,069,316
2012     7,979,823     3,336,613
2013     6,548,356     3,330,956
2014     6,527,539     3,325,558
2015     6,095,920     3,113,909
2016     6,249,422     3,272,910
2017     6,411,331     3,374,352
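
One way to run this check is to count parsed records by calendar year and compare the result with Table I. The following is a minimal Python sketch, not the code used in my research. The function names are hypothetical, and it assumes each record carries an ISO-formatted timestamp string and that the year is taken from that timestamp as stored.

```python
from collections import Counter
from datetime import datetime

# Two PR Edition values copied from Table I; extend with the remaining rows as needed.
EXPECTED_PR = {2004: 1_573_859, 2005: 1_838_488}


def count_by_year(timestamps):
    """Count records per calendar year from ISO-formatted timestamp strings."""
    counts = Counter(datetime.fromisoformat(ts).year for ts in timestamps)
    return dict(sorted(counts.items()))


def find_mismatches(counts, expected):
    """Return {year: (found, expected)} for every year whose count differs."""
    return {
        year: (counts.get(year, 0), target)
        for year, target in expected.items()
        if counts.get(year, 0) != target
    }
```

An empty result from `find_mismatches(count_by_year(timestamps), EXPECTED_PR)` means the parsed file matches the table for the years listed in `EXPECTED_PR`.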

2.2 Data cleaning

My research uses an event-study approach in which each event is a corporate press release. To construct the analysis sample, I apply the filters in Table II in sequence; each row reports the number of observations and firms remaining after that filter.

Table II - data filters for press release data

Data filters                                        # of obs    # of firms
Keep if RELEVANCE = 100 and in CRSP/Compustat      1,620,046         9,406
Keep Top 4 press release wires                     1,502,021         9,386
Remove duplicated releases (ENS = 100)             1,068,148         9,373
Keep only one press release per firm-day             909,874         9,368
Keep only trading days                               901,774         9,366
Keep if after April 1, 2006                          738,196         8,756
Keep if issued in the first 30 seconds of an hour    188,981         7,911
Keep if issued in the first 10 seconds of an hour    131,683         7,503
Keep if issued in 7AM-9AM or 4PM                      80,246         6,560

I will post the detailed code for this data construction once the paper has gone through the publication process. Stay tuned if you are interested.
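
As a rough illustration of how the record-level filters in Table II fit together, here is a minimal Python sketch. It is not the code used in the paper. The `Release` class and its field names are hypothetical, and the sketch makes several assumptions that Table II does not spell out: timestamps are already converted to the relevant local time, the earliest release is the one kept for each firm-day, the April 1, 2006 cutoff is inclusive, and "7AM-9AM or 4PM" means the hours 7, 8, 9 and 16. The CRSP/Compustat match, the wire selection and the trading-day filter are omitted because they need external data.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class Release:
    firm_id: str         # hypothetical firm identifier
    timestamp: datetime  # assumed to be in local time already
    relevance: int       # RELEVANCE score
    ens: int             # ENS (novelty) score


CUTOFF = date(2006, 4, 1)               # assumed inclusive
KEEP_HOURS = frozenset({7, 8, 9, 16})   # assumed reading of "7AM-9AM or 4PM"


def apply_filters(releases, window_seconds=10):
    """Apply a subset of the Table II filters in the order the table lists them."""
    kept = []
    seen_firm_days = set()
    for r in sorted(releases, key=lambda rel: rel.timestamp):
        # Keep only fully relevant, non-duplicated releases.
        if r.relevance != 100 or r.ens != 100:
            continue
        # Keep one release per firm-day (here: the earliest one).
        firm_day = (r.firm_id, r.timestamp.date())
        if firm_day in seen_firm_days:
            continue
        seen_firm_days.add(firm_day)
        # Drop releases before the cutoff date.
        if r.timestamp.date() < CUTOFF:
            continue
        # Keep releases issued within the first N seconds of an hour.
        if r.timestamp.minute * 60 + r.timestamp.second >= window_seconds:
            continue
        # Keep releases issued in the selected hours.
        if r.timestamp.hour not in KEEP_HOURS:
            continue
        kept.append(r)
    return kept
```

Calling `apply_filters(releases, window_seconds=30)` corresponds to the 30-second row of the table, and the default of 10 to the 10-second row. Because the firm-day rule runs before the timing rules, a firm-day whose earliest release falls outside the window is dropped entirely, even if a later release that day would have qualified.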

3. Summary statistics

Finally, below are the summary statistics of the press release and news