Publish Date: Jun 10, 2018 Last Update: Dec 1, 2018
RavenPack is a leading provider of media data. This post does not aim to describe the data comprehensively; for details, see the product page or the data manuals on WRDS. Instead, it provides summary statistics to help readers replicate the empirical results in my research, “Beyond attention: the causal effect of media on information production”.
2. Data validation
2.1 Raw data
The first step is to make sure that you have correctly parsed the raw data from RavenPack. In my research, I use two datasets from RavenPack: the Dow Jones Edition and the PR Edition. The DJ Edition covers 2000 to 2017, while the PR Edition covers 2004 to 2017. The table below reports the number of observations in the raw data, by year and edition.
Table I - number of observations by year and database
|year||DJ Edition||PR Edition|
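To compare your own parse against Table I, a year-by-edition count can be built along these lines. This is only a sketch: it assumes each edition has already been loaded into a pandas DataFrame with a datetime column that I call `timestamp` here (a hypothetical name, not necessarily the RavenPack field name).

```python
import pandas as pd


def obs_by_year(frames):
    """Count observations per calendar year for each edition.

    frames: dict mapping an edition label to a DataFrame that has a
    datetime column named 'timestamp' (hypothetical column name).
    """
    counts = {
        name: frame["timestamp"].dt.year.value_counts()
        for name, frame in frames.items()
    }
    # Align editions on the union of years; years missing from an edition get 0.
    table = pd.DataFrame(counts).fillna(0).astype(int).sort_index()
    table.index.name = "year"
    return table
```

Calling `obs_by_year({"DJ Edition": dj, "PR Edition": pr})` returns one row per year and one column per edition, which can be checked line by line against the table.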
2.2 Data cleaning
My research uses an event-study approach where each event is a corporate press release. To construct the samples for analysis, I apply the following data filters in sequence:
Table II - data filters for press release data
|Data filters||# of obs||# of firms|
|Keep if RELEVANCE = 100 and in CRSP/Compustat||1,620,046||9,406|
|Keep Top 4 press release wires||1,502,021||9,386|
|Remove duplicated releases (ENS = 100)||1,068,148||9,373|
|Keep only one press release per firm-day||909,874||9,368|
|Keep only trading days||901,774||9,366|
|Keep if after April 1, 2006||738,196||8,756|
|Keep if issued in the first 30 seconds of an hour||188,981||7,911|
|Keep if issued in the first 10 seconds of an hour||131,683||7,503|
|Keep if issued in 7AM-9AM or 4PM||80,246||6,560|
I will post the detailed code for this data construction once the paper is published. Stay tuned if you are interested.
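As a rough illustration of the filter sequence in Table II, here is a minimal pandas sketch. It is not the code used in the paper. The column names (`firm_id`, `timestamp`, `relevance`, `ens`, `source`, `in_crsp_compustat`) and the `top_wires` and `trading_days` arguments are hypothetical, and several details are my assumptions: timestamps are already in exchange local time, the earliest release is the one kept per firm-day, the date cutoff is inclusive, and "7AM-9AM or 4PM" means the hours 7, 8, 9 and 16.

```python
import pandas as pd


def apply_filters(df, top_wires, trading_days):
    """Apply the Table II filters in order and log counts after each step.

    df needs the hypothetical columns firm_id, timestamp (datetime, local
    time), relevance, ens, source and in_crsp_compustat (bool).
    """
    log = []

    def record(label, frame):
        log.append((label, len(frame), frame["firm_id"].nunique()))
        return frame

    out = record("RELEVANCE = 100 and in CRSP/Compustat",
                 df[(df["relevance"] == 100) & df["in_crsp_compustat"]])
    out = record("Top 4 press release wires",
                 out[out["source"].isin(top_wires)])
    out = record("Remove duplicated releases (ENS = 100)",
                 out[out["ens"] == 100])

    # One release per firm-day; keeping the earliest is an assumption.
    out = out.sort_values("timestamp")
    out = out.assign(date=out["timestamp"].dt.normalize())
    out = record("One press release per firm-day",
                 out.drop_duplicates(subset=["firm_id", "date"], keep="first"))

    out = record("Trading days only",
                 out[out["date"].isin(trading_days)])
    # Inclusive cutoff is an assumption; the table says "after April 1, 2006".
    out = record("After April 1, 2006",
                 out[out["timestamp"] >= pd.Timestamp("2006-04-01")])

    # Seconds elapsed since the top of the hour.
    secs = out["timestamp"].dt.minute * 60 + out["timestamp"].dt.second
    out = record("First 30 seconds of an hour", out[secs < 30])
    secs = out["timestamp"].dt.minute * 60 + out["timestamp"].dt.second
    out = record("First 10 seconds of an hour", out[secs < 10])

    # Hours 7, 8, 9 and 16 are my reading of "7AM-9AM or 4PM".
    out = record("7AM-9AM or 4PM",
                 out[out["timestamp"].dt.hour.isin([7, 8, 9, 16])])

    return out, pd.DataFrame(log, columns=["filter", "n_obs", "n_firms"])
```

The second return value mirrors the layout of Table II, so a replication can be compared step by step rather than only at the final sample.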
3. Summary statistics
Finally, below are the summary statistics of the press release and news