overview
What? Research teams (RTs) are solicited to participate in a project examining the variability of empirical finance results. In particular, the project focuses on the variability in results when the same data are analyzed by different research teams. As there are various choices to make along the way, results are likely to show some heterogeneity. The variation in results will allow for valuable metascientific insights.
All RTs will be given access to 720 million trade records pertaining to 2002-2018 trading in the most active European index futures contract: EuroStoxx 50 futures. What sets this data apart from other run-of-the-mill trade data is that, for each trade, one observes whether it was an agency or principal trade. In other words, one learns if the trade was for an exchange member's own account or for a client. Each RT will freely analyze the data using their own methodology and will provide effect sizes and standard errors regarding several specific a-priori hypotheses. All RTs will be anonymized prior to reporting or subsequent sharing of the submitted results.
In particular, in Stage 1, each RT will submit a “short paper” summarizing their methodology and results (including the estimated effect size and its standard error for each of the hypotheses). In Stage 2, each short paper will be evaluated by two anonymous external researchers (peer evaluators, who are not RTs) who will rate the quality of the papers and provide feedback for improvements, and the RTs will submit a revised version of the short paper after receiving the two peer evaluations. In Stage 3, all RTs will be asked to read the 5 short papers with the highest average peer evaluations (as rated after Stage 1), and thereafter submit another revised version of their short paper.
When applying to participate, RTs will fill out a survey with background information about the team. After reporting their Stage 1 results, they will fill out an incentivized survey about their beliefs about the variation in Stage 1 results across RTs for each hypothesis. Sign-up for research teams is now closed (see Overview for details).
We expect that the total workload for research teams participating in the project will be between two and three weeks.
When? The project coordinators recruited research teams and peer evaluators between October 2020 and December 2020. Research teams will analyze the data starting in January 2021. The overall project will last until June 2021. For a detailed schedule of the project, please refer to the schedule.
Why participate? In addition to being part of a fascinating landmark project, all members of all RTs will be listed as co-authors on the final paper, which targets publication in a top scientific journal. The organizers of this project ran a similar study once before in neuroscience, which yielded a 2020 publication in Nature (article). The current project improves on the previous design by adding peer feedback and involves the cooperation of one of the world's most successful exchanges as data sponsor.
Who? RTs consist of one or two participants. At least one member of the RT has to hold a PhD in finance or economics. The team should be sufficiently skilled in empirical finance, should have an understanding of market liquidity, and should be familiar with the analysis of large datasets. RTs need to apply to participate by filling out a brief survey about background characteristics and expertise in empirical finance and market liquidity. The project coordinators will decide whether the RT is sufficiently qualified to participate. After an RT is invited to participate, an agreement is signed in which the project coordinators pledge to ensure anonymity (i.e., not to reveal the identity of RTs and peer evaluators to anyone outside the project coordination team and the external researchers evaluating the short papers). In turn, RTs promise to honor the non-disclosure agreement (NDA), to keep their analysis, their report, and all data pertaining to the project confidential, to delete the data within one year of receiving them, and to send a confirmation email once they have done so. Peer evaluators will also have to honor the NDA.
Contact. In case you have any questions, please contact the project coordinators via info@fincap.academy.
about the data
The data pertain to 17 years (2002-2018) of trading of EuroStoxx 50 futures, which are among the world’s most actively traded index derivatives. They give investors exposure to “Europe,” or, more precisely, to a basket of euro-area blue-chip equities. All trading is done through an electronic limit-order book (see, e.g., Parlour and Seppi, 2008). Please find more background information on the futures in this factsheet.
The data consist of 720 million trade records and will be made available as monthly gzipped, semicolon-separated text files (“csv”). Each zipped monthly file is no larger than 50 MB. The data are clean in the sense that the format is identical across all files. Please find below the first ten lines of the December 2018 file as an example.
DATETIME; EXPIRATION; BUY_SELL_ID; TRADE_SIZE; MATCH_PRICE; AGGRESSOR_FLAG;ACCOUNT_ROLE; EXEC_TYPE_ID
2018-12-03 08:00:06.400; 201812; S; 2; 3229; N; A; F
2018-12-03 08:00:06.410; 201812; S; 1; 3229; N; A; F
2018-12-03 08:00:06.410; 201812; S; 1; 3229; N; A; F
2018-12-03 08:00:06.410; 201812; B; 4; 3229; Y; A; F
2018-12-03 08:00:06.540; 201812; S; 1; 3229; N; A; F
2018-12-03 08:00:06.550; 201812; B; 2; 3229; Y; A; F
2018-12-03 08:00:06.550; 201812; S; 1; 3229; N; A; F
2018-12-03 08:00:06.630; 201812; B; 1; 3229; Y; A; F
2018-12-03 08:00:06.630; 201812; S; 1; 3229; N; A; F
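As a minimal sketch of how such a file could be read, the Python snippet below parses semicolon-separated records using only the standard library. The function names `read_trades` and `read_monthly_file` are illustrative and not part of the project materials; the UTF-8 encoding and the choice to store MATCH_PRICE as a float are assumptions.

```python
import csv
import gzip
import io
from datetime import datetime

# Header and two records copied from the December 2018 example.
SAMPLE = """\
DATETIME; EXPIRATION; BUY_SELL_ID; TRADE_SIZE; MATCH_PRICE; AGGRESSOR_FLAG;ACCOUNT_ROLE; EXEC_TYPE_ID
2018-12-03 08:00:06.400; 201812; S; 2; 3229; N; A; F
2018-12-03 08:00:06.410; 201812; B; 4; 3229; Y; A; F
"""


def read_trades(handle):
    """Yield one dict per trade record from an open text handle."""
    reader = csv.reader(handle, delimiter=";")
    # Column names carry inconsistent spacing around the separator, so strip them.
    header = [name.strip() for name in next(reader)]
    for row in reader:
        if not row:
            continue  # skip blank lines
        record = dict(zip(header, (field.strip() for field in row)))
        record["DATETIME"] = datetime.strptime(
            record["DATETIME"], "%Y-%m-%d %H:%M:%S.%f"
        )
        record["TRADE_SIZE"] = int(record["TRADE_SIZE"])
        record["MATCH_PRICE"] = float(record["MATCH_PRICE"])  # assumption: float
        yield record


def read_monthly_file(path):
    """Stream records from one gzipped monthly file (encoding is an assumption)."""
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        yield from read_trades(handle)


trades = list(read_trades(io.StringIO(SAMPLE)))
```

Because both functions are generators, a monthly file can be processed record by record without loading it fully into memory, which may matter given the overall size of the dataset.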
The variables are defined as follows (the characterizations are short and therefore imprecise; please refer to any standard textbook on futures for a detailed description of what futures are and how they are traded):
- DATETIME: Time stamp of the trade, denoted as YYYY-MM-DD hh:mm:ss.sss, where ss.sss denotes seconds to three decimal places (i.e., the effective precision is ten milliseconds, as the last digit is always zero).
- EXPIRATION: The expiration date of the futures contract being traded. All data pertain to Eurex trading in EuroStoxx 50 (SX5E) futures contracts. Expiration months are: March, June, September, and December. Contracts expire on the third Friday of the expiration month. The notation of expiration is YYYYMM (where MM is in [03, 06, 09, 12]).
- BUY_SELL_ID: This indicator shows if the trade record is for a buyer “B” (who goes long the index) or for a seller “S” (who goes short the index).
- TRADE_SIZE: This is the size of the trade expressed in number of contracts. The contract value per index point is EUR 10 (e.g., per contract traded, the long side is entitled to receive 10 euro from the short side of the trade each time the index increases by one point).
- MATCH_PRICE: The price at which the trade between the buyer and the seller (i.e., the long and the short side of the trade, respectively) is concluded.
- AGGRESSOR_FLAG: If the trade record pertains to a market order (or marketable limit order) that is executed against a standing limit order, this flag takes the value “Y”. If the record pertains to a limit order that was resting in the book before being matched with an incoming market order, or to an order in an auction (e.g., the opening or closing auction), then this flag takes the value “N”. This flag became available as of November 2009.
- ACCOUNT_ROLE: This variable is either:
  - A: Agency trade (i.e., a trade an exchange member does for a client).
  - M: Market-maker principal trade (i.e., a trade an exchange member does for his own account in his role as market maker).
  - P: Non-market-maker principal trade (i.e., a trade an exchange member does for his own account).
  - P.S.: The distinction between M and P is not an economically meaningful one for the purpose of this project.
- EXEC_TYPE_ID: This variable is:
  - F if the full order was executed in the trade.
  - P if the order was only partially executed in the trade.
  - N if not assigned.
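Two of the definitions above lend themselves to a short worked example: the third-Friday expiration rule and the EUR 10 contract value per index point. The sketch below is illustrative only; the function names are hypothetical, the date rule ignores any exchange-specific adjustments (e.g., for holidays), and "notional" is used here simply to mean contracts times price times EUR 10.

```python
import calendar
from datetime import date

EUR_PER_INDEX_POINT = 10  # contract value per index point, as stated above


def expiration_date(expiration: str) -> date:
    """Third Friday of the expiration month given in YYYYMM notation."""
    year, month = int(expiration[:4]), int(expiration[4:6])
    first_of_month = date(year, month, 1)
    # Days from the 1st of the month to the first Friday (0 if the 1st is a Friday).
    days_to_first_friday = (calendar.FRIDAY - first_of_month.weekday()) % 7
    return date(year, month, 1 + days_to_first_friday + 14)


def long_side_gain_eur(trade_size: int, index_change_points: float) -> float:
    """Amount the long side receives from the short side for a given index move."""
    return trade_size * index_change_points * EUR_PER_INDEX_POINT


def notional_eur(trade_size: int, match_price: float) -> float:
    """Contracts x price x EUR 10 (an illustrative measure, not a project definition)."""
    return trade_size * match_price * EUR_PER_INDEX_POINT


print(expiration_date("201812"))      # 2018-12-21
print(long_side_gain_eur(2, 1))       # 20
print(notional_eur(2, 3229))          # 64580
```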
ex-ante hypotheses
In this project, the RTs will test six hypotheses. These hypotheses will be about price discovery, realized spread, the frequency of client trades, the use of market orders, and gross trading revenue. The RTs will receive precise details on the hypotheses upon receipt of the data.
data analysis & short paper
For each hypothesis being tested, research teams will submit:
- an effect size estimate,
- an estimate of the corresponding standard error, and
- the analysis scripts used to generate the results.
After receiving the data and hypotheses, the RTs have until March 26, 2021, to do the analyses and submit a short paper and the analysis code used for all the analyses. The paper may be at most five pages long. It reports the coefficient (effect size) and the standard error for each hypothesis, and uses the remainder to precisely describe the methodology (in contrast, motivation, summary statistics, etc. are not needed). In addition to submitting the short paper in PDF format, RTs also have to fill in a web form, providing the effect size and standard error for each hypothesis. Submission of the short paper and the required analysis scripts will be administered via upload forms on the project webpage.
data usage & authorship
The data are initially shared under a limited data use agreement; the principal restriction is that users of the data will not be allowed to release, publicize, or discuss their results until the end of a specified embargo period.
For the final paper to be produced from this project (including analysis of the RTs' beliefs, the RTs' analyses, and the reviewers' assessment of those analyses), the project coordinators will draft the manuscript. All members of each RT will be offered co-authorship on the paper(s); each RT is limited to no more than two participants. Authorship will be limited to RTs who submit their results and reports for all stages by the respective deadlines. Co-authors from the RTs will be given two weeks to review any drafts of papers prior to submission.
Each member of the RTs must sign a consent form before obtaining access to the data. Sharing of data and/or results or discussing outcomes from the analyses with any other person during the embargo period is strictly forbidden (see information on the confidentiality agreement). Sharing information during the embargo will compromise the entire belief elicitation and data analysis process of the research teams in the project.
belief elicitation
After RTs have submitted their short paper in Stage 1, they will receive a survey to measure their beliefs about the variation in results across RTs in Stage 1. This survey includes two belief questions per hypothesis: RTs will be asked to predict the standard deviation (STD) of the effect size and the t-statistic estimates across the RTs.
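To make the two quantities concrete, the sketch below computes the dispersion of effect sizes and of t-statistics across RTs for a single hypothesis. The input numbers are invented for illustration and are not project results. It assumes the t-statistic is the effect size divided by its standard error and uses the sample standard deviation (n - 1 denominator); the project's exact definitions may differ.

```python
import statistics


def dispersion_across_teams(submissions):
    """Return (STD of effect sizes, STD of t-statistics) for one hypothesis.

    `submissions` is a list of (effect_size, standard_error) pairs, one per RT.
    """
    effect_sizes = [effect for effect, _ in submissions]
    # Assumption: t-statistic = effect size / standard error.
    t_stats = [effect / se for effect, se in submissions]
    # Assumption: sample standard deviation (n - 1 in the denominator).
    return statistics.stdev(effect_sizes), statistics.stdev(t_stats)


# Hypothetical submissions from five RTs (illustrative values only).
example = [(0.12, 0.05), (0.08, 0.04), (0.20, 0.10), (-0.02, 0.06), (0.10, 0.05)]
std_effect, std_t = dispersion_across_teams(example)
print(f"STD of effect sizes: {std_effect:.4f}")
print(f"STD of t-statistics: {std_t:.4f}")
```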
One in five RTs, drawn at random, will be paid a monetary reward according to the precision of their belief about the variation across research teams. If an RT is randomly drawn for payout, another random draw determines which belief question will be paid out (i.e., only one hypothesis and the associated belief is randomly drawn for payout for an RT). The goal of the RT will be to guess the actual variation in results across RTs as closely as possible. Details about the scoring rule, which will be used to determine payments, will be provided in due time.
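The two-step random draw can be sketched as follows. This is only an illustration of the general idea, not the coordinators' actual procedure: it assumes exactly one fifth of the teams are selected, that the paid question is drawn uniformly from all hypothesis and belief-question combinations, and it leaves out the scoring rule, which has not yet been specified. All names are hypothetical.

```python
import random


def draw_payouts(team_ids, n_hypotheses=6,
                 questions=("effect_size_std", "t_stat_std"), seed=None):
    """Pick one in five teams, then one (hypothesis, belief question) per team."""
    rng = random.Random(seed)
    teams = list(team_ids)
    n_selected = len(teams) // 5  # assumption: exactly one fifth, rounded down
    selected = rng.sample(teams, n_selected)
    return {
        team: (rng.randint(1, n_hypotheses), rng.choice(questions))
        for team in selected
    }


# Twenty hypothetical teams; four of them are drawn for payout.
payouts = draw_payouts([f"RT{i:02d}" for i in range(1, 21)], seed=2021)
for team, (hypothesis, question) in sorted(payouts.items()):
    print(team, hypothesis, question)
```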
RTs will receive a link to the survey via e-mail. Importantly, each RT will give only one answer to each question, implying that the RT members should reach a consensus when answering the survey.