class: center, middle, inverse, title-slide

.title[
# TidyFinance: Empirical asset pricing
]
.subtitle[
## www.tidy-finance.org
]
.author[
### Patrick Weiss
]
.date[
### October 13, 2022
]

---
class: chapter-slide

# Workshop outline

---

## Is empirical asset pricing important?

.pull-left[
- Asset prices are essential for efficient **resource allocation**.
- **Theoretical** insights have to be tested empirically.
- A large number of **research articles** is produced every year.
- Stock, bond, and currency **markets** are enormous and affect our daily lives.
]

.pull-right[
<middle><center><img src="data:image/png;base64,#images/fearless_girl.jpg" alt="Image of a bronze statue of a girl in front of a statue of a bull." width="400"/></center></middle>
]

---

## What will be covered in the workshop?

**1. Introduction**

**2. Data**

**3. CAPM**

**4. Portfolio sorts**

---

## What will you learn today?

**Data**

- Most important databases (e.g., CRSP) and some open-source data
- Tools to organize your data

**CAPM**

- Most widely used model in empirical asset pricing
- Determine CAPM betas with rolling-window regressions

**Portfolio sorts**

- Common tool to infer risk-return relations
- Susceptible to p-hacking and data mining

---

### The basis for this workshop: #TidyFinance is ...

.pull-left[
- ... an open-source `{bookdown}` book available at [tidy-finance.org](https://www.tidy-finance.org).
- ... a step towards **reproducible finance** by providing a fully transparent code base.
- ... a resource for students, lecturers, and professionals using `R` for applications in finance.
- ... a **tidy** approach to finance.
- ... continuously maintained and expanded.
]

.pull-right[
<middle><center><img src="data:image/png;base64,#images/cover.jpg" alt="Cover image of 'Tidy Finance with R'. The figure reads Tidy Finance with R followed by the authors' names: Christoph Scheuch, Stefan Voigt, and Patrick Weiss."
width="400"/></center></middle>
]

---
class: middle

.pull-left[
### `R` is among the best choices for finance programming.

- Free, open-source software with a diverse, active **community**.
- Actively maintained **packages** for all kinds of applications.
- Smooth integration with other **programming languages**.
- **RStudio**
]

.pull-right[
### `{Tidyverse}` is the way for data analysis.

- Messy data cause pain. **Tidy data** cause joy.
- Compose simple functions with the **pipe**.
- Designed for **humans**.
- **Consistent** across packages.
]

---

## Today's speaker: Patrick Weiss

.pull-left[
<middle><center><img style="border-radius: 50%;" alt="Portrait of Patrick Weiss" src="data:image/png;base64,#images/pic_patrick.jpg" width="200px"/></center></middle>

.center[**Patrick Weiss**
[patrick.weiss@wu.ac.at](mailto:patrick.weiss@wu.ac.at)]
]

.pull-right[
- Post-doc at WU Vienna
- External lecturer at Reykjavik University
- Research focus on empirical asset pricing with equities and bonds
- Published in JFE on 'The maturity premium'
- Co-author of TidyFinance
]

---
class: chapter-slide

# Data

---
background-image: url("data:image/png;base64,#images/wrds_logo.png")
background-position: 90% 60%
background-size: 400px

## Where to get data?

- Gathering high-quality data is the basis for any research project
- Luckily, Wharton Research Data Services (WRDS) exists
- WRDS combines many data providers in one easy-to-use platform
  + CRSP,
  + Compustat,
  + TRACE,
  + OptionMetrics, and so much more...

---

## How to get the data?

- You can access the WRDS webpage and use its interface to download data
- WRDS supports many different data formats if you prefer other software
- However, the easiest way is a remote connection via the `RPostgres` package
  + Check out Chapter 3 in Tidy Finance with R
  + Join **TidyFinance: Financial Data in R** on November 24, 2022
- For the purpose of this workshop, all data is stored in my `SQLite` database
  + One file for all data instead of several files

---

## The US stock market

- Large parts of the academic literature focus on US stock markets
- Stocks are listed on US exchanges (NYSE, AMEX, NASDAQ, and some smaller ones)
- Extensive data on prices is provided by the Center for Research in Security Prices (CRSP), maintained by the University of Chicago Booth School of Business
- The full sample starts in December 1925 and is continuously updated

<center><img alt="Logo of CRSP" src="data:image/png;base64,#images/crsp_logo.png" width="350px"/></center>

---

## The CRSP record

- Data processing for CRSP involves several steps:
  + Restricting the sample to US stocks (i.e., `shrcd %in% c(10, 11)`)
  + Adding delisting returns
  + We explain the data processing in Chapter 3 of Tidy Finance
- We stored monthly processed (!)
  data in `tidy_finance.sqlite`
- The data contains
  + Stock identifier `permno`
  + Time identifier `month`
  + Price `altprc`, return `ret`, and shares outstanding `shrout`
  + Listing exchange `exchcd` and the firm's industry `siccd`

---

## Fama-French data

- Kenneth French provides data on his webpage including
  + Risk-free rates
  + Risk factor returns
  + Test assets for various tests
- Access via the `frenchdata` package
- Check out all available data on the webpage or via `get_french_data_list()`

---

## Load data

- We connect to our existing database and load
  + Fama-French factors including the risk-free rate
  + CRSP data
  + CPI data, which we directly match to CRSP

``` r
# tidyverse provides dplyr and ggplot2; tbl() on a database
# connection additionally requires dbplyr to be installed
library(tidyverse)
library(RSQLite)

# Connect to the SQLite database
tidy_finance <- dbConnect(SQLite(), "data/tidy_finance.sqlite",
                          extended_types = TRUE)

factors_ff_monthly <- tbl(tidy_finance, "factors_ff_monthly") %>%
  collect()

crsp_monthly <- tbl(tidy_finance, "crsp_monthly") |>
  collect() |>
  left_join(tbl(tidy_finance, "cpi_monthly") |> collect(), by = "month") |>
  left_join(factors_ff_monthly, by = "month")
```

---

## The CRSP record overview: Exchanges

``` r
crsp_monthly |>
  count(exchange, date) |>
  ggplot(aes(x = date, y = n, color = exchange)) +
  geom_line() +
  labs(x = NULL, y = NULL, color = NULL,
       title = "Number of securities by exchange")
```

---

## The CRSP record overview: Industries

``` r
crsp_monthly |>
  group_by(month, industry) |>
  summarize(mktcap = sum(mktcap / cpi) / 1000000, .groups = 'drop') |>
  ggplot(aes(x = month, y = mktcap, color = industry)) +
  geom_line() +
  labs(x = NULL, y = NULL,
       title = "Market cap by industry (in trillion USD)")
```

---

## Risk-free rate and excess returns

- The excess return of a stock is the difference between the stock return and the return on the risk-free security over the same period

``` r
factors_ff_monthly %>%
  ggplot(aes(x = month, y = 100 * rf)) +
  geom_line() +
  labs(x = NULL, y = NULL, title = "Risk-free rate (in percent)")
```

---

## Excess returns in the CRSP sample

``` r
crsp_monthly |>
  group_by(month) |>
  summarise(across(ret_excess,
                   list(mean = mean, sd = sd, min = min,
                        q25 = ~ quantile(., 0.25),
                        median = median,
                        q75 = ~ quantile(., 0.75),
                        max = max),
                   .names = "{.fn} return")) |>
  summarise(across(-month, mean))
```

---

## Alternative data sources

- Yahoo!Finance provides some interesting data
  + Access is easy with the `tidyquant` package
  + Check out the introductory Chapter 1 in Tidy Finance
- Macroeconomic data is available from FRED, via `fredr` or `alfred`
- Bloomberg and Refinitiv data is also available via R packages
- More risk factors from [Global factor data](https://jkpfactors.com/) and [Open source asset pricing](https://www.openassetpricing.com/data)
- Crypto data from [coinmarketcap](https://www.tidy-finance.org/coinmarketcap.com) via the `crypto2` package

---
class: chapter-slide

# The Capital Asset Pricing Model (CAPM)

---

## Implementing the CAPM

- The **CAPM** of Sharpe (1964), Lintner (1965), and Mossin (1966) is the origin of the literature on asset pricing models
- Asset risk is the **covariance** of its return with the **market** portfolio return
- The higher the co-movement, the less desirable the asset is; hence, the asset **price is lower** and the expected **return is higher**
- The market **risk premium** is driven by investors' risk aversion and equals the expected excess market return

---

## Regression specification for stock `\(i\)`

`$$r_{i,t} - r_{f,t} = \alpha_i + \underbrace{\frac{\text{Cov}(r_i, r_m)}{\text{Var}(r_m)}}_{\beta_i} (r_{m,t} - r_{f,t}) + \varepsilon_{i, t}$$`

- Expected returns are determined by the **price of risk** (market risk premium) and the **co-movement** of asset `\(i\)` with the market, `\(\beta_i\)`
- To determine whether an asset generates **abnormal returns** relative to the CAPM, we evaluate whether the intercept coefficient `\(\alpha_i\)` is statistically distinguishable from zero

---

## The market factor
- The market risk premium is the market excess return `\(z_t = r_{m,t} - r_{f,t}\)`
- We proxy for the market with the value-weighted portfolio of all US-based common stocks in CRSP (provided by Kenneth French)
- The **Sharpe ratio** is computed as `\(\frac{\hat{\mu}_z}{\hat{\sigma}_z}\)`

``` r
factors_ff_monthly |>
  mutate(mkt_excess = 100 * mkt_excess) |>
  summarise(across(mkt_excess,
                   list(mean = ~ 12 * mean(.),
                        sd = ~ sqrt(12) * sd(.),
                        Sharpe = ~ sqrt(12) * mean(.) / sd(.)),
                   .names = "{.fn} (annualized)"))
```

---

## CAPM - Introductory example (I)

- Estimating the CAPM amounts to a simple **linear regression** (estimated via OLS) with `lm`
- Let us look at an example for Apple Inc.

``` r
# 14593 = Apple (the same permno as in the test set used for the rolling betas)
apple_monthly <- crsp_monthly |>
  filter(permno == 14593)

lm(ret_excess ~ mkt_excess, data = apple_monthly) |>
  broom::tidy()
```

---

## CAPM - Introductory example (II)

``` r
apple_monthly |>
  ggplot(aes(x = mkt_excess, y = ret_excess)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(x = "Market Excess Returns", y = "Apple Excess Returns")
```

---

## CAPM - Rolling-window regressions

- The previous example assumes that **beta is constant** over time. Is that true?
- Rolling-window regressions allow for (some) variation in betas
- Here, we show how to implement rolling-window regressions with `slider`

``` r
library(slider)
```

- Notice that rolling-window regressions on individual stocks can be computationally expensive
  + You can still run the code on your machine, but **parallelization** becomes relevant
  + We cover parallelized rolling-window regressions in **Chapter 6.3** in TidyFinance

---

## Function to estimate betas

- We repeat the estimation of the CAPM with `lm`
- However, we only extract the beta estimate
- Additionally, we implement a filter for a minimum number of observations

``` r
estimate_capm <- function(data, min_obs = 1) {
  if (nrow(data) < min_obs) {
    # Too few observations in the window: return a missing beta
    beta <- as.numeric(NA)
  } else {
    fit <- lm(ret_excess ~ mkt_excess, data = data)
    # The second coefficient is the slope on mkt_excess, i.e., beta
    beta <- as.numeric(coefficients(fit)[2])
  }
  return(beta)
}
```

---

## Rolling-window function for betas

``` r
roll_capm_estimation <- function(data, months, min_obs) {
  data <- data |>
    arrange(month)

  # Each window covers the current month and the `months - 1` preceding months
  betas <- slide_period_vec(
    .x = data,
    .i = data$month,
    .period = "month",
    .f = ~ estimate_capm(., min_obs),
    .before = months - 1,
    .complete = FALSE
  )

  return(tibble(
    month = unique(data$month),
    beta = betas
  ))
}
```

---

## Estimating rolling-window betas: Test set

- We select a sample of firms to represent a test set

``` r
examples <- tribble(
  ~permno, ~company,
  14593, "Apple",
  10107, "Microsoft",
  93436, "Tesla",
  17778, "Berkshire Hathaway"
)
```

---

## Estimating rolling-window betas: Estimation

- Then, we apply the function for rolling-window beta estimation

``` r
# Note: cur_data() is deprecated as of dplyr 1.1.0 in favor of pick(everything())
beta_examples <- crsp_monthly |>
  inner_join(examples, by = "permno") |>
  group_by(permno) |>
  mutate(roll_capm_estimation(cur_data(), months = 60, min_obs = 48)) |>
  ungroup() |>
  select(permno, company, month, beta) |>
  drop_na()
```

- In this case, a `mutate()` is sufficient.
  For more complex applications, you can `nest()` the data and `map()` the function over it

---

## Estimating rolling-window betas: Results

``` r
beta_examples |>
  ggplot(aes(x = month, y = beta, color = company, linetype = company)) +
  geom_line() +
  labs(x = NULL, y = NULL, color = NULL, linetype = NULL,
       title = "Monthly beta estimates for example stocks using 5 years of data")
```

---

## Recap: Computing beta for the CRSP universe

- Market beta for month `\(t\)` is estimated with data up to and including `\(t\)`, e.g., 5 years of monthly data
- Rolling-window regressions are straightforward from a methodological perspective but tricky to implement
- Let us load the results we computed for you

``` r
beta <- tbl(tidy_finance, "beta") |>
  collect() |>
  inner_join(crsp_monthly, by = c("month", "permno")) |>
  drop_na(beta_monthly)
```

---

## Beta summary across industries

``` r
beta |>
  group_by(industry, permno) |>
  # Average beta per stock, then compare the distribution across industries
  summarise(beta = mean(beta_monthly), .groups = "drop") |>
  ggplot(aes(x = reorder(industry, beta, FUN = median), y = beta)) +
  geom_boxplot() +
  coord_flip() +
  labs(x = NULL, y = NULL, title = "Average beta estimates by industry")
```

---
class: chapter-slide

# Portfolio sorts

---

## The basic intuition behind portfolio sorts

- Portfolio sorts are a way to estimate a (non-linear) **relation** between a stock **characteristic** and **expected returns**
- The return differentials between extreme portfolios motivate cross-sectional risk premiums
- Academics have identified a large number of relevant characteristics, called the **factor zoo**
- The resulting factors are often used in factor pricing models, like the famous **Fama-French 3-Factor** Model (Fama and French, 1993)
- **Fama-MacBeth** regressions are an alternative way to estimate cross-sectional return drivers (see Fama and MacBeth (1973) and Chapter 11)

---

## Portfolio sorting procedure

- How to conduct portfolio sorts?
  + Take a sorting variable associated with a stock
  + Each month, sort stocks into ten portfolios
  + Aggregate the individual stock returns over the next month into a weighted portfolio return
  + Compute the return differential between the top and bottom portfolio
  + This return differential is an estimate of the risk premium associated with the sorting variable
- You end up with a time series of monthly risk premium estimates; their average is the estimate of the **expected risk premium**

---

## Portfolio sorts based on rolling-window betas

- **What does the CAPM imply?**
- Univariate portfolio analysis (non-parametric technique):

``` r
beta_portfolios <- beta |>
  group_by(month) |>
  mutate(breakpoint = median(beta_monthly),
         portfolio = case_when(beta_monthly <= breakpoint ~ "low",
                               beta_monthly > breakpoint ~ "high")) |>
  group_by(portfolio, month) |>
  # Value-weighted returns; the result stays grouped by portfolio
  summarise(ret = weighted.mean(ret_excess, mktcap_lag),
            .groups = "drop_last")

# Average monthly return (in percent) per portfolio
beta_portfolios |>
  summarize(mean = mean(ret * 100))
```

---

## Positive alpha?

- We saw a positive return difference in means. However:
  1. This is not a **statistical test** of the return difference
  2. It is not meaningful if we consider the **CAPM** a valid benchmark
- Instead, we look at estimates of **alpha** to draw inferences about risk premiums

``` r
# The `_` placeholder of the native pipe requires R >= 4.2
beta_portfolios |>
  pivot_wider(names_from = portfolio, values_from = ret) |>
  mutate(high_low = high - low) |>
  left_join(factors_ff_monthly, by = "month") |>
  lm(high_low ~ mkt_excess, data = _) |>
  broom::tidy()
```

---

## Bad news for CAPM?
<center><img alt="Bar chart of CAPM alphas for decile portfolios sorted on lagged beta" src="data:image/png;base64,#images/alpha_beta_sorts.png" width="600px"/></center>

- The figure shows decile portfolio sorts based on *lagged* beta (1 lowest, 10 highest)
- Each bar shows the CAPM alpha of the corresponding value-weighted portfolio

---

## Extensions to portfolio sorts

- The CAPM *test* is only one example of how to use portfolio sorts
- Portfolio sorts also extend to multiple characteristics at once with **multivariate portfolio sorts**
- The factors of the famous Fama and French (1993) model come from a 2x3 sort on **size and book-to-market** (+ the market factor)
- The **factor zoo** shows how many applications portfolio sorts have in the academic literature

---

## Researchers' discretion in portfolio sorts

- Researchers have to make **numerous decisions** when conducting portfolio sorts:
  + How many portfolios?
  + How to weight portfolio returns?
  + Single or double sorts?
  + Sample construction rules?
- These decisions are seemingly innocuous but have a **large impact** on the estimates of risk premiums
- There is a clear potential for **p-hacking** and **data mining**

---

## Non-standard errors in portfolio sorts

<center><img alt="Plot of the variation in risk premium estimates across portfolio sort specifications" src="data:image/png;base64,#images/NSE_plot_figure_1and2.jpg" width="500px"/></center>

The degree of variation in risk premium estimates. See [Walter, Weber, and Weiss (2022)](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4164117) for details.

---
class: chapter-slide

# Final remarks

---

## Conclusion

- Now you know some important concepts in **empirical asset pricing** and ways to implement them
- How can you apply what you have learnt?
  + The CAPM can be tested with freely available data from Yahoo!Finance and Kenneth French
  + Portfolio sorts require more data, but you can reach out to me
- This workshop covers several topics and leaves out some details due to time constraints. Check out [Tidy Finance with R](https://www.tidy-finance.org) to find out more.
- There is another workshop that introduces financial data
  + Join **TidyFinance: Financial Data in R** on November 24, 2022

---
class: middle, left
background-image: url("data:image/png;base64,#images/cover.jpg")
background-position: right
background-size: 400px

### Reach out with comments, questions, or <br> suggestions:
[contact@tidy-finance.org](mailto:contact@tidy-finance.org)

## visit [tidy-finance.org](https://www.tidy-finance.org)
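
---

## Appendix: Decile portfolio sorts on simulated data

The portfolio sorting procedure from the workshop (sort stocks into ten portfolios each month, value-weight the returns, take the top-minus-bottom differential) can be sketched as follows. This is a minimal illustration under stated assumptions, not the workshop's implementation: the data are simulated rather than taken from CRSP, the columns `beta_lag`, `mktcap_lag`, and `ret_excess` are hypothetical stand-ins for the corresponding CRSP-based variables, and `ntile()` is used as a simple equal-count substitute for breakpoint-based sorting.

```r
library(dplyr)
library(tidyr)

set.seed(42)

# Simulated panel (NOT CRSP data): 200 hypothetical stocks over 24 months
sim_data <- expand_grid(
  permno = 1:200,
  month = seq(as.Date("2020-01-01"), by = "month", length.out = 24)
) |>
  mutate(
    beta_lag = rnorm(n(), mean = 1, sd = 0.5),        # sorting variable, known before the return month
    mktcap_lag = exp(rnorm(n(), mean = 6, sd = 1)),   # lagged market cap used as portfolio weight
    ret_excess = rnorm(n(), mean = 0.005, sd = 0.08)  # excess return over the following month
  )

# Each month: assign stocks to ten portfolios, then value-weight the returns
decile_returns <- sim_data |>
  group_by(month) |>
  mutate(portfolio = ntile(beta_lag, 10)) |>
  group_by(month, portfolio) |>
  summarise(ret = weighted.mean(ret_excess, mktcap_lag), .groups = "drop")

# Monthly return differential between the top and bottom portfolio
long_short <- decile_returns |>
  filter(portfolio %in% c(1, 10)) |>
  pivot_wider(names_from = portfolio, values_from = ret, names_prefix = "p") |>
  mutate(high_low = p10 - p1)

# Time-series average of the differential: the risk premium estimate
premium <- long_short |>
  summarise(mean_high_low = mean(high_low))
premium
```

Because the simulated returns are pure noise, the average differential should be close to zero. With real data, the same steps yield an estimate of the risk premium associated with the sorting variable, which would then be tested against a benchmark such as the CAPM.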