TidyFinance: Empirical asset pricing

class: center, middle, inverse, title-slide

.title[
# TidyFinance: Empirical asset pricing
]
.subtitle[
## www.tidy-finance.org
]
.author[
### Patrick Weiss
]
.date[
### October 13, 2022
]

---

class: chapter-slide

# Workshop outline

---
## Is empirical asset pricing important?

.pull-left[
- Asset prices are essential for efficient **resource allocation**.

- **Theoretical** insights have to be tested empirically.

- There are tons of **research articles** produced every year.

- Stock, bond, currency **markets** are enormous and affect our daily life.
]
.pull-right[
&nbsp;

<middle><center><img src="data:image/png;base64,#images/fearless_girl.jpg" alt="Image of a bronze statue of a girl in front of a statue of a bull." width="400"/></center></middle>
]

---
## What will be covered in the workshop?

**1. Introduction**

**2. Data**

**3. CAPM**

**4. Portfolio sorts**

---
## What will you learn today?

**Data**
- Most important databases (e.g., CRSP) and some open source data
- Tools to organize your data

**CAPM**
- Most used model in empirical asset pricing
- Determine CAPM-betas with rolling-window regressions

**Portfolio sorts**
- Common tool to infer risk-return relations
- Susceptible to p-hacking and data mining

---

### The basis for this workshop: #TidyFinance is ...

.pull-left[
- ... an open-source `{bookdown}` available at [tidy-finance.org](https://www.tidy-finance.org).

- ... a step towards **reproducible finance** by providing a fully transparent code base.

- ... a resource for students, lecturers, and professionals using `R` for applications in finance.

- ... a **tidy** approach to finance.

- ... continuously maintained and expanded.
]

.pull-right[
<middle><center><img src="data:image/png;base64,#images/cover.jpg" alt="Cover image of 'Tidy Finance with R'. The figure reads Tidy Finance with R followed by the author's names; Christoph Scheuch, Stefan Voigt, and Patrick Weiss." width="400"/></center></middle>
]

---
class: middle

.pull-left[
### `R` is among the best choices for finance programming.

- Free, open-source software with a diverse, active **community**.

- Actively maintained **packages** for all kinds of applications.

- Smooth integration with other **programming languages**.

- **RStudio** <svg aria-hidden="true" role="img" viewBox="0 0 512 512" style="height:1em;width:1em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:steelblue;overflow:visible;position:relative;"><path d="M225.8 468.2l-2.5-2.3L48.1 303.2C17.4 274.7 0 234.7 0 192.8v-3.3c0-70.4 50-130.8 119.2-144C158.6 37.9 198.9 47 231 69.6c9 6.4 17.4 13.8 25 22.3c4.2-4.8 8.7-9.2 13.5-13.3c3.7-3.2 7.5-6.2 11.5-9c0 0 0 0 0 0C313.1 47 353.4 37.9 392.8 45.4C462 58.6 512 119.1 512 189.5v3.3c0 41.9-17.4 81.9-48.1 110.4L288.7 465.9l-2.5 2.3c-8.2 7.6-19 11.9-30.2 11.9s-22-4.2-30.2-11.9zM239.1 145c-.4-.3-.7-.7-1-1.1l-17.8-20c0 0-.1-.1-.1-.1c0 0 0 0 0 0c-23.1-25.9-58-37.7-92-31.2C81.6 101.5 48 142.1 48 189.5v3.3c0 28.5 11.9 55.8 32.8 75.2L256 430.7 431.2 268c20.9-19.4 32.8-46.7 32.8-75.2v-3.3c0-47.3-33.6-88-80.1-96.9c-34-6.5-69 5.4-92 31.2c0 0 0 0-.1 .1s0 0-.1 .1l-17.8 20c-.3 .4-.7 .7-1 1.1c-4.5 4.5-10.6 7-16.9 7s-12.4-2.5-16.9-7z"/></svg>
]

.pull-right[
### `{Tidyverse}` is the way for data analysis.

- Messy data cause pain. **Tidy data** cause joy.

- Compose simple functions with the **pipe**.

- Designed for **humans**.

- **Consistent** across packages.
]

---
## Today's speaker: Patrick Weiss

&nbsp;

.pull-left[
<middle><center><img style="border-radius: 50%;" alt="Portrait of Patrick Weiss" src="data:image/png;base64,#images/pic_patrick.jpg" width="200px"/></center></middle>

.center[**Patrick Weiss**]

- External lecturer at Reykjavik University

- Research focus on empirical asset pricing with equities and bonds

- Published in JFE on 'The maturity premium'

- Co-author of TidyFinance
]
---
class: chapter-slide

# Data

---
background-image: url("data:image/png;base64,#images/wrds_logo.png")
background-position: 90% 60%
background-size: 400px

## Where to get data?

- Gathering high-quality data is the basis for any research project

- Luckily, Wharton Research Data Services (WRDS) exists

- WRDS combines many data providers in one, easy-to-use platform
  + CRSP,
  + Compustat,
  + TRACE,
  + OptionMetrics, and much more
  
---
## How to get the data?

- You can access the WRDS-webpage and use their interface to download data

- WRDS supports many different data formats, if you prefer other software

- However, the easiest way is a remote connection via the `RPostgres` package
  + Check out Chapter 3 in Tidy Finance with R
  + Join **TidyFinance: Financial Data in R** on November 24, 2022 
  
- For the purpose of this workshop, all data is stored in my `SQLite`-database
  + One file for all data instead of several files
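
To see what keeping everything in one `SQLite` database looks like in practice, the following sketch writes a small made-up table to an in-memory database and reads it back. The table name `demo_returns` and its values are invented for illustration; the workshop's actual file is `tidy_finance.sqlite`.

``` r
library(DBI)
library(RSQLite)

# Hypothetical toy data; not part of the workshop database
demo_returns <- data.frame(
  permno = c(1, 1, 2, 2),
  month = as.Date(c("2020-01-01", "2020-02-01", "2020-01-01", "2020-02-01")),
  ret = c(0.02, -0.01, 0.03, 0.00)
)

# ":memory:" creates a temporary database; a file path would persist it on disk
con <- dbConnect(SQLite(), ":memory:", extended_types = TRUE)

# Store the table and list what the database contains
dbWriteTable(con, "demo_returns", demo_returns, overwrite = TRUE)
tables <- dbListTables(con)

# Read the table back into R
demo_from_db <- dbReadTable(con, "demo_returns")

dbDisconnect(con)
```

Replacing `":memory:"` with a file path gives the one-file setup described above.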
---
## The US stock market

- Large parts of the academic literature focus on US stock markets

- Stocks are listed on US exchanges (NYSE, AMEX, NASDAQ, and some smaller ones)

- Extensive data on prices is provided by the Center for Research in Security Prices (CRSP), maintained by the University of Chicago, Booth School of Business

- The full sample starts in December 1925 and is continuously updated

---
## The CRSP record

- Data processing for CRSP involves several steps:
  + Restrict the sample to US common stocks (i.e., `shrcd %in% c(10, 11)`)
  + Add delisting returns
  + See Chapter 3 of Tidy Finance for the full data processing
  
- We stored monthly processed (!) data in `tidy_finance.sqlite`

- The data contains
  + Stock identifier `permno`
  + Time identifier `month`
  + Price `altprc`, return `ret`, and shares outstanding `shrout`
  + Listing exchange `exchcd` and industry code `siccd`

---
## Fama-French data

- Kenneth French provides data on his webpage including
  + Risk-free rates
  + Risk factor returns
  + Test assets for various tests
  
- Access via the `frenchdata` package

- Check out all available data on the webpage or via `get_french_data_list()`

---
## Load data

- We connect to our existing database and load
  + Fama-French factors including the risk-free rate
  + CRSP data
  + CPI data, which we directly match to CRSP

``` r
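library(tidyverse) # provides tbl(), collect(), left_join(), and ggplot2
library(RSQLite)

# Connect to the SQLite database; extended_types = TRUE keeps dates as dates
tidy_finance <- dbConnect(SQLite(), "data/tidy_finance.sqlite",
                          extended_types = TRUE)

factors_ff_monthly <- tbl(tidy_finance, "factors_ff_monthly") |> collect()

# Load CRSP and match CPI and the Fama-French factors by month
crsp_monthly <- tbl(tidy_finance, "crsp_monthly") |> collect() |>
  left_join(tbl(tidy_finance, "cpi_monthly") |> collect(), by = "month") |>
  left_join(factors_ff_monthly, by = "month")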
```

---
## The CRSP record overview: Exchanges

``` r
crsp_monthly |>  
  count(exchange, date) |> 
  ggplot(aes(x = date, y = n, color = exchange)) + geom_line() +
  labs(x = NULL, y = NULL, color = NULL, title = "Number of securities by exchange")
```

![Number of securities by exchange](data:image/png;base64,#W4U_AP_slides_files/figure-html/unnamed-chunk-1-1.png){#id .class width=50% height=50%}

---
## The CRSP record overview: Industries

``` r
crsp_monthly |> 
  group_by(month, industry) |>  
  summarize(mktcap = sum(mktcap / cpi) / 1000000, .groups = 'drop') |>  
  ggplot(aes(x = month, y = mktcap, color = industry)) + geom_line() +
  labs(x = NULL, y = NULL, title = "Market cap by industry (in trillion USD)")
```

![Market cap by industry (in trillion USD)](data:image/png;base64,#W4U_AP_slides_files/figure-html/unnamed-chunk-2-1.png){#id .class width=50% height=50%}

---
## Risk-free rate and excess returns

- Excess return for a stock is the difference between the stock return and the return on the risk-free security over the same period

``` r
factors_ff_monthly %>% ggplot(aes(x = month, y = 100 * rf)) + 
  geom_line() +  labs(x = NULL, y = NULL, title = "Risk free rate (in percent)") 
```

![Risk free rate (in percent)](data:image/png;base64,#W4U_AP_slides_files/figure-html/unnamed-chunk-4-1.png){#id .class width=50% height=50%}
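
As a minimal sketch of this definition, the snippet below matches a risk-free rate to stock returns by month and subtracts it. The numbers are made up; the column names `ret`, `rf`, and `ret_excess` follow the ones used elsewhere in these slides.

``` r
library(dplyr)
library(tibble)

# Made-up monthly returns for one stock and a made-up risk-free rate
stock_returns <- tibble(
  month = as.Date(c("2020-01-01", "2020-02-01", "2020-03-01")),
  ret = c(0.020, -0.010, 0.015)
)
risk_free <- tibble(
  month = as.Date(c("2020-01-01", "2020-02-01", "2020-03-01")),
  rf = c(0.001, 0.001, 0.002)
)

# Excess return = stock return minus risk-free return over the same month
excess_returns <- stock_returns |>
  left_join(risk_free, by = "month") |>
  mutate(ret_excess = ret - rf)
```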

---
## Excess returns in the CRSP sample

``` r
crsp_monthly  |>  
  group_by(month) |> 
  summarise(across(ret_excess, 
                   list(mean = mean, sd = sd, min = min, 
                        q25 = ~ quantile(., 0.25), 
                        median = median, 
                        q75 = ~ quantile(., 0.75), max = max),
                   .names = "{.fn} return")) |> 
  summarise(across(-month, mean))
```

---
## Alternative data sources

- Yahoo Finance provides some interesting data
  + Access is easy with the `tidyquant` package
  + Check out the introductory Chapter 1 on Tidy Finance
  
- Macroeconomic data is available from FRED, via `fredr` or `alfred`

- Bloomberg and Refinitiv data is also available via R packages

- More risk factors from [Global factor data](https://jkpfactors.com/) and [Open source asset pricing](https://www.openassetpricing.com/data)

- Crypto data from [coinmarketcap](https://coinmarketcap.com) via the `crypto2` package
---
class: chapter-slide

# The Capital Asset Pricing Model (CAPM)

---
## Implementing the CAPM

- The **CAPM** of Sharpe (1964), Lintner (1965), and Mossin (1966) marks the origin of the literature on asset pricing models

- Asset risk is the **covariance** of its return with the **market** portfolio return

- The higher the co-movement, the less desirable the asset; hence, its **price is lower** and its expected **return is higher**

- The market **risk premium** is driven by investors' risk aversion and equals the expected market excess return

---
## Regression specification of stock `$i$`

`$$r_{i,t} - r_{f,t} = \alpha_i + \underbrace{\frac{\text{Cov}(r_i, r_m)}{\text{Var}(r_m)}}_{\beta_i} (r_{m,t} - r_{f,t}) + \varepsilon_{i, t}$$`

- Expected returns are determined by the **price of risk** (market risk premium) and the **co-movement** of asset `$i$` with the market, `$\beta_i$`

- To determine whether an asset generates **abnormal returns** relative to the CAPM, we evaluate whether the intercept coefficient `$\alpha_i$` is statistically distinguishable from zero
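
The link between the regression slope and the covariance ratio can be checked numerically. The sketch below simulates excess returns with an assumed true beta of 1.2 and zero alpha (all parameters are illustrative), compares the OLS slope from `lm` with `cov()` divided by `var()`, and reads off the t-statistic of the intercept.

``` r
set.seed(42)

# Simulated monthly excess returns (illustrative parameters, not estimates)
n_months <- 240
mkt_excess <- rnorm(n_months, mean = 0.005, sd = 0.045)
ret_excess <- 0 + 1.2 * mkt_excess + rnorm(n_months, mean = 0, sd = 0.03)

# OLS estimate of the CAPM regression
fit <- lm(ret_excess ~ mkt_excess)
beta_ols <- unname(coef(fit)["mkt_excess"])

# Beta as the ratio of the covariance to the market variance
beta_ratio <- cov(ret_excess, mkt_excess) / var(mkt_excess)

# t-statistic for the null hypothesis alpha = 0
alpha_t_stat <- summary(fit)$coefficients["(Intercept)", "t value"]
```

Both `beta_ols` and `beta_ratio` give the same number up to floating-point precision, which is the identity under the brace in the regression equation.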

---
## The market factor

- The market factor is the market excess return `$z_t = r_{m,t} - r_{f,t}$`; its expected value is the market risk premium
- We proxy for the market with the value-weighted portfolio of all US-based common stocks in CRSP (provided by Kenneth French)
- The **Sharpe ratio** is computed as `$\frac{\hat{\mu}_z}{\hat{\sigma}_z}$`

``` r
factors_ff_monthly |>  
    mutate(mkt_excess = 100 * mkt_excess) |>  
    summarise(across(mkt_excess, list(mean = ~ 12 * mean(.), 
                                      sd = ~sqrt(12) * sd(.), 
                                      Sharpe = ~sqrt(12) * mean(.)/sd(.)),
                     .names = "{.fn} (annualized)"))
```
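
The scaling factors in the code follow from a simplifying assumption that is not stated on the slide: monthly excess returns are independent and identically distributed, and the annual return is approximated by the sum of twelve monthly returns. Under that assumption:

`$$\mu_{\text{annual}} = 12\,\mu_z, \qquad \sigma_{\text{annual}} = \sqrt{12}\,\sigma_z, \qquad \frac{\mu_{\text{annual}}}{\sigma_{\text{annual}}} = \frac{12\,\mu_z}{\sqrt{12}\,\sigma_z} = \sqrt{12}\,\frac{\mu_z}{\sigma_z}$$`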

---
## CAPM - Introductory example (I)

- Implementation of the CAPM is a simple **linear regression** (estimated via OLS) with `lm`

- Let us look at an example for Apple Inc.

``` r
apple_monthly <- crsp_monthly |> filter(permno == 14593) # 14593 = Apple's CRSP permno
lm(ret_excess ~ mkt_excess, data = apple_monthly) |> 
  broom::tidy()
```

---
## CAPM - Introductory example (II)

``` r
apple_monthly |>  ggplot(aes(x = mkt_excess, y = ret_excess)) + geom_point() + 
  geom_smooth(method = "lm", se = FALSE) + 
  labs(x = "Market Excess Returns", y = "Apple Excess Returns")
```

![Market Excess Return and Apple Excess Returns](data:image/png;base64,#W4U_AP_slides_files/figure-html/unnamed-chunk-8-1.png){#id .class width=50% height=50%}

---
## CAPM - Rolling-window regressions

- The example before assumes that **beta is constant** over time. Is that true?

- Rolling-window regressions allow for (some) variation in betas

- Here, we show how to implement rolling-window regressions with `slider`

``` r
library(slider)
```

- Notice that rolling-window regressions on individual stocks can be computationally expensive
  + You can still run code on your machine, but **parallelization** becomes relevant
  + We cover parallelized rolling-window regressions in **Chapter 6.3** in TidyFinance

---
## Function to estimate betas

- We repeat the estimation of the CAPM with `lm`
- Yet we only extract the relevant estimate of beta
- Additionally, we implement a filter for a minimum number of observations

``` r
estimate_capm <- function(data, min_obs = 1) {
  if (nrow(data) < min_obs) {
    beta <- as.numeric(NA)
  } else {
    fit <- lm(ret_excess ~ mkt_excess, data = data)
    beta <- as.numeric(coefficients(fit)[2])
  }
  return(beta)
}
```
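
A quick way to check the function is to call it on simulated data. The sketch below repeats the definition so that it runs on its own; the simulated data and the true beta of 0.8 are assumptions for illustration.

``` r
# Same function as above, repeated so the example is self-contained
estimate_capm <- function(data, min_obs = 1) {
  if (nrow(data) < min_obs) {
    beta <- as.numeric(NA)
  } else {
    fit <- lm(ret_excess ~ mkt_excess, data = data)
    beta <- as.numeric(coefficients(fit)[2])
  }
  return(beta)
}

# Simulated 60 months of excess returns with a true beta of 0.8
set.seed(1)
sim <- data.frame(mkt_excess = rnorm(60, mean = 0.005, sd = 0.045))
sim$ret_excess <- 0.8 * sim$mkt_excess + rnorm(60, mean = 0, sd = 0.02)

# 60 rows meet the minimum of 48 observations, so beta is estimated
beta_full <- estimate_capm(sim, min_obs = 48)

# 24 rows fall short of the minimum, so the function returns NA
beta_short <- estimate_capm(sim[1:24, ], min_obs = 48)
```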

---
## Rolling-window function for betas

``` r
roll_capm_estimation <- function(data, months, min_obs) {
  data <- data |>
    arrange(month)

  betas <- slide_period_vec(
    .x = data,
    .i = data$month,
    .period = "month",
    .f = ~ estimate_capm(., min_obs),
    .before = months - 1,
    .complete = FALSE
  )

  return(tibble(
    month = unique(data$month),
    beta = betas
  ))
}
```

---
## Estimating rolling-window betas: Test set

- We select a sample of firms to represent a test set

``` r
examples <- tribble(
  ~permno, ~company,
  14593, "Apple",
  10107, "Microsoft",
  93436, "Tesla",
  17778, "Berkshire Hathaway"
)
```

---
## Estimating rolling-window betas: Estimation

- Then, we apply the function for rolling-window beta estimation

- In this case, a `mutate()` call is sufficient. For more complex applications, you can `nest()` the data and `map()` the function over it
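
The code for this step is not shown on the slide. A hedged sketch of what such a `mutate()` call could look like follows, using simulated data instead of CRSP. It repeats `estimate_capm()` and `roll_capm_estimation()` in compact form so that it runs on its own. The object name `beta_examples` matches the one plotted on the next slide; the 60-month window with at least 48 observations, the simulated returns, and the use of `pick()` and `cross_join()` (both require `dplyr` 1.1.0 or later) are assumptions.

``` r
library(dplyr)
library(tibble)
library(slider)

estimate_capm <- function(data, min_obs = 1) {
  if (nrow(data) < min_obs) return(NA_real_)
  as.numeric(coefficients(lm(ret_excess ~ mkt_excess, data = data))[2])
}

roll_capm_estimation <- function(data, months, min_obs) {
  data <- arrange(data, month)
  betas <- slide_period_vec(
    .x = data, .i = data$month, .period = "month",
    .f = ~ estimate_capm(., min_obs),
    .before = months - 1, .complete = FALSE
  )
  tibble(month = unique(data$month), beta = betas)
}

# Simulated panel: two hypothetical stocks with true betas of 0.5 and 1.5
set.seed(123)
month_seq <- seq(as.Date("2010-01-01"), by = "month", length.out = 120)
market <- tibble(month = month_seq, mkt_excess = rnorm(120, 0.005, 0.045))
sim_panel <- tibble(permno = c(1, 2), true_beta = c(0.5, 1.5)) |>
  cross_join(market) |>
  mutate(ret_excess = true_beta * mkt_excess + rnorm(n(), 0, 0.02)) |>
  arrange(permno, month)

# Rolling betas per stock: the unnamed tibble returned inside mutate()
# is unpacked into the columns month and beta
beta_examples <- sim_panel |>
  group_by(permno) |>
  mutate(roll_capm_estimation(pick(month, ret_excess, mkt_excess),
                              months = 60, min_obs = 48)) |>
  ungroup()
```

Sorting by `permno` and `month` before the call matters here, because the function returns its estimates in month order and `mutate()` matches them to rows by position.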

---
## Estimating rolling-window betas: Results

``` r
beta_examples |> ggplot(aes(x = month, y = beta, color = company, linetype = company)) +
  geom_line() + labs(x = NULL, y = NULL, color = NULL, linetype = NULL,
    title = "Monthly beta estimates for example stocks using 5 years of data")
```

![Monthly beta estimates for example stocks using 5 years of data](data:image/png;base64,#W4U_AP_slides_files/figure-html/unnamed-chunk-14-1.png){#id .class width=50% height=50%}

---
## Recap: Computing beta for the CRSP universe

- Market beta for month `$t$` is estimated with data from prior to and including `$t$`, e.g., 5 years of monthly data 
- Rolling-window regressions are straightforward from a methodological perspective but tricky to implement

- Let us load the results we computed for you

``` r
beta <- tbl(tidy_finance, "beta") |> collect() |> 
  inner_join(crsp_monthly, by = c("month", "permno")) |> 
  drop_na(beta_monthly)
```

---
## Beta summary across industries

``` r
beta |> group_by(industry, permno) |> summarise(beta = mean(beta_monthly)) |> 
  ggplot(aes(x = reorder(industry, beta, FUN = median), y = beta)) +
  geom_boxplot() + coord_flip() +
  labs(x = NULL, y = NULL, title = "Average beta estimates by industry")
```

![Average beta estimates by industry](data:image/png;base64,#W4U_AP_slides_files/figure-html/unnamed-chunk-16-1.png){#id .class width=50% height=50%}

---
class: chapter-slide

# Portfolio sorts

---
## The basic intuition behind portfolio sorts

- Portfolio sorts are a way to estimate a (non-linear) **relation** between a stock **characteristic** and **expected returns**

- The return differentials between extreme portfolios motivate cross-sectional risk premiums

- Academics have identified a large number of relevant characteristics, called the **factor zoo**

- The resulting factors are often used in factor pricing models, like the famous **Fama-French 3-Factor** Model (Fama and French, 1993)

- An alternative way to estimate cross-sectional return drivers is **Fama-MacBeth** regressions (see Fama and MacBeth (1973) and Chapter 11)
---
## Portfolio sorting procedure

- How to conduct portfolio sorts? 
  + Take a sorting variable associated with a stock
  + Each month, sort stocks into ten portfolios
  + Aggregate the individual stock returns over the next month into a weighted portfolio return
  + Compute the return differential between the top and bottom portfolio
  + This return differential is an estimate for the risk premium associated with the sorting variable
  
- You end up with a time series of monthly risk premium estimates; their average is the estimate of the **expected risk premium**
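
The procedure above can be sketched on simulated data as follows. The panel, the sorting variable `sort_var_lag`, and the built-in premium are all invented for illustration; `ntile()` from `dplyr` is used here as a simple stand-in for breakpoint-based portfolio assignment, and the t-statistic uses plain standard errors.

``` r
library(dplyr)
library(tibble)

set.seed(2022)
n_stocks <- 500
n_months <- 120

# Simulated panel: returns increase with the lagged sorting variable by construction
panel <- tibble(
  month = rep(seq(as.Date("2010-01-01"), by = "month", length.out = n_months),
              each = n_stocks),
  permno = rep(seq_len(n_stocks), times = n_months),
  sort_var_lag = rnorm(n_stocks * n_months),
  mktcap_lag = exp(rnorm(n_stocks * n_months, mean = 6, sd = 1))
) |>
  mutate(ret_excess = 0.005 + 0.01 * sort_var_lag + rnorm(n(), 0, 0.10))

# Sort stocks into ten portfolios each month and value-weight their returns
portfolio_returns <- panel |>
  group_by(month) |>
  mutate(portfolio = ntile(sort_var_lag, 10)) |>
  group_by(month, portfolio) |>
  summarise(ret = weighted.mean(ret_excess, mktcap_lag), .groups = "drop")

# Return differential between the top and bottom portfolio in each month
long_short <- portfolio_returns |>
  group_by(month) |>
  summarise(premium = ret[portfolio == 10] - ret[portfolio == 1])

# Average premium over time and a simple t-statistic
premium_estimate <- long_short |>
  summarise(mean = mean(premium),
            t_stat = mean(premium) / (sd(premium) / sqrt(n())))
```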
---
## Portfolio sorts based on rolling-window betas

- **What does the CAPM imply?**

- Univariate portfolio analysis (non-parametric technique):

``` r
beta_portfolios <- beta  |> group_by(month) |> 
  mutate(breakpoint = median(beta_monthly),
         portfolio = case_when(beta_monthly <= breakpoint ~ "low",
                               beta_monthly > breakpoint ~ "high")) |> 
  group_by(portfolio, month) |> 
  summarise(ret = weighted.mean(ret_excess, mktcap_lag))

beta_portfolios |> summarize(mean = mean(ret*100))
```

---
## Positive alpha?

- We saw a positive return difference in means. However:
  1. This is not a **statistical test** of the return difference
  2. It is not meaningful if we consider the **CAPM** a valid benchmark
  
- Instead, we look at estimates of **alpha** to draw inferences about risk premiums
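
A sketch of such an alpha test on simulated data: the high-minus-low return is regressed on the market excess return, and the t-statistic of the intercept indicates whether the return difference survives the CAPM adjustment. All series are simulated under the assumption that the CAPM holds, so the alpha should be close to zero even though the raw difference is positive in expectation. Plain OLS standard errors are used for brevity; this is a choice made for the sketch, not necessarily the one behind the workshop's results.

``` r
set.seed(7)
n_months <- 600

# Simulated market and two beta-sorted portfolios (CAPM holds by construction)
mkt_excess <- rnorm(n_months, mean = 0.006, sd = 0.045)
ret_low <- 0.6 * mkt_excess + rnorm(n_months, mean = 0, sd = 0.01)
ret_high <- 1.4 * mkt_excess + rnorm(n_months, mean = 0, sd = 0.01)
long_short <- ret_high - ret_low

# Raw average return difference (expected value: 0.8 times the market premium)
mean_difference <- mean(long_short)

# CAPM-adjusted difference: the intercept is the long-short alpha
fit <- lm(long_short ~ mkt_excess)
alpha <- unname(coef(fit)["(Intercept)"])
alpha_t_stat <- summary(fit)$coefficients["(Intercept)", "t value"]
beta_long_short <- unname(coef(fit)["mkt_excess"])
```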

---
## Bad news for CAPM?

- Figure shows decile portfolio sorts based on *lagged* beta (1 lowest, 10 highest)
- Each bar corresponds to the CAPM alpha of the value-weighted portfolio performance

---
## Extensions to portfolio sorts

- The CAPM *test* is only one example of how to use portfolio sorts

- Portfolio sorts also extend to multiple characteristics at once with **multivariate portfolio sorts**

- The factors of the famous Fama and French (1993) model come from a 2x3 sort on **size and book-to-market** (+ the market factor)

- The **factor zoo** shows how many applications portfolio sorts have in the academic literature

---
## Researchers' discretion in portfolio sorts

- Researchers have to make **numerous decisions** when conducting portfolio sorts:
  + How many portfolios?
  + How to weight portfolio returns?
  + Single or double sorts?
  + Sample construction rules?

- These decisions are seemingly innocuous, but have a **large impact** on the estimates of risk premiums

- There is a clear potential for **p-hacking** and **data mining**

---
## Non-standard errors in portfolio sorts

The degree of variation in risk premium estimates. See [Walter, Weber, and Weiss (2022)](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4164117) for details.
---
class: chapter-slide

# Final remarks

---
## Conclusion

- Now, you know some important concepts in **empirical asset pricing** and ways to implement them

- How can you apply what you have learnt?
  + The CAPM can be tested with freely available data from Yahoo Finance and Kenneth French
  + Portfolio sorts require more data, but you can reach out to me
  
- This workshop covers several topics and leaves out some details due to time constraints. Check out [Tidy Finance with R](https://www.tidy-finance.org) to find out more.

- There is another workshop aimed at an introduction to financial data
  + Join **TidyFinance: Financial Data in R** on November 24, 2022

---
class: middle, left
background-image: url("data:image/png;base64,#images/cover.jpg")
background-position: right
background-size: 400px

### Reach out with comments, questions, or <br> suggestions: <svg aria-hidden="true" role="img" viewBox="0 0 512 512" style="height:1em;width:1em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:blue;overflow:visible;position:relative;"><path d="M64 112c-8.8 0-16 7.2-16 16v22.1L220.5 291.7c20.7 17 50.4 17 71.1 0L464 150.1V128c0-8.8-7.2-16-16-16H64zM48 212.2V384c0 8.8 7.2 16 16 16H448c8.8 0 16-7.2 16-16V212.2L322 328.8c-38.4 31.5-93.7 31.5-132 0L48 212.2zM0 128C0 92.7 28.7 64 64 64H448c35.3 0 64 28.7 64 64V384c0 35.3-28.7 64-64 64H64c-35.3 0-64-28.7-64-64V128z"/></svg> [contact@tidy-finance.org](mailto:contact@tidy-finance.org)

&nbsp;

## <svg aria-hidden="true" role="img" viewBox="0 0 512 512" style="height:1em;width:1em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:currentColor;overflow:visible;position:relative;"><path d="M464 256A208 208 0 1 1 48 256a208 208 0 1 1 416 0zM0 256a256 256 0 1 0 512 0A256 256 0 1 0 0 256zM294.6 135.1c-4.2-4.5-10.1-7.1-16.3-7.1C266 128 256 138 256 150.3V208H160c-17.7 0-32 14.3-32 32v32c0 17.7 14.3 32 32 32h96v57.7c0 12.3 10 22.3 22.3 22.3c6.2 0 12.1-2.6 16.3-7.1l99.9-107.1c3.5-3.8 5.5-8.7 5.5-13.8s-2-10.1-5.5-13.8L294.6 135.1z"/></svg> visit [tidy-finance.org](https://www.tidy-finance.org)