A few years ago …

Two PhD students in Vienna

  • Hardly any public code or data
  • Hard to reproduce papers
  • 80% of time spent preparing data

Since then …

What is Tidy Finance?

A transparent, open-source approach to research in financial economics, featuring multiple programming languages

What is Tidy Finance?

A transparent, open-source approach to research in financial economics, featuring multiple programming languages


The tidyfinance packages is a simple way to:

  • Load our approach into R
  • Use helper functions to download & process data
  • Easily compile multiple data sources

Why tidy?

  1. Write code that is easy to read for humans
  2. Compose simple functions to solve complex problems
  3. Embrace functional programming for reproducible results
  4. Reuse data structures across applications

Recap: what is tidy data?

© Julia Lowndes & Allison Horst

A consistent interface to financial data

library(tidyfinance)

download_data(
  type = "factors_ff3_monthly", 
  start_date = "2022-01-01", 
  end_date = "2022-12-31"
) |> 
  print(n = 5)
 

A consistent interface to financial data

library(tidyfinance)

download_data(
  type = "factors_ff3_monthly", 
  start_date = "2022-01-01", 
  end_date = "2022-12-31"
) |> 
  print(n = 5)
 

A consistent interface to financial data

library(tidyfinance)

download_data(
  type = "factors_ff3_monthly", 
  start_date = "2022-01-01", 
  end_date = "2022-12-31"
) |> 
  print(n = 5)
 

A consistent interface to financial data

library(tidyfinance)

download_data(
  type = "factors_ff3_monthly", 
  start_date = "2022-01-01", 
  end_date = "2022-12-31"
) |> 
  print(n = 5)
 

A consistent interface to financial data

library(tidyfinance)

download_data(
  type = "factors_ff3_monthly", 
  start_date = "2022-01-01", 
  end_date = "2022-12-31"
) |> 
  print(n = 5)
 
# A tibble: 12 × 5
  date       risk_free mkt_excess     smb     hml
  <date>         <dbl>      <dbl>   <dbl>   <dbl>
1 2022-01-01    0         -0.0625 -0.0594  0.128 
2 2022-02-01    0         -0.0229  0.022   0.0309
3 2022-03-01    0.0001     0.0305 -0.0161 -0.0181
4 2022-04-01    0.0001    -0.0946 -0.0139  0.0616
5 2022-05-01    0.0003    -0.0034 -0.0183  0.0839
# ℹ 7 more rows

Deep dive: download raw data

raw_data <- frenchdata::download_french_data(dataset)
raw_data <- raw_data$subsets$data[[1]]

Deep dive: parse dates

raw_data <- frenchdata::download_french_data(dataset)
raw_data <- raw_data$subsets$data[[1]]

if (grepl("monthly", type)) {
  processed_data <- raw_data |>
    mutate(date = floor_date(ymd(paste0(date, "01")), "month"))
} else {
  processed_data <- raw_data |> 
    mutate(date = ymd(date))
}

Deep dive: transform numeric columns

raw_data <- frenchdata::download_french_data(dataset)
raw_data <- raw_data$subsets$data[[1]]

if (grepl("monthly", type)) {
  processed_data <- raw_data |>
    mutate(date = floor_date(ymd(paste0(date, "01")), "month"))
} else {
  processed_data <- raw_data |> mutate(date = ymd(date))
}

processed_data <- processed_data |>
  mutate(across(-date, ~na_if(.,-99.99)),
         across(-date, ~ . / 100))

Deep dive: rename columns

raw_data <- frenchdata::download_french_data(dataset)
raw_data <- raw_data$subsets$data[[1]]

if (grepl("monthly", type)) {
  processed_data <- raw_data |>
    mutate(date = floor_date(ymd(paste0(date, "01")), "month"))
} else {
  processed_data <- raw_data |> mutate(date = ymd(date))
}

processed_data <- processed_data |>
  mutate(across(-date, ~na_if(.,-99.99)),
         across(-date, ~ . / 100)) |>
  rename_with(tolower)

processed_data |>
  rename_with(tolower) |>
  ... # some more steps

List of supported data sources

Currently 32 data sets supported from these domains:

  • Fama-French factors
  • Q factors
  • Goyal-Welch macroeconomic predictors
  • Wharton Research Database Service (WRDS)

Easy to extend supported sources



Example: load packages

library(tidyfinance)
library(dplyr)

Example: load stock returns

library(tidyfinance)
library(dplyr)

crsp <- download_data(
  "wrds_crsp_monthly", "2022-01-01", "2022-12-31"
)

Example: load factors

library(tidyfinance)
library(dplyr)

crsp <- download_data(
  "wrds_crsp_monthly", "2022-01-01", "2022-12-31"
)

factors <- download_data(
  "factors_ff3_monthly", "2022-01-01", "2022-12-31"
)

Example: join data

library(tidyfinance)
library(dplyr)

crsp <- download_data(
  "wrds_crsp_monthly", "2022-01-01", "2022-12-31"
)

factors <- download_data(
  "factors_ff3_monthly", "2022-01-01", "2022-12-31"
)

stock_returns <- crsp |> 
  left_join(factors, join_by(month == date)) 

Example: winsorize column

library(tidyfinance)
library(dplyr)

crsp <- download_data(
  "wrds_crsp_monthly", "2022-01-01", "2022-12-31"
)

factors <- download_data(
  "factors_ff3_monthly", "2022-01-01", "2022-12-31"
)

stock_returns <- crsp |> 
  left_join(factors, join_by(month == date)) 
  
stock_returns <- stock_returns |> 
  mutate(mktcap_winsorized = winsorize(mktcap, 0.05))

Example: summary statistics

stock_returns |> 
  create_summary_statistics(mktcap, mktcap_winsorized)
 

Example: summary statistics

stock_returns |> 
  create_summary_statistics(mktcap, mktcap_winsorized)
 
# A tibble: 2 × 7
  variable              n  mean     sd   min   q50      max
  <chr>             <int> <dbl>  <dbl> <dbl> <dbl>    <dbl>
1 mktcap            52711 9335. 63165.  1.17  546. 2852312.
2 mktcap_winsorized 52711 4154.  8550. 18.5   546.   34005.

Example: assign portfolios by dates

stock_returns |> 
  group_by(date) |> 
  mutate(
    portfolio = assign_portfolio(
      pick(everything()), "mktcap_winsorized", n_portfolios = 10
    )
  )
 

Example: assign portfolios by dates

stock_returns |> 
  group_by(date) |> 
  mutate(
    portfolio = assign_portfolio(
      pick(everything()), "mktcap_winsorized", n_portfolios = 10
    )
  )
 
# A tibble: 52,711 × 4
  date       permno mktcap_winsorized portfolio
  <date>      <dbl>             <dbl>     <int>
1 2022-01-01  10032             2170.         7
2 2022-02-01  10032             2289.         7
3 2022-03-01  10032             2299.         7
4 2022-04-01  10032             2260.         8
5 2022-05-01  10032             2351.         8
# ℹ 52,706 more rows

Example: assign portfolios by dates

stock_returns |> 
  group_by(date) |> 
  mutate(
    portfolio = assign_portfolio(
      pick(everything()), "mktcap_winsorized", n_portfolios = 10
    )
  )
 


Currently working on calculate_portfolio_returns()

  • Calculate value-weighted and equal-weighted returns for different sorting methods

Example: estimate model

stock_returns |> 
  estimate_model("ret_excess ~ mkt_excess + smb + hml")
 

Example: estimate model

stock_returns |> 
  estimate_model("ret_excess ~ mkt_excess + smb + hml")
 
  mkt_excess       smb        hml
1  0.8467784 0.7519564 0.07131767

Example: estimate model

stock_returns |> 
  estimate_model("ret_excess ~ mkt_excess + smb + hml")
 


Curently working on roll_capm_estimation():

  • Estimate betas for different lookbacks and multiple factors

Goals of tidyfinance package



Make researchers more efficient



Make teachers more effective

Tidy approach to financial data

  • Check out open source content at tidy-finance.org
  • Get in touch for teaching materials
  • Submit issues to extend supported types
  • Follow me for news: linkedin.com/in/christophscheuch