Accessing and Managing Financial Data with
Tidy Finance

useR! Conference

Christoph Scheuch

2024-07-09

A few years ago …

Two PhD students in Vienna

Hardly any public code or data
Hard to reproduce papers
80% of time spent preparing data

Since then …

What is Tidy Finance?

A transparent, open-source approach to research in financial economics, featuring multiple programming languages

What is Tidy Finance?

A transparent, open-source approach to research in financial economics, featuring multiple programming languages

The tidyfinance packages is a simple way to:

Load our approach into R
Use helper functions to download & process data
Easily compile multiple data sources

Why tidy?

Write code that is easy to read for humans
Compose simple functions to solve complex problems
Embrace functional programming for reproducible results
Reuse data structures across applications

Recap: what is tidy data?

A consistent interface to financial data

library(tidyfinance)

download_data(
  type = "factors_ff3_monthly", 
  start_date = "2022-01-01", 
  end_date = "2022-12-31"
) |> 
  print(n = 5)

A consistent interface to financial data

library(tidyfinance)

download_data(
  type = "factors_ff3_monthly", 
  start_date = "2022-01-01", 
  end_date = "2022-12-31"
) |> 
  print(n = 5)

A consistent interface to financial data

library(tidyfinance)

download_data(
  type = "factors_ff3_monthly", 
  start_date = "2022-01-01", 
  end_date = "2022-12-31"
) |> 
  print(n = 5)

A consistent interface to financial data

library(tidyfinance)

download_data(
  type = "factors_ff3_monthly", 
  start_date = "2022-01-01", 
  end_date = "2022-12-31"
) |> 
  print(n = 5)

A consistent interface to financial data

library(tidyfinance)

download_data(
  type = "factors_ff3_monthly", 
  start_date = "2022-01-01", 
  end_date = "2022-12-31"
) |> 
  print(n = 5)

# A tibble: 12 × 5
  date       risk_free mkt_excess     smb     hml
  <date>         <dbl>      <dbl>   <dbl>   <dbl>
1 2022-01-01    0         -0.0625 -0.0594  0.128 
2 2022-02-01    0         -0.0229  0.022   0.0309
3 2022-03-01    0.0001     0.0305 -0.0161 -0.0181
4 2022-04-01    0.0001    -0.0946 -0.0139  0.0616
5 2022-05-01    0.0003    -0.0034 -0.0183  0.0839
# ℹ 7 more rows

Deep dive: download raw data

raw_data <- frenchdata::download_french_data(dataset)
raw_data <- raw_data$subsets$data[[1]]

Deep dive: parse dates

raw_data <- frenchdata::download_french_data(dataset)
raw_data <- raw_data$subsets$data[[1]]

if (grepl("monthly", type)) {
  processed_data <- raw_data |>
    mutate(date = floor_date(ymd(paste0(date, "01")), "month"))
} else {
  processed_data <- raw_data |> 
    mutate(date = ymd(date))
}

Deep dive: transform numeric columns

raw_data <- frenchdata::download_french_data(dataset)
raw_data <- raw_data$subsets$data[[1]]

if (grepl("monthly", type)) {
  processed_data <- raw_data |>
    mutate(date = floor_date(ymd(paste0(date, "01")), "month"))
} else {
  processed_data <- raw_data |> mutate(date = ymd(date))
}

processed_data <- processed_data |>
  mutate(across(-date, ~na_if(.,-99.99)),
         across(-date, ~ . / 100))

Deep dive: rename columns

raw_data <- frenchdata::download_french_data(dataset)
raw_data <- raw_data$subsets$data[[1]]

if (grepl("monthly", type)) {
  processed_data <- raw_data |>
    mutate(date = floor_date(ymd(paste0(date, "01")), "month"))
} else {
  processed_data <- raw_data |> mutate(date = ymd(date))
}

processed_data <- processed_data |>
  mutate(across(-date, ~na_if(.,-99.99)),
         across(-date, ~ . / 100)) |>
  rename_with(tolower)

processed_data |>
  rename_with(tolower) |>
  ... # some more steps

List of supported data sources

Currently 32 data sets supported from these domains:

Fama-French factors
Q factors
Goyal-Welch macroeconomic predictors
Wharton Research Database Service (WRDS)

Easy to extend supported sources

Example: load packages

library(tidyfinance)
library(dplyr)

Example: load stock returns

library(tidyfinance)
library(dplyr)

crsp <- download_data(
  "wrds_crsp_monthly", "2022-01-01", "2022-12-31"
)

Example: load factors

library(tidyfinance)
library(dplyr)

crsp <- download_data(
  "wrds_crsp_monthly", "2022-01-01", "2022-12-31"
)

factors <- download_data(
  "factors_ff3_monthly", "2022-01-01", "2022-12-31"
)

Example: join data

library(tidyfinance)
library(dplyr)

crsp <- download_data(
  "wrds_crsp_monthly", "2022-01-01", "2022-12-31"
)

factors <- download_data(
  "factors_ff3_monthly", "2022-01-01", "2022-12-31"
)

stock_returns <- crsp |> 
  left_join(factors, join_by(month == date))

Example: winsorize column

library(tidyfinance)
library(dplyr)

crsp <- download_data(
  "wrds_crsp_monthly", "2022-01-01", "2022-12-31"
)

factors <- download_data(
  "factors_ff3_monthly", "2022-01-01", "2022-12-31"
)

stock_returns <- crsp |> 
  left_join(factors, join_by(month == date)) 
  
stock_returns <- stock_returns |> 
  mutate(mktcap_winsorized = winsorize(mktcap, 0.05))

Example: summary statistics

stock_returns |> 
  create_summary_statistics(mktcap, mktcap_winsorized)

Example: summary statistics

stock_returns |> 
  create_summary_statistics(mktcap, mktcap_winsorized)

# A tibble: 2 × 7
  variable              n  mean     sd   min   q50      max
  <chr>             <int> <dbl>  <dbl> <dbl> <dbl>    <dbl>
1 mktcap            52711 9335. 63165.  1.17  546. 2852312.
2 mktcap_winsorized 52711 4154.  8550. 18.5   546.   34005.

Example: assign portfolios by dates

stock_returns |> 
  group_by(date) |> 
  mutate(
    portfolio = assign_portfolio(
      pick(everything()), "mktcap_winsorized", n_portfolios = 10
    )
  )

Example: assign portfolios by dates

stock_returns |> 
  group_by(date) |> 
  mutate(
    portfolio = assign_portfolio(
      pick(everything()), "mktcap_winsorized", n_portfolios = 10
    )
  )

# A tibble: 52,711 × 4
  date       permno mktcap_winsorized portfolio
  <date>      <dbl>             <dbl>     <int>
1 2022-01-01  10032             2170.         7
2 2022-02-01  10032             2289.         7
3 2022-03-01  10032             2299.         7
4 2022-04-01  10032             2260.         8
5 2022-05-01  10032             2351.         8
# ℹ 52,706 more rows

Example: assign portfolios by dates

stock_returns |> 
  group_by(date) |> 
  mutate(
    portfolio = assign_portfolio(
      pick(everything()), "mktcap_winsorized", n_portfolios = 10
    )
  )

Currently working on calculate_portfolio_returns()

Calculate value-weighted and equal-weighted returns for different sorting methods

Example: estimate model

stock_returns |> 
  estimate_model("ret_excess ~ mkt_excess + smb + hml")

Example: estimate model

stock_returns |> 
  estimate_model("ret_excess ~ mkt_excess + smb + hml")

  mkt_excess       smb        hml
1  0.8467784 0.7519564 0.07131767

Example: estimate model

stock_returns |> 
  estimate_model("ret_excess ~ mkt_excess + smb + hml")

Curently working on roll_capm_estimation():

Estimate betas for different lookbacks and multiple factors

Goals of `tidyfinance` package

Make researchers more efficient

Make teachers more effective

Tidy approach to financial data

Check out open source content at tidy-finance.org
Get in touch for teaching materials
Submit issues to extend supported types
Follow me for news: linkedin.com/in/christophscheuch

Accessing and Managing Financial Data with Tidy Finance

A few years ago …

Since then …

What is Tidy Finance?

What is Tidy Finance?

Why tidy?

Recap: what is tidy data?

A consistent interface to financial data

A consistent interface to financial data

A consistent interface to financial data

A consistent interface to financial data

A consistent interface to financial data

Deep dive: download raw data

Deep dive: parse dates

Deep dive: transform numeric columns

Deep dive: rename columns

List of supported data sources

Easy to extend supported sources

Example: load packages

Example: load stock returns

Example: load factors

Example: join data

Example: winsorize column

Example: summary statistics

Example: summary statistics

Example: assign portfolios by dates

Example: assign portfolios by dates

Example: assign portfolios by dates

Example: estimate model

Example: estimate model

Example: estimate model

Goals of tidyfinance package

Tidy approach to financial data

Accessing and Managing Financial Data with
Tidy Finance

Goals of `tidyfinance` package