In this post, we have a look at creating a basic regression table with modelsummary and fixest.
Setup
First, we load the necessary libraries.
library(fixest)
library(modelsummary)
library(dplyr)
library(labelled)
library(gt)
Then, we prepare the data. We simply use the data that comes with the fixest package. For a prettier output in the table later on, we add a label to our variable of interest.
data(trade)
trade_labelled <- trade |>
  dplyr::mutate(log_dist_km = labelled(dist_km, label = "Log (distance [km])"))
As a next step, we run two regressions: one estimated by OLS with fixed effects, the other a Poisson pseudo-maximum-likelihood (PPML) model with the same fixed effects. With this, we have a very basic gravity model in two variations.
gravity_ols <- feols(Euros ~ log_dist_km | Origin + Destination + Product + Year,
                     data = trade_labelled)
gravity_pois <- fepois(Euros ~ log_dist_km | Origin + Destination + Product + Year,
                       data = trade_labelled)
Creating a table
With our two regressions ready, we’re all set to create a regression table.
Basic table
Creating a summary table of our models is very straightforward with modelsummary: we create a list of the models we want to show, and then pass that list to modelsummary.
list(gravity_ols,
     gravity_pois) |>
  modelsummary()
| | (1) | (2) |
|---|---|---|
| log_dist_km | -47643.024 | -0.002 |
| | (10905.572) | (0.000) |
| Num.Obs. | 38325 | 38325 |
| R2 | 0.286 | 0.751 |
| R2 Adj. | 0.285 | 0.751 |
| R2 Within | 0.031 | 0.269 |
| R2 Within Adj. | 0.031 | 0.269 |
| AIC | 1533832.6 | 1e+12 |
| BIC | 1534328.7 | 1e+12 |
| RMSE | 118498460.83 | 88327120.62 |
| Std.Errors | by: Origin | by: Origin |
| FE: Origin | X | X |
| FE: Destination | X | X |
| FE: Product | X | X |
| FE: Year | X | X |
Prettier table
However, the output in Table 1 is not very pretty yet. It’s not entirely clear yet what the independent variable is, we don’t know what (1) and (2) stand for, and we have a mass of goodness-of-fit measures. Let’s customize our table!
As a first step, we create some helper functions. These help with formatting the table.
# format numbers: thousand separator
f <- function(x, n_digits = 2) {
  ifelse(is.na(x),
         "",
         formatC(
           x,
           digits = n_digits,
           big.mark = ",",
           format = "f"
         ))
}
f_0 <- purrr::partial(f, n_digits = 0)

# function for GOF measures we don't want to change
keep_format <- function(x) list("raw" = x, "clean" = x, "fmt" = NA)
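To get a feeling for what the number formatter returns, here is a small check on a few made-up values. The definition of f is repeated so that the snippet runs on its own, and f_0 is written as a plain wrapper instead of using purrr::partial; that substitution is only an assumption made here to avoid the extra dependency, and it should behave the same way for this purpose.

```r
# number formatter as defined above, repeated so this snippet is self-contained
f <- function(x, n_digits = 2) {
  ifelse(is.na(x),
         "",
         formatC(x, digits = n_digits, big.mark = ",", format = "f"))
}

# illustrative stand-in for purrr::partial(f, n_digits = 0)
f_0 <- function(x) f(x, n_digits = 0)

# made-up example values, including a missing one
example_values <- c(1234567.891, NA)

formatted   <- f(example_values)  # two decimals, thousand separator, NA becomes ""
formatted_0 <- f_0(38325)         # no decimals, thousand separator

print(formatted)
print(formatted_0)
```

Missing values come back as empty strings, which keeps empty cells in the table blank instead of showing NA.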
Then, we create a list where we format our goodness-of-fit (GOF) measures. Some of the default names are not so pretty, e.g. the abbreviated Num.Obs., so we switch them to shorter or nicer names.
# format # observations and R^2, keep the rest
gof_tidy <- list(
  list(
    "raw" = "nobs",
    "clean" = "Observations",
    "fmt" = f_0
  ),
  list(
    "raw" = "r.squared",
    "clean" = "R\u00B2",
    "fmt" = 3
  ),
  keep_format("FE: Origin"),
  keep_format("FE: Destination"),
  keep_format("FE: Product"),
  keep_format("FE: Year")
)
Let’s change the labels for our regressions. We do this by naming the elements of the list we pass to modelsummary (the first two lines of the code below).
As a next step, let’s use the label we added earlier on, by setting coef_rename to TRUE. Let’s also format the numbers using the formatting function we set up earlier, f.
Let’s omit some of the goodness-of-fit (GOF) indicators, since we don’t need all of them here. We do this with the gof_map argument, to which we supply our GOF list from the last step. Alternatively, we could pass a regular expression to the gof_omit argument: any statistic whose name matches the expression is left out.
Also, I’m used to adding stars where a coefficient is significant. They are not added by default, so let’s simply set the stars argument to TRUE.
Then, we set the output to gt, which lets us style the table further with the gt package. We add a header detailing our dependent variable. Then, we add a spanner to tell readers that OLS and Poisson are regression models.
list(OLS = gravity_ols,
     Poisson = gravity_pois) |>
  modelsummary(
    coef_rename = TRUE,
    gof_map = gof_tidy,
    fmt = f,
    stars = TRUE,
    output = "gt"
  ) |>
  # add header and spanner
  tab_header(title = "Dependent variable: Trade flow [€]") |>
  tab_spanner(
    label = "Regression model",
    columns = c("OLS", "Poisson")
  )
Dependent variable: Trade flow [€]

| | Regression model | |
|---|---|---|
| | OLS | Poisson |
| Log (distance [km]) | -47,643.02*** | -0.00*** |
| | (10,905.57) | (0.00) |
| Observations | 38,325 | 38,325 |
| R² | 0.286 | 0.751 |
| FE: Origin | X | X |
| FE: Destination | X | X |
| FE: Product | X | X |
| FE: Year | X | X |

+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001
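The gof_omit alternative mentioned above could look roughly like the sketch below. It is only an illustration: the regular expression is my own choice of statistics to drop, the model uses the unlabelled dist_km column to keep the snippet short, and the output is set to "data.frame" simply so the result is easy to inspect.

```r
library(fixest)
library(modelsummary)

data(trade)

# small illustrative model with the same fixed effects as above
m <- feols(Euros ~ dist_km | Origin + Destination + Product + Year,
           data = trade)

# drop every GOF row whose name matches the regular expression
# (the pattern is an assumption for this example; adjust it to your needs)
tab <- modelsummary(
  list(OLS = m),
  gof_omit = "AIC|BIC|RMSE",
  output = "data.frame"
)

print(tab)
```

Compared to gof_map, this is quicker to write, but it neither renames nor reformats the statistics that remain.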
For advanced features, check out the documentation
These are some pretty standard results. However, you may want to, e.g., bootstrap standard errors, omit coefficients, or add more information. I really recommend checking out modelsummary’s documentation!
Citation
@online{zeller2023,
author = {Zeller, Sarah},
title = {Creating a Basic Regression Table with `Modelsummary`},
date = {2023-11-28},
url = {https://sarahzeller.github.io/blog/posts/showing-regression-results/},
langid = {en}
}