Creating a basic regression table with modelsummary

code
summary
Author

Sarah Zeller

Published

November 28, 2023

In this post, we have a look at creating a basic regression table with modelsummary and fixest.

Setup

First, we load the necessary libraries.

library(fixest)
library(modelsummary)
library(dplyr)
library(labelled)
library(gt)

Then, we prepare the data. We simply use the data that comes with the fixest package. For a prettier output in the table later on, we add a label to our variable of interest.

data(trade)
trade_labelled <- trade |> 
  dplyr::mutate(log_dist_km = labelled(dist_km, label = "Log (distance [km])"))

As a next step, we run two regressions: One with Fixed Effects, the other one with a Poisson Pseudo-Maximum-Likelihood (PPML) model. With this, we have a very basic Gravity Model in two variations.

gravity_ols <-
  feols(Euros ~ log_dist_km | Origin + Destination + Product + Year,
        data = trade_labelled)
gravity_pois <-
  fepois(Euros ~ log_dist_km |
           Origin + Destination + Product + Year,
         data = trade_labelled)

Creating a table

With our two regressions ready, we’re all set to create a regression table.

Basic table

Creating a summary table of our equations is very straight-forward with modelsummary: We create a list of the models we want to show, and then input that to modelsummary.

list(gravity_ols,
     gravity_pois) |>
  modelsummary()
Table 1: Some regressions – basic
 (1)   (2)
log_dist_km -47643.024 -0.002
(10905.572) (0.000)
Num.Obs. 38325 38325
R2 0.286 0.751
R2 Adj. 0.285 0.751
R2 Within 0.031 0.269
R2 Within Adj. 0.031 0.269
AIC 1533832.6 1e+12
BIC 1534328.7 1e+12
RMSE 118498460.83 88327120.62
Std.Errors by: Origin by: Origin
FE: Origin X X
FE: Destination X X
FE: Product X X
FE: Year X X

Prettier table

However, the output in Table 1 is not very pretty yet. It’s not entirely clear yet what the independent variable is, we don’t know what (1) and (2) stand for, and we have a mass of goodness-of-fit measures. Let’s customize our table!

As a first step, we make create some helper functions. These help with formatting the table.

Code
# format numbers: thousand separator
f <- function(x, n_digits = 2) {
  ifelse(is.na(x),
         "",
         formatC(
           x,
           digits = n_digits,
           big.mark = ",",
           format = "f"
         ))
}

f_0 <- purrr::partial(f, n_digits = 0)
  
#  function for GOF measures we don't want to change
keep_format <- function(x) list("raw" = x, "clean" = x, "fmt" = NA)

Then, we create a list where we format our goodness-of-fit (GOF) measures. Some of the default names are not so pretty, e.g. Num.Observations without a space between the two words – so we switch them to shorter or nicer names.

Code
# format # observations and R^2, keep the rest
gof_tidy <- list(
  list(
    "raw" = "nobs",
    "clean" = "Observations",
    "fmt" = f_0
  ),
  list(
    "raw" = "r.squared",
    "clean" = "R\u00B2",
    "fmt" = 3
  ),
  keep_format("FE: Origin"),
  keep_format("FE: Destination"),
  keep_format("FE: Product"),
  keep_format("FE: Year")
)

Let’s change the labels for our regression. We do this by adding names to the list’s input (lines 1–2).

As a next step, let’s use the label we added earlier on, by setting coef_rename to true. Let’s also format the numbers using the formatting function we set up earlier, f.

Let’s omit some of the goodnes-of-fit (gof) indicators, since we don’t need all of them here. We do this with the gof_map argument, to which we supply our GOF list from the last step. Alternatively, we could use a regex in the gof_omit argument: anything that matches the expression in line 4 will not be included.

Also, I’m used to adding stars where a coefficient is significant. This is not added by default, so let’s simply set the stars argument to true.

Then, we’re setting the output to gt, which gives us the possibility to further style the table with the package gt. We add a header detailing our dependent variable. Then, we add a spanner to tell readers that OLS and Poisson are regression models.

list(OLS = gravity_ols,
     Poisson = gravity_pois) |>
  modelsummary(
    coef_rename = TRUE,
    gof_map = gof_tidy,
    fmt = f,
    stars = TRUE,
    output = "gt"
  ) |>
  # add header and spanner
  tab_header(title = "Dependent variable: Trade flow [€]") |>
  tab_spanner(
    label = "Regression model",
    columns = c("OLS", "Poisson")
    )
Table 2: Some regressions – prettier
Dependent variable: Trade flow [€]
Regression model
OLS Poisson
Log (distance [km]) -47,643.02*** -0.00***
(10,905.57) (0.00)
Observations 38,325 38,325
0.286 0.751
FE: Origin X X
FE: Destination X X
FE: Product X X
FE: Year X X
+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001

For advanced features, check out the documentation

These are some pretty normal results. However, you may to e.g. bootstrap standard errors, omit coefficients, or add more information. I really recommend checking out modelsummary’s documentation!

Citation

BibTeX citation:
@online{zeller2023,
  author = {Zeller, Sarah},
  title = {Creating a Basic Regression Table with `Modelsummary`},
  date = {2023-11-28},
  url = {https://sarahzeller.github.io/blog/posts/showing-regression-results/},
  langid = {en}
}
For attribution, please cite this work as:
Zeller, Sarah. 2023. “Creating a Basic Regression Table with `Modelsummary`.” November 28, 2023. https://sarahzeller.github.io/blog/posts/showing-regression-results/.