J. Bayoán Santiago Calderón

Research Economist

Bureau of Economic Analysis

Biography

José Bayoán Santiago Calderón is a research economist in the national economic accounts research group at the Bureau of Economic Analysis. Before joining the federal statistical system, Dr. Santiago Calderón had years of experience in the private sector as a research scientist at various companies. Bayoán also held academic appointments with the Biocomplexity Institute and Initiative at the University of Virginia, where he started his career in public service.

His research has centered on improving decision-making, emphasizing the public good (e.g., science policy). His transdisciplinary research approach has enabled him to routinely collaborate across disciplines and develop diverse domain knowledge and a broad methodological toolset. He also participates in various open-source software communities (e.g., JuliaLang) and civic activism (e.g., Code4PR, Mentes Puertorriqueñas en Acción).

I read quite a bit of manga, manhwa, and manhua, and I watch anime, donghua, and KDramas. Sometimes, I even have the time and energy to play some videogames. Check out the relevant profiles: PSN Profiles, MyAnimeList, MyDramaList.

Interests

Science Policy
Data Science
Repurposing Administrative Data for Statistical Purposes
Computational Economics

Education

PhD in Economics, 2019
Claremont Graduate University
MA in Economics, 2015
Claremont Graduate University
BA in Economics, 2014
Southwestern University

Skills

Technical

Statistics, Data Science, Machine Learning

Regression Analysis
Econometrics
Data Analysis

Programming

Julia, R, Python, SQL, Git, Linux
Scientific Computing, Software Development
High-Performance Computing, Cloud Computing

Methods

Agent-Based Modeling (ABM)
Social Network Analysis
Geographic Information Systems (GIS)
Text Mining, Natural Language Processing (NLP)

Hobbies

My dog

I have a two-year-old doggo named Sadaharu

Reading

Manga (One Piece, One Punch-Man)

TV Shows

Currently watching some anime like Spy x Family and a bunch of isekai

Videogames

Currently playing Baldur’s Gate III and Stellaris

Experience

Research Economist · Full-time

Bureau of Economic Analysis

May 2021 – Present Suitland, MD

My research focuses on the digital economy, intellectual property products (IPPs), and own-account procurement. Some of my work includes exploring a range of measurement issues concerning intangible assets such as software (e.g., own-account, open-source) and data.

Supervisor: Jon D. Samuels

Senior Scientist II · Contract · Part-time

Pumas-AI, Inc.

August 2018 – December 2023 Remote

My strategic & scientific consulting work included projects across multiple therapeutic areas such as rare diseases, metabolic diseases, pediatrics, oncology, and vaccines. I conducted multiple clinical trial evaluations of the safety and efficacy of formulations to support drug development strategies at the company (e.g., study design, stop/go decisions, model development, biomarker exploration, dose selection) and regulatory processes (e.g., type-C meetings).

My work in the product development team was primarily the development of the module for bioequivalence (BE) analysis in the Pumas ecosystem. This included the design, implementation, testing, documentation, maintenance, and coordination with the other components of the ecosystem.

Supervisors:

Joga Gobburu, PhD (Strategic & Scientific Consulting)
Vijay Ivaturi, PhD (Product Development)

Postdoctoral Research Associate · Full-time

University of Virginia

May 2019 – May 2021 Arlington, VA

Worked on multiple projects with federal and state agencies helping them meet their missions. These included:

Sponsor: National Center for Science and Engineering Statistics (NCSES)
- Measuring the Scope and Impact of Open Source Software
- Skilled Technical Workforce
Sponsor: Defense Advanced Research Projects Agency (DARPA)
- Computational Simulation of Online Social Behavior (SocialSim) [see Summary]
Sponsor: Arlington County Police Department (ACPD)
- Evaluation of the Arlington Restaurant Initiative

Other work activities include:

Supported the infrastructure team in helping the team make the best use of UVA computing resources (e.g., high-performance computing) and follow best practices (e.g., version control).
Served as project lead and instructor for the Data Science for the Public Good Young Scholars Program (DSPG).

Supervisor: Sallie Ann Keller, PhD

Research Assistant · Contract

QuantEcon

June 2018 – May 2019 Remote

Worked on creating and improving the QuantEcon lectures for Julia and its related open source ecosystem (e.g., updating lectures from Julia v0.6 to Julia v1).

Supervisor: Jesse Perla, PhD

Data Science for the Public Good Fellow · Contract

Virginia Tech

May 2018 – August 2020 Arlington, VA

As a fellow for the Data Science for the Public Good (DSPG) program, I worked on two projects:

Measuring the Scope and Impact of Open Source Software
Evaluation of the Arlington Restaurant Initiative

Supervisor: Gizem Korkmaz, PhD

Research Fellow · Contract

Michigan State University

June 2016 – July 2016 East Lansing, MI

Teaching Assistant for ECSP 891 Advanced Research of the American Economic Association Summer Program.

Supervisor: Lisa DeNell Cook, PhD

Data Scientist · Contract

Res-Intel

September 2016 – August 2018 Remote

Worked on three projects:

Res-Intel Software Development
Behavioral program analyses for Southern California Edison (SCE) (example)
California Advanced Homes Program Study (proposal, results)

Supervisor: Hal T. Nelson, PhD

Teaching Assistant · Contract

Johns Hopkins University

June 2015 – August 2015 Baltimore, MD

Fundamentals of Microeconomics (15S.MICO.JHU.1A, 15S.MICO.JHU.2A)

Supervisor: Sean Gibbons

Research Assistant · Part-time

Center for Neuroeconomics Studies

September 2014 – May 2015 Claremont, CA

Assisted with the data collection and analysis of several experiments. Tasks included recruitment, training, and running experiments (human and animal subjects). Methods for data collection and analysis included computer laboratory experiments, drug studies (e.g., alcohol, testosterone), biometric research such as electroencephalogram (EEG) and electrocardiogram (ECG), eye-tracking, and blood work. Tools used included z-Tree and iMotions-BIOPAC.

Supervisor: Paul Joseph Zak, PhD

Intern · Contract · Full-time

Sapientis

May 2011 – August 2011 San Juan, PR

Summer intern through the Agents of Change Empowerment and Retention Program (PARACa) fellowship, a Mentes Puertorriqueñas en Acción initiative. Worked on the annual report to the state senate on the status of the K-12 public education system titled “El estado actual de las escuelas públicas en Plan de Mejoramiento en Puerto Rico, año escolar 2010-2011”. Assisted the Coalition for Equity and High Quality Education (CECE, for its Spanish acronym) and members of the school community in choosing and designing the advocacy plan for the year 2011-2012.

Supervisor: David Ortiz

Certificates

Preparing Future Faculty: Certificate in College Teaching

Claremont Graduate University Aug 2019

The certificate in college teaching helps students become inclusive leaders of teaching and learning, connecting them with like-minded faculty who seek to build excellence and foster inclusivity in teaching. Based on the Scholarship of Teaching and Learning (SoTL), the program helps students develop pedagogical knowledge and skills through workshops, courses, teaching clinics, and individual consulting on all aspects of teaching and learning, including developing teaching philosophy statements, syllabi, and electronic portfolios.

Statistics with R

Duke University Mar 2017

In this Specialization, you will learn to analyze and visualize data in R and create reproducible data analysis reports, demonstrate a conceptual understanding of the unified nature of statistical inference, perform frequentist and Bayesian statistical inference and modeling to understand natural phenomena and make data-based decisions, communicate statistical results correctly, effectively, and in context without relying on statistical jargon, critique data-based claims and evaluate data-based decisions, and wrangle and visualize data with R packages for data analysis. You will produce a portfolio of data analysis projects from the Specialization that demonstrates mastery of statistical data analysis from exploratory analysis to inference to modeling, suitable for applying for statistical analysis or data scientist positions.

See certificate

Machine Learning

University of Washington Feb 2017

This specialization from leading researchers at the University of Washington introduces you to the exciting, high-demand field of Machine Learning. Through a series of practical case studies, you will gain applied experience in major areas of Machine Learning including Prediction, Classification, Clustering, and Information Retrieval. You will learn to analyze large and complex datasets, create systems that adapt and improve over time, and build intelligent applications that can make predictions from data. Learners will implement and apply predictive, classification, clustering, and information retrieval machine learning algorithms to real datasets throughout each course in the specialization. They will walk away with applied machine learning and Python programming experience.

See certificate

Fundamentals of Computing

Rice University Jun 2016

This specialization covers much of the material that first-year Computer Science students take at Rice University. Students learn sophisticated programming skills in Python from the ground up and apply these skills in building more than 20 fun projects. The Specialization concludes with a Capstone exam that allows the students to demonstrate the range of knowledge that they have acquired in the Specialization.

See certificate

Data Science

Johns Hopkins University Sep 2015

The Data Science Specialization covers the concepts and tools for an entire data science pipeline. Successful participants learn how to use the tools of the trade, think analytically about complex problems, manage large data sets, deploy statistical principles, create visualizations, build and evaluate machine learning algorithms, publish reproducible analyses, and develop data products.

See certificate

Featured Publications

Gizem Korkmaz, J. Bayoán Santiago Calderón, Brandon L. Kramer, Ledia Guci, Carol A. Robbins

April, 2024 Research Policy

From GitHub to GDP: A framework for measuring open source software innovation

Open source software (OSS) is software that anyone can review, modify, and distribute freely, usually with only minor restrictions such as giving credit to the creator of the work. The use of OSS is growing rapidly, due to its value in increasing firm and economy-wide productivity. Despite its widespread use, there is no standardized methodology for measuring the scope and impact of this fundamental intangible asset. This study presents a framework to measure the value of OSS using data collected from GitHub, the largest platform in the world with over 100 million developers. The data include over 7.6 million repositories where software is developed, stored, and managed. We collect information about contributors and development activity such as code changes and license detail. By adopting a cost estimation model from software engineering, we develop a methodology to generate estimates of investment in OSS that are consistent with the U.S. national accounting methods used for measuring software investment. We generate annual estimates of current and inflation-adjusted investment as well as the net stock of OSS for the 2009–2019 period. Our estimates show that the U.S. investment in 2019 was $37.8 billion with a current-cost net stock of $74.3 billion.

J. Bayoán Santiago Calderón, Dylan G Rassier

October, 2022 NBER/CRIW-TPEG

Valuing the US Data Economy Using Machine Learning and Online Job Postings

With the recent proliferation of data collection and uses in the digital economy, the understanding and statistical treatment of data stocks and flows is of interest among compilers and users of national economic accounts. In this paper, we measure the value of own-account data stocks and flows for the US business sector by summing the production costs of data-related activities implicit in occupations. Our method augments the traditional sum-of-costs methodology for measuring other own-account intellectual property products in national economic accounts by proxying occupation-level time-use factors using a machine learning model and the text of online job advertisements (Blackburn 2021). In our experimental estimates, we find that annual current-dollar investment in own-account data assets for the US business sector grew from $84 billion in 2002 to $186 billion in 2021, with an average annual growth rate of 4.2 percent. Cumulative current-dollar investment for the period 2002–2021 was $2.6 trillion. In addition to annual current-dollar investment, we present historical-cost net stocks, real growth rates, and effects on value-added by industrial sector.

Chris Rackauckas, Yingbo Ma, Andreas Noack, Vaibhav Dixit, Patrick Kofod Mogensen, Chris Elrod, Mohammad Tarek, Simon Byrne, Shubham Maddhashiya, J. Bayoán Santiago Calderón, Michael Hatherly, Joakim Nyberg, Jogarao V.S. Gobburu, Vijay Ivaturi

March, 2022 bioRxiv New Results

Accelerated Predictive Healthcare Analytics with Pumas, A High Performance Pharmaceutical Modeling and Simulation Platform

Pharmacometric modeling establishes causal quantitative relationships between administered dose, tissue exposures, desired and undesired effects and patient’s risk factors. These models are employed to de-risk drug development and guide precision medicine decisions. However, pharmacometric tools have not been designed to handle today’s heterogeneous big data and complex models. We set out to design a platform that facilitates domain-specific modeling and its integration with modern analytics to foster innovation and readiness in healthcare. Pumas demonstrates estimation methodologies with dramatic performance advances. New ODE solver algorithms, such as coefficient-optimized higher order integrators and new automatic stiffness detecting algorithms which are robust to frequent discontinuities, give rise to a median 4x performance improvement across a wide range of stiff and non-stiff systems seen in pharmacometric applications. These methods combine with JIT compiler techniques, such as statically-sized optimizations and discrete sensitivity analysis via forward-mode automatic differentiation, to further enhance the accuracy and performance of the solving and parameter estimation process. We demonstrate that when all of these techniques are combined with a validated clinical trial dosing mechanism and non-compartmental analysis (NCA) suite, real applications like NLME fitting see a median 81x acceleration while retaining the same accuracy. Meanwhile in areas with less prior software optimization, like optimal experimental design, we see orders of magnitude performance enhancements over competitors. Further, Pumas combines these technical advances with several workflows that are automated and designed to boost productivity of the day-to-day user activity. Together we show a fast pharmacometric modeling framework for next-generation precision analytics.

J. Bayoán Santiago Calderón

July, 2020 6th JuliaCon Conference

Econometrics.jl

Econometrics.jl is a package for econometric analysis. It provides a series of the most common routines for applied econometrics such as models for continuous, nominal, and ordinal outcomes, longitudinal estimators, variable absorption, and support for convenience functionality such as weights, rank deficiency, and robust variance-covariance estimators. This study complements the package through a discussion of the motivation, placing the contribution within the Julia ecosystem and econometrics software in general, and provides insights on current gaps and ways the Julia ecosystem can evolve.

J. Bayoán Santiago Calderón

May, 2019 CGU Theses & Dissertations

On Cluster Robust Models

Cluster robust models are a kind of statistical model that attempts to estimate parameters considering potential heterogeneity in treatment effects. Absent heterogeneity in treatment effects, the partial and average treatment effects are the same. When heterogeneity in treatment effects occurs, the average treatment effect is a function of the various partial treatment effects and the composition of the population of interest. The first chapter explores the performance of common estimators as a function of the presence of heterogeneity in treatment effects and other characteristics that may influence their performance for estimating average treatment effects. The second chapter examines various approaches to evaluating and improving cluster structures as a way to obtain cluster-robust models. Both chapters are intended to be useful to practitioners as a how-to guide to examine and think about their applications and relevant factors. Empirical examples are provided to illustrate theoretical results, showcase potential tools, and communicate a suggested thought process.