Everest and Frankie

I built this game with Everest (my toddler son) and Frankie (my puppy son) in mind.

About

I'm a PhD Economist and ML Engineer specializing in building rigorous, high-impact systems where economic theory meets computational power. My work at Instacart handles billions in GMV through marketplace optimization systems that integrate causal inference with reinforcement learning algorithms. My research on correcting conventional update rules in reinforcement learning demonstrates my approach: looking beyond immediate objectives to maximize long-term value across all states of a system.

My research combines the mathematical precision of economics with the computational power of machine learning. Whether analyzing student decision-making through regression discontinuity or optimizing bidding strategies in complex auctions, I focus on building systems that deliver measurable impact while requiring minimal maintenance. I've shown that the correct update rule must simulate sequences toward the end state rather than focusing on immediate rewards—a principle that guides my approach to both technical and business challenges.

I've been shaped by experiences across multiple cultures - from Hong Kong to California to Boston. As I wrote in my essay "Man Move Live," this multicultural perspective informs my problem-solving approach: "树挪死, 人挪活" (Tree Move Dead, Man Move Live). To truly generate innovative ideas, I exist in a flow of interactions between many disciplines. The extent of bounded rationality in human decision-making is difficult to measure, but my work proves that rigorously defined algorithms can find optimal policies even when "perfect play" is intractable.

Get in Touch

📧 jon_th_ngu@gm_il.com (Guess the _)
📱 5•• [dot] ••• [dot] 5••9
LinkedIn

Reinforcement Learning

Corrected update rule for reinforcement learning that properly maximizes long-term value functions across all states. Expert in balancing exploration vs. exploitation and simulation-based policy optimization using TensorFlow and PyTorch.
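
A minimal, self-contained sketch of the exploration vs. exploitation tradeoff mentioned above, using an epsilon-greedy rule on a toy bandit problem; the arms, reward distributions, and epsilon value are hypothetical and purely illustrative.

import numpy as np

def epsilon_greedy_bandit(true_means, steps=1000, epsilon=0.1, seed=0):
    # Estimate each arm's value online while occasionally exploring at random.
    rng = np.random.default_rng(seed)
    n_arms = len(true_means)
    estimates = np.zeros(n_arms)   # running mean reward per arm
    counts = np.zeros(n_arms)      # number of pulls per arm
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = int(rng.integers(n_arms))     # explore: random arm
        else:
            arm = int(np.argmax(estimates))     # exploit: current best estimate
        reward = rng.normal(true_means[arm], 1.0)  # noisy reward draw
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # incremental mean
        total_reward += reward
    return estimates, total_reward

# Example: three hypothetical arms with different expected rewards.
est, total = epsilon_greedy_bandit([0.2, 0.5, 0.9])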

Causal Inference

Regression discontinuity design to identify causal effects in economic policy. Control function approach to mitigate endogeneity bias. Techniques for measuring incremental impact when multiple interventions overlap.
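
A hedged sketch of the control function idea on simulated data: the first stage regresses the endogenous regressor on an instrument, and the first-stage residual is then included in the outcome equation to absorb the endogeneity. Variable names and coefficients are hypothetical, not from the research described above.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 5000
z = rng.normal(size=n)                        # instrument (assumed exogenous)
u = rng.normal(size=n)                        # unobserved confounder
x = 0.8 * z + 0.6 * u + rng.normal(size=n)    # endogenous regressor
y = 1.5 * x + 0.9 * u + rng.normal(size=n)    # outcome; naive OLS of y on x is biased

# First stage: regress the endogenous regressor on the instrument.
first = sm.OLS(x, sm.add_constant(z)).fit()
v_hat = first.resid                           # control function: the endogenous variation

# Second stage: include the first-stage residual to absorb the endogeneity.
X2 = sm.add_constant(np.column_stack([x, v_hat]))
second = sm.OLS(y, X2).fit()
print(second.params)                          # coefficient on x is now close to the true 1.5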

Economic Mechanisms

Applied microeconomics and structural econometric modeling. Expertise in marketplace design, auction theory, and multi-agent incentive structures. Specialized in empirically measuring student/consumer responses to policy changes.

Statistical Prediction

Created statistical models of performance in competitive environments that outperform models based on the standard double exponential distribution. Implemented profitable betting systems with 15.8% ROI in high-uncertainty environments.

Engineering & ML Systems

Python, Julia, SQL for production ML systems handling billions in GMV. Built transparent algorithms with minimal maintenance requirements through careful anticipation of failure modes and graceful degradation paths.

Education

PhD in Economics from UCLA (2014-2020), BA in Statistics and Economics from UC Berkeley (2007-2011). Research with Prof. Susan Athey's team at Microsoft Research on search auction mechanisms.

Experience

2020 - Present

Senior ML Engineer & Economist II

Instacart, San Francisco

Optimized Bidding Algorithm

Created and implemented the Ads Optimized Bidding Algorithm using my corrected reinforcement learning update rule to properly simulate outcomes to the end state. Controls >60% of advertiser spend and indirectly informs an additional 20% of ads revenue. Simplified advertiser workflow—they specify only their objective and budget; the algorithm automatically allocates spend to maximize long-term ROI across all possible states.

Reinforcement Learning · Value Function Optimization · Multi-state Simulation
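
As a hypothetical illustration of the "objective and budget in, spend allocation out" workflow described in the entry above, the sketch below greedily allocates a daily budget across hours in proportion to estimated value per dollar, subject to hourly caps. It is a toy model, not the production bidding algorithm.

# Hypothetical sketch: allocate a fixed daily budget across hours in proportion
# to each hour's estimated value per dollar, respecting hourly spend caps.
def allocate_budget(budget, value_per_dollar, max_spend_per_hour):
    weights = [v / sum(value_per_dollar) for v in value_per_dollar]
    plan = [min(budget * w, cap) for w, cap in zip(weights, max_spend_per_hour)]
    # Redistribute any budget left over after hitting hourly caps, highest value first.
    leftover = budget - sum(plan)
    for i in sorted(range(len(plan)), key=lambda i: value_per_dollar[i], reverse=True):
        room = max_spend_per_hour[i] - plan[i]
        add = min(room, leftover)
        plan[i] += add
        leftover -= add
    return plan

# Illustrative numbers only: the third hour hits its cap and spend is redistributed.
plan = allocate_budget(1200.0, [1.2, 0.8, 2.0, 1.5], [400, 400, 400, 400])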

Econometric Modeling for Marketplaces

Architected a marketplace optimization engine using structural economic modeling techniques similar to those in my college choice research. Identified causal effects of policy changes using regression discontinuity and control function approaches to mitigate endogeneity. The system handles over $30B in GMV annually while respecting constraints imposed by economic theory.

Structural Econometrics · Causal Identification · Marketplace Equilibrium

Multi-Stage Decision Process for Critical Items

Designed a system to predict "the value of customer patience lost if item X is not found" - a key component of my patent #7. Modeled this as a complex multi-stage decision process in which agents make sequential choices under uncertainty. The system optimizes order routing by quantifying the precise business cost of missing items, looking beyond immediate outcomes to future states.

Sequential Decision Modeling · Expectation Maximization · State-Space Optimization
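
A toy sketch of quantifying the downstream cost of a missing item as an expectation over the customer's possible sequential responses (accept a substitute, request a refund, or churn); all probabilities and dollar values are illustrative, not production numbers.

# Hypothetical sketch: expected downstream cost if an item is not found,
# computed over the customer's possible sequential responses.
def expected_missing_item_cost(p_accept_sub, p_refund, p_churn,
                               sub_margin_loss, refund_cost, future_value_lost):
    # Probabilities should sum to 1 over the modeled responses.
    assert abs(p_accept_sub + p_refund + p_churn - 1.0) < 1e-9
    return (p_accept_sub * sub_margin_loss
            + p_refund * refund_cost
            + p_churn * (refund_cost + future_value_lost))

# Illustrative numbers only: a customer likely to churn makes the item more "critical".
cost = expected_missing_item_cost(0.6, 0.3, 0.1,
                                  sub_margin_loss=0.5,
                                  refund_cost=4.0,
                                  future_value_lost=60.0)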

Causal Effects with Overlapping Interventions

Applied techniques from my "Synthetic Treatment Effects for Targeting" patent to develop distribution-free confidence intervals using bootstrapping. This approach solves the critical problem of measuring incremental effects when multiple marketing interventions overlap—a challenge similar to identifying student responses to financial incentives in my college choice research.

Distribution-Free Methods · Bootstrapped Confidence Intervals · Treatment Effect Estimation
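
A minimal sketch of a distribution-free, percentile bootstrap confidence interval for an incremental (treatment minus control) effect; the data are simulated and the interface is hypothetical.

import numpy as np

def bootstrap_ci_diff_means(treated, control, n_boot=10000, alpha=0.05, seed=0):
    # Percentile bootstrap CI for the difference in means, with no
    # parametric assumption about the outcome distribution.
    rng = np.random.default_rng(seed)
    diffs = np.empty(n_boot)
    for b in range(n_boot):
        t = rng.choice(treated, size=len(treated), replace=True)
        c = rng.choice(control, size=len(control), replace=True)
        diffs[b] = t.mean() - c.mean()
    lo, hi = np.quantile(diffs, [alpha / 2, 1 - alpha / 2])
    return treated.mean() - control.mean(), (lo, hi)

# Simulated example: a small lift with skewed, non-normal outcomes.
rng = np.random.default_rng(1)
treated = rng.exponential(1.1, size=2000)
control = rng.exponential(1.0, size=2000)
effect, (lo, hi) = bootstrap_ci_diff_means(treated, control)
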
2013 - 2014

Research Scientist

Microsoft Research, Boston

Contributed to Professor Susan Athey's team analyzing Bing search auctions, applying game theory to model optimal bidding strategies. This work presaged my later research on reinforcement learning for auction optimization: both involve finding equilibrium behavior in complex games where the space of possible future states is too large for conventional analysis.

Auction Theory · Game Theory · Search Ads
2006 - Present

Horse Race Predictionist

Independent Research, Hong Kong (Part-Time)

Collaborated with Professor Ming Gao Gu to create statistical models that more accurately capture the natural distribution of performance in competitive environments. This work directly connects to my reinforcement learning research: both involve creating policies that maximize expected rewards under conditions of uncertainty and bounded rationality. Achieved 15.8% ROI through algorithms that look beyond standard double exponential distributions to capture nuanced performance patterns.

15.8% ROI through algorithmic betting strategies
Statistical Modeling · Prediction Systems · Optimization

Research

Updating the Update Rule in Reinforcement Learning

Developed a corrected update rule for reinforcement learning that addresses limitations in the conventional Williams (1992) approach. My research demonstrates that when updating parameters of a value function, we should look ahead toward the end of the game to maximize the entire value function rather than focusing on simplistic one-step increases.
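
A hedged sketch contrasting a one-step target with the full simulated return to the end of the episode, in the spirit of the "look ahead toward the end of the game" principle above; it illustrates the idea, not the exact corrected rule from the research.

import numpy as np

def returns_to_go(rewards, gamma=0.99):
    # Discounted return from each step to the end of the episode.
    G = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        G[t] = running
    return G

def one_step_targets(rewards, values, gamma=0.99):
    # Shortsighted alternative: bootstrap from the very next state's value only.
    next_values = np.append(values[1:], 0.0)
    return rewards + gamma * next_values

# Hypothetical episode: the large payoff only arrives at the end.
rewards = np.array([0.0, 0.0, 0.0, 10.0])
values = np.zeros(4)                       # untrained value estimates
print(returns_to_go(rewards))              # credit flows back to every state
print(one_step_targets(rewards, values))   # early states see no signal yet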

College Choice and State-Based Grant Aid

Exploited a discontinuity in GPA requirements for Cal Grant aid to identify student responses to changes in state-based grant aid. The research showed that each additional dollar of state-based aid reduces net tuition by 73 cents, with only 7.4% of recipients switching from not enrolling to enrolling, demonstrating the efficiency of targeted educational subsidies.

Causal Inference in Marketing Attribution

Pioneered methods for measuring the incremental impact of marketing interventions, tackling the challenging problem of attribution when multiple marketing channels overlap. Created distribution-free confidence intervals using bootstrapping and empirical data to provide reliable uncertainty estimates for business decisions.

Production-Grade ML Systems with Minimal Maintenance

Researched techniques for building robust ML systems that require minimal on-call engineering support. Focused on integrating large language models into production systems for business-critical decision making with explicit attention to failure modes, graceful degradation, and transparency in decision-making algorithms.
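
A small, hypothetical sketch of the graceful-degradation pattern described above: a model call guarded by a timeout, with a transparent rule-based fallback so the system keeps producing explainable decisions when the model is slow or unavailable.

import concurrent.futures

def predict_with_fallback(model_fn, features, fallback_fn, timeout_s=0.2):
    # Try the model, but fall back to a simple, transparent rule on timeout
    # or error instead of failing the whole request. (A production version
    # would also need to cancel or abandon the stuck call.)
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(model_fn, features)
        try:
            return future.result(timeout=timeout_s), "model"
        except Exception:
            return fallback_fn(features), "fallback_rule"

# Hypothetical usage: the fallback is a hand-written, auditable heuristic.
def fallback_rule(features):
    return 1.0 if features.get("past_purchases", 0) > 3 else 0.0

score, source = predict_with_fallback(lambda f: 0.87, {"past_purchases": 5}, fallback_rule)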

Patents

Auction for Double Wide Ads

Describes how to implement an algorithm that can compare ads that take up different sizes on the user interface, enabling fair competition between different ad formats in marketplace auctions.

Synthetic Treatment Effects for Targeting

Describes how to rank users generally by their "causal"/"incremental" response to marketing materials, enabling more efficient allocation of marketing resources.

A Reinforcement Learning Algorithm for Optimized Bidding

Describes how to adjust advertisers' bids day to day based on feedback from real-world performance. This is done in a transparent, "non-blackbox" manner that maintains advertiser trust while maximizing performance.

Dynamic Offer Targeting

Describes how to target marketing materials to customers based on the response of users with similar purchase behavior, enabling personalized marketing at scale.

Bucketing Likelihoods from Targeting

Describes how to construct confidence intervals in a believable, distribution-free manner using bootstrapping and empirical data, providing reliable uncertainty estimates for business decisions.

Critical Items: Predict the Value of Customer Patience (VCP) Lost If an Item Is Not Found

The invention optimizes order routing by quantifying the precise business cost of missing items. It enables intelligent decisions about which warehouse should fulfill an order based on item availability and criticality, maximizing order profitability and customer satisfaction.

Writings

Man Move Live: Reflections on Growth and Change

"树挪死, 人挪活" (Shu Nuo Si, Ren Nuo Huo) — "Tree Move Dead, Man Move Live." This Chinese saying captures my journey through different cultures and environments. Like roots seeking new soil, my transplant from California to Boston to San Francisco has forced me to grow in unexpected ways, teaching me that change begets enlightenment. I believe that to truly generate innovative ideas, I must exist in a flow of interactions between many disciplines — not just academia, but also philosophy, creative exploration, and cultural navigation.

I've come to know that I don't learn new things; I only get used to them. In this vein, each new environment has gotten me used to different ways of thinking, communicating, and problem-solving. This perspective has proven invaluable in my work on complex reinforcement learning algorithms and structural economic modeling, where adapting to new problem spaces and maintaining intellectual flexibility is essential for innovation.

Looking Beyond Immediate Rewards in ML Systems

The conventional Williams (1992) approach to reinforcement learning is fundamentally shortsighted. When designing the Critical Items System at Instacart, I applied my research showing that we should not simply update parameters based on immediate outcomes. Instead, we must simulate sequences all the way to the end state, in a manner that validates the value function as a correct estimate of long-term value. This approach maximizes the value of every state in the system, not just the current one—a critical distinction when prioritizing items in customer orders, where the consequences extend beyond immediate metrics.

Using Discontinuities for Causal Identification

My research on college choice exploits a natural discontinuity in GPA requirements for Cal Grant aid to measure precisely how students respond to financial incentives. I found that each dollar of state-based aid reduces student-paid tuition by 73 cents, with only 7.4% of recipients switching from not enrolling to enrolling. This approach to causal inference through regression discontinuity allows us to separate correlation from causation, creating reliable predictions even when faced with complex multi-agent decision-making processes and limited data.
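
A hedged sketch of a local linear regression discontinuity estimate on simulated data; the cutoff, bandwidth, and effect size are illustrative and are not the actual Cal Grant parameters or results.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 20000
gpa = rng.uniform(1.0, 4.0, size=n)           # running variable
cutoff = 3.0                                  # illustrative eligibility cutoff
eligible = (gpa >= cutoff).astype(float)
# Simulated outcome: smooth trend in GPA plus a jump at the cutoff.
outcome = 0.2 * gpa + 0.15 * eligible + rng.normal(0, 0.3, size=n)

# Local linear RD: keep observations near the cutoff, allow different
# slopes on each side, and read the treatment effect off the jump.
bandwidth = 0.5
mask = np.abs(gpa - cutoff) <= bandwidth
centered = gpa[mask] - cutoff
X = sm.add_constant(np.column_stack([
    eligible[mask],                  # jump at the cutoff (the RD estimate)
    centered,                        # slope below the cutoff
    centered * eligible[mask],       # slope change above the cutoff
]))
fit = sm.OLS(outcome[mask], X).fit()
print(fit.params[1])                 # estimated discontinuity, close to the simulated 0.15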

Modeling Multi-Stage Decision Processes

Human decision-making often involves sequential choices under uncertainty. My research models how students make application decisions without knowing admission outcomes, then make enrollment decisions once uncertainty is resolved. The same principles apply to marketplace optimization—agents act with incomplete information, requiring systems that can handle the full distribution of potential outcomes rather than merely optimizing for expected values. This approach has proven crucial in building production systems that maintain performance across edge cases and unexpected scenarios.
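
A toy two-stage model of the application-then-enrollment decision described above: the student chooses whether to apply before admission and aid are known, then enrolls only if the realized offer beats the outside option. All numbers are hypothetical.

# Toy, hypothetical two-stage decision: apply before admission is known,
# then enroll only if admitted and the realized aid makes enrollment worthwhile.
def expected_value_of_applying(p_admit, aid_draws, tuition, value_of_degree,
                               application_cost, outside_option=0.0):
    # Stage 2: once admitted and aid is realized, enroll only if it beats the outside option.
    enroll_values = [max(value_of_degree - (tuition - aid), outside_option)
                     for aid in aid_draws]
    expected_if_admitted = sum(enroll_values) / len(enroll_values)
    # Stage 1: apply if the expected value over admission and aid uncertainty,
    # net of the application cost, exceeds the outside option.
    ev = p_admit * expected_if_admitted + (1 - p_admit) * outside_option
    return ev - application_cost

ev = expected_value_of_applying(p_admit=0.6,
                                aid_draws=[0, 2000, 4000, 6000],
                                tuition=12000,
                                value_of_degree=15000,
                                application_cost=80)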