Archie Abrams

How Shopify thinks about growth

strategic thinking, experience advantage

Tip

Archie Abrams talks about long-term experiment results at Shopify: “I think it’s in the 30 to 40% range. 30 to 40% of experiments that show short-term lift show no long-term lift when we look back a year later. The most common is it actually isn’t a long-term lift from a lot of things that you might think in the short-term are. It’s usually more pull-forward effect than you fully realize.”

Turns out AI product experiments work the same way.

Your growth team ships an experiment: new AI feature that recommends next actions in the workflow. Week 1 metrics look incredible—30% lift in feature adoption, 15% lift in daily active users, team celebrates. Your VP of Product wants to roll it out globally immediately to hit quarterly targets.

Less experienced leaders see short-term lift and declare victory. They ship globally, update the board deck with the win, and move on to the next experiment. They haven’t run enough experiments over enough years to know that early metrics often don’t hold.

You’ve seen this pattern before. In 2018, you shipped a notification feature: a huge early engagement lift that disappeared after 3 months, once users turned notifications off. In 2020, you added gamification: an initial spike in activity, and 6 months later usage was back at baseline. You know from experience that week 1 metrics predict almost nothing about year 1 value.

So before you roll out globally, you insist on keeping a holdout group and measuring this cohort 3 months out, 6 months out, and 12 months out. The VP thinks you’re being overly cautious. The team thinks you’re slowing down momentum. Month 3: lift drops to 15%. Month 6: lift drops to 5%. Month 12: no measurable difference. Turns out the AI feature didn’t create new value. It just pulled forward actions users would have taken anyway, moving them slightly earlier in their workflow.
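If it helps to see the arithmetic behind that holdout comparison, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `Checkpoint` fields, the `relative_lift` helper, and the user counts are made up to mirror the scenario above, not taken from Shopify's tooling or data. The idea is simply to compare the treated cohort's rate against the holdout's rate at each checkpoint instead of trusting week 1.

```python
from dataclasses import dataclass


@dataclass
class Checkpoint:
    """Hypothetical snapshot of both cohorts at one point in time."""
    label: str
    treatment_users: int
    treatment_adopters: int
    holdout_users: int
    holdout_adopters: int


def relative_lift(cp: Checkpoint) -> float:
    """Relative lift of the treated cohort over the holdout (0.30 means +30%)."""
    if cp.treatment_users <= 0 or cp.holdout_users <= 0:
        raise ValueError("both cohorts need at least one user")
    if cp.holdout_adopters <= 0:
        raise ValueError("holdout rate is zero, relative lift is undefined")
    treatment_rate = cp.treatment_adopters / cp.treatment_users
    holdout_rate = cp.holdout_adopters / cp.holdout_users
    return treatment_rate / holdout_rate - 1.0


# Illustrative numbers only, chosen to match the story in the text.
CHECKPOINTS = [
    Checkpoint("week 1", 1000, 260, 1000, 200),
    Checkpoint("month 3", 1000, 230, 1000, 200),
    Checkpoint("month 6", 1000, 210, 1000, 200),
    Checkpoint("month 12", 1000, 200, 1000, 200),
]


def lift_report(checkpoints):
    """Return (label, relative lift) pairs in checkpoint order."""
    return [(cp.label, relative_lift(cp)) for cp in checkpoints]


if __name__ == "__main__":
    for label, lift in lift_report(CHECKPOINTS):
        print(f"{label}: {lift:.0%} lift vs. holdout")
```

With these made-up counts the lift decays from 30% in week 1 to nothing by month 12. A real analysis would also want confidence intervals around each estimate, since a small holdout can make a 5% lift indistinguishable from zero.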

This judgment, knowing that short-term experiment wins often don’t translate to long-term value, comes from running enough experiments over decades to see the full lifecycle. Junior growth leaders optimize for shipping wins and hitting quarterly metrics. You’ve watched enough “wins” evaporate to tell genuine value creation apart from pull-forward effects, novelty effects, and Hawthorne effects. That calibration only comes from seeing both patterns play out repeatedly.

Context

Archie Abrams is VP of Product and Head of Growth at Shopify, leading over 600 people across product, design, engineering, and growth marketing. Shopify runs long-term holdout experiments (1-3 years) and discovered that 30-40% of experiments showing short-term wins show zero long-term lift when measured a year later.

For experienced executives evaluating AI product experiments, this pattern recognition is critical—you’ve run enough experiments over decades to know early metrics are unreliable predictors of lasting value. That wisdom comes from seeing the full experiment lifecycle repeatedly.