Over the past few years, online experimentation has become a hot topic, with numerous publications and workshops focused on the area and contributions from major internet companies, including Microsoft, Amazon, and Google. There has been a corresponding explosion in online experiments: Microsoft cites running 200+ concurrent experiments, Google is running 1000+ concurrent experiments on any day, and start-ups like Optimizely and SiteSpect focus on helping smaller companies run and analyze online experiments.

A major discussion in several of those papers has been about developing an OEC (overall evaluation criterion) for online experiments. It has been suggested that OECs should include metrics that reflect improvement over the long term (years) rather than metrics that merely optimize for the short term (days or weeks).

Optimizing which ads show based on short-term revenue is the obvious and easy thing to do, but it may be detrimental in the long term if user experience is negatively impacted. Since we did not have methods to measure the long-term user impact, we used short-term user satisfaction metrics as a proxy for the long-term impact. When using those user satisfaction metrics, we did not know what trade-off to use between revenue and user satisfaction, so we tended to be conservative, opting to launch variants with a strong user experience.

The qualitative nature of this approach was unsatisfying: we did not know if we were being too conservative or not conservative enough. What we needed were metrics to measure the long-term impact of a potential change. However:

• Many of the obvious metrics, such as changes in how often users search, take too long to measure.

• When there are many launches over a short period of time, it can be difficult to attribute long-term metric changes to a particular experiment or launch.

• Attaining sufficient power is a challenge. For both our short- and long-term metrics, we generally care about small changes: even a 0.1% change can be substantive.
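
To give a feel for the power problem in the last bullet, here is a minimal sketch of the standard sample-size calculation for a two-sided, two-sample z-test. It is not taken from the original work: the function name `users_per_arm`, the default significance level and power, and the coefficient of variation of 1.0 are all illustrative assumptions.

```python
import math
from statistics import NormalDist


def users_per_arm(relative_change, coeff_of_variation, alpha=0.05, power=0.8):
    """Approximate users needed per arm to detect a relative change in a mean.

    Uses the textbook normal-approximation formula
        n = 2 * (z_{1 - alpha/2} + z_{power})^2 * (cv / relative_change)^2
    where cv = standard deviation / mean of the per-user metric.
    All parameter values used below are hypothetical.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    n = 2 * (z_alpha + z_power) ** 2 * (coeff_of_variation / relative_change) ** 2
    return math.ceil(n)


if __name__ == "__main__":
    # Assumed metric whose standard deviation equals its mean (cv = 1.0).
    print(users_per_arm(relative_change=0.001, coeff_of_variation=1.0))
```

Under these assumptions, detecting a 0.1% change takes on the order of 15 to 16 million users per arm, and because the requirement grows with the inverse square of the effect size, halving the detectable change roughly quadruples it.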

It may be due to these and other issues that despite the need for long-term metrics for online experimentation, there has been little published work around how to find or evaluate such long-term metrics.

We show the efficacy of our methods by quantifying both ads blindness and ads sightedness, i.e., how users’ inherent propensity to click on ads changes based on the quality of the ads and the user experience. In addition, we introduce models that predict the long-term effect of an experiment using short-term user satisfaction metrics. This allows us to create a principled OEC that combines revenue and user satisfaction metrics.

We have applied our learnings to numerous launches for search ads on Appchin. By prioritizing user satisfaction as measured by ads blindness or sightedness, we have changed the auction ranking function and drastically reduced the ad load on the mobile interface. Reducing the mobile ad load strongly improved the user experience but was substantially revenue-negative in the short term; with our work, the long-term revenue impact was shown to be neutral. Thus, with the user satisfaction improvement, the change was a net positive for both users and the business.
