My last blog post, “Whats the Optimal?”, highlighted the importance of selecting the right metrics and how revenue per visitor or customer is ultimately what eCommerce professionals wish to see improved.
One component of this is AOV (Average Order Value). By definition, this is total sales revenue divided by the number of orders. Improving AOV is a worthwhile ambition: all things being equal, an improvement in AOV supports the view that an experiment has positively influenced, or will positively influence, revenue.
For non-stats heads: the “Central Limit Theorem” says that the average of a large enough sample will follow an approximately bell-shaped (normal) distribution. The majority of calculations related to statistical significance or probability to beat control are underpinned by this approximation holding for the data in the test.
For conversion rates this is generally accepted. However, AOV, along with Time on Page, Units per Transaction, and other non-binomial metrics, puts a slight spanner in the works: the underlying values do not follow a bell curve but a skewed distribution. Outliers can heavily influence the average and lead to misinterpretation.
Typical sources of such outliers include:
- Group flight bookings
- Trade customer placing an extraordinarily large order
- Breakpoints on free delivery charges
Consider the following engineered scenario. An A/B test is executed, and the following order values are captured. For simplicity, all order values for Experiment 1 (the control) were £20.
| | Experiment 1 | Experiment 2 | Lift |
|---|---|---|---|
| | £20.00 | £1.00 | |
| | £20.00 | £1.00 | |
| | £20.00 | £15.00 | |
| | £20.00 | £18.00 | |
| | £20.00 | £20.00 | |
| | £20.00 | £70.00 | |
| AOV | £20.00 | £20.83 | +4% |
The average order value for Experiment 2 is £20.83, against £20.00 for Experiment 1. So, the conclusion would be a small but nevertheless positive improvement of 4% on revenue.
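The arithmetic behind that 4% figure can be sketched in a few lines of Python. This is only an illustration using the order values from the scenario; the helper names `aov` and `lift` are made up for this example and are not part of any tool.

```python
# Order values (in GBP) from the engineered scenario.
experiment_1 = [20.00, 20.00, 20.00, 20.00, 20.00, 20.00]  # control
experiment_2 = [1.00, 1.00, 15.00, 18.00, 20.00, 70.00]    # variant


def aov(order_values):
    """Average order value: total revenue divided by the number of orders."""
    return sum(order_values) / len(order_values)


def lift(variant_value, control_value):
    """Relative change of the variant against the control, as a fraction."""
    return variant_value / control_value - 1


aov_1 = aov(experiment_1)
aov_2 = aov(experiment_2)
aov_lift = lift(aov_2, aov_1)

print(f"AOV Experiment 1: GBP {aov_1:.2f}")
print(f"AOV Experiment 2: GBP {aov_2:.2f}")
print(f"Lift: {aov_lift:+.1%}")
```

Note that the single £70 order contributes more than half of Experiment 2's total revenue, which is why the mean alone can be misleading here.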
However, when you look at Experiment 2, you can see two very low value orders and one very high value order. These are outliers and therefore unhelpful, as they unduly influence the average figure. It would be better to consider the median, which is the middle value of the ranked order values. With six orders there is no single middle value, so the median is the average of the third and fourth ranked values: (£15.00 + £18.00) / 2 = £16.50.
| | Experiment 1 | Experiment 2 |
|---|---|---|
| | £20.00 | £1.00 |
| | £20.00 | £1.00 |
| | £20.00 | £15.00 |
| | £20.00 | £18.00 |
| | £20.00 | £20.00 |
| | £20.00 | £70.00 |
| Average | £20.00 | £20.83 |
| Lift (average) | – | +4% |
| Median | £20.00 | £16.50 |
| Lift (median) | – | -18% |
This is more revealing. The median value is actually 18% below the control (Experiment 1), which makes Experiment 2 clearly less persuasive for adoption.
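As a rough sketch, the mean and median comparison can be reproduced with Python's standard `statistics` module. The variable names are illustrative only, and the data is the engineered scenario from this post rather than real test results.

```python
from statistics import mean, median

# Order values (in GBP) from the engineered scenario.
experiment_1 = [20.00, 20.00, 20.00, 20.00, 20.00, 20.00]  # control
experiment_2 = [1.00, 1.00, 15.00, 18.00, 20.00, 70.00]    # variant

# Lift based on the mean (the usual AOV figure).
mean_lift = mean(experiment_2) / mean(experiment_1) - 1

# Lift based on the median. With an even number of orders, median()
# averages the two middle values of the sorted list (15.00 and 18.00).
median_lift = median(experiment_2) / median(experiment_1) - 1

print(f"Mean lift:   {mean_lift:+.1%}")
print(f"Median lift: {median_lift:+.1%}")
```

The two lifts point in opposite directions, and that disagreement is the signal that outliers are driving the average.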
So, a bit of guidance when using AOV:
- At a minimum, compare both the mean and the median AOV before making a judgement.
- Test for statistical significance using an appropriate rank-based test such as the Wilcoxon rank-sum (Mann-Whitney) or Kruskal-Wallis test. Webtrends Optimize users will benefit from having these approaches automatically calculated by the UI.
- Order values can be influenced significantly by seasonality. Consider how close the reported values are to the long-term AOV of the site before using them in revenue calculations.
- An increase in AOV really should be anticipated in the test hypothesis, or at least be explainable in the post-test analysis. If not, then be cautious.
- The best opportunities for AOV improvement come from upselling: online retailers often promote “buy £XX more for free delivery”, and car hire companies offer higher levels of insurance, baby seats, etc. A more recent approach is airlines offering seat allocation, hold luggage and priority boarding. Often these genuine improvements are overshadowed by the comparatively high base price. Car hire and flights are examples where base prices vary widely, perhaps in the hundreds of pounds, while the add-ons are worth less than £50. In these instances, capture the add-on value separately and explore the differences.
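To make the rank-based testing suggestion in the guidance above more concrete, here is a minimal sketch of an exact permutation version of a rank-sum test, written with the Python standard library only. It is an illustration under stated assumptions, not how Webtrends Optimize or any other tool computes its figures: the function names `average_ranks` and `rank_sum_permutation_test` are hypothetical, and enumerating every split like this is only practical for very small samples such as the six-order scenario in this post.

```python
from itertools import combinations


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def rank_sum_permutation_test(control, variant):
    """Return (control rank sum, two-sided exact permutation p-value)."""
    pooled = list(control) + list(variant)
    ranks = average_ranks(pooled)
    n_control = len(control)
    # Rank sum the control would have on average if the groups did not differ.
    expected = n_control * (len(pooled) + 1) / 2
    observed = sum(ranks[:n_control])
    observed_deviation = abs(observed - expected)

    extreme = 0
    total = 0
    # Try every way of assigning n_control of the pooled orders to the control.
    for indices in combinations(range(len(pooled)), n_control):
        total += 1
        deviation = abs(sum(ranks[i] for i in indices) - expected)
        if deviation >= observed_deviation - 1e-9:
            extreme += 1
    return observed, extreme / total


# Order values (in GBP) from the engineered scenario.
experiment_1 = [20.00, 20.00, 20.00, 20.00, 20.00, 20.00]  # control
experiment_2 = [1.00, 1.00, 15.00, 18.00, 20.00, 70.00]    # variant

rank_sum, p_value = rank_sum_permutation_test(experiment_1, experiment_2)
print(f"Control rank sum: {rank_sum}")
print(f"Two-sided p-value: {p_value:.3f}")
```

On this toy data the p-value comes out at roughly 0.14, so with only six orders per experiment neither the +4% mean lift nor the -18% median lift could be called significant. For real test volumes, an established statistics library (for example SciPy's `mannwhitneyu`) is a more practical choice than enumerating permutations.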
For more information please get in touch.