Use cases
Experiments
Running experiments like A/B Testing and Multivariate Testing is a powerful technique in product development for continuous learning and iterating based on feedback. Featurevisor can help manage those experiments with a strong governance model in your organization.
What is an A/B Test?
An A/B test, also known as split testing or bucket testing, is a controlled experiment used to compare two or more variations of a specific feature to determine which one performs better. It is commonly used in fields like web development, marketing, user experience design, and product management.
It is common practice to call the default/existing behaviour the `control` variation, and the new/experimental behaviour the `treatment` variation.
Why run A/B Tests?
The primary goal of an A/B test is to measure the impact of the variations on predefined metrics or key performance indicators (KPIs). These metrics can include conversion rates, click-through rates, engagement metrics, revenue, or any other measurable outcome relevant to the experiment.
By comparing the performance of the different variants, statistical analysis is used to determine if one variant outperforms the others with statistical significance. This helps decision-makers understand which variant is likely to have a better impact on the chosen metrics.
Process of running an A/B Test
A/B testing follows a structured process that typically involves the following steps:
- Research and identify: Find a customer or business problem and turn it into a testable hypothesis by determining the specific element, such as a webpage, design element, pricing model, or user interface component, that will be subjected to variation.
- Power analysis: Determine if there's enough traffic or users to run the experiment and achieve statistical significance (a rough sample-size sketch follows this list).
- Create variations: Develop multiple versions of the element, ensuring they are distinct and have measurable differences.
- Split traffic or users: Randomly assign users or traffic into separate groups, with each group exposed to a different variant.
- Run the experiment: Implement the variants and collect data on the predefined metrics for each group over a specified period.
- Analyze the results: Use statistical analysis to compare the performance of the variants and determine if any differences are statistically significant.
- Make informed decisions: Based on the analysis, evaluate which variation performs better and whether it should be implemented as the new default or further optimized.
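As a rough illustration of the power analysis step, the sketch below estimates how many users each variation needs in a conversion-rate experiment. The 95% confidence / 80% power z-scores and the example numbers are common conventions used for illustration, not anything provided by Featurevisor.

```js
// Rough sample-size estimate per variation for comparing two conversion rates.
// Assumes a two-sided test at 5% significance and 80% power.
function sampleSizePerVariation(baselineRate, minDetectableEffect) {
  const zAlpha = 1.96; // z-score for 95% confidence (two-sided)
  const zBeta = 0.84; // z-score for 80% power

  const p1 = baselineRate;
  const p2 = baselineRate + minDetectableEffect;

  const variance = p1 * (1 - p1) + p2 * (1 - p2);

  return Math.ceil(((zAlpha + zBeta) ** 2 * variance) / minDetectableEffect ** 2);
}

// e.g. 5% baseline conversion rate, detecting an absolute lift of 1%
console.log(sampleSizePerVariation(0.05, 0.01)); // ≈ 8,146 users per variation
```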
What about Multivariate Testing?
A multivariate test is an experimentation technique that allows you to simultaneously test multiple variations of multiple elements or factors within a single experiment.
Unlike A/B testing, which focuses on comparing two or more variants of a single element, multivariate testing involves testing combinations of elements and their variations to understand their collective impact on user behavior or key metrics.
Difference between A/B Tests and Multivariate Tests
Oftentimes, A/B tests with 3 or more variations are referred to as A/B/n tests. We treat them both as A/B tests in this guide.
| | A/B Tests | Multivariate Tests |
|---|---|---|
| Purpose | Compare two or more variants of a single element | Simultaneously test multiple elements and their variations |
| Variants | Two or more variants (control and treatment) | Multiple variants for each element being tested |
| Scope | Focuses on one element at a time | Tests combinations of elements and their variations |
| Complexity | Relatively simpler to set up and analyze | More complex to set up and analyze |
| Statistical significance | Typically requires fewer samples to achieve significance | Requires larger sample sizes to achieve significance |
| Insights | Provides insights into the impact of individual changes | Provides insights into the interaction between changes |
| Test duration | Generally shorter duration | Often requires longer duration to obtain reliable results |
| Examples | Ideal for testing isolated changes like UI tweaks, copy variations | Useful for testing multi-factor changes like page redesigns, interactions between multiple elements |
Our application
For this guide, let's say our application consists of a landing page containing these elements:
- Hero section: The main section of the landing page, which includes:
  - headline
  - subheading, and
  - call-to-action (CTA) button
We now want to run both A/B Tests and Multivariate Tests using Featurevisor.
Understanding the building blocks
Before going further, we recommend learning about the building blocks of Featurevisor to understand the concepts used in this guide:
- Attributes: building block for conditions
- Segments: conditions for targeting users
- Features: feature flags and variables with rollout rules
- SDKs: how to consume datafiles in your applications
The quick start can be very handy as a summary.
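As a tiny refresher, attributes are plain YAML files describing the keys you later use in segment conditions and in the SDK context. A minimal sketch, assuming the attribute names used throughout this guide:

```yml
# attributes/deviceId.yml (illustrative)
type: string
description: Anonymous device identifier used for bucketing

# attributes/country.yml (illustrative)
type: string
description: Two-letter country code of the user

# attributes/deviceType.yml (illustrative)
type: string
description: Device type, such as iphone, android, or desktop
```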
A/B Test on CTA button
Let's say we want to run an A/B Test on the CTA button in the Hero section of your landing page.
The two variations for a simple A/B test experiment would be:
- control: The original CTA button with the text "Sign up"
- treatment: The new CTA button with the text "Get started"
We can express that in Featurevisor as follows:
```yml
# features/ctaButton.yml
description: CTA button
tags:
  - all

bucketBy: deviceId

variations:
  - value: control
    description: Original CTA button
    weight: 50

  - value: treatment
    description: New CTA button that we want to test
    weight: 50

environments:
  production:
    rules:
      - key: "1"
        segments: "*" # everyone
        percentage: 100 # 100% of the traffic
```
We just set up our first A/B test experiment that is:
- rolled out to 100% of our traffic (everyone)
- with a 50/50 split between the `control` and `treatment` variations
- to be bucketed against the `deviceId` attribute (since we don't have the user logged in yet)
Importance of bucketing
Featurevisor relies on bucketing to make sure the same user or anonymous visitor always sees the same variation no matter how many times they go through the flow in your application.
This is important to make sure the user experience is consistent across devices (if user's ID is known) and sessions.
You can read further about bucketing in the Featurevisor documentation.
The `deviceId` attribute can be a unique UUID generated and persisted on the client side, where the SDK evaluates the features.
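For example, in a browser you could generate and persist such an identifier once and reuse it in the SDK context on every evaluation. A minimal sketch, assuming localStorage is an acceptable place to keep it:

```js
// Generate a stable anonymous identifier per device/browser (sketch, not part of the SDK)
function getDeviceId() {
  const storageKey = "deviceId";
  let deviceId = localStorage.getItem(storageKey);

  if (!deviceId) {
    deviceId = crypto.randomUUID();
    localStorage.setItem(storageKey, deviceId);
  }

  return deviceId;
}
```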
If we wanted a more targeted rollout, we could use segments to target specific users or groups of users:
```yml
# features/ctaButton.yml
# ...

environments:
  production:
    rules:
      - key: "2"
        segments:
          - netherlands
          - iphoneUsers
        percentage: 100 # enabled for iPhone users in NL only

      - key: "1"
        segments: "*"
        percentage: 0 # disabled for everyone else
```
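For completeness, the two segments referenced above might be defined along these lines; the conditions are illustrative assumptions based on the attributes used in this guide.

```yml
# segments/netherlands.yml (illustrative)
description: Users from the Netherlands
conditions:
  - attribute: country
    operator: equals
    value: nl

# segments/iphoneUsers.yml (illustrative)
description: Users browsing on an iPhone
conditions:
  - attribute: deviceType
    operator: equals
    value: iphone
```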
You can read further about how segments are used in a feature's rules in the Features documentation.
Evaluating feature with SDKs
Now that we have defined our feature, we can use Featurevisor SDKs to evaluate the CTA button variation at runtime, assuming we have already built and deployed the datafiles to our CDN.
For Node.js and browser environments, install the JavaScript SDK:
```
$ npm install --save @featurevisor/sdk
```
Then, initialize the SDK in your application:
```js
import { createInstance } from "@featurevisor/sdk";

const f = createInstance({
  datafile: "https://cdn.yoursite.com/datafile.json",
  onReady: () => console.log("Datafile has been fetched and SDK is ready")
});
```
Now we can evaluate the `ctaButton` feature wherever we need to render the CTA button:
```js
const featureKey = "ctaButton";
const context = {
  deviceId: "device-123",
  country: "nl",
  deviceType: "iphone"
};

const ctaButtonVariation = f.getVariation(featureKey, context);

if (ctaButtonVariation === "treatment") {
  // render the new CTA button
  return "Get started";
} else {
  // render the original CTA button
  return "Sign up";
}
```
Here we see only two variation cases, but we could have had more than two variations in our A/B test experiment.
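For example, with a hypothetical third variation called `treatment2` (which would also need a matching entry in the feature's variations), a switch statement keeps the rendering logic readable:

```js
const ctaButtonVariation = f.getVariation(featureKey, context);

switch (ctaButtonVariation) {
  case "treatment":
    return "Get started";
  case "treatment2": // hypothetical third variation
    return "Join now";
  default:
    // control, or feature could not be evaluated
    return "Sign up";
}
```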
Multivariate Test on Hero element
Let's say we want to run a Multivariate Test on the Hero section of your landing page.
Previously we only ran an A/B test on the CTA button's text, but now we want to run a Multivariate Test on the Hero section affecting some or all its elements. We can map our requirements in a table below:
| Variation | Headline | CTA button text |
|---|---|---|
| control | Welcome | Sign up |
| treatment1 | Welcome | Get started |
| treatment2 | Hello there | Sign up |
| treatment3 | Hello there | Get started |
Instead of creating a separate feature per element, we can create a single feature for the Hero section and define multiple variables for each element.
The relationship can be visualized as:
- one feature
- having multiple variations
- each variation having its own set of variable values
```yml
# features/hero.yml
description: Hero section
tags:
  - all

bucketBy: deviceId

# define a schema of all variables
# scoped under `hero` feature first
variablesSchema:
  - key: headline
    type: string
    defaultValue: Welcome

  - key: ctaButtonText
    type: string
    defaultValue: Sign up

variations:
  - value: control
    weight: 25

  - value: treatment1
    weight: 25
    variables:
      # we only define variables inside variations,
      # if the values are different than the default values
      - key: ctaButtonText
        value: Get started

  - value: treatment2
    weight: 25
    variables:
      - key: headline
        value: Hello there

  - value: treatment3
    weight: 25
    variables:
      - key: headline
        value: Hello there

      - key: ctaButtonText
        value: Get started

environments:
  production:
    rules:
      - key: "1"
        segments: "*"
        percentage: 100
```
We just set up our first Multivariate test experiment that is:
- rolled out to 100% of our traffic to everyone
- with an even 25% split among all its variations
- with each variation having different values for the variables
Evaluating variables
In your application, you can access the variables of the `hero` feature as follows:
```js
const featureKey = "hero";
const context = { deviceId: "device-123" };

const headline = f.getVariable(featureKey, "headline", context);
const ctaButtonText = f.getVariable(featureKey, "ctaButtonText", context);
```
Use the values inside your hero element (component) when you render it.
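For example, a hero component might consume the evaluated values like this. This is a sketch assuming a plain function that returns HTML; the markup and class names are placeholders, so adapt it to your UI framework:

```js
// Sketch: render the hero section using the evaluated variable values.
function renderHero(f, context) {
  const featureKey = "hero";

  const headline = f.getVariable(featureKey, "headline", context);
  const ctaButtonText = f.getVariable(featureKey, "ctaButtonText", context);

  return `
    <section class="hero">
      <h1>${headline}</h1>
      <p>Subheading copy goes here</p>
      <button type="button">${ctaButtonText}</button>
    </section>
  `;
}
```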
Tracking
We have seen how to create features for defining simple A/B tests as well as more complex multivariate tests using variables in Featurevisor, and how to evaluate them at runtime in our applications when we need those values.
But we also need to track the performance of our experiments to understand which variation is performing better than the others.
This is where the `activate()` method of the SDK comes in handy. Before we call the method, let's first set up our activation event handler in the SDK initialization:
```js
import { createInstance } from "@featurevisor/sdk";

const f = createInstance({
  datafile: "https://cdn.yoursite.com/datafile.json",
  onReady: () => console.log("Datafile has been fetched and SDK is ready"),
  onActivation: (featureKey, variation, context, captureContext) => {
    // send the event to your analytics platform
    // or any other third-party service
  }
});
```
In the `onActivation` handler, we know which feature was activated and which variation was computed for the current user or device. From here, we are in full control of sending the event to our analytics platform or any other third-party service for further analysis.
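For instance, the handler could forward the exposure to whatever analytics client you already use. A minimal sketch, assuming a hypothetical `analytics.track()` API and an event name of your choosing:

```js
import { createInstance } from "@featurevisor/sdk";
import { analytics } from "./analytics"; // your own analytics client (hypothetical)

const f = createInstance({
  datafile: "https://cdn.yoursite.com/datafile.json",
  onActivation: (featureKey, variation, context, captureContext) => {
    // forward the exposure so results can later be analyzed per variation
    analytics.track("experiment_activated", {
      feature: featureKey,
      variation,
      deviceId: context.deviceId
    });
  }
});
```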
As an example, you can refer to the Google Tag Manager guide for tracking purposes.
Featurevisor is not an analytics platform
It is important to understand that Featurevisor is not an analytics platform. It is a feature management tool that helps you manage your features and experiments with a Git-based workflow, and evaluate them in your applications with its SDKs.
Activation
From the application side, we need to take the responsibility of activating the feature when we are sure that the user has been exposed to the feature.
For example, in the case of our CTA button experiment, we can activate the feature when the user sees the CTA button on their screen:
```js
const featureKey = "ctaButton";
const context = { deviceId: "device-123" };

f.activate(featureKey, context);
```
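One way to activate only when the user has actually seen the button is to wait until it scrolls into view. A minimal sketch using the browser's IntersectionObserver, reusing the `f`, `featureKey`, and `context` from above and assuming the button is rendered with a hypothetical `#cta-button` id:

```js
const ctaButtonElement = document.querySelector("#cta-button");

const observer = new IntersectionObserver((entries) => {
  if (entries.some((entry) => entry.isIntersecting)) {
    // the button is visible on screen now
    f.activate(featureKey, context);

    // activate only once per page view
    observer.disconnect();
  }
});

observer.observe(ctaButtonElement);
```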
Mutually exclusive experiments
Oftentimes when we are running multiple experiments together, we want to make sure that they are mutually exclusive. This means that a user should not be bucketed into multiple experiments at the same time.
In plainer words, the same user should only be exposed to one experiment at a time, avoiding any overlap between them.
One example: if User X is exposed to feature `hero`, which is running our multivariate test, then the same User X should not be exposed to feature `wishlist`, which is running some other A/B test in the checkout flow of the application.
For those cases, we recommend looking at the Groups functionality of Featurevisor, which helps you achieve exactly that without requiring any extra code changes in your applications.
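As an illustration, a group making our two features mutually exclusive could look roughly like the sketch below; the file name and slot percentages are assumptions, so refer to the Groups documentation for the exact format and constraints.

```yml
# groups/landingAndCheckout.yml (illustrative)
description: Mutually exclusive experiments on landing and checkout
slots:
  - feature: hero
    percentage: 50

  - feature: wishlist
    percentage: 50
```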
Further reading
We highly recommend reading and understanding the building blocks of Featurevisor, which will help you make the most out of this tool:
- Attributes: building block for conditions
- Segments: conditions for targeting users
- Features: feature flags and variables with rollout rules
- Groups: mutually exclusive features
- SDKs: how to consume datafiles in your applications
Conclusion
We learned how to use Featurevisor for:
- creating both simple A/B tests and more complex multivariate tests
- evaluating them at runtime in our applications
- tracking the performance of our experiments
- activating the features when we are sure that the user has been exposed to them
- making multiple experiments mutually exclusive if we need to
Featurevisor can be a powerful tool in your experimentation toolkit, and can help you run experiments with a strong governance model in your organization, given that every change goes through a Pull Request in your Git repository and nothing gets merged without reviews and approvals.