Maximal Marginal Relevance

8 minute read

Published: July 15, 2024

Maximal Marginal Relevance (MMR) is a technique designed to retrieve documents that are both relevant and diverse. This article provides an implementation of MMR and demonstrates its effectiveness using a dataset of news articles.

Imagine you want to find the most relevant news articles about London from a dataset. Here’s an example dataset containing news article titles:

article_titles = [
    # Culture
    "The Revival of Ancient Traditions in Modern Society",
    "The Evolution of Theatre in London",
    ...

    # Weather
    "Understanding the Science Behind Extreme Weather Events",
    "How London's Weather Has Changed Over the Decades",
    ...

    # Things to Do in London
    "Historic Landmarks You Can't Miss in London",
    "Family-Friendly Activities in London",
    ...
]

You can find the complete dataset and source code for this article in the Jupyter Notebook.

To approach this problem, one method is to use a similarity measure to rank documents by their relevance to a query. A common measure is cosine similarity applied to word embeddings of the documents and the query. The algorithm follows these steps:

Create word embedding representation of the documents and the query.
Compute cosine similarity between the representations of the query and each document.
Select N documents most similar to the query.

We can use the transformers library to leverage a pretrained language model (such as the classic bert-base-uncased) to generate text embeddings:

from transformers import AutoTokenizer, AutoModel

def get_embeddings(text_list, model_name="bert-base-uncased"):
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)

    inputs = tokenizer(text_list, return_tensors="pt", padding=True, truncation=True)
    
    with torch.no_grad():
        outputs = model(**inputs)
    
    embeddings = outputs.last_hidden_state.mean(dim=1)
    embeddings = embeddings.numpy()
    
    return embeddings

This function generates embeddings by averaging the hidden states of the model’s last layer for each input text.

Cosine similarity is computed as follows:

def cosine_similarity(vector1, vector2):
    dot_product = np.dot(vector1, vector2)
    norm_vector1 = np.linalg.norm(vector1)
    norm_vector2 = np.linalg.norm(vector2)
    return dot_product / (norm_vector1 * norm_vector2)

To compute cosine similarities between the query and each document, we can combine the query and documents into a single list and then compute similarities between each pair of embeddings (that would also get us similarities between each document, but we would need that later in the article):

def compute_similarities(documents, query):
    quiery_and_documents = [query] + documents
    embeddings = get_embeddings(quiery_and_documents)
    num_embeddings = embeddings.shape[0]
    similarities = np.zeros((num_embeddings, num_embeddings))

    for i in range(num_embeddings):
        for j in range(num_embeddings):
            similarities[i, j] = cosine_similarity(embeddings[i], embeddings[j])

    return similarities

article_similarities = compute_similarities(article_titles, query="London")

Next, rank the documents based on their similarity to the query:

def similarity_relevance(similarities):
    return sorted(range(len(similarities) - 1),
                  key=lambda i: similarities[0, i+1],
                  reverse=True)

Here are the top 7 relevant news articles:

article_similarity_order = similarity_relevance(article_similarities)
print_selected(article_similarity_order[:7], article_similarities, article_titles)

Cosine Similarity	Top 7 Relevant News Articles
0.7103888392448425	Family-Friendly Activities in London
0.7099422812461853	The Best Photography Exhibitions in London
0.7076156735420227	Unique Shopping Experiences in London
0.7006320357322693	Family-Friendly Events in London
0.6981644034385681	Best Photo Spots in London
0.6938719153404236	Top Photography Spots in London
0.6879022121429443	Exploring London’s Cultural Scene

These articles are relevant to London, but some are duplicates:

“Family-Friendly Activities in London” and “Family-Friendly Events in London”
“Best Photo Spots in London” and “Top Photography Spots in London”

In many cases, retrieving both relevant and diverse documents is essential. For instance, when using Retrieval-Augmented Generation (RAG) in LLMs, the context window is limited, so we prefer selecting text snippets that are relevant but not duplicates. The LangChain framework supports RAG and provides the Maximal Marginal Relevance (MMR) technique [1, 2, 4, 5].

A document has high marginal relevance if it is both relevant to the query and contains minimal similarity to previously selected documents. MMR is defined as:

M M R \overset{d e f}{=} \underset{D_{i} \in R ∖ S}{a r g m a x} [λ * S i m_{1} (D_{i}, Q) - (1 - λ) * \underset{D_{j} \in S}{m a x} (S i m_{2} (D_{i}, D_{j}))] [1, 2]

where:

C - document collection
Q - query
R - ranked list of documents retrieved by the IR system ( $R \subseteq C$ )
S - subset of documents in R already provided to the user ( $S \subseteq C$ )
R \ S - subset of documents not yet offered to the user
$λ$ - hyperparameter to prefer more relevant or more diverse documents

Let’s implement this technique (see [5, 6] for other implementations). First, we select the most relevant document. Then we iteratively select the document that gives the maximum MMR score (most relevant one and most dissimilar to the documents that we have already selected), until we select the requested number of documents.

We’ll reuse the previously computed similarity matrix and now use similarities between each pair of documents. The document-document similarity metric can differ from the query-document similarity, but for simplicity, we’ll use cosine similarity as well.

def maximal_marginal_relevance(similarities, num_to_select, lambda_param):
    if similarities.shape[0] <= 1 or num_to_select <= 0:
        return []

    most_similar = np.argmax(similarities[0, 1:])

    selected = [most_similar]
    candidates = set(range(len(similarities) - 1))
    candidates.remove(most_similar)

    while (len(selected) < num_to_select):
        if not candidates:
            break

        mmr_scores = {}
        for i in candidates:
            mmr_scores[i] = (lambda_param * similarities[i+1, 0] -
                (1 - lambda_param) * max([similarities[i+1, j+1] for j in selected]))

        next_best = max(mmr_scores, key=mmr_scores.get)
        selected.append(next_best)
        candidates.remove(next_best)
    return selected

Let’s select the top 7 relevant news articles using the MMR technique:

article_mmr_order = maximal_marginal_relevance(article_similarities,
                                               num_to_select = 7,
                                               lambda_param = .5)
print_selected(article_mmr_order, article_similarities, article_titles)

Cosine Similarity	Top 7 Relevant News Articles
0.7103888392448425	Family-Friendly Activities in London
0.685788631439209	The Evolution of Theatre in London
0.6981644034385681	Best Photo Spots in London
0.6072949767112732	Weather Patterns Affecting London This Summer
0.5159177780151367	Python Tips for Mastering Data Science
0.6396927237510681	The Best Parks and Green Spaces in London
0.7076156735420227	Unique Shopping Experiences in London

The new set of articles does not contain duplicates. However, there is an article not quite related to our query: “Python Tips for Mastering Data Science”. This issue can be mitigated by selecting a better $λ$ value.

Referring to the MMR formula again:

$λ = 1$ : Computes incrementally the standard relevance-ranked list
$λ = 0$ : Computes a maximal diversity ranking among documents in R
$λ \in [0, 1]$ : Optimizes a linear combination of both criteria

According to [1]:

Users wishing to sample the information space around the query, should set $λ$ , at a smaller value, and those wishing to focus in on multiple potentially overlapping or reinforcing relevant documents, should set $λ$ , to a value closer to 1. For document retrieval, we found that a particularly effective search strategy… is to start with a small $λ$ (e.g. $λ = .3$ ) in order to understand the information space in the region of the query, and then to focus on the most important parts using a reformulated query and a larger value of $λ$ (e.g. $λ = .7$ ).

By setting a higher value of $λ = .7$ :

article_mmr_order_07 = maximal_marginal_relevance(article_similarities,
                                                  num_to_select = 7,
                                                  lambda_param = .7)
print_selected(article_mmr_order_07, article_similarities, article_titles)

Cosine Similarity	Top 7 Relevant News Articles
0.7103888392448425	Family-Friendly Activities in London
0.685788631439209	The Evolution of Theatre in London
0.7099422812461853	The Best Photography Exhibitions in London
0.7076156735420227	Unique Shopping Experiences in London
0.6879022121429443	Exploring London’s Cultural Scene
0.6938719153404236	Top Photography Spots in London
0.6432080268859863	Day Trips from London: Exploring the Countryside

Now, all articles are related to London, and there are no duplicates. Increasing $λ$ further to .8, then we get:

article_mmr_order_08 = maximal_marginal_relevance(article_similarities,
                                                  num_to_select = 7,
                                                  lambda_param = .8)
print_selected(article_mmr_order_08, article_similarities, article_titles)

Cosine Similarity	Top 7 Relevant News Articles
0.7103888392448425	Family-Friendly Activities in London
0.7099422812461853	The Best Photography Exhibitions in London
0.7076156735420227	Unique Shopping Experiences in London
0.685788631439209	The Evolution of Theatre in London
0.6879022121429443	Exploring London’s Cultural Scene
0.6938719153404236	Top Photography Spots in London
0.6981644034385681	Best Photo Spots in London

Again, all articles are related to London, but we have a duplicate pair: “Top Photography Spots in London” and “Best Photo Spots in London”. For this example, $λ = .7$ works well.

For the complete implementation, refer to the Jupyter Notebook.

References:

Share on

Twitter Facebook LinkedIn

Anton Dergunov

Maximal Marginal Relevance

Share on

You May Also Enjoy

Automating HDR Photo Creation in Adobe Lightroom