A small project to try, evaluate, and select SOTA neural summarization models for my personal use.

ig-perez/sota-summarizers


What is this?

A simple project that helped me choose a SOTA neural summarization model. It contains a bash script that iterates over all possible combinations of:

  • Models to choose from
  • Generation techniques
  • Articles from a test corpus

This script executes a Python script that runs HuggingFace's pre-trained models over the XSum and CNN / Daily Mail datasets. The generated summaries are scored with ROUGE-1 and ROUGE-L to get a quantitative sense of how each model performs.
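As a rough illustration of what these metrics measure, here is a minimal pure-Python sketch of ROUGE-1 F1 (unigram overlap between a candidate summary and the reference). The project itself presumably uses a dedicated ROUGE library, and ROUGE-L additionally measures the longest common subsequence rather than unigrams:

```python
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    """ROUGE-1 F1: harmonic mean of unigram precision and recall."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped overlap: each word counts at most as often as it appears in both.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

For example, `rouge1_f1("the cat sat", "the cat sat on the mat")` gives precision 1.0 and recall 0.5, so an F1 of about 0.67.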

This repository contains the accompanying code for the article "A technical primer on neural summarization and SOTA model selection with HuggingFace Transformers 🗜️" available here.

How can I use it?

You can call the bash script (runner.sh) to run all combinations and produce a CSV file with the ROUGE scores for later analysis, or run the Python script to summarize a single document.
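Conceptually, the runner enumerates the Cartesian product of models, generation techniques, and articles, and logs one CSV row per run. A minimal Python sketch of that idea (the model, generation, and article lists here are hypothetical placeholders; check runner.sh for the real values):

```python
import itertools

# Hypothetical values for illustration; runner.sh defines the actual lists.
MODELS = ["google/pegasus-xsum", "facebook/bart-large-cnn"]
GENERATIONS = ["search", "sampling"]
ARTICLES = ["articles/one.txt", "articles/two.txt"]

def all_combinations():
    """Yield every (model, generation, article) triple the runner would execute."""
    yield from itertools.product(MODELS, GENERATIONS, ARTICLES)
```

Each triple would then be passed to the Python script, and the resulting ROUGE scores appended as a CSV row.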

As a reference, these are the results I got on my first run:


Source: Own

If you want to try the Python script, first install the dependencies (see the pyproject.toml file), then note that the following parameters are mandatory:

  • model_name: The name of the pre-trained model to use.
  • article: The relative path to a text file containing the input document and a gold standard. Use the *** sequence as a separator.
  • generation: The decoding strategy to use.
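The article file format described above can be handled with a small helper along these lines (a sketch only; the real script's parsing may differ):

```python
def split_article(text: str) -> tuple[str, str]:
    """Split an article file's contents into (document, gold_summary)
    on the first occurrence of the '***' separator."""
    document, sep, gold = text.partition("***")
    if not sep:
        raise ValueError("article file is missing the '***' separator")
    return document.strip(), gold.strip()
```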

Check the bash script for all possible values. For example, the following command summarizes the article in file one.txt using Beam Search with the PEGASUS model fine-tuned on the XSum dataset.

$ python summarize.py --model_name="google/pegasus-xsum" --article="articles/one.txt" --generation="search"
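For reference, the command line above maps onto an argument parser roughly like the following. This is a hypothetical reconstruction of summarize.py's CLI, not the actual code; see the bash script for the valid `--generation` values:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Sketch of the three mandatory parameters; names match the README."""
    parser = argparse.ArgumentParser(
        description="Summarize a single document with a pre-trained model."
    )
    parser.add_argument("--model_name", required=True,
                        help="HuggingFace model id, e.g. google/pegasus-xsum")
    parser.add_argument("--article", required=True,
                        help="Path to a text file: document, '***', gold summary")
    parser.add_argument("--generation", required=True,
                        help="Decoding strategy, e.g. 'search' for Beam Search")
    return parser
```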

Besides the summary of the input document, the script, with a small tweak, will also return all ROUGE scores :).

Where can I learn more about NLP?

You can check this repository or read some of my blog posts. Have fun! :)
