Getting started with Inseq
The Inseq library is a PyTorch-based toolkit intended to democratize access to common use cases in the study of sequence generation models for interpretability purposes. At the moment, the library supports the following models and techniques:
Models
- All the models made available through the AutoModelForSeq2SeqLM interface of the 🤗 transformers library (among others, T5, BART and all >1000 MarianNMT variants) can be used in combination with feature attribution methods.
- All the models made available through the AutoModelForCausalLM interface of the 🤗 transformers library (among others, GPT-2, GPT-NeoX, BLOOM and OPT/Galactica) are also supported.
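Both families are wrapped through the same loading interface. A minimal sketch (load_model() and the method identifiers are introduced in the following sections):

import inseq

# A seq2seq model (AutoModelForSeq2SeqLM) and a decoder-only model
# (AutoModelForCausalLM) are loaded through the same interface
marian_model = inseq.load_model("Helsinki-NLP/opus-mt-en-fr", "saliency")
gpt2_model = inseq.load_model("gpt2", "saliency")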
Interpretability Methods
At the moment, only gradient-based feature attribution methods sourced from the Captum library and basic attention attribution methods are available, but other popular occlusion- and attention-based techniques will soon follow. The list of all available methods can be obtained by calling the list_feature_attribution_methods() function. Each method points to its original implementation and is thoroughly documented in its docstring.
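For example:

import inseq

# Prints the string identifiers of all available attribution methods
print(inseq.list_feature_attribution_methods())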
Installing Inseq
The latest version of Inseq can be installed from PyPI using pip install inseq. To gain access to some Inseq functionalities, you will need to install optional dependencies (e.g. use pip install inseq[datasets] to enable dataset attribution via the Inseq CLI). To install the dev version and contribute, please follow the instructions in the Inseq README file.
The AttributionModel class
The AttributionModel class is a torch.nn.Module intended to seamlessly wrap any sequence generation PyTorch model to enable its interpretability. More specifically, the class adds the following capabilities to the wrapped model:
- A load() method to load the weights of the wrapped model from a saved checkpoint, locally or remotely. This is called when using the load_model() function, which is the suggested way to load a model.
- An attribute() method used to perform feature attribution using the loaded model.
- Multiple utility methods like encode() and embed() that are also used internally by the attribute() method.
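A minimal sketch of these capabilities (assuming encode() accepts a plain string, as the 🤗 transformers wrappers do):

import inseq

# load_model() wraps the checkpoint in the appropriate AttributionModel
# subclass and attaches the chosen attribution method
model = inseq.load_model("gpt2", "saliency")

# encode() is used internally by attribute(), but remains available to the user
batch = model.encode("Hello world")
out = model.attribute(input_texts="Hello world")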
AttributionModel subclasses belong to two categories: architectural classes like EncoderDecoderAttributionModel define methods specific to a certain model architecture, while framework classes like HuggingfaceModel define methods specific to a certain modeling framework (e.g. encoding with a tokenizer in 🤗 transformers). The final class instantiated by the user is a combination of the two, e.g. HuggingfaceEncoderDecoderModel for a sequence-to-sequence model from the 🤗 transformers library.
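This combination can be inspected on a loaded model (a small sketch; the printed name reflects the class hierarchy described above):

import inseq

# A 🤗 transformers seq2seq checkpoint yields the combined class
model = inseq.load_model("Helsinki-NLP/opus-mt-en-fr", "saliency")
print(type(model).__name__)  # HuggingfaceEncoderDecoderModel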
When a model is loaded with load_model(), a FeatureAttribution instance can be attached to it to specify which feature attribution technique should be used on it. Different families of attribution methods such as GradientAttributionRegistry are made available, each containing multiple methods (e.g. IntegratedGradientsAttribution, DeepLiftAttribution).
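As an illustration, two methods from the gradient-based family can be attached at loading time (the string identifiers below are assumed; check list_feature_attribution_methods() for the exact names):

import inseq

# Same model, two different gradient-based attribution methods
ig_model = inseq.load_model("Helsinki-NLP/opus-mt-en-fr", "integrated_gradients")
dl_model = inseq.load_model("Helsinki-NLP/opus-mt-en-fr", "deeplift")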
The following image provides a visual hierarchy of the division between AttributionModel and FeatureAttribution subclasses:

[Image: hierarchy of AttributionModel and FeatureAttribution subclasses]
The attribute method
The attribute() method provides an easy-to-use and flexible interface to generate feature attributions with sequence generation models. In its simplest form, the selected model is used to generate one or more output sequences with default parameters, and those sequences are then attributed with the specified feature attribution method.
import inseq

# Load an English-to-French MarianNMT model with the "saliency" attribution method
model = inseq.load_model("Helsinki-NLP/opus-mt-en-fr", "saliency")

# Generate a translation and attribute it with the chosen method
out = model.attribute(input_texts="Hello world, here's the Inseq library!")
The attribute method supports a wide range of customizations. Among others:
- Specifying one string in generated_texts for every sentence in input_texts allows attributing custom generation outputs. Useful to answer the question “How would the following output be justified in light of the inputs by the model?”.
- attr_pos_start and attr_pos_end can be used to attribute only specific spans of the generated output, making the attribution process more efficient when one is only interested in attributions at a specific output step.
- output_step_attributions will fill the step_attributions property in the output object with step-by-step attributions, which are normally produced but then discarded after being converted into attributions specific to every sequence in the attributed batch.
- attribute_target can be used to specify that the target-side prefix should also be attributed for encoder-decoder models, besides the original source-to-target attribution. This populates the target_attributions field in the output, which would otherwise be left empty. In the decoder-only case, the parameter is not used since only the prefix is attributed by default.
- step_scores allows computing custom scores at every generation step, with some, such as token probability and output distribution entropy, being defined by default in Inseq.
- attributed_fn allows defining a custom output function for the model, enabling advanced use cases such as contrastive explanations.
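A minimal sketch combining several of these options (the parameter values are illustrative, not prescriptive):

# Attribute a custom French output instead of the model's own generation,
# restrict attribution to the first five generation steps, attribute the
# target prefix too, and compute two built-in step scores
out = model.attribute(
    input_texts="Hello world, here's the Inseq library!",
    generated_texts="Bonjour le monde, voici la bibliothèque Inseq !",
    attr_pos_start=0,
    attr_pos_end=5,
    attribute_target=True,
    step_scores=["probability", "entropy"],
)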
The FeatureAttributionOutput class
In the code above, the out object is a FeatureAttributionOutput instance, containing a list of sequence_attributions and additional useful info regarding the attribution that was performed. In this example, sequence_attributions has length 1 since a single sequence was attributed. Printing the object produces the following output:
FeatureAttributionOutput({
    sequence_attributions: list with 1 elements of type GradientFeatureAttributionSequenceOutput: [
        GradientFeatureAttributionSequenceOutput({
            source: list with 13 elements of type TokenWithId: [
                '▁Hello', '▁world', ',', '▁here', '\'', 's', '▁the', '▁In', 'se', 'q', '▁library', '!', '</s>'
            ],
            target: list with 12 elements of type TokenWithId: [
                '▁Bonjour', '▁le', '▁monde', ',', '▁voici', '▁la', '▁bibliothèque', '▁Ins', 'e', 'q', '!', '</s>'
            ],
            source_attributions: torch.float32 tensor of shape [13, 12, 512] on cpu,
            target_attributions: None,
            step_scores: {},
            sequence_scores: None,
            attr_pos_start: 0,
            attr_pos_end: 12,
        })
    ],
    step_attributions: None,
    info: {
        ...
    }
})
The tensor in the source_attributions field contains one attribution score per dimension of the model's hidden size (512 here) for every source token (13 in this example, shown in source) at every step of generation (12, shown in target). GradientFeatureAttributionSequenceOutput is a special class derived from the regular FeatureAttributionSequenceOutput that automatically handles the last dimension of attribution tensors by summing and L2-normalizing it via an Aggregator. This allows using the out.show() method to obtain a 2-dimensional attribution map even though the original attribution tensor is 3-dimensional.
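For example, continuing from the attribution above:

# Aggregates the hidden dimension and displays the source-to-target heatmap
out.show()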