Models

class inseq.models.AttributionModel(**kwargs)[source]

Base class for all attribution models.

model

The wrapped model to be attributed.

model_name

The name of the model.

Type:

str

is_encoder_decoder

Whether the model is an encoder-decoder model.

Type:

bool

pad_token

The pad token used by the model.

Type:

str

embed_scale

Value used to scale the embeddings.

Type:

float

device

The device on which the model is located.

Type:

str

attribution_method

The attribution method used alongside the model.

Type:

FeatureAttribution

is_hooked

Whether the model is currently hooked by the attribution method.

Type:

bool

default_attributed_fn_id

The id for the default step function used as attribution target.

Type:

str
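
Example: a minimal sketch showing how loading a model populates the attributes above (model name and method id are illustrative).

    import inseq

    model = inseq.load_model("Helsinki-NLP/opus-mt-en-fr", "integrated_gradients")
    print(model.model_name)          # "Helsinki-NLP/opus-mt-en-fr"
    print(model.is_encoder_decoder)  # True for a translation model
    print(model.device)              # e.g. "cpu" or "cuda:0"
    print(model.is_hooked)           # False outside of an attribution call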

attribute(input_texts: str | Sequence[str], generated_texts: str | Sequence[str] | None = None, method: str | None = None, override_default_attribution: bool | None = False, attr_pos_start: int | None = None, attr_pos_end: int | None = None, show_progress: bool = True, pretty_progress: bool = True, output_step_attributions: bool = False, attribute_target: bool = False, step_scores: list[str] = [], include_eos_baseline: bool = False, attributed_fn: str | Callable[[...], Float32[Tensor, 'batch_size']] | None = None, device: str | None = None, batch_size: int | None = None, generate_from_target_prefix: bool = False, generation_args: dict[str, Any] = {}, **kwargs) FeatureAttributionOutput[source]

Perform sequential attribution of input texts for every token in generated texts using the specified method.

Parameters:
  • input_texts (str or list(str)) – One or more input texts to be attributed.

  • generated_texts (str or list(str), optional) – One or more generated texts to be used as targets for the attribution. Must match the number of input texts. If not provided, the model will be used to generate the texts from the input texts (default behavior). Specifying this argument enables attribution for constrained decoding, which should be interpreted carefully in the presence of distributional shifts compared to natural generations (Vamvas and Sennrich, 2021).

  • method (str, optional) – The identifier associated to the attribution method to use. If not provided, the default attribution method specified when initializing the model will be used.

  • override_default_attribution (bool, optional) – Whether the method specified above should permanently replace the default attribution method set when initializing the model, or be used only for the current attribution call. Default: False.

  • attr_pos_start (int, optional) – The initial position for the attribution in the generated text. If not provided, attribution starts from the first generated token. Together with attr_pos_end, allows for span-targeted attribution of generated texts.

  • attr_pos_end (int, optional) – The final position for the attribution in the generated text. If not provided, attribution continues until the last generated token. Together with attr_pos_start, allows for span-targeted attribution of generated texts.

  • show_progress (bool, optional) – Whether to show a progress bar during the attribution. Default: True.

  • pretty_progress (bool, optional) – Whether to show a pretty progress bar for the attribution. Automatically set to False in IPython environments due to visualization issues; if False, a plain tqdm progress bar is used. Default: True.

  • output_step_attributions (bool, optional) – Whether to fill the step_attributions field in FeatureAttributionOutput with step-wise attributions for each generated token. Default: False.

  • attribute_target (bool, optional) – Specific to encoder-decoder models: whether to attribute the target prefix alongside the input text. Note that an encoder-decoder attribution that does not account for the target prefix does not correctly reflect the overall input importance, since part of the input is excluded from the attribution. Default: False.

  • step_scores (list(str)) – A list of step function identifiers specifying the step scores to be computed alongside the attribution process. Available step functions are listed in list_step_functions().

  • include_eos_baseline (bool, optional) – Whether to include the EOS token in the attributed tokens when using an attribution method that requires a baseline. Default: False.

  • attributed_fn (str or Callable, optional) – The identifier associated to the step function to be used as attribution target. If not provided, the one specified in default_attributed_fn_id (model default) will be used. If the provided string is not a registered step function, an error will be raised. If a callable is provided, it must be a function matching the requirements for a step function.

  • device (str, optional) – The device to use for the attribution. If not provided, the default model device will be used.

  • batch_size (int, optional) – The batch size used to distribute the attribution computation over the set of inputs. If not provided, the full set of input texts will be attributed at once.

  • generate_from_target_prefix (bool, optional) – Whether the generated_texts should be used as target prefixes for the generation process. If False, the generated_texts will be used as full targets. This option is only available for encoder-decoder models, since the same behavior can be achieved by modifying the input texts for decoder-only models. Default: False.

  • **kwargs – Additional keyword arguments. These can include keyword arguments for the attribution method, for the generation process or for the attributed function. Generation arguments can be provided explicitly as a dictionary named generation_args.

Returns:

The attribution output object containing the attribution scores, step scores, optionally step-wise attributions, and general information concerning the attributed texts and the attribution process.

Return type:

FeatureAttributionOutput
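
Example: a minimal sketch of an end-to-end attribution call (model name, method and generation arguments are illustrative).

    import inseq

    model = inseq.load_model("gpt2", "saliency")
    out = model.attribute(
        "The capital of France is",
        step_scores=["probability"],            # a registered step function id
        generation_args={"max_new_tokens": 5},  # forwarded to generation
    )
    out.show()  # visualize the resulting attribution scores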

configure_interpretable_embeddings(**kwargs) None[source]

Configure the model with interpretable embeddings for gradient attribution.

This method needs to be defined for models that cannot receive embeddings directly from their forward method parameters, to allow the usage of interpretable embeddings as surrogates for feature attribution methods. Models that support precomputed embedding inputs by design can skip this method.

forward(*args, **kwargs) Float[Tensor, 'batch_size vocab_size'][source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

remove_interpretable_embeddings(**kwargs) None[source]

Removes interpretable embeddings used for gradient attribution.

If the configure_interpretable_embeddings method is defined, this method needs to be defined to allow restoring original embeddings for the model. This is required for methods using the decorator @unhooked since they require the original model capabilities.
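
Together, these two methods form a wrap/unwrap pattern around the model embeddings. They are normally invoked internally by attribution methods rather than by end users; the following sketch is purely illustrative.

    # Wrap embedding layers in interpretable embeddings, run a
    # gradient-based attribution, then restore the original layers.
    model.configure_interpretable_embeddings()
    try:
        ...  # gradient attribution over embedding inputs
    finally:
        model.remove_interpretable_embeddings()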

setup(device: str | None = None, attribution_method: str | None = None, **kwargs) None[source]

Move the model to the specified device, set it in eval mode and optionally set the default attribution method.
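
Example (illustrative arguments):

    model.setup(device="cuda:0", attribution_method="saliency")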

Framework Classes

class inseq.models.HuggingfaceModel(model: str | PreTrainedModel, attribution_method: str | None = None, tokenizer: str | PreTrainedTokenizerBase | None = None, device: str | None = None, model_kwargs: dict[str, Any] | None = {}, tokenizer_kwargs: dict[str, Any] | None = {}, **kwargs)[source]

Model wrapper for any ForCausalLM or ForConditionalGeneration model on the HuggingFace Hub, used to enable feature attribution. Corresponds to the AutoModelForCausalLM and AutoModelForSeq2SeqLM auto classes.

_autoclass

The HuggingFace model class to use for initialization. Must be defined in subclasses.

Type:

Type[transformers.AutoModel]

model

the model on which attribution is performed.

Type:

transformers.AutoModelForCausalLM or transformers.AutoModelForSeq2SeqLM

tokenizer

the tokenizer associated to the model.

Type:

transformers.AutoTokenizer

device

the device on which the model is run.

Type:

str

encoder_int_embeds

the interpretable embedding layer for the encoder, used for layer attribution methods in Captum.

Type:

captum.InterpretableEmbeddingBase

decoder_int_embeds

the interpretable embedding layer for the decoder, used for layer attribution methods in Captum.

Type:

captum.InterpretableEmbeddingBase

embed_scale

scale factor for embeddings.

Type:

float, optional

tokenizer_name

The name of the tokenizer on the HuggingFace Hub. Defaults to the model name.

Type:

str, optional

clean_tokens(tokens: Sequence[Sequence[str]], skip_special_tokens: bool = False, as_targets: bool = False) Sequence[Sequence[str]][source]

Cleans special characters from tokens.

Parameters:
  • tokens (OneOrMoreTokenSequences) – A list containing one or more lists of tokens.

  • skip_special_tokens (bool, optional, defaults to False) – If true, special tokens are skipped.

  • as_targets (bool, optional, defaults to False) – If true, a target tokenizer is used to clean the tokens.

Returns:

A list containing one or more lists of cleaned tokens.

Return type:

OneOrMoreTokenSequences
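
Example: a sketch with SentencePiece-style tokens; the exact cleaned output depends on the tokenizer in use.

    tokens = [["▁Hello", "▁world", "</s>"]]
    clean = model.clean_tokens(tokens, skip_special_tokens=True)
    # e.g. [["Hello", "world"]]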

abstract configure_embeddings_scale() None[source]

Configure the scale factor for embeddings.

encode(texts: str | Sequence[str], as_targets: bool = False, return_baseline: bool = False, include_eos_baseline: bool = False, add_bos_token: bool = True, add_special_tokens: bool = True) BatchEncoding[source]

Encode one or multiple texts, producing a BatchEncoding.

Parameters:
  • texts (str or list of str) – the texts to tokenize.

  • as_targets (bool, optional) – if True, the texts are encoded as generation targets, using the target-side tokenization settings of encoder-decoder models.

  • return_baseline (bool, optional) – if True, baseline token ids are returned.

Returns:

A BatchEncoding containing token ids and attention masks.

Return type:

BatchEncoding
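
Example: a sketch for an encoder-decoder model, encoding source and target texts with their respective tokenizer settings (field names follow the library's BatchEncoding and are assumed here).

    src = model.encode(["Hello world!"])
    tgt = model.encode(["Bonjour le monde !"], as_targets=True)
    print(src.input_ids.shape, src.attention_mask.shape)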

generate(inputs: str | Sequence[str] | BatchEncoding, return_generation_output: bool = False, skip_special_tokens: bool = True, output_generated_only: bool = False, **kwargs) list[str] | tuple[list[str], ModelOutput][source]

Wrapper around model.generate to handle tokenization and decoding.

Parameters:
  • inputs (Union[TextInput, BatchEncoding]) – Inputs to be provided to the model for generation.

  • return_generation_output (bool, optional, defaults to False) – If true, generation outputs are returned alongside the generated text.

  • output_generated_only (bool, optional, defaults to False) – If true, only the generated text is returned. Relevant for decoder-only models that would otherwise return the full input + output.

Returns:

Generated text or a tuple of generated text and generation outputs.

Return type:

Union[List[str], Tuple[List[str], ModelOutput]]
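
Example: illustrative calls; extra keyword arguments such as max_new_tokens are forwarded to the underlying transformers generation.

    texts = model.generate("Hello world!", max_new_tokens=10)
    texts, gen_out = model.generate(
        "Hello world!",
        return_generation_output=True,
        max_new_tokens=10,
    )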

static load(model: str | PreTrainedModel, attribution_method: str | None = None, tokenizer: str | PreTrainedTokenizerBase | None = None, device: str | None = None, model_kwargs: dict[str, Any] | None = {}, tokenizer_kwargs: dict[str, Any] | None = {}, **kwargs) HuggingfaceModel[source]

Loads a HuggingFace model and tokenizer and wraps them in the appropriate AttributionModel.
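
Example: an illustrative call, equivalent to inseq.load_model; model_kwargs and tokenizer_kwargs are forwarded to the underlying from_pretrained calls.

    from inseq.models import HuggingfaceModel

    model = HuggingfaceModel.load(
        "gpt2",
        attribution_method="saliency",
        model_kwargs={"output_attentions": True},  # illustrative kwarg
    )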

Architectural Classes

class inseq.models.EncoderDecoderAttributionModel(**kwargs)[source]

AttributionModel class for attributing encoder-decoder models.

forward(*args, **kwargs) Float[Tensor, 'batch_size vocab_size'][source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class inseq.models.DecoderOnlyAttributionModel(**kwargs)[source]

AttributionModel class for attributing decoder-only models.

forward(*args, **kwargs) Float[Tensor, 'batch_size vocab_size'][source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

Final Classes

class inseq.models.HuggingfaceEncoderDecoderModel(model: str | PreTrainedModel, attribution_method: str | None = None, tokenizer: str | PreTrainedTokenizerBase | None = None, device: str | None = None, model_kwargs: dict[str, Any] | None = {}, tokenizer_kwargs: dict[str, Any] | None = {}, **kwargs)[source]

Model wrapper for any ForConditionalGeneration model on the HuggingFace Hub used to enable feature attribution. Corresponds to the AutoModelForSeq2SeqLM auto class in HF transformers.

model

the model on which attribution is performed.

Type:

transformers.AutoModelForSeq2SeqLM

configure_embeddings_scale()[source]

Configure the scale factor for embeddings.

class inseq.models.HuggingfaceDecoderOnlyModel(model: str | PreTrainedModel, attribution_method: str | None = None, tokenizer: str | PreTrainedTokenizerBase | None = None, device: str | None = None, model_kwargs: dict[str, Any] | None = {}, tokenizer_kwargs: dict[str, Any] | None = {}, **kwargs)[source]

Model wrapper for any ForCausalLM or LMHead model on the HuggingFace Hub used to enable feature attribution. Corresponds to the AutoModelForCausalLM auto class in HF transformers.

model

the model on which attribution is performed.

Type:

transformers.AutoModelForCausalLM

configure_embeddings_scale()[source]

Configure the scale factor for embeddings.
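
These final classes are usually not instantiated directly. A sketch of how inseq.load_model dispatches to them based on the model architecture (model names are illustrative):

    import inseq

    seq2seq = inseq.load_model("Helsinki-NLP/opus-mt-en-fr", "saliency")
    causal = inseq.load_model("gpt2", "saliency")
    print(type(seq2seq).__name__)  # HuggingfaceEncoderDecoderModel
    print(type(causal).__name__)   # HuggingfaceDecoderOnlyModel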