Attribution Methods

class inseq.attr.FeatureAttribution(attribution_model: AttributionModel, hook_to_model: bool = True, **kwargs)[source]

Abstract registry for feature attribution methods.

attr

Attribute of child classes that will act as lookup name for the registry.

Type:

str

ignore_extra_args

Arguments used by default in the attribute step and thus ignored as extra arguments during attribution. The selection of defaults follows the Captum naming convention.

Type:

list of str

attribute(batch: DecoderOnlyBatch | EncoderDecoderBatch, attributed_fn: Callable[[...], Float32[Tensor, 'batch_size']], attr_pos_start: int | None = None, attr_pos_end: int | None = None, show_progress: bool = True, pretty_progress: bool = True, output_step_attributions: bool = False, attribute_target: bool = False, step_scores: list[str] = [], attribution_args: dict[str, Any] = {}, attributed_fn_args: dict[str, Any] = {}, step_scores_args: dict[str, Any] = {}) FeatureAttributionOutput[source]

Performs the feature attribution procedure using the specified attribution method.

Parameters:
  • batch (EncoderDecoderBatch or DecoderOnlyBatch) – The batch of sequences to attribute.

  • attributed_fn (Callable[..., SingleScorePerStepTensor]) – The function of model outputs representing what should be attributed (e.g. output probabilities of the model's best prediction after softmax). It must be a function that takes multiple keyword arguments and returns a tensor of size (batch_size,). If not provided, the default attributed function for the model will be used.

  • attr_pos_start (int, optional) – The initial position for performing sequence attribution. Defaults to 1 (0 is the default BOS token).

  • attr_pos_end (int, optional) – The final position for performing sequence attribution. Defaults to None (full string).

  • show_progress (bool, optional) – Whether to show a progress bar. Defaults to True.

  • pretty_progress (bool, optional) – Whether to use a pretty progress bar. Defaults to True.

  • output_step_attributions (bool, optional) – Whether to output a list of FeatureAttributionStepOutput objects for each step. Defaults to False.

  • attribute_target (bool, optional) – Whether to include target prefix for feature attribution. Defaults to False.

  • step_scores (list of str) – List of identifiers for step scores that need to be computed during attribution. The available step scores are defined in inseq.attr.feat.STEP_SCORES_MAP and new step scores can be added by using the register_step_function() function.

  • attribution_args (dict, optional) – Additional arguments to pass to the attribution method.

  • attributed_fn_args (dict, optional) – Additional arguments to pass to the attributed function.

  • step_scores_args (dict, optional) – Additional arguments to pass to the step scores function.

Returns:

An object containing a list of sequence attributions, with an optional list of single-step FeatureAttributionStepOutput objects and extra information regarding the attribution parameters.

Return type:

FeatureAttributionOutput
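
A minimal usage sketch through the model-level attribute() wrapper (the model identifier and inputs are illustrative, and "probability" is a step score identifier assumed to be available in inseq.attr.feat.STEP_SCORES_MAP):

import inseq

# Load a translation model with the saliency attribution method.
model = inseq.load_model("Helsinki-NLP/opus-mt-en-fr", "saliency")

# Attribute source (and optionally target) tokens for the generated
# translation, additionally computing per-step prediction probabilities.
out = model.attribute(
    "Hello everyone, hope you're enjoying the tutorial!",
    attribute_target=True,
    step_scores=["probability"],
    show_progress=False,
)
out.show()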

attribute_step(attribute_fn_main_args: dict[str, Any], attribution_args: dict[str, Any] = {}) FeatureAttributionStepOutput[source]

Performs a single attribution step for the specified attribution arguments.

Parameters:
  • attribute_fn_main_args (dict) – Main arguments used for the attribution method. These are built from model inputs at the current step of the feature attribution process.

  • attribution_args (dict, optional) – Additional arguments to pass to the attribution method. These can be specified by the user while calling the top level attribute methods. Defaults to {}.

Returns:

A dataclass containing a tensor of source attributions of size (batch_size, source_length). At this point the batch information is empty, and will later be filled by the enrich_step_output function.

Return type:

FeatureAttributionStepOutput

filtered_attribute_step(batch: DecoderOnlyBatch | EncoderDecoderBatch, target_ids: Int[Tensor, 'batch_size 1'], attributed_fn: Callable[[...], Float32[Tensor, 'batch_size']], target_attention_mask: Int[Tensor, 'batch_size 1'] | None = None, attribute_target: bool = False, step_scores: list[str] = [], attribution_args: dict[str, Any] = {}, attributed_fn_args: dict[str, Any] = {}, step_scores_args: dict[str, Any] = {}) FeatureAttributionStepOutput[source]

Performs a single attribution step for all the sequences in the batch that still have valid target_ids, as identified by the target_attention_mask. Finished sentences are temporarily filtered out to make the attribution step faster and then reinserted before returning.

Parameters:
  • batch (EncoderDecoderBatch or DecoderOnlyBatch) – The batch of sequences to attribute.

  • target_ids (torch.Tensor) – Target token ids of size (batch_size, 1) corresponding to tokens for which the attribution step must be performed.

  • attributed_fn (Callable[..., SingleScorePerStepTensor]) – The function of model outputs representing what should be attributed (e.g. output probabilities of the model's best prediction after softmax). The parameter must be a function that takes multiple keyword arguments and returns a tensor of size (batch_size,). If not provided, the default attributed function for the model will be used (change attribution_model.default_attributed_fn_id).

  • target_attention_mask (torch.Tensor, optional) – Boolean attention mask of size (batch_size, 1) specifying which target_ids are valid for attribution and which are padding.

  • attribute_target (bool, optional) – Whether to include target prefix for feature attribution. Defaults to False.

  • step_scores (list of str) – List of identifiers for step scores that need to be computed during attribution. The available step scores are defined in inseq.attr.feat.STEP_SCORES_MAP and new step scores can be added by using the register_step_function() function.

  • attribution_args (dict, optional) – Additional arguments to pass to the attribution method.

  • attributed_fn_args (dict, optional) – Additional arguments to pass to the attributed function.

  • step_scores_args (dict, optional) – Additional arguments to pass to the step scores functions.

Returns:

A dataclass containing source attributions of size (batch_size, source_length) and, if attribute_target=True, target attributions of size (batch_size, prefix_length), plus batch information and any requested step score.

Return type:

FeatureAttributionStepOutput

hook(**kwargs) None[source]

Hooks the attribution method to the model. Useful for implementing pre-attribution logic (e.g. freezing layers, replacing embeddings, raising warnings).

classmethod load(method_name: str, attribution_model: AttributionModel | None = None, model_name_or_path: str | None = None, **kwargs) FeatureAttribution[source]

Load the selected method and hook it to an existing or available attribution model.

Parameters:
  • method_name (str) – The name of the attribution method to load.

  • attribution_model (AttributionModel, optional) – An instance of an AttributionModel child class. If not provided, the method will try to load the model from the model_name_or_path argument. Defaults to None.

  • model_name_or_path (ModelIdentifier, optional) – The name of the model to load or its path on disk. If not provided, an instantiated model must be passed via attribution_model. If the model is loaded in this way, it will be created with default arguments. Defaults to None.

  • **kwargs – Additional arguments to pass to the attribution method __init__ function.

Raises:
  • RuntimeError – Raised if both or neither model_name_or_path and attribution_model are provided.

  • UnknownAttributionMethodError – Raised if the method_name is not found in the registry.

Returns:

The loaded attribution method.

Return type:

FeatureAttribution
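
A sketch of the two loading modes (model and method identifiers are illustrative):

import inseq
from inseq.attr import FeatureAttribution

# Option 1: hook the method to an already-instantiated attribution model.
model = inseq.load_model("gpt2", "saliency")
ig = FeatureAttribution.load("integrated_gradients", attribution_model=model)

# Option 2: let load() instantiate the model from its identifier
# (the model is created with default arguments in this case).
sal = FeatureAttribution.load("saliency", model_name_or_path="gpt2")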

prepare_and_attribute(sources: str | Sequence[str] | BatchEncoding | Batch, targets: str | Sequence[str] | BatchEncoding | Batch, attr_pos_start: int | None = None, attr_pos_end: int | None = None, show_progress: bool = True, pretty_progress: bool = True, output_step_attributions: bool = False, attribute_target: bool = False, step_scores: list[str] = [], include_eos_baseline: bool = False, attributed_fn: str | Callable[[...], Float32[Tensor, 'batch_size']] | None = None, attribution_args: dict[str, Any] = {}, attributed_fn_args: dict[str, Any] = {}, step_scores_args: dict[str, Any] = {}) FeatureAttributionOutput[source]

Prepares inputs and performs attribution.

Wraps the attribution method's attribute() method together with the prepare_inputs_for_attribution() method.

Parameters:
  • sources (FeatureAttributionInput) – The sources provided to the prepare() method.

  • targets (FeatureAttributionInput) – The targets provided to the prepare() method.

  • attr_pos_start (int, optional) – The initial position for performing sequence attribution. Defaults to 0.

  • attr_pos_end (int, optional) – The final position for performing sequence attribution. Defaults to None (full string).

  • show_progress (bool, optional) – Whether to show a progress bar. Defaults to True.

  • pretty_progress (bool, optional) – Whether to use a pretty progress bar. Defaults to True.

  • output_step_attributions (bool, optional) – Whether to output a list of FeatureAttributionStepOutput objects for each step. Defaults to False.

  • attribute_target (bool, optional) – Whether to include target prefix for feature attribution. Defaults to False.

  • step_scores (list of str) – List of identifiers for step scores that need to be computed during attribution. The available step scores are defined in inseq.attr.feat.STEP_SCORES_MAP and new step scores can be added by using the register_step_function() function.

  • include_eos_baseline (bool, optional) – Whether to include the EOS token in the baseline for attribution. By default the EOS token is not used for attribution. Defaults to False.

  • attributed_fn (str or Callable[..., SingleScorePerStepTensor], optional) – The identifier or function of model outputs representing what should be attributed (e.g. output probabilities of the model's best prediction after softmax). If a string is provided, it must be the identifier of a registered step function. Otherwise, it must be a function that takes multiple keyword arguments and returns a tensor of size (batch_size,). If not provided, the default attributed function for the model will be used (change attribution_model.default_attributed_fn_id).

  • attribution_args (dict, optional) – Additional arguments to pass to the attribution method.

  • attributed_fn_args (dict, optional) – Additional arguments to pass to the attributed function.

  • step_scores_args (dict, optional) – Additional arguments to pass to the step scores functions.

Returns:

An object containing a list of sequence attributions, with an optional list of single-step FeatureAttributionStepOutput objects and extra information regarding the attribution parameters.

Return type:

FeatureAttributionOutput
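
A sketch of selecting the attributed function by identifier through the model-level call that wraps this method ("probability" is the model default, passed explicitly for illustration):

import inseq

model = inseq.load_model("gpt2", "input_x_gradient")

# Sources and targets are prepared internally before attribution; here only
# the source prompt is given, and the target is generated by the model.
out = model.attribute(
    "The capital of France is",
    attributed_fn="probability",
    output_step_attributions=True,
)
out.show()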

unhook(**kwargs) None[source]

Unhooks the attribution method from the model. If the model was modified in any way, this should restore its initial state.

Gradient-based Attribution Methods

class inseq.attr.feat.GradientAttributionRegistry(attribution_model: AttributionModel, hook_to_model: bool = True, **kwargs)[source]

Gradient-based attribution method registry.

attribute_step(attribute_fn_main_args: dict[str, Any], attribution_args: dict[str, Any] = {}) GranularFeatureAttributionStepOutput[source]

Performs a single attribution step for the specified attribution arguments.

Parameters:
  • attribute_fn_main_args (dict) – Main arguments used for the attribution method. These are built from model inputs at the current step of the feature attribution process.

  • attribution_args (dict, optional) – Additional arguments to pass to the attribution method. These can be specified by the user while calling the top level attribute methods. Defaults to {}.

Returns:

A dataclass containing a tensor of source attributions of size (batch_size, source_length), possibly a tensor of target attributions of size (batch_size, prefix_length) if attribute_target=True, and possibly a tensor of deltas of size (batch_size,) if the attribution step supports deltas and they are requested. At this point the batch information is empty, and will later be filled by the enrich_step_output function.

Return type:

GranularFeatureAttributionStepOutput

hook(**kwargs)[source]

Hooks the attribution method to the model by replacing normal nn.Embedding with Captum’s InterpretableEmbeddingBase.

unhook(**kwargs)[source]

Unhook the attribution method by restoring the model’s original embeddings.

class inseq.attr.feat.DeepLiftAttribution(attribution_model, multiply_by_inputs: bool = True, **kwargs)[source]

DeepLIFT attribution method.

Reference implementation: https://captum.ai/api/deep_lift.html.

class inseq.attr.feat.DiscretizedIntegratedGradientsAttribution(attribution_model, multiply_by_inputs: bool = False, **kwargs)[source]

Discretized Integrated Gradients attribution method.

Reference: https://arxiv.org/abs/2108.13654

Original implementation: https://github.com/INK-USC/DIG

hook(**kwargs)[source]

Hooks the attribution method to the model by replacing normal nn.Embedding with Captum’s InterpretableEmbeddingBase.

class inseq.attr.feat.GradientShapAttribution(attribution_model, multiply_by_inputs: bool = True, **kwargs)[source]

GradientSHAP attribution method.

Reference implementation: https://captum.ai/api/gradient_shap.html.

class inseq.attr.feat.IntegratedGradientsAttribution(attribution_model, multiply_by_inputs: bool = True, **kwargs)[source]

Integrated Gradients attribution method.

Reference implementation: https://captum.ai/api/integrated_gradients.html.

class inseq.attr.feat.InputXGradientAttribution(attribution_model)[source]

Input x Gradient attribution method.

Reference implementation: https://captum.ai/api/input_x_gradient.html.

class inseq.attr.feat.SaliencyAttribution(attribution_model)[source]

Saliency attribution method.

Reference implementation: https://captum.ai/api/saliency.html.

class inseq.attr.feat.SequentialIntegratedGradientsAttribution(attribution_model, multiply_by_inputs: bool = True, **kwargs)[source]

Sequential Integrated Gradients attribution method.

Reference: https://aclanthology.org/2023.findings-acl.477/

Original implementation: https://github.com/josephenguehard/time_interpret/blob/main/tint/attr/seq_ig.py
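
As a sketch, any of the gradient-based methods above can be selected by its registry identifier; method-specific arguments (here Captum's n_steps for the Integrated Gradients approximation) are assumed to be forwarded to the underlying attribution call:

import inseq

model = inseq.load_model("gpt2", "integrated_gradients")

# n_steps controls the number of interpolation steps used by the method
# (assumed to be routed to the Captum backend as an attribution argument).
out = model.attribute("The capital of France is", n_steps=100)
out.show()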

Layer Attribution Methods

class inseq.attr.feat.LayerIntegratedGradientsAttribution(attribution_model, multiply_by_inputs: bool = True, **kwargs)[source]

Layer Integrated Gradients attribution method.

Reference implementation: https://captum.ai/api/layer.html#layer-integrated-gradients.

class inseq.attr.feat.LayerGradientXActivationAttribution(attribution_model, multiply_by_inputs: bool = True, **kwargs)[source]

Layer Gradient x Activation attribution method.

Reference implementation: https://captum.ai/api/layer.html#layer-gradient-x-activation.

class inseq.attr.feat.LayerDeepLiftAttribution(attribution_model, multiply_by_inputs: bool = True, **kwargs)[source]

Layer DeepLIFT attribution method.

Reference implementation: https://captum.ai/api/layer.html#layer-deeplift.
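
A usage sketch for layer methods (the identifier is illustrative; layer methods compute attributions at the level of a model layer rather than the input embeddings):

import inseq

# Layer Gradient x Activation scores a layer's activations multiplied
# by their gradients.
model = inseq.load_model("gpt2", "layer_gradient_x_activation")
out = model.attribute("The capital of France is")
out.show()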

Internals-based Attribution Methods

class inseq.attr.feat.InternalsAttributionRegistry(attribution_model: AttributionModel, hook_to_model: bool = True, **kwargs)[source]

Model Internals-based attribution method registry.

class inseq.attr.feat.AttentionWeightsAttribution(attribution_model, **kwargs)[source]

The basic attention attribution method, which retrieves the attention weights from the model.

class AttentionWeights(forward_func: AttributionModel)[source]
attribute(inputs: TensorOrTupleOfTensorsGeneric, additional_forward_args: TensorOrTupleOfTensorsGeneric, encoder_self_attentions: Float[Tensor, 'batch_size n_layers n_units seq_len seq_len'] | None = None, decoder_self_attentions: Float[Tensor, 'batch_size n_layers n_units seq_len seq_len'] | None = None, cross_attentions: Float[Tensor, 'batch_size n_layers n_units seq_len seq_len'] | None = None) MultiDimensionalFeatureAttributionStepOutput[source]

Extracts the attention weights from the model.

Parameters:
  • inputs (TensorOrTupleOfTensorsGeneric) – Tensor or tuple of tensors that are inputs to the model. Used to match standard Captum API, and to determine whether both source and target are being attributed.

  • additional_forward_args (TensorOrTupleOfTensorsGeneric) – Tensor or tuple of tensors that are additional arguments to the model. Unused, but included to match standard Captum API.

  • encoder_self_attentions (torch.Tensor, optional, defaults to None) – Tensor of encoder self-attention weights of the forward pass with shape (batch_size, n_layers, n_heads, source_seq_len, source_seq_len).

  • decoder_self_attentions (torch.Tensor, optional, defaults to None) – Tensor of decoder self-attention weights of the forward pass with shape (batch_size, n_layers, n_heads, target_seq_len, target_seq_len).

  • cross_attentions (torch.Tensor, optional, defaults to None) – Tensor of cross-attention weights computed during the forward pass with shape (batch_size, n_layers, n_heads, source_seq_len, target_seq_len).

Returns:

A step output containing attention weights for each layer and head, with shape (batch_size, seq_len, n_layers, n_heads).

Return type:

MultiDimensionalFeatureAttributionStepOutput

compute_convergence_delta: Callable

Attribution algorithms that derive from the Attribution class and provide a convergence delta (aka approximation error) should implement this method. The convergence delta can be computed based on certain properties of the attribution algorithm.

Parameters:
  • attributions (Tensor or tuple[Tensor, ...]) – Attribution scores that are precomputed by an attribution algorithm. Attributions can be provided in form of a single tensor or a tuple of those. It is assumed that attribution tensor’s dimension 0 corresponds to the number of examples, and if multiple input tensors are provided, the examples must be aligned appropriately.

  • *args (Any, optional) – Additional arguments that are used by the sub-classes depending on the specific implementation of compute_convergence_delta.

Returns:

  • deltas (Tensor):

    Depending on the specific implementation of the sub-classes, the convergence delta can be returned per sample as a tensor, or aggregated across multiple samples and returned as a single floating point tensor.

Return type:

Tensor of deltas

static has_convergence_delta() bool[source]

This method informs the user whether the attribution algorithm provides a convergence delta (aka an approximation error) or not. The convergence delta may serve as a proxy for the correctness of the attribution algorithm's approximation. If a deriving attribution class provides a compute_convergence_delta method, it should override both compute_convergence_delta and has_convergence_delta.

Returns:

Returns whether the attribution algorithm provides a convergence delta (aka approximation error) or not.

Return type:

bool

attribute_step(attribute_fn_main_args: dict[str, Any], attribution_args: dict[str, Any]) MultiDimensionalFeatureAttributionStepOutput[source]

Performs a single attribution step for the specified attribution arguments.

Parameters:
  • attribute_fn_main_args (dict) – Main arguments used for the attribution method. These are built from model inputs at the current step of the feature attribution process.

  • attribution_args (dict, optional) – Additional arguments to pass to the attribution method. These can be specified by the user while calling the top level attribute methods. Defaults to {}.

Returns:

A dataclass containing a tensor of source attributions of size (batch_size, source_length). At this point the batch information is empty, and will later be filled by the enrich_step_output function.

Return type:

MultiDimensionalFeatureAttributionStepOutput
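
A minimal sketch using the attention weights method (registered under the "attention" identifier):

import inseq

model = inseq.load_model("Helsinki-NLP/opus-mt-en-fr", "attention")
out = model.attribute("Hello world!")
# Attention scores carry extra layer and head dimensions; show() applies
# the default aggregation across them before visualization.
out.show()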

Perturbation-based Attribution Methods

class inseq.attr.feat.PerturbationAttributionRegistry(attribution_model: AttributionModel, hook_to_model: bool = True, **kwargs)[source]

Perturbation-based attribution method registry.

class inseq.attr.feat.OcclusionAttribution(attribution_model)[source]

Occlusion-based attribution method. Reference implementation: https://captum.ai/api/occlusion.html.

Usage in other implementations:
  • niuzaisheng/AttExplainer
  • andrewPoulton/explainable-asag
  • copenlu/xai-benchmark
  • DFKI-NLP/thermostat

attribute_step(attribute_fn_main_args: dict[str, Any], attribution_args: dict[str, Any] = {}) CoarseFeatureAttributionStepOutput[source]

The sliding window shape is defined as a tuple: the first entry is between 1 and the length of the input, while the second entry is given by the embedding dimension of the underlying model. If not explicitly given via attribution_args, the default is (1, embedding_dim).
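
A usage sketch with the default sliding window, which occludes one token embedding at a time:

import inseq

model = inseq.load_model("gpt2", "occlusion")
# With no explicit sliding_window_shapes, the default (1, embedding_dim)
# window is used, producing coarse per-token attribution scores.
out = model.attribute("The capital of France is")
out.show()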

class inseq.attr.feat.LimeAttribution(attribution_model, **kwargs)[source]

LIME-based attribution method. Reference implementations: https://captum.ai/api/lime.html, https://github.com/DFKI-NLP/thermostat/ and https://github.com/copenlu/ALPS_2021.

The main part of the code is in the Lime class of ops/lime.py.

attribute_step(attribute_fn_main_args: dict[str, Any], attribution_args: dict[str, Any] = {}) GranularFeatureAttributionStepOutput[source]

Performs a single attribution step for the specified attribution arguments.

Parameters:
  • attribute_fn_main_args (dict) – Main arguments used for the attribution method. These are built from model inputs at the current step of the feature attribution process.

  • attribution_args (dict, optional) – Additional arguments to pass to the attribution method. These can be specified by the user while calling the top level attribute methods. Defaults to {}.

Returns:

A dataclass containing a tensor of source attributions of size (batch_size, source_length). At this point the batch information is empty, and will later be filled by the enrich_step_output function.

Return type:

GranularFeatureAttributionStepOutput
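
A usage sketch for LIME (the n_samples argument is Captum's and is assumed to be forwarded to the underlying implementation):

import inseq

model = inseq.load_model("gpt2", "lime")
# Each step fits a local surrogate model on perturbed copies of the input;
# n_samples controls how many perturbations are drawn.
out = model.attribute("The capital of France is", n_samples=50)
out.show()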

class inseq.attr.feat.ValueZeroingAttribution(attribution_model, **kwargs)[source]

Value Zeroing method for feature attribution.

Introduced by Mohebbi et al. (2023) to quantify context mixing in Transformer models. The method is based on the observation that context mixing is regulated by the value vectors of the attention mechanism. The method consists of two steps:

  1. Zeroing the value vectors of the attention mechanism for a given token index at a given layer of the model.

  2. Computing the similarity between hidden states produced with and without the zeroing operation, and using it as a measure of context mixing for the given token at the given layer.

The method is converted into a feature attribution method by allowing for extraction of value zeroing scores at specific layers, or by aggregating them across layers.

Reference implementations:
  • Original implementation: hmohebbi/ValueZeroing
  • Encoder-decoder implementation: hmohebbi/ContextMixingASR

Parameters:
  • similarity_metric (str, optional) – The similarity metric to use for computing the distance between hidden states produced with and without the zeroing operation. Options: cosine, euclidean. Default: cosine.

  • encoder_zeroed_units_indices (Union[int, tuple[int, int], list[int], dict], optional) –

    The indices of the attention heads that should be zeroed to compute corrupted states in the encoder self-attention module. Not used for decoder-only models, or if output_encoder_self_scores is False. Format:

    • None: all attention heads across all layers are zeroed.

    • int: the same attention head is zeroed across all layers.

    • tuple of two integers: the attention heads in the range are zeroed across all layers.

    • list of integers: the attention heads in the list are zeroed across all layers.

    • dictionary: the keys are the layer indices and the values are the zeroed attention heads for the corresponding layer.

    Default: None (all heads are zeroed for every encoder layer).

  • decoder_zeroed_units_indices (Union[int, tuple[int, int], list[int], dict], optional) – Same as encoder_zeroed_units_indices but for the decoder self-attention module. Not used if output_decoder_self_scores is False. Default: None (all heads are zeroed for every decoder layer).

  • cross_zeroed_units_indices (Union[int, tuple[int, int], list[int], dict], optional) – Same as encoder_zeroed_units_indices but for the cross-attention module in encoder-decoder models. Not used if the model is decoder-only. Default: None (all heads are zeroed for every layer).

  • output_decoder_self_scores (bool, optional) – Whether to produce scores derived from zeroing the decoder self-attention value vectors in encoder-decoder models. Cannot be False for decoder-only models, or if target-side attribution is requested using attribute_target=True. Default: True.

  • output_encoder_self_scores (bool, optional) – Whether to produce scores derived from zeroing the encoder self-attention value vectors in encoder-decoder models. Default: True.

Returns:

The final shape returned by the method is (attributed_seq_len, generated_seq_len, num_layers). If output_decoder_self_scores and output_encoder_self_scores are True, the respective scores are returned in the sequence_scores output dictionary.

Return type:

MultiDimensionalFeatureAttributionStepOutput

attribute_step(attribute_fn_main_args: dict[str, Any], attribution_args: dict[str, Any] = {}) MultiDimensionalFeatureAttributionStepOutput[source]

Performs a single attribution step for the specified attribution arguments.

Parameters:
  • attribute_fn_main_args (dict) – Main arguments used for the attribution method. These are built from model inputs at the current step of the feature attribution process.

  • attribution_args (dict, optional) – Additional arguments to pass to the attribution method. These can be specified by the user while calling the top level attribute methods. Defaults to {}.

Returns:

A dataclass containing a tensor of source attributions of size (batch_size, source_length). At this point the batch information is empty, and will later be filled by the enrich_step_output function.

Return type:

MultiDimensionalFeatureAttributionStepOutput
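
A usage sketch for Value Zeroing on an encoder-decoder model:

import inseq

model = inseq.load_model("Helsinki-NLP/opus-mt-en-fr", "value_zeroing")
out = model.attribute("Hello world!")
# Scores include a layer dimension; show() applies the default aggregation
# across layers before visualization.
out.show()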

class inseq.attr.feat.ReagentAttribution(attribution_model: HuggingfaceModel, keep_top_n: int = 5, keep_ratio: float = None, invert_keep: bool = False, stopping_condition_top_k: int = 3, replacing_ratio: float = 0.3, max_probe_steps: int = 3000, num_probes: int = 16)[source]

Recursive attribution generator (ReAGent) method.

Measures importance as the drop in prediction probability produced by replacing a token with a plausible alternative predicted by a LM.

Reference: ReAGent: A Model-agnostic Feature Attribution Method for Generative Language Models.

__init__(attribution_model: HuggingfaceModel, keep_top_n: int = 5, keep_ratio: float = None, invert_keep: bool = False, stopping_condition_top_k: int = 3, replacing_ratio: float = 0.3, max_probe_steps: int = 3000, num_probes: int = 16)[source]

ReAGent method constructor.

Parameters:
  • keep_top_n (int, optional) – If set to a value greater than 0, the top n tokens based on their importance score will be kept during the prediction inference. If set to 0, the top n will be determined by keep_ratio. Default: 5.

  • keep_ratio (float, optional) – If keep_top_n is set to 0, this specifies the proportion of tokens to keep.

  • invert_keep (bool, optional) – If specified, the top tokens selected either via keep_top_n or keep_ratio will be replaced instead of being kept. Default: False.

  • stopping_condition_top_k (int, optional) – Threshold indicating that the stopping condition is achieved when the predicted target appears within the top k predictions. Default: 3.

  • replacing_ratio (float, optional) – Ratio of tokens to replace during probing. Default: 0.3.

  • max_probe_steps (int, optional) – Max number of steps before stopping the probing. Default: 3000.

  • num_probes (int, optional) – Number of probes performed in parallel. Default: 16.

attribute_step(attribute_fn_main_args: dict[str, Any], attribution_args: dict[str, Any] = {}) GranularFeatureAttributionStepOutput[source]

Performs a single attribution step for the specified attribution arguments.

Parameters:
  • attribute_fn_main_args (dict) – Main arguments used for the attribution method. These are built from model inputs at the current step of the feature attribution process.

  • attribution_args (dict, optional) – Additional arguments to pass to the attribution method. These can be specified by the user while calling the top level attribute methods. Defaults to {}.

Returns:

A dataclass containing a tensor of source attributions of size (batch_size, source_length). At this point the batch information is empty, and will later be filled by the enrich_step_output function.

Return type:

GranularFeatureAttributionStepOutput

Example:

import inseq

# Load GPT-2 with the ReAGent attribution method and its constructor arguments.
model = inseq.load_model(
    "gpt2-medium",
    "reagent",
    keep_top_n=5,
    stopping_condition_top_k=3,
    replacing_ratio=0.3,
    max_probe_steps=3000,
    num_probes=8,
)

# Attribute the model's generation for the given prompt and visualize scores.
out = model.attribute("Super Mario Land is a game that developed by")
out.show()