Attribution Methods
- class inseq.attr.FeatureAttribution(attribution_model: AttributionModel, hook_to_model: bool = True, **kwargs)[source]
Abstract registry for feature attribution methods.
- ignore_extra_args
Arguments used by default in the attribute step and thus ignored as extra arguments during attribution. The selection of defaults follows the Captum naming convention.
- Type:
list of str
- attribute(batch: DecoderOnlyBatch | EncoderDecoderBatch, attributed_fn: Callable[[...], Float32[Tensor, 'batch_size']], attr_pos_start: int | None = None, attr_pos_end: int | None = None, show_progress: bool = True, pretty_progress: bool = True, output_step_attributions: bool = False, attribute_target: bool = False, step_scores: list[str] = [], attribution_args: dict[str, Any] = {}, attributed_fn_args: dict[str, Any] = {}, step_scores_args: dict[str, Any] = {}) FeatureAttributionOutput [source]
Performs the feature attribution procedure using the specified attribution method.
- Parameters:
batch (EncoderDecoderBatch or DecoderOnlyBatch) – The batch of sequences to attribute.
attributed_fn (Callable[..., SingleScorePerStepTensor]) – The function of model outputs representing what should be attributed (e.g. output probabilities of the model's best prediction after softmax). It must be a function that takes multiple keyword arguments and returns a tensor of size (batch_size,). If not provided, the default attributed function for the model will be used.
attr_pos_start (int, optional) – The initial position for performing sequence attribution. Defaults to 1 (0 is the default BOS token).
attr_pos_end (int, optional) – The final position for performing sequence attribution. Defaults to None (full string).
show_progress (bool, optional) – Whether to show a progress bar. Defaults to True.
pretty_progress (bool, optional) – Whether to use a pretty progress bar. Defaults to True.
output_step_attributions (bool, optional) – Whether to output a list of FeatureAttributionStepOutput objects for each step. Defaults to False.
attribute_target (bool, optional) – Whether to include target prefix for feature attribution. Defaults to False.
step_scores (list of str) – List of identifiers for step scores that need to be computed during attribution. The available step scores are defined in inseq.attr.feat.STEP_SCORES_MAP and new step scores can be added by using the register_step_function() function.
attribution_args (dict, optional) – Additional arguments to pass to the attribution method.
attributed_fn_args (dict, optional) – Additional arguments to pass to the attributed function.
step_scores_args (dict, optional) – Additional arguments to pass to the step scores function.
- Returns:
An object containing a list of sequence attributions, with an optional added list of single FeatureAttributionStepOutput for each step and extra information regarding the attribution parameters.
- Return type:
FeatureAttributionOutput
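The contract for attributed_fn and the position range can be illustrated with a small pure-Python sketch. It is not the library's implementation: plain lists stand in for tensors, the names target_probability and score_steps are hypothetical, and treating a missing start as 0 and a missing end as the full sequence is an assumption made for this example.

```python
from typing import Callable, Optional


def target_probability(*, step_probs: list[dict[str, float]],
                       target_tokens: list[str], **_unused) -> list[float]:
    """Toy attributed function: probability of each sequence's target token."""
    return [probs.get(tok, 0.0) for probs, tok in zip(step_probs, target_tokens)]


def score_steps(attributed_fn: Callable[..., list[float]],
                all_step_probs: list[list[dict[str, float]]],
                all_targets: list[list[str]],
                attr_pos_start: Optional[int] = None,
                attr_pos_end: Optional[int] = None) -> list[list[float]]:
    """Call attributed_fn once per generation step in [start, end)."""
    # Assumption for this sketch: None means "from step 0" / "to the last step".
    start = 0 if attr_pos_start is None else attr_pos_start
    end = len(all_targets) if attr_pos_end is None else attr_pos_end
    scores = []
    for pos in range(start, end):
        # Only keyword arguments are passed; unused ones are swallowed by **_unused.
        step = attributed_fn(step_probs=all_step_probs[pos],
                             target_tokens=all_targets[pos],
                             position=pos)
        if len(step) != len(all_targets[pos]):
            raise ValueError("attributed_fn must return one score per sequence")
        scores.append(step)
    return scores


# Batch of two sequences, three generation steps.
targets = [["the", "a"], ["cat", "dog"], ["sat", "ran"]]
probs = [
    [{"the": 0.6}, {"a": 0.5}],
    [{"cat": 0.4}, {"dog": 0.3}],
    [{"sat": 0.2}, {}],
]
scores = score_steps(target_probability, probs, targets, attr_pos_start=1)
```

Each inner list holds one score per sequence in the batch, which mirrors the (batch_size,) output shape required above.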
- attribute_step(attribute_fn_main_args: dict[str, Any], attribution_args: dict[str, Any] = {}) FeatureAttributionStepOutput [source]
Performs a single attribution step for the specified attribution arguments.
- Parameters:
attribute_fn_main_args (dict) – Main arguments used for the attribution method. These are built from model inputs at the current step of the feature attribution process.
attribution_args (dict, optional) – Additional arguments to pass to the attribution method. These can be specified by the user while calling the top level attribute methods. Defaults to {}.
- Returns:
A dataclass containing a tensor of source attributions of size (batch_size, source_length). At this point the batch information is empty, and will later be filled by the enrich_step_output function.
- Return type:
FeatureAttributionStepOutput
- filtered_attribute_step(batch: DecoderOnlyBatch | EncoderDecoderBatch, target_ids: Int[Tensor, 'batch_size 1'], attributed_fn: Callable[[...], Float32[Tensor, 'batch_size']], target_attention_mask: Int[Tensor, 'batch_size 1'] | None = None, attribute_target: bool = False, step_scores: list[str] = [], attribution_args: dict[str, Any] = {}, attributed_fn_args: dict[str, Any] = {}, step_scores_args: dict[str, Any] = {}) FeatureAttributionStepOutput [source]
Performs a single attribution step for all the sequences in the batch that still have valid target_ids, as identified by the target_attention_mask. Finished sentences are temporarily filtered out to make the attribution step faster and then reinserted before returning.
- Parameters:
batch (EncoderDecoderBatch or DecoderOnlyBatch) – The batch of sequences to attribute.
target_ids (torch.Tensor) – Target token ids of size (batch_size, 1) corresponding to tokens for which the attribution step must be performed.
attributed_fn (Callable[..., SingleScorePerStepTensor]) – The function of model outputs representing what should be attributed (e.g. output probabilities of the model's best prediction after softmax). It must be a function that takes multiple keyword arguments and returns a tensor of size (batch_size,). If not provided, the default attributed function for the model will be used (change attribution_model.default_attributed_fn_id).
target_attention_mask (torch.Tensor, optional) – Boolean attention mask of size (batch_size, 1) specifying which target_ids are valid for attribution and which are padding.
attribute_target (bool, optional) – Whether to include target prefix for feature attribution. Defaults to False.
step_scores (list of str) – List of identifiers for step scores that need to be computed during attribution. The available step scores are defined in inseq.attr.feat.STEP_SCORES_MAP and new step scores can be added by using the register_step_function() function.
attribution_args (dict, optional) – Additional arguments to pass to the attribution method.
attributed_fn_args (dict, optional) – Additional arguments to pass to the attributed function.
step_scores_args (dict, optional) – Additional arguments to pass to the step scores functions.
- Returns:
A dataclass containing attribution tensors for source and target attributions, of size (batch_size, source_length) and (batch_size, prefix length) respectively (target attributions are included only if attribute_target=True), plus batch information and any step score present.
- Return type:
FeatureAttributionStepOutput
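The filter-then-reinsert behaviour described for filtered_attribute_step can be sketched as follows. This is a simplified illustration, not the library's code: lists replace tensors, filtered_step and toy_step are hypothetical names, and the value used to fill rows of finished sequences (0.0 here) is an assumption.

```python
from typing import Callable


def filtered_step(target_ids: list[int],
                  target_attention_mask: list[int],
                  step_fn: Callable[[list[int]], list[float]],
                  fill_value: float = 0.0) -> list[float]:
    """Run step_fn only on rows with a valid target, then restore batch order."""
    active = [i for i, keep in enumerate(target_attention_mask) if keep]
    out = [fill_value] * len(target_ids)
    if not active:
        return out  # every sequence is finished, nothing to attribute
    # The expensive step only sees the sequences that are still being generated.
    active_scores = step_fn([target_ids[i] for i in active])
    # Scatter the results back to their original batch positions.
    for i, score in zip(active, active_scores):
        out[i] = score
    return out


seen_batch_sizes = []


def toy_step(ids: list[int]) -> list[float]:
    """Stand-in for an attribution step; records how many rows it received."""
    seen_batch_sizes.append(len(ids))
    return [2.0 * i for i in ids]


result = filtered_step([5, 0, 7], [1, 0, 1], toy_step)
```

The middle sequence is masked out, so the toy step runs on two rows while the returned list still has three entries.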
- hook(**kwargs) None [source]
Hooks the attribution method to the model. Useful to implement pre-attribution logic (e.g. freezing layers, replacing embeddings, raising warnings, etc.).
- classmethod load(method_name: str, attribution_model: AttributionModel | None = None, model_name_or_path: str | None = None, **kwargs) FeatureAttribution [source]
Load the selected method and hook it to an existing or available attribution model.
- Parameters:
method_name (str) – The name of the attribution method to load.
attribution_model (AttributionModel, optional) – An instance of an AttributionModel child class. If not provided, the method will try to load the model from the model_name_or_path argument. Defaults to None.
model_name_or_path (ModelIdentifier, optional) – The name of the model to load or its path on disk. If not provided, an instantiated model must be provided. If the model is loaded in this way, the model will be created with default arguments. Defaults to None.
**kwargs – Additional arguments to pass to the attribution method __init__ function.
- Raises:
RuntimeError – Raised if both or neither model_name_or_path and attribution_model are provided.
UnknownAttributionMethodError – Raised if the method_name is not found in the registry.
- Returns:
The loaded attribution method.
- Return type:
FeatureAttribution
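The loading rules above (exactly one of the two model arguments, and a registry lookup by method name) can be sketched with a toy registry. ToyMethod, ToySaliency and UnknownMethodError are hypothetical stand-ins, and the string used in place of a loaded model is a placeholder. The real registry and model loading are not reproduced here.

```python
class UnknownMethodError(ValueError):
    """Hypothetical stand-in for UnknownAttributionMethodError."""


class ToyMethod:
    """Toy registry: subclasses register themselves under method_name."""

    registry: dict[str, type] = {}
    method_name = None

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        if cls.method_name is not None:
            ToyMethod.registry[cls.method_name] = cls

    def __init__(self, model, **kwargs):
        self.model = model
        self.kwargs = kwargs

    @classmethod
    def load(cls, method_name, attribution_model=None,
             model_name_or_path=None, **kwargs):
        # Both or neither provided: the request is ambiguous or incomplete.
        if (attribution_model is None) == (model_name_or_path is None):
            raise RuntimeError(
                "Provide exactly one of attribution_model and model_name_or_path."
            )
        if method_name not in cls.registry:
            raise UnknownMethodError(method_name)
        # Placeholder for model loading; a real loader would build a model here.
        model = (attribution_model if attribution_model is not None
                 else f"loaded:{model_name_or_path}")
        return cls.registry[method_name](model, **kwargs)


class ToySaliency(ToyMethod):
    method_name = "saliency"


method = ToyMethod.load("saliency", model_name_or_path="gpt2", n_steps=10)
```

Extra keyword arguments are forwarded unchanged to the method constructor, matching the **kwargs description above.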
- prepare_and_attribute(sources: str | Sequence[str] | BatchEncoding | Batch, targets: str | Sequence[str] | BatchEncoding | Batch, attr_pos_start: int | None = None, attr_pos_end: int | None = None, show_progress: bool = True, pretty_progress: bool = True, output_step_attributions: bool = False, attribute_target: bool = False, step_scores: list[str] = [], include_eos_baseline: bool = False, attributed_fn: str | Callable[[...], Float32[Tensor, 'batch_size']] | None = None, attribution_args: dict[str, Any] = {}, attributed_fn_args: dict[str, Any] = {}, step_scores_args: dict[str, Any] = {}) FeatureAttributionOutput [source]
Prepares inputs and performs attribution.
Wraps the attribution method's attribute() method and the prepare_inputs_for_attribution() method.
- Parameters:
sources (FeatureAttributionInput) – The sources provided to the prepare() method.
targets (FeatureAttributionInput) – The targets provided to the prepare() method.
attr_pos_start (int, optional) – The initial position for performing sequence attribution. Defaults to 0.
attr_pos_end (int, optional) – The final position for performing sequence attribution. Defaults to None (full string).
show_progress (bool, optional) – Whether to show a progress bar. Defaults to True.
pretty_progress (bool, optional) – Whether to use a pretty progress bar. Defaults to True.
output_step_attributions (bool, optional) – Whether to output a list of FeatureAttributionStepOutput objects for each step. Defaults to False.
attribute_target (bool, optional) – Whether to include target prefix for feature attribution. Defaults to False.
step_scores (list of str) – List of identifiers for step scores that need to be computed during attribution. The available step scores are defined in inseq.attr.feat.STEP_SCORES_MAP and new step scores can be added by using the register_step_function() function.
include_eos_baseline (bool, optional) – Whether to include the EOS token in the baseline for attribution. By default the EOS token is not used for attribution. Defaults to False.
attributed_fn (str or Callable[..., SingleScorePerStepTensor], optional) – The identifier or function of model outputs representing what should be attributed (e.g. output probabilities of the model's best prediction after softmax). If it is a string, it must identify a valid function. Otherwise, it must be a function that takes multiple keyword arguments and returns a tensor of size (batch_size,). If not provided, the default attributed function for the model will be used (change attribution_model.default_attributed_fn_id).
attribution_args (dict, optional) – Additional arguments to pass to the attribution method.
attributed_fn_args (dict, optional) – Additional arguments to pass to the attributed function.
step_scores_args (dict, optional) – Additional arguments to pass to the step scores functions.
- Returns:
An object containing a list of sequence attributions, with an optional added list of single FeatureAttributionStepOutput for each step and extra information regarding the attribution parameters.
- Return type:
FeatureAttributionOutput
- unhook(**kwargs) None [source]
Unhooks the attribution method from the model. If the model was modified in any way, this should restore its initial state.
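The hook/unhook pairing can be sketched as a save-and-restore pattern. ToyModel and ToyHookedMethod are hypothetical, and a string stands in for the model component being swapped. The only point illustrated is that unhook should undo exactly what hook changed.

```python
class ToyModel:
    """Minimal stand-in for a model with one replaceable component."""

    def __init__(self):
        self.embedding = "original-embedding"


class ToyHookedMethod:
    """Sketch of a method that modifies the model on hook and restores it on unhook."""

    def __init__(self, model: ToyModel, hook_to_model: bool = True):
        self.model = model
        self._original = None
        self.is_hooked = False
        if hook_to_model:
            self.hook()

    def hook(self, **kwargs) -> None:
        if self.is_hooked:
            return  # already hooked, avoid wrapping twice
        self._original = self.model.embedding
        self.model.embedding = f"wrapped({self._original})"
        self.is_hooked = True

    def unhook(self, **kwargs) -> None:
        if not self.is_hooked:
            return  # nothing to restore
        self.model.embedding = self._original
        self._original = None
        self.is_hooked = False


toy_model = ToyModel()
toy_method = ToyHookedMethod(toy_model)
state_while_hooked = toy_model.embedding
toy_method.unhook()
state_after_unhook = toy_model.embedding
```

Making both calls idempotent keeps repeated hook or unhook calls from corrupting the saved state.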
Gradient-based Attribution Methods
- class inseq.attr.feat.GradientAttributionRegistry(attribution_model: AttributionModel, hook_to_model: bool = True, **kwargs)[source]
Gradient-based attribution method registry.
- attribute_step(attribute_fn_main_args: dict[str, Any], attribution_args: dict[str, Any] = {}) GranularFeatureAttributionStepOutput [source]
Performs a single attribution step for the specified attribution arguments.
- Parameters:
attribute_fn_main_args (dict) – Main arguments used for the attribution method. These are built from model inputs at the current step of the feature attribution process.
attribution_args (dict, optional) – Additional arguments to pass to the attribution method. These can be specified by the user while calling the top level attribute methods. Defaults to {}.
- Returns:
A dataclass containing a tensor of source attributions of size (batch_size, source_length), possibly a tensor of target attributions of size (batch_size, prefix length) if attribute_target=True and possibly a tensor of deltas of size (batch_size) if the attribution step supports deltas and they are requested. At this point the batch information is empty, and will later be filled by the enrich_step_output function.
- Return type:
GranularFeatureAttributionStepOutput
- hook(**kwargs)[source]
Hooks the attribution method to the model by replacing normal nn.Embedding with Captum's InterpretableEmbeddingBase.
- unhook(**kwargs)[source]
Unhook the attribution method by restoring the model's original embeddings.
- class inseq.attr.feat.DeepLiftAttribution(attribution_model, multiply_by_inputs: bool = True, **kwargs)[source]
DeepLIFT attribution method.
Reference implementation: https://captum.ai/api/deep_lift.html.
- class inseq.attr.feat.DiscretizedIntegratedGradientsAttribution(attribution_model, multiply_by_inputs: bool = False, **kwargs)[source]
Discretized Integrated Gradients attribution method.
Reference: https://arxiv.org/abs/2108.13654
Original implementation: https://github.com/INK-USC/DIG
- hook(**kwargs)[source]
Hooks the attribution method to the model by replacing normal nn.Embedding with Captum's InterpretableEmbeddingBase.
- class inseq.attr.feat.GradientShapAttribution(attribution_model, multiply_by_inputs: bool = True, **kwargs)[source]
GradientSHAP attribution method.
Reference implementation: https://captum.ai/api/gradient_shap.html.
- class inseq.attr.feat.IntegratedGradientsAttribution(attribution_model, multiply_by_inputs: bool = True, **kwargs)[source]
Integrated Gradients attribution method.
Reference implementation: https://captum.ai/api/integrated_gradients.html.
- class inseq.attr.feat.InputXGradientAttribution(attribution_model)[source]
Input x Gradient attribution method.
Reference implementation: https://captum.ai/api/input_x_gradient.html.
- class inseq.attr.feat.SaliencyAttribution(attribution_model)[source]
Saliency attribution method.
Reference implementation: https://captum.ai/api/saliency.html.
- class inseq.attr.feat.SequentialIntegratedGradientsAttribution(attribution_model, multiply_by_inputs: bool = True, **kwargs)[source]
Sequential Integrated Gradients attribution method.
Reference: https://aclanthology.org/2023.findings-acl.477/
Original implementation: https://github.com/josephenguehard/time_interpret/blob/main/tint/attr/seq_ig.py
Layer Attribution Methods
- class inseq.attr.feat.LayerIntegratedGradientsAttribution(attribution_model, multiply_by_inputs: bool = True, **kwargs)[source]
Layer Integrated Gradients attribution method.
Reference implementation: https://captum.ai/api/layer.html#layer-integrated-gradients.
- class inseq.attr.feat.LayerGradientXActivationAttribution(attribution_model, multiply_by_inputs: bool = True, **kwargs)[source]
Layer Gradient X Activation attribution method.
Reference implementation: https://captum.ai/api/layer.html#layer-gradient-x-activation.
- class inseq.attr.feat.LayerDeepLiftAttribution(attribution_model, multiply_by_inputs: bool = True, **kwargs)[source]
Layer DeepLIFT attribution method.
Reference implementation: https://captum.ai/api/layer.html#layer-deeplift.
Internals-based Attribution Methods
- class inseq.attr.feat.InternalsAttributionRegistry(attribution_model: AttributionModel, hook_to_model: bool = True, **kwargs)[source]
Model Internals-based attribution method registry.
- class inseq.attr.feat.AttentionWeightsAttribution(attribution_model, **kwargs)[source]
The basic attention attribution method, which retrieves the attention weights from the model.
- class AttentionWeights(forward_func: AttributionModel)[source]
- attribute(inputs: TensorOrTupleOfTensorsGeneric, additional_forward_args: TensorOrTupleOfTensorsGeneric, encoder_self_attentions: Float[Tensor, 'batch_size n_layers n_units seq_len seq_len'] | None = None, decoder_self_attentions: Float[Tensor, 'batch_size n_layers n_units seq_len seq_len'] | None = None, cross_attentions: Float[Tensor, 'batch_size n_layers n_units seq_len seq_len'] | None = None) MultiDimensionalFeatureAttributionStepOutput [source]
Extracts the attention weights from the model.
- Parameters:
inputs (TensorOrTupleOfTensorsGeneric) – Tensor or tuple of tensors that are inputs to the model. Used to match standard Captum API, and to determine whether both source and target are being attributed.
additional_forward_args (TensorOrTupleOfTensorsGeneric) – Tensor or tuple of tensors that are additional arguments to the model. Unused, but included to match standard Captum API.
encoder_self_attentions (tuple(torch.Tensor), optional, defaults to None) – Tensor of encoder self-attention weights of the forward pass with shape (batch_size, n_layers, n_heads, source_seq_len, source_seq_len).
decoder_self_attentions (tuple(torch.Tensor), optional, defaults to None) – Tensor of decoder self-attention weights of the forward pass with shape (batch_size, n_layers, n_heads, target_seq_len, target_seq_len).
cross_attentions (tuple(torch.Tensor), optional, defaults to None) – Tensor of cross-attention weights computed during the forward pass with shape (batch_size, n_layers, n_heads, source_seq_len, target_seq_len).
- Returns:
A step output containing attention weights for each layer and head, with shape (batch_size, seq_len, n_layers, n_heads).
- Return type:
MultiDimensionalFeatureAttributionStepOutput
- compute_convergence_delta: Callable
The attribution algorithms which derive the Attribution class and provide a convergence delta (aka approximation error) should implement this method. Convergence delta can be computed based on certain properties of the attribution algorithms.
- Parameters:
attributions (Tensor or tuple[Tensor, ...]) – Attribution scores that are precomputed by an attribution algorithm. Attributions can be provided in the form of a single tensor or a tuple of those. It is assumed that the attribution tensor's dimension 0 corresponds to the number of examples, and if multiple input tensors are provided, the examples must be aligned appropriately.
*args (Any, optional) – Additional arguments that are used by the sub-classes depending on the specific implementation of compute_convergence_delta.
- Returns:
- deltas (Tensor):
Depending on the specific implementation of sub-classes, convergence delta can be returned per sample in the form of a tensor or it can be aggregated across multiple samples and returned in the form of a single floating point tensor.
- Return type:
Tensor of deltas
- static has_convergence_delta() bool [source]
This method informs the user whether the attribution algorithm provides a convergence delta (aka an approximation error) or not. Convergence delta may serve as a proxy of correctness of the attribution algorithm's approximation. If a deriving attribution class provides a compute_convergence_delta method, it should override both the compute_convergence_delta and has_convergence_delta methods.
- Returns:
Returns whether the attribution algorithm provides a convergence delta (aka approximation error) or not.
- Return type:
bool
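As an illustration of what a convergence delta measures, the sketch below computes one for a toy Integrated Gradients run. For path-integral methods the attributions should sum to the difference between the function value at the input and at the baseline, and the delta is the gap between the two. The function f, its hand-written gradient and the midpoint integration rule are assumptions chosen for this example. Attention weights themselves do not define such a quantity.

```python
def f(x: list[float]) -> float:
    """Toy scalar model: sum of squares."""
    return sum(v * v for v in x)


def grad_f(x: list[float]) -> list[float]:
    """Analytic gradient of f."""
    return [2.0 * v for v in x]


def integrated_gradients(x: list[float], baseline: list[float],
                         n_steps: int = 50) -> list[float]:
    """Approximate the path integral of the gradient with a midpoint rule."""
    attributions = [0.0] * len(x)
    for k in range(n_steps):
        alpha = (k + 0.5) / n_steps
        point = [b + alpha * (v - b) for v, b in zip(x, baseline)]
        grads = grad_f(point)
        for i in range(len(x)):
            attributions[i] += grads[i] * (x[i] - baseline[i]) / n_steps
    return attributions


def convergence_delta(attributions: list[float], x: list[float],
                      baseline: list[float]) -> float:
    """Sum of attributions minus the change in model output along the path."""
    return sum(attributions) - (f(x) - f(baseline))


inputs = [1.0, 2.0]
zeros = [0.0, 0.0]
attr = integrated_gradients(inputs, zeros)
delta = convergence_delta(attr, inputs, zeros)
```

A delta close to zero indicates that the numerical integration approximated the path integral well. Larger values suggest increasing the number of steps.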
- attribute_step(attribute_fn_main_args: dict[str, Any], attribution_args: dict[str, Any]) MultiDimensionalFeatureAttributionStepOutput [source]
Performs a single attribution step for the specified attribution arguments.
- Parameters:
attribute_fn_main_args (dict) – Main arguments used for the attribution method. These are built from model inputs at the current step of the feature attribution process.
attribution_args (dict, optional) – Additional arguments to pass to the attribution method. These can be specified by the user while calling the top level attribute methods. Defaults to {}.
- Returns:
A dataclass containing a tensor of source attributions of size (batch_size, source_length). At this point the batch information is empty, and will later be filled by the enrich_step_output function.
- Return type:
FeatureAttributionStepOutput
Perturbation-based Attribution Methods
- class inseq.attr.feat.PerturbationAttributionRegistry(attribution_model: AttributionModel, hook_to_model: bool = True, **kwargs)[source]
Perturbation-based attribution method registry.
- class inseq.attr.feat.OcclusionAttribution(attribution_model)[source]
Occlusion-based attribution method. Reference implementation: https://captum.ai/api/occlusion.html.
Usage in other implementations: niuzaisheng/AttExplainer, andrewPoulton/explainable-asag, copenlu/xai-benchmark, DFKI-NLP/thermostat
- attribute_step(attribute_fn_main_args: dict[str, Any], attribution_args: dict[str, Any] = {}) CoarseFeatureAttributionStepOutput [source]
The sliding window shape is defined as a tuple. The first entry is between 1 and the length of the input. The second entry is given by the embedding dimension of the underlying model. If not explicitly given via attribution_args, the default is (1, embedding_dim).
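A minimal sketch of occlusion with a window that spans the full embedding dimension is shown below. Nested lists stand in for an embedding tensor, toy_score and occlusion_scores are hypothetical names, and the zero baseline is an assumption. Averaging the score drops of overlapping windows follows the behaviour documented for Captum's Occlusion.

```python
from typing import Callable

WEIGHTS = [1.0, 2.0, 3.0]  # toy per-token weights, illustration only


def toy_score(embeddings: list[list[float]]) -> float:
    """Toy model output: weighted sum of each token's embedding entries."""
    return sum(w * sum(vec) for w, vec in zip(WEIGHTS, embeddings))


def occlusion_scores(embeddings: list[list[float]],
                     score_fn: Callable[[list[list[float]]], float],
                     window: int = 1,
                     baseline: float = 0.0) -> list[float]:
    """Slide a (window, embedding_dim) patch over tokens and record score drops."""
    reference = score_fn(embeddings)
    totals = [0.0] * len(embeddings)
    counts = [0] * len(embeddings)
    for start in range(len(embeddings) - window + 1):
        # Replace every embedding entry of the covered tokens with the baseline.
        occluded = [
            [baseline] * len(vec) if start <= i < start + window else list(vec)
            for i, vec in enumerate(embeddings)
        ]
        drop = reference - score_fn(occluded)
        for i in range(start, start + window):
            totals[i] += drop
            counts[i] += 1
    # Tokens covered by several windows receive the average drop.
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]


token_embeddings = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
single_token = occlusion_scores(token_embeddings, toy_score, window=1)
two_tokens = occlusion_scores(token_embeddings, toy_score, window=2)
```

With a window of (1, embedding_dim) each token is occluded on its own, so the scores are per-token. Wider windows trade granularity for fewer forward passes.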
- class inseq.attr.feat.LimeAttribution(attribution_model, **kwargs)[source]
LIME-based attribution method. Reference implementations: https://captum.ai/api/lime.html, https://github.com/DFKI-NLP/thermostat/, https://github.com/copenlu/ALPS_2021.
The main part of the code is in the Lime class of ops/lime.py.
- attribute_step(attribute_fn_main_args: dict[str, Any], attribution_args: dict[str, Any] = {}) GranularFeatureAttributionStepOutput [source]
Performs a single attribution step for the specified attribution arguments.
- Parameters:
attribute_fn_main_args (dict) – Main arguments used for the attribution method. These are built from model inputs at the current step of the feature attribution process.
attribution_args (dict, optional) – Additional arguments to pass to the attribution method. These can be specified by the user while calling the top level attribute methods. Defaults to {}.
- Returns:
A dataclass containing a tensor of source attributions of size (batch_size, source_length). At this point the batch information is empty, and will later be filled by the enrich_step_output function.
- Return type:
FeatureAttributionStepOutput
- class inseq.attr.feat.ValueZeroingAttribution(attribution_model, **kwargs)[source]
Value Zeroing method for feature attribution.
Introduced by Mohebbi et al. (2023) to quantify context mixing in Transformer models. The method is based on the observation that context mixing is regulated by the value vectors of the attention mechanism. The method consists of two steps:
1. Zeroing the value vectors of the attention mechanism for a given token index at a given layer of the model.
2. Computing the similarity between hidden states produced with and without the zeroing operation, and using it as a measure of context mixing for the given token at the given layer.
The method is converted into a feature attribution method by allowing for extraction of value zeroing scores at specific layers, or by aggregating them across layers.
Reference implementations:
- Original implementation: hmohebbi/ValueZeroing
- Encoder-decoder implementation: hmohebbi/ContextMixingASR
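The two steps above can be sketched on a single toy attention head. This is a heavily simplified illustration: it uses one head with no residual connection, layer normalization or feed-forward block, applies no score normalization, and all function names are hypothetical. It only shows how zeroing a token's value vector and comparing outputs by cosine distance yields a context-mixing score.

```python
import math


def softmax(xs: list[float]) -> list[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_outputs(queries, keys, values, zeroed_index=None):
    """Single-head attention; optionally zero the value vector of one token."""
    vals = [[0.0] * len(v) if j == zeroed_index else v
            for j, v in enumerate(values)]
    outputs = []
    for q in queries:
        weights = softmax([sum(a * b for a, b in zip(q, k)) for k in keys])
        outputs.append([sum(w * v[d] for w, v in zip(weights, vals))
                        for d in range(len(vals[0]))])
    return outputs


def cosine_distance(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # convention for this sketch when a vector vanishes
    return 1.0 - dot / (norm_a * norm_b)


def value_zeroing_scores(queries, keys, values) -> list[list[float]]:
    """scores[i][j]: change in token i's output when token j's value is zeroed."""
    original = attention_outputs(queries, keys, values)
    n = len(values)
    scores = [[0.0] * n for _ in range(n)]
    for j in range(n):
        altered = attention_outputs(queries, keys, values, zeroed_index=j)
        for i in range(n):
            scores[i][j] = cosine_distance(original[i], altered[i])
    return scores


toy_qk = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_values = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
vz_scores = value_zeroing_scores(toy_qk, toy_qk, toy_values)
```

In this toy setup the third token already has a zero value vector, so zeroing it changes nothing and its column of scores is (numerically) zero.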
- Parameters:
similarity_metric (str, optional) – The similarity metric to use for computing the distance between hidden states produced with and without the zeroing operation. Options: cosine, euclidean. Default: cosine.
encoder_zeroed_units_indices (Union[int, tuple[int, int], list[int], dict], optional) – The indices of the attention heads that should be zeroed to compute corrupted states in the encoder self-attention module. Not used for decoder-only models, or if output_encoder_self_scores is False. Format:
- None: all attention heads across all layers are zeroed.
- int: the same attention head is zeroed across all layers.
- tuple of two integers: the attention heads in the range are zeroed across all layers.
- list of integers: the attention heads in the list are zeroed across all layers.
- dictionary: the keys are the layer indices and the values are the zeroed attention heads for the corresponding layer.
Default: None (all heads are zeroed for every encoder layer).
decoder_zeroed_units_indices (Union[int, tuple[int, int], list[int], dict], optional) – Same as encoder_zeroed_units_indices but for the decoder self-attention module. Not used for encoder-decoder models or if output_decoder_self_scores is False. Default: None (all heads are zeroed for every decoder layer).
cross_zeroed_units_indices (Union[int, tuple[int, int], list[int], dict], optional) – Same as encoder_zeroed_units_indices but for the cross-attention module in encoder-decoder models. Not used if the model is decoder-only. Default: None (all heads are zeroed for every layer).
output_decoder_self_scores (bool, optional) – Whether to produce scores derived from zeroing the decoder self-attention value vectors in encoder-decoder models. Cannot be False for decoder-only models, or if target-side attribution is requested using attribute_target=True. Default: True.
output_encoder_self_scores (bool, optional) – Whether to produce scores derived from zeroing the encoder self-attention value vectors in encoder-decoder models. Default: True.
- Returns:
The output returned by the method has dimensions [attributed_seq_len, generated_seq_len, num_layers]. If output_decoder_self_scores and output_encoder_self_scores are True, the respective scores are returned in the sequence_scores output dictionary.
- Return type:
MultiDimensionalFeatureAttributionStepOutput
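The formats accepted by the *_zeroed_units_indices parameters can be made concrete with a small normalizer that expands each one into an explicit per-layer list of heads. expand_zeroed_units is a hypothetical helper, not part of the library. Treating the tuple as a half-open range (like Python's range) and leaving layers missing from a dictionary untouched are both assumptions of this sketch.

```python
def _heads(spec, n_heads: int) -> list[int]:
    """Expand a single head specification into a list of head indices."""
    if spec is None:
        return list(range(n_heads))       # all heads
    if isinstance(spec, int):
        return [spec]                     # one head
    if isinstance(spec, tuple):
        start, end = spec
        return list(range(start, end))    # ASSUMPTION: end is exclusive
    if isinstance(spec, list):
        return list(spec)                 # explicit heads
    raise TypeError(f"Unsupported head specification: {spec!r}")


def expand_zeroed_units(spec, n_layers: int, n_heads: int) -> dict[int, list[int]]:
    """Map every layer index to the heads that would be zeroed in it."""
    if isinstance(spec, dict):
        # ASSUMPTION: layers not listed in the dictionary have no zeroed heads.
        return {
            layer: _heads(spec[layer], n_heads) if layer in spec else []
            for layer in range(n_layers)
        }
    heads = _heads(spec, n_heads)
    return {layer: list(heads) for layer in range(n_layers)}


all_units = expand_zeroed_units(None, n_layers=2, n_heads=3)
one_head = expand_zeroed_units(1, n_layers=2, n_heads=3)
head_range = expand_zeroed_units((0, 2), n_layers=2, n_heads=3)
per_layer = expand_zeroed_units({1: [2]}, n_layers=2, n_heads=3)
```

Normalizing to a single dictionary form up front keeps the zeroing loop itself independent of how the user expressed the selection.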
- attribute_step(attribute_fn_main_args: dict[str, Any], attribution_args: dict[str, Any] = {}) MultiDimensionalFeatureAttributionStepOutput [source]
Performs a single attribution step for the specified attribution arguments.
- Parameters:
attribute_fn_main_args (dict) – Main arguments used for the attribution method. These are built from model inputs at the current step of the feature attribution process.
attribution_args (dict, optional) – Additional arguments to pass to the attribution method. These can be specified by the user while calling the top level attribute methods. Defaults to {}.
- Returns:
A dataclass containing a tensor of source attributions of size (batch_size, source_length). At this point the batch information is empty, and will later be filled by the enrich_step_output function.
- Return type:
FeatureAttributionStepOutput
- class inseq.attr.feat.ReagentAttribution(attribution_model: HuggingfaceModel, keep_top_n: int = 5, keep_ratio: float = None, invert_keep: bool = False, stopping_condition_top_k: int = 3, replacing_ratio: float = 0.3, max_probe_steps: int = 3000, num_probes: int = 16)[source]
Recursive attribution generator (ReAGent) method.
Measures importance as the drop in prediction probability produced by replacing a token with a plausible alternative predicted by a LM.
Reference implementation: ReAGent: A Model-agnostic Feature Attribution Method for Generative Language Models
- __init__(attribution_model: HuggingfaceModel, keep_top_n: int = 5, keep_ratio: float = None, invert_keep: bool = False, stopping_condition_top_k: int = 3, replacing_ratio: float = 0.3, max_probe_steps: int = 3000, num_probes: int = 16)[source]
ReAGent method constructor.
- Parameters:
keep_top_n (int, optional) – If set to a value greater than 0, the top n tokens based on their importance score will be kept during the prediction inference. If set to 0, the number of tokens to keep is determined by keep_ratio. Default: 5.
keep_ratio (float, optional) – If keep_top_n is set to 0, this specifies the proportion of tokens to keep.
invert_keep (bool, optional) – If specified, the top tokens selected either via keep_top_n or keep_ratio will be replaced instead of being kept. Default: False.
stopping_condition_top_k (int, optional) – Threshold for the stopping condition, which is met when the predicted target is among the top k predictions. Default: 3.
replacing_ratio (float, optional) – Ratio of tokens replaced for probing. Default: 0.3.
max_probe_steps (int, optional) – Maximum number of steps before stopping the probing. Default: 3000.
num_probes (int, optional) – Number of probes performed in parallel. Default: 16.
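The interaction between keep_top_n, keep_ratio and invert_keep, and the top-k stopping condition, can be sketched as follows. tokens_to_keep and stopping_condition are hypothetical helpers. Rounding the ratio up with ceil is an assumption, and the actual ReAGent implementation may select and round differently.

```python
import math
from typing import Optional


def tokens_to_keep(importance: list[float], keep_top_n: int = 5,
                   keep_ratio: Optional[float] = None,
                   invert_keep: bool = False) -> list[bool]:
    """Return a mask marking which tokens are kept (True) or replaced (False)."""
    n = len(importance)
    if keep_top_n > 0:
        k = min(keep_top_n, n)
    else:
        if keep_ratio is None:
            raise ValueError("keep_ratio is required when keep_top_n is 0")
        k = min(n, math.ceil(keep_ratio * n))  # ASSUMPTION: round up
    # Rank token positions by decreasing importance and take the first k.
    ranked = sorted(range(n), key=lambda i: importance[i], reverse=True)
    top = set(ranked[:k])
    # With invert_keep the selected top tokens are replaced instead of kept.
    return [(i in top) != invert_keep for i in range(n)]


def stopping_condition(probs: dict[str, float], target: str, top_k: int = 3) -> bool:
    """True when the target token is among the top_k most probable tokens."""
    ranked = sorted(probs, key=probs.get, reverse=True)
    return target in ranked[:top_k]


scores = [0.1, 0.9, 0.5, 0.3]
keep_by_count = tokens_to_keep(scores, keep_top_n=2)
keep_by_ratio = tokens_to_keep(scores, keep_top_n=0, keep_ratio=0.25)
keep_inverted = tokens_to_keep(scores, keep_top_n=0, keep_ratio=0.25, invert_keep=True)
next_token_probs = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}
target_found = stopping_condition(next_token_probs, "c", top_k=3)
```

The mask form makes it easy to see that invert_keep only flips the selection and does not change how many tokens are selected.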
- attribute_step(attribute_fn_main_args: dict[str, Any], attribution_args: dict[str, Any] = {}) GranularFeatureAttributionStepOutput [source]
Performs a single attribution step for the specified attribution arguments.
- Parameters:
attribute_fn_main_args (dict) – Main arguments used for the attribution method. These are built from model inputs at the current step of the feature attribution process.
attribution_args (dict, optional) – Additional arguments to pass to the attribution method. These can be specified by the user while calling the top level attribute methods. Defaults to {}.
- Returns:
A dataclass containing a tensor of source attributions of size (batch_size, source_length). At this point the batch information is empty, and will later be filled by the enrich_step_output function.
- Return type:
FeatureAttributionStepOutput
import inseq

model = inseq.load_model(
    "gpt2-medium",
    "reagent",
    keep_top_n=5,
    stopping_condition_top_k=3,
    replacing_ratio=0.3,
    max_probe_steps=3000,
    num_probes=8,
)
out = model.attribute("Super Mario Land is a game that developed by")
out.show()