Feature Attribution

class inseq.attr.FeatureAttribution(attribution_model: AttributionModel, hook_to_model: bool = True, **kwargs)[source]

Abstract registry for feature attribution methods.

attr

Attribute of child classes that will act as lookup name for the registry.

Type:

str

ignore_extra_args

Arguments used by default in the attribute step and thus ignored as extra arguments during attribution. The selection of defaults follows the Captum naming convention.

Type:

list of str

attribute(batch: Union[DecoderOnlyBatch, EncoderDecoderBatch], attributed_fn: Callable[[...], Tensor[Tensor]], attr_pos_start: Optional[int] = None, attr_pos_end: Optional[int] = None, show_progress: bool = True, pretty_progress: bool = True, output_step_attributions: bool = False, attribute_target: bool = False, step_scores: List[str] = [], attribution_args: Dict[str, Any] = {}, attributed_fn_args: Dict[str, Any] = {}, step_scores_args: Dict[str, Any] = {}) FeatureAttributionOutput[source]

Performs the feature attribution procedure using the specified attribution method.

Parameters:
  • batch (EncoderDecoderBatch or DecoderOnlyBatch) – The batch of sequences to attribute.

  • attributed_fn (Callable[..., SingleScorePerStepTensor]) – The function of model outputs representing what should be attributed (e.g. the output probability of the model's best prediction after softmax). It must be a function that takes multiple keyword arguments and returns a tensor of size (batch_size,). If not provided, the default attributed function for the model will be used.

  • attr_pos_start (int, optional) – The initial position for performing sequence attribution. Defaults to 1 (0 is the default BOS token).

  • attr_pos_end (int, optional) – The final position for performing sequence attribution. Defaults to None (full string).

  • show_progress (bool, optional) – Whether to show a progress bar. Defaults to True.

  • pretty_progress (bool, optional) – Whether to use a pretty progress bar. Defaults to True.

  • output_step_attributions (bool, optional) – Whether to output a list of FeatureAttributionStepOutput objects for each step. Defaults to False.

  • attribute_target (bool, optional) – Whether to include target prefix for feature attribution. Defaults to False.

  • step_scores (list of str) – List of identifiers for step scores that need to be computed during attribution. The available step scores are defined in inseq.attr.feat.STEP_SCORES_MAP and new step scores can be added by using the register_step_function() function.

  • attribution_args (dict, optional) – Additional arguments to pass to the attribution method.

  • attributed_fn_args (dict, optional) – Additional arguments to pass to the attributed function.

  • step_scores_args (dict, optional) – Additional arguments to pass to the step scores function.

Returns:

An object containing a list of sequence attributions, with an optional list of single FeatureAttributionStepOutput objects for each step, plus extra information regarding the attribution parameters.

Return type:

FeatureAttributionOutput
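
In practice, attribute() is rarely called directly: it is invoked through the model-level attribute() entry point of an AttributionModel. A minimal sketch of such a call (the model identifier and inputs are illustrative, and the snippet assumes the standard inseq top-level API):

    import inseq

    # Load a GPT-2 checkpoint with the "saliency" attribution method attached.
    model = inseq.load_model("gpt2", "saliency")

    # Attribute the generated continuation, also computing the "probability"
    # step score at each generation step.
    out = model.attribute(
        "The capital of France is",
        step_scores=["probability"],
        show_progress=False,
    )

    # Inspect the resulting FeatureAttributionOutput.
    out.show()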

attribute_step(attribute_fn_main_args: Dict[str, Any], attribution_args: Dict[str, Any] = {}) FeatureAttributionStepOutput[source]

Performs a single attribution step for the specified attribution arguments.

Parameters:
  • attribute_fn_main_args (dict) – Main arguments used for the attribution method. These are built from model inputs at the current step of the feature attribution process.

  • attribution_args (dict, optional) – Additional arguments to pass to the attribution method. These can be specified by the user while calling the top level attribute methods. Defaults to {}.

Returns:

A dataclass containing a tensor of source attributions of size (batch_size, source_length). At this point the batch information is empty, and will later be filled by the enrich_step_output function.

Return type:

FeatureAttributionStepOutput

filtered_attribute_step(batch: Union[DecoderOnlyBatch, EncoderDecoderBatch], target_ids: Tensor[Tensor], attributed_fn: Callable[[...], Tensor[Tensor]], target_attention_mask: Optional[Tensor[Tensor]] = None, attribute_target: bool = False, step_scores: List[str] = [], attribution_args: Dict[str, Any] = {}, attributed_fn_args: Dict[str, Any] = {}, step_scores_args: Dict[str, Any] = {}) FeatureAttributionStepOutput[source]

Performs a single attribution step for all the sequences in the batch that still have valid target_ids, as identified by the target_attention_mask. Finished sentences are temporarily filtered out to make the attribution step faster and then reinserted before returning.

Parameters:
  • batch (EncoderDecoderBatch or DecoderOnlyBatch) – The batch of sequences to attribute.

  • target_ids (torch.Tensor) – Target token ids of size (batch_size, 1) corresponding to tokens for which the attribution step must be performed.

  • attributed_fn (Callable[..., SingleScorePerStepTensor]) – The function of model outputs representing what should be attributed (e.g. the output probability of the model's best prediction after softmax). The parameter must be a function that takes multiple keyword arguments and returns a tensor of size (batch_size,). If not provided, the default attributed function for the model will be used (change attribution_model.default_attributed_fn_id).

  • target_attention_mask (torch.Tensor, optional) – Boolean attention mask of size (batch_size, 1) specifying which target_ids are valid for attribution and which are padding.

  • attribute_target (bool, optional) – Whether to include target prefix for feature attribution. Defaults to False.

  • step_scores (list of str) – List of identifiers for step scores that need to be computed during attribution. The available step scores are defined in inseq.attr.feat.STEP_SCORES_MAP and new step scores can be added by using the register_step_function() function.

  • attribution_args (dict, optional) – Additional arguments to pass to the attribution method.

  • attributed_fn_args (dict, optional) – Additional arguments to pass to the attributed function.

  • step_scores_args (dict, optional) – Additional arguments to pass to the step scores functions.

Returns:

A dataclass containing attribution tensors for source and target attributions of size (batch_size, source_length) and (batch_size, prefix_length) respectively (the target attribution is only present if attribute_target=True), plus batch information and any requested step score.

Return type:

FeatureAttributionStepOutput

format_attribute_args(batch: Union[DecoderOnlyBatch, EncoderDecoderBatch], target_ids: Tensor[Tensor], attributed_fn: Callable[[...], Tensor[Tensor]], attributed_fn_args: Dict[str, Any] = {}, **kwargs) Dict[str, Any][source]

Formats inputs for the attribution method based on the model type and the attribution method requirements.

Parameters:
  • batch (DecoderOnlyBatch or EncoderDecoderBatch) – The batch of sequences on which attribution is performed.

  • target_ids (torch.Tensor) – Target token ids of size (batch_size) corresponding to tokens for which the attribution step must be performed.

  • attributed_fn (Callable[..., SingleScorePerStepTensor]) – The function of model outputs representing what should be attributed (e.g. the output probability of the model's best prediction after softmax). The parameter must be a function that takes multiple keyword arguments and returns a tensor of size (batch_size,). If not provided, the default attributed function for the model will be used (change attribution_model.default_attributed_fn_id).

  • attribute_target (bool, optional) – Whether to attribute the target prefix or not. Defaults to False.

  • attributed_fn_args (dict, optional) – Additional arguments to pass to the attributed function. Defaults to {}.

  • **kwargs – Additional arguments to pass to the model-specific inseq.models.AttributionModel.format_attribution_args() method.

Returns:

A dictionary containing the formatted attribution arguments.

Return type:

dict

abstract hook(**kwargs) None[source]

Hooks the attribution method to the model. Useful to implement pre-attribution logic (e.g. freezing layers, replacing embeddings, raising warnings, etc.).

Abstract method, must be implemented by subclasses.

classmethod load(method_name: str, attribution_model: Optional[AttributionModel] = None, model_name_or_path: Optional[str] = None, **kwargs) FeatureAttribution[source]

Load the selected method and hook it to an existing or available attribution model.

Parameters:
  • method_name (str) – The name of the attribution method to load.

  • attribution_model (AttributionModel, optional) – An instance of an AttributionModel child class. If not provided, the method will try to load the model from the model_name_or_path argument. Defaults to None.

  • model_name_or_path (ModelIdentifier, optional) – The name of the model to load or its path on disk. If not provided, an instantiated attribution_model must be passed instead. If the model is loaded in this way, it will be created with default arguments. Defaults to None.

  • **kwargs – Additional arguments to pass to the attribution method __init__ function.

Raises:
  • RuntimeError – Raised if both or neither model_name_or_path and attribution_model are provided.

  • UnknownAttributionMethodError – Raised if the method_name is not found in the registry.

Returns:

The loaded attribution method.

Return type:

FeatureAttribution
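
A minimal sketch of loading a method through the registry (the method and model identifiers below are illustrative):

    from inseq.attr import FeatureAttribution

    # Instantiate the underlying model from an identifier and hook the method to it.
    saliency = FeatureAttribution.load("saliency", model_name_or_path="gpt2")

    # Alternatively, hook the method to an already loaded AttributionModel:
    # saliency = FeatureAttribution.load("saliency", attribution_model=model)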

prepare_and_attribute(sources: Sequence[str], targets: Union[str, Sequence[str], BatchEncoding, Batch], attr_pos_start: Optional[int] = None, attr_pos_end: Optional[int] = None, show_progress: bool = True, pretty_progress: bool = True, output_step_attributions: bool = False, attribute_target: bool = False, step_scores: List[str] = [], include_eos_baseline: bool = False, attributed_fn: Optional[Union[str, Callable[[...], Tensor[Tensor]]]] = None, attribution_args: Dict[str, Any] = {}, attributed_fn_args: Dict[str, Any] = {}, step_scores_args: Dict[str, Any] = {}) FeatureAttributionOutput[source]

Prepares inputs and performs attribution.

Wraps the attribution method attribute() method and the prepare_inputs_for_attribution() method.

Parameters:
  • sources (list(str)) – The sources provided to the prepare() method.

  • targets (FeatureAttributionInput) – The targets provided to the prepare() method.

  • attr_pos_start (int, optional) – The initial position for performing sequence attribution. Defaults to 0.

  • attr_pos_end (int, optional) – The final position for performing sequence attribution. Defaults to None (full string).

  • show_progress (bool, optional) – Whether to show a progress bar. Defaults to True.

  • pretty_progress (bool, optional) – Whether to use a pretty progress bar. Defaults to True.

  • output_step_attributions (bool, optional) – Whether to output a list of FeatureAttributionStepOutput objects for each step. Defaults to False.

  • attribute_target (bool, optional) – Whether to include target prefix for feature attribution. Defaults to False.

  • step_scores (list of str) – List of identifiers for step scores that need to be computed during attribution. The available step scores are defined in inseq.attr.feat.STEP_SCORES_MAP and new step scores can be added by using the register_step_function() function.

  • include_eos_baseline (bool, optional) – Whether to include the EOS token in the baseline for attribution. By default the EOS token is not used for attribution. Defaults to False.

  • attributed_fn (str or Callable[..., SingleScorePerStepTensor], optional) – The identifier or function of model outputs representing what should be attributed (e.g. the output probability of the model's best prediction after softmax). If it is a string, it must be the identifier of a registered attributed function. Otherwise, it must be a function that takes multiple keyword arguments and returns a tensor of size (batch_size,). If not provided, the default attributed function for the model will be used (change attribution_model.default_attributed_fn_id).

  • attribution_args (dict, optional) – Additional arguments to pass to the attribution method.

  • attributed_fn_args (dict, optional) – Additional arguments to pass to the attributed function.

  • step_scores_args (dict, optional) – Additional arguments to pass to the step scores functions.

Returns:

An object containing a list of sequence attributions, with an optional list of single FeatureAttributionStepOutput objects for each step, plus extra information regarding the attribution parameters.

Return type:

FeatureAttributionOutput
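
For encoder-decoder models, the same model-level entry point accepts both source and target texts, mirroring the sources and targets arguments above. A hedged sketch (the model identifier, texts and the generated_texts keyword reflect the usual inseq model-level API and are illustrative):

    import inseq

    # Load a translation model with Integrated Gradients.
    model = inseq.load_model("Helsinki-NLP/opus-mt-en-it", "integrated_gradients")

    # Attribute a constrained translation, also attributing the target prefix.
    out = model.attribute(
        input_texts="Hello everyone, how are you?",
        generated_texts="Ciao a tutti, come state?",
        attribute_target=True,
        step_scores=["probability"],
    )
    out.show()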

abstract unhook(**kwargs) None[source]

Unhooks the attribution method from the model. If the model was modified in any way, this should restore its initial state.

Abstract method, must be implemented by subclasses.
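
Because hook() and unhook() are abstract, any new attribution method registered through this class must implement them, even as no-ops. The skeleton below is an illustrative sketch only: it subclasses FeatureAttribution directly and uses a hypothetical attr identifier, whereas real methods typically extend one of the intermediate registries documented below.

    from inseq.attr import FeatureAttribution

    class MyCustomAttribution(FeatureAttribution):
        """Illustrative skeleton of a custom attribution method."""

        # Hypothetical lookup name used by the registry.
        attr = "my_custom_method"

        def hook(self, **kwargs):
            # Pre-attribution setup (e.g. freezing layers); no-op in this sketch.
            pass

        def unhook(self, **kwargs):
            # Restore the model to its original state; no-op in this sketch.
            pass

        def attribute_step(self, attribute_fn_main_args, attribution_args={}):
            # Compute and return a FeatureAttributionStepOutput for the current step.
            raise NotImplementedError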

Gradient Attribution Methods

class inseq.attr.feat.GradientAttributionRegistry(attribution_model: AttributionModel, hook_to_model: bool = True, **kwargs)[source]

Gradient-based attribution method registry.

attribute_step(attribute_fn_main_args: Dict[str, Any], attribution_args: Dict[str, Any] = {}) GradientFeatureAttributionStepOutput[source]

Performs a single attribution step for the specified attribution arguments.

Parameters:
  • attribute_fn_main_args (dict) – Main arguments used for the attribution method. These are built from model inputs at the current step of the feature attribution process.

  • attribution_args (dict, optional) – Additional arguments to pass to the attribution method. These can be specified by the user while calling the top level attribute methods. Defaults to {}.

Returns:

A dataclass containing a tensor of source attributions of size (batch_size, source_length), possibly a tensor of target attributions of size (batch_size, prefix_length) if attribute_target=True, and possibly a tensor of deltas of size (batch_size,) if the attribution step supports deltas and they are requested. At this point the batch information is empty, and will later be filled by the enrich_step_output function.

Return type:

GradientFeatureAttributionStepOutput

hook(**kwargs)[source]

Hooks the attribution method to the model by replacing normal nn.Embedding with Captum’s InterpretableEmbeddingBase.

unhook(**kwargs)[source]

Unhook the attribution method by restoring the model’s original embeddings.

class inseq.attr.feat.DeepLiftAttribution(attribution_model, multiply_by_inputs: bool = True, **kwargs)[source]

DeepLIFT attribution method.

Reference implementation: https://captum.ai/api/deep_lift.html.

Warning

The DiscretizedIntegratedGradientsAttribution class is currently exhibiting inconsistent behavior, so usage should be limited until further notice. See PR #114 for additional info.

class inseq.attr.feat.DiscretizedIntegratedGradientsAttribution(attribution_model, multiply_by_inputs: bool = False, **kwargs)[source]

Discretized Integrated Gradients attribution method.

Reference: https://arxiv.org/abs/2108.13654

Original implementation: https://github.com/INK-USC/DIG

hook(**kwargs)[source]

Hooks the attribution method to the model by replacing normal nn.Embedding with Captum’s InterpretableEmbeddingBase.

class inseq.attr.feat.GradientShapAttribution(attribution_model, multiply_by_inputs: bool = True, **kwargs)[source]

GradientSHAP attribution method.

Reference implementation: https://captum.ai/api/gradient_shap.html.

class inseq.attr.feat.IntegratedGradientsAttribution(attribution_model, multiply_by_inputs: bool = True, **kwargs)[source]

Integrated Gradients attribution method.

Reference implementation: https://captum.ai/api/integrated_gradients.html.

class inseq.attr.feat.InputXGradientAttribution(attribution_model)[source]

Input x Gradient attribution method.

Reference implementation: https://captum.ai/api/input_x_gradient.html.

class inseq.attr.feat.SaliencyAttribution(attribution_model)[source]

Saliency attribution method.

Reference implementation: https://captum.ai/api/saliency.html.
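
The gradient-based methods above are normally selected by their registry identifiers when loading a model or when calling attribute(). A brief sketch, assuming the identifiers follow the snake_case naming of the classes (e.g. "integrated_gradients", "input_x_gradient") and that the method can be overridden per call via the method keyword of the model-level attribute() entry point:

    import inseq

    # Attach Integrated Gradients to an illustrative decoder-only checkpoint.
    model = inseq.load_model("gpt2", "integrated_gradients")

    # The method can also be overridden for a single call (assumed keyword).
    out = model.attribute("Inseq is a library for", method="input_x_gradient")
    out.show()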

Layer Attribution Methods

class inseq.attr.feat.LayerIntegratedGradientsAttribution(attribution_model, multiply_by_inputs: bool = True, **kwargs)[source]

Layer Integrated Gradients attribution method.

Reference implementation: https://captum.ai/api/layer.html#layer-integrated-gradients.

class inseq.attr.feat.LayerGradientXActivationAttribution(attribution_model, multiply_by_inputs: bool = True, **kwargs)[source]

Layer Gradient x Activation attribution method.

Reference implementation: https://captum.ai/api/layer.html#layer-gradient-x-activation.

class inseq.attr.feat.LayerDeepLiftAttribution(attribution_model, multiply_by_inputs: bool = True, **kwargs)[source]

Layer DeepLIFT attribution method.

Reference implementation: https://captum.ai/api/layer.html#layer-deeplift.
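
Layer attribution methods can be selected in the same way; the sketch below assumes a "layer_integrated_gradients" identifier mirroring the class name and that the attributed layer defaults to a sensible choice for the loaded model:

    import inseq

    # Layer Integrated Gradients on an illustrative decoder-only checkpoint.
    model = inseq.load_model("gpt2", "layer_integrated_gradients")
    out = model.attribute("Layer attribution example", show_progress=False)
    out.show()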

Attention Attribution Methods

class inseq.attr.feat.InternalsAttributionRegistry(attribution_model: AttributionModel, hook_to_model: bool = True, **kwargs)[source]

Model Internals-based attribution method registry.

attribute_step(attribute_fn_main_args: Dict[str, Any], attribution_args: Dict[str, Any] = {}) FeatureAttributionStepOutput[source]

Performs a single attribution step for the specified attribution arguments.

Parameters:
  • attribute_fn_main_args (dict) – Main arguments used for the attribution method. These are built from model inputs at the current step of the feature attribution process.

  • attribution_args (dict, optional) – Additional arguments to pass to the attribution method. These can be specified by the user while calling the top level attribute methods. Defaults to {}.

Returns:

A dataclass containing a tensor of source attributions of size (batch_size, source_length), possibly a tensor of target attributions of size (batch_size, prefix_length) if attribute_target=True, and possibly a tensor of deltas of size (batch_size,) if the attribution step supports deltas and they are requested. At this point the batch information is empty, and will later be filled by the enrich_step_output function.

Return type:

FeatureAttributionStepOutput

format_attribute_args(batch: Union[Batch, EncoderDecoderBatch], target_ids: Tensor[Tensor], attributed_fn: Callable[[...], Tensor[Tensor]], attribute_target: bool = False, attributed_fn_args: Dict[str, Any] = {}, **kwargs) Dict[str, Any][source]

Formats inputs for the attention attribution methods.

Parameters:
  • batch (Batch or EncoderDecoderBatch) – The batch of sequences on which attribution is performed.

  • target_ids (torch.Tensor) – Target token ids of size (batch_size) corresponding to tokens for which the attribution step must be performed.

  • attributed_fn (Callable[..., SingleScorePerStepTensor]) – The function of model outputs representing what should be attributed (e.g. the output probability of the model's best prediction after softmax). The parameter must be a function that takes multiple keyword arguments and returns a tensor of size (batch_size,). If not provided, the default attributed function for the model will be used (change attribution_model.default_attributed_fn_id).

  • attribute_target (bool, optional) – Whether to attribute the target prefix or not. Defaults to False.

  • attributed_fn_args (dict, optional) – Additional arguments to pass to the attributed function. Defaults to {}.

Returns:

A dictionary containing the formatted attribution arguments.

Return type:

dict

hook(**kwargs)[source]

Hooks the attribution method to the model. Useful to implement pre-attribution logic (e.g. freezing layers, replacing embeddings, raising warnings, etc.).

Abstract method, must be implemented by subclasses.

unhook(**kwargs)[source]

Unhooks the attribution method from the model. If the model was modified in any way, this should restore its initial state.

Abstract method, must be implemented by subclasses.

class inseq.attr.feat.AttentionWeightsAttribution(attribution_model, **kwargs)[source]

The basic attention attribution method, which retrieves the attention weights from the model.

Attribute Args:
  • aggregate_heads_fn (str or callable) – The method to use for aggregating across heads. Can be one of average (default if heads is list, tuple or None), max, min or single (default if heads is int), or a custom function defined by the user.

  • aggregate_layers_fn (str or callable) – The method to use for aggregating across layers. Can be one of average (default if layers is tuple or list), max, min or single (default if layers is int or None), or a custom function defined by the user.

  • heads (int or tuple[int, int] or list(int), optional) – If a single value is specified, the head at the corresponding index is used. If a tuple of two indices is specified, all heads between the indices will be aggregated using aggregate_fn. If a list of indices is specified, the respective heads will be used for aggregation. If aggregate_fn is "single", a head must be specified. If no value is specified, all heads are passed to aggregate_fn by default.

  • layers (int or tuple[int, int] or list(int), optional) – If a single value is specified, the layer at the corresponding index is used. If a tuple of two indices is specified, all layers among the indices will be aggregated using aggregate_fn. If a list of indices is specified, the respective layers will be used for aggregation. If aggregate_fn is "single", the last layer is used by default. If no value is specified, all available layers are passed to aggregate_fn by default.

Example

  • model.attribute(src) will return the average attention for all heads of the last layer.

  • model.attribute(src, heads=0) will return the attention weights for the first head of the last layer.

  • model.attribute(src, heads=(0, 5), aggregate_heads_fn="max", layers=[0, 2, 7]) will return the maximum attention weights for the first 5 heads averaged across the first, third, and eighth layers.
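
A runnable sketch of the calls listed above, assuming the method is registered under the "attention" identifier (the model name and input sentence are illustrative):

    import inseq

    # Load an encoder-decoder model with the attention weights method.
    model = inseq.load_model("Helsinki-NLP/opus-mt-en-fr", "attention")
    src = "The developer argued with the designer because she did not like the design."

    # Default aggregation, as in the first example above.
    out_default = model.attribute(src)

    # Max over heads in the (0, 5) range, averaged across layers 0, 2 and 7.
    out_custom = model.attribute(
        src,
        heads=(0, 5),
        aggregate_heads_fn="max",
        layers=[0, 2, 7],
    )
    out_custom.show()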