Aggregators
- class inseq.data.aggregator.Aggregator[source]
- classmethod end_aggregation_hook(tensors: TensorWrapper, **kwargs)[source]
Hook called at the end of the aggregation process.
Use to ensure that the final product of aggregation is compliant with the requirements of individual aggregators.
- classmethod post_aggregate_hook(tensors: TensorWrapper, **kwargs)[source]
Hook called right after the aggregation function is called.
Verifies that the aggregated object has the correct properties.
- classmethod pre_aggregate_hook(tensors: TensorWrapper, **kwargs)[source]
Hook called right before the aggregation function is called.
Use to ensure a prerequisite that depends on previous aggregation steps and is fundamental to the aggregation process (e.g. the aggregatable object produced by the previous step has the correct shapes).
- classmethod start_aggregation_hook(tensors: TensorWrapper, **kwargs)[source]
Hook called at the start of the aggregation process.
Use to ensure a prerequisite that is independent of previous aggregation steps and fundamental to the aggregation process (e.g. parameters are of the correct type). A failure here raises an error before any aggregation step is performed.
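The order suggested by the hook descriptions above can be sketched with a toy class. This is a minimal illustration, not the inseq implementation: `ToyAggregator`, its `run` and `_aggregate` methods, and the checks inside each hook are hypothetical, and the exact call order inside inseq is assumed from the hook names.

```python
class ToyAggregator:
    """Hypothetical stand-in used only to show when each hook fires."""

    calls: list = []

    @classmethod
    def start_aggregation_hook(cls, values, **kwargs):
        # Independent of earlier steps: validate argument types up front.
        cls.calls.append("start")
        if not isinstance(values, list):
            raise TypeError("values must be a list")

    @classmethod
    def pre_aggregate_hook(cls, values, **kwargs):
        # Depends on what earlier steps produced: here, a non-empty input.
        cls.calls.append("pre")
        if not values:
            raise ValueError("nothing to aggregate")

    @classmethod
    def _aggregate(cls, values, **kwargs):
        # Toy aggregation function: the mean of the input values.
        cls.calls.append("aggregate")
        return [sum(values) / len(values)]

    @classmethod
    def post_aggregate_hook(cls, values, **kwargs):
        # Verify the aggregated object has the expected properties.
        cls.calls.append("post")
        if len(values) != 1:
            raise ValueError("expected a single aggregated value")

    @classmethod
    def end_aggregation_hook(cls, values, **kwargs):
        # Final compliance check on the product of aggregation.
        cls.calls.append("end")

    @classmethod
    def run(cls, values, **kwargs):
        # Assumed order: start, pre, aggregate, post, end.
        cls.calls = []
        cls.start_aggregation_hook(values, **kwargs)
        cls.pre_aggregate_hook(values, **kwargs)
        out = cls._aggregate(values, **kwargs)
        cls.post_aggregate_hook(out, **kwargs)
        cls.end_aggregation_hook(out, **kwargs)
        return out
```

In this sketch, a wrong argument type fails in the start hook, so no aggregation step has run when the error is raised.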
- class inseq.data.aggregator.AggregatorPipeline(aggregators: list[str | type[Aggregator]], aggregate_fn: list[str | Callable] | None = None)[source]
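The signature above pairs a list of aggregators with an optional list of aggregation functions. A minimal sketch of that idea, assuming steps run in order and that each step takes one function (or `None` for its default), could look as follows. The names `run_pipeline`, `merge_pairs`, and `collapse` are hypothetical and are not part of inseq.

```python
def merge_pairs(values, fn=None):
    # Hypothetical step: merge adjacent pairs, defaulting to max.
    fn = fn or max
    return [fn(values[i:i + 2]) for i in range(0, len(values), 2)]


def collapse(values, fn=None):
    # Hypothetical step: reduce everything to one value, defaulting to sum.
    fn = fn or sum
    return [fn(values)]


def run_pipeline(values, steps, aggregate_fns=None):
    # Assumption: one aggregation function (or None) per step.
    if aggregate_fns is None:
        aggregate_fns = [None] * len(steps)
    if len(aggregate_fns) != len(steps):
        raise ValueError("need one aggregate_fn (or None) per step")
    # Each step consumes the output of the previous one.
    for step, fn in zip(steps, aggregate_fns):
        values = step(values, fn)
    return values
```

For example, `run_pipeline([1, 5, 2, 3], [merge_pairs, collapse])` first reduces the input to `[5, 3]` and then to `[8]`.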
- class inseq.data.aggregator.AggregableMixin[source]
- aggregate(aggregator: AggregatorPipeline | type[Aggregator] | str | Sequence[str | type[Aggregator]] | None = None, aggregate_fn: str | Sequence[str] | None = None, do_pre_aggregation_checks: bool = True, do_post_aggregation_checks: bool = True, **kwargs) AggregableMixinClass [source]
Aggregate outputs using the default or provided aggregator.
- Parameters:
aggregator (AggregatorPipeline or Type[Aggregator] or str or Sequence[str or Type[Aggregator]], optional) – Aggregator pipeline to use. If not provided, the default aggregator pipeline is used.
- Returns:
The aggregated output class.
- Return type:
AggregableMixinClass
- class inseq.data.aggregator.SequenceAttributionAggregator[source]
Aggregates sequence attributions using a custom function. By default, the mean function is used.
Enables aggregation for the FeatureAttributionSequenceOutput class using an aggregation function of choice.
- Parameters:
attr (FeatureAttributionSequenceOutput) – The attribution object to aggregate.
aggregate_fn (Callable, optional) – Function used to aggregate sequence attributions. Defaults to summing over the last dimension and renormalizing by the norm of the source(+target) attributions for granular attributions, no aggregation for token-level attributions.
- classmethod end_aggregation_hook(attr: FeatureAttributionSequenceOutput, **kwargs)[source]
Hook called at the end of the aggregation process.
Use to ensure that the final product of aggregation is compliant with the requirements of individual aggregators.
- classmethod post_aggregate_hook(attr: FeatureAttributionSequenceOutput, **kwargs)[source]
Hook called right after the aggregation function is called.
Verifies that the aggregated object has the correct properties.
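The default described for granular attributions (sum over the last dimension, then renormalize) can be illustrated on plain lists. This is a sketch under stated assumptions: the helper name `sum_and_normalize` is hypothetical, the input holds one list of per-dimension scores per token for a single generation step, and the L2 norm is an assumed choice, since the text above does not say which norm is used.

```python
import math


def sum_and_normalize(granular):
    """Collapse per-dimension scores to one score per token, then rescale.

    granular: list of lists, one inner list of per-dimension scores per token.
    """
    # Sum over the last dimension: one scalar per token.
    summed = [sum(dims) for dims in granular]
    # Assumption: renormalize by the L2 norm of the summed scores.
    norm = math.sqrt(sum(s * s for s in summed))
    if norm == 0.0:
        return summed
    return [s / norm for s in summed]
```

Token-level attributions already hold one score per token, so in this sketch they would be passed through without any aggregation.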
- class inseq.data.aggregator.ContiguousSpanAggregator[source]
Reduces sequence attributions across one or more contiguous spans.
- Parameters:
attr (FeatureAttributionSequenceOutput) – The attribution object to aggregate.
aggregate_fn (Callable, optional) – Function used to aggregate sequence attributions. Defaults to the highest absolute value score across the aggregated span, with original sign preserved (e.g. [0.3, -0.7, 0.1] -> -0.7).
source_spans (tuple of [int, int] or sequence of tuples of [int, int], optional) – Spans to aggregate over for the source sequence. Defaults to no aggregation performed.
target_spans (tuple of [int, int] or sequence of tuples of [int, int], optional) – Spans to aggregate over for the target sequence. Defaults to no aggregation performed.
- classmethod aggregate(attr: FeatureAttributionSequenceOutput, source_spans: tuple[int, int] | Sequence[tuple[int, int]] | None = None, target_spans: tuple[int, int] | Sequence[tuple[int, int]] | None = None, **kwargs)[source]
Spans can be:
- A list of the form [pos_start, pos_end] giving the contiguous positions of the tokens to aggregate, if all values are integers and len(span) < len(original_seq).
- A list of the form [(pos_start_0, pos_end_0), (pos_start_1, pos_end_1)], same as above but for multiple contiguous spans.
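The two span forms and the default aggregation function described above can be sketched on a flat list of scores. The helpers `absmax` and `aggregate_spans` are hypothetical, and treating `pos_end` as exclusive is an assumption that the text above does not confirm.

```python
def absmax(scores):
    # Highest absolute value, with the original sign preserved.
    return max(scores, key=abs)


def aggregate_spans(scores, spans, aggregate_fn=absmax):
    # Accept a single [pos_start, pos_end] span or a list of spans.
    if spans and isinstance(spans[0], int):
        spans = [tuple(spans)]
    out, pos = [], 0
    for start, end in sorted(spans):
        # Copy the scores between spans unchanged.
        out.extend(scores[pos:start])
        # Collapse the span (end assumed exclusive) into a single score.
        out.append(aggregate_fn(scores[start:end]))
        pos = end
    out.extend(scores[pos:])
    return out
```

With this sketch, `aggregate_spans([0.2, 0.3, -0.7, 0.1, 0.5], (1, 4))` collapses positions 1 to 3 into `-0.7` and leaves the other scores untouched.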
- classmethod end_aggregation_hook(attr: FeatureAttributionSequenceOutput, **kwargs)[source]
Hook called at the end of the aggregation process.
Use to ensure that the final product of aggregation is compliant with the requirements of individual aggregators.
- classmethod start_aggregation_hook(attr: FeatureAttributionSequenceOutput, source_spans: tuple[int, int] | Sequence[tuple[int, int]] | None = None, target_spans: tuple[int, int] | Sequence[tuple[int, int]] | None = None, **kwargs)[source]
Hook called at the start of the aggregation process.
Use to ensure a prerequisite that is independent of previous aggregation steps and fundamental to the aggregation process (e.g. parameters are of the correct type). A failure here raises an error before any aggregation step is performed.
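An up-front check on span arguments, of the kind a start hook could perform, might look like the sketch below. The function `validate_spans` and its specific rules (exclusive end, no overlap, bounds within the sequence length) are assumptions for illustration and may differ from what inseq actually verifies.

```python
def validate_spans(spans, seq_len):
    """Return spans as a sorted list of (start, end) tuples, or raise."""
    if spans is None:
        return []  # no aggregation requested
    # A single span is a pair of integers; wrap it in a list.
    if len(spans) == 2 and all(isinstance(p, int) for p in spans):
        spans = [tuple(spans)]
    checked, prev_end = [], 0
    for span in sorted(spans):
        if len(span) != 2 or not all(isinstance(p, int) for p in span):
            raise TypeError(f"span must be a pair of ints, got {span!r}")
        start, end = span
        # Assumed rules: ordered, non-overlapping, inside the sequence.
        if not (prev_end <= start < end <= seq_len):
            raise ValueError(f"invalid or overlapping span {span!r}")
        checked.append((start, end))
        prev_end = end
    return checked
```

Running this before any aggregation step means a malformed span is reported immediately rather than after partial work.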
- class inseq.data.aggregator.SubwordAggregator[source]
Aggregates over subwords by automatically detecting contiguous subword spans.
- Parameters:
attr (FeatureAttributionSequenceOutput) – The attribution object to aggregate.
aggregate_fn (Callable, optional) – Function to aggregate over the subwords. Defaults to the highest absolute value score across the aggregated span, with original sign preserved (e.g. [0.3, -0.7, 0.1] -> -0.7).
aggregate_source (bool, optional) – Whether to aggregate over the source sequence. Defaults to True.
aggregate_target (bool, optional) – Whether to aggregate over the target sequence. Defaults to True.
special_chars (str or tuple of str, optional) – One or more characters used to identify subword boundaries. Defaults to '▁', used by SentencePiece. If is_suffix_symbol=True, then this symbol is used to identify parts to be aggregated (e.g. # in WordPiece, ['phen', '##omen', '##al']). Otherwise, it identifies the roots that should be preserved (e.g. ▁ in SentencePiece, ['▁phen', 'omen', 'al']).
is_suffix_symbol (bool, optional) – Whether the special symbol is used to identify suffixes or prefixes. Defaults to False.
- classmethod aggregate(attr: FeatureAttributionSequenceOutput, aggregate_source: bool = True, aggregate_target: bool = True, special_chars: str | tuple[str, ...] = '▁', is_suffix_symbol: bool = False, **kwargs)[source]
Spans can be:
- A list of the form [pos_start, pos_end] giving the contiguous positions of the tokens to aggregate, if all values are integers and len(span) < len(original_seq).
- A list of the form [(pos_start_0, pos_end_0), (pos_start_1, pos_end_1)], same as above but for multiple contiguous spans.
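The role of special_chars and is_suffix_symbol in detecting subword spans can be sketched as follows. The function `subword_spans` is hypothetical and only mirrors the parameter descriptions above; the exclusive span end and the handling of unmarked tokens are assumptions.

```python
def subword_spans(tokens, special_chars="▁", is_suffix_symbol=False):
    """Return (start, end) spans, end exclusive, covering multi-token words."""
    if isinstance(special_chars, str):
        special_chars = (special_chars,)
    markers = tuple(special_chars)

    def continues_word(token):
        marked = token.startswith(markers)
        # Suffix symbols mark continuations (WordPiece '##omen');
        # prefix symbols mark word starts (SentencePiece '▁phen').
        return marked if is_suffix_symbol else not marked

    spans, start = [], 0
    for i in range(1, len(tokens) + 1):
        # Close the current word at the end or when a new word begins.
        if i == len(tokens) or not continues_word(tokens[i]):
            if i - start > 1:
                spans.append((start, i))
            start = i
    return spans
```

Note that in prefix mode this sketch treats every unmarked token as a continuation, so special tokens such as an end-of-sequence marker would need separate handling.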
- class inseq.data.aggregator.PairAggregator[source]
Aggregates two FeatureAttributionSequenceOutput objects into a single one containing the diff.
- Parameters:
attr (FeatureAttributionSequenceOutput) – The starting attribution object.
paired_attr (FeatureAttributionSequenceOutput) – The attribution object against which the diff is computed, representing a change from attr (e.g. minimal pair edit).
aggregate_fn (Callable, optional) – Function to aggregate elementwise values of the pair. Defaults to the difference between the two elements.
- default_fn(y)
- classmethod pre_aggregate_hook(attr: FeatureAttributionSequenceOutput, paired_attr: FeatureAttributionSequenceOutput, **kwargs)[source]
Hook called right before the aggregation function is called.
Use to ensure a prerequisite that depends on previous aggregation steps and is fundamental to the aggregation process (e.g. the aggregatable object produced by the previous step has the correct shapes).
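The pairwise aggregation and its shape prerequisite can be sketched on two flat score lists. The function `pair_aggregate` is hypothetical, and the direction of the default difference (first minus second) is an assumption, since the description above only says "the difference between the two elements".

```python
def pair_aggregate(scores, paired_scores, aggregate_fn=None):
    """Combine two equally shaped score lists elementwise."""
    if aggregate_fn is None:
        # Assumed default direction: original minus paired.
        aggregate_fn = lambda a, b: a - b
    # Shape check of the kind a pre-aggregation hook would perform.
    if len(scores) != len(paired_scores):
        raise ValueError("paired attributions must have the same length")
    return [aggregate_fn(a, b) for a, b in zip(scores, paired_scores)]
```

Any elementwise function of two values can be passed instead, for instance `lambda a, b: abs(a - b)` to ignore the sign of the change.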