Welcome to Inseq! 🐛
Inseq is a PyTorch-based hackable toolkit to democratize the study of interpretability for sequence generation models. At the moment, Inseq supports a wide set of models from the 🤗 Transformers library and an ever-growing set of feature attribution methods, leveraging in part the widely used Captum library. For a quick introduction to common use cases, see the Getting started with Inseq page.
PyPI Package: https://pypi.org/project/inseq
MT Gender Bias Demo: oskarvanderwal/MT-bias-demo
With Inseq, feature attribution maps can be saved, reloaded, aggregated and visualized either as HTML (with Jupyter notebook support) or directly in the console using rich. Beyond simple attribution, Inseq also supports step score extraction, attribution aggregation and customization of attributed functions for more advanced use cases. Refer to the guides in the 🐛 Using Inseq section for more details and examples on specific features.
To give a taste of what Inseq can do in a couple of lines of code, here’s a snippet doing source-side attribution of an English-to-Italian translation produced by the model Helsinki-NLP/opus-mt-en-it from 🤗 Transformers using the IntegratedGradients method with 300 integral approximation steps, and returning the attribution convergence delta and token-level prediction probabilities:
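```python
import inseq

# Load the English-to-Italian translation model together with the attribution method
model = inseq.load_model("Helsinki-NLP/opus-mt-en-it", "integrated_gradients")

# Attribute the generated translation to the source tokens
out = model.attribute(
    "The developer argued with the designer because she did not like the design.",
    n_steps=300,  # number of integral approximation steps
    return_convergence_delta=True,  # also return the convergence delta
    step_scores=["probability"],  # token-level prediction probabilities
)

# Visualize the attribution map (HTML in notebooks, rich in the console)
out.show()
```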
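To make the roles of `n_steps` and the convergence delta in the snippet above more concrete, here is a minimal, library-free sketch of integrated gradients on a toy function. It is not how Inseq or Captum implement the method: the midpoint Riemann sum, the hand-written gradient, and the names `integrated_gradients`, `toy_model` and `toy_grad` are all assumptions made for illustration. The delta is computed as the sum of attributions minus the change in output between the baseline and the input, which should approach zero as the number of steps grows.

```python
from typing import Callable, List, Tuple


def integrated_gradients(
    f: Callable[[List[float]], float],
    grad_f: Callable[[List[float]], List[float]],
    x: List[float],
    baseline: List[float],
    n_steps: int = 300,
) -> Tuple[List[float], float]:
    """Approximate integrated gradients with a midpoint Riemann sum.

    Illustrative helper only (hypothetical, not part of Inseq or Captum).
    """
    diffs = [xi - bi for xi, bi in zip(x, baseline)]
    grad_sums = [0.0] * len(x)
    for k in range(n_steps):
        # Midpoint of the k-th sub-interval along the straight path
        alpha = (k + 0.5) / n_steps
        point = [bi + alpha * d for bi, d in zip(baseline, diffs)]
        for i, g in enumerate(grad_f(point)):
            grad_sums[i] += g
    # Average gradient along the path, scaled by the input-baseline difference
    attributions = [d * s / n_steps for d, s in zip(diffs, grad_sums)]
    # Convergence delta: how far the attributions are from summing
    # to the output difference f(x) - f(baseline)
    delta = sum(attributions) - (f(x) - f(baseline))
    return attributions, delta


def toy_model(v: List[float]) -> float:
    # Toy scalar function standing in for a model output
    return v[0] ** 2 + 3.0 * v[1]


def toy_grad(v: List[float]) -> List[float]:
    # Analytical gradient of toy_model
    return [2.0 * v[0], 3.0]


attributions, delta = integrated_gradients(
    toy_model, toy_grad, x=[1.0, 2.0], baseline=[0.0, 0.0], n_steps=300
)
print(attributions, delta)
```

For this toy function the attributions come out close to 1.0 and 6.0, and the delta is close to zero. With a real neural model the integrand is not this well behaved, which is why a larger `n_steps` is typically used and the delta is worth inspecting.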
Inseq is still in early development and is currently maintained by a small team of graduate students working on interpretability for NLP/NLG, led by Gabriele Sarti. We are working hard to add more features and models. If you have any suggestions or feedback, please open an issue on our GitHub repository. Happy hacking! 🐛
- Getting started with Inseq
- Comparing Attributions with PairAggregator
- Using Custom Attribution Targets for Contrastive Feature Attribution
- Attributing Multilingual MT Models
- Locating Factual Knowledge in GPT-2
- Estimating Prediction Confidence with Tuned Lens
- Main Functions
- Framework Classes
- Architectural Classes
- Data Classes
- Feature Attribution
- Gradient Attribution Methods
- Attention Attribution Methods
- Step Functions