
Tomek Links in Machine Learning: Complete Guide (With Python Code)
Tomek Links are a data cleaning technique used to improve classification models, especially when working with imbalanced datasets. A Tomek Link is a pair of samples from opposite classes that are each other’s nearest neighbors; removing one or both members of each pair clears ambiguous or noisy points near the decision boundary.
If your model struggles with overlapping classes or poor minority-class performance, applying Tomek Links may help, although the size of the gain depends on how much the classes actually overlap.
What Are Tomek Links in Machine Learning?
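Tomek Links are pairs of samples from opposite classes that are each other’s nearest neighbors. In other words, for a linked pair (a, b), no other sample in the dataset is closer to a than b is, and none is closer to b than a is. These pairs typically lie near the decision boundary and often represent noise or class overlap.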
Learn more about nearest neighbors here: k-nearest neighbors algorithm.

Why Use Tomek Links?
- Improve class separation
- Reduce noise near boundaries
- Boost minority class performance
- Enhance model generalization
They are most commonly applied as a cleaning step before training on imbalanced datasets.
How Tomek Links Work
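- Find the nearest neighbor of every sample
- Identify pairs that are mutual nearest neighbors and belong to different classes
- Mark these pairs as Tomek Links
- Remove the majority-class sample from each link (or both samples, depending on the chosen strategy)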
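The steps above can be sketched in a few lines of NumPy. This is only an illustrative brute-force version, not how imbalanced-learn implements it; the function names `find_tomek_links` and `remove_majority_in_links` and the toy data are made up for this example, and Euclidean distance is assumed.

```python
import numpy as np


def find_tomek_links(X, y):
    """Return index pairs (i, j) with i < j that form Tomek Links."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Pairwise squared Euclidean distances (brute force, O(n^2) memory).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbor
    nearest = dist.argmin(axis=1)   # step 1: nearest neighbor of each sample
    links = []
    for i, j in enumerate(nearest):
        # Steps 2-3: mutual nearest neighbors with different labels.
        if i < j and nearest[j] == i and y[i] != y[j]:
            links.append((i, int(j)))
    return links


def remove_majority_in_links(X, y, majority_label):
    """Step 4: drop the majority-class member of every Tomek Link."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    drop = {i if y[i] == majority_label else j
            for i, j in find_tomek_links(X, y)}
    keep = np.array([k not in drop for k in range(len(y))])
    return X[keep], y[keep]


# Hypothetical 1-D toy data: samples 3 and 4 sit right at the class boundary.
X_toy = [[0.0], [1.0], [1.8], [2.5], [2.7], [5.0], [5.4]]
y_toy = [0, 0, 0, 0, 1, 1, 1]

links = find_tomek_links(X_toy, y_toy)
X_clean, y_clean = remove_majority_in_links(X_toy, y_toy, majority_label=0)
print("Tomek Links:", links)
print("Labels after cleaning:", y_clean.tolist())
```

Samples 5 and 6 are also mutual nearest neighbors, but they share a label, so they are not a Tomek Link. For real datasets a nearest-neighbor index is preferable to the full distance matrix used here.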
Python Implementation
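from collections import Counter

from sklearn.datasets import make_classification
from imblearn.under_sampling import TomekLinks

# Synthetic 2-D dataset with a roughly 90/10 class split.
# n_informative and n_redundant are set explicitly because the defaults
# (2 informative + 2 redundant features) do not fit into n_features=2.
X, y = make_classification(
    n_samples=1000,
    weights=[0.9, 0.1],
    n_features=2,
    n_informative=2,
    n_redundant=0,
    random_state=42,
)
print("Before:", Counter(y))

# By default, only the majority-class member of each Tomek Link is removed.
tl = TomekLinks()
X_res, y_res = tl.fit_resample(X, y)
print("After:", Counter(y_res))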
When to Use Tomek Links in Machine Learning
- Imbalanced classification problems
- Noisy datasets
- Overlapping distributions
Comparison vs SMOTE
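SMOTE oversamples the minority class by creating synthetic samples, while Tomek Links undersample by removing samples that form cross-class nearest-neighbor pairs.
Because one adds data and the other cleans it, they are often combined: SMOTE balances the classes first, then Tomek Links remove the overlapping pairs that remain.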
Combine SMOTE + Tomek Links
from imblearn.combine import SMOTETomek

# Oversample with SMOTE, then remove Tomek Links (reuses X, y from above).
smt = SMOTETomek(random_state=42)
X_res, y_res = smt.fit_resample(X, y)
FAQ
What is a Tomek Link?
A pair of samples from different classes that are each other’s nearest neighbors.
Do they remove minority samples?
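Usually no. With the default settings in imbalanced-learn, only the majority-class member of each link is removed. Setting sampling_strategy="all" removes both members, including the minority sample.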
Are they better than random undersampling?
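They are more targeted, because they remove only samples that sit in cross-class nearest-neighbor pairs instead of discarding samples at random. However, they usually remove far fewer samples, so they rarely balance the classes on their own. Which works better depends on the dataset.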
Final Thoughts
Tomek Links are a simple way to clean boundary noise from a dataset and can improve classification performance. They are especially useful in imbalanced problems and combine well with other resampling techniques such as SMOTE.

