Question 10 of 10

What is data leakage in machine learning, and how can preprocessing steps inadvertently cause it? Describe best practices for preventing data leakage throughout the preprocessing pipeline.

Sample answer preview

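Data leakage occurs when information that would not be available at prediction time is used to build the model. This includes statistics or labels drawn from the validation or test data. The result is an overly optimistic performance estimate that does not generalize to new data. Preprocessing is a common source of subtle leakage that can be difficult to detect. For example, fitting a scaler, imputer, or feature selector on the full dataset before splitting lets test-set information shape the transformation that the model is trained on.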
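A minimal sketch of how preprocessing leaks during cross-validation, assuming scikit-learn. The pure-noise dataset, its size, and the choice of k=50 selected features are illustrative assumptions, not part of the original answer. The "leaky" path fits scaling and feature selection on every row before cross-validating. The "honest" path wraps the same steps in a Pipeline so they are refit inside each training fold. The last few lines show the equivalent rule for a single train/test split: call fit_transform on the training split only, then call transform on the test split.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Illustrative pure-noise data: no feature is related to the label, so an
# honest accuracy estimate should land near chance level (about 0.5).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5000))
y = rng.permutation(np.repeat([0, 1], 30))

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# LEAKY: scaling and feature selection are fit on ALL rows, so the labels
# of the future validation folds influence which features are kept.
X_leaky = SelectKBest(f_classif, k=50).fit_transform(StandardScaler().fit_transform(X), y)
leaky_scores = cross_val_score(LogisticRegression(max_iter=1000), X_leaky, y, cv=cv)

# CORRECT: the same steps inside a Pipeline. cross_val_score refits the
# scaler and selector on each training fold and only transforms the held-out fold.
pipe = make_pipeline(
    StandardScaler(),
    SelectKBest(f_classif, k=50),
    LogisticRegression(max_iter=1000),
)
honest_scores = cross_val_score(pipe, X, y, cv=cv)

print(f"leaky CV accuracy:  {leaky_scores.mean():.2f}")
print(f"honest CV accuracy: {honest_scores.mean():.2f}")

# Outside cross-validation: split first, call fit_transform on the training
# split only, and apply transform (never a refit) to the held-out split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learns per-feature mean/std from training rows
X_test_scaled = scaler.transform(X_test)        # reuses the training statistics
```

Because the features are noise, any accuracy well above 0.5 on the leaky path comes from validation-fold labels that influenced feature selection. The Pipeline avoids this by construction: every fit call happens inside a training fold. The same Pipeline object can be passed unchanged to cross_val_score or a hyperparameter search.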

Key concepts: data leakage, train-test contamination, target leakage, Pipeline, cross-validation, fit_transform
