Question 10 of 10
What is data leakage in machine learning, and how can preprocessing steps inadvertently cause it? Describe best practices for preventing data leakage throughout the preprocessing pipeline.
Sample answer preview
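Data leakage occurs when information that would not be available at prediction time, such as statistics from the test set or the target itself, influences how the model is built. The result is an overly optimistic performance estimate that does not hold up on new data. Preprocessing is a common source of subtle leakage that can be difficult to detect: for example, fitting a scaler or imputer on the full dataset before splitting lets test-set statistics shape the training features.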
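To make the preprocessing case concrete, here is a minimal sketch using only the Python standard library. The toy feature values and the helper names `fit_scaler` and `transform` are hypothetical and chosen for illustration. It contrasts scaling statistics learned from the full dataset (leaky) with statistics learned from the training split alone (safe).

```python
import statistics


def fit_scaler(values):
    """Learn the mean and standard deviation from the given values only."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values) or 1.0  # fall back to 1.0 if variance is zero
    return mean, std


def transform(values, mean, std):
    """Standardize values with previously learned statistics."""
    return [(v - mean) / std for v in values]


# Hypothetical toy feature; the last two rows play the role of the held-out test set.
feature = [1.0, 2.0, 3.0, 4.0, 100.0, 200.0]
train, test = feature[:4], feature[4:]

# Leaky: the statistics are computed on train and test together,
# so the test rows influence how the training rows are scaled.
leaky_mean, leaky_std = fit_scaler(feature)

# Safe: the statistics come from the training split only
# and are then reused unchanged on the test split.
safe_mean, safe_std = fit_scaler(train)
train_scaled = transform(train, safe_mean, safe_std)
test_scaled = transform(test, safe_mean, safe_std)
```

In scikit-learn, the same discipline is usually achieved by placing the preprocessing steps inside a `Pipeline` and passing that pipeline to cross-validation, so that each fold refits the transformers on its own training portion.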
data leakage, train-test contamination, target leakage, Pipeline, cross-validation, fit_transform