Question for Data Scientists: Data Science in Shoelaces
Let’s see how good your intuition is at solving real-world data problems!
enjoyable=> {
have fun;
}
Nike has hired you as a data science consultant to help them save money on shoe materials. Your first assignment is to review a model one of their employees built to predict how many shoelaces they’ll need each month. The features going into the machine learning model include:
- The current month (January, February, etc.)
- Advertising expenditures in the previous month
- Various macroeconomic features (like the unemployment rate) as of the beginning of the current month
- The amount of leather they ended up using in the current month
The results show the model is almost perfectly accurate if you include the feature about how much leather they used. But it is only moderately accurate if you leave that feature out. You realize this is because the amount of leather they use is a perfect indicator of how many shoes they produce, which in turn tells you how many shoelaces they need.
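A useful habit for this kind of question is to ask, for every feature, when its value actually becomes known. The Python sketch below encodes that check. It rests on two assumptions the post does not state: forecasts are made at the start of each month, and the leather figure is only final once that month is over. The `Feature` class, the `leaky_features` helper, the feature names and the availability values are all hypothetical, chosen only to mirror the list above.

```python
from dataclasses import dataclass

# Hypothetical timeline, in months relative to the month being forecast:
#   0 = known at the start of that month (when the forecast is assumed to be made)
#   1 = known only after that month has ended
PREDICTION_TIME = 0


@dataclass(frozen=True)
class Feature:
    name: str
    available_at: int  # earliest point on the timeline above when the value exists


def leaky_features(features, prediction_time=PREDICTION_TIME):
    """Return the names of features whose values are not yet known at prediction time."""
    return [f.name for f in features if f.available_at > prediction_time]


# Assumed availability times, based on the feature descriptions in the post.
shoelace_features = [
    Feature("current_month", 0),
    Feature("ad_spend_previous_month", 0),
    Feature("unemployment_rate_start_of_month", 0),
    Feature("leather_used_current_month", 1),
]

print(leaky_features(shoelace_features))  # prints ['leather_used_current_month']
```

The answer hinges on the assumed timeline: if leather usage were planned and fixed before the month began, its `available_at` would be 0 and the check would no longer flag it.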
After reading the scenario, tell me: which feature (or features) is a source of data leakage?
Explain why and how in the comments!