Suppose the same customer can buy from both websites, and both websites sell the same products. If we want to build a customer dimension, how would we combine these two sources? How would we identify that a customer is the same person, given they may have used different email IDs and phone numbers on each site? What logic would you use to decide that two records refer to the same customer?
💡 Model Answer
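Building a unified customer dimension across two e-commerce sites requires an entity resolution pipeline. We start by normalizing both sources: trimming and case-folding names, lowercasing emails, and standardizing phone numbers and addresses so that formatting differences do not hide real matches. Next comes deterministic matching: records that share an exact email or phone number are linked immediately. Because the same customer may have registered with a different email and phone on each site, exact matching alone is not enough, so the remaining records go through fuzzy matching.
To keep fuzzy matching tractable, we perform blocking to reduce the number of comparisons: records are grouped by a shared attribute such as postal code or first name, and only records within the same block are compared. Within each block, we compute similarity scores for name, address, email, and phone using metrics such as Jaro-Winkler similarity and normalized Levenshtein distance. We then apply a probabilistic model (e.g., Fellegi-Sunter) that weights each field by its discriminative power. The overall match score is compared against two thresholds: pairs at or above the high threshold are merged automatically, pairs between the low and high thresholds go to manual review, and pairs below the low threshold are treated as different customers. For ambiguous cases, we may bring in additional signals such as purchase history similarity or device fingerprinting.
Once pairs are matched, we group linked records into clusters, so that if A matches B and B matches C, all three belong to one customer. Each cluster gets one surrogate key in the customer dimension, and every source identifier is stored in a mapping table against that key. This ensures the dimension reflects a single customer entity while preserving traceability to each source.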
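The steps in this answer can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production matcher: the `Customer` fields, the source names, the weights, and the two thresholds are all hypothetical, the standard library's `difflib.SequenceMatcher` stands in for Jaro-Winkler, and a fixed weighted average stands in for a trained Fellegi-Sunter model.

```python
from collections import defaultdict
from dataclasses import dataclass
from difflib import SequenceMatcher
from itertools import combinations


@dataclass(frozen=True)
class Customer:
    source: str        # hypothetical source names: "site_a", "site_b"
    source_id: str
    name: str
    address: str
    postal_code: str
    email: str
    phone: str


# Assumed values for illustration; real weights would be estimated from data.
WEIGHTS = {"name": 0.45, "address": 0.45, "email": 0.05, "phone": 0.05}
AUTO_MERGE, REVIEW = 0.85, 0.60


def sim(a: str, b: str) -> float:
    """Similarity in [0, 1]; SequenceMatcher is a stand-in for Jaro-Winkler."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def score(a: Customer, b: Customer) -> float:
    # Deterministic rule: a non-empty exact email or phone match is a match.
    if (a.email and a.email.lower() == b.email.lower()) or (a.phone and a.phone == b.phone):
        return 1.0
    # Otherwise, weighted average of per-field similarities.
    return sum(w * sim(getattr(a, f), getattr(b, f)) for f, w in WEIGHTS.items())


def resolve(records):
    """Return ({(source, source_id): surrogate_key}, [pairs needing review])."""
    # Blocking: only records sharing a postal code are compared.
    blocks = defaultdict(list)
    for r in records:
        blocks[r.postal_code].append(r)

    # Union-find so that A~B and B~C end up in one cluster.
    parent = {(r.source, r.source_id): (r.source, r.source_id) for r in records}

    def find(k):
        while parent[k] != k:
            parent[k] = parent[parent[k]]  # path compression
            k = parent[k]
        return k

    review = []
    for block in blocks.values():
        for a, b in combinations(block, 2):
            s = score(a, b)
            ka, kb = (a.source, a.source_id), (b.source, b.source_id)
            if s >= AUTO_MERGE:
                parent[find(ka)] = find(kb)
            elif s >= REVIEW:
                review.append((ka, kb, round(s, 2)))

    # One surrogate key per cluster; the dict doubles as the mapping table.
    cluster_keys, mapping = {}, {}
    for k in parent:
        root = find(k)
        cluster_keys.setdefault(root, len(cluster_keys) + 1)
        mapping[k] = cluster_keys[root]
    return mapping, review


records = [
    Customer("site_a", "A1", "Priya Sharma", "12 MG Road, Bengaluru",
             "560001", "priya@example.com", "9811122233"),
    Customer("site_b", "B7", "Priya Sharma", "12 M.G. Road, Bengaluru",
             "560001", "p.sharma@example.org", "9744455566"),
    Customer("site_b", "B9", "Rahul Verma", "88 Park Street, Bengaluru",
             "560001", "rahul@example.com", "9700011122"),
]
mapping, review = resolve(records)
for source_key, customer_key in mapping.items():
    print(source_key, "->", customer_key)
```

In this made-up data, the two Priya Sharma records have different emails and phone numbers, so the deterministic rule does not fire and the link comes from name and address similarity alone. A real pipeline would also normalize fields before comparison and would tune the weights and thresholds against labeled pairs rather than fixing them by hand.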
This answer was generated by AI for study purposes. Use it as a starting point — personalize it with your own experience.