Given a file containing JSON records, de‑duplicate the contents based on event_id. If multiple records share the same event_id, keep only one.

🟡 Medium | Coding | Junior level
Times asked: 1
First seen: Mar 2026
Last seen: Mar 2026

💡 Model Answer

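Assuming the file is in JSON Lines format (one JSON object per line), read each line, parse it, and store the first occurrence of each event_id in a dictionary keyed by event_id. Any later record whose event_id is already in the dictionary is skipped. Each record is processed once, so the running time is linear in the number of records. Example in Python: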

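```python
import json

seen = {}  # event_id -> first record seen with that id
with open('events.txt', encoding='utf-8') as f:
    for line in f:
        line = line.strip()
        if not line:  # skip blank lines
            continue
        rec = json.loads(line)
        eid = rec['event_id']
        if eid not in seen:  # keep only the first occurrence
            seen[eid] = rec
# Process or write seen.values()
```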
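The snippet above leaves the output step as a comment. Here is a minimal sketch of one way to finish it, assuming JSON Lines for both input and output. The function name `dedupe_first_seen` is hypothetical, and it takes file-like objects so it can be tried with in-memory `io.StringIO` streams instead of files on disk.

```python
import io
import json
from typing import TextIO


def dedupe_first_seen(infile: TextIO, outfile: TextIO) -> int:
    """Copy JSON Lines from infile to outfile, keeping the first record per event_id.

    Returns the number of unique records written.
    """
    seen = {}  # event_id -> first record with that id (dicts keep insertion order)
    for line in infile:
        line = line.strip()
        if not line:  # skip blank lines
            continue
        rec = json.loads(line)
        seen.setdefault(rec['event_id'], rec)  # no-op if the id is already present
    for rec in seen.values():
        outfile.write(json.dumps(rec) + '\n')
    return len(seen)


# Usage with in-memory streams standing in for real files:
src = io.StringIO(
    '{"event_id": 1, "v": "a"}\n'
    '{"event_id": 2, "v": "b"}\n'
    '{"event_id": 1, "v": "c"}\n'
)
dst = io.StringIO()
count = dedupe_first_seen(src, dst)
print(count)  # 2
print(dst.getvalue())
```

Because Python dictionaries preserve insertion order (guaranteed since Python 3.7), the output lists records in the order their event_id first appeared.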

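For large files, avoid holding whole records in memory: keep only the event_id values in a set and write each record to the output file as soon as its event_id is seen for the first time. Memory then grows with the number of unique IDs rather than with the size of the records. If you need to keep the most recent record instead of the first, overwrite the dictionary entry on every occurrence rather than skipping duplicates; because the latest record for an ID is only known once the whole file has been read, this variant cannot stream its output. Either way, the approach runs in O(n) expected time (dictionary and set lookups are O(1) on average) and O(k) space, where n is the number of records and k is the number of unique event_ids.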
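A minimal sketch of the two variants described above, assuming JSON Lines input where every record has an event_id key. The function names `dedupe_streaming` and `dedupe_keep_latest` are hypothetical, and `io.StringIO` objects stand in for real input and output files.

```python
import io
import json
from typing import TextIO


def dedupe_streaming(infile: TextIO, outfile: TextIO) -> int:
    """Keep the first record per event_id, writing it out immediately.

    Only the ids are held in memory, so memory grows with the number of unique ids.
    """
    seen_ids = set()
    written = 0
    for line in infile:
        line = line.strip()
        if not line:  # skip blank lines
            continue
        rec = json.loads(line)
        eid = rec['event_id']
        if eid in seen_ids:  # duplicate: skip it
            continue
        seen_ids.add(eid)
        outfile.write(json.dumps(rec) + '\n')
        written += 1
    return written


def dedupe_keep_latest(infile: TextIO) -> dict:
    """Return a mapping of event_id -> last record seen for that id.

    The result is only final after the whole input is read, so this cannot stream.
    """
    latest = {}
    for line in infile:
        line = line.strip()
        if not line:  # skip blank lines
            continue
        rec = json.loads(line)
        latest[rec['event_id']] = rec  # overwrite: later records win
    return latest


data = (
    '{"event_id": "a", "n": 1}\n'
    '{"event_id": "b", "n": 2}\n'
    '{"event_id": "a", "n": 3}\n'
)
out = io.StringIO()
print(dedupe_streaming(io.StringIO(data), out))  # 2
latest = dedupe_keep_latest(io.StringIO(data))
print(latest['a']['n'])  # 3
```

The set-based version holds only one parsed record at a time, while the keep-latest version holds one record per unique event_id until the input is exhausted.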

This answer was generated by AI for study purposes. Use it as a starting point — personalize it with your own experience.
