Given a file containing JSON records, de‑duplicate the contents based on event_id. If multiple records share the same event_id, keep only one.

The approach is the same as in question 3. Assuming the file holds one JSON object per line (JSON Lines), read it line by line, parse each line, and store the first record seen for each event_id in a dictionary keyed by that id; later duplicates are skipped. This is a single pass, so the running time is linear in the number of records. Example in Python:

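```python
import json

seen = {}  # event_id -> first record seen with that id
with open('events.txt', encoding='utf-8') as f:
    for line in f:
        line = line.strip()
        if not line:  # skip blank lines, which json.loads would reject
            continue
        rec = json.loads(line)
        eid = rec['event_id']  # raises KeyError if a record has no event_id
        if eid not in seen:  # keep only the first occurrence
            seen[eid] = rec
# Process or write seen.values(); dicts preserve insertion order,
# so records come out in first-seen order.
```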
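A minimal sketch of the same loop wrapped as a function over any iterable of lines, with the surviving records written back out as JSON Lines. The names `dedupe_first` and `write_jsonl` are hypothetical, and the sketch assumes every non-blank line is a JSON object with an `event_id` field. Taking an iterable instead of a filename makes it easy to try on in-memory data:

```python
import io
import json
from typing import Iterable, List, TextIO


def dedupe_first(lines: Iterable[str]) -> List[dict]:
    """Return the first record seen for each event_id, in first-seen order."""
    seen: dict = {}
    for line in lines:
        line = line.strip()
        if not line:  # ignore blank lines
            continue
        rec = json.loads(line)
        seen.setdefault(rec["event_id"], rec)  # no-op if the id is already present
    return list(seen.values())


def write_jsonl(records: Iterable[dict], out: TextIO) -> None:
    """Write one JSON object per line to a text stream."""
    for rec in records:
        out.write(json.dumps(rec) + "\n")


# Usage: a file object from open(...) works the same way as this StringIO.
data = io.StringIO(
    '{"event_id": 1, "v": "a"}\n'
    '{"event_id": 2, "v": "b"}\n'
    '\n'
    '{"event_id": 1, "v": "c"}\n'
)
unique = dedupe_first(data)
buf = io.StringIO()
write_jsonl(unique, buf)
print(buf.getvalue(), end="")
```

`dict.setdefault` only inserts when the key is missing, so it is a compact way to express "keep the first occurrence".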

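For large files, you don't need to hold whole records in memory: keep only the event_ids seen so far (for example in a set) and write each unique record to the output file as soon as it is read. If you need to keep the last occurrence of each event_id instead of the first, overwrite the dictionary entry rather than skipping duplicates; that variant has to hold one record per event_id until the whole file has been read, so it cannot stream its output the same way. Either way, the approach runs in O(n) time on average (hash lookups are amortized O(1)) and O(k) space, where n is the number of records and k is the number of unique event_ids.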
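A sketch of both variants described above, under the same assumptions (one JSON object per line, each with an `event_id`). The function names `stream_dedupe` and `dedupe_keep_last` are hypothetical. The streaming version keeps only a set of ids and copies each first-seen line straight to the output; the keep-last version overwrites dictionary entries:

```python
import io
import json
from typing import Iterable, List, TextIO


def stream_dedupe(src: Iterable[str], dst: TextIO) -> int:
    """Write the first record for each event_id to dst as soon as it is read.

    Only the ids are kept in memory, not whole records.
    Returns the number of records written.
    """
    seen_ids = set()
    written = 0
    for line in src:
        line = line.strip()
        if not line:
            continue
        eid = json.loads(line)["event_id"]
        if eid in seen_ids:
            continue  # duplicate: skip it
        seen_ids.add(eid)
        dst.write(line + "\n")  # copy the original line through unchanged
        written += 1
    return written


def dedupe_keep_last(lines: Iterable[str]) -> List[dict]:
    """Keep the last record per event_id by overwriting the dict entry.

    Overwriting an existing key keeps its original position, so the
    result is in first-seen order but holds each id's latest record.
    """
    latest: dict = {}
    for line in lines:
        line = line.strip()
        if line:
            rec = json.loads(line)
            latest[rec["event_id"]] = rec  # later records replace earlier ones
    return list(latest.values())


data = [
    '{"event_id": 1, "v": "a"}',
    '{"event_id": 2, "v": "b"}',
    '{"event_id": 1, "v": "c"}',
]
out = io.StringIO()
n_written = stream_dedupe(data, out)
print(n_written)  # 2
print([r["v"] for r in dedupe_keep_last(data)])  # ['c', 'b']
```

Copying the raw line instead of re-serializing the parsed object keeps the output byte-for-byte identical to the input records.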
