Given a file containing JSON records, de‑duplicate the contents based on event_id. If multiple records share the same event_id, keep only one.
💡 Model Answer
To de‑duplicate the file, read each line, parse the JSON, and use a dictionary keyed by event_id to keep the first occurrence. If a duplicate event_id is encountered, skip it or replace it based on the desired policy. This approach runs in O(n) time and O(k) space, where n is the number of lines and k is the number of unique event_ids. In Python:
import json

unique = {}
with open('events.txt', 'r') as f:
    for line in f:
        record = json.loads(line)
        eid = record['event_id']
        if eid not in unique:
            unique[eid] = record
# Write back or process unique.values()

If the file is large, you can stream the output directly to a new file to avoid holding all records in memory. For very large datasets, consider using a database or a streaming framework like Apache Beam, where you can apply a distinct/deduplicate transform keyed on event_id. The key idea is to maintain a hash set or map of seen event_ids to filter duplicates efficiently.
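The streaming variant mentioned above can be sketched as follows. This keeps only a set of seen event_ids in memory and writes each kept line straight to the output; the file names and sample records here are hypothetical, and the input is assumed to be newline-delimited JSON:

```python
import json
import os
import tempfile

# Create a small sample input file (hypothetical data, for illustration only)
src = tempfile.NamedTemporaryFile('w', suffix='.jsonl', delete=False)
src.write('{"event_id": 1, "v": "a"}\n')
src.write('{"event_id": 2, "v": "b"}\n')
src.write('{"event_id": 1, "v": "c"}\n')  # duplicate of event_id 1
src.close()

dst_path = src.name + '.deduped'
seen = set()
with open(src.name) as fin, open(dst_path, 'w') as fout:
    for line in fin:
        eid = json.loads(line)['event_id']
        if eid not in seen:   # first occurrence wins
            seen.add(eid)
            fout.write(line)  # stream straight to the output file

with open(dst_path) as f:
    kept = [json.loads(l)['event_id'] for l in f]
print(kept)  # prints [1, 2]

os.remove(src.name)
os.remove(dst_path)
```

Because only the set of ids is held in memory (not the records themselves), this scales to files much larger than RAM as long as the number of unique event_ids fits in memory.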