Given a DataFrame with columns ID and email, generate two additional columns: first name and last name, using the email column. For example, the left side of the dot is the first name, and the right side is the last name.

🟡 Medium Coding Junior level
Times asked: 1
Last seen: Jul 2026
First seen: Jul 2026

💡 Model Answer

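You can do this in PySpark by isolating the part of the email before the @ and splitting it at the dot. Splitting the whole address would also split on the dots in the domain, so john.doe@example.com would produce a last name of doe@example. First, import the necessary functions:

python
from pyspark.sql import functions as F

Assuming your DataFrame is named df and has columns ID and email, you can create the new columns as follows:

python
# Keep only the part before '@' so dots in the domain are ignored
local_part = F.substring_index(F.col('email'), '@', 1)

# Split the local part at the dot (the pattern is a regex, so escape it)
split_col = F.split(local_part, r'\.')

# Extract first and last name
result_df = df.withColumn('first_name', split_col.getItem(0))
result_df = result_df.withColumn('last_name', split_col.getItem(1))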

If the part before the @ might contain more than one dot, or no dot at all, decide up front how those rows should be handled. split accepts a limit argument (Spark 3.0 and later) that caps the number of pieces, and regexp_extract lets you capture exactly the groups you want. Each row is processed independently in a single pass, so the time complexity is O(n) in the number of rows, and no shuffle is required. The only extra memory is the two new columns. The same logic can be written in Spark SQL with the substring_index and split functions, and it is straightforward to unit test against a small hand-built DataFrame.

This answer was generated by AI for study purposes. Use it as a starting point — personalize it with your own experience.
