How do you handle data dependency versus task dependency scenarios? For example, what do you do if a task completes successfully but the data is incorrect?
💡 Model Answer
Task dependency controls the order in which tasks run: with Airflow's default trigger rule, Task B does not start until Task A has succeeded, but that says nothing about whether the data Task A produced is correct. Data dependency means a downstream task should run only when the data it needs is available and valid. In Airflow, you declare task dependencies in the DAG structure (e.g., task_a >> task_b). For data availability, you can use sensors (e.g., FileSensor, HttpSensor) to wait until a file or endpoint is ready, and XComs to pass small values such as file paths or row counts between tasks. If a task finishes but the data is incorrect, add a validation task after it that raises an exception when a check fails; the exception marks the validation task as failed, triggers any configured retries, and keeps downstream tasks from running. Alternatively, if the data quality checks run in a separate DAG, use an ExternalTaskSensor to wait for that DAG's check task to succeed before the next task runs. Combining task ordering with data validation protects both execution order and data integrity.
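The validation step described above can be sketched as a plain Python function. This is a minimal illustration, not a definitive implementation: the names `DataValidationError` and `validate_rows`, the required fields `id` and `amount`, and the minimum row count are all hypothetical and would depend on your data. In Airflow, a function like this could be passed as the `python_callable` of a `PythonOperator` placed between task_a and task_b.

```python
class DataValidationError(Exception):
    """Raised when upstream output fails a data quality check (hypothetical name)."""


def validate_rows(rows, required_fields=("id", "amount"), min_rows=1):
    """Check that upstream data is present and complete.

    Raises DataValidationError on the first failed check. Inside an Airflow
    task, an uncaught exception fails the task, so downstream tasks that
    depend on it do not run under the default trigger rule.
    """
    rows = list(rows)

    # Check 1: the upstream task actually produced data.
    if len(rows) < min_rows:
        raise DataValidationError(
            f"expected at least {min_rows} row(s), got {len(rows)}"
        )

    # Check 2: every row carries the fields downstream tasks rely on.
    for index, row in enumerate(rows):
        missing = [field for field in required_fields if row.get(field) is None]
        if missing:
            raise DataValidationError(f"row {index} is missing {missing}")

    # Return the row count so the caller can log it or pass it on.
    return len(rows)
```

Raising an exception rather than returning False is a deliberate choice here: a task that returns a falsy value still counts as successful, whereas an exception makes the data problem visible as a task failure.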
This answer was generated by AI for study purposes. Use it as a starting point — personalize it with your own experience.