A data scientist has been given an incomplete notebook from the data engineering team. The notebook uses a Spark DataFrame spark_df on which the data scientist needs to perform further feature engineering. Unfortunately, the data scientist has not yet learned the PySpark DataFrame API. Which of the following blocks of code can the data scientist run to be able to use the pandas API on Spark?
A machine learning engineer is trying to scale a machine learning pipeline by distributing its feature engineering process. Which of the following feature engineering tasks will be the least efficient to distribute?
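One way to reason about this is whether a statistic can be built from small per-partition summaries. The pure-Python sketch below (the partition contents are hypothetical) shows that a mean merges exactly from `(sum, count)` pairs, while a median of per-partition medians generally differs from the true median, so an exact median needs all values brought together, which in Spark means a global sort or shuffle.

```python
from statistics import median

# Hypothetical partitions of one numeric feature, as if spread across executors.
partitions = [[4.0, 1.0, 9.0], [2.0, 7.0], [5.0, 3.0, 8.0, 6.0]]


def partial_mean(part):
    """Per-partition summary for the mean: a (sum, count) pair."""
    return (sum(part), len(part))


def merge_means(partials):
    """Combine (sum, count) pairs; exact, and no raw rows are moved."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


mean_value = merge_means([partial_mean(p) for p in partitions])

# Merging per-partition medians does NOT give the global median in general.
median_of_medians = median(median(p) for p in partitions)

# The exact median requires every value in one place.
true_median = median(v for p in partitions for v in p)
```

Here `mean_value` and `true_median` are both 5.0, while `median_of_medians` is 4.5. This is why exact-median imputation distributes less efficiently than mean imputation; approximate quantile methods trade exactness for a mergeable summary.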
A data scientist has a Spark DataFrame spark_df. They want to create a new Spark DataFrame that contains only the rows from spark_df where the value in column price is greater than 0. Which of the following code blocks will accomplish this task?
A data scientist has produced two models for a single machine learning problem. One of the models performs well when one of the features has a value of less than 5, and the other model performs well when the value of that feature is greater than or equal to 5. The data scientist decides to combine the two models into a single machine learning solution. Which of the following terms is used to describe this combination of models?
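To make the setup concrete, the sketch below routes each input row to one of two models using the threshold of 5 from the question. Both model functions are hypothetical placeholders for trained models, and the feature is assumed to sit at index 0.

```python
from typing import Sequence

THRESHOLD = 5.0


def model_low(x: Sequence[float]) -> float:
    """Hypothetical model that performs well when x[0] < 5."""
    return 2.0 * x[0]


def model_high(x: Sequence[float]) -> float:
    """Hypothetical model that performs well when x[0] >= 5."""
    return 10.0 + x[0]


def combined_predict(x: Sequence[float], feature_index: int = 0) -> float:
    """Send the row to whichever model covers its feature range."""
    if x[feature_index] < THRESHOLD:
        return model_low(x)
    return model_high(x)
```

For example, `combined_predict([3.0, 1.0])` is handled by `model_low` and returns 6.0, while `combined_predict([5.0, 1.0])` falls on the boundary, goes to `model_high`, and returns 15.0.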
A machine learning engineering team has a Job with three successive tasks. Each task runs a single notebook. The team has been alerted that the Job has failed in its latest run. Which of the following approaches can the team use to identify which task is the cause of the failure?
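In the Jobs UI, the run page lists each task with its own status. The same per-task state can also be read programmatically. The sketch below assumes a response shaped like the Databricks Jobs API 2.1 `runs/get` output, where each entry in `tasks` has a `task_key` and a `state` with `life_cycle_state` and, once finished, `result_state`. The task names and values are hypothetical, and the authenticated HTTP request that would fetch the response is not shown.

```python
# Hypothetical, trimmed runs/get-style response for the failed run.
run = {
    "run_id": 1001,
    "state": {"life_cycle_state": "TERMINATED", "result_state": "FAILED"},
    "tasks": [
        {"task_key": "ingest",
         "state": {"life_cycle_state": "TERMINATED", "result_state": "SUCCESS"}},
        {"task_key": "featurize",
         "state": {"life_cycle_state": "TERMINATED", "result_state": "FAILED"}},
        {"task_key": "train",
         "state": {"life_cycle_state": "SKIPPED"}},
    ],
}


def failed_tasks(run_response: dict) -> list:
    """Return the task_key of every task whose result_state is FAILED."""
    return [
        task["task_key"]
        for task in run_response.get("tasks", [])
        if task.get("state", {}).get("result_state") == "FAILED"
    ]


def skipped_tasks(run_response: dict) -> list:
    """Return tasks that never ran, e.g. because an upstream task failed."""
    return [
        task["task_key"]
        for task in run_response.get("tasks", [])
        if task.get("state", {}).get("life_cycle_state") == "SKIPPED"
    ]
```

Separating failed tasks from skipped ones matters with successive tasks: only the first failing task is the likely root cause, and its notebook output is where to look next.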