Document how to use a GCP service account in Airflow with SkyPilot (#6291)
* document how to use a GCP service account in Airflow with SkyPilot
* revert DATA_BUCKET_STORE_TYPE from gcs to s3
* remove unused import
* add missing newline
* assign var.value.SKYPILOT_API_SERVER_ENDPOINT to a variable
* add a note on airflow task virtual env
* fix link
* add code snippet for run_sky_task
* simplify code snippet
* simplify passing in SKYPILOT_API_SERVER_ENDPOINT
* fix readme
* simplify
* move import sky inside _run_sky_task

In this guide, we show how a training workflow involving data preprocessing, training and evaluation can first be easily developed with SkyPilot, and then orchestrated in Airflow.

This example uses a remote SkyPilot API Server to manage shared state across invocations.

**💡 Tip:** SkyPilot also supports defining and running pipelines without Airflow. Check out [Jobs Pipelines](https://skypilot.readthedocs.io/en/latest/examples/managed-jobs.html#job-pipelines) for more information.
## Why use SkyPilot with Airflow?
In AI workflows, **the transition from development to production is hard**.

Workflow development happens ad-hoc, with a lot of interaction required
with the code and data. When moving this to an Airflow DAG in production, managing dependencies, environments and the
infra requirements of the workflow gets complex. Porting the code to Airflow requires significant time to test and
validate any changes, often requiring re-writing the code as Airflow operators.

**SkyPilot seamlessly bridges the dev -> production gap**.

SkyPilot can operate on any of your infra, allowing you to package and run the same code that you ran during development on a
production Airflow cluster. Behind the scenes, SkyPilot handles environment setup, dependency management, and infra orchestration, allowing you to focus on your code.

Once your API server is deployed, you will need to configure Airflow to use it. Set the `SKYPILOT_API_SERVER_ENDPOINT` variable in Airflow - it will be used by the `run_sky_task` function to send requests to the API server:
```bash
airflow variables set SKYPILOT_API_SERVER_ENDPOINT http://<skypilot-api-server-endpoint>
```
You can also use the Airflow web UI to set the variable.
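For reference, here is a minimal sketch (not the exact code from `sky_train_dag.py`) of the two standard Airflow ways the DAG code can then pick up this variable:

```python
# Minimal sketch: reading the endpoint variable set above from within a DAG file.
from airflow.models import Variable

# Option 1: resolved when the DAG file is parsed.
endpoint = Variable.get('SKYPILOT_API_SERVER_ENDPOINT')

# Option 2: resolved at run time via Jinja templating in a templated task argument.
templated_endpoint = '{{ var.value.SKYPILOT_API_SERVER_ENDPOINT }}'
```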
Once we have developed the tasks, we can seamlessly run them in Airflow.

1. **No changes required to our tasks** - we use the same YAMLs we wrote in the previous step to create an Airflow DAG in `sky_train_dag.py`.
2. **Airflow native logging** - SkyPilot logs are written to container stdout, which is captured as task logs in Airflow and displayed in the UI.
3. **Easy debugging** - If a task fails, you can independently run the task using `sky launch` to debug the issue. SkyPilot will recreate the environment in which the task failed.

Here's a snippet of the DAG declaration in [sky_train_dag.py](https://github.com/skypilot-org/skypilot/blob/master/examples/airflow/sky_train_dag.py):
```python
with DAG(dag_id='sky_train_dag', default_args=default_args,
         catchup=False) as dag:
    # Path to SkyPilot YAMLs. Can be a git repo or local directory.
    ...
```
Behind the scenes, `run_sky_task` uses the Airflow native [PythonVirtualenvOperator](https://airflow.apache.org/docs/apache-airflow-providers-standard/stable/operators/python.html#pythonvirtualenvoperator) (`@task.virtualenv`), which creates a Python virtual environment with `skypilot` installed. The task needs to run in a virtual environment because there is a dependency conflict between the latest `skypilot` and `airflow` Python packages. All SkyPilot API calls are made to the remote API server, which is configured using the `SKYPILOT_API_SERVER_ENDPOINT` variable.
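For illustration, here is a rough, self-contained sketch of such a virtualenv-wrapped task. It is not the exact helper from `sky_train_dag.py`: the requirements list, argument names, cluster naming, and the use of the request-ID based SDK calls (`sky.launch`/`sky.down` consumed by `sky.stream_and_get`) are assumptions.

```python
# Rough sketch of a virtualenv-wrapped SkyPilot task (assumptions noted above).
from airflow.decorators import task


@task.virtualenv(requirements=['skypilot[gcp]'], system_site_packages=False)
def run_sky_task(yaml_path: str, endpoint: str):
    """Launch one SkyPilot YAML against the remote API server and wait for it."""
    import os
    import uuid

    # Import sky inside the function so it resolves inside the virtualenv,
    # not in the Airflow worker's own environment.
    import sky

    # Point the SkyPilot client at the remote API server.
    os.environ['SKYPILOT_API_SERVER_ENDPOINT'] = endpoint

    sky_task = sky.Task.from_yaml(yaml_path)
    cluster_name = f'train-dag-{uuid.uuid4().hex[:6]}'
    try:
        # Launch the task and stream its logs into the Airflow task log.
        sky.stream_and_get(sky.launch(sky_task, cluster_name=cluster_name))
    finally:
        # Tear the cluster down whether the step succeeded or failed.
        sky.stream_and_get(sky.down(cluster_name))


# Inside `with DAG(...) as dag:` the steps can then be chained, e.g.:
#   preprocess = run_sky_task.override(task_id='data_preprocess')(
#       'data_preprocessing.yaml', endpoint)
#   train = run_sky_task.override(task_id='train')('train.yaml', endpoint)
#   evaluate = run_sky_task.override(task_id='eval')('eval.yaml', endpoint)
#   preprocess >> train >> evaluate
```

Keeping `import sky` inside the function ensures the dependency is resolved in the virtual environment rather than in the Airflow scheduler or worker environment.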
2. Run `airflow dags list` to confirm that the DAG is loaded.
3. Find the DAG in the Airflow UI (typically http://localhost:8080) and enable it. The UI may take a couple of minutes to reflect the changes. Force unpause the DAG if it is paused with `airflow dags unpause sky_train_dag`.
4. Trigger the DAG from the Airflow UI using the `Trigger DAG` button.
5. Navigate to the run in the Airflow UI to see the DAG progress and logs of each task.

If a task fails, SkyPilot will automatically tear down the cluster.

Next, we will define `data_preprocessing_gcp_sa.yaml`, which contains small modifications to `data_preprocessing.yaml` that will use our GCP service account. The key changes needed here are to mount the GCP service account JSON key to our SkyPilot cluster, and to activate it using `gcloud auth activate-service-account`.

We will also need a new task to read the GCP service account JSON key from our Airflow connection, and then change the preprocess task in our DAG to refer to this new YAML file.
```python
with DAG(dag_id='sky_train_dag', default_args=default_args,
         catchup=False) as dag:
    ...
```
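Below is a minimal sketch of what that new task could look like; the connection id and extra-field name used here are illustrative assumptions, not the exact names from the example.

```python
# Minimal sketch (illustrative names): read the service account key from an
# Airflow connection so it can be handed to the SkyPilot task.
from airflow.decorators import task
from airflow.hooks.base import BaseHook


@task
def get_gcp_service_account_json():
    """Return the GCP service account JSON key stored in an Airflow connection.

    Assumes the key was pasted into the connection's Extra field under a
    'keyfile_dict' entry on a connection with id 'gcp_sa_key'.
    """
    conn = BaseHook.get_connection('gcp_sa_key')
    return conn.extra_dejson['keyfile_dict']


# In the DAG body, the preprocess step then points at the new YAML and is given
# the key (assuming `run_sky_task` is extended to forward it, e.g. as an env
# variable that the YAML writes to a file and activates with
# `gcloud auth activate-service-account`):
#
#   preprocess = run_sky_task.override(task_id='data_preprocess')(
#       'data_preprocessing_gcp_sa.yaml', endpoint,
#       gcp_service_account_json=get_gcp_service_account_json())
```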
## Future work: a native Airflow Executor built on SkyPilot
Currently this example relies on a helper `run_sky_task` method to wrap the SkyPilot invocation in `@task`, but in the future SkyPilot could provide a native Airflow Executor.