Hi @beccasaurus @nicain @lukesneeringer @hfwang, I'm encountering an error while running a custom Dataflow job using a Flex Template on Google Cloud Platform (GCP).
The pipeline was built from a custom Flex Template, for which a JSON metadata file was provided. The pipeline launched successfully and was scheduled to run, but at some point during execution an error occurred and the pipeline failed.
Environment:
Apache Beam version: apache-beam[gcp]==2.44.0
The error message is as follows:
```
Failed to read the job file: gs://dataflow-staging-us-central1-713358881388/staging/template_launches/2023-02-20_18_27_45-8498022740013370621/job_object with error message: (c20b1cad16245ca5): Unable to open template file: gs://dataflow-staging-us-central1-713358881388/staging/template_launches/2023-02-20_18_27_45-8498022740013370621/job_object..
```
I have also verified that the job options are set correctly. Here's how I'm setting them using the `PipelineOptions` class in Python:
```python
from apache_beam.options.pipeline_options import PipelineOptions

pipeline_options = PipelineOptions.from_dictionary({
    'runner': 'DataflowRunner',
    'project': 'testcircle-350611',
    'region': 'us-central1',
    'staging_location': 'gs://dataflow-staging-us-central1-713358881388/staging/',
    'temp_location': 'gs://dataflow-staging-us-central1-713358881388/tmp/',
    'template_location': 'gs://dataflow-staging-us-central1-713358881388/staging/template_launches/',
    'service_account_email': 'xxxx-compute@developer.gserviceaccount.com',
})
```
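For reference, `etl.py` hands these options to the pipeline roughly like this (a minimal sketch with placeholder transforms, not the actual ETL code):

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

def run(pipeline_options: PipelineOptions):
    # Placeholder transforms; the real ETL steps live in etl.py.
    with beam.Pipeline(options=pipeline_options) as p:
        (
            p
            | "Create" >> beam.Create(["a", "b", "c"])
            | "Log" >> beam.Map(print)
        )
```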
Here is my JSON metadata file:
"resources": {
"sdkPipelineOptions": {
"description": "Apache Beam SDK pipeline options",
"properties": {
"saveMainSession": "true",
"runner": "DataflowRunner",
"project": "testcircle-350611",
"region": "us-central1",
"staging_location": "gs://dataflow-staging-us-central1-713358881388/staging/",
"temp_location": "gs://dataflow-staging-us-central1-713358881388/tmp/",
"template_location": "gs://dataflow-staging-us-central1-713358881388/staging/template_launches/",
"service_account_email": "-xxxxcompute@developer.gserviceaccount.com"
}
}
},
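For completeness, a typical build-and-run sequence for a Python flex template looks like the following (the template path, image name, and job name below are placeholders, not my exact commands):

```sh
# Package the template spec, pointing at the container image and metadata file.
gcloud dataflow flex-template build gs://MY_BUCKET/templates/etl.json \
    --image "us-central1-docker.pkg.dev/MY_PROJECT/my-repo/etl:latest" \
    --sdk-language PYTHON \
    --metadata-file metadata.json

# Launch a job from the template spec.
gcloud dataflow flex-template run "etl-job" \
    --template-file-gcs-location gs://MY_BUCKET/templates/etl.json \
    --region us-central1
```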
And here is my Dockerfile:
```Dockerfile
# FROM line assumed: the original snippet omits the base image, but the
# ENTRYPOINT below matches the standard Dataflow Python launcher base.
FROM gcr.io/dataflow-templates-base/python3-template-launcher-base

RUN pip install --upgrade pip
RUN apt-get update && apt-get install -y default-jdk postgresql-client

ARG WORKDIR=/dataflow/template
RUN mkdir -p ${WORKDIR}
WORKDIR ${WORKDIR}

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY etl.py .

ENV FLEX_TEMPLATE_PYTHON_REQUIREMENTS_FILE="${WORKDIR}/requirements.txt"
ENV FLEX_TEMPLATE_PYTHON_PY_FILE="${WORKDIR}/etl.py"

ENTRYPOINT ["/opt/google/dataflow/python_template_launcher"]
```
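The image itself is built and pushed in the usual way (again, the registry path is a placeholder):

```sh
docker build -t "us-central1-docker.pkg.dev/MY_PROJECT/my-repo/etl:latest" .
docker push "us-central1-docker.pkg.dev/MY_PROJECT/my-repo/etl:latest"
```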