Unable to upload PDF files to Shopify from an S3 bucket using the GraphQL API - Python code

Topic summary

A developer is attempting to bulk upload 5000+ products daily to Shopify, each with an associated PDF file. Their workflow involves:

Current Process:

  • Upload PDFs to AWS S3 staging bucket
  • Use Shopify’s GraphQL API to link files from S3 to products via variant SKU
  • File sizes range from <1MB to 20MB+

The Problem:
While the API returns successful responses (status 200), the files are not actually appearing in Shopify. The response indicates files were “successfully submitted for uploading” but they fail to complete.

Code Details:
The implementation uses the fileCreate GraphQL mutation with originalSource pointing to S3 URLs. Response data shows file IDs are being generated, suggesting the API accepts the request, but the actual file upload doesn’t complete.

Status:
The issue remains unresolved. The disconnect between successful API responses and failed uploads suggests a potential problem with S3 file accessibility, CORS configuration, or Shopify’s ability to retrieve files from the provided URLs. No solutions or responses from other users have been posted yet.

Summarized with AI on November 10. AI used: claude-sonnet-4-5-20250929.
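The S3-accessibility hypothesis in the summary above can be pre-checked before calling `fileCreate`: Shopify's servers must be able to fetch the `originalSource` URL anonymously, so an anonymous HEAD request from anywhere is a cheap proxy for whether the fetch will succeed. A minimal sketch (helper names are hypothetical, not from the post's code):

```python
# Sketch of a pre-flight check for the S3-accessibility theory above.
# Shopify must be able to GET the originalSource URL anonymously, so an
# anonymous HEAD request is a cheap proxy. Helper names are hypothetical.
from urllib.parse import urlparse
from urllib.request import Request, urlopen


def parse_s3_url(url):
    """Split a virtual-hosted-style S3 URL into (bucket, region, key)."""
    parsed = urlparse(url)
    host = parsed.netloc.split(".")  # e.g. bucket.s3.us-east-1.amazonaws.com
    bucket = host[0]
    region = host[2] if len(host) > 2 and host[1] == "s3" else None
    return bucket, region, parsed.path.lstrip("/")


def is_publicly_readable(url, timeout=10):
    """Anonymous HEAD; True only if the object is world-readable."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as resp:
            return resp.status == 200
    except Exception:
        return False
```

If `is_publicly_readable` returns False for the staged URLs, Shopify's asynchronous fetch would fail in exactly the way described: a clean `fileCreate` response followed by files stuck in processing and then marked failed.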

Hi, I have to add a large number of products (5000+) to Shopify on a daily basis.

Almost all of the products have an associated PDF file that needs to be uploaded as well. To handle this, I have code that uploads all the PDF files to a staging directory in an AWS S3 bucket, and then uploads the files from S3 to Shopify, keeping each one linked to its variant SKU. The variant SKU is later used to link these files to the newly created products.

The problem I am facing is that my code runs fine and the responses from the API look OK, but the files fail to appear on Shopify. Most of the files are under 5 MB, some are over 20 MB, and some are under 1 MB, but all of them fail to upload.

This is my current code,

def upload_file_to_shopify(original_source, alt="Uploaded file"):
    sku = get_sku_from_url(original_source)
    variables = {
        "files": [{
            "alt": alt,
            "contentType": "FILE",
            "originalSource": original_source
        }]
    }

    payload = {
        "query": graphql_mutation,
        "variables": variables,
    }

    response = requests.post(store_url, json=payload, headers=headers)
    check_rate_limit(response.headers)

    if response.status_code == 200:
        response_data = response.json()
        upload_user_guide_file_response_logs.append(f"For {sku} -> \n\n {response_data} \n\n")

        # fileCreate can return HTTP 200 with userErrors in the body, so
        # check them before treating the submission as successful.
        file_create = response_data.get("data", {}).get("fileCreate", {}) or {}
        user_errors = file_create.get("userErrors", [])
        if user_errors:
            total_logs.append(f"fileCreate userErrors for {sku}: {user_errors}")
            logging.warning(f"fileCreate userErrors for {sku}: {user_errors}")
            return (False, sku)

        file_ids = [file['id'] for file in file_create.get("files", [])]
        if file_ids:
            sku_file_id_mapping[sku] = file_ids[0]

        # Note: a 200 here only means the file was *submitted*; Shopify
        # fetches and processes the originalSource asynchronously.
        logging.info(f"Success: {sku} submitted successfully.")
        total_logs.append(f"Success: {sku} submitted successfully.")
        return (True, sku)
    else:
        total_logs.append(f"Failed to upload {sku}.")
        total_logs.append(response.text)
        logging.error(f"Failed to upload {sku}.")
        return (False, sku)

successful_skus = []
failed_skus = []

with ThreadPoolExecutor(max_workers=3) as executor:
    futures = [executor.submit(upload_file_to_shopify, s3_path, "Alt text for your file") for s3_path in successful_s3_paths]
    for future in as_completed(futures):
        success, sku = future.result()
        if success:
            successful_skus.append(sku)
        else:
            failed_skus.append(sku)
 
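Because `fileCreate` only stages the file and Shopify fetches it asynchronously, the 200 response above cannot confirm the upload by itself. A hedged sketch of a follow-up status poll — `fileStatus` and `fileErrors` are fields I understand to exist on the Admin GraphQL API's `File` interface, but they should be verified against the API version in use:

```python
# Sketch: after fileCreate, poll the returned file node until its status
# leaves PROCESSING. fileStatus/fileErrors are fields on Shopify's File
# interface; verify them against your Admin API version before relying
# on this.
FILE_STATUS_QUERY = """
query fileStatus($id: ID!) {
  node(id: $id) {
    ... on GenericFile {
      fileStatus
      fileErrors { code details message }
    }
  }
}
"""


def build_file_status_payload(file_id):
    """Build the JSON payload for a single file-status poll."""
    return {"query": FILE_STATUS_QUERY, "variables": {"id": file_id}}

# Usage (network call omitted in this sketch):
# payload = build_file_status_payload(sku_file_id_mapping[sku])
# resp = requests.post(store_url, json=payload, headers=headers)
```

Polling this after a short delay would surface the actual failure reason (`fileErrors`) that the `fileCreate` response never contains.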

Please note that successful_s3_paths is a list which contains paths to all my PDF files on AWS S3 Bucket. This list is generated right before this function where another function is responsible for adding all the products to the S3 Bucket. If you would like to look at the code, I have shared it below,

def upload_stream_to_s3(url, bucket, key):
    with closing(requests.get(url, stream=True)) as r:
        r.raise_for_status()
        # Note: r.content reads the whole body into memory, so stream=True
        # has no effect here; the file is buffered in BytesIO (which starts
        # at position 0) and handed to boto3.
        with BytesIO(r.content) as file_stream:
            s3.upload_fileobj(file_stream, bucket, key, ExtraArgs={'ContentType': 'application/pdf'})
            s3_url = f"https://{bucket}.s3.{aws_region}.amazonaws.com/{key}"
            return s3_url

skipped_skus = []
downloaded_skus = []
failed_skus = []
successful_s3_paths = [] 

def download_and_upload_file(row):
    pdf_url = row.get('PDF Link')
    if pd.isnull(pdf_url) or pdf_url.strip() == '':
        skipped_skus.append(row['sku'])
        return f"Skipped {row['sku']} due to missing PDF link"
    
    sku = row['sku']
    file_name = f"user_guides/{sku}.pdf"
    try:
        s3_path = upload_stream_to_s3(pdf_url, s3_bucket_name, file_name)
        successful_s3_paths.append(s3_path)
        downloaded_skus.append(sku)
        return f"Uploaded {file_name} to S3 bucket {s3_bucket_name}"
    except Exception as e:
        failed_skus.append(sku)
        return f"Failed to process {sku}: {e}"

with ThreadPoolExecutor(max_workers=max_workers) as executor:
    futures = [executor.submit(download_and_upload_file, row) for index, row in df.iterrows()]

    for future in as_completed(futures):
        result = future.result()  # call result() once and reuse it
        logging.info(result)
        total_logs.append(result)
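Since this pipeline was previously working and now fails only after the upload is staged, one thing worth ruling out is that the staging objects are no longer publicly readable (a changed bucket policy or an enabled Block Public Access setting would produce exactly this symptom). A sketch of a bucket policy granting anonymous read on the staging prefix only; the `user_guides/` prefix matches the keys built above, and applying the policy requires appropriate AWS permissions:

```python
import json

# Sketch: a bucket policy allowing anonymous s3:GetObject on the staging
# prefix only, so Shopify can fetch the originalSource URLs. The Sid is
# arbitrary; "user_guides/" matches the keys built above.


def build_public_read_policy(bucket, prefix="user_guides/"):
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "AllowAnonymousReadForShopifyFetch",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
        }],
    }

# Applying it (shown as a sketch only; assumes s3 is a boto3 client):
# s3.put_bucket_policy(
#     Bucket=s3_bucket_name,
#     Policy=json.dumps(build_public_read_policy(s3_bucket_name)))
```

A time-limited presigned URL passed as `originalSource` would be an alternative that avoids making the bucket public at all.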

I have added some logging in between for my own understanding. This is the response I get for each file upload, which tells me the file was successfully submitted for uploading:

 {'data': {'fileCreate': {'files': [{'id': 'gid://shopify/GenericFile/37759118803226', 'alt': 'Alt text for your file', 'createdAt': '2024-04-14T03:34:09Z'}], 'userErrors': []}}, 'extensions': {'cost': {'requestedQueryCost': 20, 'actualQueryCost': 20, 'throttleStatus': {'maximumAvailable': 2000.0, 'currentlyAvailable': 1941, 'restoreRate': 100.0}}}} 
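The `extensions` block in that response carries Shopify's cost-based throttle state, which the `check_rate_limit` call only sees via response headers. A small hypothetical helper could read it from the body instead and decide whether to back off before the next `fileCreate` call:

```python
# Hypothetical helper: read Shopify's cost-based throttle state from the
# GraphQL response body (the 'extensions' block shown above) and report
# whether the remaining budget has dropped below a chosen reserve.

def throttle_headroom(response_data, reserve=100):
    """Return (currently_available, should_back_off)."""
    status = (response_data.get("extensions", {})
                           .get("cost", {})
                           .get("throttleStatus", {}))
    available = status.get("currentlyAvailable", 0)
    return available, available < reserve

# With the sample response above, currentlyAvailable is 1941, so the
# helper would report (1941, False) -- no backoff needed yet.
```

With three worker threads each submission costs 20 points against a 2000-point budget restoring at 100/s, so throttling is unlikely to be the failure here, but the helper makes that explicit in the logs.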

But when I look at Shopify, it shows that all the files failed to upload. Interestingly, if I check Shopify right at that moment, it tells me “{NUMBER} of files processing”.

But after some time, it says all the files failed to process.
Please see the attached images below.

I am unable to understand the problem. Can anyone help me resolve this issue?

Please note that this same code was working fine before but has recently started to cause issues.
Also note that I am not running this code locally; it runs on an AWS Fargate cluster.