Unable to upload PDF files to Shopify from an S3 bucket using the Admin API - Python code

hassanashas
Shopify Partner

Hi, I have to add a large number of products (5,000+) to Shopify on a daily basis.

Almost every product also has a PDF file that needs to be uploaded. To handle this, my code first uploads all the PDF files to a staging directory in an AWS S3 bucket, and then uploads them from S3 to Shopify, keyed by variant SKU. The variant SKU is later used to link each file to its newly created product.

 

The problem I am facing is that my code runs fine and even the API response looks OK, but the files never finish uploading on Shopify. Most of the files are under 5 MB, some are 20 MB+, and some are under 1 MB, yet all of them fail to upload.

 

This is my current code:

 

def upload_file_to_shopify(original_source, alt="Uploaded file"):
    sku = get_sku_from_url(original_source)
    variables = {
        "files": [{
            "alt": alt,
            "contentType": "FILE",
            "originalSource": original_source
        }]
    }

    payload = {
        "query": graphql_mutation,
        "variables": variables,
    }

    response = requests.post(store_url, json=payload, headers=headers)
    check_rate_limit(response.headers)

    if response.status_code == 200:
        response_data = response.json()
        upload_user_guide_file_response_logs.append(f"For {sku} -> \n\n {response_data} \n\n")
        file_ids = [file['id'] for file in response_data.get("data", {}).get("fileCreate", {}).get("files", [])]
        if file_ids:
            sku_file_id_mapping[sku] = file_ids[0]  
        logging.info(f"Success: {sku} uploaded successfully.")
        total_logs.append(f"Success: {sku} uploaded successfully.")
        return (True, sku)
    else:
        total_logs.append(f"Failed to upload {sku}.")
        total_logs.append(response.text)
        logging.info(f"Failed to upload {sku}.")
        return (False, sku)
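Two pieces referenced above are not shown: `graphql_mutation` and `get_sku_from_url`. For reference, here are sketches of both. The mutation follows Shopify's documented `fileCreate` signature (it matches the response shape pasted further down); the SKU helper is my guess at the implementation, assuming the `user_guides/<sku>.pdf` key pattern used in the S3 upload code later in this post:

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# fileCreate mutation per the Shopify Admin GraphQL API; the selected fields
# (id, alt, createdAt, userErrors) match the response shown below.
GRAPHQL_FILE_CREATE_MUTATION = """
mutation fileCreate($files: [FileCreateInput!]!) {
  fileCreate(files: $files) {
    files { id alt createdAt }
    userErrors { field message }
  }
}
"""

def get_sku_from_url(url):
    """Derive the SKU from an S3 object URL.

    Hypothetical sketch: assumes keys look like "user_guides/<sku>.pdf",
    as produced by the upload step later in this post.
    """
    key = urlparse(url).path.lstrip("/")   # e.g. "user_guides/ABC-123.pdf"
    return PurePosixPath(key).stem         # e.g. "ABC-123"
```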

successful_skus = []
failed_skus = []

with ThreadPoolExecutor(max_workers=3) as executor:
    futures = [executor.submit(upload_file_to_shopify, s3_path, "Alt text for your file") for s3_path in successful_s3_paths]
    for future in as_completed(futures):
        success, sku = future.result()
        if success:
            successful_skus.append(sku)
        else:
            failed_skus.append(sku)
 
 
 
Please note that successful_s3_paths is a list containing the paths of all my PDF files in the S3 bucket. It is generated right before this function by another function that uploads all the files to the S3 bucket. If you would like to look at that code, I have shared it below:
 
def upload_stream_to_s3(url, bucket, key):
    with closing(requests.get(url, stream=True)) as r:
        r.raise_for_status()
        # NOTE: r.content reads the entire body into memory, so stream=True
        # has no effect here; fine for small PDFs, but large files are fully buffered.
        with BytesIO(r.content) as file_stream:
            file_stream.seek(0)
            s3.upload_fileobj(file_stream, bucket, key, ExtraArgs={'ContentType': 'application/pdf'})
            s3_url = f"https://{bucket}.s3.{aws_region}.amazonaws.com/{key}"
            return s3_url
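One thing worth noting about the URL built above: `fileCreate` only records the source URL, and Shopify fetches it afterwards, so the URL must be downloadable by Shopify. If the bucket (or a recently changed bucket policy) does not allow public reads, the mutation succeeds but processing later fails. A sketch of the two options, where `plain_s3_url` mirrors the URL format above and `presigned_get_url` is my own hypothetical helper (requires boto3), not part of the original code:

```python
def plain_s3_url(bucket, key, region):
    """Virtual-hosted-style URL; only fetchable by Shopify if the object is publicly readable."""
    return f"https://{bucket}.s3.{region}.amazonaws.com/{key}"

def presigned_get_url(bucket, key, expires_in=3600):
    """Time-limited GET URL that works even for private objects."""
    import boto3  # local import so the helper above stays dependency-free
    s3 = boto3.client("s3")
    return s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,
    )
```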

skipped_skus = []
downloaded_skus = []
failed_skus = []
successful_s3_paths = [] 

def download_and_upload_file(row):
    pdf_url = row.get('PDF Link')
    if pd.isnull(pdf_url) or pdf_url.strip() == '':
        skipped_skus.append(row['sku'])
        return f"Skipped {row['sku']} due to missing PDF link"
    
    sku = row['sku']
    file_name = f"user_guides/{sku}.pdf"
    try:
        s3_path = upload_stream_to_s3(pdf_url, s3_bucket_name, file_name)
        successful_s3_paths.append(s3_path)
        downloaded_skus.append(sku)
        return f"Uploaded {file_name} to S3 bucket {s3_bucket_name}"
    except Exception as e:
        failed_skus.append(sku)
        return f"Failed to process {sku}: {e}"


with ThreadPoolExecutor(max_workers=max_workers) as executor:
    futures = [executor.submit(download_and_upload_file, row) for index, row in df.iterrows()]

    for future in as_completed(futures):
        result = future.result()  # fetch once instead of calling result() twice
        logging.info(result)
        total_logs.append(result)
 
 
I have added some logging in between for my own understanding. This is the response I get for each file upload, which tells me the file was successfully submitted for upload:
 
 
{'data': {'fileCreate': {'files': [{'id': 'gid://shopify/GenericFile/37759118803226',
                                    'alt': 'Alt text for your file',
                                    'createdAt': '2024-04-14T03:34:09Z'}],
                         'userErrors': []}},
 'extensions': {'cost': {'requestedQueryCost': 20, 'actualQueryCost': 20,
                         'throttleStatus': {'maximumAvailable': 2000.0,
                                            'currentlyAvailable': 1941,
                                            'restoreRate': 100.0}}}}
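Because the HTTP status is 200 even when the mutation itself reports problems, it can help to check `data.fileCreate.userErrors` explicitly instead of relying on the status code alone. A small sketch against the response shape above (`summarize_file_create` is my own helper name):

```python
def summarize_file_create(response_data):
    """Return (file_ids, user_errors) from a fileCreate GraphQL response."""
    file_create = (response_data.get("data") or {}).get("fileCreate") or {}
    file_ids = [f["id"] for f in file_create.get("files") or []]
    errors = file_create.get("userErrors") or []
    return file_ids, errors

# Example with the (abridged) response shown above:
sample = {"data": {"fileCreate": {"files": [
    {"id": "gid://shopify/GenericFile/37759118803226"}], "userErrors": []}}}
ids, errors = summarize_file_create(sample)
```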

 
But when I look at Shopify, it shows me that all the files failed to upload. Interestingly, if I open Shopify right after running the code, it tells me "{NUMBER} of files processing".

But after some time, it says all files failed to process.
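For context on that "processing then failed" sequence: `fileCreate` only enqueues the download, and the actual failure reason is exposed afterwards on the file itself via the `fileStatus` and `fileErrors` fields of the Admin GraphQL API. A sketch of a follow-up query that could be sent to the same endpoint with the same headers as the mutation, to surface the real error instead of the admin UI's generic message:

```python
# Query per the Admin GraphQL API: GenericFile implements the File interface,
# which exposes fileStatus and fileErrors (code, details, message).
FILE_STATUS_QUERY = """
query fileStatus($id: ID!) {
  node(id: $id) {
    ... on GenericFile {
      fileStatus
      fileErrors { code details message }
    }
  }
}
"""

def build_status_payload(file_gid):
    """Payload for requests.post(store_url, json=..., headers=headers)."""
    return {"query": FILE_STATUS_QUERY, "variables": {"id": file_gid}}
```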
Please see attached images below,
 
[two screenshots attached]
I am unable to understand the problem. Can anyone help me resolve this issue?

Please note that this same code was working fine before and has only recently started causing issues.
Also note that I am not running this code locally; it runs on an AWS Fargate cluster.