A Python script that extracts product information from a Word document, processes it to create entries suitable for WordPress/WooCommerce (including variations), and saves the data as a CSV file

August 22, 2024 ALL, AUTOMATE WORKFLOW EVERYWHERE, BUSINESS, ENGLISH ARTICLES, PYTHON 0

In summary, this Python script extracts product information from a Word document, processes it to create entries suitable for WordPress/WooCommerce (including variations), and saves the data as a CSV file you can import immediately to your eshop.

In more detail, the script performs the following tasks:

Imports necessary libraries: It imports pandas for data manipulation, Document from docx for handling Word documents, and uuid for generating unique identifiers.
Defines a function to generate unique SKUs: The generate_unique_sku function creates a unique SKU by combining a base name with a short UUID (a unique identifier) and an optional suffix.
Extracts product information from a Word document: The extract_product_info_from_word function reads a Word document and extracts product details such as title, description, type, SKU, colors, sizes, brand, and prices. It organizes this data into a list of dictionaries, where each dictionary represents a product.
Processes product information: The script iterates over the extracted product information. For each product:
- If the product is not variable, it generates a base SKU and prepares the product data.
- If the product is variable (i.e., it has variations like different colors or sizes), it creates separate entries for each variation, generating unique SKUs for each one.
Creates a DataFrame: All the processed product data (including variations) is stored in a list of dictionaries, which is then converted into a pandas DataFrame.
Saves the data to a CSV file: Finally, the DataFrame is saved as a CSV file, ready for use with WooCommerce, an e-commerce platform. The file is saved at a specified path on the user’s system.

Download Word document : product_details_for_woocommerce.docx

Product details for 5 products extracted from the word document :

Product Title (product_title)
Product Type (product_type)
Product SKU (product_sku)
Product Brand (product_brand)
Product Composition (product_composition)
Product Description (product_description)
Product Short Description (product_short_description)
Product Colors (product colors) – multiple colors are separated with a comma and a space i.e. Blue, Black
Product Sizes (product sizes) – multiple sizes are separated with a comma and a space i.e. S, M
Product Category (product_category) – multiple categories are separated with a comma and a space i.e. Product Category 1, Product Category 2
Product Regular Price (product_regular_price)
Product Sale Price (product_sale_price)

Follow on LinkedIn

Download CSV file : woocommerce_products_with_variations.csv

See Product 1 (variable), Product 2 (variable) and Product 3 (simple) after importing them temporarily with the generated CSV file to privatefashionfactory.com

Python script

# This Python script extracts product information from a Word document, processes it to create entries suitable

# for WordPress/WooCommerce (including variations), and saves the data as a CSV file you can import immediately to your eshop.

import pandas as pd

from docx import Document

import uuid # Importing uuid for unique SKU generation

def generate_unique_sku(base_name, suffix=“”):

“””Generate a unique SKU using a base name and an optional suffix.”””

unique_part = str(uuid.uuid4())[:8] # Generate a short unique identifier

return f“{base_name}–{unique_part}{suffix}“.upper()

def extract_product_info_from_word(doc_path):

doc = Document(doc_path)

products = []

product_info = {}

description_lines = []

short_description_lines = []

for para in doc.paragraphs:

line = para.text.strip()

if not line:

continue

if line.lower().startswith(“product_title”):

if product_info: # If there’s already product info, save it before starting a new one

# Join collected lines for description and short description with
tag

if description_lines:

product_info[“Product Description”] = “
“.join(description_lines)

if short_description_lines:

product_info[“Product Short Description”] = “
“.join(short_description_lines)

products.append(product_info) # Store the completed product

product_info = {}

description_lines = []

short_description_lines = []

current_key = “Product Title”

product_info[current_key] = line.split(“:”, 1)[1].strip() if “:” in line else “”

elif line.lower().startswith(“product_description”):

current_key = “Product Description”

elif line.lower().startswith(“product_type”):

current_key = “Product Type”

elif line.lower().startswith(“product_sku”):

current_key = “Product SKU”

description_lines = [] # Start collecting lines for description

elif line.lower().startswith(“product_short_description”):

current_key = “Product Short Description”

short_description_lines = [] # Start collecting lines for short description

elif line.lower().startswith(“product colors”):

current_key = “Product Colors”

product_info[current_key] = line.split(“:”, 1)[1].strip() if “:” in line else “”

elif line.lower().startswith(“product sizes”):

current_key = “Product Sizes”

elif line.lower().startswith(“product_brand”):

current_key = “Product Brand”

elif line.lower().startswith(“product_composition”):

current_key = “Product Composition”

product_info[current_key] = line.split(“:”, 1)[1].strip() if “:” in line else “”

elif line.lower().startswith(“product_category”):

current_key = “Product Category”

product_info[current_key] = line.split(“:”, 1)[1].strip() if “:” in line else “”

elif line.lower().startswith(“product_regular_price”):

current_key = “Product Regular Price”

product_info[current_key] = line.split(“:”, 1)[1].strip() if “:” in line else “”

elif line.lower().startswith(“product_sale_price”):

current_key = “Product Sale Price”

product_info[current_key] = line.split(“:”, 1)[1].strip() if “:” in line else “”

else:

if current_key == “Product Description”:

description_lines.append(line)

elif current_key == “Product Short Description”:

short_description_lines.append(line)

else:

product_info[current_key] = line

# Handle the last product in the document

if product_info:

if description_lines:

product_info[“Product Description”] = “
“.join(description_lines)

if short_description_lines:

product_info[“Product Short Description”] = “
“.join(short_description_lines)

products.append(product_info)

return products

# Corrected file paths

word_file_path = r‘c:PythonProgramscreate_woocommerce_csv_from_wordproduct_details_for_woocommerce.docx’

# Extracting the product info from the Word file

products = extract_product_info_from_word(word_file_path)

all_data = []

for product_info in products:

# Generate or retrieve SKU for parent product

base_sku = product_info.get(“Product SKU”) or generate_unique_sku(product_info.get(“Product Title”, “PROD”))

# Base parent product data

parent_data = {

“Name”: product_info.get(“Product Title”, “”),

“Description”: product_info.get(“Product Description”, “”),

“Short Description”: product_info.get(“Product Short Description”, “”),

“Categories”: product_info.get(“Product Category”, “”),

“Type”: product_info.get(“Product Type”, “”),

“Published”: “1”,

“Visibility in catalog”: “visible”,

“In stock?”: “1”,

“Allow customer reviews?”: “1”,

“Tax status”: “taxable”,

“Images”: “”,

“SKU”: base_sku,

“Parent”: “”,

“Regular price”: product_info.get(“Product Regular Price”, “”),

“Sale price”: product_info.get(“Product Sale Price”, “”),

“Weight (kg)”: “”,

“Length (cm)”: “”,

“Width (cm)”: “”,

“Height (cm)”: “”,

“Stock”: “”,

“Backorders allowed?”: “0”,

“Sold individually?”: “0”

}

# Update attribute columns only if Attribute 1 value(s) is not null

attribute_1_values = product_info.get(“Product Colors”, “”).split(“, “)

if attribute_1_values and attribute_1_values[0]:

parent_data.update({

“Attribute 1 name”: “ΧΡΩΜΑ / COLOR”,

#”Attribute 1 value(s)”: ” | “.join(attribute_1_values),

“Attribute 1 value(s)”: “, “.join(attribute_1_values),

“Attribute 1 visible”: “1”,

“Attribute 1 global”: “1”

})

# Update attribute columns only if Attribute 2 value(s) is not null

attribute_2_values = product_info.get(“Product Sizes”, “”).split(“, “)

if attribute_2_values and attribute_2_values[0]:

parent_data.update({

“Attribute 2 name”: “ΜΕΓΕΘΟΣ / SIZE”,

#”Attribute 2 value(s)”: ” | “.join(attribute_2_values),

“Attribute 2 value(s)”: “, “.join(attribute_2_values),

“Attribute 2 visible”: “1”,

“Attribute 2 global”: “1”

})

# Update attribute columns only if Attribute 3 value(s) is not null

attribute_3_values = product_info.get(“Product Brand”, “”).split(“, “)

if attribute_3_values and attribute_3_values[0]:

parent_data.update({

“Attribute 3 name”: “ΜΑΡΚΑ / BRAND”,

#”Attribute 3 value(s)”: ” | “.join(attribute_3_values),

“Attribute 3 value(s)”: “, “.join(attribute_3_values),

“Attribute 3 visible”: “1”,

“Attribute 3 global”: “1”

})

# Update attribute columns only if Attribute 4 value(s) is not null

attribute_4_values = product_info.get(“Product Composition”, “”).split(“, “)

if attribute_4_values and attribute_4_values[0]:

parent_data.update({

“Attribute 4 name”: “ΣΥΝΘΕΣΗ / COMPOSITION”,

#”Attribute 4 value(s)”: ” | “.join(attribute_4_values),

“Attribute 4 value(s)”: “, “.join(attribute_4_values),

“Attribute 4 visible”: “1”,

“Attribute 4 global”: “1”

})

# Add the parent data to the list

all_data.append(parent_data)

# Only add variations if the product type is “variable”

if product_info.get(“Product Type”, “”).lower() == “variable”:

for color in attribute_1_values:

for size in attribute_2_values:

variation_sku = generate_unique_sku(base_sku, f“-{color}–{size}“)

variation = {

“Name”: “”, # Leave blank for variations

“Description”: “”, # Leave blank for variations

“Short Description”: “”, # Leave blank for variations

“Categories”: “”, # Leave blank for variations

“Attribute 1 name”: “ΧΡΩΜΑ / COLOR”,

“Attribute 1 value(s)”: color,

“Attribute 1 visible”: “1”,

“Attribute 1 global”: “1”,

“Attribute 2 name”: “ΜΕΓΕΘΟΣ / SIZE”,

“Attribute 2 value(s)”: size,

“Attribute 2 visible”: “1”,

“Attribute 2 global”: “1”,

“Type”: “variation”,

“Published”: “1”,

“Visibility in catalog”: “visible”,

“In stock?”: “1”,

“Allow customer reviews?”: “1”,

“Tax status”: “taxable”,

“Images”: “”,

“SKU”: variation_sku,

“Parent”: base_sku,

“Regular price”: product_info.get(“Product Regular Price”),

“Sale price”: product_info.get(“Product Sale Price”),

“Weight (kg)”: “”,

“Length (cm)”: “”,

“Width (cm)”: “”,

“Height (cm)”: “”,

“Stock”: “”,

“Backorders allowed?”: “0”,

“Sold individually?”: “0”

}

all_data.append(variation)

# Create DataFrame directly from list of dictionaries

df = pd.DataFrame(all_data)

# Path for saving the CSV file

csv_file_path_with_variations = r‘c:PythonProgramscreate_woocommerce_csv_from_wordwoocommerce_products_with_variations.csv’

df.to_csv(csv_file_path_with_variations, index=False)

csv_file_path_with_variations

About Us

Contact Info

A Python script that extracts product information from a Word document, processes it to create entries suitable for WordPress/WooCommerce (including variations), and saves the data as a CSV file

30,055

Useful Links

Office Hours

©2026, Jobspace. All Rights Reserved.