Extracting images from Excel spreadsheets

You have images stuck inside a Microsoft Excel spreadsheet.

You need to save them to your hard drive.

Your problem is:

  • There are multiple images. They’re on different worksheets.
  • The pictures need to be named after the worksheet they were on.
  • There might be multiple images on each worksheet.

Use this Python code. It drives Microsoft Excel through COM, so you’ll need Excel installed (I have Excel 2010) and the workbook open as the active workbook in Excel. You will also need the Python packages pywin32 and PIL.

  • It will save all the images in the Excel workbook to the same folder as the workbook.
  • The images will be saved as JPEG files.
  • The images will be named after the worksheet they were on: Sheet1.jpg, Sheet2.jpg, and so on.
  • If there was more than one image on a worksheet, the images will be numbered: Sheet1.jpg, Sheet1_001.jpg, Sheet1_002.jpg, and so on.
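
The naming rule above fits in a small helper. This is only a sketch to show the pattern; image_filename is a name made up for illustration, and it assumes pictures on a sheet are counted from zero.

```python
def image_filename(sheet_name, picture_index):
    """Return the file name for the nth picture (0-based) on a sheet."""
    # The first picture gets the bare sheet name; later ones get _001, _002, ...
    suffix = "" if picture_index == 0 else "_%03i" % picture_index
    return sheet_name + suffix + ".jpg"
```

So image_filename("Sheet1", 0) gives Sheet1.jpg and image_filename("Sheet1", 2) gives Sheet1_002.jpg.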

Limitations: When Excel copies an image to the clipboard, it appears to use a fixed DPI, so the saved image may have a lower resolution than the original.

Alternate approaches if this doesn’t suit you:

  1. Save the Excel file to HTML format; all the images drop out as files with names like image001.png.

  2. Dive into the Excel file: xlsx files (Excel 2007 and later) are really just zip archives. The images are stored in their original format (JPG or PNG) and at their original size, in the xl/media folder inside the archive. The images are helpfully named image112.jpg and so on.
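
If you go the zip route, something like this would pull the pictures out. It is a minimal sketch, assuming Python 3 and that the pictures sit under xl/media/ in the archive; extract_media is a hypothetical helper name, not part of any library.

```python
import os
import zipfile


def extract_media(xlsx_path, out_dir):
    """Copy every file under xl/media/ in the workbook into out_dir."""
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    with zipfile.ZipFile(xlsx_path) as archive:
        for member in archive.namelist():
            # Assumption: embedded pictures live under xl/media/
            if not member.startswith("xl/media/") or member.endswith("/"):
                continue
            target = os.path.join(out_dir, os.path.basename(member))
            with archive.open(member) as src, open(target, "wb") as dst:
                dst.write(src.read())
            saved.append(target)
    return saved
```

This keeps Excel’s internal file names, so it won’t tell you which worksheet each picture was on.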

import win32com.client       # Need pywin32 from pip
from PIL import ImageGrab    # Need PIL as well (the Pillow package provides it)
import os

excel = win32com.client.Dispatch("Excel.Application")
workbook = excel.ActiveWorkbook

wb_folder = workbook.Path
wb_name = workbook.Name
wb_path = os.path.join(wb_folder, wb_name)

print("Extracting images from %s" % wb_path)

image_no = 0

for sheet in workbook.Worksheets:
    pictures_on_sheet = 0    # Count pictures only, not every shape
    for shape in sheet.Shapes:
        if shape.Name.startswith("Picture"):
            # Some debug output for console
            image_no += 1
            print("---- Image No. %07i ----" % image_no)

            # Sequence number the pictures, if there's more than one
            num = "" if pictures_on_sheet == 0 else "_%03i" % pictures_on_sheet
            pictures_on_sheet += 1

            filename = sheet.Name + num + ".jpg"
            file_path = os.path.join(wb_folder, filename)

            print("Saving as %s" % file_path)    # Debug output

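            shape.Copy() # Copies from Excel to Windows clipboard

            # Use PIL (Python Imaging Library) to save from the Windows
            # clipboard to a file
            image = ImageGrab.grabclipboard()
            if image is None:
                # Nothing image-like reached the clipboard
                print("No image on clipboard for %s, skipping" % file_path)
                continue
            # JPEG cannot store an alpha channel, so convert to RGB first
            image.convert("RGB").save(file_path, 'jpeg')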