How to work with DOCX in Python

In this tutorial, we will work with a DOCX file in Python: we will change some text in an existing document and save the result as a new DOCX file. This technique is useful when you need to issue multiple letters or reports that differ in only a few words. Let’s dive in.

Package needed in this tutorial

  • python-docx

You need the python-docx package installed on your system or in your virtualenv. (Refer to this post regarding virtualenv.)

python-docx installation

Run the following command in the terminal.

pip install python-docx

Please refer to this page for more details on installing the python-docx package.

Once the package is installed, python-docx should appear in the output of pip list, something like below:

python-docx

Working with DOCX in Python

For the basic usage of this package, please check the python-docx package website.

In this tutorial, we will open an existing docx file as a template, replace some words in the document, and save it as a new docx file.

Loading the DOCX file

Use Document() to load the document you want to work with. In this tutorial, we use the file "recognition_letter_template.docx" as the template, so it is loaded as:

from docx import Document
document = Document('recognition_letter_template.docx')

Document Paragraph

A document consists of paragraphs. Thus, to find a specific word (text), you need to go through the paragraphs in the document object with a for loop and look for the text within each paragraph.

Below is an example of going through document.paragraphs with a for loop.

for paragraph in document.paragraphs:
    print(paragraph.text)  # inspect the text of each paragraph

Find and Replace Text

Inside the loop, check each paragraph for the keyword you want to replace. The function below checks whether the paragraph contains the keyword and, if it does, replaces it with the new text.

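def find_replace(paragraph_keyword, draft_keyword, paragraph):
    """Replace paragraph_keyword with draft_keyword in the given paragraph."""
    if paragraph_keyword in paragraph.text:
        # Assigning paragraph.text rewrites the paragraph as a single run,
        # so character formatting such as bold or italic is not preserved.
        paragraph.text = paragraph.text.replace(paragraph_keyword, draft_keyword)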

Saving the docx file

Once the text is replaced, it is time to save the document. In this tutorial, the output filename is stored in a predefined variable, save_filename. Pass that variable to document.save() and the script will write a file with that name.

document.save(save_filename)

Sample Script

Below is the sample script for this tutorial. You can also download the sample code and sample docx file from here.
