In this tutorial, we will work with a DOCX file in Python: we will replace text in an existing document and save the result as a new DOCX file. This technique is useful when you need to issue multiple letters or reports that differ in only a few words. Let's dive in.
Package needed in this tutorial
- python-docx
You need to have the python-docx package installed on your system or in your virtualenv. (Refer to this post regarding virtualenv.)
python-docx installation
Run the command below in a terminal.
pip install python-docx
Please refer to this page for more details regarding the installation of python-docx package.
Once the package is installed, python-docx should appear in the output of pip list.
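The same check can also be made from Python itself. The snippet below is a minimal sketch that uses the standard library's importlib.metadata; the helper name installed_version is made up for this illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# "python-docx" is the distribution name used with pip install.
docx_version = installed_version("python-docx")
if docx_version is None:
    print("python-docx is not installed in this environment")
else:
    print(f"python-docx {docx_version} is installed")
```

Run it with the same interpreter (or virtualenv) that you plan to use for the rest of the tutorial, since each environment has its own set of installed packages.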
Working with DOCX in Python
For the basic usage of this package, please check the python-docx package website.
In this tutorial, we will open an existing docx file as a template, replace some words in the document, and save it as a new docx file.
Loading the DOCX file
Use Document() to open the document you want to work with. In this tutorial, we use "recognition_letter_template.docx" as the template file, so it is loaded as:
from docx import Document
document = Document('recognition_letter_template.docx')
Document Paragraph
A document consists of paragraphs. Thus, if you need to find a specific word (text), you need to go through the paragraphs in the document object. Use a for loop to go through the paragraphs and look for the text within each one.
Below is an example of iterating over document.paragraphs using a for loop.
for paragraph in document.paragraphs:
    print(paragraph.text)  # the text of each paragraph is available as paragraph.text
Find and Replace Text
While you are in the loop, you need to check for the keyword you want to replace. Below is a function that checks whether the paragraph contains the keyword you are looking for and, if it does, replaces it with the new text.
def find_replace(paragraph_keyword, draft_keyword, paragraph):
    if paragraph_keyword in paragraph.text:
        # print("found")
        paragraph.text = paragraph.text.replace(paragraph_keyword, draft_keyword)
Saving the docx file
Once the text is replaced, it is time to save the document. In this tutorial, the output filename is stored in a predefined variable, save_filename. Pass that variable to document.save() and the script will write a file with that name.
document.save(save_filename)
Sample Script
Below is the sample script in this tutorial. You can also download the sample code and sample docx file from here.