How to Automate Mergers and Acquisitions Reports: A Guide for Developers
If you’re a developer, especially one working at an automated platform company, this guide is for you. It walks you through a script we created to generate mergers and acquisitions (M&A) reports automatically. Creating these reports manually takes a lot of time and energy, which few developers have to spare. With AI and some scripting, however, you can automate the report generation process. You can find links to download the script file and related materials at the bottom of this guide.
What you’ll need to run the script
- News Data — You should obtain news data from a reliable source. For this guide, we’re getting the data from the Webz.io News API. It provides structured news data feeds in 170+ languages from millions of news sites.
- You need an API key to use the Webz.io News API, and you can get one by contacting Webz.io.
- OpenAI API — You’ll use OpenAI’s API to leverage the GPT-4 and DALL·E models. GPT-4 analyzes the news articles and writes the report content, while DALL·E generates a main image for the report.
- You also need an API key for the OpenAI API. Create an account or sign in at OpenAI to get a key. OpenAI uses pay-per-use pricing for its language and image models. You can see the price points on the OpenAI website.
- Python — We’re using Python to automate the report creation process. Make sure you can run Python on your machine and install the libraries the script imports (see the quick check below).
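Before running the script, it helps to confirm your environment is ready. Here’s a minimal sanity-check sketch; it assumes you have already installed the script’s dependencies (most likely the python-docx, requests, openai, Levenshtein, and beautifulsoup4 packages, though you should verify the package names against your setup) and exported the two API keys as environment variables:

import os

# The script expects both API keys to be available in the environment.
for key in ("WEBZ_API_KEY", "OPENAI_API_KEY"):
    if not os.getenv(key):
        raise SystemExit(f"Missing environment variable: {key}")
print("API keys found - ready to run the script.")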
Automating mergers and acquisitions reports: script breakdown
The script first fetches news articles from the Webz.io News API. It then calls the OpenAI API, using the GPT-4 text model to determine whether each article discusses a merger or acquisition and, if so, to produce a structured report for it. The script also uses the OpenAI image model (DALL·E) to generate a main image for the report. Finally, it combines the image, an introduction, a title, and the individual reports into a completed Word document.
Here is the detailed breakdown of the script:
Import packages and modules
First, the script imports the Python packages and modules it needs: the OpenAI Python API library, python-docx for building the Word document, BeautifulSoup for parsing the HTML that GPT-4 returns, and a few other components.
import docx
import requests
from openai import OpenAI
import os
import openai
from Levenshtein import ratio
from docx.shared import Pt
from bs4 import BeautifulSoup
import io
from docx.oxml.shared import OxmlElement, qn
from docx.enum.text import WD_ALIGN_PARAGRAPH
Set global variable and access API keys
Next, the script reads the API keys (Webz.io and OpenAI) from environment variables. It also defines a global variable where you can set the number of articles the script will generate a report on.
WEBZ_API_KEY = os.getenv("WEBZ_API_KEY")
openai.api_key = os.getenv("OPENAI_API_KEY")
NUM_OF_REPORTS = 5  # Set the number of articles the script will generate a report on.
client = OpenAI()
Orchestrate entire process (main)
Towards the end of the script, you’ll see the main function. It orchestrates the entire process: generating the main report image, fetching the news articles, generating a report for each relevant one, writing an introduction and a title, and assembling the final Word document. The query passed to get_unique_posts_from_webz limits results to English articles in the “Economy, Business and Finance” category that were published in the last 7 days, contain more than 1,000 characters, have at least one Facebook like, and include the word “acquire” in the title.
def main():
    image_url = generate_article_image()
    filtered_articles = get_unique_posts_from_webz("""category:"Economy, Business and Finance" num_chars:>1000 language:english published:>now-7d social.facebook.likes:>0 title:acquire""")
    reports = generate_reports(filtered_articles)
    intro = generate_intro(reports)
    title_text = generate_title(intro)
    create_word_doc("M&A digest.docx", title_text, image_url, intro, reports)


if __name__ == "__main__":
    main()
Define functions
Now we define the different functions of our script:
Fetch news articles from Webz.io (def fetch_articles)
Calls the Webz.io News API with the given query and collects up to total matching posts, following the API’s next URL to paginate through results. Each post is then normalized into a dictionary containing the title, trimmed text, link, and publication date.
# Function to get news articles from Webz.io API
def fetch_articles(query, api_key, total):
    endpoint = f"https://api.webz.io/filterWebContent?token={api_key}&format=json&q={query}&size=100&ts=0"
    all_posts = []
    while total > 0:
        response = requests.get(endpoint)
        data = response.json()
        posts = data["posts"]
        if len(posts) == 0:
            break
        all_posts.extend(posts)
        total -= len(posts)
        if total > 0 and "next" in data:
            endpoint = f"https://api.webz.io{data['next']}"
        else:
            break
    articles = []
    for article in all_posts:
        article = {'title': article["title"],
                   'text': trim_string(trim_title(article["title"]) + "\n\n" + article["text"], 10000),
                   'link': article['url'],
                   'published': article['published']}
        articles.append(article)
    return articles
Remove duplicate posts (get_unique_posts_from_webz)
Fetches posts from Webz.io and removes near-duplicate posts, in case any exist.
def get_unique_posts_from_webz(query):
    print("Fetch posts from Webz.io")
    articles = fetch_articles(query, WEBZ_API_KEY, 100)
    filtered_articles = remove_similar_strings(articles)
    return filtered_articles
Look for similar strings (are_similar)
Checks if two strings are similar based on the Levenshtein ratio.
def are_similar(str1, str2, threshold=1):
    """ Check if two strings are similar based on Levenshtein ratio. """
    return ratio(str1, str2) > threshold
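To get a feel for the ratio, here is a minimal usage sketch with made-up headlines (the exact values are approximate). Anything above the 0.7 threshold the script passes in is treated as a near-duplicate:

from Levenshtein import ratio

print(ratio("Acme to acquire Beta Corp", "Acme to acquire Beta Corp Inc."))            # roughly 0.9, similar
print(ratio("Acme to acquire Beta Corp", "Chipmaker posts record quarterly profit"))   # below 0.7, not similar
print(are_similar("Acme to acquire Beta Corp", "Acme to acquire Beta Corp Inc.", 0.7)) # True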
Remove similar articles (remove_similar_strings)
Eliminates duplicate content by keeping only articles whose text is not similar to that of an article already kept (using a 0.7 Levenshtein-ratio threshold).
def remove_similar_strings(articles):
    unique_articles = []
    for article in articles:
        if not any(are_similar(article['text'], existing['text'], 0.7) for existing in unique_articles):
            unique_articles.append(article)
    return unique_articles
Trim string (trim_string)
Makes sure a string isn’t longer than max_length.
def trim_string(string, max_length):
    if len(string) > max_length:
        return string[:max_length]
    else:
        return string
Trim article title (trim_title)
Removes irrelevant text from the title. For example, if the title ends with a short publisher suffix such as “- CNN News”, the script removes it; if the title contains a “|” separator, it keeps only the part before it.
def trim_title(input_string):
    words = input_string.split()
    if "|" in input_string:
        return input_string.split("|")[0]
    last_dash_index = input_string.rfind("-")
    if last_dash_index != -1:
        right_of_dash = input_string[last_dash_index + 1:]
        right_words = right_of_dash.split()
        if len(right_words) <= 3 and len(words) > 10:
            return input_string[:last_dash_index]
    return input_string
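For example, with hypothetical headlines (a long title with a short trailing publisher name gets the suffix dropped, and a “|” separator cuts off everything after it):

print(trim_title("Company A agrees to acquire Company B for two billion dollars in stock - CNN News"))
# -> "Company A agrees to acquire Company B for two billion dollars in stock "
print(trim_title("Megadeal reshapes the chip industry | Example Newswire"))
# -> "Megadeal reshapes the chip industry "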
Send prompt (call_gpt_completion)
Sends a prompt to the GPT-4 model (the gpt-4-1106-preview snapshot) and returns the API response.
def call_gpt_completion(prompt):
    return client.chat.completions.create(
        model="gpt-4-1106-preview",
        max_tokens=4096,
        messages=[
            {"role": "user", "content": prompt},
        ]
    )
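The rest of the script reads the generated text by concatenating the content of each returned choice. A minimal usage sketch (the prompt string here is just an illustration, and the call is billed like any other GPT-4 request):

response = call_gpt_completion("Summarize in one sentence: Acme announced it will acquire Beta Corp.")
text = "".join(choice.message.content for choice in response.choices)
print(text)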
Review articles and generate reports (generate_reports)
Goes through the articles pulled from the Webz.io News API and asks GPT-4 whether each one explicitly discusses a merger or acquisition. If an article does, the script keeps the structured report GPT-4 generates for it, stopping once NUM_OF_REPORTS reports have been created.
def generate_reports(filtered_articles):
    print("Generating Reports")
    reports = []
    for article in filtered_articles:
        print(f"Creating report about: {article['title']}")
        prompt = f"""Carefully review the following news article between the [] brackets and determine if there is an explicit discussion about an M&A deal emerging from its content. The article is as follows:
[
{article['text']}
]
If the article explicitly mentions or clearly discusses an M&A (merger & acquisition) deal, generate a detailed report in HTML format. Use <B> tags to highlight the titles of each section and <UL> and <LI> tags for listing items.
The report should include the following sections:
<HTML>
<B>Executive Summary</B>
<UL><LI>Summarize the main points of the M&A deal: companies involved, deal size, key dates, high-level analysis of the deal's rationale and expected outcomes.</LI></UL>
<B>Introduction</B>
<UL><LI>Provide background information about the companies involved, industry context and market conditions leading up to the M&A.</LI></UL>
<B>Details of the Deal</B>
<UL><LI>Provide a description of the M&A transaction: type, structure, financial terms, timeline, information on deal financing.</LI></UL>
<B>Strategic Rationale</B>
<UL><LI>Provide the strategic reasons behind the M&A, expected benefits for both companies.</LI></UL>
<B>Market Reaction and Analysis</B>
<UL><LI>Discuss the market reaction to the announcement, comparison with similar industry deals.</LI></UL>
<B>Regulatory and Legal Considerations</B>
<UL><LI>Mention any regulatory approvals and antitrust concerns, legal implications and compliance requirements.</LI></UL>
<B>Risk Analysis</B>
<UL><LI>Discuss any potential risks associated with the deal, risk mitigation strategies.</LI></UL>
<B>Financial Analysis</B>
<UL><LI>Review the financial metrics and performance indicators, impact on financial statements.</LI></UL>
<B>Industry and Competitor Impact</B>
<UL><LI>Discuss the effect on the broader industry and market dynamics, impact on competitors.</LI></UL>
<B>Conclusion and Recommendations</B>
<UL><LI>Provide a summary of key findings and insights, recommendations for stakeholders.</LI></UL>
</HTML>
If the article does not explicitly mention or discuss an M&A deal, please respond with: can't produce report.
"""
        try:
            response = call_gpt_completion(prompt)
            report = {'text': ''}
            for choice in response.choices:
                report['text'] += choice.message.content
            if "Executive Summary" in report['text']:
                report['link'] = article['link']
                report['title'] = article['title']
                report['published'] = article['published']
                reports.append(report)
                print(f"Created a report about: {article['title']}")
            else:
                print(f"Can't produce report for: {article['title']}")
            if len(reports) == NUM_OF_REPORTS:
                break
        except Exception as e:
            print("An error occurred:", str(e))
    return reports
Generate title (generate_title)
Creates a title for the mergers and acquisitions report.
def generate_title(intro):
    print("Creating a title")
    prompt = "Create a title using the following text as a context:\n" + intro
    title_text = ""
    try:
        response = call_gpt_completion(prompt)
        for choice in response.choices:
            title_text += choice.message.content
    except Exception as e:
        print("An error occurred:", str(e))
    title_text = title_text.strip(" ").strip('\"')
    if title_text.startswith("Title:"):  # Sometimes ChatGPT prefixes the title with "Title:"
        return title_text[len("Title:"):]
    return title_text
Generate introduction (generate_intro)
Generates an introductory paragraph for the report.
def generate_intro(reports):
    print("Generate post intro")
    prompt = """
Write a paragraph introducing a digest that contains M&A reports about the following titles, don't elaborate on these titles:
[]
The reports are created automatically by using the Webz.io news API and ChatGPT.
The reports are generated by calling the Webz.io news API for news articles categorized as "Economy, Business and Finance" related to M&A deals.
The matching news articles are then run through a ChatGPT prompt to analyze if the article is indeed about an M&A deal. If so, it creates a structured report.
"""
    prompt = insert_titles_in_text(prompt, reports)
    intro = ""
    try:
        response = call_gpt_completion(prompt)
        for choice in response.choices:
            intro += choice.message.content
    except Exception as e:
        print("An error occurred:", str(e))
    return intro
HTML to formatted text (html_to_word)
Converts HTML content to formatted text in a Word document. It handles bold text and bullet lists.
def html_to_word(doc, html_content):
    soup = BeautifulSoup(html_content, 'html.parser')
    for element in soup.find_all(['b', 'ul']):
        if element.name == 'b':
            # Add bold text as a heading
            doc.add_paragraph(element.get_text(), style='Heading 2')
        elif element.name == 'ul':
            for item in element.find_all('li'):
                # Add list items
                doc.add_paragraph(item.get_text(), style='List Bullet')
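Here is a minimal usage sketch with a made-up HTML fragment in the same shape GPT-4 is asked to produce (one bold section title followed by a bullet list); the output file name is just an example:

doc = docx.Document()
html_to_word(doc, "<B>Executive Summary</B><UL><LI>Acme agrees to acquire Beta Corp for $1B in cash.</LI></UL>")
doc.save("sketch.docx")  # hypothetical file name, for illustration only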
Add hyperlink (add_hyperlink)
Inserts a hyperlink into a Word document paragraph.
def add_hyperlink(paragraph, url, text):
    """ Add a hyperlink to a paragraph. """
    part = paragraph.part
    r_id = part.relate_to(url, docx.opc.constants.RELATIONSHIP_TYPE.HYPERLINK, is_external=True)

    hyperlink = OxmlElement('w:hyperlink')
    hyperlink.set(qn('r:id'), r_id)

    new_run = OxmlElement('w:r')
    rPr = OxmlElement('w:rPr')

    # Underline the hyperlink text
    u = OxmlElement('w:u')
    u.set(qn('w:val'), 'single')
    rPr.append(u)

    new_run.append(rPr)
    new_run.text = text
    hyperlink.append(new_run)

    paragraph._p.append(hyperlink)
    return hyperlink
Insert titles into the prompt (insert_titles_in_text)
Replaces the [] placeholder in the prompt text with the report titles, one per line.
def insert_titles_in_text(text, reports):
    # Placeholder for inserting the titles
    placeholder = "[]"
    # Extracting the titles from the reports and formatting them with new lines
    titles = "\n".join([report['title'] for report in reports])
    # Replacing the placeholder with the titles
    updated_text = text.replace(placeholder, titles)
    return updated_text
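A quick usage sketch with made-up report titles shows the substitution:

sample_reports = [{'title': 'Acme to acquire Beta Corp'}, {'title': 'Gamma merges with Delta Holdings'}]
print(insert_titles_in_text("Reports in this digest:\n[]", sample_reports))
# Reports in this digest:
# Acme to acquire Beta Corp
# Gamma merges with Delta Holdings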
Generate image (generate_article_image)
Generates a main image for the report using OpenAI’s DALL-E model with a specific prompt.
def generate_article_image():
    print("Generating post image")
    image_url = ""
    try:
        response = client.images.generate(
            model="dall-e-3",
            prompt="Create an image for a cover of a Mergers and Acquisitions report. The scene is set in a modern, spacious corporate boardroom with a large glass table. On the table, neatly arranged, are documents, pens, and digital tablets showing graphs and financial data. In the foreground, two pairs of hands, one male and one female, are in the middle of a handshake, symbolizing a successful deal. In the background, through a large window, we see a city skyline with skyscrapers, implying a high-powered business environment. The color scheme should be a mix of cool blues and warm golds to give a sense of professionalism and success. Incorporate subtle M&A related icons like pie charts, upward arrows, and company logos in a tasteful manner. Ensure the image is suitable for a formal report, with a focus on clarity and corporate aesthetics.",
            n=1,
            size="1024x1024"
        )
        image_url = response.data[0].url
    except Exception as e:
        print("An error occurred generating the image:", str(e))
    return image_url
Download image (add_image_from_base64)
Downloads the generated image from its URL and adds it to the Word document. (Despite its name, the function works with a URL rather than base64 data.)
def add_image_from_base64(doc, image_url):
    response = requests.get(image_url)
    # Check if the request was successful
    if response.status_code == 200:
        image_stream = io.BytesIO(response.content)
        doc.add_picture(image_stream, width=docx.shared.Inches(6))
    else:
        print(f"Failed to download image. Status code: {response.status_code}")
Create Word doc (create_word_doc)
Assembles the various components into a formatted Word document.
def create_word_doc(file_name, title_text, image_url, intro, reports):
    print("Saving to word document")
    doc = docx.Document()

    # Add a title
    title = doc.add_paragraph()
    title.style = 'Title'
    title_run = title.add_run(title_text)
    title_run.font.size = Pt(24)  # Set the font size
    title_run.font.name = 'Arial (Body)'  # Set the font
    title.alignment = WD_ALIGN_PARAGRAPH.CENTER  # Center align the title

    if len(image_url) > 0:
        add_image_from_base64(doc, image_url)

    doc.add_paragraph(intro)

    # Add each report
    for report in reports:
        p = doc.add_paragraph(style='Heading 1')
        add_hyperlink(p, report['link'], report['title'])
        doc.add_paragraph(f"Published on: {report['published']}")
        html_to_word(doc, report['text'])
        doc.add_paragraph(""" """)

    for paragraph in doc.paragraphs:
        for run in paragraph.runs:
            run.font.name = 'Arial (Body)'

    # Save the document
    doc.save(file_name)
AI + Python = a powerful automation tool
This script demonstrates an advanced use case of integrating AI-powered analysis and content generation with document automation in Python. It’s a comprehensive example of how combining various Python libraries with AI models can produce a powerful automation tool.
Download the example code and report:
- The full Python script.
- Example of auto-generated report in PDF format.
Ready to automate M&A reports for your organization? Talk to one of our experts today.