Author: Udit Vashisht


Create your Own Customizable Email Spam Filter using Python

Periodically delete your unwanted emails with the help of Python

Use Python and Gmail API to create your own Customizable Email Spam Filter

There was a time when you would run to the mailbox outside your home at a fixed hour to check your mail and sort out the junk. Today we live in a digital era, and mail keeps arriving in our electronic mailboxes throughout the day. Not all of it is important, and tools like spam filters and unsubscribe options help to get rid of much of it. But certain kinds of email do not fall into either category. One of the most common examples is OTPs (one-time passwords) sent by credit card companies and banks: you would not want to mark them as spam or unsubscribe from them.

If you are using a Google Mail Account, it has an API that can be used with Python to auto-delete your unwanted emails. Let’s walk through the steps one by one.

First of all, visit https://developers.google.com/gmail/api/quickstart/python and click "Enable the Gmail API".

Log in with your credentials and name your project.

‘Download Client Configuration’ will download the required ‘credentials.json’ file. Save it in the directory/folder in which you are going to create the Python script.

Now, we will use pip to install the required modules, i.e. the Google API client and oauth2client:

pip install --upgrade google-api-python-client oauth2client
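
Before going further, it can help to confirm that the packages are importable. The following is a minimal sketch and not part of the original script; the helper name missing_modules is hypothetical, and it only checks top-level package names.

```python
import importlib.util

# Top-level import names of the packages installed above
# (httplib2 is pulled in as a dependency of google-api-python-client).
REQUIRED = ['googleapiclient', 'oauth2client', 'httplib2']


def missing_modules(names):
    """Return the names that cannot be found on the current Python path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == '__main__':
    missing = missing_modules(REQUIRED)
    if missing:
        print('Missing modules:', ', '.join(missing))
    else:
        print('All required modules are installed.')
```

If anything is reported as missing, re-running the pip command above in the same Python environment is the first thing to try.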

Create a Python file and start coding. Let’s name it auto_delete.py. Make sure to create this file in the directory/folder containing the ‘credentials.json’ file downloaded earlier. Since we will later host this script on PythonAnywhere and need it to run in Python 3.6, let us add the shebang at the very top of it.

#!/usr/bin/python3.6

Now, we will make the necessary imports.

import os
from googleapiclient.discovery import build
from httplib2 import Http
from oauth2client import file, client, tools
from googleapiclient import errors

Since we will host this script in the cloud, i.e. on PythonAnywhere, it is better to log what the script does so that we have some idea of what is going on each day. So quickly punch in the basic logging setup. If it is overwhelming for you, just copy and paste it. Soon, we will come up with a basic tutorial on logging too.

import logging
logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)
formatter = logging.Formatter('%(asctime)s:%(name)s:%(message)s')
file_handler = logging.FileHandler('auto_delete.log')
file_handler.setFormatter(formatter)
logger.addHandler(file_handler)
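
To see what a line in auto_delete.log will look like, here is a minimal sketch that uses the same format string but writes to an in-memory buffer instead of a file. The logger name 'demo' and the buffer are assumptions made purely for illustration.

```python
import io
import logging

# Same format string as in the script: timestamp, logger name, message.
formatter = logging.Formatter('%(asctime)s:%(name)s:%(message)s')

# Write to an in-memory buffer instead of 'auto_delete.log' (illustration only).
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(formatter)

demo_logger = logging.getLogger('demo')
demo_logger.setLevel(logging.DEBUG)
demo_logger.addHandler(handler)
demo_logger.propagate = False  # keep the demo output out of the root logger

demo_logger.info('Deleting messages.')
log_line = buffer.getvalue().strip()
print(log_line)  # a timestamp followed by ':demo:Deleting messages.'
```

In the real script the logger is named after the module (__name__), so that name appears in place of 'demo'.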

The script will delete email messages, so we need both read and write privileges. Such a privilege/right is called a scope in the Gmail API.

SCOPES = 'https://mail.google.com/'  # read-write mode

We will need a service to access our mailbox, which will be created by a function called ‘init()’. We will set it to None in the beginning.

SERVICE = None

Create a function init(), which takes three parameters: user_id, token_file and credentials_file.

user_id: In all cases, we will use ‘me’ as the user_id, which denotes the user whose ‘credentials.json’ file was downloaded in the steps above.

credentials_file: It is the ‘credentials.json’ file downloaded above. This file must exist in the directory containing the script i.e. ‘auto_delete.py’.

token_file: When init() is run for the very first time, it will open the browser and ask you to log in to your Gmail account (the one for which the ‘credentials.json’ file was created), and ‘token.json’ will be created in the same directory. If you ever change the ‘SCOPES’ defined above, you need to delete ‘token.json’ and re-generate it.

def init(user_id='me', token_file='token.json', credentials_file='credentials.json'):

    global SERVICE

    if not os.path.exists(credentials_file):
        raise AutoDeleteException('Can\'t find credentials file at %s. You can download this file from https://developers.google.com/gmail/api/quickstart/python and clicking "Enable the Gmail API"' % (os.path.abspath(credentials_file)))

    store = file.Storage(token_file)
    creds = store.get()
    if not creds or creds.invalid:
        flow = client.flow_from_clientsecrets(credentials_file, SCOPES)
        creds = tools.run_flow(flow, store)
    SERVICE = build('gmail', 'v1', http=creds.authorize(Http()))

To raise a custom exception when ‘credentials.json’ is missing, we create a class for that exception as under. Place it above init() in the script:

class AutoDeleteException(Exception):
    pass
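
The credentials check can be tried on its own, without touching the Gmail API. This is a minimal sketch; the helper name require_credentials is hypothetical and simply isolates the check that init() performs, and the file name 'no_such_credentials.json' is a placeholder assumed not to exist.

```python
import os


class AutoDeleteException(Exception):
    """Raised when the script cannot continue, e.g. credentials are missing."""


def require_credentials(credentials_file='credentials.json'):
    """Return the absolute path of the credentials file, or raise."""
    path = os.path.abspath(credentials_file)
    if not os.path.exists(path):
        raise AutoDeleteException(f"Can't find credentials file at {path}.")
    return path


try:
    require_credentials('no_such_credentials.json')
except AutoDeleteException as exc:
    # The message names the absolute path that was checked.
    print(exc)
```

Catching the custom exception separately from other errors makes it easy to tell a setup problem apart from an API failure in the log.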

Now we will run init() using the following code. If there is no error in the script, it will take you to the browser and ask you to log in with the account for which the ‘credentials.json’ file was created. This will create a ‘token.json’ file in the same directory. If you ever change the scope, you need to delete ‘token.json’ and run init() again to re-create it.

if __name__ == '__main__':
    init()
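
Since a changed scope requires a fresh token, the delete-and-regenerate step can be scripted. A minimal sketch, assuming the default file name 'token.json'; the helper reset_token is hypothetical and not part of the original script.

```python
import os


def reset_token(token_file='token.json'):
    """Delete the cached token so the next init() call asks for consent again.

    Returns True if a file was removed, False if there was nothing to remove.
    """
    if os.path.exists(token_file):
        os.remove(token_file)
        return True
    return False
```

Calling reset_token() and then init() should open the browser again and write a new ‘token.json’ for the current scope.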

Now let’s create a search function with parameters ‘query’ and user_id = ‘me’ to search for the messages (emails) matching the query.

The query is a set of search terms similar to those entered in the search box of Gmail, e.g.

label:UNREAD
from:abc@email.com
subject:"hello"
has:attachment

For more details, see the Gmail help page on search operators.
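
Queries like the ones above can also be assembled from parts, which keeps the filter easy to customize. A minimal sketch; the helper build_query and its parameter names are hypothetical, and it covers only a few of the available search operators.

```python
def build_query(sender=None, subject=None, older_than=None, extra=()):
    """Join optional Gmail search operators into one query string."""
    parts = []
    if sender:
        parts.append(f'from:{sender}')
    if subject:
        # Quote the subject so that multi-word phrases are matched as a whole.
        parts.append(f'subject:"{subject}"')
    if older_than:
        parts.append(f'older_than:{older_than}')
    parts.extend(extra)
    return ' '.join(parts)


query = build_query(sender='abc@email.com', subject='hello',
                    older_than='1d', extra=['has:attachment'])
print(query)
# from:abc@email.com subject:"hello" older_than:1d has:attachment
```

It is worth pasting the resulting string into the Gmail search box first to check which emails it matches.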

The function first runs init() to create the “SERVICE” if it has not been created yet:

def search(query, user_id='me'):

    if SERVICE is None:
        init()

The following code will return a list of Gmail message objects matching the query. Each message object is a Python dictionary with keys ‘id’ and ‘threadId’:

{'id': '15573cf1adfassafas', 'threadId': '15573cf1adfassafas'}

We will need the value of ‘id’ in our next function to delete the messages (emails) matching the query.

    try:
        response = SERVICE.users().messages().list(userId=user_id,
                                                   q=query).execute()
        messages = []
        if 'messages' in response:
            messages.extend(response['messages'])

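        while 'nextPageToken' in response:
            page_token = response['nextPageToken']
            response = SERVICE.users().messages().list(userId=user_id, q=query,
                                                       pageToken=page_token).execute()
            # A page may carry no 'messages' key, so fall back to an empty list.
            messages.extend(response.get('messages', []))

        return messages

    except errors.HttpError as e:
        logger.exception(f'An error occurred:{e}')
        # Return an empty list so callers can always treat the result as a list.
        return []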
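
The pagination logic can be exercised without a network connection. The sketch below is an illustration only: collect_all and fake_list_page are hypothetical names, and the fake pages merely mimic the shape of the Gmail API responses (the 'messages' and 'nextPageToken' keys) with made-up ids.

```python
def collect_all(list_page):
    """Collect messages from every page.

    list_page is any callable that takes a page token (None for the first
    page) and returns a response dict shaped like the Gmail API's.
    """
    messages = []
    page_token = None
    while True:
        response = list_page(page_token)
        messages.extend(response.get('messages', []))
        page_token = response.get('nextPageToken')
        if not page_token:
            return messages


# Two fake pages standing in for real API responses (made-up ids).
FAKE_PAGES = {
    None: {'messages': [{'id': 'a1', 'threadId': 't1'}], 'nextPageToken': 'p2'},
    'p2': {'messages': [{'id': 'b2', 'threadId': 't2'}]},
}


def fake_list_page(page_token):
    """Return the fake response for the given page token."""
    return FAKE_PAGES[page_token]


ids = [m['id'] for m in collect_all(fake_list_page)]
print(ids)  # ['a1', 'b2']
```

The loop stops as soon as a response carries no 'nextPageToken', which is the same stopping condition the search() function relies on.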

So far so good. Now create our last function to delete the messages (emails) matching the query. Remember that this will permanently delete the messages and won’t send them to the trash, so please use it with caution.

def delete_messages(query, user_id='me'):
    messages = search(query, user_id)
    if messages:
        for message in messages:
            SERVICE.users().messages().delete(userId=user_id, id=message['id']).execute()
            logger.info(f'Message with id: {message["id"]} deleted successfully.')
    else:
        logger.info("There was no message matching the query.")
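
If permanent deletion feels too risky, the Gmail API also offers a trash method on messages, which moves a message to the Trash instead. A minimal sketch, assuming the service object is passed in explicitly; trash_messages is a hypothetical helper and not part of the original script.

```python
def trash_messages(service, messages, user_id='me'):
    """Move each message to the Trash instead of deleting it permanently.

    service is a Gmail API service object such as the one built in init();
    messages is a list of dicts with an 'id' key, as returned by search().
    Returns the list of ids that were trashed.
    """
    trashed = []
    for message in messages:
        service.users().messages().trash(userId=user_id,
                                         id=message['id']).execute()
        trashed.append(message['id'])
    return trashed
```

Inside the script this could be called as trash_messages(SERVICE, search(query)). Trashed messages can still be recovered from the Trash folder for a while, which makes this a gentler way to try out a new query.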

To run the script, let’s add the most loved dunders :P This block replaces the earlier one that only called init().

if __name__ == '__main__':
    logger.info("Deleting messages from abc@gmail.com.")
    delete_messages('from:abc@gmail.com '
                    'subject:"Go Shopping" '
                    'older_than:1d')
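
To make the filter customizable, the queries can live in one list that the script loops over. This is a sketch under the assumption that a delete function with the same signature as delete_messages() is available; the name run_rules, the rule list and the addresses in it are made up for illustration.

```python
# Each entry is one Gmail search query; the addresses are placeholders.
RULES = [
    'from:abc@gmail.com subject:"Go Shopping" older_than:1d',
    'from:otp@example.com older_than:2d',
]


def run_rules(rules, delete_func):
    """Call delete_func once per query and return how many rules were run."""
    count = 0
    for query in rules:
        delete_func(query)
        count += 1
    return count
```

In the script, the call would be run_rules(RULES, delete_messages); adding a new filter then only means adding one more line to RULES.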

You can watch the entire tutorial on YouTube too.

Read this post to learn how to host the script on PythonAnywhere and schedule it to run daily.

Complete code for the same can be found on our github page here

If you liked our tutorial, there are various ways to support us; the easiest is to share this post. You can also follow us on Facebook, Twitter and YouTube.

In case of any query, you can leave a comment below.

If you want to support our work, you can do so using Patreon.


