Natural Language Processing Application in Python

Introduction
As a data analyst with a passion for Data Science, I developed 'SYNOPSIS', an AI application that marks a significant milestone in my professional journey. This desktop tool emerged from my firsthand experience with the challenges of text analysis across various roles. From processing extensive survey responses to analysing client support tickets, I recognised a recurring challenge: the complexity and cost of traditional text analysis tools like NVivo. This realisation inspired me to develop SYNOPSIS, a desktop application designed to offer powerful yet accessible text analysis capabilities, built entirely in Python.

Functionality
SYNOPSIS offers robust functionalities tailored for comprehensive text analysis:

Summarisation: Utilising BART models, developed by Facebook AI and loaded through the open-source Hugging Face Transformers library, SYNOPSIS generates succinct summaries of large text inputs.
Sentiment Analysis: The application classifies the sentiment (positive, negative, or neutral) conveyed within text passages.
Keyword Extraction: SYNOPSIS employs the Natural Language Toolkit (NLTK) to extract pivotal keywords from provided text inputs.
Visualisation: Analysis outcomes are dynamically visualised through interactive plots integrated within the PyQt5 user interface.

Technology Stack

SYNOPSIS is built upon a solid foundation of cutting-edge technologies:

PyQt5: A robust Python framework for crafting intuitive graphical user interfaces.
Hugging Face Transformers: Integrates powerful BART models for executing sophisticated natural language processing tasks.
NLTK: A widely respected Python library renowned for its capabilities in natural language processing.
Plotly: Facilitates the creation of engaging and interactive web-based visualisations for presenting analytical results.

Dependencies
• docx==0.2.4
• nltk==3.8.1
• plotly==5.9.0
• PyQt5==5.15.10
• PyQt5_sip==12.13.0
• python-docx==1.1.2
• torch==2.3.1
• transformers==4.39.3

• BART model from Facebook: Licensed under Apache-2.0. For detailed licensing information, please refer to https://huggingface.co/docs/transformers/en/model_doc/bart.

• NLTK: Source code licensed under Apache-2.0. For further details, visit https://www.nltk.org/.

How it Works

Text Input: Users interact with the PyQt5 GUI to input text directly into SYNOPSIS or upload documents. This serves as the raw material for analysis.
Preprocessing (Tokenisation): SYNOPSIS begins by preparing the text through a process called “tokenisation”. Tokenisation breaks the text into meaningful units called tokens, which are essential for further analysis. This allows NLP models to work with discrete elements rather than a continuous stream of characters.
Analysis:
- Summarisation: SYNOPSIS employs the BART model to process the text and generate concise summaries. This model comprehends the context and key points within the text to condense it effectively.
- Sentiment Analysis: Utilising a pre-trained model from the Hugging Face library, SYNOPSIS evaluates the sentiment expressed in the text—whether it's positive, negative, or neutral. This analysis provides insights into the emotional tone conveyed by the text.
- Keyword Extraction: By utilising NLTK's frequency analysis and stopword filtering, SYNOPSIS identifies and extracts significant keywords from the input text. These keywords highlight the main themes or topics discussed in the text.
Visualisation: SYNOPSIS transforms the analysed results into interactive charts using Plotly.
Output: Finally, SYNOPSIS presents the processed results in its user interface. Users can interact with these outputs to delve deeper.
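
To make the tokenisation and keyword extraction steps above more concrete, here is a minimal sketch of frequency analysis with stopword filtering. It is not the SYNOPSIS source code: SYNOPSIS uses NLTK for this step, whereas the sketch swaps in the Python standard library (a regular-expression tokeniser, a small hand-written stopword set and `collections.Counter`) so that it runs without downloading any NLTK data. The names `STOPWORDS`, `tokenise` and `extract_keywords` are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stopword set (an assumption for this sketch);
# NLTK ships a much fuller list in its stopwords corpus.
STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from",
    "in", "is", "it", "of", "on", "or", "that", "the", "this", "to",
    "was", "were", "with",
}


def tokenise(text):
    """Lowercase the text and split it into word tokens."""
    # Matches runs of letters, optionally followed by an apostrophe part ("it's").
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def extract_keywords(text, top_n=5):
    """Return the top_n most frequent non-stopword tokens with their counts."""
    tokens = [t for t in tokenise(text) if t not in STOPWORDS and len(t) > 2]
    return Counter(tokens).most_common(top_n)


if __name__ == "__main__":
    sample = (
        "The survey responses mention delivery delays. "
        "Delivery was slow, and support tickets about delivery keep rising."
    )
    print(extract_keywords(sample, top_n=3))
```

The list of (keyword, count) pairs returned by `extract_keywords` is the kind of result that could then be handed to a charting step.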

Challenges and Solutions: One significant challenge was optimising the processing of large text inputs without compromising the application's responsiveness or ease of distribution. I addressed this by accessing the models online rather than bundling them with the application. This keeps the file size small, makes the application easy to distribute and update, and gives it live access to the latest models. Another hurdle was integrating the various NLP models seamlessly. This required careful management of dependencies and extensive testing to ensure compatibility across different operating systems.
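
As an illustration of fetching a model at run time instead of shipping it with the application, the sketch below builds a Hugging Face summarisation pipeline lazily, on first use. It is a sketch under stated assumptions rather than the actual SYNOPSIS code: the checkpoint name `facebook/bart-large-cnn`, the length limits and the function names `load_summariser` and `summarise` are all assumed for the example.

```python
from functools import lru_cache

# Assumed checkpoint; the write-up only says that BART models are used.
MODEL_ID = "facebook/bart-large-cnn"


@lru_cache(maxsize=1)
def load_summariser():
    """Create the summarisation pipeline on first use and reuse it afterwards."""
    # Imported here so the application can start before the heavy
    # transformers/torch imports are needed.
    from transformers import pipeline

    return pipeline("summarization", model=MODEL_ID)


def summarise(text, summariser=None, max_length=130, min_length=30):
    """Summarise text; a custom callable can be injected, e.g. for testing."""
    if not text.strip():
        return ""
    summariser = summariser or load_summariser()
    result = summariser(
        text, max_length=max_length, min_length=min_length, do_sample=False
    )
    # The pipeline returns a list with one dict per input text.
    return result[0]["summary_text"]
```

With this pattern the weights are downloaded the first time `summarise` is called, so they never need to be packaged in the installer. By default the Transformers library keeps downloaded files in a cache on the user's machine, so the saving is in the size of the distributed application rather than in disk use after the first run.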

User Feedback: Early users have found SYNOPSIS to be a valuable tool. One beta tester, who holds a Doctorate in Pharmacy Practice, noted: "SYNOPSIS has dramatically streamlined my literature review process. What used to take hours now takes minutes."

Here's a live step-by-step walkthrough of me building a basic version of this application (V1):
