Basic Text Analysis with NLTK (Python)

Preliminaries

In this post I brush over some basic NLP processes, including text tokenisation, part-of-speech tagging and named entity recognition, and in particular how these can be achieved in the Python programming language through the Natural Language Toolkit (NLTK) package developed by Steven Bird and his associates.

NLP is a broad field spanning Linguistics, Computer Science and Engineering, to mention a few. NLP, which is used here to mean Natural Language Processing, shouldn’t be confused with Neuro-linguistic Programming. NLP as we mean it here is the field concerned with the interaction between machines and human languages; it seeks to make language accessible to machines and to make human-machine interaction much easier. This field has come a long way, but there remain many NLP problems that are yet to be solved. The ultimate goal of the field may be considered to be true A.I. (i.e. a truly intelligent machine possessing every human cognitive ability, in this case the Language Acquisition Device (L.A.D.) and thus an ability to acquire, use, comprehend and learn language).

NLP is at work every day of our modern lives. Everything from search engines to online translation services such as Bing and Google Translate, chatbots and text autocompletion on our phones is NLP-driven.

Basic Text Analysis with NLTK

What follows is a presentation of some very basic NLP procedures with NLTK, an NLP package for the Python programming language. Although it is not the only package available for NLP, it has the advantage of simplicity afforded by the nature of Python. (Other popular NLP packages include Stanford’s CoreNLP and Apache’s OpenNLP; these packages are written in Java.) Among the numerous high-level programming languages in existence, Python proves to be one of the least complicated. It is very true to the essence of such languages, which are more human-readable and portable than assembly and machine languages, at the cost of some speed.

Importing the necessary package
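
A minimal sketch of this step, assuming NLTK has already been installed (for example with pip). The data packages that some NLTK functions rely on are downloaded separately, and their exact names vary between NLTK versions, so the download call is only shown as a comment.

```python
import nltk

# Models and corpora are fetched separately from the library itself,
# e.g. nltk.download("punkt"); package names depend on the NLTK version.
print("NLTK version:", nltk.__version__)
```
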
A concordance
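
A concordance lists every occurrence of a word together with its surrounding context. The sketch below builds one over a short made-up sample (the sample text and variable names are assumptions for illustration, not the text used in the post), using `nltk.text.Text` and a tokenizer that needs no downloaded data.

```python
from nltk.text import Text
from nltk.tokenize import wordpunct_tokenize

# Hypothetical sample text, chosen only to give "Python" several contexts.
sample = (
    "Python is a high level language. NLTK is a Python package. "
    "Many people learn Python because Python is readable."
)

tokens = wordpunct_tokenize(sample)  # regex-based, needs no model download
text = Text(tokens)

# Print each occurrence of the word with its context (matching ignores case).
text.concordance("Python", width=60, lines=5)

# The same hits as a list of objects, handy for further processing.
hits = text.concordance_list("Python", width=60)
print(len(hits), "occurrences found")
```
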
Basic Text Analysis
The frequency of word endings in a Text
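
One way to count word endings is to feed the last few letters of each word into a `FreqDist`. This is a sketch under two assumptions: that an "ending" means the final two letters, and that a small hand-made word list stands in for a real text.

```python
from nltk import FreqDist

# Hypothetical word list standing in for the tokens of a real text.
words = ["running", "jumping", "walked", "talked",
         "quickly", "slowly", "sing", "played"]

# Count the last two letters of every word longer than two characters.
endings = FreqDist(w[-2:].lower() for w in words if len(w) > 2)

# Show the three most frequent endings with their counts.
for ending, count in endings.most_common(3):
    print(ending, count)
```

Changing the slice `w[-2:]` to `w[-3:]` would count three-letter endings instead.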

Tokenising Texts

  • Word Tokenisation

  • Sentence Tokenisation
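
Both kinds of tokenisation can be sketched as follows. To keep the example runnable without downloading any models, it assumes an untrained `PunktSentenceTokenizer` (default parameters, so it knows no abbreviations) and calls `word_tokenize` with `preserve_line=True` on each sentence. With the Punkt model downloaded, `nltk.sent_tokenize(text)` and plain `nltk.word_tokenize(text)` are the more usual calls.

```python
from nltk.tokenize import word_tokenize
from nltk.tokenize.punkt import PunktSentenceTokenizer

text = "NLTK is a Python package. It splits text into tokens. Is it simple?"

# Sentence tokenisation: split the text into sentences.
# An untrained Punkt tokenizer is enough for simple text like this.
sentence_tokenizer = PunktSentenceTokenizer()
sentences = sentence_tokenizer.tokenize(text)

# Word tokenisation: split each sentence into words and punctuation marks.
# preserve_line=True skips the internal sentence-splitting step,
# which would otherwise require the downloaded Punkt model.
word_tokens = [word_tokenize(s, preserve_line=True) for s in sentences]

for sentence, tokens in zip(sentences, word_tokens):
    print(sentence, "->", tokens)
```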

Stemming Tokens 
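
Stemming reduces tokens to a crude root form by stripping suffixes. A minimal sketch, assuming the Porter stemmer (NLTK ships several stemmers; this is just one of them) and a hand-made token list:

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

# Hypothetical tokens; in practice these would come from the tokeniser.
tokens = ["running", "jumps", "cats", "tagged", "easily"]

# Stem each token; the results are roots, not necessarily dictionary words.
stems = [stemmer.stem(token) for token in tokens]

for token, stem in zip(tokens, stems):
    print(token, "->", stem)
```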

Part of Speech tagging
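
Part-of-speech tagging pairs every token with a grammatical label. The sketch below calls `nltk.pos_tag`, which needs a downloaded tagger model. If the model is not installed, it falls back to a deliberately crude `RegexpTagger` whose patterns are my own illustrative assumptions and far less accurate than the trained tagger.

```python
import nltk
from nltk.tag import RegexpTagger

# Hypothetical fallback rules, tried in order; the last one catches everything.
FALLBACK_PATTERNS = [
    (r"^([Tt]he|[Aa]n?)$", "DT"),  # articles
    (r"^[.,!?]$", "."),            # punctuation
    (r".*ing$", "VBG"),            # gerunds
    (r".*ed$", "VBD"),             # past-tense verbs
    (r".*ly$", "RB"),              # adverbs
    (r".*s$", "NNS"),              # plural nouns
    (r".*", "NN"),                 # default: singular noun
]


def tag_tokens(tokens):
    """Tag tokens with the pretrained tagger, or the regex fallback if its model is missing."""
    try:
        return nltk.pos_tag(tokens)
    except LookupError:
        # Raised when the tagger model has not been downloaded.
        return RegexpTagger(FALLBACK_PATTERNS).tag(tokens)


tokens = ["The", "parser", "quickly", "tagged", "the", "words", "."]
tagged = tag_tokens(tokens)
print(tagged)
```

The exact tags printed depend on which path runs, so no expected output is given here.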

Named Entity Recognition
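
Named entity recognition groups tagged tokens into labelled chunks such as people or organisations. The sketch below calls `nltk.ne_chunk` on a hand-tagged sentence (the tags are supplied by hand so the example does not depend on the tagger). If the chunker models are not installed, it falls back to grouping consecutive proper nouns with `RegexpParser`. That fallback is only a rough stand-in for real NER, and its `NE` label is my own assumption.

```python
import nltk
from nltk.chunk import RegexpParser

# Hand-tagged sentence (Penn Treebank tags), used purely for illustration.
tagged = [
    ("Steven", "NNP"), ("Bird", "NNP"), ("developed", "VBD"),
    ("NLTK", "NNP"), ("in", "IN"), ("Python", "NNP"), (".", "."),
]


def chunk_entities(tagged_tokens):
    """Chunk with the pretrained NE chunker, or a proper-noun grammar if its data is missing."""
    try:
        return nltk.ne_chunk(tagged_tokens)
    except LookupError:
        # Fallback: treat any run of proper nouns as one candidate entity.
        return RegexpParser("NE: {<NNP>+}").parse(tagged_tokens)


tree = chunk_entities(tagged)

# Collect (entity text, label) pairs from every subtree except the root.
entities = [
    (" ".join(token for token, _ in subtree.leaves()), subtree.label())
    for subtree in tree.subtrees()
    if subtree is not tree
]
print(entities)
```

Which entities and labels come back depends on whether the pretrained chunker or the fallback grammar ran.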

Code Output

7 thoughts on “Basic Text Analysis with NLTK (Python)”

  1. The NLP functions presented are very basic.
    What about more complex operations like relation extraction and explanation generation of deductive QA?
    What are Python’s advantages over Prolog, with which I know how to implement relation extraction and DQA with explanation quite simply?

    John Kontos
    AI professor

    1. True, the NLP functions discussed here are very basic. However, the more advanced processes you mention can also be carried out in Python, but I am not familiar with Prolog, so I am not in a position to compare the two.

      1. I did look at the book at around p 400.
        It is funny that the QA example refers to MY CITY !
        We have translated factoid questions addressed to DBs from NL to SQL with Prolog many years ago.
        The translation of NL to Logic is questioned by my work since 1992 on inference from text.

      2. You can find links and full texts of my pubs at LinkedIn, Academia and ResearchGate, where I have posted a list of them too. If you find one listed that does not download, please let me know and, if I have it electronically, I will send it to you ASAP.

        All the best
        John
