Basic Text Analysis with NLTK (Python)

Preliminaries

In this post I brush over some basic NLP processes, including text tokenisation, part-of-speech tagging and named entity recognition, and in particular how these can be achieved in the Python programming language through the Natural Language Toolkit (NLTK) package developed by Steven Bird and his associates.

NLP is a broad field spanning Linguistics, Computer Science and Engineering, to mention a few. NLP, used here to mean Natural Language Processing, shouldn’t be confused with Neuro-linguistic Programming. NLP as we mean it here is the field concerned with the interaction between machines and human languages; it seeks to make language accessible to machines and to make human-machine interaction much easier. The field has come a long way, but there remain many NLP problems that are yet to be solved. The ultimate goal of the field may be considered to be true A.I. (i.e. a truly intelligent machine possessing every human cognitive ability, in this case the Language Acquisition Device (L.A.D.), and thus an ability to acquire, use, comprehend and learn language).

NLP is at work every day of our modern lives. Search engines, online translation services such as Bing and Google Translate, chatbots and text auto-completion on our phones are all NLP driven.

Basic Text Analysis with NLTK

What follows is a presentation of some very basic NLP procedures with NLTK, an NLP package for the Python programming language. Although it is not the only package available for NLP, it has the advantage of simplicity afforded by the nature of Python. (Other popular NLP packages include Stanford’s CoreNLP and Apache’s OpenNLP; these packages are written in Java.) Amongst the numerous high-level programming languages in existence, Python proves to be one of the least complicated. It is very true to the essence of such languages: they are more human-readable and portable than assembly and machine languages, at the cost of some speed.

Tokenising Texts

Word Tokenisation
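
Word tokenisation splits a text into individual words and punctuation marks. The sketch below is only an illustration: the sample sentence is my own, and I use two NLTK tokenisers that work without any downloaded data. The more commonly used `nltk.word_tokenize` builds on the same Treebank-style rules but also needs the Punkt sentence models fetched through `nltk.download`.

```python
from nltk.tokenize import TreebankWordTokenizer, wordpunct_tokenize

# Sample sentence chosen for illustration only.
text = "NLTK doesn't make tokenising hard."

# Treebank-style tokenisation: splits contractions ("doesn't" -> "does", "n't")
# and separates the sentence-final full stop.
treebank_tokens = TreebankWordTokenizer().tokenize(text)

# Regex-based tokenisation: every run of letters/digits or of punctuation
# becomes its own token, so the apostrophe is split off on its own.
wordpunct_tokens = wordpunct_tokenize(text)

print(treebank_tokens)
print(wordpunct_tokens)
```

The two results differ mainly in how the contraction is handled, which is worth keeping in mind when later steps (such as tagging) expect a particular tokenisation style.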

Sentence Tokenisation
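
Sentence tokenisation splits a text into sentences. The usual entry point is `nltk.sent_tokenize`, which relies on pre-trained Punkt models that must be downloaded first. As a rough sketch that runs without those models, an untrained `PunktSentenceTokenizer` can be used directly; note that, being untrained, it knows no abbreviations and would wrongly split after something like "Mr.". The sample text is my own.

```python
from nltk.tokenize.punkt import PunktSentenceTokenizer

# Sample text chosen for illustration only (no abbreviations on purpose).
text = (
    "NLTK is a Python package. "
    "It was developed by Steven Bird and his associates. "
    "It covers many basic NLP tasks."
)

# An untrained Punkt tokeniser: it treats every word-final full stop
# as a sentence boundary because it has learned no abbreviations.
sentence_tokenizer = PunktSentenceTokenizer()
sentences = sentence_tokenizer.tokenize(text)

for number, sentence in enumerate(sentences, start=1):
    print(number, sentence)
```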

Stemming Tokens
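
Stemming reduces tokens to a common base form by stripping suffixes, so that related word forms can be counted together. Here is a minimal sketch using NLTK's `PorterStemmer`, one of several stemmers the package ships; the word list is my own and the choice of the Porter algorithm is an assumption made for illustration.

```python
from nltk.stem import PorterStemmer

# Sample tokens chosen for illustration only.
tokens = ["running", "connected", "connection", "cats"]

stemmer = PorterStemmer()

# Map each token to its stem; stems are not always dictionary words.
stems = [stemmer.stem(token) for token in tokens]

for token, stem in zip(tokens, stems):
    print(token, "->", stem)
```

Note that "connected" and "connection" end up with the same stem, which is the point of the exercise.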

Part of Speech Tagging
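
Part-of-speech tagging labels each token with its word class (noun, verb and so on). In NLTK this is normally done with `nltk.pos_tag`, which returns Penn Treebank tags by default but needs a trained tagger model downloaded through `nltk.download`. The sketch below tries that route and, if the model is missing, falls back to a toy rule-based `RegexpTagger`. The fallback rules, the `tag_tokens` helper and the sample tokens are my own assumptions for illustration and are far less accurate than the trained tagger.

```python
import nltk
from nltk.tag import RegexpTagger

# Sample tokens chosen for illustration only.
tokens = ["Steven", "Bird", "developed", "NLTK", "for", "processing", "texts", "."]

# Toy fallback rules (hypothetical, illustration only): the first matching
# pattern wins, and the last pattern tags everything else as a noun.
fallback_tagger = RegexpTagger([
    (r"^[.!?]$", "."),                  # sentence-final punctuation
    (r"^-?[0-9]+(\.[0-9]+)?$", "CD"),   # cardinal numbers
    (r"^[A-Z].*$", "NNP"),              # capitalised words -> proper noun
    (r".*ing$", "VBG"),                 # gerunds
    (r".*ed$", "VBD"),                  # simple past
    (r".*s$", "NNS"),                   # plural nouns
    (r".*", "NN"),                      # default: singular noun
])


def tag_tokens(tokens):
    """Tag with the trained model if available, else with the toy rules."""
    try:
        return nltk.pos_tag(tokens)
    except LookupError:
        # Raised by NLTK when the tagger model has not been downloaded.
        return fallback_tagger.tag(tokens)


tagged = tag_tokens(tokens)
print(tagged)
```

Either way the result is a list of `(token, tag)` pairs, which is the shape later steps such as chunking expect.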

Named Entity Recognition
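
Named entity recognition picks out the names of people, places, organisations and the like. The usual NLTK route is `nltk.ne_chunk(nltk.pos_tag(tokens))`, which labels entity types but needs several downloaded models. As a crude stand-in that runs without them, the sketch below simply groups consecutive proper nouns with `nltk.RegexpParser`. This is noun chunking used as an approximation, not a trained entity recogniser, and the hand-tagged sentence and the `ENTITY` label are my own assumptions for illustration.

```python
import nltk

# Hand-tagged sample sentence (illustration only; normally these tags
# would come from a part-of-speech tagger such as nltk.pos_tag).
tagged = [
    ("Steven", "NNP"), ("Bird", "NNP"), ("developed", "VBD"),
    ("NLTK", "NNP"), ("for", "IN"), ("Python", "NNP"), (".", "."),
]

# Chunk grammar: one or more consecutive proper nouns form an ENTITY chunk.
chunker = nltk.RegexpParser("ENTITY: {<NNP>+}")
tree = chunker.parse(tagged)

# Collect the words of every ENTITY subtree, in sentence order.
entities = [
    " ".join(word for word, _tag in subtree.leaves())
    for subtree in tree.subtrees(filter=lambda t: t.label() == "ENTITY")
]

print(entities)
```

Unlike `nltk.ne_chunk`, this approach cannot tell a person from an organisation; it only shows how tagged tokens are grouped into a tree of chunks.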

7 thoughts on “Basic Text Analysis with NLTK (Python)”

The NLP functions presented are very basic.
What about more complex operations like relation extraction and explanation generation of deductive QA?
What are Python’s advantages over Prolog, with which I know how to implement relation extraction and DQA with explanation quite simply?

John Kontos
AI professor

dirisujesse says:

Jan 14, 2017 at 7:49 pm

True, the NLP functions discussed here are very basic. However, the more advanced processes you mention can also be carried out in Python, but I am not familiar with Prolog, so I am not in a position to compare the two.

dirisujesse says:

Jan 14, 2017 at 7:59 pm

I hope that you wouldn’t mind visiting http://www.nltk.org/book/ for more advanced discussions of NLP with NLTK.

1. ikontos2003 says:
  
  Jan 14, 2017 at 8:39 pm
  
  I did look at the book at around p 400.
  It is funny that the QA example refers to MY CITY!
  We have translated factoid questions addressed to DBs from NL to SQL with Prolog many years ago.
  The translation of NL to Logic is questioned by my work since 1992 on inference from text.
  
2. dirisujesse says:
  
  Jan 14, 2017 at 10:19 pm
  
  Please, can I get a copy of the work? I would love to study it.
  
3. ikontos2003 says:
  
  Jan 16, 2017 at 11:40 am
  
  You can find links and full texts of my pubs at LinkedIn, Academia and ResearchGate, where I have posted a list of them too. If you find one listed that does not download, please let me know and, if I have it electronically, I will send it to you ASAP.
  
  All the best
  John
  
4. dirisujesse says:
  
  Jan 16, 2017 at 12:15 pm
  
  Thank You
  