
What is Natural Language Processing (NLP)?

Introduction to Natural Language Processing (NLP)


Natural Language Processing (NLP) is a technique for processing ordinary human language and making sense of it. This tutorial has two parts: theory and practice.
We will also publish a few posts with examples of Natural Language Processing using both Stanford CoreNLP and Apache OpenNLP.


Processing natural language involves a series of steps. There are many Natural Language Processing tools, such as Apache OpenNLP and Stanford CoreNLP, but the major steps are the same in all of them.
a)    Sentence Detection: In this step, the individual sentences are detected in the input text. For example, the text

Natural Processing Language is a good Technique. It has various parts.

contains two different sentences. This step detects the two individual sentences and passes each one on to the next step.

Note: here a sentence is defined as the longest whitespace-trimmed character sequence between two punctuation marks. The first and last sentences are exceptions to this rule: the first non-whitespace character is assumed to be the beginning of a sentence, and the last non-whitespace character is assumed to be the end of a sentence.
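
The definition above can be sketched in a few lines of Python. This is only a rule-based illustration that splits after '.', '!' or '?'. It is not how Apache OpenNLP or Stanford CoreNLP detect sentences (they rely on trained models), and the function name detect_sentences is made up for this example.

```python
import re

def detect_sentences(text):
    """Split text into sentences at '.', '!' or '?' followed by whitespace."""
    # The lookbehind keeps the punctuation mark attached to its sentence.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    # Trim surrounding whitespace and drop empty pieces.
    return [part.strip() for part in parts if part.strip()]

text = "Natural Processing Language is a good Technique. It has various parts."
for sentence in detect_sentences(text):
    print(sentence)
```

A rule this simple breaks on abbreviations such as "Dr." or "etc.", which is one reason the real tools use trained models.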

       b)   Tokenizer: In this stage, the individual tokens are generated from each sentence. In most cases the default delimiter for tokens is whitespace. So for the sentence

Natural Processing Language is a good Technique

the tokens are:

Natural, Processing, Language, is, a, good, Technique
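
As a rough sketch in Python, whitespace tokenization is a single call to str.split(). The second function shows one possible refinement that also separates punctuation. Both function names are invented for this example, and neither is the tokenizer shipped with OpenNLP or CoreNLP.

```python
import re

def whitespace_tokenize(sentence):
    # str.split() with no arguments splits on any run of whitespace.
    return sentence.split()

def simple_tokenize(sentence):
    # Words become tokens, and each punctuation mark becomes its own token.
    return re.findall(r"\w+|[^\w\s]", sentence)

print(whitespace_tokenize("Natural Processing Language is a good Technique"))
# ['Natural', 'Processing', 'Language', 'is', 'a', 'good', 'Technique']

print(simple_tokenize("It has various parts."))
# ['It', 'has', 'various', 'parts', '.']
```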

      c)   POS Tagging: This is one of the most important parts of Natural Language Processing. It identifies the part of speech of each token generated above.
For example:
Natural-JJ  Processing-NN  Language-NN  is-VBZ  a-DT  good-JJ  Technique-NN

These tags (JJ, NN, etc.) are known as POS tags, here taken from the Penn Treebank tag set. The complete list and their meanings can be found at the following link:

POS Penn-Tree Bank
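
To make the tagged output easier to read, the small Python sketch below parses the word-TAG format used in the example and prints the meaning of each tag. Only the four Penn Treebank tags that appear in the example are listed, and the helper name parse_tagged is made up for this illustration.

```python
# Meanings of the Penn Treebank tags that appear in the example sentence.
PENN_TAGS = {
    "JJ": "adjective",
    "NN": "noun, singular or mass",
    "VBZ": "verb, 3rd person singular present",
    "DT": "determiner",
}

def parse_tagged(tagged_sentence):
    """Turn 'word-TAG word-TAG ...' into a list of (word, tag) pairs."""
    pairs = []
    for item in tagged_sentence.split():
        # rpartition splits on the last '-', so hyphenated words stay intact.
        word, _, tag = item.rpartition("-")
        pairs.append((word, tag))
    return pairs

tagged = "Natural-JJ Processing-NN Language-NN is-VBZ a-DT good-JJ Technique-NN"
for word, tag in parse_tagged(tagged):
    print(f"{word:<12}{tag:<5}{PENN_TAGS.get(tag, 'unknown tag')}")
```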

      d)  Chunking: This step is not found in every NLP tool. For example, chunking is present in Apache OpenNLP but not in Stanford CoreNLP.

Chunking combines the output of POS tagging into groups. For example, in the sentence above, Processing and Language are both nouns, so chunking combines them and produces Processing Language as a single noun group.

Text chunking consists of dividing a text into syntactically correlated groups of words, such as noun groups and verb groups, but it does not specify their internal structure or their role in the main sentence.
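
The noun-merging example can be sketched in Python as follows. This is a deliberately simplified rule that merges runs of adjacent NN-tagged tokens. It is not the OpenNLP chunker, which is a trained model, and the NOUN_CHUNK label and the function name chunk_nouns are invented for this illustration.

```python
def chunk_nouns(tagged_tokens):
    """Merge runs of adjacent noun-tagged tokens into single noun chunks."""
    chunks = []
    current = []  # words of the noun run currently being collected
    for word, tag in tagged_tokens:
        if tag.startswith("NN"):
            current.append(word)
            continue
        if current:
            # A non-noun ends the run, so emit the collected nouns as one chunk.
            chunks.append((" ".join(current), "NOUN_CHUNK"))
            current = []
        chunks.append((word, tag))
    if current:
        # Emit a noun run that reaches the end of the sentence.
        chunks.append((" ".join(current), "NOUN_CHUNK"))
    return chunks

tagged = [
    ("Natural", "JJ"), ("Processing", "NN"), ("Language", "NN"),
    ("is", "VBZ"), ("a", "DT"), ("good", "JJ"), ("Technique", "NN"),
]
for chunk in chunk_nouns(tagged):
    print(chunk)
```

Here Processing and Language end up in one chunk, while every other token is passed through unchanged.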

      e)    Parsing (Parse Tree): In this step, the parse tree is generated for the sentence.




The tree can then be traversed to extract the necessary information.

       f)  Dependency Parsing: Of the two tools, this is currently available only in Stanford CoreNLP. It identifies the dependencies between the words in a sentence. Each dependency holds between a governor and a dependent.
 


In the example sentence, a dependency exists between the two words Natural and Processing. The relation between them is amod (adjectival modifier).

Stanford dependencies (SD) are triplets: the name of the relation, the governor and the dependent.
In this case the triplet is amod, Processing, Natural: the noun Processing is the governor and the adjective Natural is the dependent.
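
A dependency triplet maps naturally onto a small record type. The Python sketch below is only an illustration of the data structure, with made-up names (Dependency, format_dependency). It does not call Stanford CoreNLP. In an amod relation the modified noun is the governor and the adjective is the dependent, so the example is stored in that order.

```python
from collections import namedtuple

# One dependency: the relation name, the governor word and the dependent word.
Dependency = namedtuple("Dependency", ["relation", "governor", "dependent"])

def format_dependency(dep):
    # Written in the common relation(governor, dependent) notation.
    return f"{dep.relation}({dep.governor}, {dep.dependent})"

dep = Dependency(relation="amod", governor="Processing", dependent="Natural")
print(format_dependency(dep))  # amod(Processing, Natural)
```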

      g)      Named Entity Recognition: The Name Finder can detect named entities and numbers in text.
For example, in the sentence "Harvard University is good", Harvard University would be detected as a named entity.



These are the most common steps found in Natural Language Processing tools. They work by learning from training data: the authors of each tool prepared training data and generated models from it, and the code then uses these models as its knowledge.
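
The idea of turning training data into a model can be shown with a toy part-of-speech tagger in Python. This sketch simply remembers the most frequent tag for each word. The real tools use far more sophisticated statistical models, and the training data, function names and default tag below are all invented for the illustration.

```python
from collections import Counter, defaultdict

def train(tagged_sentences):
    """Build a model: for each word, the tag it carried most often in training."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}

def tag_tokens(model, tokens, default="NN"):
    """Tag tokens with the model; unseen words get the default tag."""
    return [(token, model.get(token.lower(), default)) for token in tokens]

# Hypothetical, hand-made training data (real corpora are far larger).
training_data = [
    [("a", "DT"), ("good", "JJ"), ("technique", "NN")],
    [("language", "NN"), ("is", "VBZ"), ("natural", "JJ")],
    [("processing", "NN"), ("is", "VBZ"), ("good", "JJ")],
]

model = train(training_data)
print(tag_tokens(model, "Natural Processing Language is a good Technique".split()))
```

Changing the training data changes the model, and therefore the output, without touching the tagging code.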

Apache OpenNLP models can be downloaded from the below link :

Stanford provides them in the form of jar which can be downloaded from


Remember that you can also train these models yourself, but doing so can change how the whole system behaves, so be careful before doing that.

If you need more detail on that, please visit:
https://opennlp.apache.org/documentation/1.5.2-incubating/manual/opennlp.html

Here is the complete video, which describes Natural Language Processing.

Video Ref: Prof. Sudeshna Sarkar and Prof. Anupam Basu, Department of Computer Science and Engineering, I.I.T. Kharagpur (embedded YouTube video)


