What is Natural Language Processing (NLP)

Introduction to Natural Language Processing (NLP)

Natural Language Processing (NLP) is a technique for processing ordinary human language and making sense of it. This tutorial has two parts: one covers the theory and the other covers the practice.

We will publish a few posts with examples of Natural Language Processing using both Stanford CoreNLP and Apache OpenNLP.

There are many Natural Language Processing tools, such as Apache OpenNLP and Stanford CoreNLP, and each of them processes natural language in a series of steps.

The major steps, however, are the same for all of them.

a) Sentence Detection: In this step, the individual sentences are detected in the input text. For example, in the text

Natural Processing Language is a good Technique. It has various parts,

there are two different sentences. This step detects these two individual sentences and passes each of them on to the next step.

Note: A sentence is defined as the longest whitespace-trimmed character sequence between two punctuation marks. The first and last sentences are exceptions to this rule: the first non-whitespace character is assumed to be the beginning of a sentence, and the last non-whitespace character is assumed to be the end of a sentence. By this definition, the example text above is segmented into two sentences.
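To make the definition concrete, here is a minimal rule-based sketch of it. It assumes that only '.', '?' and '!' end a sentence, and the function name detect_sentences is hypothetical. Real sentence detectors such as the ones in OpenNLP and CoreNLP are more sophisticated (they handle abbreviations like "Dr." for example), so treat this only as an illustration of the rule.

```python
import re

# Hypothetical helper: a toy sentence splitter that follows the definition above.
# Assumption: only '.', '?' and '!' count as sentence-ending punctuation marks.
def detect_sentences(text: str) -> list[str]:
    sentences = []
    start = 0
    for match in re.finditer(r"[.?!]", text):
        # The sentence runs up to and including the punctuation mark,
        # with surrounding whitespace trimmed.
        candidate = text[start:match.end()].strip()
        if candidate:
            sentences.append(candidate)
        start = match.end()
    # Exception for the last sentence: the last non-whitespace character
    # ends a sentence even without a closing punctuation mark.
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences

print(detect_sentences("Natural Processing Language is a good Technique. It has various parts,"))
# ['Natural Processing Language is a good Technique.', 'It has various parts,']
```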

b) Tokenizer: In this stage of Natural Language Processing, the individual tokens are generated from the sentence. In most cases the default delimiter between tokens is whitespace. So if we consider the sentence

Natural Processing Language is a good Technique

the following tokens are produced:

Natural, Processing, Language, is, a, good, Technique

Each item in this list is a separate token.
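A minimal sketch of two simple tokenizers (both function names are hypothetical): one splits purely on whitespace, as described above, and one also separates punctuation from words. Note how the whitespace version leaves the full stop attached to "Technique." while the second one emits it as its own token; trained tokenizers in real toolkits handle many more cases than either of these.

```python
import re

def whitespace_tokenize(sentence: str) -> list[str]:
    # Split on runs of whitespace; punctuation stays attached to words.
    return sentence.split()

def simple_tokenize(sentence: str) -> list[str]:
    # A token is either a run of word characters or a single
    # punctuation character that is not whitespace.
    return re.findall(r"\w+|[^\w\s]", sentence)

s = "Natural Processing Language is a good Technique."
print(whitespace_tokenize(s))
# ['Natural', 'Processing', 'Language', 'is', 'a', 'good', 'Technique.']
print(simple_tokenize(s))
# ['Natural', 'Processing', 'Language', 'is', 'a', 'good', 'Technique', '.']
```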

c) POS Tagging: This is one of the most important steps in Natural Language Processing. It assigns a part-of-speech (POS) tag to each of the tokens generated above.

For example:

Natural-JJ Processing-NN Language-NN is-VBZ a-DT good-JJ Technique-NN

These tags (JJ, NN, etc.) are the POS tags of the Penn Treebank tag set. The complete list of tags and their meanings can be found at the following link:

Penn Treebank POS tags

https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html
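The sketch below only illustrates the input and output of POS tagging: tokens go in, (word, tag) pairs come out, printed in the word-TAG style used above. It is a toy dictionary-lookup tagger with a hypothetical lexicon covering just the example sentence, and it falls back to NN for unknown words (an assumption). Real taggers in OpenNLP and CoreNLP use trained statistical models that also look at the surrounding context.

```python
# Hypothetical lexicon covering only the example sentence.
LEXICON = {
    "natural": "JJ", "processing": "NN", "language": "NN",
    "is": "VBZ", "a": "DT", "good": "JJ", "technique": "NN",
}

def pos_tag(tokens: list[str], default: str = "NN") -> list[tuple[str, str]]:
    # Look each token up case-insensitively; unknown words get the default tag.
    return [(tok, LEXICON.get(tok.lower(), default)) for tok in tokens]

tokens = "Natural Processing Language is a good Technique".split()
tagged = pos_tag(tokens)
print(" ".join(f"{word}-{tag}" for word, tag in tagged))
# Natural-JJ Processing-NN Language-NN is-VBZ a-DT good-JJ Technique-NN
```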

d) Chunking: This step is not found in every NLP toolkit. For example, Apache OpenNLP provides a chunker, while Stanford CoreNLP does not.

Chunking works on the output of POS tagging and groups the tagged tokens together.

Consider an example: in the sentence above, Processing and Language are both nouns, so the chunker combines them into one noun group. (A typical noun-phrase chunker also includes the preceding adjective, giving the noun group Natural Processing Language.)

Text chunking consists of dividing a text into syntactically correlated groups of words, such as noun groups and verb groups, but it does not specify their internal structure or their role in the main sentence.
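Here is a minimal rule-based sketch of chunking over POS-tagged tokens. The rule is a hand-written assumption for illustration: an optional determiner (DT), any adjectives (JJ) and one or more nouns (NN*) form a noun group (NP), and a run of verbs (VB*) forms a verb group (VP). OpenNLP's chunker is a trained model rather than a fixed rule like this, and the function name chunk is hypothetical.

```python
def chunk(tagged: list[tuple[str, str]]) -> list[tuple[str, list[str]]]:
    chunks = []
    i = 0
    while i < len(tagged):
        # Try a noun group: DT? JJ* NN+
        j = i
        if tagged[j][1] == "DT":
            j += 1
        while j < len(tagged) and tagged[j][1] == "JJ":
            j += 1
        k = j
        while k < len(tagged) and tagged[k][1].startswith("NN"):
            k += 1
        if k > j:  # at least one noun was found
            chunks.append(("NP", [w for w, _ in tagged[i:k]]))
            i = k
            continue
        # Otherwise try a verb group: VB+
        if tagged[i][1].startswith("VB"):
            k = i
            while k < len(tagged) and tagged[k][1].startswith("VB"):
                k += 1
            chunks.append(("VP", [w for w, _ in tagged[i:k]]))
            i = k
            continue
        # Any other token becomes its own chunk labelled "O" (outside).
        chunks.append(("O", [tagged[i][0]]))
        i += 1
    return chunks

tagged = [("Natural", "JJ"), ("Processing", "NN"), ("Language", "NN"),
          ("is", "VBZ"), ("a", "DT"), ("good", "JJ"), ("Technique", "NN")]
print(" ".join(f"[{label} {' '.join(words)}]" for label, words in chunk(tagged)))
# [NP Natural Processing Language] [VP is] [NP a good Technique]
```

As the definition above says, the chunks are flat: the output does not say how the words inside a group relate to each other.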

e) Parsing (Parse Tree): In this step, a parse tree is generated for the sentence.

The tree can then be traversed to extract the information we need.
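To show what "traversing the tree" can mean, the sketch below takes a parse tree in the bracketed notation used by Penn Treebank style parsers, turns it into nested lists, and collects the words of every phrase with a given label. The tree is hand-written for the example sentence and assumed for illustration; an actual parser's output may differ. The helper names are hypothetical.

```python
# Hand-written tree for the example sentence (an assumption, not parser output).
TREE = "(ROOT (S (NP (JJ Natural) (NN Processing) (NN Language)) (VP (VBZ is) (NP (DT a) (JJ good) (NN Technique)))))"

def parse_brackets(text: str):
    """Turn bracket notation into nested lists: [label, child, child, ...]."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        if tokens[pos] == "(":
            pos += 1                  # skip "("
            node = [tokens[pos]]      # node label, e.g. "NP"
            pos += 1
            while tokens[pos] != ")":
                node.append(read())
            pos += 1                  # skip ")"
            return node
        word = tokens[pos]            # a leaf: the word itself
        pos += 1
        return word

    return read()

def leaves(node) -> list[str]:
    # Collect the words under a node, left to right.
    if isinstance(node, str):
        return [node]
    return [w for child in node[1:] for w in leaves(child)]

def phrases(node, label: str) -> list[str]:
    # Return the words of every subtree carrying the given label.
    if isinstance(node, str):
        return []
    found = [" ".join(leaves(node))] if node[0] == label else []
    for child in node[1:]:
        found.extend(phrases(child, label))
    return found

tree = parse_brackets(TREE)
print(phrases(tree, "NP"))
# ['Natural Processing Language', 'a good Technique']
```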

f) Dependency Parsing: Of the two toolkits discussed here, this is currently only available in Stanford CoreNLP. It identifies the dependencies between the words in a sentence. Each dependency holds between a governor (the head word) and a dependent.

In the sentence above, for example, there is a dependency between the words Natural and Processing. The relation between them is amod (adjectival modifier).

Stanford dependencies (SD) are triplets: the name of the relation, the governor and the dependent.

In this case the triplet would be amod, Processing, Natural: the noun Processing is the governor and the adjective Natural is its dependent.
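A dependency triplet maps naturally onto a small record type. The sketch below stores triplets as (relation, governor, dependent), prints them in the relation(governor, dependent) style, and looks up all dependents of a word. The first triplet is the amod relation discussed above, with the adjective Natural as the dependent of the noun Processing; the other two (det and amod on Technique) are illustrative assumptions written in the same style, not real parser output. The class and helper names are hypothetical.

```python
from typing import NamedTuple

class Dependency(NamedTuple):
    relation: str   # e.g. "amod" (adjectival modifier)
    governor: str   # the head word
    dependent: str  # the word that depends on the head

# The first triplet comes from the text; the other two are illustrative assumptions.
deps = [
    Dependency("amod", "Processing", "Natural"),
    Dependency("det", "Technique", "a"),
    Dependency("amod", "Technique", "good"),
]

def dependents_of(governor: str, deps: list[Dependency]) -> list[tuple[str, str]]:
    # All (relation, dependent) pairs whose governor is the given word.
    return [(d.relation, d.dependent) for d in deps if d.governor == governor]

for d in deps:
    print(f"{d.relation}({d.governor}, {d.dependent})")
# amod(Processing, Natural)
# det(Technique, a)
# amod(Technique, good)
print(dependents_of("Technique", deps))
# [('det', 'a'), ('amod', 'good')]
```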

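g) Named Entity Recognition: This step detects named entities and numbers in text. In Apache OpenNLP, the component that does this is called the Name Finder.

For example, in the sentence "Harvard University is good", Harvard University would be recognized as the name of an organization.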
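The sketch below swaps in a much simpler technique than a trained name finder: dictionary (gazetteer) lookup for names plus a regular expression for numbers. The gazetteer contents and the function name find_entities are hypothetical. It only shows the shape of the task, which is finding spans of tokens and labelling them with an entity type.

```python
import re

# Hypothetical gazetteer: known names (as token tuples) and their entity types.
GAZETTEER = {
    ("Harvard", "University"): "ORGANIZATION",
}
# Try longer names first so the longest match wins.
NAMES = sorted(GAZETTEER, key=len, reverse=True)

def find_entities(tokens: list[str]) -> list[tuple[str, str]]:
    entities = []
    i = 0
    while i < len(tokens):
        for name in NAMES:
            if tuple(tokens[i:i + len(name)]) == name:
                entities.append((" ".join(name), GAZETTEER[name]))
                i += len(name)
                break
        else:
            # No known name starts here; check whether the token is a number.
            if re.fullmatch(r"\d+(?:\.\d+)?", tokens[i]):
                entities.append((tokens[i], "NUMBER"))
            i += 1
    return entities

print(find_entities("Harvard University is good".split()))
# [('Harvard University', 'ORGANIZATION')]
print(find_entities("Harvard University was founded in 1636".split()))
# [('Harvard University', 'ORGANIZATION'), ('1636', 'NUMBER')]
```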

These are the most common steps found in Natural Language Processing tools. The tools work by learning from training data: their authors create annotated training data and use it to generate models. The code then uses these models as its knowledge when it processes new text.
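The training-data-to-model idea can be shown with the simplest possible "model": a most-frequent-tag (unigram) POS tagger. This is a deliberately simple baseline, not the kind of statistical model OpenNLP or CoreNLP actually train, and the tiny annotated training data below is hypothetical. Training counts how often each word carries each tag; the resulting model is just the most frequent tag per word, and the tagging code relies on it, falling back to NN for unseen words (an assumption).

```python
from collections import Counter, defaultdict

# Hypothetical annotated training data: sentences as (word, tag) pairs.
TRAINING_DATA = [
    [("This", "DT"), ("is", "VBZ"), ("a", "DT"), ("good", "JJ"), ("book", "NN")],
    [("Natural", "JJ"), ("language", "NN"), ("is", "VBZ"), ("complex", "JJ")],
    [("A", "DT"), ("good", "JJ"), ("technique", "NN"), ("helps", "VBZ")],
]

def train(data):
    # Count tag occurrences per (lower-cased) word.
    counts = defaultdict(Counter)
    for sentence in data:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    # The "model" is the most frequent tag seen for each word.
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}

def tag(model, tokens, default="NN"):
    # Use the model's knowledge; unseen words get the default tag.
    return [(t, model.get(t.lower(), default)) for t in tokens]

model = train(TRAINING_DATA)
print(tag(model, "Natural Processing Language is a good Technique".split()))
# [('Natural', 'JJ'), ('Processing', 'NN'), ('Language', 'NN'), ('is', 'VBZ'), ('a', 'DT'), ('good', 'JJ'), ('Technique', 'NN')]
```

Changing the training data changes the model, and with it every answer the tagger gives, which is why retraining should be done with care.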

Apache OpenNLP models can be downloaded from the following link:

http://opennlp.sourceforge.net/models-1.5/

Stanford CoreNLP provides its models in the form of a jar file, which can be downloaded from:

http://nlp.stanford.edu/software/corenlp.shtml

Keep in mind that you can also train these models yourself, but doing so changes what the whole system has learned, so be careful before you do it.

For more details, see the Apache OpenNLP manual:

https://opennlp.apache.org/documentation/1.5.2-incubating/manual/opennlp.html

Here is a complete video lecture that describes Natural Language Processing.

Video Ref: Prof. Sudeshna Sarkar and Prof. Anupam Basu, Department of Computer Science and Engineering, IIT Kharagpur (YouTube video)

