What is NER in NLP? Real-World Examples and Use Cases Using Python and spaCy

    Posted by MyrinNew (Senior Member), Feb 2024


    Ever wondered how Google or Siri understands names, places, and brands in a sentence? That's Named Entity Recognition (NER), one of the techniques that lets machines pick out real-world references in text.


    Named Entity Recognition (NER) is a subtask of Natural Language Processing (NLP). It identifies entities in a text and classifies them into predefined categories such as person, organisation, location, and date.


    It is a core step in information extraction: it turns unstructured text into structured data automatically. By recognising named entities, systems can better understand how different pieces of information within the text relate to each other.
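    A deliberately simplified sketch of NER's input and output: a hypothetical lookup table (GAZETTEER below) maps known names to categories, and the function reports which of them occur in a text. This is only an illustration; trained NER models such as spaCy's learn from context rather than relying on a fixed list, so they can also tag names that appear in no list.

```python
# Hypothetical lookup table: known name -> entity label (illustration only)
GAZETTEER = {
    "Steve Jobs": "PERSON",
    "Apple": "ORG",
    "Cupertino": "GPE",
    "California": "GPE",
}


def tag_entities(text):
    """Return (entity, label) pairs for every gazetteer name found in text, in reading order."""
    found = []
    for name, label in GAZETTEER.items():
        start = text.find(name)
        while start != -1:  # record every occurrence, not just the first
            found.append((start, name, label))
            start = text.find(name, start + len(name))
    found.sort()  # sort by character position
    return [(name, label) for _, name, label in found]


print(tag_entities("Steve Jobs was a founder of Apple, based in Cupertino, California."))
# [('Steve Jobs', 'PERSON'), ('Apple', 'ORG'), ('Cupertino', 'GPE'), ('California', 'GPE')]
```

    Because it matches raw substrings, this sketch would also "find" Apple inside a word like Applesauce, which is one of many reasons real systems rely on tokenization and statistical models.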


    Example


    Steve Jobs was a founder of Apple; he created his company on April 1, 1976. Now the company's headquarters are located in Cupertino, California, United States.


    Person: Steve Jobs

    Organisation: Apple

    Date: April 1, 1976

    Location: Cupertino, California, United States
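    The entity list above is already structured data. A minimal sketch of storing it as a record, assuming the (entity, label) pairs were produced by some NER step (here they are typed in by hand) and using spaCy-style label names, where GPE covers countries, cities, and states:

```python
from collections import defaultdict

# Hand-written (entity, label) pairs standing in for real NER output
entities = [
    ("Steve Jobs", "PERSON"),
    ("Apple", "ORG"),
    ("April 1, 1976", "DATE"),
    ("Cupertino", "GPE"),
    ("California", "GPE"),
    ("United States", "GPE"),
]


def to_record(entities):
    """Group entity strings by label, turning free text into a structured record."""
    record = defaultdict(list)
    for entity, label in entities:
        record[label].append(entity)
    return dict(record)


record = to_record(entities)
print(record)
# {'PERSON': ['Steve Jobs'], 'ORG': ['Apple'], 'DATE': ['April 1, 1976'], 'GPE': ['Cupertino', 'California', 'United States']}
```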


    Reading a whole text or corpus by hand is rarely practical, and often you only need its key context. Python libraries such as spaCy ship pretrained NER models that do this task for us, which makes it easier to build better and more efficient models.


    Where Is It Used?

    • Chatbots: assistants such as ChatGPT, Meta AI, and Gemini need to identify the relevant entities (people, places, dates, products) mentioned in a conversation.
    • Search engines: NER helps search engines identify and categorise the subjects mentioned on web pages and in search queries.
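    For the search-engine case, one common idea is to index documents by the entities they mention, so a query about an entity can go straight to the pages that mention it. A minimal sketch, assuming the entities of each document have already been extracted (the docs data below is made up for illustration):

```python
from collections import defaultdict

# Made-up documents with pre-extracted (entity, label) pairs
docs = {
    "page1": [("Apple", "ORG"), ("Cupertino", "GPE")],
    "page2": [("Steve Jobs", "PERSON"), ("Apple", "ORG")],
    "page3": [("Google", "ORG"), ("India", "GPE")],
}


def build_entity_index(docs):
    """Map each (entity, label) pair to the sorted list of document IDs that mention it."""
    index = defaultdict(set)
    for doc_id, entities in docs.items():
        for entity in entities:
            index[entity].add(doc_id)
    return {entity: sorted(ids) for entity, ids in index.items()}


index = build_entity_index(docs)
print(index[("Apple", "ORG")])  # ['page1', 'page2']
```

    Keying the index on the (entity, label) pair rather than the bare string keeps, for example, Apple the company apart from a hypothetical Apple tagged with some other label.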


    Code

    • Print the entities in a text


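      import spacy

      # Requires the model: python -m spacy download en_core_web_sm
      nlp = spacy.load('en_core_web_sm')  # small English pipeline with a pretrained NER component
      # nlp() returns a processed Doc object (stored in a variable called text here)
      text = nlp("Steve Jobs was a founder of Apple, he created his company April 1, 1976. Now company headquarter located in Cupertino,California,United State")

      # text.ents is a tuple of Span objects, one per detected entity
      print(text.ents)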




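    text.ents holds the entities the model found. Note that text is the Doc object returned by nlp(), not a plain string, and each entity in text.ents is a Span.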




    Output:

    (Steve Jobs, Apple, April 1, 1976, Cupertino, California, United State)
    • Print each entity, its label, and the label's description
      Now iterate over text.ents and print each entity, its label (ent.label_), and the description of that label from spacy.explain().




    def show_entities(text):
        # Print each entity, its label, and spaCy's description of that label
        if text.ents:
            for ent in text.ents:
                print(ent, '|', ent.label_, '|', spacy.explain(ent.label_))
        else:
            print('No Entities Found')

    show_entities(text)




    Output:
    Steve Jobs | PERSON | People, including fictional
    Apple | ORG | Companies, agencies, institutions, etc.
    April 1, 1976 | DATE | Absolute or relative dates or periods
    Cupertino | GPE | Countries, cities, states
    California | GPE | Countries, cities, states
    United State | GPE | Countries, cities, states
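    The third column comes from spacy.explain(), which looks a label up in spaCy's glossary of label descriptions and returns None for labels it does not know. A minimal sketch of that lookup, using a hypothetical GLOSSARY dict that copies only the four descriptions printed above (spaCy's real glossary covers many more labels):

```python
# Hypothetical subset of a label glossary; descriptions copied from the output above
GLOSSARY = {
    "PERSON": "People, including fictional",
    "ORG": "Companies, agencies, institutions, etc.",
    "DATE": "Absolute or relative dates or periods",
    "GPE": "Countries, cities, states",
}


def explain(label):
    """Return the description for a label, or None if the label is not in the glossary."""
    return GLOSSARY.get(label)


def format_entity(entity, label):
    """Build one 'entity | label | description' line, like show_entities() prints."""
    return f"{entity} | {label} | {explain(label)}"


print(format_entity("Apple", "ORG"))       # Apple | ORG | Companies, agencies, institutions, etc.
print(format_entity("Cricket", "Sports"))  # Cricket | Sports | None
```

    An unknown label simply produces None in the third column, which is why custom labels print their description as None.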
    • Tag a new entity with an existing label


    Sometimes .ents doesn't contain an entity you expect, because the pretrained model has not learned to recognise that word as an entity (for example, a new or rare brand name that did not appear in its training data).




    d = nlp("Foodles is earning money at an extensive rate")

    # Reuse the show_entities() helper defined above
    show_entities(d)




    Output:
    No Entities Found


    Foodles is not identified as an organisation because the pretrained model does not recognise this unfamiliar name as an entity. You can add it to d.ents yourself:




    from spacy.tokens import Span as ss  # ss is an alias for spacy.tokens.Span

    d = nlp("Foodles is earning money at an extensive rate")

    ORG = d.vocab.strings["ORG"]          # numeric ID of the built-in "ORG" label
    new_entity = ss(d, 0, 1, label=ORG)   # tokens 0 up to (not including) 1: "Foodles"
    d.ents = list(d.ents) + [new_entity]  # add it to the entities the model found

    # Reuse the show_entities() helper defined above
    show_entities(d)




    Output:

    Foodles | ORG | Companies, agencies, institutions, etc.
    • ORG = d.vocab.strings["ORG"] gets the numerical ID for the label "ORG" (organisation) from spaCy's string store.
    • ss is short for spacy.tokens.Span (imported with from spacy.tokens import Span as ss), so the call means:
      Span(doc, start, end, label=label)
      0, 1 → the token positions of the word you want to tag:
      0 = start index of the token in the Doc
      1 = end index (exclusive), so the span covers only the first token
    • d.ents holds the existing entities (for example "Google" as ORG or "India" as GPE in other sentences). You're adding your new entity to that list. The new span must not overlap an existing entity, otherwise spaCy raises an error.
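    Span(doc, start, end) works with token indices where end is exclusive, and spaCy refuses to store overlapping entity spans in doc.ents. The pure-Python sketch below mimics both ideas without spaCy; the whitespace tokenizer and the add_entity helper are hypothetical simplifications, not spaCy's own code.

```python
def tokenize(text):
    """Very rough whitespace tokenizer (spaCy's tokenizer is far more careful)."""
    return text.split()


def add_entity(entities, start, end, label):
    """Return a new sorted list with (start, end, label) added, refusing overlapping spans.

    start/end are token indices and end is exclusive, as in Span(doc, start, end).
    """
    for s, e, _ in entities:
        if start < e and s < end:  # the two half-open token ranges overlap
            raise ValueError(f"span ({start}, {end}) overlaps existing span ({s}, {e})")
    return sorted(entities + [(start, end, label)])


tokens = tokenize("Foodles is earning money at an extensive rate")
ents = add_entity([], 0, 1, "ORG")  # token 0 only: "Foodles"
print([(" ".join(tokens[s:e]), label) for s, e, label in ents])  # [('Foodles', 'ORG')]
```

    Adjacent spans such as (0, 1) and (1, 2) do not overlap under this end-exclusive convention, while (0, 2) would clash with (0, 1).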


    Make a new label for an entity




    d = nlp("Playing Cricket and Football are both good for health")

    # Reuse the show_entities() helper defined above
    show_entities(d)




    Output:

    No Entities Found


    As you can see, Cricket and Football are not tagged: none of the model's built-in labels covers sports.




    from spacy.matcher import PhraseMatcher
    from spacy.tokens import Span as ss  # ss is an alias for spacy.tokens.Span

    d = nlp("Playing Cricket and Football are both good for health")

    m = PhraseMatcher(nlp.vocab)

    phrase = ['Football', 'Cricket']
    patterns = [nlp(p) for p in phrase]    # one Doc per phrase to look for
    m.add('sports', patterns)              # current API: add(key, list_of_patterns)
    found = m(d)                           # list of (match_id, start, end) tuples
    sport = d.vocab.strings.add("Sports")  # store the custom label and get its numeric ID
    new_ents = [ss(d, match[1], match[2], label=sport) for match in found]
    d.ents = list(d.ents) + new_ents

    # Reuse the show_entities() helper defined above
    show_entities(d)


    Output:
    Cricket | Sports | None
    Football | Sports | None

    1. from spacy.matcher import PhraseMatcher imports a tool that finds exact phrases like "Cricket" or "Football" in text.
    2. d = nlp("Playing Cricket and Football are both good for health"): nlp() tokenizes the sentence into words (and runs the rest of the pipeline).
    3. m = PhraseMatcher(nlp.vocab) creates a PhraseMatcher that uses the same vocabulary as the pipeline.
    4. phrase = ['Football', 'Cricket'] lists the words you want to tag as entities.
    5. patterns = [nlp(p) for p in phrase] converts each word into a spaCy Doc object, which is what the matcher expects as a pattern.
    6. m.add('sports', patterns) adds your patterns to the matcher under the key "sports". spaCy versions before v2.2.2 used the form m.add('sports', None, *patterns).
    7. found = m(d) runs the matcher on the sentence d.


    This returns a list of matches, for example:


    [(match_id, 1, 2), (match_id, 3, 4)]


    Here,

    1, 2 = "Cricket"

    3, 4 = "Football"

    8. sport = d.vocab.strings.add("Sports") stores the label "Sports" (your custom entity name) in the string store and returns its unique numeric ID. Looking it up with d.vocab.strings["Sports"] only computes that ID without storing the string, so ent.label_ may later fail with a KeyError when it tries to turn the ID back into text.
    9. new_ents = [ss(d, match[1], match[2], label=sport) for match in found]
      ss = Span (a new entity span in the Doc)
      You create new entities from the match positions:
      match[1] = start
      match[2] = end (exclusive)
      label=sport = label this span as "Sports"
    10. d.ents = list(d.ents) + new_ents adds the new entities to the ones already in the Doc.
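    The steps above boil down to: find every token sequence that equals one of the phrases, return (key, start, end) tuples, and turn each tuple into a labelled span. The sketch below reproduces that flow in plain Python; phrase_match is a hypothetical, simplified stand-in for PhraseMatcher (spaCy's real matcher uses its own optimised implementation, not this naive scan).

```python
def phrase_match(key, phrases, tokens):
    """Return (key, start, end) for every occurrence of a phrase in tokens.

    Phrases are compared token by token (exact and case-sensitive); end is exclusive.
    """
    patterns = [phrase.split() for phrase in phrases if phrase.strip()]
    matches = []
    for start in range(len(tokens)):
        for pattern in patterns:
            end = start + len(pattern)
            if tokens[start:end] == pattern:
                matches.append((key, start, end))
    return matches


tokens = "Playing Cricket and Football are both good for health".split()
found = phrase_match("sports", ["Football", "Cricket"], tokens)
print(found)  # [('sports', 1, 2), ('sports', 3, 4)]

# Turn each match into a labelled entity, similar to ss(d, start, end, label=sport)
new_ents = [(" ".join(tokens[start:end]), "Sports") for _, start, end in found]
print(new_ents)  # [('Cricket', 'Sports'), ('Football', 'Sports')]
```

    Multi-word phrases work the same way: "United States" would match two consecutive tokens and produce a span whose end is two positions after its start.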



