Named Entity Recognition (NER) will help us computationally identify people, places, and things (of various kinds) in a text or collection of texts. It is useful for extracting key information from texts. You might use NER to identify the most frequently appearing characters in a novel or build a network of characters (related to network analysis), or you might use NER to identify the geographic locations mentioned in texts, a first step toward mapping the locations (related to spatial analysis).
# Install spaCy for NER and other Natural Language Processing (NLP) tasks
!pip install -U spacy
Requirement already satisfied: spacy in c:\users\colto\anaconda3\lib\site-packages (3.7.2)
(remaining "Requirement already satisfied" dependency lines omitted)
spaCy relies on machine learning models that were trained on a large amount of carefully labeled text. The English-language spaCy model that we’re going to use was trained on an annotated corpus called "OntoNotes": 2 million+ words drawn from "news, broadcast, talk shows, weblogs, usenet newsgroups, and conversational telephone speech," which were meticulously tagged by a group of researchers and professionals for people’s names and places, for nouns and verbs, for subjects and objects, and much more.
import spacy
from spacy import displacy # visualization based on spaCy
from collections import Counter # counting the results
import pandas as pd # dealing with dataframe
# Download the English-language model
!python -m spacy download en_core_web_sm
Collecting en-core-web-sm==3.7.1
Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.7.1/en_core_web_sm-3.7.1-py3-none-any.whl (12.8 MB)
(remaining "Requirement already satisfied" dependency lines omitted)
[+] Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')
# Load the English language model
import en_core_web_sm
nlp = en_core_web_sm.load()
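As a quick optional check, we can peek at the components of the loaded pipeline. Note that spacy.load('en_core_web_sm') is the more common, equivalent way to load the model; nlp.pipe_names lists the processing steps, which should include "ner", the named entity recognizer.
# Optional check: list the model's pipeline components (the exact list may vary by spaCy version)
print(nlp.pipe_names)  # e.g. ['tok2vec', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner']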
# Load the dataset
movie = pd.read_csv('Data/movie_dialogue.csv')
movie
| | mid | cid | cname | mname | gender | wordcount | year | genres | comedy | thriller | drama | romance | lines |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | m0 | u0 | BIANCA | 10 things i hate about you | f | 959 | 1999 | ['comedy', 'romance'] | True | False | False | True | They do not! / I hope so. / Let's go. / Okay -... |
| 1 | m0 | u2 | CAMERON | 10 things i hate about you | m | 527 | 1999 | ['comedy', 'romance'] | True | False | False | True | They do to! / She okay? / Wow / No / The "real... |
| 2 | m0 | u4 | JOEY | 10 things i hate about you | m | 278 | 1999 | ['comedy', 'romance'] | True | False | False | True | Listen, I want to talk to you about the prom. ... |
| 3 | m0 | u5 | KAT | 10 things i hate about you | f | 1217 | 1999 | ['comedy', 'romance'] | True | False | False | True | Perm? / It's just you. / What? To completely d... |
| 4 | m0 | u6 | MANDELLA | 10 things i hate about you | f | 157 | 1999 | ['comedy', 'romance'] | True | False | False | True | William - he asked me to meet him here. / Have... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2964 | m98 | u1455 | ELSA | indiana jones and the last crusade | f | 289 | 1989 | ['action', 'adventure', 'thriller', 'action', ... | False | True | False | False | I can reach it. I can reach it... / It's ours,... |
| 2965 | m98 | u1456 | HENRY | indiana jones and the last crusade | m | 729 | 1989 | ['action', 'adventure', 'thriller', 'action', ... | False | True | False | False | Got lost in his own museum, huh? / The Name of... |
| 2966 | m98 | u1457 | INDY | indiana jones and the last crusade | m | 1436 | 1989 | ['action', 'adventure', 'thriller', 'action', ... | False | True | False | False | It's... a leap of faith. Oh, God. / I'm going ... |
| 2967 | m99 | u1463 | INDIANA | indiana jones and the temple of doom | m | 1500 | 1984 | ['action', 'adventure'] | False | False | False | False | Then she must have run out of the room and you... |
| 2968 | m99 | u1468 | WILLIE | indiana jones and the temple of doom | f | 950 | 1984 | ['action', 'adventure'] | False | False | False | False | It's some kind of cult! And they've got the sa... |
2969 rows × 13 columns
# Let's see the lines of the 1st character in the 1st movie
print(movie['lines'][0])
They do not! / I hope so. / Let's go. / Okay -- you're gonna need to learn how to lie. / I'm kidding. You know how sometimes you just become this "persona"? And you don't know how to quit? / Like my fear of wearing pastels? / What good stuff? / Me. This endless ...blonde babble. I'm like, boring myself. / do you listen to this crap? / Then Guillermo says, "If you go any lighter, you're gonna look like an extra on 90210." / But / Well, no... / I was? / Tons / You know Chastity? / Hi. / Who knows? All I've ever heard her say is that she'd dip before dating a guy that smokes. / Lesbian? No. I found a picture of Jared Leto in one of her drawers, so I'm pretty sure she's not harboring same-sex tendencies. / I really, really, really wanna go, but I can't. Not unless my sister goes. / Eber's Deep Conditioner every two days. And I never, ever use a blowdryer without the diffuser attachment. / You're sweet. / I counted on you to help my cause. You and that thug are obviously failing. Aren't we ever going on our date? / Where? / How is our little Find the Wench A Date plan progressing? / Forget French. / I don't want to know how to say that though. I want to know useful things. Like where the good stores are. How much does champagne cost? Stuff like Chat. I have never in my life had to point out my head to someone. / C'esc ma tete. This is my head / Gosh, if only we could find Kat a boyfriend... / Unsolved mystery. She used to be really popular when she started high school, then it was just like she got sick of it or something. / The thing is, Cameron -- I'm at the mercy of a particularly hideous breed of loser. My sister. I can't date until she does. / No, no, it's my fault -- we didn't have a proper introduction --- / You're asking me out. That's so cute. What's your name again? / Not the hacking and gagging and spitting part. Please. / Can we make this quick? Roxanne Korrine and Andrew Barrett are having an incredibly horrendous public break- up on the quad. Again. / I did. / I have to be home in twenty minutes. / Sometimes I wonder if the guys we're supposed to want to go out with are the ones we actually want to go out with, you know? / Combination. I don't know -- I thought he'd be different. More of a gentleman... / He practically proposed when he found out we had the same dermatologist. I mean. Dr. Bonchowski is great an all, but he's not exactly relevant party conversation. / Would you mind getting me a drink, Cameron? / Joey. / Where did he go? He was just here. / You might wanna think about it / Did you change your hair? / You know the deal. I can ' t go if Kat doesn't go -- / Hi, Joey. / Neat... / Queen Harry? / Hopefully. / Expensive? / Patrick -- is that- a. / Is that woman a complete fruit-loop or is it just me? / No! I just wanted / I just wanted -- / Let go! / You looked beautiful last night, you know. / I guess I'll never know, will I? / God, you're just like him! Just keep me locked away in the dark, so I can't experience anything for myself / I'm not stupid enough to repeat your mistakes. / No. you didn't! If you really thought I could make my own decisions, you would've let me go out with him instead of helping Daddy hold me hostage. / Why didn't you tell me? / But / You did what? / As in... / But you hate Joey / Why? / What? / I wish I had that luxury. I'm the only sophomore that got asked to the prom and I can't go, because you won ' t. / Like you care. / I don't get you. You act like you're too good for any of this, and then you go totally apeshit when you get here. 
/ I really don't think I need any social advice from you right now. / You are so completely unbalanced. / Yeah, he's your freak friend Mandella's boyfriend. I guess since I'm not allowed to go out, I should obsess over a dead guy, too. / Like I'm supposed to know what that even means. / Can't you forget for just one night that you're completely wretched? / Bogey Lowenstein's party is normal, but you're too busy listening to Bitches Who Need Prozac to know that. / You're ruining my life' Because you won't be normal, I can't be normal. / I think you're a freak. I think you do this to torture me. And I think you suck. / Oh, I thought you might have a date I don't know why I'm bothering to ask, but are you going to Bogey Lowenstein's party Saturday night? / Oh my God, does this mean you're becoming normal? / Can you at least start wearing a bra? / Nowhere... Hi, Daddy. / I have a date, Daddy. And he ' s not a captain of oppression like some men we know. / Fine. I see that I'm a prisoner in my own house. I'm not a daughter. I'm a possession! / He's not a "hot rod". Whatever that is. / No, but / Daddy, I want to discuss the prom with you. It's tomorrow night -- / Why? / Daddy, no! / It's just a party. Daddy. / Daddy, people expect me to be there! / It's just a party. Daddy, but I knew you'd forbid me to go since "Gloria Steinem" over there isn't going -- / If you must know, we were attempting to go to a small study group of friends. / Daddy, I -- / But she doesn't want to date. / But it's not fair -- she's a mutant, Daddy! / What if she never starts dating? / Now don't get upset. Daddy, but there's this boy... and I think he might ask...
# Visualize the named entities spaCy recognizes (note that the results are not perfect)
document = nlp(movie['lines'][0])
displacy.render(document, style="ent")
# Count the number of characters with lines in each movie, sorted in descending order
movie.groupby('mname')['mid'].count().sort_values(ascending=False)
mname
magnolia 18
lone star 16
the anniversary party 12
nixon 12
grand hotel 11
..
dark city 1
quantum project 1
the nightmare before christmas 1
metropolis 1
predator 1
Name: mid, Length: 600, dtype: int64
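As an aside, pandas can produce the same per-movie tally more tersely with value_counts (assuming no missing values in 'mid'):
# Equivalent, more concise count of characters with lines per movie
movie['mname'].value_counts()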
# Count the number of entities in the lines of the characters in the movie magnolia
ent_types = dict() # initialize a dictionary
for line in movie[movie['mname']=='magnolia']['lines']: # loop through the lines in "magnolia"
    doc = nlp(line)
    for entity in doc.ents: # for each character, loop through all the entities
        label = entity.label_ # get their labels
        if label not in ent_types: # make sure there's a key for this label in the dictionary
            ent_types[label] = Counter() # each label key points to a Counter for examples
        text = entity.text
        ent_types[label][text] += 1 # count the number of times we see each example
# Count the number of distinct examples of each entity type
for etype, examples in ent_types.items():
    print(etype, len(examples))
PERSON 73
GPE 12
ORG 31
TIME 11
CARDINAL 19
DATE 27
NORP 2
WORK_OF_ART 8
ORDINAL 1
QUANTITY 3
LAW 1
FAC 2
# Explain the entity type "PERSON"
spacy.explain('PERSON')
'People, including fictional'
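To see the full inventory of entity types this model predicts, one option (assuming the default "ner" component is present in the pipeline) is to loop over the recognizer's labels and pass each to spacy.explain:
# List every entity type the model can predict, with spaCy's short description of each
for label in nlp.get_pipe('ner').labels:
    print(label, '-', spacy.explain(label))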
# Get all people
people = []
for line in movie[movie['mname']=='magnolia']['lines']:
    doc = nlp(line)
    for named_entity in doc.ents:
        if named_entity.label_ == "PERSON": # we only want entities labeled "PERSON"
            people.append(named_entity.text)
people_count = Counter(people)
# Sort the people's names by their occurrences, and convert the results into a dataframe
people_magnolia = pd.DataFrame(people_count.most_common(), columns=['character', 'count'])
people_magnolia
| | character | count |
|---|---|---|
| 0 | Linda | 15 |
| 1 | Frank | 14 |
| 2 | Phil | 7 |
| 3 | Earl | 6 |
| 4 | Jimmy | 6 |
| ... | ... | ... |
| 68 | I'M SMART | 1 |
| 69 | Jean Baptiste | 1 |
| 70 | Willa Cather | 1 |
| 71 | Dad | 1 |
| 72 | Picky | 1 |
73 rows × 2 columns
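The tail of the table shows some noise, such as "I'M SMART" tagged as a PERSON. One rough cleanup, sketched below, keeps only title-cased strings; this heuristic would also drop legitimate all-caps names, so treat it as a starting point rather than a definitive filter.
# Heuristic cleanup: keep only title-cased PERSON strings (drops noise like "I'M SMART")
people_clean = Counter({name: n for name, n in people_count.items() if name.istitle()})
pd.DataFrame(people_clean.most_common(), columns=['character', 'count']).head(10)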
# Explain the entity types "TIME" and "DATE"
print("TIME:", spacy.explain('TIME'))
print("DATE:", spacy.explain('DATE'))
TIME: Times smaller than a day
DATE: Absolute or relative dates or periods
# Get entities related to time and their counts in the movie magnolia, and convert the sorted results to a dataframe.
# As we did for "people": first create an empty list, then loop through each line in the movie
# and process the line with "nlp". Next, loop through each entity of the line, select the entities labeled
# "TIME" or "DATE", and append them to the empty list. Finally, use Counter to count the results,
# sort them with "most_common", and create a dataframe.
time = []
for line in movie[movie['mname']=='magnolia']['lines']:
    doc = nlp(line)
    for named_entity in doc.ents:
        if named_entity.label_ == "TIME" or named_entity.label_ == "DATE":
            time.append(named_entity.text)
time_count = Counter(time)
time_magnolia = pd.DataFrame(time_count.most_common(), columns=['time expression', 'count'])
print(time_magnolia)
                 time expression  count
0                          today      6
1                        tonight      4
2                     last night      3
3                     four years      2
4                    ten o'clock      2
5                        an hour      2
6                       a minute      1
7                  Eight o'clock      1
8            About two years ago      1
9                       tomorrow      1
10                  seven-eleven      1
11                    age twelve      1
12                 age seventeen      1
13            thirty eight years      1
14                  12 years old      1
15                           '84      1
16                    one minute      1
17                    five years      1
18                          1980      1
19               three years ago      1
20                   Ten o'clock      1
21                         night      1
22                two and a half      1
23                         a day      1
24             about three years      1
25                  thirty years      1
26                       ' years      1
27              twenty years old      1
28           Fifteen minutes ago      1
29                 ten years ago      1
30                  this morning      1
31       Two years...three years      1
32               fifty years old      1
33                      two days      1
34         About three weeks ago      1
35                    six months      1
36                        ' days      1
37               ten minutes ago      1
# Get the total word count of all lines of movies released in 1960
movie_1960_df = movie[(movie['year']==1960)]
movie_1960_wordcount = movie_1960_df["wordcount"].sum()
movie_1960_wordcount
19535
# Get the total word count of all lines of movies released in 2009
movie_2009_df = movie[(movie['year']==2009)]
movie_2009_wordcount = movie_2009_df["wordcount"].sum()
movie_2009_wordcount
15858
# Calculate the number of entities related to time in movies released in 1960 and in 2009, and divide each by the respective word count.
# First, initialize a variable "time_1960_count" and set it to 0. Then, as before, loop through each line in
# movie_1960_df['lines'] and process the line with "nlp". Next, loop through each entity of the line,
# and if the entity has a label of "TIME" or "DATE", add 1 to time_1960_count. Finally, print time_1960_count
# divided by movie_1960_wordcount. Then repeat the process for the 2009 movies.
time_1960_count = 0
for line in movie[movie['year']==1960]['lines']:
    doc = nlp(line)
    for named_entity in doc.ents:
        if named_entity.label_ == "TIME" or named_entity.label_ == "DATE":
            time_1960_count += 1
print('1960:', time_1960_count/movie_1960_wordcount)

time_2009_count = 0
for line in movie[movie['year']==2009]['lines']:
    doc = nlp(line)
    for named_entity in doc.ents:
        if named_entity.label_ == "TIME" or named_entity.label_ == "DATE":
            time_2009_count += 1
print('2009:', time_2009_count/movie_2009_wordcount)
1960: 0.008395188123880215
2009: 0.004540295119182747
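Since we wrote the same loop twice, here is a hedged generalization: a small helper (the name time_entity_density is my own, not part of the assignment) that computes the TIME/DATE entity count per word for any year in the dataset.
# A sketch of a reusable helper for the per-year comparison above
def time_entity_density(df, year):
    subset = df[df['year'] == year]
    count = 0
    for line in subset['lines']:
        for ent in nlp(line).ents:
            if ent.label_ in ('TIME', 'DATE'):
                count += 1
    return count / subset['wordcount'].sum()  # entities per word

print('1960:', time_entity_density(movie, 1960))
print('2009:', time_entity_density(movie, 2009))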
Parts of speech are the grammatical units of language, such as (in English) nouns, verbs, adjectives, adverbs, pronouns, and prepositions. Each of these parts of speech plays a different role in a sentence. By computationally identifying parts of speech, we can start computationally exploring syntax, the relationship between words, rather than only focusing on words in isolation, as we did with tf-idf.
# Get the POS tagging of a sample text
sample = """Or set upon a golden bough to sing to lords and ladies of Byzantium of what is past, or passing, or to come."""
# This is an excerpt from "Sailing to Byzantium" by the Irish poet W. B. Yeats
document = nlp(sample)
options = {"compact": True, "distance": 90, "color": "yellow", "bg": "black", "font": "Gill Sans"}
displacy.render(document, style="dep", options=options) # visualize it
# Get part-of-speech tags
for token in document:
    print(token.text, token.pos_, token.dep_) # pos_ means part-of-speech tags, and dep_ means dependency
Or CCONJ cc
set VERB ROOT
upon SCONJ prep
a DET det
golden ADJ amod
bough NOUN pobj
to PART aux
sing VERB advcl
to ADP prep
lords NOUN pobj
and CCONJ cc
ladies NOUN conj
of ADP prep
Byzantium PROPN pobj
of ADP prep
what PRON nsubj
is AUX pcomp
past ADJ acomp
, PUNCT punct
or CCONJ cc
passing VERB conj
, PUNCT punct
or CCONJ cc
to PART aux
come VERB conj
. PUNCT punct
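spacy.explain also works for part-of-speech and dependency labels, which helps decode tags like CCONJ or pobj in the output above:
# Decode a POS tag and a dependency label from the output above
print(spacy.explain('CCONJ'))  # coordinating conjunction
print(spacy.explain('pobj'))   # object of preposition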
# Get verbs from the movie "magnolia" in the movie dialogue dataset
verbs = []
for line in movie[movie['mname']=='magnolia']['lines']:
    doc = nlp(line)
    for token in doc: # loop through the tokens, instead of the entities
        if token.pos_ == 'VERB': # we only want the tokens with the POS tag "VERB"
            verbs.append(token.text)
verbs_count = Counter(verbs)
# Sort the verbs by their occurrences, and convert the results into a dataframe
verbs_magnolia = pd.DataFrame(verbs_count.most_common(), columns=['verb', 'count'])
verbs_magnolia
| | verb | count |
|---|---|---|
| 0 | know | 80 |
| 1 | have | 79 |
| 2 | do | 75 |
| 3 | go | 50 |
| 4 | want | 49 |
| ... | ... | ... |
| 460 | drink | 1 |
| 461 | presume | 1 |
| 462 | spoken | 1 |
| 463 | Sounds | 1 |
| 464 | threatening | 1 |
465 rows × 2 columns
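Note that the table above counts surface forms, so inflected variants like "Sounds" are tallied separately from "sound". A minimal sketch of counting verb lemmas instead, which collapses those variants:
# Count verb lemmas rather than surface forms
lemmas = []
for line in movie[movie['mname']=='magnolia']['lines']:
    for token in nlp(line):
        if token.pos_ == 'VERB':
            lemmas.append(token.lemma_.lower())
pd.DataFrame(Counter(lemmas).most_common(), columns=['verb lemma', 'count']).head(10)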
# Get the top 10 adjectives by count in movies released in 1960 and 2009.
# As we did for the movie "magnolia": first create an empty list of adjectives, then loop through each line in
# movie_1960_df['lines'] and process the line with "nlp". Next, loop through each token of the line, select the tokens
# with the POS tag "ADJ", and append them to the empty list. Finally, use Counter to count the results,
# sort them with "most_common", create a dataframe, and print the top 10 adjectives.
# Then repeat the process for the 2009 movies.
adjectives = []
for line in movie[movie['year']==1960]['lines']:
    doc = nlp(line)
    for token in doc:
        if token.pos_ == 'ADJ':
            adjectives.append(token.text)
adj_count = Counter(adjectives)
adj_1960 = pd.DataFrame(adj_count.most_common(), columns=['adjectives', 'count'])
print('1960', adj_1960[:10])

adjectives2 = []
for line in movie[movie['year']==2009]['lines']:
    doc = nlp(line)
    for token in doc:
        if token.pos_ == 'ADJ':
            adjectives2.append(token.text)
adj2_count = Counter(adjectives2)
adj_2009 = pd.DataFrame(adj2_count.most_common(), columns=['adjectives', 'count'])
print('2009', adj_2009[:10])
1960    adjectives  count
0           little     35
1            right     26
2            sorry     23
3            wrong     23
4             good     20
5            other     19
6              old     18
7             Good     17
8             sure     17
9             wise     16
2009    adjectives  count
0             more     22
1             dead     19
2             good     19
3           little     17
4            other     15
5            right     15
6            ready     12
7             Good     11
8              bad     11
9             next     10
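Notice that "good" and "Good" are counted separately above. Lowercasing before counting merges such case variants; a small sketch for the 1960 list (the adjectives variable is still in scope):
# Merge case variants by lowercasing before counting
adj_1960_lower = Counter(adj.lower() for adj in adjectives)
print(adj_1960_lower.most_common(10))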
import re # regular expression
from IPython.display import Markdown, display # for visualization
# Visualize the named entities in each Casablanca character's lines
movie_casablanca = movie[movie['mname']=='casablanca'].reset_index(drop=True)
for line in movie_casablanca['lines']:
    displacy.render(nlp(line), style="ent")
# Define a function to find a keyword in its sentence context
def find_sentences_with_keyword(keyword, document):
    # loop through all the sentences in the document and pull out the text of each sentence
    for sentence in document.sents:
        sentence = sentence.text
        # check to see if the keyword is in the sentence (ignoring capitalization by lowercasing both)
        if keyword.lower() in sentence.lower():
            # use regular expressions to replace linebreaks and to bold the keyword, again ignoring capitalization
            sentence = re.sub('\n', ' ', sentence)
            # re.escape guards against keywords containing regex metacharacters
            sentence = re.sub(re.escape(keyword), f"**{keyword}**", sentence, flags=re.IGNORECASE)
            display(Markdown(sentence))
# Highlight the name of the protagonist Rick in context, within the lines of the second character of the movie
find_sentences_with_keyword(keyword="Rick", document=nlp(movie_casablanca['lines'][1]))
Rick's wouldn't be Rick's without them.
/ Rick, don't be a fool.
Rick, I'll put my cards on the table.
My dear Rick, when will you realize that in this world today isolationism is no longer a practical policy?
Hello, Rick. /
He is a difficult customer, that Rick.
Not for sure, Monsieur, but I will venture to guess that Ugarte left those letters with Monsieur Rick.
# Create a list of tokens and POS labels from document if the token is a word
tokens_and_labels = [(token.text, token.pos_) for token in nlp(movie_casablanca['lines'][1]) if token.is_alpha]
tokens_and_labels
[('I', 'PRON'),
('shall', 'AUX'),
('remember', 'VERB'),
('to', 'PART'),
('pay', 'VERB'),
('it', 'PRON'),
('to', 'ADP'),
('myself', 'PRON'),
('Of', 'ADV'),
('course', 'ADV'),
('they', 'PRON'),
('stay', 'VERB'),
('Rick', 'PROPN'),
('would', 'AUX'),
('be', 'AUX'),
('Rick', 'PROPN'),
('without', 'ADP'),
('them', 'PRON'),
('Hmmm', 'PROPN'),
('I', 'PRON'),
('happen', 'VERB'),
('to', 'PART'),
('know', 'VERB'),
('that', 'SCONJ'),
('he', 'PRON'),
('gets', 'VERB'),
('ten', 'NUM'),
('percent', 'NOUN'),
('But', 'CCONJ'),
('he', 'PRON'),
('worth', 'ADJ'),
('five', 'NUM'),
('Ah', 'INTJ'),
('to', 'PART'),
('get', 'VERB'),
('out', 'ADP'),
('of', 'ADP'),
('Casablanca', 'PROPN'),
('and', 'CCONJ'),
('go', 'VERB'),
('to', 'ADP'),
('America', 'PROPN'),
('You', 'PRON'),
('a', 'DET'),
('lucky', 'ADJ'),
('man', 'NOUN'),
('Shall', 'AUX'),
('we', 'PRON'),
('draw', 'VERB'),
('up', 'ADP'),
('the', 'DET'),
('papers', 'NOUN'),
('or', 'CCONJ'),
('is', 'AUX'),
('our', 'PRON'),
('handshake', 'NOUN'),
('good', 'ADJ'),
('enough', 'ADV'),
('Rick', 'PROPN'),
('do', 'AUX'),
('be', 'AUX'),
('a', 'DET'),
('fool', 'NOUN'),
('Take', 'VERB'),
('me', 'PRON'),
('into', 'ADP'),
('your', 'PRON'),
('confidence', 'NOUN'),
('You', 'PRON'),
('need', 'VERB'),
('a', 'DET'),
('partner', 'NOUN'),
('Rick', 'PROPN'),
('I', 'PRON'),
('put', 'VERB'),
('my', 'PRON'),
('cards', 'NOUN'),
('on', 'ADP'),
('the', 'DET'),
('table', 'NOUN'),
('I', 'PRON'),
('think', 'VERB'),
('you', 'PRON'),
('know', 'VERB'),
('where', 'SCONJ'),
('those', 'DET'),
('letters', 'NOUN'),
('are', 'AUX'),
('Naturally', 'ADV'),
('there', 'PRON'),
('will', 'AUX'),
('be', 'AUX'),
('a', 'DET'),
('few', 'ADJ'),
('incidental', 'ADJ'),
('expenses', 'NOUN'),
('That', 'PRON'),
('is', 'AUX'),
('the', 'DET'),
('proposition', 'NOUN'),
('I', 'PRON'),
('have', 'VERB'),
('for', 'ADP'),
('whoever', 'PRON'),
('has', 'AUX'),
('those', 'DET'),
('letters', 'NOUN'),
('I', 'PRON'),
('have', 'VERB'),
('a', 'DET'),
('proposition', 'NOUN'),
('for', 'ADP'),
('whoever', 'PRON'),
('has', 'AUX'),
('those', 'DET'),
('letters', 'NOUN'),
('I', 'PRON'),
('will', 'AUX'),
('handle', 'VERB'),
('the', 'DET'),
('entire', 'ADJ'),
('transaction', 'NOUN'),
('get', 'VERB'),
('rid', 'VERB'),
('of', 'ADP'),
('the', 'DET'),
('letters', 'NOUN'),
('take', 'VERB'),
('all', 'DET'),
('the', 'DET'),
('risk', 'NOUN'),
('for', 'ADP'),
('a', 'DET'),
('small', 'ADJ'),
('percentage', 'NOUN'),
('If', 'SCONJ'),
('I', 'PRON'),
('could', 'AUX'),
('lay', 'VERB'),
('my', 'PRON'),
('hands', 'NOUN'),
('on', 'ADP'),
('those', 'DET'),
('letters', 'NOUN'),
('I', 'PRON'),
('could', 'AUX'),
('make', 'VERB'),
('a', 'DET'),
('fortune', 'NOUN'),
('Of', 'ADV'),
('course', 'ADV'),
('not', 'PART'),
('What', 'PRON'),
('upsets', 'VERB'),
('me', 'PRON'),
('is', 'AUX'),
('the', 'DET'),
('fact', 'NOUN'),
('that', 'SCONJ'),
('Ugarte', 'PROPN'),
('is', 'AUX'),
('dead', 'ADJ'),
('and', 'CCONJ'),
('no', 'DET'),
('one', 'NOUN'),
('knows', 'VERB'),
('where', 'SCONJ'),
('those', 'DET'),
('letters', 'NOUN'),
('of', 'ADP'),
('transit', 'NOUN'),
('are', 'AUX'),
('The', 'DET'),
('bourbon', 'NOUN'),
('The', 'DET'),
('news', 'NOUN'),
('about', 'ADP'),
('Ugarte', 'PROPN'),
('upset', 'VERB'),
('me', 'PRON'),
('very', 'ADV'),
('much', 'ADV'),
('Carrying', 'VERB'),
('charges', 'NOUN'),
('my', 'PRON'),
('boy', 'NOUN'),
('carrying', 'VERB'),
('charges', 'NOUN'),
('Here', 'ADV'),
('sit', 'VERB'),
('down', 'ADP'),
('There', 'PRON'),
('something', 'PRON'),
('I', 'PRON'),
('want', 'VERB'),
('to', 'PART'),
('talk', 'VERB'),
('over', 'ADP'),
('with', 'ADP'),
('you', 'PRON'),
('anyhow', 'ADV'),
('No', 'DET'),
('hurry', 'NOUN'),
('I', 'PRON'),
('have', 'VERB'),
('it', 'PRON'),
('sent', 'VERB'),
('over', 'ADP'),
('Have', 'VERB'),
('a', 'DET'),
('drink', 'NOUN'),
('with', 'ADP'),
('me', 'PRON'),
('My', 'PRON'),
('dear', 'ADJ'),
('Rick', 'PROPN'),
('when', 'SCONJ'),
('will', 'AUX'),
('you', 'PRON'),
('realize', 'VERB'),
('that', 'SCONJ'),
('in', 'ADP'),
('this', 'DET'),
('world', 'NOUN'),
('today', 'NOUN'),
('isolationism', 'NOUN'),
('is', 'AUX'),
('no', 'ADV'),
('longer', 'ADV'),
('a', 'DET'),
('practical', 'ADJ'),
('policy', 'NOUN'),
('Suppose', 'VERB'),
('we', 'PRON'),
('ask', 'VERB'),
('Sam', 'PROPN'),
('Maybe', 'ADV'),
('he', 'PRON'),
('like', 'VERB'),
('to', 'PART'),
('make', 'VERB'),
('a', 'DET'),
('change', 'NOUN'),
('That', 'PRON'),
('too', 'ADV'),
('bad', 'ADJ'),
('That', 'PRON'),
('Casablanca', 'PROPN'),
('leading', 'VERB'),
('commodity', 'NOUN'),
('In', 'ADP'),
('refugees', 'NOUN'),
('alone', 'ADV'),
('we', 'PRON'),
('could', 'AUX'),
('make', 'VERB'),
('a', 'DET'),
('fortune', 'NOUN'),
('if', 'SCONJ'),
('you', 'PRON'),
('would', 'AUX'),
('work', 'VERB'),
('with', 'ADP'),
('me', 'PRON'),
('through', 'ADP'),
('the', 'DET'),
('black', 'ADJ'),
('market', 'NOUN'),
('What', 'PRON'),
('do', 'AUX'),
('you', 'PRON'),
('want', 'VERB'),
('for', 'ADP'),
('Sam', 'PROPN'),
('You', 'PRON'),
('have', 'AUX'),
('heard', 'VERB'),
('my', 'PRON'),
('offer', 'NOUN'),
('Fine', 'INTJ'),
('but', 'CCONJ'),
('I', 'PRON'),
('would', 'AUX'),
('like', 'VERB'),
('to', 'PART'),
('buy', 'VERB'),
('your', 'PRON'),
('cafe', 'NOUN'),
('Hello', 'PROPN'),
('Rick', 'PROPN'),
('It', 'PRON'),
('was', 'AUX'),
('gracious', 'ADJ'),
('of', 'ADP'),
('you', 'PRON'),
('to', 'PART'),
('share', 'VERB'),
('it', 'PRON'),
('with', 'ADP'),
('me', 'PRON'),
('Good', 'ADJ'),
('day', 'NOUN'),
('Mademoiselle', 'PROPN'),
('Monsieur', 'PROPN'),
('He', 'PRON'),
('is', 'AUX'),
('a', 'DET'),
('difficult', 'ADJ'),
('customer', 'NOUN'),
('that', 'SCONJ'),
('Rick', 'PROPN'),
('One', 'NUM'),
('never', 'ADV'),
('knows', 'VERB'),
('what', 'PRON'),
('he', 'PRON'),
('do', 'VERB'),
('or', 'CCONJ'),
('why', 'SCONJ'),
('But', 'CCONJ'),
('it', 'PRON'),
('is', 'AUX'),
('worth', 'ADJ'),
('a', 'DET'),
('chance', 'NOUN'),
('Not', 'PART'),
('for', 'ADP'),
('sure', 'ADJ'),
('Monsieur', 'PROPN'),
('but', 'CCONJ'),
('I', 'PRON'),
('will', 'AUX'),
('venture', 'VERB'),
('to', 'PART'),
('guess', 'VERB'),
('that', 'SCONJ'),
('Ugarte', 'PROPN'),
('left', 'VERB'),
('those', 'DET'),
('letters', 'NOUN'),
('with', 'ADP'),
('Monsieur', 'PROPN'),
('Rick', 'PROPN'),
('Those', 'DET'),
('letters', 'NOUN'),
('were', 'AUX'),
('not', 'PART'),
('found', 'VERB'),
('on', 'ADP'),
('Ugarte', 'PROPN'),
('when', 'SCONJ'),
('they', 'PRON'),
('arrested', 'VERB'),
('him', 'PRON'),
('I', 'PRON'),
('observe', 'VERB'),
('that', 'SCONJ'),
('you', 'PRON'),
('in', 'ADP'),
('one', 'NUM'),
('respect', 'NOUN'),
('are', 'AUX'),
('a', 'DET'),
('very', 'ADV'),
('fortunate', 'ADJ'),
('man', 'NOUN'),
('Monsieur', 'PROPN'),
('I', 'PRON'),
('am', 'AUX'),
('moved', 'VERB'),
('to', 'PART'),
('make', 'VERB'),
('one', 'NUM'),
('more', 'ADJ'),
('suggestion', 'NOUN'),
('why', 'SCONJ'),
('I', 'PRON'),
('do', 'AUX'),
('not', 'PART'),
('know', 'VERB'),
('because', 'SCONJ'),
('it', 'PRON'),
('can', 'AUX'),
('not', 'PART'),
('possibly', 'ADV'),
('profit', 'VERB'),
('me', 'PRON'),
('but', 'CCONJ'),
('have', 'AUX'),
('you', 'PRON'),
('heard', 'VERB'),
('about', 'ADP'),
('Signor', 'PROPN'),
('Ugarte', 'PROPN'),
('and', 'CCONJ'),
('the', 'DET'),
('letters', 'NOUN'),
('of', 'ADP'),
('transit', 'NOUN'),
('Well', 'INTJ'),
('good', 'ADJ'),
('luck', 'NOUN'),
('But', 'CCONJ'),
('be', 'AUX'),
('careful', 'ADJ'),
('You', 'PRON'),
('know', 'VERB'),
('you', 'PRON'),
('being', 'AUX'),
('shadowed', 'VERB'),
('We', 'PRON'),
('might', 'AUX'),
('as', 'ADV'),
('well', 'ADV'),
('be', 'AUX'),
('frank', 'ADJ'),
('Monsieur', 'PROPN'),
('It', 'PRON'),
('will', 'AUX'),
('take', 'VERB'),
('a', 'DET'),
('miracle', 'NOUN'),
('to', 'PART'),
('get', 'VERB'),
('you', 'PRON'),
('out', 'ADP'),
('of', 'ADP'),
('Casablanca', 'PROPN'),
('And', 'CCONJ'),
('the', 'DET'),
('Germans', 'PROPN'),
('have', 'AUX'),
('outlawed', 'VERB'),
('miracles', 'NOUN'),
('As', 'ADP'),
('leader', 'NOUN'),
('of', 'ADP'),
('all', 'DET'),
('illegal', 'ADJ'),
('activities', 'NOUN'),
('in', 'ADP'),
('Casablanca', 'PROPN'),
('I', 'PRON'),
('am', 'AUX'),
('an', 'DET'),
('influential', 'ADJ'),
('and', 'CCONJ'),
('respected', 'ADJ'),
('man', 'NOUN'),
('It', 'PRON'),
('would', 'AUX'),
('not', 'PART'),
('be', 'AUX'),
('worth', 'ADJ'),
('my', 'PRON'),
('life', 'NOUN'),
('to', 'PART'),
('do', 'VERB'),
('anything', 'PRON'),
('for', 'ADP'),
('Monsieur', 'PROPN'),
('Laszlo', 'PROPN'),
('You', 'PRON'),
('however', 'ADV'),
('are', 'AUX'),
('a', 'DET'),
('different', 'ADJ'),
('matter', 'NOUN')]
# Define a function to get all two-word combinations
def get_bigrams(word_list, number_consecutive_words=2):
    ngrams = []
    adj_length_of_word_list = len(word_list) - (number_consecutive_words - 1)
    # loop through numbers from 0 to the (slightly adjusted) length of your word list
    for word_index in range(adj_length_of_word_list):
        # slice the list at each number, grabbing the word at that index along with the next N-1 words
        ngram = word_list[word_index : word_index + number_consecutive_words]
        # append this word combo to the master list "ngrams"
        ngrams.append(ngram)
    return ngrams
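As an aside, for the two-word case the same pairs can be produced with zip; the only difference is that each bigram comes out as a tuple rather than a list (bigrams_alt is just an illustration, not used below).
# Equivalent bigrams via zip (tuples instead of lists)
bigrams_alt = list(zip(tokens_and_labels, tokens_and_labels[1:]))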
# Get all bigrams of the lines, including both the words and their POS labels
bigrams = get_bigrams(tokens_and_labels)
bigrams
[[('I', 'PRON'), ('shall', 'AUX')],
[('shall', 'AUX'), ('remember', 'VERB')],
[('remember', 'VERB'), ('to', 'PART')],
[('to', 'PART'), ('pay', 'VERB')],
[('pay', 'VERB'), ('it', 'PRON')],
[('it', 'PRON'), ('to', 'ADP')],
[('to', 'ADP'), ('myself', 'PRON')],
[('myself', 'PRON'), ('Of', 'ADV')],
[('Of', 'ADV'), ('course', 'ADV')],
[('course', 'ADV'), ('they', 'PRON')],
[('they', 'PRON'), ('stay', 'VERB')],
[('stay', 'VERB'), ('Rick', 'PROPN')],
[('Rick', 'PROPN'), ('would', 'AUX')],
[('would', 'AUX'), ('be', 'AUX')],
[('be', 'AUX'), ('Rick', 'PROPN')],
[('Rick', 'PROPN'), ('without', 'ADP')],
[('without', 'ADP'), ('them', 'PRON')],
[('them', 'PRON'), ('Hmmm', 'PROPN')],
[('Hmmm', 'PROPN'), ('I', 'PRON')],
[('I', 'PRON'), ('happen', 'VERB')],
[('happen', 'VERB'), ('to', 'PART')],
[('to', 'PART'), ('know', 'VERB')],
[('know', 'VERB'), ('that', 'SCONJ')],
[('that', 'SCONJ'), ('he', 'PRON')],
[('he', 'PRON'), ('gets', 'VERB')],
[('gets', 'VERB'), ('ten', 'NUM')],
[('ten', 'NUM'), ('percent', 'NOUN')],
[('percent', 'NOUN'), ('But', 'CCONJ')],
[('But', 'CCONJ'), ('he', 'PRON')],
[('he', 'PRON'), ('worth', 'ADJ')],
[('worth', 'ADJ'), ('five', 'NUM')],
[('five', 'NUM'), ('Ah', 'INTJ')],
[('Ah', 'INTJ'), ('to', 'PART')],
[('to', 'PART'), ('get', 'VERB')],
[('get', 'VERB'), ('out', 'ADP')],
[('out', 'ADP'), ('of', 'ADP')],
[('of', 'ADP'), ('Casablanca', 'PROPN')],
[('Casablanca', 'PROPN'), ('and', 'CCONJ')],
[('and', 'CCONJ'), ('go', 'VERB')],
[('go', 'VERB'), ('to', 'ADP')],
[('to', 'ADP'), ('America', 'PROPN')],
[('America', 'PROPN'), ('You', 'PRON')],
[('You', 'PRON'), ('a', 'DET')],
[('a', 'DET'), ('lucky', 'ADJ')],
[('lucky', 'ADJ'), ('man', 'NOUN')],
[('man', 'NOUN'), ('Shall', 'AUX')],
[('Shall', 'AUX'), ('we', 'PRON')],
[('we', 'PRON'), ('draw', 'VERB')],
[('draw', 'VERB'), ('up', 'ADP')],
[('up', 'ADP'), ('the', 'DET')],
[('the', 'DET'), ('papers', 'NOUN')],
[('papers', 'NOUN'), ('or', 'CCONJ')],
[('or', 'CCONJ'), ('is', 'AUX')],
[('is', 'AUX'), ('our', 'PRON')],
[('our', 'PRON'), ('handshake', 'NOUN')],
[('handshake', 'NOUN'), ('good', 'ADJ')],
[('good', 'ADJ'), ('enough', 'ADV')],
[('enough', 'ADV'), ('Rick', 'PROPN')],
[('Rick', 'PROPN'), ('do', 'AUX')],
[('do', 'AUX'), ('be', 'AUX')],
[('be', 'AUX'), ('a', 'DET')],
[('a', 'DET'), ('fool', 'NOUN')],
[('fool', 'NOUN'), ('Take', 'VERB')],
[('Take', 'VERB'), ('me', 'PRON')],
[('me', 'PRON'), ('into', 'ADP')],
[('into', 'ADP'), ('your', 'PRON')],
[('your', 'PRON'), ('confidence', 'NOUN')],
[('confidence', 'NOUN'), ('You', 'PRON')],
[('You', 'PRON'), ('need', 'VERB')],
[('need', 'VERB'), ('a', 'DET')],
[('a', 'DET'), ('partner', 'NOUN')],
[('partner', 'NOUN'), ('Rick', 'PROPN')],
[('Rick', 'PROPN'), ('I', 'PRON')],
[('I', 'PRON'), ('put', 'VERB')],
[('put', 'VERB'), ('my', 'PRON')],
[('my', 'PRON'), ('cards', 'NOUN')],
[('cards', 'NOUN'), ('on', 'ADP')],
[('on', 'ADP'), ('the', 'DET')],
[('the', 'DET'), ('table', 'NOUN')],
[('table', 'NOUN'), ('I', 'PRON')],
[('I', 'PRON'), ('think', 'VERB')],
[('think', 'VERB'), ('you', 'PRON')],
[('you', 'PRON'), ('know', 'VERB')],
[('know', 'VERB'), ('where', 'SCONJ')],
[('where', 'SCONJ'), ('those', 'DET')],
[('those', 'DET'), ('letters', 'NOUN')],
[('letters', 'NOUN'), ('are', 'AUX')],
[('are', 'AUX'), ('Naturally', 'ADV')],
[('Naturally', 'ADV'), ('there', 'PRON')],
[('there', 'PRON'), ('will', 'AUX')],
[('will', 'AUX'), ('be', 'AUX')],
[('be', 'AUX'), ('a', 'DET')],
[('a', 'DET'), ('few', 'ADJ')],
[('few', 'ADJ'), ('incidental', 'ADJ')],
[('incidental', 'ADJ'), ('expenses', 'NOUN')],
[('expenses', 'NOUN'), ('That', 'PRON')],
[('That', 'PRON'), ('is', 'AUX')],
[('is', 'AUX'), ('the', 'DET')],
[('the', 'DET'), ('proposition', 'NOUN')],
[('proposition', 'NOUN'), ('I', 'PRON')],
[('I', 'PRON'), ('have', 'VERB')],
[('have', 'VERB'), ('for', 'ADP')],
[('for', 'ADP'), ('whoever', 'PRON')],
[('whoever', 'PRON'), ('has', 'AUX')],
[('has', 'AUX'), ('those', 'DET')],
[('those', 'DET'), ('letters', 'NOUN')],
[('letters', 'NOUN'), ('I', 'PRON')],
[('I', 'PRON'), ('have', 'VERB')],
[('have', 'VERB'), ('a', 'DET')],
[('a', 'DET'), ('proposition', 'NOUN')],
[('proposition', 'NOUN'), ('for', 'ADP')],
[('for', 'ADP'), ('whoever', 'PRON')],
[('whoever', 'PRON'), ('has', 'AUX')],
[('has', 'AUX'), ('those', 'DET')],
[('those', 'DET'), ('letters', 'NOUN')],
[('letters', 'NOUN'), ('I', 'PRON')],
[('I', 'PRON'), ('will', 'AUX')],
[('will', 'AUX'), ('handle', 'VERB')],
[('handle', 'VERB'), ('the', 'DET')],
[('the', 'DET'), ('entire', 'ADJ')],
[('entire', 'ADJ'), ('transaction', 'NOUN')],
[('transaction', 'NOUN'), ('get', 'VERB')],
[('get', 'VERB'), ('rid', 'VERB')],
[('rid', 'VERB'), ('of', 'ADP')],
[('of', 'ADP'), ('the', 'DET')],
[('the', 'DET'), ('letters', 'NOUN')],
[('letters', 'NOUN'), ('take', 'VERB')],
[('take', 'VERB'), ('all', 'DET')],
[('all', 'DET'), ('the', 'DET')],
[('the', 'DET'), ('risk', 'NOUN')],
[('risk', 'NOUN'), ('for', 'ADP')],
[('for', 'ADP'), ('a', 'DET')],
[('a', 'DET'), ('small', 'ADJ')],
[('small', 'ADJ'), ('percentage', 'NOUN')],
[('percentage', 'NOUN'), ('If', 'SCONJ')],
[('If', 'SCONJ'), ('I', 'PRON')],
[('I', 'PRON'), ('could', 'AUX')],
[('could', 'AUX'), ('lay', 'VERB')],
[('lay', 'VERB'), ('my', 'PRON')],
[('my', 'PRON'), ('hands', 'NOUN')],
[('hands', 'NOUN'), ('on', 'ADP')],
[('on', 'ADP'), ('those', 'DET')],
[('those', 'DET'), ('letters', 'NOUN')],
[('letters', 'NOUN'), ('I', 'PRON')],
[('I', 'PRON'), ('could', 'AUX')],
[('could', 'AUX'), ('make', 'VERB')],
[('make', 'VERB'), ('a', 'DET')],
[('a', 'DET'), ('fortune', 'NOUN')],
[('fortune', 'NOUN'), ('Of', 'ADV')],
[('Of', 'ADV'), ('course', 'ADV')],
[('course', 'ADV'), ('not', 'PART')],
[('not', 'PART'), ('What', 'PRON')],
[('What', 'PRON'), ('upsets', 'VERB')],
[('upsets', 'VERB'), ('me', 'PRON')],
[('me', 'PRON'), ('is', 'AUX')],
[('is', 'AUX'), ('the', 'DET')],
[('the', 'DET'), ('fact', 'NOUN')],
[('fact', 'NOUN'), ('that', 'SCONJ')],
[('that', 'SCONJ'), ('Ugarte', 'PROPN')],
[('Ugarte', 'PROPN'), ('is', 'AUX')],
[('is', 'AUX'), ('dead', 'ADJ')],
[('dead', 'ADJ'), ('and', 'CCONJ')],
[('and', 'CCONJ'), ('no', 'DET')],
[('no', 'DET'), ('one', 'NOUN')],
[('one', 'NOUN'), ('knows', 'VERB')],
[('knows', 'VERB'), ('where', 'SCONJ')],
[('where', 'SCONJ'), ('those', 'DET')],
[('those', 'DET'), ('letters', 'NOUN')],
[('letters', 'NOUN'), ('of', 'ADP')],
[('of', 'ADP'), ('transit', 'NOUN')],
[('transit', 'NOUN'), ('are', 'AUX')],
[('are', 'AUX'), ('The', 'DET')],
[('The', 'DET'), ('bourbon', 'NOUN')],
[('bourbon', 'NOUN'), ('The', 'DET')],
[('The', 'DET'), ('news', 'NOUN')],
[('news', 'NOUN'), ('about', 'ADP')],
[('about', 'ADP'), ('Ugarte', 'PROPN')],
[('Ugarte', 'PROPN'), ('upset', 'VERB')],
[('upset', 'VERB'), ('me', 'PRON')],
[('me', 'PRON'), ('very', 'ADV')],
[('very', 'ADV'), ('much', 'ADV')],
[('much', 'ADV'), ('Carrying', 'VERB')],
[('Carrying', 'VERB'), ('charges', 'NOUN')],
[('charges', 'NOUN'), ('my', 'PRON')],
[('my', 'PRON'), ('boy', 'NOUN')],
[('boy', 'NOUN'), ('carrying', 'VERB')],
[('carrying', 'VERB'), ('charges', 'NOUN')],
[('charges', 'NOUN'), ('Here', 'ADV')],
[('Here', 'ADV'), ('sit', 'VERB')],
[('sit', 'VERB'), ('down', 'ADP')],
[('down', 'ADP'), ('There', 'PRON')],
[('There', 'PRON'), ('something', 'PRON')],
[('something', 'PRON'), ('I', 'PRON')],
[('I', 'PRON'), ('want', 'VERB')],
[('want', 'VERB'), ('to', 'PART')],
[('to', 'PART'), ('talk', 'VERB')],
[('talk', 'VERB'), ('over', 'ADP')],
[('over', 'ADP'), ('with', 'ADP')],
[('with', 'ADP'), ('you', 'PRON')],
[('you', 'PRON'), ('anyhow', 'ADV')],
[('anyhow', 'ADV'), ('No', 'DET')],
[('No', 'DET'), ('hurry', 'NOUN')],
[('hurry', 'NOUN'), ('I', 'PRON')],
[('I', 'PRON'), ('have', 'VERB')],
[('have', 'VERB'), ('it', 'PRON')],
[('it', 'PRON'), ('sent', 'VERB')],
[('sent', 'VERB'), ('over', 'ADP')],
[('over', 'ADP'), ('Have', 'VERB')],
[('Have', 'VERB'), ('a', 'DET')],
[('a', 'DET'), ('drink', 'NOUN')],
[('drink', 'NOUN'), ('with', 'ADP')],
[('with', 'ADP'), ('me', 'PRON')],
[('me', 'PRON'), ('My', 'PRON')],
[('My', 'PRON'), ('dear', 'ADJ')],
[('dear', 'ADJ'), ('Rick', 'PROPN')],
[('Rick', 'PROPN'), ('when', 'SCONJ')],
[('when', 'SCONJ'), ('will', 'AUX')],
[('will', 'AUX'), ('you', 'PRON')],
[('you', 'PRON'), ('realize', 'VERB')],
[('realize', 'VERB'), ('that', 'SCONJ')],
[('that', 'SCONJ'), ('in', 'ADP')],
[('in', 'ADP'), ('this', 'DET')],
[('this', 'DET'), ('world', 'NOUN')],
[('world', 'NOUN'), ('today', 'NOUN')],
[('today', 'NOUN'), ('isolationism', 'NOUN')],
[('isolationism', 'NOUN'), ('is', 'AUX')],
[('is', 'AUX'), ('no', 'ADV')],
[('no', 'ADV'), ('longer', 'ADV')],
[('longer', 'ADV'), ('a', 'DET')],
[('a', 'DET'), ('practical', 'ADJ')],
[('practical', 'ADJ'), ('policy', 'NOUN')],
[('policy', 'NOUN'), ('Suppose', 'VERB')],
[('Suppose', 'VERB'), ('we', 'PRON')],
[('we', 'PRON'), ('ask', 'VERB')],
[('ask', 'VERB'), ('Sam', 'PROPN')],
[('Sam', 'PROPN'), ('Maybe', 'ADV')],
[('Maybe', 'ADV'), ('he', 'PRON')],
[('he', 'PRON'), ('like', 'VERB')],
[('like', 'VERB'), ('to', 'PART')],
[('to', 'PART'), ('make', 'VERB')],
[('make', 'VERB'), ('a', 'DET')],
[('a', 'DET'), ('change', 'NOUN')],
[('change', 'NOUN'), ('That', 'PRON')],
[('That', 'PRON'), ('too', 'ADV')],
[('too', 'ADV'), ('bad', 'ADJ')],
[('bad', 'ADJ'), ('That', 'PRON')],
[('That', 'PRON'), ('Casablanca', 'PROPN')],
[('Casablanca', 'PROPN'), ('leading', 'VERB')],
[('leading', 'VERB'), ('commodity', 'NOUN')],
[('commodity', 'NOUN'), ('In', 'ADP')],
[('In', 'ADP'), ('refugees', 'NOUN')],
[('refugees', 'NOUN'), ('alone', 'ADV')],
[('alone', 'ADV'), ('we', 'PRON')],
[('we', 'PRON'), ('could', 'AUX')],
[('could', 'AUX'), ('make', 'VERB')],
[('make', 'VERB'), ('a', 'DET')],
[('a', 'DET'), ('fortune', 'NOUN')],
[('fortune', 'NOUN'), ('if', 'SCONJ')],
[('if', 'SCONJ'), ('you', 'PRON')],
[('you', 'PRON'), ('would', 'AUX')],
[('would', 'AUX'), ('work', 'VERB')],
[('work', 'VERB'), ('with', 'ADP')],
[('with', 'ADP'), ('me', 'PRON')],
[('me', 'PRON'), ('through', 'ADP')],
[('through', 'ADP'), ('the', 'DET')],
[('the', 'DET'), ('black', 'ADJ')],
[('black', 'ADJ'), ('market', 'NOUN')],
[('market', 'NOUN'), ('What', 'PRON')],
[('What', 'PRON'), ('do', 'AUX')],
[('do', 'AUX'), ('you', 'PRON')],
[('you', 'PRON'), ('want', 'VERB')],
[('want', 'VERB'), ('for', 'ADP')],
[('for', 'ADP'), ('Sam', 'PROPN')],
[('Sam', 'PROPN'), ('You', 'PRON')],
[('You', 'PRON'), ('have', 'AUX')],
[('have', 'AUX'), ('heard', 'VERB')],
[('heard', 'VERB'), ('my', 'PRON')],
[('my', 'PRON'), ('offer', 'NOUN')],
[('offer', 'NOUN'), ('Fine', 'INTJ')],
[('Fine', 'INTJ'), ('but', 'CCONJ')],
[('but', 'CCONJ'), ('I', 'PRON')],
[('I', 'PRON'), ('would', 'AUX')],
[('would', 'AUX'), ('like', 'VERB')],
[('like', 'VERB'), ('to', 'PART')],
[('to', 'PART'), ('buy', 'VERB')],
[('buy', 'VERB'), ('your', 'PRON')],
[('your', 'PRON'), ('cafe', 'NOUN')],
[('cafe', 'NOUN'), ('Hello', 'PROPN')],
[('Hello', 'PROPN'), ('Rick', 'PROPN')],
[('Rick', 'PROPN'), ('It', 'PRON')],
[('It', 'PRON'), ('was', 'AUX')],
[('was', 'AUX'), ('gracious', 'ADJ')],
[('gracious', 'ADJ'), ('of', 'ADP')],
[('of', 'ADP'), ('you', 'PRON')],
[('you', 'PRON'), ('to', 'PART')],
[('to', 'PART'), ('share', 'VERB')],
[('share', 'VERB'), ('it', 'PRON')],
[('it', 'PRON'), ('with', 'ADP')],
[('with', 'ADP'), ('me', 'PRON')],
[('me', 'PRON'), ('Good', 'ADJ')],
[('Good', 'ADJ'), ('day', 'NOUN')],
[('day', 'NOUN'), ('Mademoiselle', 'PROPN')],
[('Mademoiselle', 'PROPN'), ('Monsieur', 'PROPN')],
[('Monsieur', 'PROPN'), ('He', 'PRON')],
[('He', 'PRON'), ('is', 'AUX')],
[('is', 'AUX'), ('a', 'DET')],
[('a', 'DET'), ('difficult', 'ADJ')],
[('difficult', 'ADJ'), ('customer', 'NOUN')],
[('customer', 'NOUN'), ('that', 'SCONJ')],
[('that', 'SCONJ'), ('Rick', 'PROPN')],
[('Rick', 'PROPN'), ('One', 'NUM')],
[('One', 'NUM'), ('never', 'ADV')],
[('never', 'ADV'), ('knows', 'VERB')],
[('knows', 'VERB'), ('what', 'PRON')],
[('what', 'PRON'), ('he', 'PRON')],
[('he', 'PRON'), ('do', 'VERB')],
[('do', 'VERB'), ('or', 'CCONJ')],
[('or', 'CCONJ'), ('why', 'SCONJ')],
[('why', 'SCONJ'), ('But', 'CCONJ')],
[('But', 'CCONJ'), ('it', 'PRON')],
[('it', 'PRON'), ('is', 'AUX')],
[('is', 'AUX'), ('worth', 'ADJ')],
[('worth', 'ADJ'), ('a', 'DET')],
[('a', 'DET'), ('chance', 'NOUN')],
[('chance', 'NOUN'), ('Not', 'PART')],
[('Not', 'PART'), ('for', 'ADP')],
[('for', 'ADP'), ('sure', 'ADJ')],
[('sure', 'ADJ'), ('Monsieur', 'PROPN')],
[('Monsieur', 'PROPN'), ('but', 'CCONJ')],
[('but', 'CCONJ'), ('I', 'PRON')],
[('I', 'PRON'), ('will', 'AUX')],
[('will', 'AUX'), ('venture', 'VERB')],
[('venture', 'VERB'), ('to', 'PART')],
[('to', 'PART'), ('guess', 'VERB')],
[('guess', 'VERB'), ('that', 'SCONJ')],
[('that', 'SCONJ'), ('Ugarte', 'PROPN')],
[('Ugarte', 'PROPN'), ('left', 'VERB')],
[('left', 'VERB'), ('those', 'DET')],
[('those', 'DET'), ('letters', 'NOUN')],
[('letters', 'NOUN'), ('with', 'ADP')],
[('with', 'ADP'), ('Monsieur', 'PROPN')],
[('Monsieur', 'PROPN'), ('Rick', 'PROPN')],
[('Rick', 'PROPN'), ('Those', 'DET')],
[('Those', 'DET'), ('letters', 'NOUN')],
[('letters', 'NOUN'), ('were', 'AUX')],
[('were', 'AUX'), ('not', 'PART')],
[('not', 'PART'), ('found', 'VERB')],
[('found', 'VERB'), ('on', 'ADP')],
[('on', 'ADP'), ('Ugarte', 'PROPN')],
[('Ugarte', 'PROPN'), ('when', 'SCONJ')],
[('when', 'SCONJ'), ('they', 'PRON')],
[('they', 'PRON'), ('arrested', 'VERB')],
[('arrested', 'VERB'), ('him', 'PRON')],
[('him', 'PRON'), ('I', 'PRON')],
[('I', 'PRON'), ('observe', 'VERB')],
[('observe', 'VERB'), ('that', 'SCONJ')],
[('that', 'SCONJ'), ('you', 'PRON')],
[('you', 'PRON'), ('in', 'ADP')],
[('in', 'ADP'), ('one', 'NUM')],
[('one', 'NUM'), ('respect', 'NOUN')],
[('respect', 'NOUN'), ('are', 'AUX')],
[('are', 'AUX'), ('a', 'DET')],
[('a', 'DET'), ('very', 'ADV')],
[('very', 'ADV'), ('fortunate', 'ADJ')],
[('fortunate', 'ADJ'), ('man', 'NOUN')],
[('man', 'NOUN'), ('Monsieur', 'PROPN')],
[('Monsieur', 'PROPN'), ('I', 'PRON')],
[('I', 'PRON'), ('am', 'AUX')],
[('am', 'AUX'), ('moved', 'VERB')],
[('moved', 'VERB'), ('to', 'PART')],
[('to', 'PART'), ('make', 'VERB')],
[('make', 'VERB'), ('one', 'NUM')],
[('one', 'NUM'), ('more', 'ADJ')],
[('more', 'ADJ'), ('suggestion', 'NOUN')],
[('suggestion', 'NOUN'), ('why', 'SCONJ')],
[('why', 'SCONJ'), ('I', 'PRON')],
[('I', 'PRON'), ('do', 'AUX')],
[('do', 'AUX'), ('not', 'PART')],
[('not', 'PART'), ('know', 'VERB')],
[('know', 'VERB'), ('because', 'SCONJ')],
[('because', 'SCONJ'), ('it', 'PRON')],
[('it', 'PRON'), ('can', 'AUX')],
[('can', 'AUX'), ('not', 'PART')],
[('not', 'PART'), ('possibly', 'ADV')],
[('possibly', 'ADV'), ('profit', 'VERB')],
[('profit', 'VERB'), ('me', 'PRON')],
[('me', 'PRON'), ('but', 'CCONJ')],
[('but', 'CCONJ'), ('have', 'AUX')],
[('have', 'AUX'), ('you', 'PRON')],
[('you', 'PRON'), ('heard', 'VERB')],
[('heard', 'VERB'), ('about', 'ADP')],
[('about', 'ADP'), ('Signor', 'PROPN')],
[('Signor', 'PROPN'), ('Ugarte', 'PROPN')],
[('Ugarte', 'PROPN'), ('and', 'CCONJ')],
[('and', 'CCONJ'), ('the', 'DET')],
[('the', 'DET'), ('letters', 'NOUN')],
[('letters', 'NOUN'), ('of', 'ADP')],
[('of', 'ADP'), ('transit', 'NOUN')],
[('transit', 'NOUN'), ('Well', 'INTJ')],
[('Well', 'INTJ'), ('good', 'ADJ')],
[('good', 'ADJ'), ('luck', 'NOUN')],
[('luck', 'NOUN'), ('But', 'CCONJ')],
[('But', 'CCONJ'), ('be', 'AUX')],
[('be', 'AUX'), ('careful', 'ADJ')],
[('careful', 'ADJ'), ('You', 'PRON')],
[('You', 'PRON'), ('know', 'VERB')],
[('know', 'VERB'), ('you', 'PRON')],
[('you', 'PRON'), ('being', 'AUX')],
[('being', 'AUX'), ('shadowed', 'VERB')],
[('shadowed', 'VERB'), ('We', 'PRON')],
[('We', 'PRON'), ('might', 'AUX')],
[('might', 'AUX'), ('as', 'ADV')],
[('as', 'ADV'), ('well', 'ADV')],
[('well', 'ADV'), ('be', 'AUX')],
[('be', 'AUX'), ('frank', 'ADJ')],
[('frank', 'ADJ'), ('Monsieur', 'PROPN')],
[('Monsieur', 'PROPN'), ('It', 'PRON')],
[('It', 'PRON'), ('will', 'AUX')],
[('will', 'AUX'), ('take', 'VERB')],
[('take', 'VERB'), ('a', 'DET')],
[('a', 'DET'), ('miracle', 'NOUN')],
[('miracle', 'NOUN'), ('to', 'PART')],
[('to', 'PART'), ('get', 'VERB')],
[('get', 'VERB'), ('you', 'PRON')],
[('you', 'PRON'), ('out', 'ADP')],
[('out', 'ADP'), ('of', 'ADP')],
[('of', 'ADP'), ('Casablanca', 'PROPN')],
[('Casablanca', 'PROPN'), ('And', 'CCONJ')],
[('And', 'CCONJ'), ('the', 'DET')],
[('the', 'DET'), ('Germans', 'PROPN')],
[('Germans', 'PROPN'), ('have', 'AUX')],
[('have', 'AUX'), ('outlawed', 'VERB')],
[('outlawed', 'VERB'), ('miracles', 'NOUN')],
[('miracles', 'NOUN'), ('As', 'ADP')],
[('As', 'ADP'), ('leader', 'NOUN')],
[('leader', 'NOUN'), ('of', 'ADP')],
[('of', 'ADP'), ('all', 'DET')],
[('all', 'DET'), ('illegal', 'ADJ')],
[('illegal', 'ADJ'), ('activities', 'NOUN')],
[('activities', 'NOUN'), ('in', 'ADP')],
[('in', 'ADP'), ('Casablanca', 'PROPN')],
[('Casablanca', 'PROPN'), ('I', 'PRON')],
[('I', 'PRON'), ('am', 'AUX')],
[('am', 'AUX'), ('an', 'DET')],
[('an', 'DET'), ('influential', 'ADJ')],
[('influential', 'ADJ'), ('and', 'CCONJ')],
[('and', 'CCONJ'), ('respected', 'ADJ')],
[('respected', 'ADJ'), ('man', 'NOUN')],
[('man', 'NOUN'), ('It', 'PRON')],
[('It', 'PRON'), ('would', 'AUX')],
[('would', 'AUX'), ('not', 'PART')],
[('not', 'PART'), ('be', 'AUX')],
[('be', 'AUX'), ('worth', 'ADJ')],
[('worth', 'ADJ'), ('my', 'PRON')],
[('my', 'PRON'), ('life', 'NOUN')],
[('life', 'NOUN'), ('to', 'PART')],
[('to', 'PART'), ('do', 'VERB')],
[('do', 'VERB'), ('anything', 'PRON')],
[('anything', 'PRON'), ('for', 'ADP')],
[('for', 'ADP'), ('Monsieur', 'PROPN')],
[('Monsieur', 'PROPN'), ('Laszlo', 'PROPN')],
[('Laszlo', 'PROPN'), ('You', 'PRON')],
[('You', 'PRON'), ('however', 'ADV')],
[('however', 'ADV'), ('are', 'AUX')],
[('are', 'AUX'), ('a', 'DET')],
[('a', 'DET'), ('different', 'ADJ')],
[('different', 'ADJ'), ('matter', 'NOUN')]]
# Define a function to get the neighboring words of a keyword, based on bigrams
def get_neighbor_words(keyword, bigrams, pos_label=None):
    neighbor_words = []
    keyword = keyword.lower()
    for bigram in bigrams:
        # extract just the lowercased words (not the labels) for each bigram
        words = [word.lower() for word, label in bigram]
        # check to see if the keyword is in the bigram
        if keyword in words:
            for word, label in bigram:
                # focus on the neighbor word, not the keyword itself;
                # if pos_label is given, keep only neighbors with that POS tag
                if word.lower() != keyword and (pos_label is None or label == pos_label):
                    neighbor_words.append(word.lower())
    # return the neighbor words sorted by frequency
    return Counter(neighbor_words).most_common()
# Get the neighboring words of the character Rick
get_neighbor_words("Rick", bigrams)
[('stay', 1),
('would', 1),
('be', 1),
('without', 1),
('enough', 1),
('do', 1),
('partner', 1),
('i', 1),
('dear', 1),
('when', 1),
('hello', 1),
('it', 1),
('that', 1),
('one', 1),
('monsieur', 1),
('those', 1)]
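With the pos_label filter wired into the function above, we could restrict the neighbors to a single part of speech, for example only the verbs adjacent to "Rick" (output not shown):
# Keep only neighbors tagged as verbs
get_neighbor_words("Rick", bigrams, pos_label="VERB")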
# Print the neighboring words of the character Rick in each character's lines of the movie Casablanca, along with the characters' names.
# Loop through the movie_casablanca dataframe, reuse the code above that creates a list of tokens and POS labels
# from each line (keeping only alphabetic tokens), and then get all bigrams of the line. Finally, print movie_casablanca['cname'] for
# that line, along with the neighboring words of Rick in that line.
for i in range(len(movie_casablanca['lines'])):
    tokens_and_labels = [(token.text, token.pos_) for token in nlp(movie_casablanca['lines'][i]) if token.is_alpha]
    bigrams = get_bigrams(tokens_and_labels)
    print(movie_casablanca['cname'][i], get_neighbor_words('Rick', bigrams))
ANNINA [('monsieur', 3), ('i', 1), ('what', 1)]
FERRARI [('stay', 1), ('would', 1), ('be', 1), ('without', 1), ('enough', 1), ('do', 1), ('partner', 1), ('i', 1), ('dear', 1), ('when', 1), ('hello', 1), ('it', 1), ('that', 1), ('one', 1), ('monsieur', 1), ('those', 1)]
ILSA [('no', 3), ('i', 3), ('is', 2), ('what', 2), ('oh', 1), ('but', 1), ('the', 1), ('will', 1), ('yes', 1), ('story', 1), ('do', 1), ('me', 1), ('it', 1), ('not', 1), ('some', 1), ('goodbye', 1), ('god', 1), ('about', 1), ('with', 1), ('he', 1), ('hello', 1), ('who', 1)]
LASZLO [('day', 1), ('do', 1), ('about', 1), ('in', 1)]
RENAULT [('and', 3), ('in', 2), ('with', 2), ('is', 2), ('about', 2), ('you', 2), ('there', 2), ('earlier', 1), ('met', 1), ('mademoiselle', 1), ('but', 1), ('sam', 1), ('monsieur', 1), ('victor', 1), ('if', 1), ('has', 1), ('of', 1), ('courage', 1), ('comes', 1), ('this', 1), ('at', 1), ('everybody', 1), ('to', 1), ('realizing', 1), ('well', 1), ('later', 1), ('huh', 1), ('germans', 1), ('have', 1), ('no', 1), ('never', 1), ('makes', 1), ('a', 1), ('half', 1), ('laszlo', 1), ('casablanca', 1), ('that', 1), ('know', 1), ('we', 1), ('less', 1)]
RICK [('owe', 1), ('a', 1)]
SAM []
STRASSER [('do', 1), ('i', 1), ('about', 1), ('himself', 1)]
UGARTE [('know', 2), ('hide', 1), ('me', 1), ('do', 1), ('something', 1), ('help', 1), ('i', 1), ('well', 1), ('after', 1), ('person', 1), ('if', 1), ('watching', 1), ('hello', 1)]
What insights can be derived from the movie dialogue dataset using the named entity recognition extraction method? Similarly, what insights can be gained through the part-of-speech (POS) tagging method? Please provide an example for each method, either based on the experiments covered above or by thinking of something else.
Some of the key takeaways we can pull from the movie dialogues using named entity recognition come from identifying the different types of entities in the dialogue. Although not 100% accurate, NER gives a good sense of how many PERSON, TIME, CARDINAL, and other entities appear in a given text. For instance, if we analyze the motif of time in the movie magnolia, as in Task 1, we can examine the occurrences of expressions like "today", "tonight", and "years", and from them extract different themes from the text/movie. With part-of-speech tagging we can see which words commonly surround a character's name in a movie or other text. In Task 4 we printed each character's name along with the words neighboring "Rick" in that character's lines. This gives a lot of context about the characters and their relationships: for example, "monsieur" appears three times next to "Rick" in Annina's lines because she addresses him as "Monsieur Rick", which signals her formality and deference toward the protagonist.