Named Entity Recognition (NER) will help us computationally identify people, places, and things (of various kinds) in a text or collection of texts. It is useful for extracting key information from texts. You might use NER to identify the most frequently appearing characters in a novel or build a network of characters (related to network analysis), or you might use NER to identify the geographic locations mentioned in texts, a first step toward mapping the locations (related to spatial analysis).
# Install spaCy for NER and other Natural Language Processing (NLP) tasks
!pip install -U spacy
Requirement already satisfied: spacy in c:\users\colto\anaconda3\lib\site-packages (3.7.2)
(remaining "Requirement already satisfied" dependency lines omitted)
spaCy relies on machine learning models that were trained on a large amount of carefully labeled text. The English-language spaCy model that we’re going to use was trained on an annotated corpus called "OntoNotes": 2 million+ words drawn from "news, broadcast, talk shows, weblogs, usenet newsgroups, and conversational telephone speech," which were meticulously tagged by a group of researchers and professionals for people’s names and places, for nouns and verbs, for subjects and objects, and much more.
import spacy
from spacy import displacy # visualization based on spaCy
from collections import Counter # counting the results
import pandas as pd # dealing with dataframe
# Download the English-language model
!python -m spacy download en_core_web_sm
Collecting en-core-web-sm==3.7.1
Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.7.1/en_core_web_sm-3.7.1-py3-none-any.whl (12.8 MB)
(remaining "Requirement already satisfied" dependency lines omitted)
[+] Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')
# Load the English language model
import en_core_web_sm
nlp = en_core_web_sm.load()
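As a quick optional check, we can peek at the components of the loaded pipeline. Note that spacy.load('en_core_web_sm') is the more common, equivalent way to load the model; nlp.pipe_names lists the processing steps, which should include "ner", the named entity recognizer.
# Optional check: list the model's pipeline components (the exact list may vary by spaCy version)
print(nlp.pipe_names)  # e.g. ['tok2vec', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner']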
# Load the dataset
movie = pd.read_csv('Data/movie_dialogue.csv')
movie
| | mid | cid | cname | mname | gender | wordcount | year | genres | comedy | thriller | drama | romance | lines |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | m0 | u0 | BIANCA | 10 things i hate about you | f | 959 | 1999 | ['comedy', 'romance'] | True | False | False | True | They do not! / I hope so. / Let's go. / Okay -... |
| 1 | m0 | u2 | CAMERON | 10 things i hate about you | m | 527 | 1999 | ['comedy', 'romance'] | True | False | False | True | They do to! / She okay? / Wow / No / The "real... |
| 2 | m0 | u4 | JOEY | 10 things i hate about you | m | 278 | 1999 | ['comedy', 'romance'] | True | False | False | True | Listen, I want to talk to you about the prom. ... |
| 3 | m0 | u5 | KAT | 10 things i hate about you | f | 1217 | 1999 | ['comedy', 'romance'] | True | False | False | True | Perm? / It's just you. / What? To completely d... |
| 4 | m0 | u6 | MANDELLA | 10 things i hate about you | f | 157 | 1999 | ['comedy', 'romance'] | True | False | False | True | William - he asked me to meet him here. / Have... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2964 | m98 | u1455 | ELSA | indiana jones and the last crusade | f | 289 | 1989 | ['action', 'adventure', 'thriller', 'action', ... | False | True | False | False | I can reach it. I can reach it... / It's ours,... |
| 2965 | m98 | u1456 | HENRY | indiana jones and the last crusade | m | 729 | 1989 | ['action', 'adventure', 'thriller', 'action', ... | False | True | False | False | Got lost in his own museum, huh? / The Name of... |
| 2966 | m98 | u1457 | INDY | indiana jones and the last crusade | m | 1436 | 1989 | ['action', 'adventure', 'thriller', 'action', ... | False | True | False | False | It's... a leap of faith. Oh, God. / I'm going ... |
| 2967 | m99 | u1463 | INDIANA | indiana jones and the temple of doom | m | 1500 | 1984 | ['action', 'adventure'] | False | False | False | False | Then she must have run out of the room and you... |
| 2968 | m99 | u1468 | WILLIE | indiana jones and the temple of doom | f | 950 | 1984 | ['action', 'adventure'] | False | False | False | False | It's some kind of cult! And they've got the sa... |
2969 rows × 13 columns
# Let's see the lines of the 1st character in the 1st movie
print(movie['lines'][0])
They do not! / I hope so. / Let's go. / Okay -- you're gonna need to learn how to lie. / I'm kidding. You know how sometimes you just become this "persona"? And you don't know how to quit? / Like my fear of wearing pastels? / What good stuff? / Me. This endless ...blonde babble. I'm like, boring myself. / do you listen to this crap? / Then Guillermo says, "If you go any lighter, you're gonna look like an extra on 90210." / But / Well, no... / I was? / Tons / You know Chastity? / Hi. / Who knows? All I've ever heard her say is that she'd dip before dating a guy that smokes. / Lesbian? No. I found a picture of Jared Leto in one of her drawers, so I'm pretty sure she's not harboring same-sex tendencies. / I really, really, really wanna go, but I can't. Not unless my sister goes. / Eber's Deep Conditioner every two days. And I never, ever use a blowdryer without the diffuser attachment. / You're sweet. / I counted on you to help my cause. You and that thug are obviously failing. Aren't we ever going on our date? / Where? / How is our little Find the Wench A Date plan progressing? / Forget French. / I don't want to know how to say that though. I want to know useful things. Like where the good stores are. How much does champagne cost? Stuff like Chat. I have never in my life had to point out my head to someone. / C'esc ma tete. This is my head / Gosh, if only we could find Kat a boyfriend... / Unsolved mystery. She used to be really popular when she started high school, then it was just like she got sick of it or something. / The thing is, Cameron -- I'm at the mercy of a particularly hideous breed of loser. My sister. I can't date until she does. / No, no, it's my fault -- we didn't have a proper introduction --- / You're asking me out. That's so cute. What's your name again? / Not the hacking and gagging and spitting part. Please. / Can we make this quick? Roxanne Korrine and Andrew Barrett are having an incredibly horrendous public break- up on the quad. Again. / I did. / I have to be home in twenty minutes. / Sometimes I wonder if the guys we're supposed to want to go out with are the ones we actually want to go out with, you know? / Combination. I don't know -- I thought he'd be different. More of a gentleman... / He practically proposed when he found out we had the same dermatologist. I mean. Dr. Bonchowski is great an all, but he's not exactly relevant party conversation. / Would you mind getting me a drink, Cameron? / Joey. / Where did he go? He was just here. / You might wanna think about it / Did you change your hair? / You know the deal. I can ' t go if Kat doesn't go -- / Hi, Joey. / Neat... / Queen Harry? / Hopefully. / Expensive? / Patrick -- is that- a. / Is that woman a complete fruit-loop or is it just me? / No! I just wanted / I just wanted -- / Let go! / You looked beautiful last night, you know. / I guess I'll never know, will I? / God, you're just like him! Just keep me locked away in the dark, so I can't experience anything for myself / I'm not stupid enough to repeat your mistakes. / No. you didn't! If you really thought I could make my own decisions, you would've let me go out with him instead of helping Daddy hold me hostage. / Why didn't you tell me? / But / You did what? / As in... / But you hate Joey / Why? / What? / I wish I had that luxury. I'm the only sophomore that got asked to the prom and I can't go, because you won ' t. / Like you care. / I don't get you. You act like you're too good for any of this, and then you go totally apeshit when you get here. 
/ I really don't think I need any social advice from you right now. / You are so completely unbalanced. / Yeah, he's your freak friend Mandella's boyfriend. I guess since I'm not allowed to go out, I should obsess over a dead guy, too. / Like I'm supposed to know what that even means. / Can't you forget for just one night that you're completely wretched? / Bogey Lowenstein's party is normal, but you're too busy listening to Bitches Who Need Prozac to know that. / You're ruining my life' Because you won't be normal, I can't be normal. / I think you're a freak. I think you do this to torture me. And I think you suck. / Oh, I thought you might have a date I don't know why I'm bothering to ask, but are you going to Bogey Lowenstein's party Saturday night? / Oh my God, does this mean you're becoming normal? / Can you at least start wearing a bra? / Nowhere... Hi, Daddy. / I have a date, Daddy. And he ' s not a captain of oppression like some men we know. / Fine. I see that I'm a prisoner in my own house. I'm not a daughter. I'm a possession! / He's not a "hot rod". Whatever that is. / No, but / Daddy, I want to discuss the prom with you. It's tomorrow night -- / Why? / Daddy, no! / It's just a party. Daddy. / Daddy, people expect me to be there! / It's just a party. Daddy, but I knew you'd forbid me to go since "Gloria Steinem" over there isn't going -- / If you must know, we were attempting to go to a small study group of friends. / Daddy, I -- / But she doesn't want to date. / But it's not fair -- she's a mutant, Daddy! / What if she never starts dating? / Now don't get upset. Daddy, but there's this boy... and I think he might ask...
# Visualize the named entities spaCy recognizes (note that the results are not perfect)
document = nlp(movie['lines'][0])
displacy.render(document, style="ent")
# Count the number of characters with lines in each movie, sorted in descending order
movie.groupby('mname')['mid'].count().sort_values(ascending=False)
mname
magnolia 18
lone star 16
the anniversary party 12
nixon 12
grand hotel 11
..
dark city 1
quantum project 1
the nightmare before christmas 1
metropolis 1
predator 1
Name: mid, Length: 600, dtype: int64
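As an aside, pandas can produce the same per-movie tally more tersely with value_counts (assuming no missing values in 'mid'):
# Equivalent, more concise count of characters with lines per movie
movie['mname'].value_counts()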
# Count the number of entities in the lines of the characters in the movie magnolia
ent_types = dict() # initialize a dictionary
for line in movie[movie['mname']=='magnolia']['lines']: # loop through the lines in "magnolia"
    doc = nlp(line)
    for entity in doc.ents: # for each character, loop through all the entities
        label = entity.label_ # get their labels
        if label not in ent_types: # make sure there's a key for this label in the dictionary
            ent_types[label] = Counter() # each label key points to a Counter for examples
        text = entity.text
        ent_types[label][text] += 1 # count the number of times we see each example
# Count the number of distinct examples of each entity type
for etype, examples in ent_types.items():
    print(etype, len(examples))
PERSON 73
GPE 12
ORG 31
TIME 11
CARDINAL 19
DATE 27
NORP 2
WORK_OF_ART 8
ORDINAL 1
QUANTITY 3
LAW 1
FAC 2
# Explain the entity type "PERSON"
spacy.explain('PERSON')
'People, including fictional'
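To see the full inventory of entity types this model predicts, one option (assuming the default "ner" component is present in the pipeline) is to loop over the recognizer's labels and pass each to spacy.explain:
# List every entity type the model can predict, with spaCy's short description of each
for label in nlp.get_pipe('ner').labels:
    print(label, '-', spacy.explain(label))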
# Get all people
people = []
for line in movie[movie['mname']=='magnolia']['lines']:
    doc = nlp(line)
    for named_entity in doc.ents:
        if named_entity.label_ == "PERSON": # we only want entities labeled "PERSON"
            people.append(named_entity.text)
people_count = Counter(people)
# Sort the people's names by their occurrences, and convert the results into a dataframe
people_magnolia = pd.DataFrame(people_count.most_common(), columns=['character', 'count'])
people_magnolia
| | character | count |
|---|---|---|
| 0 | Linda | 15 |
| 1 | Frank | 14 |
| 2 | Phil | 7 |
| 3 | Earl | 6 |
| 4 | Jimmy | 6 |
| ... | ... | ... |
| 68 | I'M SMART | 1 |
| 69 | Jean Baptiste | 1 |
| 70 | Willa Cather | 1 |
| 71 | Dad | 1 |
| 72 | Picky | 1 |
73 rows × 2 columns
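The tail of the table shows some noise, such as "I'M SMART" tagged as a PERSON. One rough cleanup, sketched below, keeps only title-cased strings; this heuristic would also drop legitimate all-caps names, so treat it as a starting point rather than a definitive filter.
# Heuristic cleanup: keep only title-cased PERSON strings (drops noise like "I'M SMART")
people_clean = Counter({name: n for name, n in people_count.items() if name.istitle()})
pd.DataFrame(people_clean.most_common(), columns=['character', 'count']).head(10)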
# Explain the entity types "TIME" and "DATE"
print("TIME:", spacy.explain('TIME'))
print("DATE:", spacy.explain('DATE'))
TIME: Times smaller than a day
DATE: Absolute or relative dates or periods
# Get entities related to time and their counts in the movie magnolia, and convert the sorted results to a dataframe.
# As we did for "people": first create an empty list, then loop through each line in the movie
# and process the line with "nlp". Next, loop through each entity of the line, select the entities labeled
# "TIME" or "DATE", and append them to the empty list. Finally, use Counter to count the results,
# sort them with "most_common", and create a dataframe.
time = []
for line in movie[movie['mname']=='magnolia']['lines']:
    doc = nlp(line)
    for named_entity in doc.ents:
        if named_entity.label_ == "TIME" or named_entity.label_ == "DATE":
            time.append(named_entity.text)
time_count = Counter(time)
time_magnolia = pd.DataFrame(time_count.most_common(), columns=['time expression', 'count'])
print(time_magnolia)
                 time expression  count
0                          today      6
1                        tonight      4
2                     last night      3
3                     four years      2
4                    ten o'clock      2
5                        an hour      2
6                       a minute      1
7                  Eight o'clock      1
8            About two years ago      1
9                       tomorrow      1
10                  seven-eleven      1
11                    age twelve      1
12                 age seventeen      1
13            thirty eight years      1
14                  12 years old      1
15                           '84      1
16                    one minute      1
17                    five years      1
18                          1980      1
19               three years ago      1
20                   Ten o'clock      1
21                         night      1
22                two and a half      1
23                         a day      1
24             about three years      1
25                  thirty years      1
26                       ' years      1
27              twenty years old      1
28           Fifteen minutes ago      1
29                 ten years ago      1
30                  this morning      1
31       Two years...three years      1
32               fifty years old      1
33                      two days      1
34         About three weeks ago      1
35                    six months      1
36                        ' days      1
37               ten minutes ago      1
# Get the total word count of all lines of movies released in 1960
movie_1960_df = movie[(movie['year']==1960)]
movie_1960_wordcount = movie_1960_df["wordcount"].sum()
movie_1960_wordcount
19535
# Get the total word count of all lines of movies released in 2009
movie_2009_df = movie[(movie['year']==2009)]
movie_2009_wordcount = movie_2009_df["wordcount"].sum()
movie_2009_wordcount
15858
# Calculate the number of entities related to time in movies released in 1960 and in 2009, and divide each by the respective word count.
# First, initialize a variable "time_1960_count" and set it to 0. Then, as before, loop through each line in
# movie_1960_df['lines'] and process the line with "nlp". Next, loop through each entity of the line,
# and if the entity has a label of "TIME" or "DATE", add 1 to time_1960_count. Finally, print time_1960_count
# divided by movie_1960_wordcount. Then repeat the process for the 2009 movies.
time_1960_count = 0
for line in movie[movie['year']==1960]['lines']:
    doc = nlp(line)
    for named_entity in doc.ents:
        if named_entity.label_ == "TIME" or named_entity.label_ == "DATE":
            time_1960_count += 1
print('1960:', time_1960_count/movie_1960_wordcount)

time_2009_count = 0
for line in movie[movie['year']==2009]['lines']:
    doc = nlp(line)
    for named_entity in doc.ents:
        if named_entity.label_ == "TIME" or named_entity.label_ == "DATE":
            time_2009_count += 1
print('2009:', time_2009_count/movie_2009_wordcount)
1960: 0.008395188123880215
2009: 0.004540295119182747
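Since we wrote the same loop twice, here is a hedged generalization: a small helper (the name time_entity_density is my own, not part of the assignment) that computes the TIME/DATE entity count per word for any year in the dataset.
# A sketch of a reusable helper for the per-year comparison above
def time_entity_density(df, year):
    subset = df[df['year'] == year]
    count = 0
    for line in subset['lines']:
        for ent in nlp(line).ents:
            if ent.label_ in ('TIME', 'DATE'):
                count += 1
    return count / subset['wordcount'].sum()  # entities per word

print('1960:', time_entity_density(movie, 1960))
print('2009:', time_entity_density(movie, 2009))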
Parts of speech are the grammatical units of language, such as (in English) nouns, verbs, adjectives, adverbs, pronouns, and prepositions. Each of these parts of speech plays a different role in a sentence. By computationally identifying parts of speech, we can start computationally exploring syntax, the relationship between words, rather than only focusing on words in isolation, as we did with tf-idf.
# Get the POS tagging of a sample text
sample = """Or set upon a golden bough to sing to lords and ladies of Byzantium of what is past, or passing, or to come."""
# This is an excerpt from "Sailing to Byzantium" by the Irish poet W. B. Yeats
document = nlp(sample)
options = {"compact": True, "distance": 90, "color": "yellow", "bg": "black", "font": "Gill Sans"}
displacy.render(document, style="dep", options=options) # visualize it
# Get part-of-speech tags
for token in document:
    print(token.text, token.pos_, token.dep_) # pos_ means part-of-speech tags, and dep_ means dependency
Or CCONJ cc
set VERB ROOT
upon SCONJ prep
a DET det
golden ADJ amod
bough NOUN pobj
to PART aux
sing VERB advcl
to ADP prep
lords NOUN pobj
and CCONJ cc
ladies NOUN conj
of ADP prep
Byzantium PROPN pobj
of ADP prep
what PRON nsubj
is AUX pcomp
past ADJ acomp
, PUNCT punct
or CCONJ cc
passing VERB conj
, PUNCT punct
or CCONJ cc
to PART aux
come VERB conj
. PUNCT punct
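spacy.explain also works for part-of-speech and dependency labels, which helps decode tags like CCONJ or pobj in the output above:
# Decode a POS tag and a dependency label from the output above
print(spacy.explain('CCONJ'))  # coordinating conjunction
print(spacy.explain('pobj'))   # object of preposition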
# Get verbs from the movie "magnolia" in the movie dialogue dataset
verbs = []
for line in movie[movie['mname']=='magnolia']['lines']:
    doc = nlp(line)
    for token in doc: # loop through the tokens, instead of the entities
        if token.pos_ == 'VERB': # we only want the tokens with the POS tag "VERB"
            verbs.append(token.text)
verbs_count = Counter(verbs)
# Sort the verbs by their occurrences, and convert the results into a dataframe
verbs_magnolia = pd.DataFrame(verbs_count.most_common(), columns=['verb', 'count'])
verbs_magnolia
| | verb | count |
|---|---|---|
| 0 | know | 80 |
| 1 | have | 79 |
| 2 | do | 75 |
| 3 | go | 50 |
| 4 | want | 49 |
| ... | ... | ... |
| 460 | drink | 1 |
| 461 | presume | 1 |
| 462 | spoken | 1 |
| 463 | Sounds | 1 |
| 464 | threatening | 1 |
465 rows × 2 columns
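Note that the table above counts surface forms, so inflected variants like "Sounds" are tallied separately from "sound". A minimal sketch of counting verb lemmas instead, which collapses those variants:
# Count verb lemmas rather than surface forms
lemmas = []
for line in movie[movie['mname']=='magnolia']['lines']:
    for token in nlp(line):
        if token.pos_ == 'VERB':
            lemmas.append(token.lemma_.lower())
pd.DataFrame(Counter(lemmas).most_common(), columns=['verb lemma', 'count']).head(10)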
# Get the top 10 adjectives by count in movies released in 1960 and 2009.
# As we did for the movie "magnolia": first create an empty list of adjectives, then loop through each line in
# movie_1960_df['lines'] and process the line with "nlp". Next, loop through each token of the line, select the tokens
# with the POS tag "ADJ", and append them to the empty list. Finally, use Counter to count the results,
# sort them with "most_common", create a dataframe, and print the top 10 adjectives.
# Then repeat the process for the 2009 movies.
adjectives = []
for line in movie[movie['year']==1960]['lines']:
    doc = nlp(line)
    for token in doc:
        if token.pos_ == 'ADJ':
            adjectives.append(token.text)
adj_count = Counter(adjectives)
adj_1960 = pd.DataFrame(adj_count.most_common(), columns=['adjectives', 'count'])
print('1960', adj_1960[:10])

adjectives2 = []
for line in movie[movie['year']==2009]['lines']:
    doc = nlp(line)
    for token in doc:
        if token.pos_ == 'ADJ':
            adjectives2.append(token.text)
adj2_count = Counter(adjectives2)
adj_2009 = pd.DataFrame(adj2_count.most_common(), columns=['adjectives', 'count'])
print('2009', adj_2009[:10])
1960    adjectives  count
0           little     35
1            right     26
2            sorry     23
3            wrong     23
4             good     20
5            other     19
6              old     18
7             Good     17
8             sure     17
9             wise     16
2009    adjectives  count
0             more     22
1             dead     19
2             good     19
3           little     17
4            other     15
5            right     15
6            ready     12
7             Good     11
8              bad     11
9             next     10
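Notice that "good" and "Good" are counted separately above. Lowercasing before counting merges such case variants; a small sketch for the 1960 list (the adjectives variable is still in scope):
# Merge case variants by lowercasing before counting
adj_1960_lower = Counter(adj.lower() for adj in adjectives)
print(adj_1960_lower.most_common(10))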
import re # regular expression
from IPython.display import Markdown, display # for visualization
# Visualize the named entities in each Casablanca character's lines
movie_casablanca = movie[movie['mname']=='casablanca'].reset_index(drop=True)
for line in movie_casablanca['lines']:
    displacy.render(nlp(line), style="ent")
# Define a function to find a keyword in its sentence context
def find_sentences_with_keyword(keyword, document):
    # loop through all the sentences in the document and pull out the text of each sentence
    for sentence in document.sents:
        sentence = sentence.text
        # check to see if the keyword is in the sentence (ignoring capitalization by lowercasing both)
        if keyword.lower() in sentence.lower():
            # use regular expressions to replace linebreaks and to bold the keyword, again ignoring capitalization
            sentence = re.sub('\n', ' ', sentence)
            # re.escape guards against keywords containing regex metacharacters
            sentence = re.sub(re.escape(keyword), f"**{keyword}**", sentence, flags=re.IGNORECASE)
            display(Markdown(sentence))
# Highlight the name of the protagonist Rick in context, within the lines of the second character of the movie
find_sentences_with_keyword(keyword="Rick", document=nlp(movie_casablanca['lines'][1]))
Rick's wouldn't be Rick's without them.
/ Rick, don't be a fool.
Rick, I'll put my cards on the table.
My dear Rick, when will you realize that in this world today isolationism is no longer a practical policy?
Hello, Rick. /
He is a difficult customer, that Rick.
Not for sure, Monsieur, but I will venture to guess that Ugarte left those letters with Monsieur Rick.
# Create a list of tokens and POS labels from document if the token is a word
tokens_and_labels = [(token.text, token.pos_) for token in nlp(movie_casablanca['lines'][1]) if token.is_alpha]
tokens_and_labels
[('I', 'PRON'),
('shall', 'AUX'),
('remember', 'VERB'),
('to', 'PART'),
('pay', 'VERB'),
('it', 'PRON'),
('to', 'ADP'),
('myself', 'PRON'),
('Of', 'ADV'),
('course', 'ADV'),
('they', 'PRON'),
('stay', 'VERB'),
('Rick', 'PROPN'),
('would', 'AUX'),
('be', 'AUX'),
('Rick', 'PROPN'),
('without', 'ADP'),
('them', 'PRON'),
('Hmmm', 'PROPN'),
('I', 'PRON'),
('happen', 'VERB'),
('to', 'PART'),
('know', 'VERB'),
('that', 'SCONJ'),
('he', 'PRON'),
('gets', 'VERB'),
('ten', 'NUM'),
('percent', 'NOUN'),
('But', 'CCONJ'),
('he', 'PRON'),
('worth', 'ADJ'),
('five', 'NUM'),
('Ah', 'INTJ'),
('to', 'PART'),
('get', 'VERB'),
('out', 'ADP'),
('of', 'ADP'),
('Casablanca', 'PROPN'),
('and', 'CCONJ'),
('go', 'VERB'),
('to', 'ADP'),
('America', 'PROPN'),
('You', 'PRON'),
('a', 'DET'),
('lucky', 'ADJ'),
('man', 'NOUN'),
('Shall', 'AUX'),
('we', 'PRON'),
('draw', 'VERB'),
('up', 'ADP'),
('the', 'DET'),
('papers', 'NOUN'),
('or', 'CCONJ'),
('is', 'AUX'),
('our', 'PRON'),
('handshake', 'NOUN'),
('good', 'ADJ'),
('enough', 'ADV'),
('Rick', 'PROPN'),
('do', 'AUX'),
('be', 'AUX'),
('a', 'DET'),
('fool', 'NOUN'),
('Take', 'VERB'),
('me', 'PRON'),
('into', 'ADP'),
('your', 'PRON'),
('confidence', 'NOUN'),
('You', 'PRON'),
('need', 'VERB'),
('a', 'DET'),
('partner', 'NOUN'),
('Rick', 'PROPN'),
('I', 'PRON'),
('put', 'VERB'),
('my', 'PRON'),
('cards', 'NOUN'),
('on', 'ADP'),
('the', 'DET'),
('table', 'NOUN'),
('I', 'PRON'),
('think', 'VERB'),
('you', 'PRON'),
('know', 'VERB'),
('where', 'SCONJ'),
('those', 'DET'),
('letters', 'NOUN'),
('are', 'AUX'),
('Naturally', 'ADV'),
('there', 'PRON'),
('will', 'AUX'),
('be', 'AUX'),
('a', 'DET'),
('few', 'ADJ'),
('incidental', 'ADJ'),
('expenses', 'NOUN'),
('That', 'PRON'),
('is', 'AUX'),
('the', 'DET'),
('proposition', 'NOUN'),
('I', 'PRON'),
('have', 'VERB'),
('for', 'ADP'),
('whoever', 'PRON'),
('has', 'AUX'),
('those', 'DET'),
('letters', 'NOUN'),
('I', 'PRON'),
('have', 'VERB'),
('a', 'DET'),
('proposition', 'NOUN'),
('for', 'ADP'),
('whoever', 'PRON'),
('has', 'AUX'),
('those', 'DET'),
('letters', 'NOUN'),
('I', 'PRON'),
('will', 'AUX'),
('handle', 'VERB'),
('the', 'DET'),
('entire', 'ADJ'),
('transaction', 'NOUN'),
('get', 'VERB'),
('rid', 'VERB'),
('of', 'ADP'),
('the', 'DET'),
('letters', 'NOUN'),
('take', 'VERB'),
('all', 'DET'),
('the', 'DET'),
('risk', 'NOUN'),
('for', 'ADP'),
('a', 'DET'),
('small', 'ADJ'),
('percentage', 'NOUN'),
('If', 'SCONJ'),
('I', 'PRON'),
('could', 'AUX'),
('lay', 'VERB'),
('my', 'PRON'),
('hands', 'NOUN'),
('on', 'ADP'),
('those', 'DET'),
('letters', 'NOUN'),
('I', 'PRON'),
('could', 'AUX'),
('make', 'VERB'),
('a', 'DET'),
('fortune', 'NOUN'),
('Of', 'ADV'),
('course', 'ADV'),
('not', 'PART'),
('What', 'PRON'),
('upsets', 'VERB'),
('me', 'PRON'),
('is', 'AUX'),
('the', 'DET'),
('fact', 'NOUN'),
('that', 'SCONJ'),
('Ugarte', 'PROPN'),
('is', 'AUX'),
('dead', 'ADJ'),
('and', 'CCONJ'),
('no', 'DET'),
('one', 'NOUN'),
('knows', 'VERB'),
('where', 'SCONJ'),
('those', 'DET'),
('letters', 'NOUN'),
('of', 'ADP'),
('transit', 'NOUN'),
('are', 'AUX'),
('The', 'DET'),
('bourbon', 'NOUN'),
('The', 'DET'),
('news', 'NOUN'),
('about', 'ADP'),
('Ugarte', 'PROPN'),
('upset', 'VERB'),
('me', 'PRON'),
('very', 'ADV'),
('much', 'ADV'),
('Carrying', 'VERB'),
('charges', 'NOUN'),
('my', 'PRON'),
('boy', 'NOUN'),
('carrying', 'VERB'),
('charges', 'NOUN'),
('Here', 'ADV'),
('sit', 'VERB'),
('down', 'ADP'),
('There', 'PRON'),
('something', 'PRON'),
('I', 'PRON'),
('want', 'VERB'),
('to', 'PART'),
('talk', 'VERB'),
('over', 'ADP'),
('with', 'ADP'),
('you', 'PRON'),
('anyhow', 'ADV'),
('No', 'DET'),
('hurry', 'NOUN'),
('I', 'PRON'),
('have', 'VERB'),
('it', 'PRON'),
('sent', 'VERB'),
('over', 'ADP'),
('Have', 'VERB'),
('a', 'DET'),
('drink', 'NOUN'),
('with', 'ADP'),
('me', 'PRON'),
('My', 'PRON'),
('dear', 'ADJ'),
('Rick', 'PROPN'),
('when', 'SCONJ'),
('will', 'AUX'),
('you', 'PRON'),
('realize', 'VERB'),
('that', 'SCONJ'),
('in', 'ADP'),
('this', 'DET'),
('world', 'NOUN'),
('today', 'NOUN'),
('isolationism', 'NOUN'),
('is', 'AUX'),
('no', 'ADV'),
('longer', 'ADV'),
('a', 'DET'),
('practical', 'ADJ'),
('policy', 'NOUN'),
('Suppose', 'VERB'),
('we', 'PRON'),
('ask', 'VERB'),
('Sam', 'PROPN'),
('Maybe', 'ADV'),
('he', 'PRON'),
('like', 'VERB'),
('to', 'PART'),
('make', 'VERB'),
('a', 'DET'),
('change', 'NOUN'),
('That', 'PRON'),
('too', 'ADV'),
('bad', 'ADJ'),
('That', 'PRON'),
('Casablanca', 'PROPN'),
('leading', 'VERB'),
('commodity', 'NOUN'),
('In', 'ADP'),
('refugees', 'NOUN'),
('alone', 'ADV'),
('we', 'PRON'),
('could', 'AUX'),
('make', 'VERB'),
('a', 'DET'),
('fortune', 'NOUN'),
('if', 'SCONJ'),
('you', 'PRON'),
('would', 'AUX'),
('work', 'VERB'),
('with', 'ADP'),
('me', 'PRON'),
('through', 'ADP'),
('the', 'DET'),
('black', 'ADJ'),
('market', 'NOUN'),
('What', 'PRON'),
('do', 'AUX'),
('you', 'PRON'),
('want', 'VERB'),
('for', 'ADP'),
('Sam', 'PROPN'),
('You', 'PRON'),
('have', 'AUX'),
('heard', 'VERB'),
('my', 'PRON'),
('offer', 'NOUN'),
('Fine', 'INTJ'),
('but', 'CCONJ'),
('I', 'PRON'),
('would', 'AUX'),
('like', 'VERB'),
('to', 'PART'),
('buy', 'VERB'),
('your', 'PRON'),
('cafe', 'NOUN'),
('Hello', 'PROPN'),
('Rick', 'PROPN'),
('It', 'PRON'),
('was', 'AUX'),
('gracious', 'ADJ'),
('of', 'ADP'),
('you', 'PRON'),
('to', 'PART'),
('share', 'VERB'),
('it', 'PRON'),
('with', 'ADP'),
('me', 'PRON'),
('Good', 'ADJ'),
('day', 'NOUN'),
('Mademoiselle', 'PROPN'),
('Monsieur', 'PROPN'),
('He', 'PRON'),
('is', 'AUX'),
('a', 'DET'),
('difficult', 'ADJ'),
('customer', 'NOUN'),
('that', 'SCONJ'),
('Rick', 'PROPN'),
('One', 'NUM'),
('never', 'ADV'),
('knows', 'VERB'),
('what', 'PRON'),
('he', 'PRON'),
('do', 'VERB'),
('or', 'CCONJ'),
('why', 'SCONJ'),
('But', 'CCONJ'),
('it', 'PRON'),
('is', 'AUX'),
('worth', 'ADJ'),
('a', 'DET'),
('chance', 'NOUN'),
('Not', 'PART'),
('for', 'ADP'),
('sure', 'ADJ'),
('Monsieur', 'PROPN'),
('but', 'CCONJ'),
('I', 'PRON'),
('will', 'AUX'),
('venture', 'VERB'),
('to', 'PART'),
('guess', 'VERB'),
('that', 'SCONJ'),
('Ugarte', 'PROPN'),
('left', 'VERB'),
('those', 'DET'),
('letters', 'NOUN'),
('with', 'ADP'),
('Monsieur', 'PROPN'),
('Rick', 'PROPN'),
('Those', 'DET'),
('letters', 'NOUN'),
('were', 'AUX'),
('not', 'PART'),
('found', 'VERB'),
('on', 'ADP'),
('Ugarte', 'PROPN'),
('when', 'SCONJ'),
('they', 'PRON'),
('arrested', 'VERB'),
('him', 'PRON'),
('I', 'PRON'),
('observe', 'VERB'),
('that', 'SCONJ'),
('you', 'PRON'),
('in', 'ADP'),
('one', 'NUM'),
('respect', 'NOUN'),
('are', 'AUX'),
('a', 'DET'),
('very', 'ADV'),
('fortunate', 'ADJ'),
('man', 'NOUN'),
('Monsieur', 'PROPN'),
('I', 'PRON'),
('am', 'AUX'),
('moved', 'VERB'),
('to', 'PART'),
('make', 'VERB'),
('one', 'NUM'),
('more', 'ADJ'),
('suggestion', 'NOUN'),
('why', 'SCONJ'),
('I', 'PRON'),
('do', 'AUX'),
('not', 'PART'),
('know', 'VERB'),
('because', 'SCONJ'),
('it', 'PRON'),
('can', 'AUX'),
('not', 'PART'),
('possibly', 'ADV'),
('profit', 'VERB'),
('me', 'PRON'),
('but', 'CCONJ'),
('have', 'AUX'),
('you', 'PRON'),
('heard', 'VERB'),
('about', 'ADP'),
('Signor', 'PROPN'),
('Ugarte', 'PROPN'),
('and', 'CCONJ'),
('the', 'DET'),
('letters', 'NOUN'),
('of', 'ADP'),
('transit', 'NOUN'),
('Well', 'INTJ'),
('good', 'ADJ'),
('luck', 'NOUN'),
('But', 'CCONJ'),
('be', 'AUX'),
('careful', 'ADJ'),
('You', 'PRON'),
('know', 'VERB'),
('you', 'PRON'),
('being', 'AUX'),
('shadowed', 'VERB'),
('We', 'PRON'),
('might', 'AUX'),
('as', 'ADV'),
('well', 'ADV'),
('be', 'AUX'),
('frank', 'ADJ'),
('Monsieur', 'PROPN'),
('It', 'PRON'),
('will', 'AUX'),
('take', 'VERB'),
('a', 'DET'),
('miracle', 'NOUN'),
('to', 'PART'),
('get', 'VERB'),
('you', 'PRON'),
('out', 'ADP'),
('of', 'ADP'),
('Casablanca', 'PROPN'),
('And', 'CCONJ'),
('the', 'DET'),
('Germans', 'PROPN'),
('have', 'AUX'),
('outlawed', 'VERB'),
('miracles', 'NOUN'),
('As', 'ADP'),
('leader', 'NOUN'),
('of', 'ADP'),
('all', 'DET'),
('illegal', 'ADJ'),
('activities', 'NOUN'),
('in', 'ADP'),
('Casablanca', 'PROPN'),
('I', 'PRON'),
('am', 'AUX'),
('an', 'DET'),
('influential', 'ADJ'),
('and', 'CCONJ'),
('respected', 'ADJ'),
('man', 'NOUN'),
('It', 'PRON'),
('would', 'AUX'),
('not', 'PART'),
('be', 'AUX'),
('worth', 'ADJ'),
('my', 'PRON'),
('life', 'NOUN'),
('to', 'PART'),
('do', 'VERB'),
('anything', 'PRON'),
('for', 'ADP'),
('Monsieur', 'PROPN'),
('Laszlo', 'PROPN'),
('You', 'PRON'),
('however', 'ADV'),
('are', 'AUX'),
('a', 'DET'),
('different', 'ADJ'),
('matter', 'NOUN')]
# Define a function to get all two-word combinations
def get_bigrams(word_list, number_consecutive_words=2):
    ngrams = []
    adj_length_of_word_list = len(word_list) - (number_consecutive_words - 1)
    # loop through numbers from 0 to the (slightly adjusted) length of your word list
    for word_index in range(adj_length_of_word_list):
        # slice the list at each number, grabbing the word at that index along with the next N-1 words
        ngram = word_list[word_index : word_index + number_consecutive_words]
        # append this word combo to the master list "ngrams"
        ngrams.append(ngram)
    return ngrams
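As an aside, for the two-word case the same pairs can be produced with zip; the only difference is that each bigram comes out as a tuple rather than a list (bigrams_alt is just an illustration, not used below).
# Equivalent bigrams via zip (tuples instead of lists)
bigrams_alt = list(zip(tokens_and_labels, tokens_and_labels[1:]))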
# Get all bigrams of the lines, including both the words and their POS labels
bigrams = get_bigrams(tokens_and_labels)
bigrams
[[('I', 'PRON'), ('shall', 'AUX')],
[('shall', 'AUX'), ('remember', 'VERB')],
[('remember', 'VERB'), ('to', 'PART')],
[('to', 'PART'), ('pay', 'VERB')],
[('pay', 'VERB'), ('it', 'PRON')],
[('it', 'PRON'), ('to', 'ADP')],
[('to', 'ADP'), ('myself', 'PRON')],
[('myself', 'PRON'), ('Of', 'ADV')],
[('Of', 'ADV'), ('course', 'ADV')],
[('course', 'ADV'), ('they', 'PRON')],
[('they', 'PRON'), ('stay', 'VERB')],
[('stay', 'VERB'), ('Rick', 'PROPN')],
[('Rick', 'PROPN'), ('would', 'AUX')],
[('would', 'AUX'), ('be', 'AUX')],
[('be', 'AUX'), ('Rick', 'PROPN')],
[('Rick', 'PROPN'), ('without', 'ADP')],
[('without', 'ADP'), ('them', 'PRON')],
[('them', 'PRON'), ('Hmmm', 'PROPN')],
[('Hmmm', 'PROPN'), ('I', 'PRON')],
[('I', 'PRON'), ('happen', 'VERB')],
[('happen', 'VERB'), ('to', 'PART')],
[('to', 'PART'), ('know', 'VERB')],
[('know', 'VERB'), ('that', 'SCONJ')],
[('that', 'SCONJ'), ('he', 'PRON')],
[('he', 'PRON'), ('gets', 'VERB')],
[('gets', 'VERB'), ('ten', 'NUM')],
[('ten', 'NUM'), ('percent', 'NOUN')],
[('percent', 'NOUN'), ('But', 'CCONJ')],
[('But', 'CCONJ'), ('he', 'PRON')],
[('he', 'PRON'), ('worth', 'ADJ')],
[('worth', 'ADJ'), ('five', 'NUM')],
[('five', 'NUM'), ('Ah', 'INTJ')],
[('Ah', 'INTJ'), ('to', 'PART')],
[('to', 'PART'), ('get', 'VERB')],
[('get', 'VERB'), ('out', 'ADP')],
[('out', 'ADP'), ('of', 'ADP')],
[('of', 'ADP'), ('Casablanca', 'PROPN')],
[('Casablanca', 'PROPN'), ('and', 'CCONJ')],
[('and', 'CCONJ'), ('go', 'VERB')],
[('go', 'VERB'), ('to', 'ADP')],
[('to', 'ADP'), ('America', 'PROPN')],
[('America', 'PROPN'), ('You', 'PRON')],
[('You', 'PRON'), ('a', 'DET')],
[('a', 'DET'), ('lucky', 'ADJ')],
[('lucky', 'ADJ'), ('man', 'NOUN')],
[('man', 'NOUN'), ('Shall', 'AUX')],
[('Shall', 'AUX'), ('we', 'PRON')],
[('we', 'PRON'), ('draw', 'VERB')],
[('draw', 'VERB'), ('up', 'ADP')],
[('up', 'ADP'), ('the', 'DET')],
[('the', 'DET'), ('papers', 'NOUN')],
[('papers', 'NOUN'), ('or', 'CCONJ')],
[('or', 'CCONJ'), ('is', 'AUX')],
[('is', 'AUX'), ('our', 'PRON')],
[('our', 'PRON'), ('handshake', 'NOUN')],
[('handshake', 'NOUN'), ('good', 'ADJ')],
[('good', 'ADJ'), ('enough', 'ADV')],
[('enough', 'ADV'), ('Rick', 'PROPN')],
[('Rick', 'PROPN'), ('do', 'AUX')],
[('do', 'AUX'), ('be', 'AUX')],
[('be', 'AUX'), ('a', 'DET')],
[('a', 'DET'), ('fool', 'NOUN')],
[('fool', 'NOUN'), ('Take', 'VERB')],
[('Take', 'VERB'), ('me', 'PRON')],
[('me', 'PRON'), ('into', 'ADP')],
[('into', 'ADP'), ('your', 'PRON')],
[('your', 'PRON'), ('confidence', 'NOUN')],
[('confidence', 'NOUN'), ('You', 'PRON')],
[('You', 'PRON'), ('need', 'VERB')],
[('need', 'VERB'), ('a', 'DET')],
[('a', 'DET'), ('partner', 'NOUN')],
[('partner', 'NOUN'), ('Rick', 'PROPN')],
[('Rick', 'PROPN'), ('I', 'PRON')],
[('I', 'PRON'), ('put', 'VERB')],
[('put', 'VERB'), ('my', 'PRON')],
[('my', 'PRON'), ('cards', 'NOUN')],
[('cards', 'NOUN'), ('on', 'ADP')],
[('on', 'ADP'), ('the', 'DET')],
[('the', 'DET'), ('table', 'NOUN')],
[('table', 'NOUN'), ('I', 'PRON')],
[('I', 'PRON'), ('think', 'VERB')],
[('think', 'VERB'), ('you', 'PRON')],
[('you', 'PRON'), ('know', 'VERB')],
[('know', 'VERB'), ('where', 'SCONJ')],
[('where', 'SCONJ'), ('those', 'DET')],
[('those', 'DET'), ('letters', 'NOUN')],
[('letters', 'NOUN'), ('are', 'AUX')],
[('are', 'AUX'), ('Naturally', 'ADV')],
[('Naturally', 'ADV'), ('there', 'PRON')],
[('there', 'PRON'), ('will', 'AUX')],
[('will', 'AUX'), ('be', 'AUX')],
[('be', 'AUX'), ('a', 'DET')],
[('a', 'DET'), ('few', 'ADJ')],
[('few', 'ADJ'), ('incidental', 'ADJ')],
[('incidental', 'ADJ'), ('expenses', 'NOUN')],
[('expenses', 'NOUN'), ('That', 'PRON')],
[('That', 'PRON'), ('is', 'AUX')],
[('is', 'AUX'), ('the', 'DET')],
[('the', 'DET'), ('proposition', 'NOUN')],
[('proposition', 'NOUN'), ('I', 'PRON')],
[('I', 'PRON'), ('have', 'VERB')],
[('have', 'VERB'), ('for', 'ADP')],
[('for', 'ADP'), ('whoever', 'PRON')],
[('whoever', 'PRON'), ('has', 'AUX')],
[('has', 'AUX'), ('those', 'DET')],
[('those', 'DET'), ('letters', 'NOUN')],
[('letters', 'NOUN'), ('I', 'PRON')],
[('I', 'PRON'), ('have', 'VERB')],
[('have', 'VERB'), ('a', 'DET')],
[('a', 'DET'), ('proposition', 'NOUN')],
[('proposition', 'NOUN'), ('for', 'ADP')],
[('for', 'ADP'), ('whoever', 'PRON')],
[('whoever', 'PRON'), ('has', 'AUX')],
[('has', 'AUX'), ('those', 'DET')],
[('those', 'DET'), ('letters', 'NOUN')],
[('letters', 'NOUN'), ('I', 'PRON')],
[('I', 'PRON'), ('will', 'AUX')],
[('will', 'AUX'), ('handle', 'VERB')],
[('handle', 'VERB'), ('the', 'DET')],
[('the', 'DET'), ('entire', 'ADJ')],
[('entire', 'ADJ'), ('transaction', 'NOUN')],
[('transaction', 'NOUN'), ('get', 'VERB')],
[('get', 'VERB'), ('rid', 'VERB')],
[('rid', 'VERB'), ('of', 'ADP')],
[('of', 'ADP'), ('the', 'DET')],
[('the', 'DET'), ('letters', 'NOUN')],
[('letters', 'NOUN'), ('take', 'VERB')],
[('take', 'VERB'), ('all', 'DET')],
[('all', 'DET'), ('the', 'DET')],
[('the', 'DET'), ('risk', 'NOUN')],
[('risk', 'NOUN'), ('for', 'ADP')],
[('for', 'ADP'), ('a', 'DET')],
[('a', 'DET'), ('small', 'ADJ')],
[('small', 'ADJ'), ('percentage', 'NOUN')],
[('percentage', 'NOUN'), ('If', 'SCONJ')],
[('If', 'SCONJ'), ('I', 'PRON')],
[('I', 'PRON'), ('could', 'AUX')],
[('could', 'AUX'), ('lay', 'VERB')],
[('lay', 'VERB'), ('my', 'PRON')],
[('my', 'PRON'), ('hands', 'NOUN')],
[('hands', 'NOUN'), ('on', 'ADP')],
[('on', 'ADP'), ('those', 'DET')],
[('those', 'DET'), ('letters', 'NOUN')],
[('letters', 'NOUN'), ('I', 'PRON')],
[('I', 'PRON'), ('could', 'AUX')],
[('could', 'AUX'), ('make', 'VERB')],
[('make', 'VERB'), ('a', 'DET')],
[('a', 'DET'), ('fortune', 'NOUN')],
[('fortune', 'NOUN'), ('Of', 'ADV')],
[('Of', 'ADV'), ('course', 'ADV')],
[('course', 'ADV'), ('not', 'PART')],
[('not', 'PART'), ('What', 'PRON')],
[('What', 'PRON'), ('upsets', 'VERB')],
[('upsets', 'VERB'), ('me', 'PRON')],
[('me', 'PRON'), ('is', 'AUX')],
[('is', 'AUX'), ('the', 'DET')],
[('the', 'DET'), ('fact', 'NOUN')],
[('fact', 'NOUN'), ('that', 'SCONJ')],
[('that', 'SCONJ'), ('Ugarte', 'PROPN')],
[('Ugarte', 'PROPN'), ('is', 'AUX')],
[('is', 'AUX'), ('dead', 'ADJ')],
[('dead', 'ADJ'), ('and', 'CCONJ')],
[('and', 'CCONJ'), ('no', 'DET')],
[('no', 'DET'), ('one', 'NOUN')],
[('one', 'NOUN'), ('knows', 'VERB')],
[('knows', 'VERB'), ('where', 'SCONJ')],
[('where', 'SCONJ'), ('those', 'DET')],
[('those', 'DET'), ('letters', 'NOUN')],
[('letters', 'NOUN'), ('of', 'ADP')],
[('of', 'ADP'), ('transit', 'NOUN')],
[('transit', 'NOUN'), ('are', 'AUX')],
[('are', 'AUX'), ('The', 'DET')],
[('The', 'DET'), ('bourbon', 'NOUN')],
[('bourbon', 'NOUN'), ('The', 'DET')],
[('The', 'DET'), ('news', 'NOUN')],
[('news', 'NOUN'), ('about', 'ADP')],
[('about', 'ADP'), ('Ugarte', 'PROPN')],
[('Ugarte', 'PROPN'), ('upset', 'VERB')],
[('upset', 'VERB'), ('me', 'PRON')],
[('me', 'PRON'), ('very', 'ADV')],
[('very', 'ADV'), ('much', 'ADV')],
[('much', 'ADV'), ('Carrying', 'VERB')],
[('Carrying', 'VERB'), ('charges', 'NOUN')],
[('charges', 'NOUN'), ('my', 'PRON')],
[('my', 'PRON'), ('boy', 'NOUN')],
[('boy', 'NOUN'), ('carrying', 'VERB')],
[('carrying', 'VERB'), ('charges', 'NOUN')],
[('charges', 'NOUN'), ('Here', 'ADV')],
[('Here', 'ADV'), ('sit', 'VERB')],
[('sit', 'VERB'), ('down', 'ADP')],
[('down', 'ADP'), ('There', 'PRON')],
[('There', 'PRON'), ('something', 'PRON')],
[('something', 'PRON'), ('I', 'PRON')],
[('I', 'PRON'), ('want', 'VERB')],
[('want', 'VERB'), ('to', 'PART')],
[('to', 'PART'), ('talk', 'VERB')],
[('talk', 'VERB'), ('over', 'ADP')],
[('over', 'ADP'), ('with', 'ADP')],
[('with', 'ADP'), ('you', 'PRON')],
[('you', 'PRON'), ('anyhow', 'ADV')],
[('anyhow', 'ADV'), ('No', 'DET')],
[('No', 'DET'), ('hurry', 'NOUN')],
[('hurry', 'NOUN'), ('I', 'PRON')],
[('I', 'PRON'), ('have', 'VERB')],
[('have', 'VERB'), ('it', 'PRON')],
[('it', 'PRON'), ('sent', 'VERB')],
[('sent', 'VERB'), ('over', 'ADP')],
[('over', 'ADP'), ('Have', 'VERB')],
[('Have', 'VERB'), ('a', 'DET')],
[('a', 'DET'), ('drink', 'NOUN')],
[('drink', 'NOUN'), ('with', 'ADP')],
[('with', 'ADP'), ('me', 'PRON')],
[('me', 'PRON'), ('My', 'PRON')],
[('My', 'PRON'), ('dear', 'ADJ')],
[('dear', 'ADJ'), ('Rick', 'PROPN')],
[('Rick', 'PROPN'), ('when', 'SCONJ')],
[('when', 'SCONJ'), ('will', 'AUX')],
[('will', 'AUX'), ('you', 'PRON')],
[('you', 'PRON'), ('realize', 'VERB')],
[('realize', 'VERB'), ('that', 'SCONJ')],
[('that', 'SCONJ'), ('in', 'ADP')],
[('in', 'ADP'), ('this', 'DET')],
[('this', 'DET'), ('world', 'NOUN')],
[('world', 'NOUN'), ('today', 'NOUN')],
[('today', 'NOUN'), ('isolationism', 'NOUN')],
[('isolationism', 'NOUN'), ('is', 'AUX')],
[('is', 'AUX'), ('no', 'ADV')],
[('no', 'ADV'), ('longer', 'ADV')],
[('longer', 'ADV'), ('a', 'DET')],
[('a', 'DET'), ('practical', 'ADJ')],
[('practical', 'ADJ'), ('policy', 'NOUN')],
[('policy', 'NOUN'), ('Suppose', 'VERB')],
[('Suppose', 'VERB'), ('we', 'PRON')],
[('we', 'PRON'), ('ask', 'VERB')],
[('ask', 'VERB'), ('Sam', 'PROPN')],
[('Sam', 'PROPN'), ('Maybe', 'ADV')],
[('Maybe', 'ADV'), ('he', 'PRON')],
[('he', 'PRON'), ('like', 'VERB')],
[('like', 'VERB'), ('to', 'PART')],
[('to', 'PART'), ('make', 'VERB')],
[('make', 'VERB'), ('a', 'DET')],
[('a', 'DET'), ('change', 'NOUN')],
[('change', 'NOUN'), ('That', 'PRON')],
[('That', 'PRON'), ('too', 'ADV')],
[('too', 'ADV'), ('bad', 'ADJ')],
[('bad', 'ADJ'), ('That', 'PRON')],
[('That', 'PRON'), ('Casablanca', 'PROPN')],
[('Casablanca', 'PROPN'), ('leading', 'VERB')],
[('leading', 'VERB'), ('commodity', 'NOUN')],
[('commodity', 'NOUN'), ('In', 'ADP')],
[('In', 'ADP'), ('refugees', 'NOUN')],
[('refugees', 'NOUN'), ('alone', 'ADV')],
[('alone', 'ADV'), ('we', 'PRON')],
[('we', 'PRON'), ('could', 'AUX')],
[('could', 'AUX'), ('make', 'VERB')],
[('make', 'VERB'), ('a', 'DET')],
[('a', 'DET'), ('fortune', 'NOUN')],
[('fortune', 'NOUN'), ('if', 'SCONJ')],
[('if', 'SCONJ'), ('you', 'PRON')],
[('you', 'PRON'), ('would', 'AUX')],
[('would', 'AUX'), ('work', 'VERB')],
[('work', 'VERB'), ('with', 'ADP')],
[('with', 'ADP'), ('me', 'PRON')],
[('me', 'PRON'), ('through', 'ADP')],
[('through', 'ADP'), ('the', 'DET')],
[('the', 'DET'), ('black', 'ADJ')],
[('black', 'ADJ'), ('market', 'NOUN')],
[('market', 'NOUN'), ('What', 'PRON')],
[('What', 'PRON'), ('do', 'AUX')],
[('do', 'AUX'), ('you', 'PRON')],
[('you', 'PRON'), ('want', 'VERB')],
[('want', 'VERB'), ('for', 'ADP')],
[('for', 'ADP'), ('Sam', 'PROPN')],
[('Sam', 'PROPN'), ('You', 'PRON')],
[('You', 'PRON'), ('have', 'AUX')],
[('have', 'AUX'), ('heard', 'VERB')],
[('heard', 'VERB'), ('my', 'PRON')],
[('my', 'PRON'), ('offer', 'NOUN')],
[('offer', 'NOUN'), ('Fine', 'INTJ')],
[('Fine', 'INTJ'), ('but', 'CCONJ')],
[('but', 'CCONJ'), ('I', 'PRON')],
[('I', 'PRON'), ('would', 'AUX')],
[('would', 'AUX'), ('like', 'VERB')],
[('like', 'VERB'), ('to', 'PART')],
[('to', 'PART'), ('buy', 'VERB')],
[('buy', 'VERB'), ('your', 'PRON')],
[('your', 'PRON'), ('cafe', 'NOUN')],
[('cafe', 'NOUN'), ('Hello', 'PROPN')],
[('Hello', 'PROPN'), ('Rick', 'PROPN')],
[('Rick', 'PROPN'), ('It', 'PRON')],
[('It', 'PRON'), ('was', 'AUX')],
[('was', 'AUX'), ('gracious', 'ADJ')],
[('gracious', 'ADJ'), ('of', 'ADP')],
[('of', 'ADP'), ('you', 'PRON')],
[('you', 'PRON'), ('to', 'PART')],
[('to', 'PART'), ('share', 'VERB')],
[('share', 'VERB'), ('it', 'PRON')],
[('it', 'PRON'), ('with', 'ADP')],
[('with', 'ADP'), ('me', 'PRON')],
[('me', 'PRON'), ('Good', 'ADJ')],
[('Good', 'ADJ'), ('day', 'NOUN')],
[('day', 'NOUN'), ('Mademoiselle', 'PROPN')],
[('Mademoiselle', 'PROPN'), ('Monsieur', 'PROPN')],
[('Monsieur', 'PROPN'), ('He', 'PRON')],
[('He', 'PRON'), ('is', 'AUX')],
[('is', 'AUX'), ('a', 'DET')],
[('a', 'DET'), ('difficult', 'ADJ')],
[('difficult', 'ADJ'), ('customer', 'NOUN')],
[('customer', 'NOUN'), ('that', 'SCONJ')],
[('that', 'SCONJ'), ('Rick', 'PROPN')],
[('Rick', 'PROPN'), ('One', 'NUM')],
[('One', 'NUM'), ('never', 'ADV')],
[('never', 'ADV'), ('knows', 'VERB')],
[('knows', 'VERB'), ('what', 'PRON')],
[('what', 'PRON'), ('he', 'PRON')],
[('he', 'PRON'), ('do', 'VERB')],
[('do', 'VERB'), ('or', 'CCONJ')],
[('or', 'CCONJ'), ('why', 'SCONJ')],
[('why', 'SCONJ'), ('But', 'CCONJ')],
[('But', 'CCONJ'), ('it', 'PRON')],
[('it', 'PRON'), ('is', 'AUX')],
[('is', 'AUX'), ('worth', 'ADJ')],
[('worth', 'ADJ'), ('a', 'DET')],
[('a', 'DET'), ('chance', 'NOUN')],
[('chance', 'NOUN'), ('Not', 'PART')],
[('Not', 'PART'), ('for', 'ADP')],
[('for', 'ADP'), ('sure', 'ADJ')],
[('sure', 'ADJ'), ('Monsieur', 'PROPN')],
[('Monsieur', 'PROPN'), ('but', 'CCONJ')],
[('but', 'CCONJ'), ('I', 'PRON')],
[('I', 'PRON'), ('will', 'AUX')],
[('will', 'AUX'), ('venture', 'VERB')],
[('venture', 'VERB'), ('to', 'PART')],
[('to', 'PART'), ('guess', 'VERB')],
[('guess', 'VERB'), ('that', 'SCONJ')],
[('that', 'SCONJ'), ('Ugarte', 'PROPN')],
[('Ugarte', 'PROPN'), ('left', 'VERB')],
[('left', 'VERB'), ('those', 'DET')],
[('those', 'DET'), ('letters', 'NOUN')],
[('letters', 'NOUN'), ('with', 'ADP')],
[('with', 'ADP'), ('Monsieur', 'PROPN')],
[('Monsieur', 'PROPN'), ('Rick', 'PROPN')],
[('Rick', 'PROPN'), ('Those', 'DET')],
[('Those', 'DET'), ('letters', 'NOUN')],
[('letters', 'NOUN'), ('were', 'AUX')],
[('were', 'AUX'), ('not', 'PART')],
[('not', 'PART'), ('found', 'VERB')],
[('found', 'VERB'), ('on', 'ADP')],
[('on', 'ADP'), ('Ugarte', 'PROPN')],
[('Ugarte', 'PROPN'), ('when', 'SCONJ')],
[('when', 'SCONJ'), ('they', 'PRON')],
[('they', 'PRON'), ('arrested', 'VERB')],
[('arrested', 'VERB'), ('him', 'PRON')],
[('him', 'PRON'), ('I', 'PRON')],
[('I', 'PRON'), ('observe', 'VERB')],
[('observe', 'VERB'), ('that', 'SCONJ')],
[('that', 'SCONJ'), ('you', 'PRON')],
[('you', 'PRON'), ('in', 'ADP')],
[('in', 'ADP'), ('one', 'NUM')],
[('one', 'NUM'), ('respect', 'NOUN')],
[('respect', 'NOUN'), ('are', 'AUX')],
[('are', 'AUX'), ('a', 'DET')],
[('a', 'DET'), ('very', 'ADV')],
[('very', 'ADV'), ('fortunate', 'ADJ')],
[('fortunate', 'ADJ'), ('man', 'NOUN')],
[('man', 'NOUN'), ('Monsieur', 'PROPN')],
[('Monsieur', 'PROPN'), ('I', 'PRON')],
[('I', 'PRON'), ('am', 'AUX')],
[('am', 'AUX'), ('moved', 'VERB')],
[('moved', 'VERB'), ('to', 'PART')],
[('to', 'PART'), ('make', 'VERB')],
[('make', 'VERB'), ('one', 'NUM')],
[('one', 'NUM'), ('more', 'ADJ')],
[('more', 'ADJ'), ('suggestion', 'NOUN')],
[('suggestion', 'NOUN'), ('why', 'SCONJ')],
[('why', 'SCONJ'), ('I', 'PRON')],
[('I', 'PRON'), ('do', 'AUX')],
[('do', 'AUX'), ('not', 'PART')],
[('not', 'PART'), ('know', 'VERB')],
[('know', 'VERB'), ('because', 'SCONJ')],
[('because', 'SCONJ'), ('it', 'PRON')],
[('it', 'PRON'), ('can', 'AUX')],
[('can', 'AUX'), ('not', 'PART')],
[('not', 'PART'), ('possibly', 'ADV')],
[('possibly', 'ADV'), ('profit', 'VERB')],
[('profit', 'VERB'), ('me', 'PRON')],
[('me', 'PRON'), ('but', 'CCONJ')],
[('but', 'CCONJ'), ('have', 'AUX')],
[('have', 'AUX'), ('you', 'PRON')],
[('you', 'PRON'), ('heard', 'VERB')],
[('heard', 'VERB'), ('about', 'ADP')],
[('about', 'ADP'), ('Signor', 'PROPN')],
[('Signor', 'PROPN'), ('Ugarte', 'PROPN')],
[('Ugarte', 'PROPN'), ('and', 'CCONJ')],
[('and', 'CCONJ'), ('the', 'DET')],
[('the', 'DET'), ('letters', 'NOUN')],
[('letters', 'NOUN'), ('of', 'ADP')],
[('of', 'ADP'), ('transit', 'NOUN')],
[('transit', 'NOUN'), ('Well', 'INTJ')],
[('Well', 'INTJ'), ('good', 'ADJ')],
[('good', 'ADJ'), ('luck', 'NOUN')],
[('luck', 'NOUN'), ('But', 'CCONJ')],
[('But', 'CCONJ'), ('be', 'AUX')],
[('be', 'AUX'), ('careful', 'ADJ')],
[('careful', 'ADJ'), ('You', 'PRON')],
[('You', 'PRON'), ('know', 'VERB')],
[('know', 'VERB'), ('you', 'PRON')],
[('you', 'PRON'), ('being', 'AUX')],
[('being', 'AUX'), ('shadowed', 'VERB')],
[('shadowed', 'VERB'), ('We', 'PRON')],
[('We', 'PRON'), ('might', 'AUX')],
[('might', 'AUX'), ('as', 'ADV')],
[('as', 'ADV'), ('well', 'ADV')],
[('well', 'ADV'), ('be', 'AUX')],
[('be', 'AUX'), ('frank', 'ADJ')],
[('frank', 'ADJ'), ('Monsieur', 'PROPN')],
[('Monsieur', 'PROPN'), ('It', 'PRON')],
[('It', 'PRON'), ('will', 'AUX')],
[('will', 'AUX'), ('take', 'VERB')],
[('take', 'VERB'), ('a', 'DET')],
[('a', 'DET'), ('miracle', 'NOUN')],
[('miracle', 'NOUN'), ('to', 'PART')],
[('to', 'PART'), ('get', 'VERB')],
[('get', 'VERB'), ('you', 'PRON')],
[('you', 'PRON'), ('out', 'ADP')],
[('out', 'ADP'), ('of', 'ADP')],
[('of', 'ADP'), ('Casablanca', 'PROPN')],
[('Casablanca', 'PROPN'), ('And', 'CCONJ')],
[('And', 'CCONJ'), ('the', 'DET')],
[('the', 'DET'), ('Germans', 'PROPN')],
[('Germans', 'PROPN'), ('have', 'AUX')],
[('have', 'AUX'), ('outlawed', 'VERB')],
[('outlawed', 'VERB'), ('miracles', 'NOUN')],
[('miracles', 'NOUN'), ('As', 'ADP')],
[('As', 'ADP'), ('leader', 'NOUN')],
[('leader', 'NOUN'), ('of', 'ADP')],
[('of', 'ADP'), ('all', 'DET')],
[('all', 'DET'), ('illegal', 'ADJ')],
[('illegal', 'ADJ'), ('activities', 'NOUN')],
[('activities', 'NOUN'), ('in', 'ADP')],
[('in', 'ADP'), ('Casablanca', 'PROPN')],
[('Casablanca', 'PROPN'), ('I', 'PRON')],
[('I', 'PRON'), ('am', 'AUX')],
[('am', 'AUX'), ('an', 'DET')],
[('an', 'DET'), ('influential', 'ADJ')],
[('influential', 'ADJ'), ('and', 'CCONJ')],
[('and', 'CCONJ'), ('respected', 'ADJ')],
[('respected', 'ADJ'), ('man', 'NOUN')],
[('man', 'NOUN'), ('It', 'PRON')],
[('It', 'PRON'), ('would', 'AUX')],
[('would', 'AUX'), ('not', 'PART')],
[('not', 'PART'), ('be', 'AUX')],
[('be', 'AUX'), ('worth', 'ADJ')],
[('worth', 'ADJ'), ('my', 'PRON')],
[('my', 'PRON'), ('life', 'NOUN')],
[('life', 'NOUN'), ('to', 'PART')],
[('to', 'PART'), ('do', 'VERB')],
[('do', 'VERB'), ('anything', 'PRON')],
[('anything', 'PRON'), ('for', 'ADP')],
[('for', 'ADP'), ('Monsieur', 'PROPN')],
[('Monsieur', 'PROPN'), ('Laszlo', 'PROPN')],
[('Laszlo', 'PROPN'), ('You', 'PRON')],
[('You', 'PRON'), ('however', 'ADV')],
[('however', 'ADV'), ('are', 'AUX')],
[('are', 'AUX'), ('a', 'DET')],
[('a', 'DET'), ('different', 'ADJ')],
[('different', 'ADJ'), ('matter', 'NOUN')]]
# Define a function to get the neighboring words of a keyword, based on bigrams
def get_neighbor_words(keyword, bigrams, pos_label=None):
    neighbor_words = []
    keyword = keyword.lower()
    for bigram in bigrams:
        # extract just the lowercased words (not the labels) for each bigram
        words = [word.lower() for word, label in bigram]
        # check to see if the keyword is in the bigram
        if keyword in words:
            for word, label in bigram:
                # focus on the neighbor word, not the keyword itself;
                # if pos_label is given, keep only neighbors with that POS tag
                if word.lower() != keyword and (pos_label is None or label == pos_label):
                    neighbor_words.append(word.lower())
    # return the neighbor words sorted by frequency
    return Counter(neighbor_words).most_common()
# Get the neighboring words of the character Rick
get_neighbor_words("Rick", bigrams)
[('stay', 1),
('would', 1),
('be', 1),
('without', 1),
('enough', 1),
('do', 1),
('partner', 1),
('i', 1),
('dear', 1),
('when', 1),
('hello', 1),
('it', 1),
('that', 1),
('one', 1),
('monsieur', 1),
('those', 1)]
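With the pos_label filter wired into the function above, we could restrict the neighbors to a single part of speech, for example only the verbs adjacent to "Rick" (output not shown):
# Keep only neighbors tagged as verbs
get_neighbor_words("Rick", bigrams, pos_label="VERB")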
# Print the neighboring words of the character Rick in each character's lines of the movie Casablanca, along with the characters' names.
# Loop through the movie_casablanca dataframe, reuse the code above that creates a list of tokens and POS labels
# from each line (keeping only alphabetic tokens), and then get all bigrams of the line. Finally, print movie_casablanca['cname'] for
# that line, along with the neighboring words of Rick in that line.
for i in range(len(movie_casablanca['lines'])):
    tokens_and_labels = [(token.text, token.pos_) for token in nlp(movie_casablanca['lines'][i]) if token.is_alpha]
    bigrams = get_bigrams(tokens_and_labels)
    print(movie_casablanca['cname'][i], get_neighbor_words('Rick', bigrams))
ANNINA [('monsieur', 3), ('i', 1), ('what', 1)]
FERRARI [('stay', 1), ('would', 1), ('be', 1), ('without', 1), ('enough', 1), ('do', 1), ('partner', 1), ('i', 1), ('dear', 1), ('when', 1), ('hello', 1), ('it', 1), ('that', 1), ('one', 1), ('monsieur', 1), ('those', 1)]
ILSA [('no', 3), ('i', 3), ('is', 2), ('what', 2), ('oh', 1), ('but', 1), ('the', 1), ('will', 1), ('yes', 1), ('story', 1), ('do', 1), ('me', 1), ('it', 1), ('not', 1), ('some', 1), ('goodbye', 1), ('god', 1), ('about', 1), ('with', 1), ('he', 1), ('hello', 1), ('who', 1)]
LASZLO [('day', 1), ('do', 1), ('about', 1), ('in', 1)]
RENAULT [('and', 3), ('in', 2), ('with', 2), ('is', 2), ('about', 2), ('you', 2), ('there', 2), ('earlier', 1), ('met', 1), ('mademoiselle', 1), ('but', 1), ('sam', 1), ('monsieur', 1), ('victor', 1), ('if', 1), ('has', 1), ('of', 1), ('courage', 1), ('comes', 1), ('this', 1), ('at', 1), ('everybody', 1), ('to', 1), ('realizing', 1), ('well', 1), ('later', 1), ('huh', 1), ('germans', 1), ('have', 1), ('no', 1), ('never', 1), ('makes', 1), ('a', 1), ('half', 1), ('laszlo', 1), ('casablanca', 1), ('that', 1), ('know', 1), ('we', 1), ('less', 1)]
RICK [('owe', 1), ('a', 1)]
SAM []
STRASSER [('do', 1), ('i', 1), ('about', 1), ('himself', 1)]
UGARTE [('know', 2), ('hide', 1), ('me', 1), ('do', 1), ('something', 1), ('help', 1), ('i', 1), ('well', 1), ('after', 1), ('person', 1), ('if', 1), ('watching', 1), ('hello', 1)]
What insights can be derived from the movie dialogue dataset using the named entity recognition extraction method? Similarly, what insights can be gained through the part-of-speech (POS) tagging method? Please provide an example for each method, either based on the experiments covered above or by thinking of something else.
Some of the key takeaways we can pull from the movie dialogues using named entity recognition come from identifying the different types of entities in the dialogue. Although not 100% accurate, NER gives a good sense of how many PERSON, TIME, CARDINAL, and other entities appear in a given text. For instance, if we analyze the motif of time in the movie magnolia, as in Task 1, we can examine the occurrences of expressions like "today", "tonight", and "years", and from them extract different themes from the text/movie. With part-of-speech tagging we can see which words commonly surround a character's name in a movie or other text. In Task 4 we printed each character's name along with the words neighboring "Rick" in that character's lines. This gives a lot of context about the characters and their relationships: for example, "monsieur" appears three times next to "Rick" in Annina's lines because she addresses him as "Monsieur Rick", which signals her formality and deference toward the protagonist.