All assignments must be well documented in written lab reports,
which will be graded.
(Lab 2 can give a maximum of 10
points.)
The reports may be written in any Scandinavian
language or in English.
Send in the solution to a specific lab exercise
no later than one month after the day the assignment was handed
out (NB! points will be deducted for lab reports delivered after the
deadline).
Thus the deadline for lab 2 is Tuesday March 26.
The assignments will be handed out and explained at the scheduled
time.
Attendance at the lecture about the assignment is
recommended but not obligatory.
The lab consists of two parts: the first concerns basic grammars, and the second parsers in Python.
For students attending LING2206, parts 1C2 and 2F are
not obligatory
(and those parts of the lab report will not be graded).
The lab uses the Python NLTK toolkit. Check that it is available by starting Python and typing:

import nltk

If you have problems installing the toolkit on your computer, please get in touch.
An example grammar is given in the file cfg.py.
Have a look at it and load it into Python:
from cfg import mygrammar
grammarobj = nltk.CFG.fromstring(mygrammar)   # nltk.parse_cfg() in NLTK 2.x
sent = "a man saw a cat".split()
rd_parser = nltk.RecursiveDescentParser(grammarobj)
for tree in rd_parser.parse(sent):            # nbest_parse() in NLTK 2.x
    print(tree)
NP -> NP PP

What happens if you straightforwardly introduce this rule, and how can you
rewrite the grammar to avoid the resulting problem?
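One standard way to see both the problem and the fix is a small hand-written sketch (plain Python, no NLTK; the lexicon and function names here are invented for illustration). A recursive-descent parser expanding the left-recursive rule NP -> NP PP calls itself on NP at the same input position forever. The usual fix rewrites the rule pair NP -> NP PP | Det N as NP -> Det N NPrest and NPrest -> PP NPrest | (empty), which a recursive-descent parser handles without looping:

```python
# Toy lexicon for the sketch (illustrative, not from NLTK or cfg.py).
DET, NOUN, PREP = {"a", "the"}, {"man", "cat"}, {"with"}

def parse_np(tokens, i):
    """NP -> Det N NPrest; return the index just past the NP, or None."""
    if i + 1 < len(tokens) and tokens[i] in DET and tokens[i + 1] in NOUN:
        return parse_nprest(tokens, i + 2)
    return None

def parse_nprest(tokens, i):
    """NPrest -> PP NPrest | (empty), with PP -> P NP."""
    if i < len(tokens) and tokens[i] in PREP:
        end = parse_np(tokens, i + 1)   # recursion on a strictly later position
        if end is not None:
            return parse_nprest(tokens, end)
    return i  # empty production: consume nothing
```

For example, parse_np("a man with a cat".split(), 0) consumes all five tokens and returns 5; every recursive call advances in the input, so the left-recursion loop cannot occur.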
nltk.RecursiveDescentParser()
nltk.ShiftReduceParser()
nltk.EarleyChartParser()
nltk.BottomUpChartParser()
Try out each of the parsers on some simple sentences and familiarise
yourself with how they work.
To analyse, for example, the sentence a man saw a man with a cat
with the grammar you have crafted and the recursive-descent parser, try:
sent = "a man saw a man with a cat".split()
rd_parser = nltk.RecursiveDescentParser(grammar2, trace=2)  # grammar2: your crafted grammar
for tree in rd_parser.parse(sent):                          # nbest_parse() in NLTK 2.x
    print(tree)

Use different sentences and different versions of your grammar depending on what you want to test.
NLTK offers a visualization application for the recursive-descent parser.
Open this application by typing:
nltk.app.rdparser()
and paste the grammar you have crafted into the application's grammar editor (via the
Edit menu). Click autostep and watch as it parses your input sentences. Apart
from not being able to handle grammars containing rules like the ones you
straightforwardly implemented above, this parser has another major weakness. What is
this weakness?
The recursive-descent parser above builds the trees top-down,
recursively trying to expand new rules that match the input. Use the
visualization tool to step through how the parser parses an input
sentence and in what order it expands the rules.
Write a description of this process and show some steps (3-4) from the
visualization to illustrate your explanation.
If you instead try the simple shift-reduce parser described at the top of part 2,
you will notice that this parser has problems finding a parse if the grammar
contains empty rules. Explain why this happens.
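To make the explanation concrete, here is a minimal greedy shift-reduce sketch (plain Python, a simplified stand-in for NLTK's implementation; the grammar encoding and names are invented for illustration). It reduces whenever the top of the stack matches a rule's right-hand side, otherwise it shifts. An empty right-hand side would match after every single step, so a greedy reducer would keep inserting the rule's left-hand side forever; the sketch therefore only ever applies non-empty rules:

```python
# Toy CNF-style grammar as (lhs, rhs) pairs; rhs is a tuple of symbols.
GRAMMAR = [
    ("Det", ("a",)), ("N", ("man",)), ("N", ("cat",)), ("V", ("saw",)),
    ("NP", ("Det", "N")), ("VP", ("V", "NP")), ("S", ("NP", "VP")),
]

def shift_reduce(tokens, grammar):
    """Greedy shift-reduce: reduce when the stack top matches a rule's
    right-hand side, otherwise shift; no backtracking."""
    stack, buf = [], list(tokens)
    while True:
        for lhs, rhs in grammar:
            # An empty rhs would match here after *every* step, so a greedy
            # reducer could never terminate; hence only non-empty rules.
            if rhs and tuple(stack[-len(rhs):]) == rhs:
                stack[-len(rhs):] = [lhs]
                break
        else:
            if not buf:
                return stack
            stack.append(buf.pop(0))
```

Running shift_reduce("a man saw a cat".split(), GRAMMAR) reduces the whole input to ["S"]; with an empty rule in the grammar there is no sensible point at which to stop reducing.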
Note that this shift-reduce parser is a very simple implementation. It does
not backtrack, and is therefore not guaranteed to find a parse even for a
sentence that is valid according to your grammar.
Illustrate your answers with visualizations from the parser tool
nltk.app.srparser().
If the rules in our recursive-descent parser were to be expanded in a breadth-first
fashion as opposed to depth-first, what would the consequences be for the parser?
Briefly describe what the difference between depth-first and breadth-first parsing
is, and when this difference matters.
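The difference can be made concrete with a toy search over sentential forms (a sketch, not NLTK code; the grammar encoding and function names are invented for illustration). Both strategies repeatedly rewrite the leftmost nonterminal; depth-first follows the first alternative as deep as it can, so the left-recursive rule NP -> NP PP traps it, while breadth-first proceeds level by level and still finds the derivation:

```python
from collections import deque

GRAMMAR = {
    "NP": [("NP", "PP"), ("Det", "N")],   # deliberately left-recursive
    "PP": [("P", "NP")],
    "Det": [("a",)], "N": [("man",)], "P": [("with",)],
}

def derives(grammar, start, target, strategy, max_steps=2000):
    """Expand the leftmost nonterminal BFS- or DFS-style; return a
    (found, steps) pair. DFS may burn max_steps on left recursion."""
    target = tuple(target)
    frontier = deque([(start,)])
    steps = 0
    while frontier and steps < max_steps:
        steps += 1
        form = frontier.popleft() if strategy == "bfs" else frontier.pop()
        idx = next((i for i, s in enumerate(form) if s in grammar), None)
        if idx is None:                   # all terminals: compare to target
            if form == target:
                return True, steps
            continue
        succs = [form[:idx] + rhs + form[idx + 1:]
                 for rhs in grammar[form[idx]]]
        if strategy == "dfs":
            frontier.extend(reversed(succs))  # first alternative on top
        else:
            frontier.extend(succs)
    return False, steps
```

Here derives(GRAMMAR, "NP", "a man".split(), "bfs") succeeds in a handful of steps, while the "dfs" strategy keeps expanding NP -> NP PP and gives up at the step limit, mirroring how a backtracking recursive-descent parser loops on left recursion.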
As you have seen from the visualization of the top-down recursive-descent parser,
it has to redo a lot of work as it tries to expand new rules to match the
input. Explain how chart parsing is able to mitigate this.
A bottom-up version of a chart parser is the Cocke-Kasami-Younger (CKY) algorithm.
Implement this and test it on a grammar in Chomsky Normal Form (CNF) (J&M exercise 13.3).
Why does the grammar have to be in CNF?
You are free to use a programming language of your choice, but the implementations in
NLTK should give you hints, and certainly a way to verify that your parser works correctly.
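As one point of reference for checking your own solution, a minimal CKY recognizer might look like the sketch below (plain Python, no NLTK; the grammar encoding is invented for illustration). The cell table[i][j] holds every nonterminal that derives tokens[i:j]. CNF is what makes this table-filling scheme work: each cell is filled either directly from a single word (unary terminal rules) or from exactly one split point combining two smaller cells (binary rules):

```python
# Toy CNF grammar as (lhs, rhs) pairs: rhs is a 1-tuple (a terminal word)
# or a 2-tuple (two nonterminals).
GRAMMAR = [
    ("S", ("NP", "VP")),
    ("NP", ("Det", "N")),
    ("VP", ("V", "NP")),
    ("Det", ("a",)), ("N", ("man",)), ("N", ("cat",)), ("V", ("saw",)),
]

def cky_recognize(tokens, grammar, start="S"):
    """CKY recognizer: return True iff start derives the whole input."""
    n = len(tokens)
    # table[i][j]: set of nonterminals deriving tokens[i:j]
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(tokens):          # length-1 spans from the lexicon
        for lhs, rhs in grammar:
            if rhs == (word,):
                table[i][i + 1].add(lhs)
    for span in range(2, n + 1):               # longer spans, bottom-up
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):          # every split point
                for lhs, rhs in grammar:
                    if (len(rhs) == 2 and rhs[0] in table[i][k]
                            and rhs[1] in table[k][j]):
                        table[i][j].add(lhs)
    return start in table[0][n]
```

For example, cky_recognize("a man saw a cat".split(), GRAMMAR) accepts, while an incomplete sentence such as "a man saw" is rejected because no S ever enters table[0][n].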
Create a corpus of test sentences (at least 10-12, the more the better)
and run the list through each of the parsers in turn, measuring the
processing time and the number of processing steps needed for each parser
(number of rule calls and lexicon look-ups).
To evaluate the running time of Python functions, the module timeit
is convenient. A Timer takes two parts: a statement to be repeated and a setup
statement that is carried out only once. Timing one of the parser calls above
could be done like this:
from timeit import Timer
# The first argument is the code to be timed; the "setup" argument is run
# only once and is not included in the execution time.
t = Timer("list(rd_parser.parse(sent))",   # nbest_parse(sent) in NLTK 2.x
          setup="""import nltk
from cfg import grammar1
grammar2 = nltk.CFG.fromstring(grammar1)   # nltk.parse_cfg() in NLTK 2.x
rd_parser = nltk.RecursiveDescentParser(grammar2)
sent = "the man saw a man".split()""")
print(t.timeit(1000))  # repeat 1000 times instead of the default 1 million

Compare the recursive-descent parser to a chart parser of your choice. Which parser is faster?