Parsing Engine

Package danbikel.parser

Provides the core framework of this extensible statistical parsing engine.


Interface Summary
CountsTable Specifies a mapping between objects and floating-point (double) counts that may be incremented or decremented.
DecoderServerRemote Specifies all methods necessary for a decoder client to get its settings and top-level probabilities from a server object.
Event Provides the specification for arbitrary event types, to be used when collecting counts for and computing probabilities of arbitrary events.
HeadFinder Specifies the methods for the head-finding component of a language package.
MutableEvent Provides additional methods to those of Event that permit modification of the event object.
NonterminalMapper Specifies a single method to map a symbol representing a nonterminal to another symbol, typically an equivalence class.
ParserRemote An interface to serve as a semantic marker that this is a parsing client for a Switchboard instance.
Shift Methods used for the construction of prior states in the Markov process of creating modifier nonterminals.
Subcat Specification for a collection of required arguments to be generated by a parser, also known as a subcategorization frame.
SubcatFactory Specification for a Subcat object factory, to be used by the Subcats static factory class.
TrainerEvent An interface to allow iteration over various kinds of events used by the class Trainer.
Training Specifies methods for language-specific preprocessing of training parse trees.
Treebank A Treebank implementation provides data and methods specific to the structures found in a particular Treebank.
WordFactory Specifies methods for constructing Word objects.
WordFeatures Specifies the methods for getting a word's features in vector form, as represented by the print-name of a symbol.
WordList An interface to specify a fixed-size list of Word objects.
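
To illustrate the idea behind CountsTable (a mapping from objects to double counts that can be incremented or decremented), here is a minimal sketch. The class SimpleCountsTable and its methods add and count are hypothetical names chosen for this example; they are not the signatures of the actual interface.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the idea behind CountsTable; not the package's actual interface.
class SimpleCountsTable<K> {
    private final Map<K, Double> counts = new HashMap<>();

    // Adds delta to the count of key; a negative delta decrements the count.
    void add(K key, double delta) {
        counts.merge(key, delta, Double::sum);
    }

    // Returns the current count of key, or 0.0 if the key has never been seen.
    double count(K key) {
        return counts.getOrDefault(key, 0.0);
    }
}

class CountsTableSketch {
    public static void main(String[] args) {
        SimpleCountsTable<String> table = new SimpleCountsTable<>();
        table.add("NP", 1.0);
        table.add("NP", 2.5);
        table.add("NP", -0.5);
        System.out.println(table.count("NP")); // prints 3.0
        System.out.println(table.count("VP")); // prints 0.0
    }
}
```

Using double rather than int counts allows fractional increments, which is presumably what makes such a table usable for expected counts as well as observed ones.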

Class Summary
AbstractEvent A convenience class that simply implements the equals method, as specified by the contract in Event.equals(Object).
AnalyzeDisns An analysis and debugging class to analyze the probability distributions of all Models in a ModelCollection.
BaseNPAwareShifter An implementation of the Shift interface that does not shift punctuation into the history when the current parent node label is that of a base NP.
BiCountsTable Provides a mapping between objects and two floating-point (double) values that may be incremented or decremented.
BrokenSubcatBag A “broken” version of SubcatBag that precisely reflects the details specified in Collins’ thesis (used for “clean-room” implementation).
BrokenSubcatBagFactory A factory for creating BrokenSubcatBag objects.
CachingDecoderServer A wrapper object for a DecoderServerRemote instance that provides probability caching.
Chart Provides the skeletal infrastructure for a chart indexed by start and end words, as well as by arbitrary labels taken from the chart items.
Chart.Entry Contains all information and items covering a particular span.
CKYChart Implementation of a chart for probabilistic Cocke-Kasami-Younger (CKY) parsing.
CKYItem An item in a CKYChart for use when parsing via a probabilistic version of the CKY algorithm.
CKYItem.BaseNPAware A base NP–aware version of CKYItem that overrides equals(Object) and hashCode() to take into account the lack of dependence on the distance metric when the root label of an item's set of derivations is NPB.
CKYItem.KBestHack A hack to approximate k-best parsing by effectively turning off dynamic programming (usability depends on reducing the beam size from its normal value).
CKYItem.MappedPrevModBaseNPAware Overrides equals and hashCode methods to compare the last previous modifier on each side of each chart item's head child with respect to their respective equivalence classes, as determined by the mapping provided by
CKYItem.PrevModIsStart Overrides equals and hashCode methods to take the last previous modifier into account only insofar as its equality to the initial Training.startSym() modifier.
Collins Provides a nonterminal mapping scheme that, when applied to previously-generated modifiers, allows for emulation of Michael Collins' modifier-generation model.
Constants Contains static constants for use by this package.
CountsTableImpl Provides a mapping between objects and floating-point (double) counts that may be incremented or decremented.
CountsTrio Class for grouping the three counts tables necessary for counting transitions, histories and unique transitions (or diversity counts for the history events).
Decoder Provides the methods necessary to perform CKY parsing on input sentences.
DecoderServer Provides probabilities and other resources needed by decoders.
DefaultShifter A default implementation of the Shift interface that simply shifts every modifier or word, skipping nothing.
EMChart Implementation of a chart for performing constrained CKY parsing so as to perform the E-step of the Inside-Outside algorithm.
EMChart.Entry Contains all information and items covering a particular span.
EMDecoder Provides the methods necessary to perform constrained CKY parsing on input sentences so as to perform the E-step of the Inside-Outside EM algorithm.
EMItem.AntecedentPair Holds references to the one or two antecedents that yielded a particular consequent, along with the one or more events that generated the consequent.
EMParser An EM parsing client.
EventCountsWriter Provides a method to write CountsTable objects containing counts of TrainerEvent objects to a file or an output stream.
FileBackedTrainerEventMap Presents an immutable map of a type of TrainerEvent objects to observed counts, backed by a file of the form output by Trainer.writeStats(
GapEvent A class to represent the gap generation event implicit in the models supported by this parsing package.
HeadEvent A class to represent the head generation event implicit in the models supported by this parsing package.
HeadTreeNode Provides a convenient data structure for navigating a parse tree in which heads have been found and percolated up through the tree.
IdentityNTMapper Provides the identity mapping function.
InterpolatedKnesserNeyModel Implements a model that uses interpolated Kneser-Ney smoothing.
Item Skeletal class to represent items in a parsing chart.
JointModel Provides a mechanism for grouping related Model objects in order to estimate the probability of some joint event.
Language Provides objects that perform functions specific to a particular language and/or Treebank.
Model This class computes the probability of generating an output element of this parser, where an output element might be, for example, a word, a part of speech tag, a nonterminal label or a subcat frame.
ModelCollection Provides access to all Model objects and maps necessary for parsing.
ModifierEvent A class to represent the modifier generation event implicit in the models supported by this parsing package.
Nonterminal Representation of all possible data present in a complex nonterminal annotation: the base label, any augmentations and any index.
NTMapper A class that provides a static method for mapping nonterminals,
Parser A parsing client.
PrintDisn Provides a single static method, printLogProbDisn, as well as a PrintDisn.main(String[]) method, to print a log-probability distribution for a particular event in a particular model of a model collection.
PriorEvent A class to represent the marginal probabilities of lexicalized nonterminals (loosely, if incorrectly, referred to as “prior probabilities”).
ProbabilityCache A cache for storing arbitrary objects with their probabilities.
ProbabilityStructure Abstract class to represent the probability structure (the entire set of back-off levels, including the top level) for the estimation of a particular parameter class in the overall parsing model (using "class" in the statistical, non-Java sense of the word).
Settings Provides static settings for this package, primarily via an internal Properties object.
SexpEvent Represents an event composed of one or more Sexp objects.
SexpNumberedObjectReader Reads an underlying stream with a SexpTokenizer, converting S-expressions of the form (num processed obj), where obj is a Sexp and processed is a Symbol whose print-name is the output of String.valueOf(boolean), to NumberedObject objects.
SexpNumberedObjectReaderFactory The default NumberedSentenceReaderFactory used by Switchboard.
SexpObjectReader Reads an underlying stream with a SexpTokenizer, reading each S-expression as an object and returning it when SexpObjectReader.readObject() is invoked.
SexpObjectReaderFactory The default factory used to construct ObjectReader objects by the Switchboard class.
SexpSubcatEvent Represents an event composed of zero or more Sexp objects and zero or one Subcat object.
Shifter A class containing only static methods that mirror the signatures of the Shift interface, allowing a convenient flow-through mechanism to an internal static Shift object, the exact type of which is determined by the value of Settings.shifterClass.
SubcatBag Provides a bag implementation of subcat requirements (a bag is a set that allows multiple occurrences of the same item).
SubcatBagFactory A factory for creating SubcatBag objects.
SubcatList Implements subcats where requirements need to be met in the order in which they are added to this subcat (the strictest form of a subcat).
SubcatListFactory A factory for creating SubcatList objects.
Subcats Static factory for Subcat objects.
SymbolicCollectionWriter Provides static methods to write out the contents of a Map or a Set in an S-expression format.
SymbolPair A simple class for holding a pair of Symbol objects.
Trainer Derives all counts necessary to compute the probabilities for this parser, including the top-level counts and all derived counts.
Trainer.EventEntry Class to represent a MapToPrimitive.Entry object for use by the Trainer.getEventIterator(danbikel.lisp.SexpTokenizer, danbikel.lisp.Symbol) method.
Transition Represents the transition from a particular history to a particular future, to be used when computing the conditional probability of seeing a particular future in the context of a particular history.
Word A Word object is a structured representation of a word.
WordListFactory Provides methods to create new WordList objects.
Words Provides static methods to create Word instances via an internal WordFactory instance.
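
The difference between the bag and list forms of subcat described above (SubcatBag and SubcatList) can be sketched as follows. This is only an illustration under stated assumptions: RequirementBag and RequirementList are hypothetical classes, requirements are plain strings rather than symbols, and none of the method names are taken from the Subcat interface.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical bag of requirements: order is irrelevant, duplicates are counted.
class RequirementBag {
    private final Map<String, Integer> required = new HashMap<>();

    void add(String label) {
        required.merge(label, 1, Integer::sum);
    }

    // Removes one occurrence of label if it is still required; returns whether it was.
    boolean remove(String label) {
        Integer n = required.get(label);
        if (n == null) return false;
        if (n == 1) required.remove(label); else required.put(label, n - 1);
        return true;
    }

    boolean isEmpty() {
        return required.isEmpty();
    }
}

// Hypothetical ordered requirements: only the earliest outstanding one can be met.
class RequirementList {
    private final Deque<String> required = new ArrayDeque<>();

    void add(String label) {
        required.addLast(label);
    }

    // Succeeds only when label matches the first outstanding requirement.
    boolean remove(String label) {
        if (!label.equals(required.peekFirst())) return false;
        required.removeFirst();
        return true;
    }

    boolean isEmpty() {
        return required.isEmpty();
    }
}
```

With requirements NP and then SBAR added to each, the bag accepts SBAR immediately, whereas the list rejects SBAR until NP has been removed.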

Exception Summary
Decoder.TimeoutException Exception to be thrown when the maximum parse time has been reached.

Package danbikel.parser Description

Provides the core framework of this extensible statistical parsing engine. Unlike previous approaches, the framework encapsulates all language- and Treebank-specific information in a language package extension, providing a great deal of language-independence.

Several of the classes in this package, particularly the smaller classes that serve primarily as record data structures, implement a copy method. The general contract of this method is to return a deep copy of the object. Java does not permit these semantics to be formally specified: because types are not parameterized, there is no way for an abstract class or interface to require that a method of an implementing class return its own type. We therefore specify the contract for this method name here. All classes that implement copy also implement the Cloneable interface, where the clone method simply calls the class's copy method.
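
A minimal sketch of the copy and clone convention follows. The class LabeledSpan and its fields are hypothetical, invented for this example; only the relationship between copy and clone reflects the contract stated above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical record-style class; its name and fields are illustrative only.
class LabeledSpan implements Cloneable {
    final String label;
    final List<String> words;

    LabeledSpan(String label, List<String> words) {
        this.label = label;
        this.words = words;
    }

    // Deep copy: the list is duplicated; its String elements are immutable and can be shared.
    public LabeledSpan copy() {
        return new LabeledSpan(label, new ArrayList<>(words));
    }

    // clone simply delegates to copy, following the contract stated above.
    @Override
    public Object clone() {
        return copy();
    }
}

class CopySketch {
    public static void main(String[] args) {
        LabeledSpan original = new LabeledSpan("NP", new ArrayList<>(Arrays.asList("the", "dog")));
        LabeledSpan duplicate = original.copy();
        duplicate.words.add("barks");
        System.out.println(original.words.size());  // prints 2
        System.out.println(duplicate.words.size()); // prints 3
    }
}
```

Because the copy owns its own list, modifying the duplicate leaves the original untouched, which is the property a deep copy is meant to guarantee.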

Author: Dan Bikel.