Parsing Engine

danbikel.parser
Class Model

java.lang.Object
  extended by danbikel.parser.Model
All Implemented Interfaces:
Serializable
Direct Known Subclasses:
InterpolatedKnesserNeyModel, JointModel

public class Model
extends Object
implements Serializable

This class computes the probability of generating an output element of this parser, where an output element might be, for example, a word, a part-of-speech tag, a nonterminal label or a subcat frame. It derives counts from top-level TrainerEvent objects, storing these derived counts in its internal data structures. The derived counts are necessary for the smoothing of the top-level probabilities used by the parser, and the particular structure of those levels of smoothing (or, less accurately, back-off) is specified by the ProbabilityStructure argument to the constructor and to the estimateLogProb(int,TrainerEvent) method.

N.B.: While the name of this class is “Model”, more strictly speaking it computes the probabilities for an entire class of parameters used by the overall parsing model. As such—using a looser definition of the term “model”—this class can be considered to represent a “submodel”, in that it contains a model of the generation of a particular type of output element of this parser.

See Also:
ProbabilityStructure, Serialized Form

Field Summary
protected  HashMap[] backOffMap
          A set of numLevels - 1 maps, where map i is a map from back-off level i transitions to i + 1 transitions.
protected  ProbabilityCache[] cache
          A cache of probability estimates at the various back-off levels of this model, used when precomputeProbs is false.
protected  int[] cacheAccesses
          Records the number of cache accesses for each back-off level of this model.
protected  int[] cacheHits
          Records the number of cache hits for each back-off level of this model.
protected  FlexibleMap canonicalEvents
          A reflexive map of canonical Event objects to save memory in the various tables of this model that store such Event objects.
protected  CountsTrio[] counts
          The derived event counts used to estimate probabilities of this model.
protected  boolean createHistBackOffMap
          Indicates whether the histBackOffMap should be created when precomputing probabilities.
protected static boolean deleteCountsWhenPrecomputingProbs
          Indicates whether to set counts to null just before writing this model object to an ObjectOutputStream.
protected  boolean dontAddNewParams
          The boolean value of the Settings.dontAddNewParams setting.
protected  boolean doPruning
          The value of this data member determines whether this model will be pruned when probabilities are precomputed.
protected static boolean globalDoPruning
          Caches the value of Settings.modelDoPruning.
protected  HashMap[] histBackOffMap
          A set of numLevels - 1 maps, where map i is a map from back-off level i histories to i + 1 histories.
protected  HashSet[] historiesToPrune
          A set of sets used to collect histories that are to be pruned.
protected  double[] lambdaFudge
          A cached copy of the smoothing factors of the ProbabilityStructure used by this model.
protected  double[] lambdaFudgeTerm
          A cached copy of the smoothing terms of the ProbabilityStructure used by this model.
protected  double[] lambdaPenalty
          A cached copy of the smoothing penalty factors contained in the ProbabilityStructure used by this model.
protected  double[] logOneMinusLambdaPenalty
          The values of lambdaPenalty but modified such that
logOneMinusLambdaPenalty[i] = Math.log(1 - lambdaPenalty[i])
for all i: 0 ≤ i < lambdaPenalty.length.
protected  int numLevels
          A cached copy of the number of back-off levels in the ProbabilityStructure used by this model.
protected  HashMapDouble[] precomputedLambdas
          Precomputed lambdas for each back-off level of this model.
protected  int precomputedNPBProbCalls
          Records the number of times estimateLogProbUsingPrecomputed(Transition,int) is invoked requesting a probability for an event whose history context has a base NP (NPB) as the parent nonterminal.
protected  int[] precomputedNPBProbHits
          Records the number of “hits” to the caches of precomputed probability estimates at the various back-off levels when the caller requests a probability for a context that has a base NP (NPB) as the parent nonterminal.
protected  int precomputedProbCalls
          Records the number of times estimateLogProbUsingPrecomputed(Transition,int) is invoked.
protected  int[] precomputedProbHits
          Records the number of “hits” to the caches of precomputed probability estimates at the various back-off levels, to determine the amount each back-off level is used while decoding.
protected  HashMapDouble[] precomputedProbs
          Precomputed probabilities for each back-off level of this model.
protected static boolean precomputeProbs
          The boolean value of Settings.precomputeProbs, cached here for convenience.
protected static boolean printPrunedEvents
          Indicates whether the method pruneHistoriesAndTransitions() will output pruned events to a special pruned event log file.
protected static boolean printUnprunedEvents
          Indicates whether the method pruneHistoriesAndTransitions() will output events that were not pruned to a special pruned event log file.
protected static double pruningThreshold
          Caches the double value of Settings.modelPruningThreshold.
protected static boolean saveBackOffMap
          If true, indicates that backOffMap should not be set to null after probabilities have been precomputed, which means that it will be saved with this Model instance (for debugging purposes); otherwise, backOffMap is set to null just after precomputation of probabilities.
protected static boolean saveHistBackOffMap
          Indicates whether the histBackOffMap should be created when precomputing probabilities and saved with this Model for debugging purposes.
protected  boolean saveSmoothingParams
          The boolean value of the Settings.saveSmoothingParams setting.
protected  String shortStructureClassName
          The value of structureClassName but without the package qualification.
protected  CountsTable[] smoothingParams
          The smoothing parameters for the history contexts (Event instances) at the back-off levels of this model.
protected  String smoothingParamsFile
          The value of the smoothing parameters file for this model, as given by ProbabilityStructure.smoothingParametersFile().
protected  ProbabilityStructure structure
          The probability structure for this model to use.
protected  String structureClassName
          A cached copy of the name of the concrete type of the ProbabilityStructure instance used by this model.
protected  ProbabilityCache topLevelCache
          A currently-unused cache of probabilities of TrainerEvent objects.
protected  HashSet[] transitionsToPrune
          A set of sets used to collect transitions that are to be pruned.
protected static boolean useCache
          A constant that indicates whether this Model should perform probability caching.
protected  boolean useSmoothingParams
          The boolean value of the Settings.useSmoothingParams setting.
protected  boolean verbose
          Indicates whether to report to stderr what this class is doing.
protected static boolean warnSmoothingHasHistoryNotInTraining
          The value of this constant determines whether estimateProb(ProbabilityStructure,TrainerEvent) emits a warning when it encounters a history for which there is a saved smoothing parameter but which was not an observed history as far as the current model is concerned.
 
Constructor Summary
Model(ProbabilityStructure structure)
          Constructs a new object for deriving all counts using the specified probability structure.
 
Method Summary
 void beQuiet()
          Causes this class not to output anything to System.err during the invocation of its methods, such as deriveCounts(CountsTable,Filter,double,FlexibleMap).
 void beVerbose()
          Causes this class to be verbose in its output to System.err during the invocation of its methods, such as deriveCounts(CountsTable,Filter,double,FlexibleMap).
 void canonicalize()
          Since events are typically read-only, this method will allow for canonicalization (or "unique-ifying") of the information contained in the events contained in this object.
 void canonicalize(FlexibleMap map)
          Since events are typically read-only, this method will allow for canonicalization (or "unique-ifying") of the information contained in the events contained in this object using the specified map.
protected static Event canonicalizeEvent(Event event, FlexibleMap canonical)
          This method first canonicalizes the information in the specified event (a Sexp or a Subcat and a Sexp), then it returns a canonical version of the event itself, copying it into the map if necessary.
protected  void cleanup()
          A method invoked after probabilities have been precomputed by precomputeProbs() to clean up (that is, remove) objects from the various counts tables that are no longer needed, as determined by ProbabilityStructure.removeHistory(int,Event) and ProbabilityStructure.removeTransition(int,Transition).
protected  void computeHistoriesAndTransitionsToPrune()
          Schedules for pruning every history and transition whose MLE is equal to that of the back-off level's transition.
 void deriveCounts(CountsTable trainerCounts, Filter filter, double threshold, FlexibleMap canonical)
          Derives all counts from the specified counts table, using the probability structure specified in the constructor.
 void deriveCounts(CountsTable trainerCounts, Filter filter, double threshold, FlexibleMap canonical, boolean deriveOtherModelCounts)
          Derives all counts from the specified counts table, using the probability structure specified in the constructor.
protected  void deriveDiversityCounts()
          Deprecated. This method used to be called by deriveCounts(CountsTable,Filter,double,FlexibleMap,boolean), but diversity counts are now derived directly by that method.
protected  void deriveHistories(CountsTable trainerCounts, Filter filter, FlexibleMap canonical)
          Deprecated. This method used to be called by deriveCounts(CountsTable,Filter,double,FlexibleMap,boolean), but histories are now derived directly by that method.
 double estimateLogProb(int id, TrainerEvent event)
          Estimates the log-probability of a conditional event.
protected  double estimateLogProbUsingPrecomputed(ProbabilityStructure structure, TrainerEvent event)
          Estimates the log prob using precomputed probabilities and smoothing values (lambdas).
protected  double estimateLogProbUsingPrecomputed(Transition transition, int atLevel)
          Estimates the log prob of the specified transition using precomputed probabilities and lambdas and histBackOffMap (debugging method).
 double estimateProb(int id, TrainerEvent event)
          Estimates the probability of a conditional event.
protected  double estimateProb(ProbabilityStructure probStructure, TrainerEvent event)
          Returns the smoothed probability estimate of a transition contained in the specified TrainerEvent object.
protected  double estimateProbOld(ProbabilityStructure structure, TrainerEvent event, int level, double prevHistCount)
           
 String getCacheStats()
          Returns a human-readable string containing all the precomputed or non-precomputed probability cache statistics for the life of this Model object.
protected static Transition getCanonical(Transition trans, FlexibleMap canonical)
          This method assumes trans already contains a canonical history and a canonical future.
 Model getModel(int idx)
          Returns this model object.
 ProbabilityStructure getProbStructure()
          Returns the type of ProbabilityStructure object used during the invocation of deriveCounts(CountsTable,Filter,double,FlexibleMap).
protected  Transition[] getTransitions(Transition zeroLevelTrans, Transition[] trans)
          Inserts the Transition objects representing conditional events for all back-off levels of this model into the specified array, with trans[0] = zeroLevelTrans.
protected  void initializeSmoothingParams()
          Sets up the smoothing parameter arrays and maps.
 int numModels()
          Returns 1, as this object does not contain any other, internal Model instances.
 void precomputeProbs()
          Precomputes all probabilities and smoothing values for events seen during all previous invocations of deriveCounts(CountsTable,Filter,double,FlexibleMap,boolean).
protected  void precomputeProbs(CountsTable trainerCounts, Filter filter)
          Deprecated. This method has been superseded by precomputeProbs().
protected  void precomputeProbs(MapToPrimitive.Entry transEntry, double[] lambdas, double[] estimates, Transition[] transitions, Event[] histories, int lastLevel)
          Precomputes the probabilities and smoothing values for the Transition object contained as a key within the specified map entry, where the value is the count of the transition.
protected  void precomputeProbs(TrainerEvent event, Transition[] transitions, Event[] histories)
          Deprecated. This method is called by precomputeProbs(CountsTable,Filter), which is also deprecated.
protected  void pruneHistoriesAndTransitions()
          Analyzes the distributions of this model in order to prune history and transition (i.e., conditional) events from the various counts tables.
protected  void pruneHistoriesAndTransitionsOld()
          Prune every history and transition with a back-off level less than the last level in which the last level history has a diversity of 1 (meaning that the probability is 1, so no need to store a history and transition).
protected  void readSmoothingParams()
          Reads all necessary smoothing parameters from smoothingParamsFile instead of deriving values for smoothing parameters.
protected  void readSmoothingParams(boolean verboseOutput)
          Reads all necessary smoothing parameters from smoothingParamsFile instead of deriving values for smoothing parameters.
protected  void savePrecomputeData(CountsTable trainerCounts, Filter filter)
          Saves the back-off chain for each event derived from each TrainerEvent in the key set of the specified counts table.
 void setCanonicalEvents(FlexibleMap canonical)
          Sets the canonicalEvents member of this object.
 void share(int backOffLevel, Model otherModel, int otherModelBackOffLevel)
          Indicates to use counts or precomputed probabilities from the specified back-off level of this model when estimating probabilities for the specified back-off level of another model.
protected  void storePrecomputedProbs(double[] lambdas, double[] estimates, Transition[] transitions, Event[] histories, int lastLevel)
          Stores the specified smoothing values (lambdas) and smoothed probability estimates in the precomputedProbs and smoothingParams map arrays.
protected  void writeSmoothingParams()
          Writes the smoothing parameters of this model to the file named by smoothingParamsFile.
protected  void writeSmoothingParams(boolean verboseOutput)
          Writes the smoothing parameters of this model to the file named by smoothingParamsFile.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

warnSmoothingHasHistoryNotInTraining

protected static final boolean warnSmoothingHasHistoryNotInTraining
The value of this constant determines whether estimateProb(ProbabilityStructure,TrainerEvent) emits a warning when it encounters a history for which there is a saved smoothing parameter but which was not an observed history as far as the current model is concerned. When using smoothing parameters from another training run, it is typical to be operating with a model trained from the same data. In such a case, the set of history contexts observed during the training run that produced the smoothing parameters would be identical to the set of history contexts encountered when training again on that same data. However, there are circumstances in which a history context observed in the smoothing training run would not be observed in the subsequent training run. For example, when performing EM, there may be a long sentence with lots of structure whose total inside probability mass is less than Double.MIN_VALUE. In such a case, the EMDecoder will issue an underflow warning and not emit any expected events for that sentence. If a history context was observed only in the one or more sentences that have underflow problems in a particular EM iteration, then it will effectively not be observed in that iteration, and therefore not in any subsequent iteration.
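
The underflow scenario described above can be reproduced in isolation. The following is a minimal sketch, not code from danbikel.parser: the class UnderflowSketch and its methods are hypothetical names introduced for illustration. It shows how a product of many small probabilities falls below Double.MIN_VALUE and collapses to zero, while the same quantity remains representable in log space.

```java
/** Illustrative only; UnderflowSketch is a hypothetical class, not part of danbikel.parser. */
final class UnderflowSketch {
  /** Multiplies p by itself n times in linear space; may underflow to 0.0. */
  static double product(double p, int n) {
    double result = 1.0;
    for (int i = 0; i < n; i++) {
      result *= p;
    }
    return result;
  }

  /** Computes the natural log of the same product, which stays finite for p > 0. */
  static double logProduct(double p, int n) {
    return n * Math.log(p);
  }

  public static void main(String[] args) {
    // 100 factors of 1e-5 give 1e-500, far below Double.MIN_VALUE.
    System.out.println(product(1e-5, 100));
    System.out.println(logProduct(1e-5, 100));
  }
}
```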

See Also:
Constant Field Values

deleteCountsWhenPrecomputingProbs

protected static final boolean deleteCountsWhenPrecomputingProbs
Indicates whether to set counts to null just before writing this model object to an ObjectOutputStream. Normally, this boolean should be true, but setting it to false can be useful for debugging purposes.

See Also:
AnalyzeDisns, Constant Field Values

precomputeProbs

protected static final boolean precomputeProbs
The boolean value of Settings.precomputeProbs, cached here for convenience.


useCache

protected static final boolean useCache
A constant that indicates whether this Model should perform probability caching. This constant is usually true, but may be redefined as false for debugging purposes (recompilation is necessary after redefining this constant).

See Also:
Constant Field Values

saveBackOffMap

protected static final boolean saveBackOffMap
If true, indicates that backOffMap should not be set to null after probabilities have been precomputed, which means that it will be saved with this Model instance (for debugging purposes); otherwise, backOffMap is set to null just after precomputation of probabilities. Normally, the value of this boolean should be false. The value of this boolean is only consulted when Settings.precomputeProbs is true.

See Also:
Constant Field Values

saveHistBackOffMap

protected static final boolean saveHistBackOffMap
Indicates whether the histBackOffMap should be created when precomputing probabilities and saved with this Model for debugging purposes. Normally, the value of this boolean should be false. The value of this boolean is only consulted when Settings.precomputeProbs is true.

See Also:
AnalyzeDisns, Constant Field Values

globalDoPruning

protected static final boolean globalDoPruning
Caches the value of Settings.modelDoPruning.


pruningThreshold

protected static final double pruningThreshold
Caches the double value of Settings.modelPruningThreshold.


printPrunedEvents

protected static final boolean printPrunedEvents
Indicates whether the method pruneHistoriesAndTransitions() will output pruned events to a special pruned event log file.

See Also:
Constant Field Values

printUnprunedEvents

protected static final boolean printUnprunedEvents
Indicates whether the method pruneHistoriesAndTransitions() will output events that were not pruned to a special pruned event log file.

See Also:
Constant Field Values

structure

protected ProbabilityStructure structure
The probability structure for this model to use.


structureClassName

protected String structureClassName
A cached copy of the name of the concrete type of the ProbabilityStructure instance used by this model.


shortStructureClassName

protected String shortStructureClassName
The value of structureClassName but without the package qualification.


numLevels

protected int numLevels
A cached copy of the number of back-off levels in the ProbabilityStructure used by this model.

See Also:
ProbabilityStructure.numLevels()

lambdaFudge

protected double[] lambdaFudge
A cached copy of the smoothing factors of the ProbabilityStructure used by this model. This array is of size numLevels.

See Also:
ProbabilityStructure.lambdaFudge(int)

lambdaFudgeTerm

protected double[] lambdaFudgeTerm
A cached copy of the smoothing terms of the ProbabilityStructure used by this model. This array is of size numLevels.

See Also:
ProbabilityStructure.lambdaFudgeTerm(int)

lambdaPenalty

protected double[] lambdaPenalty
A cached copy of the smoothing penalty factors contained in the ProbabilityStructure used by this model. This array is of size equal to numLevels.

See Also:
ProbabilityStructure.lambdaPenalty(int)

logOneMinusLambdaPenalty

protected double[] logOneMinusLambdaPenalty
The values of lambdaPenalty but modified such that
logOneMinusLambdaPenalty[i] = Math.log(1 - lambdaPenalty[i])
for all i: 0 ≤ i < lambdaPenalty.length.
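
As a minimal sketch of the relationship stated above, the array could be derived as follows. The class and method names here are hypothetical and are not taken from the Model source; only the formula comes from the description.

```java
/** Illustrative only; LambdaPenaltySketch is a hypothetical class. */
final class LambdaPenaltySketch {
  /** Returns an array whose element i is Math.log(1 - lambdaPenalty[i]). */
  static double[] logOneMinus(double[] lambdaPenalty) {
    double[] result = new double[lambdaPenalty.length];
    for (int i = 0; i < lambdaPenalty.length; i++) {
      result[i] = Math.log(1 - lambdaPenalty[i]);
    }
    return result;
  }
}
```

Note that a penalty of exactly 1 yields negative infinity, since Math.log(0.0) is negative infinity in Java.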


counts

protected CountsTrio[] counts
The derived event counts used to estimate probabilities of this model.


verbose

protected boolean verbose
Indicates whether to report to stderr what this class is doing.


precomputedProbs

protected HashMapDouble[] precomputedProbs
Precomputed probabilities for each back-off level of this model. The keys of each of the HashMapDouble maps in this array are Transition objects.


precomputedLambdas

protected HashMapDouble[] precomputedLambdas
Precomputed lambdas for each back-off level of this model. The keys of each of the HashMapDouble maps in this array are Event instances.

For the modified Witten-Bell smoothing method used by this class, the values of the maps of this array are actually the log of one minus the lambda of a particular event at a particular back-off level, for ease of computing a smoothed estimate. That is, if event is some history context whose associated smoothing value is λi, then precomputedLambdas[i].get(event) will be equal to ln(1 − λi), where ln is the natural log function that is implemented by Math.log.
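
The following sketch illustrates one way that storing ln(1 − λi) could ease the computation. It rests on an assumption not stated in this documentation: that the precomputed estimate stored at level i is already smoothed with all later back-off levels, so that a transition missing at level i contributes only a factor of (1 − λi) before falling through to level i + 1. The class PrecomputedLookupSketch, its string keys and its method names are hypothetical stand-ins for Transition and Event objects.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only; not the actual lookup code of Model. */
final class PrecomputedLookupSketch {
  /**
   * Walks the back-off levels, adding ln(1 - lambda) for each level at which
   * the transition is missing, and returns as soon as a stored log estimate is found.
   */
  static double logProb(String[] transitions, String[] histories,
                        List<Map<String, Double>> logProbs,
                        List<Map<String, Double>> logOneMinusLambdas) {
    double accumulated = 0.0;
    for (int level = 0; level < transitions.length; level++) {
      Double logP = logProbs.get(level).get(transitions[level]);
      if (logP != null) {
        return accumulated + logP;
      }
      Double logOneMinusLambda = logOneMinusLambdas.get(level).get(histories[level]);
      if (logOneMinusLambda != null) {
        accumulated += logOneMinusLambda;
      }
      // An unseen history is treated here as lambda = 0, i.e. ln(1) = 0 (an assumption).
    }
    return Double.NEGATIVE_INFINITY;
  }

  /** Two levels: the transition is unseen at level 0 (lambda = 0.8) and has estimate 0.5 at level 1. */
  static double example() {
    List<Map<String, Double>> logProbs = new ArrayList<>();
    List<Map<String, Double>> lambdas = new ArrayList<>();
    for (int i = 0; i < 2; i++) {
      logProbs.add(new HashMap<>());
      lambdas.add(new HashMap<>());
    }
    lambdas.get(0).put("h0", Math.log(1 - 0.8));
    logProbs.get(1).put("t1", Math.log(0.5));
    return logProb(new String[] {"t0", "t1"}, new String[] {"h0", "h1"}, logProbs, lambdas);
  }
}
```

In this example the result is ln(0.2) + ln(0.5), so only additions are needed at lookup time.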


precomputedProbHits

protected transient int[] precomputedProbHits
Records the number of “hits” to the caches of precomputed probability estimates at the various back-off levels, to determine the amount each back-off level is used while decoding.


precomputedProbCalls

protected transient int precomputedProbCalls
Records the number of times estimateLogProbUsingPrecomputed(Transition,int) is invoked.


precomputedNPBProbHits

protected transient int[] precomputedNPBProbHits
Records the number of “hits” to the caches of precomputed probability estimates at the various back-off levels when the caller requests a probability for a context that has a base NP (NPB) as the parent nonterminal. This allows a comparison between NPB hits versus overall hits.


precomputedNPBProbCalls

protected transient int precomputedNPBProbCalls
Records the number of times estimateLogProbUsingPrecomputed(Transition,int) is invoked requesting a probability for an event whose history context has a base NP (NPB) as the parent nonterminal. This allows a comparison between NPB method invocations and overall method invocations.


backOffMap

protected HashMap[] backOffMap
A set of numLevels - 1 maps, where map i is a map from back-off level i transitions to i + 1 transitions. These maps are only used temporarily when precomputing probs (and are necessary for incremental training).
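
To make the shape of this structure concrete, here is a minimal sketch of following a back-off chain through numLevels - 1 maps. Strings stand in for Transition objects, and the class BackOffChainSketch, its methods and the example keys are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only; strings stand in for Transition objects. */
final class BackOffChainSketch {
  /** Follows map i from the level-i key to the level-(i + 1) key, collecting the chain. */
  static List<String> chain(String zeroLevel, List<Map<String, String>> backOffMap) {
    List<String> result = new ArrayList<>();
    result.add(zeroLevel);
    String current = zeroLevel;
    for (Map<String, String> levelMap : backOffMap) {
      current = levelMap.get(current);
      if (current == null) {
        break; // no recorded back-off for this key
      }
      result.add(current);
    }
    return result;
  }

  /** Three levels, hence two maps. */
  static List<String> example() {
    Map<String, String> level0 = new HashMap<>();
    level0.put("NP|S,VP,saw", "NP|S,VP");
    Map<String, String> level1 = new HashMap<>();
    level1.put("NP|S,VP", "NP|S");
    List<Map<String, String>> maps = new ArrayList<>();
    maps.add(level0);
    maps.add(level1);
    return chain("NP|S,VP,saw", maps);
  }
}
```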

See Also:
savePrecomputeData(CountsTable,Filter), saveBackOffMap

histBackOffMap

protected HashMap[] histBackOffMap
A set of numLevels - 1 maps, where map i is a map from back-off level i histories to i + 1 histories. These maps are not necessary for precomputing probabilities, but can be useful when debugging.

See Also:
saveHistBackOffMap, savePrecomputeData(CountsTable,Filter)

topLevelCache

protected transient ProbabilityCache topLevelCache
A currently-unused cache of probabilities of TrainerEvent objects.


cache

protected transient ProbabilityCache[] cache
A cache of probability estimates at the various back-off levels of this model, used when precomputeProbs is false.


cacheHits

protected transient int[] cacheHits
Records the number of cache hits for each back-off level of this model.


cacheAccesses

protected transient int[] cacheAccesses
Records the number of cache accesses for each back-off level of this model.
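
As a sketch of how the two counters cacheHits and cacheAccesses might be combined into per-level statistics, consider the following. The class CacheStatsSketch is hypothetical, and this is not necessarily how getCacheStats() formats its output.

```java
/** Illustrative only; CacheStatsSketch is a hypothetical class. */
final class CacheStatsSketch {
  /** Returns hits / accesses for each back-off level, or 0.0 where there were no accesses. */
  static double[] hitRates(int[] cacheHits, int[] cacheAccesses) {
    double[] rates = new double[cacheHits.length];
    for (int level = 0; level < cacheHits.length; level++) {
      rates[level] = cacheAccesses[level] == 0
          ? 0.0
          : (double) cacheHits[level] / cacheAccesses[level];
    }
    return rates;
  }
}
```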


canonicalEvents

protected transient FlexibleMap canonicalEvents
A reflexive map of canonical Event objects to save memory in the various tables of this model that store such Event objects.


smoothingParamsFile

protected transient String smoothingParamsFile
The value of the smoothing parameters file for this model, as given by ProbabilityStructure.smoothingParametersFile().

See Also:
useSmoothingParams, dontAddNewParams, ProbabilityStructure.smoothingParametersFile()

saveSmoothingParams

protected transient boolean saveSmoothingParams
The boolean value of the Settings.saveSmoothingParams setting.


dontAddNewParams

protected transient boolean dontAddNewParams
The boolean value of the Settings.dontAddNewParams setting.


useSmoothingParams

protected transient boolean useSmoothingParams
The boolean value of the Settings.useSmoothingParams setting.


smoothingParams

protected transient CountsTable[] smoothingParams
The smoothing parameters for the history contexts (Event instances) at the back-off levels of this model.


transitionsToPrune

protected transient HashSet[] transitionsToPrune
A set of sets used to collect transitions that are to be pruned.

See Also:
doPruning, pruneHistoriesAndTransitions()

historiesToPrune

protected transient HashSet[] historiesToPrune
A set of sets used to collect histories that are to be pruned.

See Also:
doPruning, pruneHistoriesAndTransitions()

createHistBackOffMap

protected transient boolean createHistBackOffMap
Indicates whether the histBackOffMap should be created when precomputing probabilities. If either doPruning or saveHistBackOffMap is true, then this data member will be set to true as well. The value of this boolean is set automatically in the constructor, and is only consulted when Settings.precomputeProbs is true.

See Also:
AnalyzeDisns, doPruning

doPruning

protected transient boolean doPruning
The value of this data member determines whether this model will be pruned when probabilities are precomputed. This data member’s value is set automatically in the constructor: it is true if and only if either globalDoPruning is true or if the ProbabilityStructure.doPruning() method invoked on this model’s probability structure object returns true.

See Also:
pruneHistoriesAndTransitions(), precomputeProbs()
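
The rules given for doPruning and createHistBackOffMap amount to two boolean expressions evaluated at construction time. A minimal sketch follows; the class PruningFlagsSketch is hypothetical, and the structureDoPruning parameter stands in for the result of ProbabilityStructure.doPruning().

```java
/** Illustrative only; mirrors the flag rules described for the Model constructor. */
final class PruningFlagsSketch {
  final boolean doPruning;
  final boolean createHistBackOffMap;

  PruningFlagsSketch(boolean globalDoPruning,
                     boolean structureDoPruning,
                     boolean saveHistBackOffMap) {
    // True if and only if either the global setting or the structure asks for pruning.
    this.doPruning = globalDoPruning || structureDoPruning;
    // The history back-off map is needed for pruning or when saving it for debugging.
    this.createHistBackOffMap = this.doPruning || saveHistBackOffMap;
  }
}
```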
Constructor Detail

Model

public Model(ProbabilityStructure structure)
Constructs a new object for deriving all counts using the specified probability structure.

Parameters:
structure - the probability structure to use when deriving counts
Method Detail

setCanonicalEvents

public void setCanonicalEvents(FlexibleMap canonical)
Sets the canonicalEvents member of this object.

Parameters:
canonical - the reflexive map of canonical Event objects
See Also:
ModelCollection.internalReadObject(java.io.ObjectInputStream)

deriveCounts

public void deriveCounts(CountsTable trainerCounts,
                         Filter filter,
                         double threshold,
                         FlexibleMap canonical)
Derives all counts from the specified counts table, using the probability structure specified in the constructor.

Parameters:
trainerCounts - a map from TrainerEvent objects to their counts (as doubles) from which to derive counts
filter - used to filter out TrainerEvent objects whose derived counts should not be derived for this model
threshold - a (currently unused) count cut-off threshold
canonical - a reflexive map used to canonicalize objects created when deriving counts

deriveCounts

public void deriveCounts(CountsTable trainerCounts,
                         Filter filter,
                         double threshold,
                         FlexibleMap canonical,
                         boolean deriveOtherModelCounts)
Derives all counts from the specified counts table, using the probability structure specified in the constructor.

Parameters:
trainerCounts - a map from TrainerEvent objects to their counts (as doubles) from which to derive counts
filter - used to filter out TrainerEvent objects whose derived counts should not be derived for this model
threshold - a (currently unused) count cut-off threshold
canonical - a reflexive map used to canonicalize objects created when deriving counts
deriveOtherModelCounts - an unused parameter, as this class does not contain other, internal Model instances

cleanup

protected void cleanup()
A method invoked after probabilities have been precomputed by precomputeProbs() to clean up (that is, remove) objects from the various counts tables that are no longer needed, as determined by ProbabilityStructure.removeHistory(int,Event) and ProbabilityStructure.removeTransition(int,Transition).

If precomputeProbs is true, then this method will remove entries from the maps of the precomputedProbs array. If precomputeProbs is false or if deleteCountsWhenPrecomputingProbs is false, then this method will remove entries from the maps in the counts array.
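
The removal step can be pictured as filtering the keys of a table with a predicate. In the sketch below, the predicate plays the role of ProbabilityStructure.removeHistory(int,Event) or ProbabilityStructure.removeTransition(int,Transition). The class CleanupSketch, its methods and the string keys are hypothetical, and a plain java.util.Map stands in for the parser's own table types.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

/** Illustrative only; a plain Map stands in for the model's tables. */
final class CleanupSketch {
  /** Removes every entry whose key satisfies shouldRemove; returns the number removed. */
  static <K> int removeUnneeded(Map<K, Double> table, Predicate<K> shouldRemove) {
    int before = table.size();
    table.keySet().removeIf(shouldRemove);
    return before - table.size();
  }

  static int example() {
    Map<String, Double> probs = new HashMap<>();
    probs.put("keep-a", -0.1);
    probs.put("drop-b", -2.3);
    probs.put("drop-c", -4.6);
    return removeUnneeded(probs, key -> key.startsWith("drop"));
  }
}
```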

See Also:
ProbabilityStructure.removeHistory(int,Event), ProbabilityStructure.removeTransition(int,Transition)

canonicalizeEvent

protected static final Event canonicalizeEvent(Event event,
                                               FlexibleMap canonical)
This method first canonicalizes the information in the specified event (a Sexp or a Subcat and a Sexp), then it returns a canonical version of the event itself, copying it into the map if necessary.


getCanonical

protected static final Transition getCanonical(Transition trans,
                                               FlexibleMap canonical)
This method assumes trans already contains a canonical history and a canonical future. If an equivalent transition is found in the canonical map, it is returned; otherwise, a new Transition object is created with the canonical future and canonical history contained in the specified transition, and that new Transition object is added to the canonical map and returned.
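
The lookup-or-copy behavior described above can be sketched with an ordinary reflexive map. The nested Transition class below is a simplified, hypothetical stand-in for danbikel.parser.Transition, with strings in place of canonical Event objects, and java.util.Map stands in for FlexibleMap.

```java
import java.util.Map;
import java.util.Objects;

/** Illustrative only; simplified stand-ins for Transition and FlexibleMap. */
final class CanonicalTransitionSketch {
  static final class Transition {
    final String future;
    final String history;

    Transition(String future, String history) {
      this.future = future;
      this.history = history;
    }

    @Override public boolean equals(Object o) {
      if (!(o instanceof Transition)) {
        return false;
      }
      Transition other = (Transition) o;
      return future.equals(other.future) && history.equals(other.history);
    }

    @Override public int hashCode() {
      return Objects.hash(future, history);
    }
  }

  /** Returns the equivalent transition already in the map, or adds and returns a new copy. */
  static Transition getCanonical(Transition trans, Map<Transition, Transition> canonical) {
    Transition found = canonical.get(trans);
    if (found != null) {
      return found;
    }
    Transition copy = new Transition(trans.future, trans.history);
    canonical.put(copy, copy); // reflexive: the key and the value are the same object
    return copy;
  }
}
```

Because every equal transition resolves to a single shared object, tables keyed on transitions hold one instance per distinct event rather than one per occurrence.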


estimateLogProb

public double estimateLogProb(int id,
                              TrainerEvent event)
Estimates the log-probability of a conditional event. The history (conditioning context) and future of this conditional event are contained in the specified maximal-context event.

Parameters:
id - the id of the caller (typically a Switchboard client ID)
event - the maximal-context event containing both the history (conditioning context) and future of the conditional event whose probability is to be estimated
Returns:
an estimate of the log-probability of a conditional event

estimateProb

public double estimateProb(int id,
                           TrainerEvent event)
Estimates the probability of a conditional event. The history (conditioning context) and future of this conditional event are contained in the specified maximal-context event.

Parameters:
id - the id of the caller (typically a Switchboard client ID)
event - the maximal-context event containing both the history (conditioning context) and future of the conditional event whose probability is to be estimated
Returns:
an estimate of the probability of a conditional event

estimateLogProbUsingPrecomputed

protected double estimateLogProbUsingPrecomputed(ProbabilityStructure structure,
                                                 TrainerEvent event)
Estimates the log prob using precomputed probabilities and smoothing values (lambdas). This method is invoked by the public method estimateLogProb(int,TrainerEvent) if precomputeProbs is true.


estimateLogProbUsingPrecomputed

protected double estimateLogProbUsingPrecomputed(Transition transition,
                                                 int atLevel)
Estimates the log prob of the specified transition using precomputed probabilities and lambdas and histBackOffMap (debugging method). N.B.: The history contained within the specified transition must have been observed during training (but not necessarily with the particular future contained in the specified transition).

Parameters:
transition - the transition for which to get a smoothed log-probability estimate
atLevel - the back-off level of the specified transition
See Also:
histBackOffMap, createHistBackOffMap

estimateProb

protected double estimateProb(ProbabilityStructure probStructure,
                              TrainerEvent event)
Returns the smoothed probability estimate of a transition contained in the specified TrainerEvent object.

Parameters:
probStructure - a ProbabilityStructure object that is either structure or a copy of it, used for temporary storage during the computation performed by this method
event - the TrainerEvent containing a transition from a history to a future whose smoothed probability is to be computed
Returns:
the smoothed probability estimate of a transition contained in the specified TrainerEvent object
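
As an illustration of what a smoothed estimate over several back-off levels might look like, the sketch below combines per-level maximum-likelihood estimates with per-level smoothing values (lambdas) by linear interpolation. The recursion is an assumption based on common practice, not a transcription of this method, and the class InterpolationSketch is hypothetical.

```java
final class InterpolationSketch {
  /**
   * Interpolates estimates from the coarsest back-off level up to level 0:
   * p_i = lambda_i * e_i + (1 - lambda_i) * p_(i+1).
   * Assumption: the estimate at the last level is used as-is, so the
   * lambda at the last index is ignored.
   */
  static double smoothedEstimate(double[] lambdas, double[] estimates) {
    if (lambdas.length != estimates.length || estimates.length == 0) {
      throw new IllegalArgumentException("arrays must be non-empty and of equal length");
    }
    int last = estimates.length - 1;
    double prob = estimates[last];
    for (int i = last - 1; i >= 0; i--) {
      prob = lambdas[i] * estimates[i] + (1.0 - lambdas[i]) * prob;
    }
    return prob;
  }

  public static void main(String[] args) {
    double[] lambdas = {0.5, 0.8, 0.0};
    double[] estimates = {0.5, 0.25, 0.1};
    System.out.println(smoothedEstimate(lambdas, estimates));
  }
}
```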

estimateProbOld

protected double estimateProbOld(ProbabilityStructure structure,
                                 TrainerEvent event,
                                 int level,
                                 double prevHistCount)

deriveDiversityCounts

protected void deriveDiversityCounts()
Deprecated. This method used to be called by deriveCounts(CountsTable,Filter,double,FlexibleMap,boolean), but diversity counts are now derived directly by that method.

Called by deriveCounts(CountsTable,Filter,double,FlexibleMap), for each type of transition observed, this method derives the number of unique transitions from the history context to the possible futures. This number of unique transitions, called the diversity of a random variable, is used in a modified version of Witten-Bell smoothing.
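
The notion of diversity described above can be sketched as follows: for each history, count the distinct futures observed with it, then use that count in a Witten-Bell-style smoothing weight. The formula lambda = c(h) / (c(h) + k * u(h)) and the factor k are assumptions, since the exact modification used by this class is not given here. DiversitySketch and its method names are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class DiversitySketch {
  /** Each transition is a {history, future} pair; returns history -> number of distinct futures. */
  static Map<String, Integer> diversity(List<String[]> transitions) {
    Map<String, Set<String>> futures = new HashMap<>();
    for (String[] t : transitions) {
      futures.computeIfAbsent(t[0], h -> new HashSet<>()).add(t[1]);
    }
    Map<String, Integer> result = new HashMap<>();
    for (Map.Entry<String, Set<String>> e : futures.entrySet()) {
      result.put(e.getKey(), e.getValue().size());
    }
    return result;
  }

  /**
   * Witten-Bell-style weight for the current level's estimate.
   * The multiplicative factor is an assumed knob, not a documented parameter.
   */
  static double wittenBellLambda(double histCount, int diversity, double factor) {
    return histCount / (histCount + factor * diversity);
  }
}
```

A history seen often with few distinct futures gets a weight close to 1, so its own estimate is trusted; a history with many distinct futures relative to its count defers more to the coarser back-off level.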


deriveHistories

protected void deriveHistories(CountsTable trainerCounts,
                               Filter filter,
                               FlexibleMap canonical)
Deprecated. This method used to be called by deriveCounts(CountsTable,Filter,double,FlexibleMap,boolean), but histories are now derived directly by that method.

Derives all history-context counts from the specified counts table, using this Model object's probability structure.

Parameters:
trainerCounts - a map from TrainerEvent objects to their counts (as doubles) from which to derive counts
filter - used to filter out TrainerEvent objects whose derived counts should not be derived for this model
canonical - a reflexive map used to canonicalize objects created when deriving counts

getTransitions

protected Transition[] getTransitions(Transition zeroLevelTrans,
                                      Transition[] trans)
Inserts the Transition objects representing conditional events for all back-off levels of this model into the specified array, with trans[0] = zeroLevelTrans. Events at higher-numbered back-off levels (i.e., events with increasingly coarse history contexts) are obtained from the specified zeroth-level Transition by means of the backOffMap.

Parameters:
zeroLevelTrans - the Transition object representing a conditional event with a maximal-context history
trans - the array in which to insert Transition objects for all levels of back-off of this model
Returns:
the specified Transition array, having been modified to include Transition objects for all levels of back-off of this model
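
The chain lookup described above can be sketched with plain strings standing in for Transition objects. The list of maps plays the role of backOffMap (map i takes a level-i transition to its level i + 1 counterpart). BackOffChainSketch is hypothetical, and the behavior for a missing mapping (a null entry) is an assumption.

```java
import java.util.List;
import java.util.Map;

final class BackOffChainSketch {
  /**
   * Fills trans[0..numLevels-1], where numLevels = backOffMap.size() + 1.
   * trans[0] is the maximal-context transition; each later entry is looked
   * up from the previous one. A missing mapping leaves null in the array.
   */
  static String[] getTransitions(String zeroLevelTrans,
                                 String[] trans,
                                 List<Map<String, String>> backOffMap) {
    trans[0] = zeroLevelTrans;
    for (int i = 0; i < backOffMap.size(); i++) {
      String prev = trans[i];
      trans[i + 1] = (prev == null) ? null : backOffMap.get(i).get(prev);
    }
    return trans;
  }
}
```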

pruneHistoriesAndTransitionsOld

protected void pruneHistoriesAndTransitionsOld()
Prunes every history and transition with a back-off level less than the last level in which the last-level history has a diversity of 1 (meaning that the probability is 1, so there is no need to store the history and transition).


computeHistoriesAndTransitionsToPrune

protected void computeHistoriesAndTransitionsToPrune()
Schedules for pruning every history and transition whose MLE is equal to that of the transition at its back-off level.


pruneHistoriesAndTransitions

protected void pruneHistoriesAndTransitions()
Analyzes the distributions of this model in order to prune history and transition (i.e., conditional) events from the various counts tables. The analysis is designed so that the histories and transitions that are pruned are likely to have a minimal if any impact on overall parsing performance.

As a side effect, the events that are pruned or not pruned are output to a file named structureClassName + ".prune-log", if printPrunedEvents or printUnprunedEvents, respectively, are true. This allows further analysis of the model.

See Also:
AnalyzeDisns, printPrunedEvents, printUnprunedEvents

initializeSmoothingParams

protected void initializeSmoothingParams()
Sets up the smoothing parameter arrays and maps.

See Also:
smoothingParams

savePrecomputeData

protected void savePrecomputeData(CountsTable trainerCounts,
                                  Filter filter)
Saves the back-off chain for each event derived from each TrainerEvent in the key set of the specified counts table. This method is called by deriveCounts(CountsTable,Filter,double,FlexibleMap,boolean) when Settings.precomputeProbs is true.

Parameters:
trainerCounts - a counts table containing some or all of the TrainerEvent objects collected during training
filter - a filter specifying which TrainerEvent objects to ignore in the key set of the specified counts table
See Also:
deriveCounts(CountsTable,Filter,double,FlexibleMap,boolean), backOffMap

precomputeProbs

public void precomputeProbs()
Precomputes all probabilities and smoothing values for events seen during all previous invocations of deriveCounts(CountsTable,Filter,double,FlexibleMap,boolean).

See Also:
precomputeProbs(MapToPrimitive.Entry, ...), storePrecomputedProbs

precomputeProbs

protected void precomputeProbs(MapToPrimitive.Entry transEntry,
                               double[] lambdas,
                               double[] estimates,
                               Transition[] transitions,
                               Event[] histories,
                               int lastLevel)
Precomputes the probabilities and smoothing values for the Transition object contained as a key within the specified map entry, where the value is the count of the transition.

Parameters:
transEntry - a map entry mapping a Transition object to its count (a double)
lambdas - an array in which to store the smoothing value for each of the back-off levels
estimates - an array in which to store the maximum-likelihood estimate at each of the back-off levels
transitions - an array in which to store the Transition instance for each of the back-off levels
histories - an array in which to store the history, an Event instance, for each of the back-off levels
lastLevel - the last back-off level (the value equal to numLevels - 1)
See Also:
precomputeProbs()

storePrecomputedProbs

protected void storePrecomputedProbs(double[] lambdas,
                                     double[] estimates,
                                     Transition[] transitions,
                                     Event[] histories,
                                     int lastLevel)
Stores the specified smoothing values (lambdas) and smoothed probability estimates in the precomputedProbs and smoothingParams map arrays.

Parameters:
lambdas - an array containing the smoothing value for each of the back-off levels
estimates - an array containing the maximum-likelihood estimate at each of the back-off levels
transitions - an array containing the Transition instance for each of the back-off levels
histories - an array containing the history, an Event instance, for each of the back-off levels
lastLevel - the last back-off level (the value equal to numLevels - 1)
See Also:
precomputeProbs()

precomputeProbs

protected void precomputeProbs(CountsTable trainerCounts,
                               Filter filter)
Deprecated. This method has been superseded by precomputeProbs().

Stores precomputed probabilities and smoothing values for events derived from the maximal-context TrainerEvent instances and their counts contained in the specified counts table.

Parameters:
trainerCounts - a map of TrainerEvent instances to their observed counts
filter - a filter indicating which of the TrainerEvent objects in the specified counts table should be ignored by this method as it iterates over all entries in the counts table

precomputeProbs

protected void precomputeProbs(TrainerEvent event,
                               Transition[] transitions,
                               Event[] histories)
Deprecated. This method is called by precomputeProbs(CountsTable,Filter), which is also deprecated.

Precomputes probabilities for the specified event, using the specified arrays as temporary storage during this method invocation.

Parameters:
event - the TrainerEvent object from which probabilities are to be precomputed
transitions - temporary storage to be used during an invocation of this method
histories - temporary storage to be used during an invocation of this method

readSmoothingParams

protected void readSmoothingParams()
Reads all necessary smoothing parameters from smoothingParamsFile instead of deriving values for smoothing parameters. Verbose output is produced.

See Also:
useSmoothingParams, dontAddNewParams, readSmoothingParams(boolean)

readSmoothingParams

protected void readSmoothingParams(boolean verboseOutput)
Reads all necessary smoothing parameters from smoothingParamsFile instead of deriving values for smoothing parameters. Verbose output is produced if the specified argument is true.

Parameters:
verboseOutput - indicates whether or not this method should output verbose messages to System.err.
See Also:
useSmoothingParams, dontAddNewParams

writeSmoothingParams

protected void writeSmoothingParams()
Writes the smoothing parameters of this model to the file named by smoothingParamsFile. Verbose output to System.err is produced.


writeSmoothingParams

protected void writeSmoothingParams(boolean verboseOutput)
Writes the smoothing parameters of this model to the file named by smoothingParamsFile.

Parameters:
verboseOutput - indicates whether to output verbose messages to System.err

share

public void share(int backOffLevel,
                  Model otherModel,
                  int otherModelBackOffLevel)
Indicates that the counts or precomputed probabilities from the specified back-off level of this model are to be used when estimating probabilities for the specified back-off level of another model.
N.B.: Invoking this method destructively alters the specified Model.

Parameters:
backOffLevel - the back-off level of this model that is to be shared by another model
otherModel - the other model that will share a particular back-off level with this model (that is, use the counts or precomputed probabilities from this model)
otherModelBackOffLevel - the back-off level of the other model that is to be made the same as the specified back-off level of this model (that is, use the counts or precomputed probabilities from this model)
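
One way to picture sharing a back-off level is that the other model's table at the given level is replaced by a reference to this model's table, which is why the other model is altered destructively. The sketch below is a guess at that idea using plain maps; SharedLevelSketch and its field are hypothetical and do not reflect the actual data structures of this class.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SharedLevelSketch {
  /** One counts table per back-off level (a stand-in for the real per-level tables). */
  final List<Map<String, Double>> countsByLevel = new ArrayList<>();

  SharedLevelSketch(int numLevels) {
    for (int i = 0; i < numLevels; i++) {
      countsByLevel.add(new HashMap<>());
    }
  }

  /** Makes otherModel's table at otherLevel the very same object as this model's table at level. */
  void share(int level, SharedLevelSketch otherModel, int otherLevel) {
    // The other model's previous table at otherLevel is discarded.
    otherModel.countsByLevel.set(otherLevel, countsByLevel.get(level));
  }
}
```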

canonicalize

public void canonicalize()
Since events are typically read-only, this method allows for canonicalization (or "unique-ifying") of the information in the events contained in this object. Use of this method is intended to conserve memory by removing duplicate copies of event information in different event objects.


canonicalize

public void canonicalize(FlexibleMap map)
Since events are typically read-only, this method allows for canonicalization (or "unique-ifying") of the information in the events contained in this object, using the specified map. Use of this method is intended to conserve memory by removing duplicate copies of event information in different event objects.

Parameters:
map - a map of canonical information structures of the Event objects contained in this object; this parameter allows multiple Model objects to have their events structures canonicalized with respect to each other
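
Canonicalization with a reflexive map (a map whose keys map to themselves) can be sketched as below. A standard java.util.Map stands in for FlexibleMap, and CanonicalizeSketch is a hypothetical helper, not part of this class.

```java
import java.util.Map;

final class CanonicalizeSketch {
  /**
   * Returns the canonical instance equal to obj, registering obj as the
   * canonical instance if no equal object has been seen before.
   */
  static <T> T canonical(Map<T, T> reflexive, T obj) {
    T existing = reflexive.putIfAbsent(obj, obj);
    return existing != null ? existing : obj;
  }
}
```

After canonicalization, equal but distinct objects collapse to one shared instance, so the duplicates become eligible for garbage collection. This is safe only because the objects are treated as read-only.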

getProbStructure

public ProbabilityStructure getProbStructure()
Returns the type of ProbabilityStructure object used during the invocation of deriveCounts(CountsTable,Filter,double,FlexibleMap).

A copy of this object should be created and stored for each parsing client thread, for use when the clients need to call the probability-computation methods of this class. This scheme allows the reusable data members inside the ProbabilityStructure objects to be used by multiple clients without any concurrency problems, thereby maintaining their efficiency and thread-safety.
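
The per-thread copy scheme described above might be arranged with a ThreadLocal, as in this sketch. The Scratch class is a hypothetical stand-in for a ProbabilityStructure with reusable data members, and its copy() method is an assumption about how such a copy would be made.

```java
final class PerThreadCopySketch {
  /** Hypothetical stand-in for a structure that holds reusable scratch space. */
  static final class Scratch {
    final double[] buffer = new double[4];

    Scratch copy() {
      return new Scratch();
    }
  }

  private final Scratch prototype = new Scratch();

  // Each thread lazily receives its own copy on first access.
  private final ThreadLocal<Scratch> perThread =
      ThreadLocal.withInitial(() -> prototype.copy());

  /** Returns the calling thread's private copy. */
  Scratch forCurrentThread() {
    return perThread.get();
  }

  /** Fetches the copy seen by a freshly started thread (for demonstration only). */
  Scratch fromNewThread() {
    Scratch[] holder = new Scratch[1];
    Thread t = new Thread(() -> holder[0] = forCurrentThread());
    t.start();
    try {
      t.join();  // join makes the write to holder[0] visible to this thread
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return holder[0];
  }
}
```

Because no two threads ever touch the same scratch buffer, no locking is needed around the probability-computation calls.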


numModels

public int numModels()
Returns 1, as this object does not contain any other, internal Model instances.


getModel

public Model getModel(int idx)
Returns this model object.

Parameters:
idx - an unused parameter, as this object does not contain any other, internal Model instances.
Returns:
this model object

beVerbose

public void beVerbose()
Causes this class to be verbose in its output to System.err during the invocation of its methods, such as deriveCounts(CountsTable,Filter,double,FlexibleMap).


beQuiet

public void beQuiet()
Causes this class not to output anything to System.err during the invocation of its methods, such as deriveCounts(CountsTable,Filter,double,FlexibleMap).


getCacheStats

public String getCacheStats()
Returns a human-readable string containing all the precomputed or non-precomputed probability cache statistics for the life of this Model object.

Returns:
a human-readable string containing all the precomputed or non-precomputed probability cache statistics for the life of this Model object.
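
A report of this kind could be assembled from per-level counters such as cacheAccesses and cacheHits. The sketch below shows one possible layout; the actual format produced by getCacheStats is not documented here, and CacheStatsSketch is hypothetical.

```java
import java.util.Locale;

final class CacheStatsSketch {
  /** Formats hits, accesses and hit rate for each back-off level, one level per line. */
  static String format(int[] accesses, int[] hits) {
    StringBuilder sb = new StringBuilder();
    for (int level = 0; level < accesses.length; level++) {
      // Avoid division by zero for levels that were never consulted.
      double hitRate = accesses[level] == 0
          ? 0.0
          : 100.0 * hits[level] / accesses[level];
      sb.append(String.format(Locale.ROOT, "level %d: %d/%d (%.1f%%)\n",
                              level, hits[level], accesses[level], hitRate));
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.print(format(new int[]{10, 0}, new int[]{5, 0}));
  }
}
```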

Author: Dan Bikel.