Parsing Engine

danbikel.parser
Class SubcatBag

java.lang.Object
  extended by danbikel.parser.SubcatBag
All Implemented Interfaces:
Event, Externalizable, MutableEvent, Serializable, SexpConvertible, Subcat

public class SubcatBag
extends Object
implements Subcat, Externalizable

Provides a bag implementation of subcat requirements (a bag is a set that allows multiple occurrences of the same item). The list of all argument nonterminals is provided by the Training.argNonterminals() map. As a special case, this class also supports gap requirements, that is, requirements equal to Training.gapAugmentation(). This class also provides a separate bin for miscellaneous subcat requirements, such as those that can be matched via AbstractTraining.headSym. All nonterminal requirements are stripped of any augmentations before being counted in this subcat bag.

The comment for the toSexp method describes the way in which this class represents miscellaneous requirements.

Bugs:

  1. This class provides special-case bins for counting gap and miscellaneous subcat requirements. If this parsing package is expanded to include additional elements that are possible generative requirements, and these elements do not appear in Training.argNonterminals(), then this class, unless modified, will simply put these elements in the miscellaneous bin.
  2. This class assumes that only single requirements will be passed to its add(Symbol) or remove(Symbol) methods. For example, the generation of the modifying nonterminal NP-A-g satisfies two types of requirements, being an NP argument and having the gap feature. Nevertheless, this class assumes that these two types of requirements will be added or removed in two separate invocations of either add(Symbol) or remove(Symbol), one invocation with NP-A and one with g. Currently, the Decoder class assumes that each nonterminal generated will satisfy only a single requirement (but then, it does not handle the gap feature at all in its current state).
  3. As explained in the documentation for toSexp(), for input/output purposes, this class treats miscellaneous requirements as the symbol +STOP+-A. This “fake” argument nonterminal will not be correctly identified by the Training.isArgumentFast(Symbol) method after the Training.setUpFastArgMap(CountsTable) method has been invoked (unless this fake nonterminal happens to be one of the keys of the map passed to Training.setUpFastArgMap(CountsTable)). It is therefore important not to invoke the Training.setUpFastArgMap(CountsTable) method during training, when requirements are added individually by add(Symbol), which calls validRequirement(Symbol), which in turn invokes Training.isArgumentFast(Symbol).
  4. This class cannot collect more than 127 total occurrences of requirements. This is well beyond the number of arguments ever postulated in any human language, but not necessarily beyond the number of generative requirements that might be needed by a future parsing model. A corollary of this limitation is that the number of occurrences of a particular requirement may not exceed 127.
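
The ceiling of 127 in item 4 is what one would expect if each count were held in a signed byte, although this page does not state the storage type. The following is a minimal sketch under that assumption; ByteCounterSketch and its methods are hypothetical and are not part of danbikel.parser.

```java
// Hypothetical sketch: a counter that refuses to grow past Byte.MAX_VALUE (127).
// Assumption: counts are stored as signed bytes; the documentation above
// states only the limit, not the storage type.
class ByteCounterSketch {
    private byte count = 0;

    /** Increments the count; returns false instead of overflowing past 127. */
    boolean increment() {
        if (count == Byte.MAX_VALUE) {
            return false; // one more increment would wrap around to -128
        }
        count++;
        return true;
    }

    int get() {
        return count;
    }
}
```

A caller that expects more than 127 requirements would need a wider counter type than the one assumed here.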

See Also:
Subcats, toSexp(), Serialized Form

Constructor Summary
SubcatBag()
          Constructs an empty subcat.
SubcatBag(SexpList list)
          Constructs a subcat bag containing the number of occurrences of the symbols of list.
 
Method Summary
 MutableEvent add(int type, Object obj)
          Adds the specified object of the specified type to this event.
 MutableEvent add(Object obj)
          Adds the specified object to this event.
 Subcat add(Symbol requirement)
          Adds the specified requirement to this subcat bag.
 boolean addAll(SexpList list)
          Adds each of the symbols of list to this subcat bag, effectively calling add(Symbol) for each element of list.
 void become(Subcat other)
          Causes this subcat to be equal to the specified subcat by copying the specified subcat's data to this subcat.
 int canonicalize(Map canonical)
          This method does nothing and returns -1, as no data internal to this class can be canonicalized.
 void clear()
          This method sets all counts of this subcat bag to zero.
 boolean contains(Symbol requirement)
          Returns true if this subcat frame contains the specified requirement.
 Event copy()
          Returns a deep copy of this subcat bag.
 boolean empty()
          Returns true if and only if there are zero requirements in this subcat bag.
 void ensureCapacity(int size)
          This method does nothing and returns.
 void ensureCapacity(int type, int size)
          This method does nothing and returns.
 boolean equals(Object obj)
          Returns true if and only if the specified object is of type SubcatBag and has the same number of requirement categories and has the same counts for each of those requirement categories.
 Object get(int type, int index)
          Gets the indexth component of this subcat bag.
 Subcat getCanonical(boolean copyInto, Map map)
          Returns a canonical instance of this object using the specified map (optional operation).
 Class getClass(int type)
          This method returns the one class that Subcat objects need to support: Symbol.class.
 int hashCode()
          Computes the hash code for this subcat.
 Iterator iterator()
          Returns an iterator over the elements of this subcat bag, returning the canonical version of symbols for each of the categories described in add(Symbol); for each occurrence of a miscellaneous item present in this subcat bag, the return value of Training.stopSym() is returned.
 int numComponents()
          An alias for size().
 int numComponents(int type)
          An alias for size().
 int numTypes()
          Returns 1 (Subcat objects only support Symbol objects).
 void readExternal(ObjectInput stream)
          Reads a serialized instance of this class from the specified stream.
 boolean remove(Symbol requirement)
          Removes the specified requirement from this subcat bag, if possible.
static void setUpFastUidMap(CountsTable nonterminals)
           
 int size()
          Returns the number of requirements contained in this subcat bag.
 Sexp toSexp()
          As per the contract of Subcat, this method returns a Sexp such that an equivalent SubcatBag object would result from the addAll(SexpList) method being invoked with this Sexp as its argument.
 String toString()
          Returns a human-readable string representation of the requirements contained in this bag.
 int typeIndex(Class cl)
          Returns 0 if the specified class is equal to Symbol.class, -1 otherwise.
protected  boolean validRequirement(Symbol requirement)
          A method to check if the specified requirement is valid.
 void writeExternal(ObjectOutput stream)
          Writes this object to the specified output stream.
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Constructor Detail

SubcatBag

public SubcatBag()
Constructs an empty subcat.


SubcatBag

public SubcatBag(SexpList list)
Constructs a subcat bag containing the number of occurrences of the symbols of list.

Parameters:
list - a list of Symbol objects to be added to this subcat bag

Method Detail

setUpFastUidMap

public static void setUpFastUidMap(CountsTable nonterminals)

validRequirement

protected boolean validRequirement(Symbol requirement)
A method to check if the specified requirement is valid. For this class, a requirement is valid if it is either Training.gapAugmentation() or a symbol for which Training.isArgumentFast(Symbol) returns true. A subclass may override this method to allow for new or different valid requirements.
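
The override hook described above can be sketched as follows. Requirements are modeled as plain strings, and the class names, the argument set, and the gap symbol "g" are hypothetical stand-ins rather than the values supplied by Training.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-ins: ARGS and GAP are illustrative values, not the ones
// behind Training.isArgumentFast(Symbol) or Training.gapAugmentation().
class RequirementCheckSketch {
    static final String GAP = "g";
    static final Set<String> ARGS =
        new HashSet<>(Arrays.asList("NP-A", "S-A", "SBAR-A", "VP-A"));

    /** Valid if the requirement is the gap augmentation or an argument nonterminal. */
    protected boolean validRequirement(String requirement) {
        return GAP.equals(requirement) || ARGS.contains(requirement);
    }
}

// A subclass that accepts one additional kind of requirement ("CC" is made up).
class ExtendedRequirementCheckSketch extends RequirementCheckSketch {
    @Override
    protected boolean validRequirement(String requirement) {
        return "CC".equals(requirement) || super.validRequirement(requirement);
    }
}
```

Delegating to super keeps the original notion of validity intact while widening it, which is one way to read "new or different valid requirements".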


add

public Subcat add(Symbol requirement)
Adds the specified requirement to this subcat bag. There are separate bins maintained for each of the nonterminals in the list returned by Training.argNonterminals(), as well as a bin for gap augmentations (that is, requirements that are equal to Training.gapAugmentation()) and a miscellaneous bin for all other requirements, such as those that can be matched via AbstractTraining.headSym.

Specified by:
add in interface Subcat
Parameters:
requirement - the requirement to add to this subcat bag
Returns:
this Subcat object
See Also:
Training.defaultArgAugmentation(), Training.gapAugmentation(), Training.isArgumentFast(Symbol)
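
The binning described for add(Symbol) can be pictured with the sketch below. It is not the class's actual implementation: symbols are modeled as strings, and the argument list, the "-" augmentation delimiter, and the gap symbol "g" are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of per-requirement bins. None of the constants below
// are read from Training; they are illustrative values only.
class BinSketch {
    static final List<String> ARG_NONTERMINALS = Arrays.asList("NP", "S", "SBAR", "VP");
    static final String GAP = "g";

    // One bin per argument nonterminal, then one gap bin, then one miscellaneous bin.
    final int[] counts = new int[ARG_NONTERMINALS.size() + 2];

    int gapBin() { return ARG_NONTERMINALS.size(); }
    int miscBin() { return ARG_NONTERMINALS.size() + 1; }

    /** Maps a requirement to its bin after stripping any augmentation. */
    int binFor(String requirement) {
        if (GAP.equals(requirement)) {
            return gapBin();
        }
        int delimiter = requirement.indexOf('-');
        String stripped = delimiter < 0 ? requirement : requirement.substring(0, delimiter);
        int bin = ARG_NONTERMINALS.indexOf(stripped);
        return bin >= 0 ? bin : miscBin();
    }

    /** Adds one occurrence of the requirement and returns this bag for chaining. */
    BinSketch add(String requirement) {
        counts[binFor(requirement)]++;
        return this;
    }

    int size() {
        return Arrays.stream(counts).sum();
    }
}
```

For example, new BinSketch().add("NP-A").add("NP-A").add("g") leaves two occurrences in the NP bin and one in the gap bin.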

addAll

public boolean addAll(SexpList list)
Adds each of the symbols of list to this subcat bag, effectively calling add(Symbol) for each element of list.

Specified by:
addAll in interface Subcat
Parameters:
list - a list of Symbol objects to be added to this subcat bag
Returns:
whether this subcat was modified
Throws:
ClassCastException - if one or more elements of list is not an instance of Symbol

remove

public boolean remove(Symbol requirement)
Removes the specified requirement from this subcat bag, if possible. If the specified requirement is a nonterminal, then it is only removed if it is an argument nonterminal, that is, if Language.training().isArgumentFast(requirement) returns true, and if this subcat contained at least one instance of that nonterminal.

Specified by:
remove in interface Subcat
Parameters:
requirement - the element that has been generated by the parser and is thus a candidate for removal from this subcat
Returns:
true if this subcat bag contained at least one instance of the specified requirement and it was removed, false otherwise
See Also:
Training.isArgumentFast(Symbol)

size

public int size()
Returns the number of requirements contained in this subcat bag.

Specified by:
size in interface Subcat

empty

public boolean empty()
Returns true if and only if there are zero requirements in this subcat bag.

Specified by:
empty in interface Subcat

contains

public boolean contains(Symbol requirement)
Description copied from interface: Subcat
Returns true if this subcat frame contains the specified requirement.

Specified by:
contains in interface Subcat
Parameters:
requirement - the requirement for which membership in this subcat is to be checked
Returns:
true if this subcat contains requirement, that is, returns true if and only if Subcat.remove(Symbol) would remove the specified symbol from this subcat
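
The relationship between contains and remove stated above can be illustrated with a small string-keyed bag. RemoveSketch is a hypothetical stand-in and skips the argument-nonterminal check that the real remove(Symbol) performs.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: contains(r) is true exactly when remove(r) would succeed.
class RemoveSketch {
    private final Map<String, Integer> counts = new HashMap<>();

    void add(String requirement) {
        counts.merge(requirement, 1, Integer::sum);
    }

    boolean contains(String requirement) {
        return counts.getOrDefault(requirement, 0) > 0;
    }

    /** Removes one occurrence if present; returns whether anything was removed. */
    boolean remove(String requirement) {
        if (!contains(requirement)) {
            return false;
        }
        counts.merge(requirement, -1, Integer::sum);
        return true;
    }
}
```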

iterator

public Iterator iterator()
Returns an iterator over the elements of this subcat bag, returning the canonical version of symbols for each of the categories described in add(Symbol); for each occurrence of a miscellaneous item present in this subcat bag, the return value of Training.stopSym() is returned.

Specified by:
iterator in interface Subcat

copy

public Event copy()
Returns a deep copy of this subcat bag.

Specified by:
copy in interface Event

hashCode

public int hashCode()
Computes the hash code for this subcat.

Specified by:
hashCode in interface Subcat

equals

public boolean equals(Object obj)
Returns true if and only if the specified object is of type SubcatBag and has the same number of requirement categories and has the same counts for each of those requirement categories.

Specified by:
equals in interface Subcat

toString

public String toString()
Returns a human-readable string representation of the requirements contained in this bag. Note that nonterminals that are not in the miscellaneous bin will contain argument augmentations.


getCanonical

public Subcat getCanonical(boolean copyInto,
                           Map map)
Description copied from interface: Subcat
Returns a canonical instance of this object using the specified map (optional operation).

Specified by:
getCanonical in interface Subcat
Parameters:
map - the reflexive map to use for canonicalization: the key-value pair of (this, this) should be added to map if this object is not already a key in map
copyInto - specifies whether to copy this subcat before inserting into the canonical map
Returns:
a canonical instance of this object
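
Canonicalization through a reflexive map, as described for the map parameter, can be sketched as follows. This is an illustration rather than the class's code: a List<String> stands in for a subcat, and CanonicalSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a reflexive canonical map, in which every key maps
// to itself. The copyInto flag mirrors the parameter described above.
class CanonicalSketch {
    static List<String> getCanonical(List<String> subcat, boolean copyInto,
                                     Map<List<String>, List<String>> map) {
        List<String> canonical = map.get(subcat);
        if (canonical == null) {
            // Not yet present: insert this object (or a copy) as both key and value.
            canonical = copyInto ? new ArrayList<>(subcat) : subcat;
            map.put(canonical, canonical);
        }
        return canonical;
    }
}
```

Two equal lists passed through the same map come back as one shared instance, so later comparisons can use reference equality. An object stored as a key in this way should not be mutated afterwards.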

add

public MutableEvent add(Object obj)
Description copied from interface: MutableEvent
Adds the specified object to this event. The specified object must be of a type that this event is capable of collecting; that is,
 this.typeIndex(obj.getClass())
 
must not return -1.

If an implementation of this interface collects components that are primitive type values, then these values should be wrapped in their corresponding wrapper classes. For example, if an implementation of this interface accepts int values, they should be passed as Integer objects to this method. At present, an Event implementation cannot be designed to accept both a primitive type and its associated wrapper class's type (this is, of course, not a serious limitation).

Specified by:
add in interface MutableEvent
Returns:
this object

add

public MutableEvent add(int type,
                        Object obj)
Description copied from interface: MutableEvent
Adds the specified object of the specified type to this event. The specified object must be of the type specified; that is, the expression
 this.typeIndex(obj.getClass()) == type
 
must be true.

If an implementation of this interface collects components that are primitive type values, then these values should be wrapped in their corresponding wrapper classes. For example, if an implementation of this interface accepts int values, they should be passed as Integer objects to this method. At present, an Event implementation cannot be designed to accept both a primitive type and its associated wrapper class's type (this is, of course, not a serious limitation).

Specified by:
add in interface MutableEvent
Returns:
this object
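
The two preconditions on add(Object) and add(int, Object) can be checked as in the sketch below, which assumes an event that collects a single component type. String.class stands in for Symbol.class, and TypeIndexSketch and its accepts methods are hypothetical.

```java
// Hypothetical sketch of the type-index contract for a single-type event.
class TypeIndexSketch {
    int numTypes() {
        return 1;
    }

    Class<?> getClass(int type) {
        return String.class; // the one supported component type
    }

    int typeIndex(Class<?> cl) {
        return String.class.equals(cl) ? 0 : -1;
    }

    /** True if add(obj) would be legal under the contract above. */
    boolean accepts(Object obj) {
        return typeIndex(obj.getClass()) != -1;
    }

    /** True if add(type, obj) would be legal under the contract above. */
    boolean accepts(int type, Object obj) {
        return typeIndex(obj.getClass()) == type;
    }
}
```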

ensureCapacity

public void ensureCapacity(int size)
This method does nothing and returns.

Specified by:
ensureCapacity in interface MutableEvent
Parameters:
size - the size to pre-allocate for all abstract lists of this event

ensureCapacity

public void ensureCapacity(int type,
                           int size)
This method does nothing and returns.

Specified by:
ensureCapacity in interface MutableEvent
Parameters:
type - the type of underlying abstract list for which to pre-allocate space
size - the size to pre-allocate for the specified type of abstract list

getClass

public Class getClass(int type)
This method returns the one class that Subcat objects need to support: Symbol.class.

Specified by:
getClass in interface Event
Returns:
the type (Class) associated with the specified type index

typeIndex

public int typeIndex(Class cl)
Returns 0 if the specified class is equal to Symbol.class, -1 otherwise.

Specified by:
typeIndex in interface Event

numTypes

public int numTypes()
Returns 1 (Subcat objects only support Symbol objects).

Specified by:
numTypes in interface Event

numComponents

public int numComponents()
An alias for size().

Specified by:
numComponents in interface Event

numComponents

public int numComponents(int type)
An alias for size().

Specified by:
numComponents in interface Event

canonicalize

public int canonicalize(Map canonical)
This method does nothing and returns -1, as no data internal to this class can be canonicalized.

Specified by:
canonicalize in interface Event
Parameters:
canonical - a reflexive map of objects representing event information: for each unique key-value pair, the value is a reference to the key
Returns:
1 if this event was canonicalized, 0 if it was not canonicalized (and had to be added to canonical) or -1 if this event was not even eligible for canonicalization

clear

public void clear()
This method sets all counts of this subcat bag to zero.

Specified by:
clear in interface MutableEvent

get

public Object get(int type,
                  int index)
Gets the indexth component of this subcat bag.

Efficiency note: The time complexity of this method is linear in the number of requirement types.

Specified by:
get in interface Event
Parameters:
type - an unused type parameter (Subcat events only support the type Symbol, so this argument is effectively superfluous for this class)
index - the index of the requirement to get
Returns:
the indexth Symbol of this subcat bag, as would be returned by the indexth invocation of next from the iterator returned by iterator()
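
The linear cost noted above follows naturally if the bag is stored as one count per requirement type. The sketch below assumes that layout; IndexedGetSketch and the labels passed to it are hypothetical.

```java
// Hypothetical sketch: find the index-th element of a bag stored as per-bin
// counts by walking the bins in order and subtracting each count.
class IndexedGetSketch {
    static String get(String[] labels, int[] counts, int index) {
        if (index < 0) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        int remaining = index;
        for (int bin = 0; bin < counts.length; bin++) {
            if (remaining < counts[bin]) {
                return labels[bin]; // the requested element falls inside this bin
            }
            remaining -= counts[bin];
        }
        throw new IndexOutOfBoundsException("index: " + index);
    }
}
```

The loop visits each bin at most once, so the work grows with the number of requirement types rather than with the number of occurrences.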

toSexp

public Sexp toSexp()
As per the contract of Subcat, this method returns a Sexp such that an equivalent SubcatBag object would result from the addAll(SexpList) method being invoked with this Sexp as its argument.

N.B.: For each occurrence of a miscellaneous item present in this subcat bag, the returned list will contain the symbol Training.stopSym() augmented with the argument augmentation:

 Symbol.get(Training.stopSym().toString() +
            Treebank.canonicalAugDelimiter() +
            Training.defaultArgAugmentation());
 

Specified by:
toSexp in interface Subcat
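
The round-trip contract can be illustrated with strings in place of Symbol and Sexp objects. In this sketch the values "+STOP+", "-" and "A" are assumed so that the miscellaneous symbol reads +STOP+-A as stated in the class description; SexpRoundTripSketch and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: counts has one entry per label plus a final
// miscellaneous bin, which is written out as the augmented stop symbol.
class SexpRoundTripSketch {
    static final String STOP = "+STOP+";        // assumed Training.stopSym()
    static final String DELIMITER = "-";        // assumed augmentation delimiter
    static final String ARG_AUGMENTATION = "A"; // assumed argument augmentation

    static String miscSymbol() {
        return STOP + DELIMITER + ARG_AUGMENTATION;
    }

    /** Emits each symbol once per occurrence. */
    static List<String> toList(String[] labels, int[] counts) {
        List<String> out = new ArrayList<>();
        for (int bin = 0; bin < counts.length; bin++) {
            String symbol = bin < labels.length ? labels[bin] : miscSymbol();
            for (int i = 0; i < counts[bin]; i++) {
                out.add(symbol);
            }
        }
        return out;
    }

    /** Rebuilds the counts; any unrecognized symbol lands in the miscellaneous bin. */
    static int[] fromList(String[] labels, List<String> symbols) {
        int[] counts = new int[labels.length + 1];
        for (String symbol : symbols) {
            int bin = Arrays.asList(labels).indexOf(symbol);
            counts[bin >= 0 ? bin : labels.length]++;
        }
        return counts;
    }
}
```

Feeding the output of toList back through fromList reproduces the original counts, which is the property the contract asks of toSexp() and addAll(SexpList).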

writeExternal

public void writeExternal(ObjectOutput stream)
                   throws IOException
Writes this object to the specified output stream.

Specified by:
writeExternal in interface Externalizable
Parameters:
stream - the stream to which to write this object
Throws:
IOException - if there is a problem writing to the specified stream

readExternal

public void readExternal(ObjectInput stream)
                  throws IOException,
                         ClassNotFoundException
Reads a serialized instance of this class from the specified stream.

Specified by:
readExternal in interface Externalizable
Parameters:
stream - the stream from which to read a serialized instance of this class
Throws:
IOException - if there is a problem reading from the specified stream
ClassNotFoundException - if the concrete type of the object to be read cannot be found
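
For readers unfamiliar with Externalizable, the sketch below shows the general shape of a writeExternal/readExternal pair and an in-memory round trip. The byte-array layout is an assumption made for illustration; it is not the wire format used by SubcatBag, and ExternalizableBagSketch is a hypothetical class.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

// Hypothetical sketch: an Externalizable bag that writes its counts as raw bytes.
class ExternalizableBagSketch implements Externalizable {
    byte[] counts = new byte[0];

    /** Externalizable requires a public no-argument constructor. */
    public ExternalizableBagSketch() {}

    ExternalizableBagSketch(byte[] counts) {
        this.counts = counts.clone();
    }

    @Override
    public void writeExternal(ObjectOutput stream) throws IOException {
        stream.writeInt(counts.length); // length prefix, then the counts
        stream.write(counts);
    }

    @Override
    public void readExternal(ObjectInput stream) throws IOException {
        counts = new byte[stream.readInt()];
        stream.readFully(counts);
    }

    /** Serializes the bag to memory and reads it back as a new instance. */
    static ExternalizableBagSketch roundTrip(ExternalizableBagSketch bag) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(bag);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (ExternalizableBagSketch) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```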

become

public void become(Subcat other)
Description copied from interface: Subcat
Causes this subcat to be equal to the specified subcat by copying the specified subcat's data to this subcat.

Specified by:
become in interface Subcat

Author: Dan Bikel.