Parsing Engine

danbikel.parser
Class BrokenSubcatBag

java.lang.Object
  extended by danbikel.parser.BrokenSubcatBag
All Implemented Interfaces:
Event, Externalizable, MutableEvent, Serializable, SexpConvertible, Subcat

public class BrokenSubcatBag
extends Object
implements Subcat, Externalizable

A “broken” version of SubcatBag that precisely reflects the details specified in Collins’ thesis (used for a “clean-room” implementation). Caution: Changes made to the way SubcatBag operates may have rendered this class even more “broken” than originally intended.

Provides a bag implementation of subcat requirements (a bag is a set that allows multiple occurrences of the same item). The list of all argument nonterminals is provided by the Training.argNonterminals() map. As a special case, this class also supports gap requirements, that is, requirements equal to Training.gapAugmentation(). This class also provides a separate bin for miscellaneous subcat requirements, such as those that can be matched via AbstractTraining.headSym. All nonterminal requirements are stripped of any augmentations before being counted in this subcat bag.

The comment for the toSexp method describes the way in which this class represents miscellaneous requirements.

Bugs:

  1. This class provides special-case bins for counting gap and miscellaneous subcat requirements. If this parsing package is expanded to include additional elements that are possible generative requirements, and these elements do not appear in Training.argNonterminals(), then this class, unless it is modified, will simply put these elements in the miscellaneous bin.
  2. This class cannot collect more than 127 total occurrences of requirements. This is well beyond the number of arguments ever postulated in any human language, but not necessarily beyond the number of generative requirements that might be needed by a future parsing model. A corollary of this limitation is that the number of occurrences of a particular requirement may not exceed 127.
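
The limit of 127 coincides with the largest value of a Java signed byte. The following is a minimal sketch of a guarded counter under the assumption (not stated in this documentation) that counts are held in a byte; ByteCounterSketch is a hypothetical name and not part of danbikel.parser.

```java
// Hypothetical sketch: ByteCounterSketch is not part of danbikel.parser, and
// storing counts in a byte is an assumption suggested by the 127 limit.
final class ByteCounterSketch {
  private byte count; // Byte.MAX_VALUE is 127

  /** Adds one occurrence unless the counter is already at Byte.MAX_VALUE. */
  boolean increment() {
    if (count == Byte.MAX_VALUE) {
      return false; // an unguarded count++ would wrap around to -128
    }
    count++;
    return true;
  }

  int value() {
    return count;
  }

  public static void main(String[] args) {
    ByteCounterSketch counter = new ByteCounterSketch();
    int accepted = 0;
    for (int i = 0; i < 200; i++) {
      if (counter.increment()) {
        accepted++;
      }
    }
    System.out.println("accepted=" + accepted + ", value=" + counter.value());
  }
}
```

Whether the real class guards against overflow in this way is not documented here; the sketch only shows why a byte-sized counter cannot pass 127.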

See Also:
Subcats, toSexp(), Serialized Form

Constructor Summary
BrokenSubcatBag()
          Constructs an empty subcat.
BrokenSubcatBag(SexpList list)
          Constructs a subcat bag containing the number of occurrences of the symbols of list.
 
Method Summary
 MutableEvent add(int type, Object obj)
          Adds the specified object of the specified type to this event.
 MutableEvent add(Object obj)
          Adds the specified object to this event.
 Subcat add(Symbol requirement)
          Adds the specified requirement to this subcat bag.
 boolean addAll(SexpList list)
          Adds each of the symbols of list to this subcat bag, effectively calling add(Symbol) for each element of list.
 void become(Subcat other)
          Causes this subcat to be equal to the specified subcat by copying the specified subcat's data to this subcat.
 int canonicalize(Map canonical)
          This method does nothing and returns -1, as this class has no internal data that can be canonicalized.
 void clear()
          This method sets all counts of this subcat bag to zero.
 boolean contains(Symbol requirement)
          Returns true if this subcat frame contains the specified requirement.
 Event copy()
          Returns a deep copy of this subcat bag.
 boolean empty()
          Returns true if and only if there are zero requirements in this subcat bag.
 void ensureCapacity(int size)
          This method does nothing and returns.
 void ensureCapacity(int type, int size)
          This method does nothing and returns.
 boolean equals(Object obj)
          Returns true if and only if the specified object is of type BrokenSubcatBag and has the same number of requirement categories and has the same counts for each of those requirement categories.
 Object get(int type, int index)
          Gets the indexth component of this subcat bag.
 Subcat getCanonical(boolean copyInto, Map map)
          Returns a canonical instance of this object using the specified map (optional operation).
 Class getClass(int type)
          This method returns the one class that Subcat objects need to support: Symbol.class.
 int hashCode()
          Computes the hash code for this subcat.
 Iterator iterator()
          Returns an iterator over the elements of this subcat bag, returning the canonical version of symbols for each of the categories described in add(Symbol); for each occurrence of a miscellaneous item present in this subcat bag, the return value of Training.stopSym() is returned.
 int numComponents()
          An alias for size().
 int numComponents(int type)
          An alias for size().
 int numTypes()
          Returns 1 (Subcat objects only support Symbol objects).
 void readExternal(ObjectInput stream)
           
 boolean remove(Symbol requirement)
          Removes the specified requirement from this subcat bag, if possible.
static void setUpFastUidMap(CountsTable nonterminals)
           
 int size()
          Returns the number of requirements contained in this subcat bag.
 Sexp toSexp()
          As per the contract of Subcat, this method returns a Sexp such that an equivalent BrokenSubcatBag object would result from the addAll(SexpList) method being invoked with this Sexp as its argument.
 String toString()
          Returns a human-readable string representation of the requirements contained in this bag.
 int typeIndex(Class cl)
          Returns 0 if the specified class is equal to Symbol.class, -1 otherwise.
protected  boolean validRequirement(Symbol requirement)
          A method to check if the specified requirement is valid.
 void writeExternal(ObjectOutput stream)
           
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Constructor Detail

BrokenSubcatBag

public BrokenSubcatBag()
Constructs an empty subcat.


BrokenSubcatBag

public BrokenSubcatBag(SexpList list)
Constructs a subcat bag containing the number of occurrences of the symbols of list.

Parameters:
list - a list of Symbol objects to be added to this subcat bag
Method Detail

setUpFastUidMap

public static void setUpFastUidMap(CountsTable nonterminals)

validRequirement

protected boolean validRequirement(Symbol requirement)
A method to check if the specified requirement is valid. For this class, a requirement is valid if it is either Training.gapAugmentation() or a symbol for which Training.isArgumentFast(Symbol) returns true. A subclass may override this method to allow for new or different valid requirements.
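
As a rough illustration of the override point described above, the sketch below uses Strings in place of Symbol objects. The class names, the gap marker "g", the argument labels and the extra "CC" requirement are all hypothetical placeholders, not values taken from danbikel.parser.

```java
import java.util.Set;

// Hypothetical stand-ins: Strings play the role of Symbol objects, and the
// constants below are placeholders, not values taken from danbikel.parser.
class RequirementCheckSketch {
  static final String GAP = "g"; // placeholder for Training.gapAugmentation()
  static final Set<String> ARGS = Set.of("NP-A", "S-A"); // placeholder arguments

  /** Mirrors the documented rule: a gap, or an argument nonterminal. */
  protected boolean validRequirement(String requirement) {
    return GAP.equals(requirement) || ARGS.contains(requirement);
  }

  /** Wrapper so that callers outside the hierarchy can exercise the check. */
  final boolean accepts(String requirement) {
    return validRequirement(requirement);
  }
}

// A subclass that widens the set of valid requirements, as the text permits.
class WiderCheckSketch extends RequirementCheckSketch {
  @Override
  protected boolean validRequirement(String requirement) {
    return "CC".equals(requirement) || super.validRequirement(requirement);
  }

  public static void main(String[] args) {
    System.out.println(new RequirementCheckSketch().accepts("CC"));
    System.out.println(new WiderCheckSketch().accepts("CC"));
  }
}
```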


add

public Subcat add(Symbol requirement)
Adds the specified requirement to this subcat bag. There are separate bins maintained for each of the nonterminals in the list returned by Training.argNonterminals(), as well as a bin for gap augmentations (that is, requirements that are equal to Training.gapAugmentation()) and a miscellaneous bin for all other requirements, such as those that can be matched via AbstractTraining.headSym.

Specified by:
add in interface Subcat
Parameters:
requirement - the requirement to add to this subcat bag
Returns:
this Subcat object
See Also:
Training.defaultArgAugmentation(), Training.gapAugmentation(), Training.isArgumentFast(Symbol)
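
The binning scheme described for add(Symbol) can be sketched as follows. This is an illustration only: Strings stand in for Symbol objects, and the argument list, the gap marker "g" and the '-' augmentation delimiter are assumptions made for the example rather than values read from Training or Treebank.

```java
import java.util.List;

// Hypothetical sketch; Strings stand in for Symbol objects, and the argument
// list, gap marker and '-' delimiter are placeholders chosen for illustration.
final class BinnedBagSketch {
  private static final List<String> ARGS = List.of("NP", "S", "SBAR");
  private static final String GAP = "g";
  private static final int GAP_BIN = ARGS.size();
  private static final int MISC_BIN = ARGS.size() + 1;

  private final int[] counts = new int[ARGS.size() + 2];

  /** Drops everything from the first '-' onward, e.g. "NP-A" becomes "NP". */
  private static String stripAugmentation(String label) {
    int dash = label.indexOf('-');
    return dash < 0 ? label : label.substring(0, dash);
  }

  /** Chooses the bin for a requirement: argument, gap, or miscellaneous. */
  private static int binFor(String requirement) {
    if (GAP.equals(requirement)) {
      return GAP_BIN;
    }
    int idx = ARGS.indexOf(stripAugmentation(requirement));
    return idx >= 0 ? idx : MISC_BIN;
  }

  /** Adds one occurrence and returns this bag so that calls can be chained. */
  BinnedBagSketch add(String requirement) {
    counts[binFor(requirement)]++;
    return this;
  }

  /** Returns the count of the bin that the requirement maps to. */
  int count(String requirement) {
    return counts[binFor(requirement)];
  }

  int size() {
    int total = 0;
    for (int c : counts) {
      total += c;
    }
    return total;
  }

  public static void main(String[] args) {
    BinnedBagSketch bag = new BinnedBagSketch().add("NP-A").add("g").add("VP");
    System.out.println("NP=" + bag.count("NP") + ", size=" + bag.size());
  }
}
```

Because every unrecognized requirement shares the miscellaneous bin, the bag records how many such items were added but not which ones.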

addAll

public boolean addAll(SexpList list)
Adds each of the symbols of list to this subcat bag, effectively calling add(Symbol) for each element of list.

Specified by:
addAll in interface Subcat
Parameters:
list - a list of Symbol objects to be added to this subcat bag
Returns:
whether this subcat was modified
Throws:
ClassCastException - if one or more elements of list is not an instance of Symbol
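
A minimal sketch of the addAll contract, assuming a plain java.util.List in place of SexpList and String in place of Symbol (both substitutions are for illustration only). It shows the "was anything added" return value and the fail-fast cast.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: List<?> stands in for SexpList and String for Symbol.
final class AddAllSketch {
  private final List<String> items = new ArrayList<>();

  AddAllSketch add(String requirement) {
    items.add(requirement);
    return this;
  }

  /** Adds every element; returns true if at least one element was added. */
  boolean addAll(List<?> list) {
    int before = items.size();
    for (Object element : list) {
      add((String) element); // throws ClassCastException for non-String elements
    }
    return items.size() != before;
  }

  int size() {
    return items.size();
  }

  public static void main(String[] args) {
    AddAllSketch bag = new AddAllSketch();
    System.out.println(bag.addAll(List.of("NP", "S")));
    System.out.println(bag.addAll(List.of()));
  }
}
```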

remove

public boolean remove(Symbol requirement)
Removes the specified requirement from this subcat bag, if possible. If the specified requirement is a nonterminal, then it is only removed if it is an argument nonterminal, that is, if Language.training().isArgumentFast(requirement) returns true, and if this subcat contained at least one instance of that nonterminal.

Specified by:
remove in interface Subcat
Parameters:
requirement - the element that has been generated by the parser and is thus a candidate for removal from this subcat
Returns:
true if this subcat bag contained at least one instance of the specified requirement and it was removed, false otherwise
See Also:
Training.isArgumentFast(Symbol)

size

public int size()
Returns the number of requirements contained in this subcat bag.

Specified by:
size in interface Subcat

empty

public boolean empty()
Returns true if and only if there are zero requirements in this subcat bag.

Specified by:
empty in interface Subcat

contains

public boolean contains(Symbol requirement)
Description copied from interface: Subcat
Returns true if this subcat frame contains the specified requirement.

Specified by:
contains in interface Subcat
Parameters:
requirement - the requirement for which membership in this subcat is to be checked
Returns:
true if this subcat contains requirement, that is, returns true if and only if Subcat.remove(Symbol) would remove the specified symbol from this subcat
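
The stated relationship between contains and remove can be sketched with a simple counting map. The class below is hypothetical, uses Strings in place of Symbol objects, and omits the argument-nonterminal check that the real remove method performs.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: Strings stand in for Symbol objects, and the
// isArgumentFast check performed by the real class is left out.
final class RemoveContainsSketch {
  private final Map<String, Integer> counts = new HashMap<>();

  void add(String requirement) {
    counts.merge(requirement, 1, Integer::sum);
  }

  /** True exactly when remove(requirement) would succeed. */
  boolean contains(String requirement) {
    return counts.getOrDefault(requirement, 0) > 0;
  }

  /** Removes one occurrence if present; reports whether anything was removed. */
  boolean remove(String requirement) {
    if (!contains(requirement)) {
      return false;
    }
    counts.merge(requirement, -1, Integer::sum);
    return true;
  }

  public static void main(String[] args) {
    RemoveContainsSketch bag = new RemoveContainsSketch();
    bag.add("NP");
    System.out.println(bag.remove("NP") + " " + bag.remove("NP"));
  }
}
```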

iterator

public Iterator iterator()
Returns an iterator over the elements of this subcat bag, returning the canonical version of symbols for each of the categories described in add(Symbol); for each occurrence of a miscellaneous item present in this subcat bag, the return value of Training.stopSym() is returned.

Specified by:
iterator in interface Subcat
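
The iteration order described above (one element per occurrence, with a stop symbol standing in for each miscellaneous item) might look like the following. The labels and the "STOP" string are placeholders, and for brevity the sketch builds a snapshot list rather than iterating lazily.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch: the labels are placeholders, and "STOP" stands in for
// whatever Training.stopSym() returns.
final class BagIteratorSketch {
  static final String STOP = "STOP";
  private static final String[] LABELS = {"NP", "S", STOP}; // last bin is miscellaneous
  private final int[] counts = new int[LABELS.length];

  void addToBin(int bin) {
    counts[bin]++;
  }

  /** Yields each bin's label once per recorded occurrence, in bin order. */
  Iterator<String> iterator() {
    List<String> out = new ArrayList<>();
    for (int bin = 0; bin < counts.length; bin++) {
      for (int i = 0; i < counts[bin]; i++) {
        out.add(LABELS[bin]);
      }
    }
    return out.iterator();
  }

  public static void main(String[] args) {
    BagIteratorSketch bag = new BagIteratorSketch();
    bag.addToBin(0);
    bag.addToBin(2);
    bag.iterator().forEachRemaining(System.out::println);
  }
}
```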

copy

public Event copy()
Returns a deep copy of this subcat bag.

Specified by:
copy in interface Event

hashCode

public int hashCode()
Computes the hash code for this subcat.

Specified by:
hashCode in interface Subcat

equals

public boolean equals(Object obj)
Returns true if and only if the specified object is of type BrokenSubcatBag and has the same number of requirement categories and has the same counts for each of those requirement categories.

Specified by:
equals in interface Subcat
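
A count-based equals and hashCode pair consistent with the description above can be sketched with java.util.Arrays. The class is hypothetical, and the use of instanceof (rather than an exact class comparison) is an assumption, since the documentation does not say which test the real class applies.

```java
import java.util.Arrays;

// Hypothetical sketch: one int per requirement category; the real class's
// internal representation is not documented here.
final class CountsEqualitySketch {
  private final int[] counts;

  CountsEqualitySketch(int... counts) {
    this.counts = counts.clone();
  }

  /** Equal when the number of categories and every per-category count match. */
  @Override
  public boolean equals(Object obj) {
    if (this == obj) {
      return true;
    }
    if (!(obj instanceof CountsEqualitySketch)) {
      return false;
    }
    return Arrays.equals(counts, ((CountsEqualitySketch) obj).counts);
  }

  /** Derived from the same data as equals, so equal bags hash alike. */
  @Override
  public int hashCode() {
    return Arrays.hashCode(counts);
  }

  public static void main(String[] args) {
    System.out.println(
        new CountsEqualitySketch(2, 0, 1).equals(new CountsEqualitySketch(2, 0, 1)));
  }
}
```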

toString

public String toString()
Returns a human-readable string representation of the requirements contained in this bag. Note that nonterminals that are not in the miscellaneous bin will contain argument augmentations.


getCanonical

public Subcat getCanonical(boolean copyInto,
                           Map map)
Description copied from interface: Subcat
Returns a canonical instance of this object using the specified map (optional operation).

Specified by:
getCanonical in interface Subcat
Parameters:
map - the reflexive map to use for canonicalization: the key-value pair of (this, this) should be added to map if this object is not already a key in map
copyInto - specifies whether to copy this subcat before inserting into the canonical map
Returns:
a canonical instance of this object
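
The reflexive-map idea behind getCanonical can be sketched generically as below. CanonicalSketch is a hypothetical helper; it ignores the copyInto flag, which in the real method controls whether the object is copied before being inserted into the map.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of canonicalization through a reflexive map, in which
// every key maps to itself. The copyInto option is not modeled.
final class CanonicalSketch {
  /** Returns the stored instance equal to candidate, registering it if new. */
  static <T> T getCanonical(T candidate, Map<T, T> reflexive) {
    T existing = reflexive.get(candidate);
    if (existing != null) {
      return existing;
    }
    reflexive.put(candidate, candidate); // key and value are the same object
    return candidate;
  }

  public static void main(String[] args) {
    Map<String, String> map = new HashMap<>();
    String first = new String("NP-A");
    String second = new String("NP-A"); // equal to first, but a distinct object
    System.out.println(getCanonical(first, map) == getCanonical(second, map));
  }
}
```

Equal objects collapse to a single shared instance, which is why the map is keyed on the object itself.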

add

public MutableEvent add(Object obj)
Description copied from interface: MutableEvent
Adds the specified object to this event. The specified object must be of a type that this event is capable of collecting; that is,
 this.typeIndex(obj.getClass())
 
must not return -1.

If an implementation of this interface collects components that are primitive type values, then these values should be wrapped in their corresponding wrapper classes. For example, if an implementation of this interface accepts int values, they should be passed as Integer objects to this method. At present, an Event implementation cannot be designed to accept both a primitive type and its associated wrapper class's type (this is, of course, not a serious limitation).

Specified by:
add in interface MutableEvent
Returns:
this object

add

public MutableEvent add(int type,
                        Object obj)
Description copied from interface: MutableEvent
Adds the specified object of the specified type to this event. The specified object must be of the type specified; that is, the expression
 this.typeIndex(obj.getClass()) == type
 
must be true.

If an implementation of this interface collects components that are primitive type values, then these values should be wrapped in their corresponding wrapper classes. For example, if an implementation of this interface accepts int values, they should be passed as Integer objects to this method. At present, an Event implementation cannot be designed to accept both a primitive type and its associated wrapper class's type (this is, of course, not a serious limitation).

Specified by:
add in interface MutableEvent
Returns:
this object
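
The two type preconditions on the MutableEvent add methods can be illustrated with a single-type event. In this hypothetical sketch String plays the role of Symbol, and rejecting bad input with IllegalArgumentException is the sketch's own choice; the documentation states the precondition but not what happens when it is violated.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: String plays the role of Symbol, and throwing
// IllegalArgumentException is this sketch's choice, not documented behavior.
final class SingleTypeEventSketch {
  private final List<Object> components = new ArrayList<>();

  /** Returns 0 for the one supported class and -1 for everything else. */
  int typeIndex(Class<?> cl) {
    return cl == String.class ? 0 : -1;
  }

  /** Adds obj after checking that its class maps to a valid type index. */
  SingleTypeEventSketch add(Object obj) {
    if (typeIndex(obj.getClass()) == -1) {
      throw new IllegalArgumentException("unsupported type: " + obj.getClass());
    }
    components.add(obj);
    return this;
  }

  /** Adds obj after checking that its class maps to exactly the given index. */
  SingleTypeEventSketch add(int type, Object obj) {
    if (type < 0 || typeIndex(obj.getClass()) != type) {
      throw new IllegalArgumentException("type index mismatch: " + type);
    }
    components.add(obj);
    return this;
  }

  int numComponents() {
    return components.size();
  }

  public static void main(String[] args) {
    SingleTypeEventSketch event = new SingleTypeEventSketch().add("NP").add(0, "S");
    System.out.println(event.numComponents());
  }
}
```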

ensureCapacity

public void ensureCapacity(int size)
This method does nothing and returns.

Specified by:
ensureCapacity in interface MutableEvent
Parameters:
size - the size to pre-allocate for all abstract lists of this event

ensureCapacity

public void ensureCapacity(int type,
                           int size)
This method does nothing and returns.

Specified by:
ensureCapacity in interface MutableEvent
Parameters:
type - the type of underlying abstract list for which to pre-allocate space
size - the size to pre-allocate for the specified type of abstract list

getClass

public Class getClass(int type)
This method returns the one class that Subcat objects need to support: Symbol.class.

Specified by:
getClass in interface Event
Returns:
the type (Class) associated with the specified type index

typeIndex

public int typeIndex(Class cl)
Returns 0 if the specified class is equal to Symbol.class, -1 otherwise.

Specified by:
typeIndex in interface Event

numTypes

public int numTypes()
Returns 1 (Subcat objects only support Symbol objects).

Specified by:
numTypes in interface Event

numComponents

public int numComponents()
An alias for size().

Specified by:
numComponents in interface Event

numComponents

public int numComponents(int type)
An alias for size().

Specified by:
numComponents in interface Event

canonicalize

public int canonicalize(Map canonical)
This method does nothing and returns -1, as this class has no internal data that can be canonicalized.

Specified by:
canonicalize in interface Event
Parameters:
canonical - a reflexive map of objects representing event information: for each unique key-value pair, the value is a reference to the key
Returns:
1 if this event was canonicalized, 0 if it was not canonicalized (and had to be added to canonical) or -1 if this event was not even eligible for canonicalization

clear

public void clear()
This method sets all counts of this subcat bag to zero.

Specified by:
clear in interface MutableEvent

get

public Object get(int type,
                  int index)
Gets the indexth component of this subcat bag.

Efficiency note: The time complexity of this method is linear in the number of requirement types.

Specified by:
get in interface Event
Parameters:
type - an unused type parameter (Subcat events only support the type Symbol, so this argument is effectively superfluous for this class)
index - the index of the requirement to get
Returns:
the indexth Symbol of this subcat bag, as would be returned by the indexth invocation of next from the iterator returned by iterator()
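
The efficiency note follows naturally if the bag stores one count per requirement type: finding the indexth element then means walking the bins and subtracting counts. The sketch below illustrates that idea with hypothetical names and placeholder labels; it is not the actual implementation.

```java
// Hypothetical sketch of an index lookup over per-type counts; the cost is
// linear in the number of bins, not in the number of stored occurrences.
final class IndexedGetSketch {
  /** Returns the label of the index-th occurrence, scanning bins in order. */
  static String get(String[] labels, int[] counts, int index) {
    if (index < 0) {
      throw new IndexOutOfBoundsException("index " + index);
    }
    int remaining = index;
    for (int bin = 0; bin < counts.length; bin++) {
      if (remaining < counts[bin]) {
        return labels[bin];
      }
      remaining -= counts[bin]; // skip every occurrence held in this bin
    }
    throw new IndexOutOfBoundsException("index " + index);
  }

  public static void main(String[] args) {
    String[] labels = {"NP", "S", "STOP"};
    int[] counts = {2, 0, 1};
    System.out.println(get(labels, counts, 2));
  }
}
```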

toSexp

public Sexp toSexp()
As per the contract of Subcat, this method returns a Sexp such that an equivalent BrokenSubcatBag object would result from the addAll(SexpList) method being invoked with this Sexp as its argument.

N.B.: For each occurrence of a miscellaneous item present in this subcat bag, the returned list will contain the symbol Training.stopSym() augmented with the argument augmentation:

 Symbol.get(Training.stopSym().toString() +
            Treebank.canonicalAugDelimiter() +
            Training.defaultArgAugmentation());
 

Specified by:
toSexp in interface Subcat

writeExternal

public void writeExternal(ObjectOutput stream)
                   throws IOException
Specified by:
writeExternal in interface Externalizable
Throws:
IOException

readExternal

public void readExternal(ObjectInput stream)
                  throws IOException,
                         ClassNotFoundException
Specified by:
readExternal in interface Externalizable
Throws:
IOException
ClassNotFoundException
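
Since writeExternal and readExternal carry no description here, the following sketch shows the general shape of an Externalizable class that serializes an array of counts. The class name and the byte-array layout are assumptions made for illustration; the actual stream format of BrokenSubcatBag is given under Serialized Form.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

// Hypothetical sketch: the length-prefixed byte array is an assumed layout,
// not the documented serialized form of BrokenSubcatBag.
final class ExternalizableCountsSketch implements Externalizable {
  private byte[] counts = new byte[0];

  /** Externalizable requires a public no-argument constructor. */
  public ExternalizableCountsSketch() {}

  ExternalizableCountsSketch(byte[] counts) {
    this.counts = counts.clone();
  }

  byte[] counts() {
    return counts.clone();
  }

  @Override
  public void writeExternal(ObjectOutput out) throws IOException {
    out.writeInt(counts.length);
    out.write(counts);
  }

  @Override
  public void readExternal(ObjectInput in) throws IOException {
    counts = new byte[in.readInt()];
    in.readFully(counts);
  }

  /** Serializes the bag to memory and reads it back as a new object. */
  static ExternalizableCountsSketch roundTrip(ExternalizableCountsSketch bag) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(bag);
      }
      try (ObjectInputStream in =
          new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return (ExternalizableCountsSketch) in.readObject();
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    byte[] restored = roundTrip(new ExternalizableCountsSketch(new byte[] {2, 0, 1})).counts();
    System.out.println(java.util.Arrays.toString(restored));
  }
}
```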

become

public void become(Subcat other)
Description copied from interface: Subcat
Causes this subcat to be equal to the specified subcat by copying the specified subcat's data to this subcat.

Specified by:
become in interface Subcat
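
The difference between become (overwrite this object in place) and copy (produce a new, independent object) can be sketched as follows. The class and its int-array representation are hypothetical.

```java
// Hypothetical sketch contrasting in-place become with a deep copy.
final class BecomeSketch {
  private int[] counts;

  BecomeSketch(int... counts) {
    this.counts = counts.clone();
  }

  /** Returns a new object with its own array; the constructor clones it. */
  BecomeSketch copy() {
    return new BecomeSketch(counts);
  }

  /** Overwrites this object's data with the other's, reusing the array if possible. */
  void become(BecomeSketch other) {
    if (counts.length != other.counts.length) {
      counts = new int[other.counts.length];
    }
    System.arraycopy(other.counts, 0, counts, 0, counts.length);
  }

  void increment(int bin) {
    counts[bin]++;
  }

  int count(int bin) {
    return counts[bin];
  }

  public static void main(String[] args) {
    BecomeSketch source = new BecomeSketch(1, 2);
    BecomeSketch target = new BecomeSketch(0, 0);
    target.become(source);
    source.increment(0); // does not affect target, which holds its own data
    System.out.println(target.count(0) + " " + target.count(1));
  }
}
```

Reusing an existing object through become avoids allocating a new one when the same subcat is refilled many times, which is presumably why the interface offers it alongside copy.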

Author: Dan Bikel.