Analysis: Managing the Customers’ Data for Marketing Effectiveness
There are two paradigms for managing and analyzing consumer data through the Internet: reasoning and learning. Reasoning is producing outputs from inputs based on a rule base (a set of pre-defined external rules). Reasoning is performed through inferencing systems. Learning is the modification of an application's or a web site's functional behavior as a result of experience about customers' behavior or customers' marketing response, often using sophisticated statistical analysis. These two ways of analyzing data are connected with each other.
Reasoning engines are most commonly seen in customer service-focused sites, as they can help to quickly streamline processes. Sites with significant direct marketing capabilities usually utilize a reasoning engine in conjunction with a learning engine.
Example of inferencing system: RAISE
RAISE, developed by IBM Research, is an example of a rules-based system that was developed to analyze data collected on-line. Originally developed to automate a customer support tool called GlobeNet, RAISE has since been deployed in a wide range of applications. GlobeNet was an application that searched public bulletin boards for appends (comments) from users with problems or questions about their IBM products. When found, the problems or questions were posted to a database, where they were periodically reviewed and routed to the appropriate IBM support resource. The problem with GlobeNet was that support personnel spent too much time manually searching through the comments and routing them to the appropriate person.
What the RAISE inference engine added to the system was automation of the routing process. When comments were discovered, they were forwarded to RAISE. These comments constituted the short-term events, or the real-time perception, for the inferencing system. Each comment contained various fields that allowed the inference engine to classify the problem into a narrowly defined category. The comments were analyzed against the engine’s rule base, or long-term knowledge, which was basically a set of rules that acted as a traffic cop directing where specific types of problems should be sent. These rules are based on historical situations and on information created by the designer of the system, and they route comments using the same logic as the support personnel used, if not better logic. The system then forwarded the comments to the specific support person for study or response. The natural extension of this system would have been to create rules that tie common customer complaints to pre-defined answers that could be automatically forwarded to the customer, and indeed many customer service-focused sites are doing this today on the Internet.
How Reasoning Systems Work
Reasoning engines, or inferencing systems, are programs whose logic has been externalized and represented as a set of logical propositions. The logic processes of inferencing systems are represented by the rule base, or set of rules which guide how the program should react to a given data input. To provide a rough example, the rule base, at a high level, in the above example might have been:
If a comment is about the AS400 machine, then route the comment to the AS400 department support personnel; else if a comment is about the IBM ThinkPad battery life, route the comment to ThinkPad power sources support personnel…etc.
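The rough rule base above can be sketched as data that lives outside the program logic, which is what "externalized" means in practice. The Python below is only an illustration: the keyword lists, the destination names, the default queue, and the function route_comment are assumptions made for this example and do not describe how RAISE was actually built.

```python
# Hypothetical rule base: each rule pairs trigger keywords with a destination.
# Keywords and destination names are illustrative assumptions.
RULE_BASE = [
    (("as400",), "AS400 support"),
    (("thinkpad", "battery"), "ThinkPad power sources support"),
]


def route_comment(comment, rule_base=RULE_BASE, default="general support"):
    """Return the destination of the first rule whose keywords all appear."""
    text = comment.lower()
    for keywords, destination in rule_base:
        if all(keyword in text for keyword in keywords):
            return destination
    # No rule matched, so fall back to a catch-all queue.
    return default


print(route_comment("My AS400 will not boot"))
print(route_comment("ThinkPad battery drains in ten minutes"))
```

Because the rules are plain data, adding a new product line means appending a rule rather than changing the routing function.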
Reasoning engines use short-term facts and long-term knowledge. Short-term facts are new facts produced by the analysis of events or by deriving facts, which are the intermediate results of the inferencing process. We can think of short-term facts as the reasoning engine's real-time perception. In the above example the short-term facts were the comments themselves. Let’s say a comment said the following:
"My Thinkpad has been shutting off quickly when I unplug it, what is wrong with it?"
One short-term fact is the comment itself. Another short-term fact, or intermediate result, could be derived from the word "unplug": the engine equates that word, and thus the comment, with a battery problem.
Reasoning engines also use long-term knowledge. The designer provides long-term knowledge, such as marketing knowledge or other business knowledge, to the reasoning engine. Long-term knowledge usually takes the form of a customer or product database. Most long-term knowledge is encoded in the rule base. Long-term knowledge may also be obtained through the analyses performed by learning engines. The reasoning engine may modify the rule base itself, or modification may be done by some external process. In the example above, the rule base, or long-term knowledge, may be the logic that the support personnel previously used when manually routing comments. Additionally, if combined with a learning engine, the system can monitor, as it routes more and more messages, which departments resolve which comments best. From this monitoring and learning, it can adjust its rules to send comments to the departments that handle them best.
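The monitoring-and-adjusting idea at the end of this paragraph can be sketched as a small feedback loop. Everything named here (the RoutingFeedback class, update_rule_base, the category and department names, and the min_samples threshold) is a hypothetical illustration under simple assumptions, not a description of any deployed system.

```python
from collections import defaultdict


class RoutingFeedback:
    """Track, per comment category, how often each department resolves it."""

    def __init__(self):
        # counts[category][department] = [resolved, total]
        self.counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))

    def record(self, category, department, resolved):
        entry = self.counts[category][department]
        if resolved:
            entry[0] += 1
        entry[1] += 1

    def best_department(self, category, min_samples=2):
        """Return the department with the highest resolution rate, or None."""
        rates = {
            dept: resolved / total
            for dept, (resolved, total) in self.counts[category].items()
            if total >= min_samples  # ignore departments with too little data
        }
        return max(rates, key=rates.get) if rates else None


def update_rule_base(rule_base, feedback):
    """Return a revised category -> department mapping based on feedback."""
    revised = dict(rule_base)
    for category in rule_base:
        best = feedback.best_department(category)
        if best is not None:
            revised[category] = best
    return revised


rule_base = {"battery": "general hardware support"}
feedback = RoutingFeedback()
for resolved in (True, True, True):
    feedback.record("battery", "power sources support", resolved)
for resolved in (True, False, False):
    feedback.record("battery", "general hardware support", resolved)
print(update_rule_base(rule_base, feedback))
```

In this sketch the learning side only proposes a new mapping; whether the reasoning engine adopts it automatically or after human review is a separate design choice.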
The rules in reasoning systems are called production rules. Each production rule expresses one or more conditions and possibly one action. The typical structure of a rule is below.
IF condition THEN action
Conditions and actions include symbols (variables). These symbols are assigned values from events or values from the long-term knowledge. These values produce short-term facts as intermediate inference results. Intermediate inference results are passed forward in the evaluation of the rule base, which is called forward chaining. After each forward-chaining step, new short-term facts are created. This ability to create short-term facts for forward chaining is one of the main advantages of a rule-based system. From the example above:
IF comment contains unplug THEN classify comment as battery
The condition is "comment contains unplug", and the action is "classify comment as battery". The new intermediate inference result is that the comment is about batteries. This allows the comment to be forward chained to the next set of rules:
IF comment about battery THEN route comment to ThinkPad power sources support personnel…etc.
The control structure determines which rule from the rule base will be evaluated next. From the IBM example, the control structure guides the comment from one set of IF, THEN rules to a more specific set of rules until the comment can finally be routed.
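The two rules above, together with a simple control structure, can be sketched as a small forward-chaining loop. This is a minimal illustration under stated assumptions: the fact strings, the rule names, the perceive helper, and the "fire the first applicable rule, then rescan" control strategy are all choices made for this example.

```python
# Each production rule: (name, condition over the fact set, fact to add).
# Fact strings and rule names are illustrative assumptions.
RULES = [
    ("classify-battery",
     lambda facts: "comment contains unplug" in facts,
     "comment is about battery"),
    ("route-battery",
     lambda facts: "comment is about battery" in facts,
     "route to ThinkPad power sources support"),
]


def perceive(comment):
    """Turn a raw comment into initial short-term facts."""
    facts = set()
    if "unplug" in comment.lower():
        facts.add("comment contains unplug")
    return facts


def forward_chain(initial_facts, rules=RULES):
    """Fire rules until no rule can add a new short-term fact."""
    facts = set(initial_facts)
    fired = []
    changed = True
    while changed:
        changed = False
        # Control structure: scan rules in order and fire the first one
        # whose condition holds and whose conclusion is not yet known.
        for name, condition, new_fact in rules:
            if new_fact not in facts and condition(facts):
                facts.add(new_fact)
                fired.append(name)
                changed = True
                break
    return facts, fired


comment = "My Thinkpad has been shutting off quickly when I unplug it"
facts, fired = forward_chain(perceive(comment))
print(fired)
```

The classification rule adds an intermediate fact, and only then does the routing rule become applicable, which is the chaining behavior described above.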
Learning corresponds to the feedback loop. The learning engine extracts new knowledge from a stream of event and action data and finds emerging trends and correlations in that data. A learning engine is necessary because the rule base is a static repository for long-term knowledge. Updates to the rule base, or long-term knowledge, usually must be made through human intervention. Most organizations’ "learning engines" today consist of a group of highly trained analysts using expensive and sophisticated statistical software. Increasingly, however, programs are being developed that can analyze, develop, and extend long-term knowledge automatically. This allows the rule base to be modified dynamically and quickly, without human intervention.
Examples of automatic learning engines include Firefly and Open Sesame! Firefly uses statistical techniques to identify clusters of musical taste. Open Sesame! uses neural networks to identify patterns of behavior. (Open Sesame! was the first commercial user interface learning agent.)
Statistical analysis methods of data mining used in Learning Engines
Correlation and Regression: Reasoning engines process events in isolation, not noticing the sequential relationship between events. Learning engines may find a correlation between events.
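As one concrete way a learning engine might quantify a relationship between two kinds of events, the sketch below computes the Pearson correlation coefficient. The per-visitor figures (pages viewed and items purchased) are invented purely for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-visitor data: pages viewed and items purchased.
pages_viewed = [3, 8, 5, 12, 7]
items_bought = [0, 2, 1, 3, 1]
print(round(pearson(pages_viewed, items_bought), 3))
```

A coefficient near 1 or -1 suggests a strong linear relationship; it does not by itself show that one event causes the other.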
Trend analysis: Learning engines can find some trends in time series of events.
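A very simple form of trend detection is the slope of a least-squares line fitted to a time series. The sketch below assumes evenly spaced observations; the weekly order counts are made up for illustration.

```python
def trend_slope(values):
    """Least-squares slope of values observed at times 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_t = (n - 1) / 2
    mean_v = sum(values) / n
    numerator = sum((t - mean_t) * (v - mean_v) for t, v in enumerate(values))
    denominator = sum((t - mean_t) ** 2 for t in range(n))
    return numerator / denominator


# Hypothetical weekly order counts.
weekly_orders = [10, 12, 14, 16]
print(trend_slope(weekly_orders))
```

A positive slope indicates a rising series and a negative slope a falling one; real trend analysis would also account for seasonality and noise.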
Clustering: Learning engines can find clusters of events or actions. The core technique in Firefly is the ability to recognize population clusters in a multi-dimensional space. New users are invited to submit a list of their favorite CDs to Firefly, which then responds with suggestions of other CDs that the user will probably also like. Firefly does not have a complex rule base of factors in musical appreciation. Its knowledge is based on the identification of clusters of people who have similar taste, or at least similar lists of favorite CDs.
Neural network: Neural networks were developed as analogs of biological brains. It has been demonstrated that learning in humans and higher animals results from the complex "wiring" among the brain's cells, so that the excitation of a given cell depends on weighted functions of the inputs from many other cells. Neural networks have been shown to be very effective at dealing with unstructured or noisy input data, as in handwriting recognition, speech recognition, and various forms of image processing, which are very difficult to process with the rigid preset rules of an inference system. Neural networks can identify sequences of events or user actions that are approximate matches to known behavior. Rule-based inference systems require exact matches, which limits their flexibility.
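The idea of recommending from people with similar lists of favorites can be sketched with a nearest-neighbor approach. This is not Firefly's actual algorithm, which the text does not detail; it swaps in Jaccard similarity between favorite lists as a stand-in for cluster membership, and the user names and album titles are placeholders.

```python
def jaccard(a, b):
    """Overlap between two lists of favorites, from 0.0 to 1.0."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(new_favorites, known_users, k=2):
    """Suggest CDs liked by the k users most similar to the new user."""
    new_set = set(new_favorites)
    neighbors = sorted(
        known_users.items(),
        key=lambda item: jaccard(new_set, item[1]),
        reverse=True,
    )[:k]
    scores = {}
    for _, favorites in neighbors:
        weight = jaccard(new_set, favorites)
        # Only score CDs the new user has not already listed.
        for cd in set(favorites) - new_set:
            scores[cd] = scores.get(cd, 0.0) + weight
    ranked = [cd for cd, score in scores.items() if score > 0]
    return sorted(ranked, key=lambda cd: (-scores[cd], cd))


# Placeholder data: user names and album titles are hypothetical.
known_users = {
    "user1": ["Album A", "Album B", "Album C"],
    "user2": ["Album A", "Album B", "Album D"],
    "user3": ["Album X", "Album Y", "Album Z"],
}
print(recommend(["Album A", "Album B"], known_users))
```

As in the description above, no rule about musical appreciation appears anywhere; the suggestions come only from overlap between lists.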
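The statement that a cell's excitation depends on weighted functions of its inputs can be illustrated with a single artificial neuron trained by the classic perceptron rule. This is a deliberately tiny sketch, far simpler than the networks used for handwriting or speech; the two input signals (say, "viewed the product" and "added it to the cart") and the target labels are hypothetical.

```python
def activate(weights, bias, inputs):
    """Fire (1) when the weighted sum of the inputs exceeds zero."""
    total = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > 0 else 0


def train(samples, epochs=20, rate=1.0):
    """Adjust weights toward the targets using the perceptron rule."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - activate(weights, bias, inputs)
            # Strengthen or weaken each connection in proportion to its input.
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
            bias += rate * error
    return weights, bias


# Hypothetical behavior signals -> "likely buyer" label (1 only if both occur).
SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train(SAMPLES)
print([activate(weights, bias, inputs) for inputs, _ in SAMPLES])
```

Unlike a production rule, nothing here is written as an explicit IF/THEN condition: the behavior is stored in the learned weights, which is what lets larger networks tolerate approximate or noisy inputs.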
Tying together reasoning and learning engines
An effective direct marketing Internet site needs both a reasoning engine and a learning engine. The learning engine, whether automatic through software or manual through a team of analysts, is necessary to understand the relationships among various customers, and between certain customer attributes and buying behavior. As marketers look to their Internet sites to shorten direct marketing cycle time, automatic learning engines will become more vital. However, the learning engine must be tied to a reasoning engine, which contains the rules and criteria upon which an offer will be made.
Although the core statistical analysis techniques employed in Internet data mining are not new, the value of these tools lies in their ability to outperform blind searching in the discovery of unforeseen relationships. This importance rises as the scale of the database grows. In the world of electronic commerce, the amount of data to be analyzed will be huge. The ability to analyze this data effectively and automatically is the bottleneck that must be broken in order to realize the ideal of one-to-one marketing.
"Direct Marketing on the Internet", by Matsuda, Rosenstein, Scovitch, and Takamura
No portion of this paper may be reproduced or used without the permission of the authors