Web marketers can currently collect information about their customers via the Internet in two ways. The first, passive user data collection, relies on tools that gather data with little awareness on the user's part and without requiring any explicit action from the user. These tools make use of the interactions between web clients and servers to collect and store information about a user's actions on, or reactions to, the information provided on a web site.
The second classification, active user data collection, involves explicitly asking users for information, preferences, and opinions. While users are often reluctant to enter information, due to time and privacy issues, if web sites can offer some service or product in return for the information, users may be compelled to provide some level of information to the site.
Passive data collection may involve sophisticated web technology or may simply involve a manual entry into a historical profile after a conversation, email or on-line chat with a customer service representative. The information collected helps to predict user preferences based on a historical profile of interactions with a company or site. A site may collect data on how long you spend on a page, whether you print or save it, and which links you use. Some say that this passive data can often predict user actions better than preference data explicitly entered by the users themselves, because people are poor at predicting their own behavior.
Data is also collected to show web site administrators how users experience a site. It can provide valuable marketing information and be used to improve the quality of web sites. Administrators look at traffic reports that include the number of visitors per day and per hour, popular times of day, quantity of data accessed, popular pages, downloaded files, the browser and other technical properties of a user's computer, the visitor's referring site, and the rough global location of the visitor.
Two categories of web technology are currently used to collect information: HTTP logging and cookies.
Every transaction between a web client and a server creates a record in a log file kept on the company's web server. Any number of software packages offer log file analysis: they parse the data stored in these log files and aggregate it into summary statistics for the web site. Log files alone cannot always be tied to an individual user, which inhibits the ability to create and update an individual user profile. However, this becomes possible when log files are used in combination with the other web technologies described later. Used alone or with other technology, log files yield valuable information about the effectiveness of an ad campaign or web site.
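As a rough illustration of what a log analysis package does, the sketch below parses a few log lines and aggregates them into summary statistics. It assumes the server writes the Common Log Format; the sample lines, the `summarize` function and the field names in its result are made up for this example and are not taken from any particular product.

```python
import re
from collections import Counter

# Assumed layout (Common Log Format):
# host ident authuser [date] "request" status bytes
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)

# Hypothetical sample data, including one line that does not parse.
SAMPLE_LOG = [
    '192.0.2.1 - - [07/Oct/1997:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '192.0.2.7 - - [07/Oct/1997:13:58:02 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '192.0.2.1 - - [07/Oct/1997:14:01:10 -0700] "GET /products.html HTTP/1.0" 200 5120',
    'malformed line',
]


def summarize(lines):
    """Aggregate raw log lines into simple site-wide summary statistics."""
    hits_per_page = Counter()
    hits_per_hour = Counter()
    hosts = set()
    total_bytes = 0
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that do not follow the assumed format
        hits_per_page[match["path"]] += 1
        # The timestamp looks like 07/Oct/1997:13:55:36 -0700; keep the hour.
        hits_per_hour[match["time"].split(":")[1]] += 1
        hosts.add(match["host"])
        if match["size"] != "-":
            total_bytes += int(match["size"])
    return {
        "hits_per_page": hits_per_page,
        "hits_per_hour": hits_per_hour,
        "unique_hosts": len(hosts),
        "total_bytes": total_bytes,
    }


if __name__ == "__main__":
    print(summarize(SAMPLE_LOG))
```

Note that the statistics are keyed by host address, not by person, which is why a log file on its own cannot reliably be tied to an individual user.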
Cookies are identifying files that a web server sends to a computer. The web site then reads this uniquely identifying file each time the site is accessed in the future, and the file identifies the user to the server. With this method of identification, a server can associate a computer with a user profile and update that profile each time that user visits the site. Each click, printout or other action is identified as coming from that user's computer and is recorded in the user's profile (or the computer's profile, if you are in a lab). Every action provides a record of what the individual user is interested in. These cookies exist in each browser's directory on each computer's hard drive, as "cookies.txt" on PCs and "MagicCookie" on Macs.
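The identification cycle can be sketched in a few lines. This is a minimal illustration, not the way any particular site works: the cookie name `visitor_id`, the `handle_request` function and the in-memory `profiles` dictionary are all assumptions made for the example. The server issues an identifier on the first visit and, on later visits, reads it back and appends the action to that visitor's profile.

```python
import uuid
from http.cookies import SimpleCookie

COOKIE_NAME = "visitor_id"  # hypothetical cookie name


def handle_request(cookie_header, action, profiles):
    """Record one action and return (visitor_id, set_cookie).

    cookie_header is the value of the browser's Cookie header (or None).
    set_cookie is a Set-Cookie header value for new visitors, else None.
    """
    incoming = SimpleCookie()
    incoming.load(cookie_header or "")
    set_cookie = None
    if COOKIE_NAME in incoming:
        # Returning visitor: the browser sent back the identifier.
        visitor_id = incoming[COOKIE_NAME].value
    else:
        # New visitor: issue an identifier and ask the browser to store it.
        visitor_id = uuid.uuid4().hex
        outgoing = SimpleCookie()
        outgoing[COOKIE_NAME] = visitor_id
        outgoing[COOKIE_NAME]["path"] = "/"
        set_cookie = outgoing[COOKIE_NAME].OutputString()
    # The profile is keyed by the cookie value, that is, by browser.
    profiles.setdefault(visitor_id, []).append(action)
    return visitor_id, set_cookie


if __name__ == "__main__":
    profiles = {}
    first_id, header = handle_request(None, "view /index.html", profiles)
    handle_request(f"{COOKIE_NAME}={first_id}", "print /products.html", profiles)
    print(header)
    print(profiles)
```

Because the profile is keyed by the cookie rather than by the person, everyone who shares a browser, as in a computer lab, ends up in the same profile.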
Used in combination, a log file and a cookie can uniquely track user actions and create a historical view of the user. The real benefit of these methods is realized when that history is applied in real time to customize user views and react to users just in time. InfoSeek uses cookies to improve search results, while Lycos uses them to place banner advertising based on your previous searches. Doubleclick uses cookies to track how many times a given user has seen a particular ad, so that it can limit the number of times an ad is shown with no reaction and replace that ad with a more effective one. This helps advertisers avoid wasting money on any particular site or user. (Neil Randall, PC Magazine, Oct. 7, 1997).
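The ad-limiting idea described above can be sketched as a simple counter per visitor and ad. This is only an illustration of the general technique; the function names, the cap of three showings and the data structure are assumptions, not a description of Doubleclick's actual system.

```python
from collections import Counter


def choose_ad(visitor_id, ads, unanswered, cap=3):
    """Return the first ad this visitor has seen fewer than `cap` times
    without reacting, or None if every ad has reached the cap."""
    for ad in ads:
        if unanswered[(visitor_id, ad)] < cap:
            unanswered[(visitor_id, ad)] += 1
            return ad
    return None


def record_click(visitor_id, ad, unanswered):
    """A click counts as a reaction, so the unanswered count starts over."""
    unanswered[(visitor_id, ad)] = 0


if __name__ == "__main__":
    unanswered = Counter()  # (visitor_id, ad) -> showings with no reaction
    ads = ["banner_a", "banner_b"]  # hypothetical ads, most preferred first
    for _ in range(5):
        print(choose_ad("visitor-1", ads, unanswered, cap=2))
```

The visitor identifier would come from the cookie, which is what lets the server keep a separate count for each browser.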
A company using Aptex Software can even predict user demographics by monitoring users' activities on a site with cookies and log files. Aptex uses mining and neural network technology to uncover demographic profiles. While this technology was previously used only to predict behaviors, it has now expanded to predict demographics such as age and gender. Aptex is selling to more and more retailers, who use it for personalized searching and other promotions based on demographic information previously unavailable in web marketing.
Even with all of these applications of the technology, there are several weaknesses to these methods that can limit their effectiveness:
The average web site visitor looks at seven pages and leaves without providing any information. Active user data collection requires a user to spend time entering information and to feel comfortable providing information over the Internet. These two barriers create a challenge for any site that wants to collect data beyond what cookies and log files can provide. The site needs to demonstrate to users that they will get value from the site equal to the cost of their time and the sacrifice of their personal information. The content that can be collected via this method is limited only by what the user will tolerate, although there are some regulations regarding the collection of information from children. Once a site has demonstrated its value proposition and collected the information from the user, this information can be stored in a user profile in conjunction with other data from log files or cookies.
Many news and research information publishers ask users to fill out a registration form in order to gain access to their value-added services (Financial Times, Advertising Week). Other sites gather information at the time of purchase; however, if they ask for too much information, they risk annoying the user and compromising the sale. Still other services, such as customized news services (Wall Street Journal's "Personal Edition" and Pointcast) and Firefly Networks, solicit preferences from the user with the promise of using this information to customize the user's experience and better cater to their interests.
More and more market research, in the form of surveys, is being conducted over the web using traditional participation incentives such as payment for survey completion. This is a cheap and increasingly effective way to conduct market research, although, given the demographics of web users, it does not always yield a representative market sample. One method of research that does not work effectively is spam email: surveys sent at random through email networks. One company, CIC research, says that 5% of these return as flames, often-offensive emails denouncing the practice, and 1% return as letter bombs, emails which, when opened, infect a server with a virus that destroys the data resident there.
While very valuable, individualized customer information can be collected when a user volunteers it, the greatest weakness of this data collection method is that it must overcome users' sense that their privacy is being invaded. One survey found that 94% of 9300 respondents would not be comfortable giving retail, medical or financial information to firms with which they had no previous dealings. (Survey by Boston Consulting Group for Electronic Frontier Foundation and CommerceNet). Currently more than 41% of respondents reported leaving a web site when asked for registration information, and another 27% lie, creating misrepresentative data. Consequently, the TRUSTe program was created. This program places branded logos or "trustmarks" on Web pages to assure consumers that their information will not be misused. Respondents to the survey would be 40-50% more likely to execute a transaction with this kind of assurance.
Another study, from online research and database firm Cyber Dialogue, reports that 92% of respondents will answer questions about hobbies and special interests in exchange for receiving customized content; however, names, salaries and credit card numbers are less likely to be provided as willingly. ("Most surfers fear revealing too much on the Web", USA Today.)
Privacy concerns are an issue in both passive and active data collection. As the user community becomes more aware of the data collection techniques currently being used on the Internet, fears of misuse of data arise. With increasingly sophisticated data collection techniques, users fear discrimination and invasion of privacy. However, advertising and data collection are vital to the support of much of the content on the web and are the primary way to maintain free access. Use of these data collection methods is necessary to protect the level of service and product that the web currently provides.
In order to reach a compromise, many of the leading thinkers in policy for the Internet, specifically the World Wide Web Consortium (W3C), are addressing the "Open Profiling Standard." This standard will serve to blur the lines between active and passive data collection, allowing users to actively define the information they are willing to provide and to whom. The W3C calls these standards, currently in development, "P3P: Platform for Privacy Preferences." The consortium is creating policy, operating standards and protocols, and expects to finish in 14 months. Standards will be provided for ways to express privacy practices and preferences. Sites that comply with acceptable privacy practices will have access to the user information seamlessly. Otherwise, the user will be notified of a site's information practices before releasing any information to it. To simplify the experience, users will also have the option to download recommended settings from a trusted source.
P3P Scenario (from W3 Consortium web site)
1. A user sets generic preferences, upon which her agent (browser) automatically acts. She can now browse the Web seamlessly.
2. She encounters a site with "exceptional" practices outside her generic preferences. Perhaps a sports news site wants to collect her favorite teams for a customized news page.
3. The user is prompted if she wishes to consider other alternatives, consent to the exceptional practice, or to go elsewhere.
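The decision the user's agent makes in this scenario can be sketched as a comparison of two sets. This is a loose illustration only: it does not use the actual P3P vocabulary or protocol, and the category names and the `evaluate_site` function are invented for the example.

```python
def evaluate_site(user_preferences, site_practices):
    """Compare a site's declared practices with the user's generic preferences.

    Returns ("proceed", empty set) when every practice is already accepted,
    or ("prompt", exceptional practices) when the user must be asked.
    """
    exceptional = set(site_practices) - set(user_preferences)
    if not exceptional:
        return "proceed", exceptional
    return "prompt", exceptional


if __name__ == "__main__":
    # Hypothetical data categories the user is willing to release by default.
    generic_preferences = {"clickstream", "browser_type"}

    # A site within the preferences: browsing continues seamlessly.
    print(evaluate_site(generic_preferences, {"clickstream"}))

    # A sports news site that also wants favorite teams: the user is prompted.
    print(evaluate_site(generic_preferences, {"clickstream", "favorite_teams"}))
```

When the result is "prompt", the agent would offer the three choices from step 3: consider alternatives, consent to the exceptional practice, or go elsewhere.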
This standard could prove beneficial to both users and collectors. Users could choose to store personal information and interests on their own drives and then decide whether to disclose additional information to a web site. This will satisfy users' wish to provide information and customize their experience, limit their hesitation about the time needed to enter preference information repeatedly, and ease some of their privacy concerns. Users will understand the data policies of a site and will make informed choices about the release of their information. Those trying to market over the Internet will have access to better information, leading to better analysis and better offers.
Using this emerging standard, Firefly Network wants to organize a consortium of companies to share the information about consumers that they collect from their Web sites. Firefly allows users to create profiles of their music and movie preferences and allows vendors to use this information to sell products to them one-on-one based on their past preferences. If community members approve, vendors might also use the data to identify which current and potential customers are more valuable than others.
Firefly Network is shipping its Firefly 3.1 suite of personalization products, one of the industry's first platforms for creating, managing, and sharing user profiles. The platform is one of the first to take advantage of the privacy standards work that began with the Open Profiling Standard (OPS) and has since been subsumed by the Platform for Privacy Preferences Project (P3P) work group in the World Wide Web Consortium.
The company is hoping its proposed de facto standard for user profiling can benefit commerce and can also drive sales for its own server-side Passport Office, which will read client-side Firefly Passports that will identify unique users and carry some data about them. The product initially is priced "in the $4,000 range," Klein said.
The Passports are unique identifiers that can be kept in numerous kinds of client-side storage systems, from digital certificates to elaborate cookies. Firefly's server will issue the passports and read them. The idea behind Firefly's consortium is that sites in it can share selected aspects of users' profiles with each other via a database maintained by Firefly. Users also could choose how much of their profiles they want to share with each site.
Knowing what a consumer has done or bought on other sites could help site managers tailor the presentation of their information and enable Web advertisers to charge more because they can demonstrate that a particular type of person is seeing ads on their site.
"Direct Marketing on the Internet", by Matsuda, Rosenstein, Scovitch, and Takamura
No portion of this paper may be reproduced or used without the permission of the authors.