MARK A. FORMAN
ASSOCIATE DIRECTOR FOR E-GOVERNMENT AND INFORMATION TECHNOLOGY
OFFICE OF MANAGEMENT AND BUDGET
BEFORE THE
SUBCOMMITTEE ON TECHNOLOGY, INFORMATION POLICY, INTERGOVERNMENTAL
RELATIONS, AND THE CENSUS
COMMITTEE ON GOVERNMENT REFORM
UNITED STATES HOUSE OF REPRESENTATIVES
MARCH 25, 2003
Mr. Chairman and Members of the Subcommittee,
Thank you for the opportunity to appear before the Subcommittee
to discuss the Administration's views on data mining.
This Committee has defined "data mining" as a "technology
that facilitates the ability to sort through masses of information
through database exploration, extract specific information
in accordance with defined criteria, and then identify patterns
of interest to its user." While there are many definitions
of "data mining", the Committee's definition is
generally accepted and helpful in defining the issue and its
challenges. Data warehouses are the source of data for many
data mining applications. A data warehouse is a managed
repository of integrated, cleansed data drawn mainly from
transactional data. The data is aggregated from various
sources and structured for analysis and reporting.
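As an illustration only, the three steps in the Committee's definition (sorting through records, extracting information by defined criteria, and identifying patterns) might be sketched as follows. The records, field names, and threshold are hypothetical and are not drawn from any agency or company system.

```python
from collections import Counter

# Hypothetical transaction records; every field name and value is illustrative.
records = [
    {"customer": "A", "item": "printer", "amount": 120.0},
    {"customer": "B", "item": "paper", "amount": 15.0},
    {"customer": "A", "item": "paper", "amount": 30.0},
    {"customer": "C", "item": "printer", "amount": 110.0},
    {"customer": "B", "item": "paper", "amount": 20.0},
]


def extract(rows, min_amount):
    """Extract the rows that meet a defined criterion (a minimum amount)."""
    return [row for row in rows if row["amount"] >= min_amount]


def item_pattern(rows):
    """Count purchases per item, most frequent first."""
    return Counter(row["item"] for row in rows).most_common()


# Assumed threshold of 25.0, chosen only for this example.
selected = extract(records, min_amount=25.0)
pattern = item_pattern(selected)
print(pattern)
```

A real application would read from a data warehouse rather than an in-memory list, and would use far richer criteria and pattern measures than a simple count.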
Commercial Types and Uses of Data Mining
The private sector uses data mining to make sense of the wide
range of data that companies and industries have available.
Some examples of these uses follow:
- Customer Relationship Management/Segmentation Analysis –
Applied to customer relationship management (CRM), data mining
is used to analyze disparate customer data and provide insight
into customer needs and wants. It is also used to analyze and
segment customer buying patterns and to identify potential
goods and services that are in demand. Companies that use data
mining shorten their response time to market changes, which
allows them to align their products better with their
customers’ needs. The aim is to increase revenue and to direct
investment toward products that meet consumer demand.
- Fraud Detection – Companies use software that provides
comprehensive, transaction-level financial reporting and
analysis to support automatic fraud detection and proactive
alerting. Software packages can also be used to detect anomalies,
variances, and patterns in databases. For example, BlueCross/BlueShield
and other health care payers use data mining tools to catch
and prevent fraudulent and abusive billing practices. BlueCross/BlueShield’s
solution can quickly search through millions of medical
claims and detect inappropriate billing practices with a
high degree of reliability.
- Retail Analysis and Supply Chain Analysis – Companies
such as Wal-Mart are broadly recognized for analyzing sales
trends. Retail analysis and supply chain analysis can be
used to predict the effectiveness of promotions, decide
which products to stock in each store, and help managers
understand cost and revenue trends in order to adjust pricing
and promotions in anticipation of changes in marketplace
conditions. Data mining also allows supply chain tools to
monitor and analyze inventory trends, forecast product demand
for replenishment, track vendor performance and identify
problems, analyze distribution network efficiency, and understand
supply chain costs and inefficiencies.
- Medical Analysis/Diagnostics – The health care industry
uses analysis to predict the effectiveness of surgical procedures,
medical tests, and medications. High-risk segments of the
population can be identified and targeted for proactive
treatment. For example, American Healthways relies on predictive
modeling to identify patient types who trend toward high-risk
conditions, giving care coordinators a proactive approach
to healing. The result is improved quality of life for the
patients and reduced stress on hospitals and insurance providers.
- Document Analysis (Text Mining) – Documents can be
searched for information and insights in a fraction of the
time an individual would spend locating one document. Document
analysis involves analysis of text and of structured and unstructured
data, organized by categories, to determine trends, patterns,
and relationships. This can
be highly effective in survey analysis. Content management
systems and software packages perform analyses on an organization’s
information products to help companies control information
flows and work products. For example, Autonomy at BAE Systems
aggregates content from many sources in many different formats,
structured or unstructured, including their intranet and
10,000 news feeds per day. The goal is to personalize the
delivery of that information to each user, and to eliminate
work duplication and time-consuming searches. Autonomy automatically
alerts BAE Systems employees to documents in the system
that relate to what they're doing, or to other employees
in the company whose interests and expertise match their
own.
- Use of Decision Support Systems (DSS) – Decision
Support Systems may use data mining to identify trends and
present the information in intuitively useful ways -- supporting
more informed and effective decisions for business and organizational
activities. For example, one DSS solution for HR management
is now providing essential insights into The Bank of Scotland
Group's HR activities worldwide, giving managers personnel
and staffing information needed to make hiring and placement
decisions. Managers can determine if job turnover in a particular
area or occupation classification is higher than expected
and investigate influences on loyalty such as the physical
working environment.
- Financial Analysis – The insurance industry uses
data mining algorithms to conduct risk analysis, such as
evaluating actuarial experience studies for mortality, withdrawal
and disability, dynamically calculating exposures and expectations
for period ranges. For example, Canada Life performs timely
and accurate actuarial studies using a data warehouse and
advanced data analysis methods; the Generali Group uses
data mining tools to manage financial market risk and customer
credit risk via a common analytical framework for rapid
and flexible analysis and reporting of risk exposure.
Government Applications of Data Mining
The Federal government analyzes data that has been collected from the
public for several purposes, including determining the eligibility
of applicants for Federal benefits, detecting potential instances
of fraud, waste, and abuse in Federal programs, and supporting law
enforcement activities. Some of this analysis is facilitated
by data mining. Here are a few examples of agency uses of
data analysis techniques and software:
- Financial management – Poor management practices have
created opportunities for a wide range of fraud and abuse
in the use of government travel and purchase cards. Several
agency inspector general (IG) investigations have used statistical
sampling processes to document inappropriate purchases and
misuse of these cards. OMB is taking and will continue to
take substantive, affirmative steps to ensure agencies improve
their internal control systems to monitor expenditures properly.
- Human Resources Management – One of the 24 E-Government
initiatives, Enterprise HR Integration under the Office
of Personnel Management, is leading the effort to provide
a government-wide data warehouse of HR information to minimize
the workload as employees move from one department to another.
A key component of this is the E-Clearance project.
OPM and its partner agencies on the E-Clearance project
are using data mining to access information more quickly,
which speeds up the overall security clearance investigation
process. Given the backlog in clearances, this use of data
mining is critical to our ability to get staff more effectively
and rapidly through the human resources management processes.
- Reducing Erroneous Payments and Fraud Detection –
Data analysis accomplished via the matching of electronic
databases between government agencies has been an important
and successful tool for identifying improper payments under
federal benefit and loan programs, as well as detecting
potential instances of fraud, waste, and abuse in Federal
programs. As highlighted in the FY 2004 President’s
Budget, agencies are now required to report the extent of
erroneous payments made in their major benefit programs.
In addition, the last decade has shown increased reliance
on, and increased spending on, non-discretionary social services,
such as Medicare and Medicaid. These expenditures -- and
therefore the potential for improper payments -- are likely
to increase unless appropriate steps are taken to protect
against errors and fraud. Through the President's Management
Agenda initiative for improving financial performance, we
are getting a handle on the problem of erroneous payments.
For example, Medicare's erroneous payment rate has fallen
from 6.8 percent to 6.3 percent and the Food Stamp program
reduced its national error rate from 8.9 percent to 8.7
percent. Just these small rate reductions prevented the
waste of almost $1 billion. Furthermore, the Administration
has proposed several pieces of legislation regarding its
authority to share data; these proposals will
greatly improve efforts to reduce erroneous payments.
- Policy Analysis – The quality of policy decisions
is a function of our ability to correctly analyze enormous
amounts of data that describe a problem faced by modern
society. For example, the Department of Education mines
data from a variety of its student financial aid systems,
including the Central Processing System, Pell Grant Payment
System and National Student Loan Data System, permitting
professionals to analyze Federal education programs quickly
and easily, without the time, expense, and burden on citizens
of paper-driven surveys.
- Law enforcement and Homeland Security – Federal
agencies have found data mining techniques to be an important
tool for assisting law enforcement in combating terrorism.
For example, systems such as the Automated Commercial
Environment (ACE), operated by the Department of Homeland Security’s
Bureau of Customs and Border Protection, can use a series of data
mining tools to strengthen border security efforts. ACE
will provide the IT mechanisms for making quick evaluations
on whether particular people or goods should be deemed high-risk
or low-risk. Also, ACE will enable the Department of Homeland
Security and other Federal agencies to more precisely target
for inspection or investigation the highest risk people
and cargo crossing the border. Through tools such as ACE,
agencies have the ability to instantaneously analyze vast
amounts of data and intelligence to see links among businesses
and people, thus revealing security threats that might otherwise
have gone unnoticed.
- Citizen access to government data – Search sites
such as the one available at the FirstGov website provide
a facility for searching vast amounts of unstructured data
across the Federal government by using publicly available
search engines. In addition, the Federal government conducts
its own data analyses for statistical purposes and facilitates
data user access to statistical data. For example, the Census
Bureau's “American FactFinder System (Advanced Query)”
uses a data mining tool to allow users to query Census 2000
detailed data files. The tool provides simplified access
to and extraction of data.
Benefits and Pitfalls
As outlined above, the government has found a number of ways to use collected
information to improve program effectiveness and to reduce
misuse of taxpayer dollars. While the use of data mining techniques
to access useful, timely data and to identify relationships
that were previously unknown is a powerful tool for identifying
errors, fraud, threats, etc., the application of such techniques
to personal information raises serious questions about privacy
and how it should be protected. To realize these benefits while
protecting privacy, the government must continue to act in several areas:
1. Federal data analyses must be consistent with law
In the federal arena, data mining activities must be implemented
consistent with the protections of the Privacy Act of 1974,
as amended by the Computer Matching and Privacy Protection
Act of 1988, and other privacy statutes. These statutes do
not address data-mining per se, but they outline privacy principles
the government must follow in data collection, including
notice and reasonable disclosure; use and purpose limitations;
choice; access to government-held information; information
security; redress; and oversight. Agencies are well-versed
in the legal, policy, and technical requirements governing
access to and sharing of personal data. Agencies may aggregate
information by analyzing data across databases, a concept
known as “virtual data warehousing”; however,
when information can be accessed or exchanged at numerous
locations by many users, a potential exists for inadvertent
disclosure of personal information or misuse of personal information,
by alteration or for unauthorized purposes. Agencies that
adhere to the existing legal and policy structure, including
OMB and NIST policy guidance, can protect personal information
in their possession even as they participate in data-mining
activities. Furthermore, the E-Government Act of 2002 requires
an agency to conduct a Privacy Impact Assessment (PIA) when
it develops or procures information technology to initiate
a new online collection of information that involves personally
identifiable information changing hands, such as in the case
of matching.
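For illustration, the kind of database matching referred to above might be sketched as a comparison of payment records against an eligibility roster on a shared identifier. This is a minimal sketch under stated assumptions: the identifiers, field names, and amounts are hypothetical, and an actual matching program would operate under the notice, agreement, and verification requirements of the statutes cited.

```python
def match_payments(payments, eligible_ids):
    """Return payments whose recipient ID is absent from the eligibility set."""
    return [p for p in payments if p["recipient_id"] not in eligible_ids]


# Hypothetical payment file from one agency (illustrative values only).
payments = [
    {"recipient_id": "R001", "amount": 250.0},
    {"recipient_id": "R002", "amount": 410.0},
    {"recipient_id": "R003", "amount": 125.0},
]

# Hypothetical eligibility roster held by a second agency.
eligible_ids = {"R001", "R003"}

flagged = match_payments(payments, eligible_ids)
flagged_total = sum(p["amount"] for p in flagged)
print(flagged, flagged_total)
```

A flagged record is only a lead for review, not a finding of error or fraud; the match result would need to be independently verified before any action is taken.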
2. Ensuring the Security of Federal IT Systems
The Federal Information Security Management Act (FISMA) provides a comprehensive
framework for ensuring the effectiveness of information security
controls over federal information resources, including resources
that result from data mining. FISMA requires the head of each
agency to periodically assess the risk and magnitude of harm
that could result from unauthorized access, use, modification,
or disclosure of information. The agency must then provide
information security protections that are commensurate with
the stated risk. Agencies are required to periodically test
their information security controls and techniques to ensure
that they are effectively implemented. The results of this
testing are reported to OMB on an annual basis.
Conclusion
“Data mining” can have many uses. The Administration is strongly
committed to using available technologies like data mining
to serve citizens and protect them from threats,
and is equally committed to protecting
the privacy of citizens when such tools are used. Through
data analysis and data mining, the private sector has improved
customer service, better addressed customer needs, and been able to
help customers take proactive approaches to health care. The
federal government has reduced the number of erroneous payments,
and has been able to determine patterns in databases that
help predict both weather patterns and the spread of deadly
viruses.
We need to use modern analytic tools, such as data mining, to improve
government performance, from policy analysis to fraud detection to homeland
security. We can maintain privacy and security while improving
government productivity, but we must employ tools like data
mining appropriately. We hope to work with this Committee
to ensure that the benefits of data analysis continue to help
Federal agencies to perform their missions, while protecting
against the problems that aggressive and abusive data mining
can cause.