Issue of Data Quality in Banking

Table of Contents
1.0 Introduction
2.0 Literature Search
3.0 Methodology
4.0 Data Collection
5.0 Confidence Interval
6.0 Hypothesis testing
7.0 Control Charts
7.1 R-Chart
7.2 X-Chart
7.3 Data analysis using some of the 7 quality tools
8.0 Capability Analysis
9.0 Recommendations and Conclusion
Reference List
 

List of figures
Figure 1: Error rate vs. costs incurred
Figure 2: R-Chart
Figure 3: Adjusted R-Chart
Figure 4: X-chart
Figure 5: Line graph for the data quality rates
 
 
 
 

List of tables
Table 1: Strengths and weaknesses of alternatives available for banks in data quality controls
Table 2: Data quality rates for one year
 

Issue of Data Quality in Banking
1.0 Introduction
The banking industry has been growing tremendously because many people are using the services of banks. The industry plays a major role in offering monetary services such as financing individuals and institutions, acting as the custodian of customers' finances and offering financial advice, among many other services. The retail banking sector has for years used the Customer Information System (CIS) or Customer Information File (CIF) in an attempt to link necessary customer data. These systems, however, while helping to keep data linked and clean, rely on end users (Insight Ecosystems 2009, 1). The U.S. Postal Service, for instance, estimates that over 40% of user-keyed data is either incomplete or incorrect. Many banks possess core processing systems, such as brokerage, credit card and mortgage systems, that are not even connected to the CIS/CIF. This lack of data quality, together with the complex nature of the relationships between customer and business, customer and household, customer and account, and between customers themselves, creates a roadblock to achieving business objectives. The result may be referred to as an incomplete customer view. This report explores the issue of data quality, the aspects and accepted levels of data quality in the banking sector, the monitoring of data quality and the benefits associated with the chosen method of monitoring data quality. The report will therefore begin with an analysis of the banking data quality issue, followed by some of the current patterns in data quality, an analysis of an appropriate data quality monitoring method and, finally, a conclusion on the present and future direction of the data quality problem.
2.0 Literature Search
Quality refers to the satisfaction of needs by delivering the required services and goods to customers. In addition, when the needs of an organization's stakeholders are met, it is possible to say that quality has been maintained (Benson 2003). Product or service quality presents both an opportunity and a problem for many industries: a problem because of the superior quality offered by competitors, and an opportunity because consumers are increasingly concerned with service and product quality. This has resulted in an increased interest in quality management at various companies (Garvin 1986, 653). Quality problems may arise from a variety of sources, including poorly maintained equipment, shoddy workmanship, defective materials and poor designs, among many others. Many companies hold enormous amounts of customer data, and even with sophisticated software applications, customer data quality decays with time (Insight Ecosystems 2009, 1). The banking sector is one such industry faced with the issue of data quality. This may not be attributed to the banks' applications but rather to the nature of the data itself. Typically, a bank will have more than 500 million data elements for every $1 billion in assets. People move, change names and change telephone numbers. With these frequent and at times dramatic changes in business and consumer data, coupled with the changing rate of key customer data identification elements, the problem becomes worse.
A report by the Data Warehousing Institute in 2002 pointed out that there exists a considerable gap between reality and perception with regard to data quality in numerous organizations, and that the problem of data quality costs businesses in the United States over $600 billion every year. Studies have also indicated that 25% of an average bank's CIS/CIF data is incorrect, and such customer data errors usually contribute to several issues impacting the bottom line of any bank (Insight Ecosystems 2009, 2). Even for a smaller bank sending out approximately 200,000 pieces of promotional mail annually, an error rate of 25% would translate to wasted mail costs in the amount of $50,000 (Figure 1). The lost revenue and expense resulting from poor-quality data are often understated. Data quality issues normally begin chain reactions in banks' business processes. Privacy violations become more likely, and even contacting the wrong person in the wrong manner, for example, may cost a bank $10,000 per incident (Insight Ecosystems 2009, 2). On a broader scale, data quality issues may lead bankers to make wrong decisions, both strategic and tactical, leading to decreased profitability and lost customers. The elevated profile of the data quality issue may be attributed to the banking sector's need to provide distinguished customer service, better customer value and improved marketing practices.
Figure 1: Error rate vs. costs incurred
Source: Author
From the above figure, it is evident that as the error rate increases, the costs incurred by organizations increase. Data provides a cornerstone for building these capabilities as well as achieving business goals, and as such there is a need for the highest possible data quality. A study sponsored by Reuters and carried out by the Vienna University of Economics and AIM Software, covering over 1,700 banks in 63 countries, indicated that data quality improvement is regarded as a significant risk management issue and that regulatory requirements are driving significant investments in Information Technology (IT). The study quizzed banks on risk management and the management of reference data, and showed that considerable efforts are being made by financial institutions globally to deepen data management and hence increase data quality. High-quality data has been termed a valuable asset for increasing customer satisfaction and improving profits and revenue, and can also be a tactical competitive advantage for any company (English 1999). Most of the efforts by banks are in the processing of corporate actions and the automation of reference data, the areas accounting for the highest costs. Another extensive survey, involving 629 CFO-level executives, indicated that improving data quality and information integrity was the most pervasive critical technology concern. Another finding showed that only about 1 in 5 financial executives rated their company's information as highly satisfactory (English 1999). In addition, a survey by Information Week of 500 business technology professionals noted that 45% viewed data quality problems as hindering enterprise-wide adoption of Business Intelligence (English 1999).
The accepted tolerance levels for data quality in the banking sector lie in data that give a single, or 360-degree, view of the customer. The idea is to present a correct, consistent and complete picture of businesses and customers to every area of the bank. This single customer view may be achieved only through clean data coupled with a mechanism linking different businesses and customers together. The view needs to be: complete, containing all relevant data pertaining to the customer; correct, as regards the person or business process; and, lastly, consistent, so that everyone has the same view of the customer (Insight Ecosystems 2009, 4). From retail delivery channels to analytical processes, the same information concerning the customer needs to be used (Martin 2005a). Other commonly listed attributes of data quality forming the basis for excellent data include:

Integrity and validity, which concern how correct the data is;
Precision, covering the ability of the data to reflect a customer's full details;
Accessibility, which defines how readily available the data is; and
Timeliness, referring to whether the data is available when required.

These features form the definition of statistical quality as outlined in 2003 by the ESS (European Statistical System), comprising the national statistics institutes of the member states and Eurostat, the Statistical Office of the European Commission (Lyon 2008).
The problem of data quality is not new for banks, and for years third parties have been engaged to carry out "scrubs" on banks' CIS/CIF data files. Experts have, however, described this approach as ensuring name and address quality at only one point in time. More up-to-date solutions center not only on the regular cleansing of name and address data, but on the multiple data elements supporting downstream business processes. Top banks in the United States are shifting toward continuous processes providing real-time as well as daily cleansing and integration (Insight Ecosystems 2009, 9).
Table 1: Strengths and weaknesses of alternatives available for banks in data quality controls
Alternatives            Strengths                               Weaknesses

Service Bureaus         Experience and history in cleansing     Offsite processing of data introduces long
                        name and address data for direct        wait times in what should be a continuous
                        mail.                                   process. They only provide part of the
                                                                solution.

Data Integration        Sophisticated software tools for        The implementation and integration of the
Tools                   data quality.                           tool can be very expensive. Onsite data
                                                                integration experts are needed to run the
                                                                system on an ongoing basis.

Customer Hubs           These tools provide a modern CIF/       The root data quality issue remains.
                        CIS approach to house one "gold"        Modification of existing systems must be
                        copy of customer data.                  made to retrieve customer data from the
                                                                hub.

Professional            Highly customized approach for          These solutions can be extremely expensive,
Services                large banks.                            with a minimum cost of $5 million per year.
Source: Insight ecosystems (2009, 9).
With so many options available, choosing the most appropriate method of data quality evaluation may be difficult. From a personal standpoint, however, the Holistic Data Quality (HDQ) framework would best fit the data quality issue. This framework will assist in addressing issues that typically expose companies to serious risks (Friedman & Bitterer 2011). Quality must not be managed or evaluated in vertical business silos but rather through a holistically integrated approach based on an HDQ framework. This approach includes consistent quality measures, robust analytics and exception-based reporting. Implementing HDQ results in higher-quality data, transparency into outliers and hot spots, lower costs of remediating quality issues, and significantly reduced resources and costs expended on sustaining external and internal audits. It may also bring the significant benefit of regulatory compliance. The data overflow of recent years and the increased complexity of the data landscape admit no quick fixes. Taking a strategic approach and implementing HDQ, with its associated shared service capabilities, in a systematic manner will enable banks to overcome the challenges of data quality (Friedman & Bitterer 2011). This is because, in comparison to other approaches, HDQ treats the data issue holistically, giving increased transparency into the state and quality of enterprise data. Most products and approaches deal with a single specific data quality issue, may not be able to support other kinds of applications, and lack the complete breadth of functionality expected of data quality solutions today (Friedman & Bitterer 2011, p.7). The process is end to end, as opposed to managing data quality only at individual levels.
Implementing an HDQ strategy at the enterprise level requires strong program management and expertise in business intelligence, data warehousing, service-oriented architecture and systems integration (Friedman & Bitterer 2011). Data quality cannot be looked at as a single aspect; it may be measured along multiple dimensions and is often viewed differently by different people (Wang, Reddy & Kon 1995). Data quality is a crucial topic for the banking industry today. The lack of integrated, quality data is a critical business problem with a significant impact on the financial performance of banks. It is for this reason that large banks spend millions of dollars on data integration and quality efforts. Bank executives support such projects because they are aware of the effect of reliable data on key business processes (Insight Ecosystems 2009, 10). With banks and other financial institutions perpetually seeking ways to improve their data quality, the HDQ approach may be lauded as the best alternative for managing data quality because of its transparency. Information transparency enables a realistic understanding of the data on which users depend. This, in turn, helps producers monitor, and become more accountable for, the quality of their data (The Information Difference Company 2009). Each company relies on data in various ways to support its unique processes and workflows. While data may vary considerably from company to company, data quality patterns cut across departmental and functional lines. In all cases, data quality improvement is able to enhance the value of business information for both analytical and operational purposes (Wang, Reddy & Kon 1995). Therefore, when all is said and done, any company's data need to be "proper for the use", which is particularly true for financial data. As data quality remains a process, organizations need to set guidelines on how data quality is to be managed.
This involves continually concerning themselves with the gaps and issues that exist today, who will develop rules and administer rule changes, and many other areas relating to data quality within the bank.
3.0 Methodology
To prepare this report, managers of 12 banks were sampled and interviewed to identify aspects of data quality which affect their banking institutions. The data collected reflected data quality for one year (12 months), with 10 observations recorded for each month. A confidence interval was computed and hypothesis testing was performed on the collected data. R-charts and X-bar charts were developed, and the 7 quality tools were applied in the presentation of the data. In addition, calculations were made to determine the process capability as well as the capability index.
4.0 Data Collection
On a scale of 1-10, the bank managers were required to rate the quality of data for a period of 12 months. Ten random data quality rates were provided by the respondents. The data collected is presented in the table below:
Table 2: Data quality rates for one year
Month            Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
Observation 1      2    5    7    4    6    9    8    9    7    8    8    4
Observation 2      5    6    8    5    8    8    8   10    9    7    8    8
Observation 3      1    3    2    4    3    5    7    9    8    8    7    6
Observation 4      2    5    6    8    6    8    8    9    7    6    6    4
Observation 5      3    6    7    7    6    9    8   10    9    8    6    5
Observation 6      2    5    6    4    5    8    6    8    8    7    7    8
Observation 7      4    7    8    5    5   10    8    9    8    6    8    6
Observation 8      3    6    5    8    6    7    9   10    9    8    7    7
Observation 9      4    4    4    6    5    9   10    8    6    7    6    5
Observation 10     5    5    6    5    5    6    7   10    8    7    5    6

(Quality of data as recorded by bank managers on a scale of 1-10.)

The average monthly data quality was computed by applying the formula x̄ = (Σxᵢ)/n, where n = the sample size and xᵢ = the individual quality ratings.
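As an illustration, the monthly averages can be re-derived from Table 2 with a short Python sketch; the ratings list is transcribed from the table and the variable names are illustrative only.

```python
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Data quality ratings from Table 2: rows = 10 observations, columns = Jan-Dec.
ratings = [
    [2, 5, 7, 4, 6, 9, 8, 9, 7, 8, 8, 4],
    [5, 6, 8, 5, 8, 8, 8, 10, 9, 7, 8, 8],
    [1, 3, 2, 4, 3, 5, 7, 9, 8, 8, 7, 6],
    [2, 5, 6, 8, 6, 8, 8, 9, 7, 6, 6, 4],
    [3, 6, 7, 7, 6, 9, 8, 10, 9, 8, 6, 5],
    [2, 5, 6, 4, 5, 8, 6, 8, 8, 7, 7, 8],
    [4, 7, 8, 5, 5, 10, 8, 9, 8, 6, 8, 6],
    [3, 6, 5, 8, 6, 7, 9, 10, 9, 8, 7, 7],
    [4, 4, 4, 6, 5, 9, 10, 8, 6, 7, 6, 5],
    [5, 5, 6, 5, 5, 6, 7, 10, 8, 7, 5, 6],
]

# zip(*ratings) turns the observation rows into month columns.
monthly_means = {m: sum(col) / len(col)
                 for m, col in zip(months, zip(*ratings))}
grand_mean = sum(monthly_means.values()) / len(monthly_means)
```

Computed this way, the grand mean over the 12 months comes to about 6.51, which matches the mean quoted for the X-bar chart later in the report.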
5.0 Confidence Interval
A confidence interval of 95% was applied in getting the range of the data. The range provided the estimate of average data quality for the period under survey.
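A minimal sketch of the interval calculation, using the ratings transcribed from Table 2 and the normal approximation (z ≈ 1.96) rather than the t-distribution, which the report does not specify:

```python
import statistics as st

# Data quality ratings from Table 2: rows = 10 observations, columns = Jan-Dec.
ratings = [
    [2, 5, 7, 4, 6, 9, 8, 9, 7, 8, 8, 4],
    [5, 6, 8, 5, 8, 8, 8, 10, 9, 7, 8, 8],
    [1, 3, 2, 4, 3, 5, 7, 9, 8, 8, 7, 6],
    [2, 5, 6, 8, 6, 8, 8, 9, 7, 6, 6, 4],
    [3, 6, 7, 7, 6, 9, 8, 10, 9, 8, 6, 5],
    [2, 5, 6, 4, 5, 8, 6, 8, 8, 7, 7, 8],
    [4, 7, 8, 5, 5, 10, 8, 9, 8, 6, 8, 6],
    [3, 6, 5, 8, 6, 7, 9, 10, 9, 8, 7, 7],
    [4, 4, 4, 6, 5, 9, 10, 8, 6, 7, 6, 5],
    [5, 5, 6, 5, 5, 6, 7, 10, 8, 7, 5, 6],
]

data = [x for row in ratings for x in row]   # all 120 ratings
n = len(data)
mean = st.fmean(data)
s = st.stdev(data)                           # sample standard deviation
z = st.NormalDist().inv_cdf(0.975)           # two-sided 95% -> about 1.96
half_width = z * s / n ** 0.5
ci_low, ci_high = mean - half_width, mean + half_width
```

The resulting interval is a plausible reconstruction of the range described above, not the report's own computation.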
6.0 Hypothesis testing
The null hypothesis (H0) was that data quality is less than or equal to 8. The alternative hypothesis (H1) was that data quality is more than 8. The test was performed at the 95% confidence level. In symbols:
H0: μ ≤ 8
H1: μ > 8
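The test can be sketched as a one-sided z-test on the ratings from Table 2. This is a simplifying assumption, since the report does not state which test statistic was used:

```python
import statistics as st

# Data quality ratings from Table 2: rows = 10 observations, columns = Jan-Dec.
ratings = [
    [2, 5, 7, 4, 6, 9, 8, 9, 7, 8, 8, 4],
    [5, 6, 8, 5, 8, 8, 8, 10, 9, 7, 8, 8],
    [1, 3, 2, 4, 3, 5, 7, 9, 8, 8, 7, 6],
    [2, 5, 6, 8, 6, 8, 8, 9, 7, 6, 6, 4],
    [3, 6, 7, 7, 6, 9, 8, 10, 9, 8, 6, 5],
    [2, 5, 6, 4, 5, 8, 6, 8, 8, 7, 7, 8],
    [4, 7, 8, 5, 5, 10, 8, 9, 8, 6, 8, 6],
    [3, 6, 5, 8, 6, 7, 9, 10, 9, 8, 7, 7],
    [4, 4, 4, 6, 5, 9, 10, 8, 6, 7, 6, 5],
    [5, 5, 6, 5, 5, 6, 7, 10, 8, 7, 5, 6],
]

data = [x for row in ratings for x in row]
n = len(data)
mean = st.fmean(data)
s = st.stdev(data)

mu0 = 8                                      # hypothesised value: H0 mean <= 8
z_stat = (mean - mu0) / (s / n ** 0.5)
p_value = 1 - st.NormalDist().cdf(z_stat)    # upper-tail p-value for H1: mean > 8
reject_h0 = p_value < 0.05
```

With a sample mean of about 6.51, the statistic is negative and the null hypothesis cannot be rejected, consistent with the conclusion drawn from the X-bar chart later in the report.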
7.0 Control Charts
The X-bar and R charts were applied in the analysis of the data obtained. The X-bar chart was used to check the mean of the collected data and its relationship with the ideal data, while the R-chart was used to check the variation of the collected data. The R-chart was developed before the X-bar chart because the X-bar chart's control limits depend on the average range from the R-chart.
7.1 R-Chart
This was applied in obtaining the variation of the range of the collected data from the average means. The R-Chart applies two control limits which are compared to a center line. The Y-axis shows the range of the measured sample while the X-axis indicates the sampling time.
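The centre line and control limits of the R-chart can be sketched as follows, using the standard Shewhart constants D3 and D4 for subgroups of size 10 (an assumption; the values obtained may differ from those plotted in Figure 2):

```python
# Data quality ratings from Table 2: rows = 10 observations, columns = Jan-Dec.
ratings = [
    [2, 5, 7, 4, 6, 9, 8, 9, 7, 8, 8, 4],
    [5, 6, 8, 5, 8, 8, 8, 10, 9, 7, 8, 8],
    [1, 3, 2, 4, 3, 5, 7, 9, 8, 8, 7, 6],
    [2, 5, 6, 8, 6, 8, 8, 9, 7, 6, 6, 4],
    [3, 6, 7, 7, 6, 9, 8, 10, 9, 8, 6, 5],
    [2, 5, 6, 4, 5, 8, 6, 8, 8, 7, 7, 8],
    [4, 7, 8, 5, 5, 10, 8, 9, 8, 6, 8, 6],
    [3, 6, 5, 8, 6, 7, 9, 10, 9, 8, 7, 7],
    [4, 4, 4, 6, 5, 9, 10, 8, 6, 7, 6, 5],
    [5, 5, 6, 5, 5, 6, 7, 10, 8, 7, 5, 6],
]

# Shewhart R-chart constants for subgroups of size 10 (from standard tables).
D3, D4 = 0.223, 1.777

month_ranges = [max(col) - min(col) for col in zip(*ratings)]   # one range per month
r_bar = sum(month_ranges) / len(month_ranges)                   # centre line
ucl_r = D4 * r_bar                                              # upper control limit
lcl_r = D3 * r_bar                                              # lower control limit
```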
Figure 2: R-Chart
Adjusted R-Chart
The adjusted R-Chart was created using a smoothing factor of 1.0171875. Like the original R-Chart, it applies two control limits compared against a center line, with the Y-axis showing the range of the measured sample and the X-axis indicating the sampling time.
Figure 3: Adjusted R-Chart

7.2 X-Chart
The X-bar chart was applied in checking the mean of the collected data and its relationship with the ideal data. From Figure 4, it is evident that the mean of the collected data did not deviate greatly from the ideal data. The mean obtained was 6.51, while the highest possible mean was 8.21 (UCL) and the lowest possible mean was 4.81 (LCL). From these observations, we fail to reject the null hypothesis. The data is accurate because the mean did not deviate substantially from the ideal data.
Figure 4: X-chart
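For illustration, the X-bar limits can be re-derived from Table 2 using the standard constant A2 for subgroups of size 10. The limits obtained this way differ somewhat from the UCL and LCL quoted above, so the sketch should be read as indicative of the method rather than a reproduction of the chart:

```python
# Data quality ratings from Table 2: rows = 10 observations, columns = Jan-Dec.
ratings = [
    [2, 5, 7, 4, 6, 9, 8, 9, 7, 8, 8, 4],
    [5, 6, 8, 5, 8, 8, 8, 10, 9, 7, 8, 8],
    [1, 3, 2, 4, 3, 5, 7, 9, 8, 8, 7, 6],
    [2, 5, 6, 8, 6, 8, 8, 9, 7, 6, 6, 4],
    [3, 6, 7, 7, 6, 9, 8, 10, 9, 8, 6, 5],
    [2, 5, 6, 4, 5, 8, 6, 8, 8, 7, 7, 8],
    [4, 7, 8, 5, 5, 10, 8, 9, 8, 6, 8, 6],
    [3, 6, 5, 8, 6, 7, 9, 10, 9, 8, 7, 7],
    [4, 4, 4, 6, 5, 9, 10, 8, 6, 7, 6, 5],
    [5, 5, 6, 5, 5, 6, 7, 10, 8, 7, 5, 6],
]

A2 = 0.308   # X-bar chart constant for subgroups of size 10 (standard tables)

month_means = [sum(col) / len(col) for col in zip(*ratings)]
month_ranges = [max(col) - min(col) for col in zip(*ratings)]
x_bar_bar = sum(month_means) / len(month_means)    # grand mean (centre line)
r_bar = sum(month_ranges) / len(month_ranges)      # average range from the R-chart
ucl_x = x_bar_bar + A2 * r_bar                     # upper control limit
lcl_x = x_bar_bar - A2 * r_bar                     # lower control limit
```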
7.3 Data analysis using some of the 7 quality tools
The line graph below shows that data quality was at its lowest point in January and at its highest in August. Data quality fluctuated over the year, and this indicates a need to improve the data quality management strategies of the sampled banks (Figure 5).
Figure 5: Line graph for the data quality rates

8.0 Capability Analysis
To ensure that the banks provide quality services to customers, the bank managers must ensure that data quality falls within specific limits. In this survey, the specification limits were set at a lower specification limit (LSL) of 0 and an upper specification limit (USL) of 10. In checking the capability of the process, the following equation was applied, where USL = 10, LSL = 0, R̄ = 4.56, x̄ = 7.11 and d2 = 2.704:
Cp = (USL - LSL) / 6(R̄/d2)
When checking the centering of the process, the following equation was applied with the same values:
Cpk = min[(x̄ - LSL) / 3(R̄/d2), (USL - x̄) / 3(R̄/d2)]
The capability index of 2.74 shows that the process was reliable enough.
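A sketch of the Cp and Cpk calculation, re-deriving R̄ and x̄ from Table 2 while keeping the d2 value quoted in the text (note that standard tables give d2 ≈ 3.078 for subgroups of 10; 2.704 corresponds to subgroups of 7). The indices obtained this way differ from the 2.74 quoted above, so the code illustrates the formulas rather than reproducing the report's figures:

```python
# Data quality ratings from Table 2: rows = 10 observations, columns = Jan-Dec.
ratings = [
    [2, 5, 7, 4, 6, 9, 8, 9, 7, 8, 8, 4],
    [5, 6, 8, 5, 8, 8, 8, 10, 9, 7, 8, 8],
    [1, 3, 2, 4, 3, 5, 7, 9, 8, 8, 7, 6],
    [2, 5, 6, 8, 6, 8, 8, 9, 7, 6, 6, 4],
    [3, 6, 7, 7, 6, 9, 8, 10, 9, 8, 6, 5],
    [2, 5, 6, 4, 5, 8, 6, 8, 8, 7, 7, 8],
    [4, 7, 8, 5, 5, 10, 8, 9, 8, 6, 8, 6],
    [3, 6, 5, 8, 6, 7, 9, 10, 9, 8, 7, 7],
    [4, 4, 4, 6, 5, 9, 10, 8, 6, 7, 6, 5],
    [5, 5, 6, 5, 5, 6, 7, 10, 8, 7, 5, 6],
]

USL, LSL = 10, 0
d2 = 2.704   # constant quoted in the text

month_ranges = [max(col) - min(col) for col in zip(*ratings)]
r_bar = sum(month_ranges) / len(month_ranges)
x_bar = sum(sum(row) for row in ratings) / sum(len(row) for row in ratings)
sigma_hat = r_bar / d2                         # estimated process sigma
cp = (USL - LSL) / (6 * sigma_hat)             # potential capability
cpk = min((x_bar - LSL) / (3 * sigma_hat),     # capability allowing for centering
          (USL - x_bar) / (3 * sigma_hat))
```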
9.0 Recommendations and Conclusion
Presently, many organizations face significant data quality problems, yet most have no viable strategies for addressing them. A large number of organizations struggle to evolve their database schemas in a timely manner, and this reduces their competitiveness. The current status also indicates that there is no unified definition of data quality, nor a general data quality framework or model, and the existing data quality domain models differ in various respects. Some standards on data quality have been developed and applied in particular domains. Future work requires a common data quality model which specifies a definition of data quality, the data quality dimensions or categories, the attributes for every dimension, how these attributes are to be measured, and the ways of controlling and improving data quality.

