Wednesday, January 7, 2015

Five tips... #3 - Understanding and Exploring Your Data

This week's 3rd tip for advancing with audit and compliance analytics is to "Understand Your Data, and Explore it Fully Before Developing Exception Queries." One common mistake that we see audit and compliance professionals make with data analytics is that they sometimes dive right into searching for transaction exceptions before exploring their data fully. This limits the effectiveness of their analysis, because they are searching for something specific and can overlook other conditions or anomalies in their data. If you've not seen the selective attention (aka Gorilla and Basketball) videos from Daniel Simons, here's a fun link.

Selective attention on exception queries seems to happen due to the strengths of traditional analytics tools like Microsoft Excel and general purpose tools like CaseWare IDEA or ACL. It is less common with Visual Reporting tools like Tableau and Qlikview, in part because these tools are designed to specifically support data exploration and interaction with click and drill-through capabilities. Visual Reporting capabilities are very effective for data exploration, and some rudimentary visual capabilities can be found in Excel, IDEA, and ACL.

During data analytics brainstorming, we categorize analytics queries as Metric Queries, Outlier Queries, and Exception Queries. When prioritizing queries to be built for client assignments, we make sure that there some of each type of query, so that sufficient data exploration takes place before we jump into exception queries or begin researching exceptions.

Metric queries are those analytics such as "Top 10 Vendors by Vendor Spend" or "Top 10 Vendors by Number of Transactions", or "Top 10 Dates of the Year for Requisitions (or Purchase Orders)." Simply summarizing number and value of transactions by different dimensions (day of week, week of quarter, or by UserID) can identify anomalies that should be questioned further. On a recent Payroll Wage and Hour project, we found unusual patterns of when people punched in and out much more frequently on some minutes (e.g. 7 or 23 minutes past the hour, vs. 8 or 22 minutes past the hour). This condition called for further inquiry and analysis about whether time rounding was fair and equitable for certain types of workers. This condition is in fact a major compliance risk and should be considered for any employers with a significant number of hourly worker. See Corporate Counsel article for more information.  

Outlier queries are comparative analytics like "Largest Invoice to Average Invoice, by Vendor," "Most Expensive Airfare by Distance," or "Most Expensive Travel / Entertainment Event per Person vs. Average Event per Person." These outlier queries are also essential, in that they help identify patterns or relationships that should be investigated further. Digital analysis such as Benford's Law is a well-known audit example of an Outlier query, but there are many more techniques that can yield insight beyond only Benford's Law.

Example of exception queries are more traditional Analytics queries such as these listed below:
  • List if two (or more) invoices have been paid for the same amount to the same vendor
  • List any purchase orders created after their corresponding invoice
  • List any Vendors who share a Tax ID Number, Address, or Phone Number with an Employee
  • List any Vendors who have had transactions posted after being Terminated or made Inactive 
In short, we recommend spending at least an hour and as much as a day or more exploring and analyzing your data, before beginning any Exception Queries. A data exploration checklist follows - any additions or other suggestions to this list are welcome. 
  • Sort transactions from oldest to newest and from newest to oldest. Any unusual dates or times? Any gaps in date or time stamps? Why?
  • Sort transactions from largest to smallest and smallest to largest. Any unusual negative values?
  • Stratify by various status codes, reason codes, or transaction types. Are all values consistently completed. Any unusual relationships? What do each of the codes and values represent?
  • Stratify by dollar value ranges. Do 20% of the transactions make up 80% of the value? Should they? The Pareto Principle says yes, but your business may vary. 
  • Compute Relative Size Factor (largest to average and largest to second largest), and sort again. Do any of these RSF values cause you to want to drill into specifics? Consider whole numbers and large numbers. Why or why not?
What has been your most significant "aha" moment when exploring your data? Comments and feedback are welcomed below.

Joe Oringel
Managing Director
Visual Risk IQ
Charlotte NC

No comments: