Top

When bad things happen to good data

August 30, 2008

When bad things happen to good data
Hi, everyone. Before I begin my first post on the Alpha blog, I wanted to introduce myself. My name is Janet Park, and I’m a long-time Alpha developer, with a focus on data.

After knowing Richard and the Alpha community for some time now, I’m thrilled to have been invited to take part in the blog. So here goes nothing …

As a developer, I’ve come across a lot of “dirty data” stories. So I thought you, dear readers, would like some data cleaning tips.

Step 1. A quick way to find dirty data is to first do a sort, and then an inverted sort, on each field you intend to use. Usually the crap (excuse my French) floats to the top or bottom.

Step 2. Page through in browse mode, and if the file is large, jump through the records in larger batches to see if you notice any pockets of weird data. It’ll jump out at you.

Step 3. Once you’ve identified obvious errors, look for patterns you can fix. For example, Excel likes to decapitate leading zeros in east coast zip codes. Alpha can easily reattach them with a global update. Write scripts to fix simple, recurring problems (i.e., “Jr.” in Last Name field = “Dear Mr. Jr:”).

Step 4. Run crosstabs on addresses and phone numbers to ferret out the phonies. You’d be surprised at how many people live at 1600 Pennsylvania Ave., or have 555-555-5555 as a phone number.

Step 5. Have your data standardized, cleaned, and updated through a CASS-certified mail and National Change of Address System. This is the best way to begin a merge/purge for duplicate records, as it corrects and standardizes address info used in matching.

Step 6. If the data is really awful, go back to your source and ask for a new file. A transmission blip or zip creation error can render a file useless.

Step 7. If you regularly get lousy data from a particular source, think of it as an opportunity to build an Alpha Five front-end data entry system. Prevention is the best cure for bad data. Use field rules, lookup databases, masks, and validation expressions to keep bad data out and make it quick and easy to enter good data.

Stay tuned for more data management tips and tricks from me in future posts!

How to get the most out of QReportBuilder
We’ve had several inquiries about tutorials since introducing QReportBuilder. We have now scheduled a number of interactive online tutorials with a live instructor, Al Buchholz.

Al has years of experience in building custom reports for QuickBooks users, and is an expert with QReportBuilder. If you are interested in getting all you can from QRB, move quickly, as there are only two classes left. Sign up here.

Related posts

Comments

Got something to say?





Bottom