Clean messy data by providing examples in Excel

Clean messy data by providing examples in Excel

While Excel isn’t usually my tool of choice for manipulating or analyzing data (I prefer to use it as a data source for R), it has just learned a new trick that’s likely to prove useful from time to time. Extracting the useful information from complicated or inconsistent formats can often be a pain, but with Excel’s new Transform Data By Example add-in, you can start with a column of messy data, manually provide a few examples of transformations by hand (3 or so usually does the trick — just as many as you need to cover all the variations), and Excel will automatically figure out how to generate similar examples for the rest of the rows.

For example, given just three examples of "Lastname. Firstname" format, the add-in can extract formatted names from a mixture of name formats while discarding titles and suffixes:

Flash1   Flash2

Excel has had a system like this (called Flash Fill) for a while, which uses the automated rule-writing algorithm described in this paper. This new add-in extends the concept by searching not just the space of text transformations, but also the outputs from library of web-based services from Github, .NET and StackOverflow. You can even add your own transformations by publishing them as an Azure Function or web service. The video below shows an example from Python, but I imagine that R-based examples are equally possible.

For more, check out the link below.

Microsoft: Transform Data By Example


via Revolutions

May 18, 2017 at 08:47PM


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s