On one of my projects, I have to process data from lots of sources. One of these provides essentially a report in spreadsheet format. I need to strip out all the extraneous things – empty rows, data split into rows by line feeds – so I end up with one line containing a name and address so I can geocode it. To do this in Excel, I use these steps:
- Copy the column (B) containing the name and address to a new worksheet
- On the new worksheet, select the column just pasted, then click (on the Home tab) Find And Select –> Go To Special –> Blanks, then click OK
- Still on the Home tab, select Delete –> Delete Sheet Rows (the paste command brings lots of empty rows from the original worksheet, because of the report formatting; these commands will select and then delete the empty rows)
- Select the Data tab
- Select Text To Columns –> Delimited –> Next
- Step 2 of the Text To Columns Wizard, de-select Tab, select Other, and type Ctrl-J (control-J) in the box beside Other. You’ll just see a period in the box, but the sample display will show the data split into columns. Click Next and then Finish.
- To combine the address for geocoding, use the Excel formula =CONCATENATE(B2,”,”,C2)
For whatever reason, this is a satisfying exercise, watching Excel rearrange the data into a form that is usable for my needs. Especially the text-to-columns function.