Last updated on September 23, 2011
While working on migrating data for the upcoming MMBA Chapter Transition one of my tasks was to export membership data, split it into chapters, then sent it off to IMBA for import into their membership system. After getting the information out of the database I used my Mac to create one large spreadsheet with all the requisite fields covering the entire state (including a chapter column) and sent it around to all interested parties for confirmation that it looked fine. The next step was to break it out into per-chapter data so this time I sat down at my PC, opened the file, filtered on each chapter, selected everything displayed, copied the data, created a new file, pasted the data, saved the now per-chapter data, and sent it off to IMBA.
A couple weeks later, once the files had been pulled into IMBA’s systems, I received a few notes questioning the data as some chapters listed only single-digit counts of active members whereas they were known to have exponentially more. Checking over my data I found that the per-chapter spreadsheets were wrong, and that the date columns had been decremented by a good deal rendering most active memberships expired. For example, on my personal row the expiration date of 6/29/2012 became 6/28/2008.
After a bit of investigation I learned about the toggle shown above to set Excel documents to use the 1904 Date System, and this turned out to be a big key to figuring out the problem. As documented here in Microsoft KB article 180162, when I’d first created the spreadsheet on Mac the file was set to use the Office for Mac default of 1904 Date System and the populated with the accurate data. Opening the file on PC retained this 1904 setting and displayed the data correctly, but when the new file was created with the PC default the 1900 Date System and data pasted into it became offset by roughly four years. I’d presumed that the copy/pasted data would be the same and with the first few columns in each sheet looking correct I didn’t check the rest, and files with incorrect expiration data were thus sent off to IMBA.
The real reason for this goes a bit deeper and involves how Excel stores dates. Specifically, dates are stored as a numerical offset (called a serial number in Excel) from a starting point in time (an epoch). Toggling between the 1900 Date System and 1904 Date System changes the epoch, but as date fields copied and pasted between sheets are copied as their serial number, if the epoch changes the displayed date will be different. The data behind the scenes doesn’t change, but the information presented to the user does. See Microsoft KB article 180162 for a more detailed explanation of why this occurs.
I understand all of the reasons why this works as it does (legacy compatibility with Mac vs. PC, the use of a serial number + epoch so that all international date formats are easily handled, etc) but in some edge cases (such as what I ran into) the result for the end user is maddening.