Friday, January 11, 2008

What is your MS Office Metadata Telling You???

So you are given a couple of Word documents, and the person who gave them to you wants to know what you can tell them about the files. You tell them no problem and start to analyze them. You can get the files here. They all look like Word docs and they open like Word docs, but some of them smell kinda funny: the first file has all the usual Word metadata, but the rest seem to have lost theirs. To cut to the chase, every document after test-1.doc was opened in WordPerfect and saved in MS Word document format. I had not really heard any discussion about this until I came across a file just like the ones I will be discussing (how I find this stuff sometimes I will never know).

The first file, test-1.doc, was created in Microsoft Word 2003 and saved. If you run Harlan Carvey's wmd.pl program against it, you will see that it comes back with a whole slew of metadata. Every file after this one was opened in WordPerfect (WP) and saved in MS Word 97/XP/2003 format. You really need to look at these files in a hex editor to appreciate what is going on here.

In test-2.doc everything looks like test-1.doc, except that towards the end of the file you can see where the body of the text I typed resides, along with the changes I made. This is very interesting because each time I save the file, it switches between the top text and the bottom text. If you compare the 2 areas, you can see that one is the newly edited text and the other is the last saved text (I numbered each sentence I typed so you can tell what order I saved them in). Kinda cool how you can start to see the changes in the file. After the first save in WP, if you search for the hex values FEFF00 you should find 2 spots in the file where the Word metadata resides (my name, company, title, etc.). After you save the file again, the first section of metadata disappears (look at the difference between test-2.doc and test-3.doc to see what I mean). After the third save, the second set of Word metadata is gone as well (test-4.doc). Now you understand why there was no metadata. Files 5, 6 and 7 are just there to show how the text of the file goes back and forth between the 2 areas. You will also see the words "Corel Corporation" in the file, which leads you to believe that it was edited in WP.
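If you would rather not eyeball every file in a hex editor, the FEFF00 search can be scripted. This is only a rough sketch of the search described above, not what wmd.pl does: the names find_all and scan are made up for illustration, and I am assuming the "Corel Corporation" string may be stored either as plain ASCII or as UTF-16LE. Keep in mind that the bytes FE FF 00 could also turn up by chance in the body of a file, so treat every hit as a place to go look, not as proof.

```python
from pathlib import Path
import sys

# The byte pattern searched for in the post (hex FE FF 00).
MARKER = bytes.fromhex("FEFF00")


def find_all(data: bytes, needle: bytes) -> list:
    """Return every offset at which needle occurs in data."""
    offsets = []
    pos = data.find(needle)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(needle, pos + 1)
    return offsets


def scan(data: bytes) -> dict:
    """Collect the two indicators discussed in the post for one file."""
    # Assumption: the vendor string may be ASCII or UTF-16LE, so try both.
    corel = any(
        "Corel Corporation".encode(enc) in data
        for enc in ("ascii", "utf-16-le")
    )
    return {"marker_offsets": find_all(data, MARKER), "mentions_corel": corel}


if __name__ == "__main__":
    # Usage: python scan_docs.py test-1.doc test-2.doc ...
    for name in sys.argv[1:]:
        result = scan(Path(name).read_bytes())
        offsets = ", ".join(hex(o) for o in result["marker_offsets"]) or "none"
        print(f"{name}: FEFF00 at {offsets}; Corel string: {result['mentions_corel']}")
```

Run it against the test files and then jump to the reported offsets in your hex editor to confirm what is actually sitting there.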

Now let's say that you have files test-1.doc, test-2.doc and test-3.doc. What can you really say about them? Here is what I would state about these files:

test-1.doc was created in Word. You can tell by the way the file looks and by all the metadata (a Word document has the same fundamental look).

test-2.doc was edited and saved in Word at one time, because of the presence of the 2 sections starting with FEFF00. With the words "Corel Corporation" in the file and the exact same text in 2 spots in the file, I can say that the file was last saved with WordPerfect.

test-3.doc was edited and saved in Word at one time, because of the presence of 1 section starting with FEFF00. With the words "Corel Corporation" in the file, and 2 areas of edited text that do not match, I can say that the file was saved with WordPerfect the last 2 times it was saved.
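The reasoning in those three statements can be written down as a small decision function. This is only a sketch built from what I saw in these seven test files: the name describe and its two inputs (the number of FEFF00 sections and whether "Corel Corporation" appears) are made up for illustration, and it leaves out the comparison of the 2 text areas, which you still need to do by hand.

```python
def describe(marker_count: int, mentions_corel: bool) -> str:
    """Map the two indicators to a hedged statement about save history.

    Assumption: the pattern seen in the test files (one FEFF00 section
    lost per additional WordPerfect save) holds for the file in question.
    """
    if not mentions_corel:
        return "no Corel string: nothing here points to WordPerfect"
    if marker_count >= 2:
        return "consistent with one save in WordPerfect (like test-2.doc)"
    if marker_count == 1:
        return "consistent with two saves in WordPerfect (like test-3.doc)"
    return "consistent with three or more saves in WordPerfect (like test-4.doc)"


if __name__ == "__main__":
    for count in (2, 1, 0):
        print(count, "->", describe(count, True))
```

The wording is "consistent with" on purpose. Other versions of WordPerfect, or other programs entirely, may not behave the same way.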

Does this make sense and do you come to the same conclusions I have?

One thing to note if you are using the wmd.pl program mentioned above: after a couple of saves in WP, the metadata will show that the file was created on a Mac and not on Windows. I have told Harlan about this, so he is aware of it.

The question to ask yourself is: what other programs that do a "save as" into another format exhibit this type of behavior?

I hope I was clear in what I was saying. If not, download the files and check them out, and I think it will be clearer.

Questions/Thoughts/Comments???

3 comments:

web designer said...

nice post

seo expert said...

Thanks ur information

Suparna said...

Great article...good research! Thank you for sharing this with us.