Monday, February 19, 2007

Ever need to know what words were in what emails? Ever need a cross reference for those words to what emails they came from. Don't want to spend a lot of money to get this done but want to be able to do this with many mailbox types and do it quickly? Well do I have some good news, with some Perl scripting, a sqlite database (I told you I love databases) and 2 programs from Fookes Software, Aid4mail and MailBag Assistant (both are also part of Paraben's email examiner).

So here is what you need to do. I will use a Outlook pst as an example. First open up Aid4Mail and export your pst file to a directory into eml format (make sure you recreate the directory structure of the mailbox). Next open up Mailbag Assistant and import all the eml files including the subdirectories. You will need to create the following script and template to use (I will put all the files in a zip archive and put them on my webserver for you).

Script: Save_Body_As_Text

IfEmpty End
MergeData Save_Body_As_Text

Template: Save_Body_As_Text

>>>Files ?\{Mailbox}\{Subject}.txt
{Body}

The script will take all the selected emails (alt-a) from the "Grid View - Main" and run the template unless no emails were selected. The template will save the text body of the eml file to a directory you will be prompted for with a structure of <Directory Specified>\<Mail Box, IE: Inbox, Deleted, etc..>\<subject line>.txt. Once all the files have been extracted, run the get-word.pl Perl program passing the top level directory of where the email bodies were extracted to you will extract all the words and put them into the database ( I am not include a listing of the program but will have it available for download). Now you can run sql against the database to find the keywords that you want, you can also run the following sql against the database to create copy statements for you so that you can copy the emails you want out to another directory (If you want to get even fancier then include a table with the keywords you are looking for and add a subselect to the query, if you don't know what that is email me and I will explain it further)

select 'copy "'||directory_found_in||'/'||filename_found_in||
'" "c:/stuff/test/test/'||filename_found_in||'"'
from word_file_xref a, words b
where b.word_seq_num = a.word_seq_num and word = 'Oracle';

You can also make a slight modification and add a table with words you do not want to see (IE: and, if, or, not, etc..).

I will package all the code and database create statements up and also include a exe of the Perl program in case you do not have Perl but still want to test out the program (I know the code is not the neatest but it is functional). It can be found here.

One interesting thing to note is that this could be the beginning of an open source e-discovery email production package. Any takers for a project like this?

Questions/comments/suggestions?

1 comment:

Anonymous said...

How well does Generic Viagra work? Studies show that Generic Viagra UK improves erections in more than 80% of men taking Generic Cheap Cialis 100 mg versus 24% of men taking a sugar pill.No other ED tablet is proven to work better.