
Importing Problematic PST Files to Outlook

Overcoming errors when importing archives exported from Enterprise Vault into Exchange mailboxes, using a PowerShell script.

Updated June 21, 2024

This time I want to share with you the path I took to solve a problem of importing problematic archives into live mailboxes.

You can also read the follow-up article on removing duplicate items from mailboxes.

This article is intended for those who find technical topics interesting, those who want to enrich their knowledge of PowerShell scripting, and also for those who have encountered a similar problem and would like to use the script or parts of it.

This article covers the following topics:

  • Exchange
  • Enterprise Vault
  • Outlook
  • PST files
  • PowerShell

Contents:

  1. Description of the problem I encountered, the client’s requirements and approaches to the solution - in general terms and with relatively limited detail.

  2. The script I wrote to implement the solution.

  3. Instructions and tips for using the script.

  4. Detailed breakdown of the script’s operations and the functions it contains - intended for those who want to understand this lengthy script in a relatively easy way.

Opening Notes:

  1. A follow-up article on deleting duplicates from mailboxes will come later.

  2. I assume that the reader is familiar with the topics, with at least a general Help-Desk level of knowledge in operating and managing a Microsoft mail server, or in handling Outlook, including mail profiles, cache levels etc. Detailed explanations of these topics would have made this article considerably longer than it already is.

Background:

Mail servers and services are a world unto themselves. The importance of email is enormous in every form of work, and therefore the solutions and options on the subject are many and varied.

One of the most widely used mail clients in the field is Outlook, which comes with Microsoft’s Office suite. Microsoft also has its own mail server package, known as Microsoft Exchange. Many manufacturers and service providers adapt add-ons, packages and solutions to fit this very common working method.

Email. Everyone uses it, everyone knows it, everyone needs it. What is an email? A few words and a subject line. Maybe a signature, sometimes an attachment.

Most emails weigh a few KB. Those containing a file weigh more. But just as a stack of A4 pages weighs a lot, emails that accumulate also take up enormous storage space. Anyone who is somewhat familiar with the industry knows that the larger the organization/office, the greater the number of emails each employee receives and sends.

Such a volume of emails takes up huge storage space, and that is significant in terms of the resources the system consumes.

Archive Management:

One of the existing market solutions for this matter is called Enterprise Vault. It is a type of vault that collects emails from the mailbox and leaves a shortcut in their place. On one hand, systems like Outlook and Exchange can work more easily without all that load. On the other hand, the data is accessible to the search index or when the user browses their mail - and the moment they want it, with one click the email pops out of the vault and opens for them in Outlook.

All well and good - except that in our case the system aged to the point where support for it was discontinued. Over time, mailboxes lost synchronization with the vault, so some emails remained only in the vault and others only in the mailboxes.

In place of the emails in the vault, only shortcuts remained in the unsynchronized mailbox.

Therefore, it was decided to end the system’s life. The vault contents need to be exported to PST files, which can then be imported into the mail servers. This part sounds relatively straightforward: export the mailboxes to files from the Enterprise Vault interface, then find a way to import the PST files.

…And then the problems started.

PST Import:

PST import procedures into mailboxes are designed to prevent duplicates and minimize errors. The option exists both in desktop Outlook and as a command in the Microsoft mail server command interface, known as Exchange Shell. But in certain cases it simply doesn’t work, and the process fails again and again. Each failed attempt creates more duplicates in the mailboxes. The mailboxes balloon from the few emails that get through each time, but the process breaks in the middle and the rest of the data doesn’t transfer.

In some places, every email is of great importance and no data can be lost. And here we have a problem.

The Enterprise Vault server needs to be killed because it’s no longer useful or supported, and also consumes enormous resources. The data must return to the mailboxes and be available to the user.

The task: find a way to import the data into the mailboxes. The import must maintain existing order without losing any data.

Problems to Deal With:

Emails that were in a certain Outlook folder at the time of synchronization to the vault now reside in a completely different path. Where an Outlook shortcut is involved, we must also verify that the import process won’t treat the shortcut-to-vault as the real email: such a misidentification would block the import of the real email for fear of duplication. And the import needs to place each email in the exact location where its shortcut resides - to preserve the current order created by the user.

At every stage - data loss is absolutely not allowed!

The Task:

And so my team leader called me and asked if I could write a script to handle this.

Initially I tried to approach the matter through the mail server command line, but Microsoft’s tooling fell short here. For example, mailbox searches through Exchange Shell don’t allow collecting the search results into a variable - they can only be directed to another mailbox - so it’s impossible to dig into the objects properly. Reading the content of a PST through Exchange Shell is also not possible. I therefore found no good way to collect details and attributes from the content of a PST file, and collecting those details is essential for comparing the emails in the live mailbox with those in the PST archive.

I eventually discovered the MAPI option, which essentially loads an Outlook process through PowerShell. Through this process, it’s possible to work from within PowerShell on every item, folder and object from the mailbox, and process its data.
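As a rough illustration of the MAPI route, the sketch below summarizes the stores (mailboxes and PST files) that an Outlook MAPI namespace exposes. The function name Get-StoreSummary is hypothetical. It reads only the Stores, DisplayName and FilePath members of the Outlook object model, and it takes the namespace as a parameter so that it can be tried with a stand-in object on a machine without Outlook.

```powershell
# Hypothetical helper: list the stores visible in a MAPI namespace.
function Get-StoreSummary {
    param ([Parameter(Mandatory = $true)] $Namespace)

    foreach ($store in $Namespace.Stores) {
        [pscustomobject]@{
            DisplayName = $store.DisplayName
            FilePath    = $store.FilePath
        }
    }
}

# On a machine with Outlook installed and a configured mail profile (assumption):
# $outlook   = New-Object -ComObject Outlook.Application
# $namespace = $outlook.GetNamespace('MAPI')
# Get-StoreSummary -Namespace $namespace
```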

So after intensive work, I managed to create a process that works and meets all the requirements.

Process Stages:

Preparation:
  1. Create a test/dedicated user for this operation.

  2. Create a Windows server with a reasonable amount of resources - 8 CPU cores, 32GB memory and disk space of up to 2 TB.

  3. Install Office on the server.

  4. In the mail server management interface, grant full permissions to the dedicated user on the mailbox into which we want to import the PST archive.

  5. On the new server, create a mail profile from the old control panel (so it doesn’t try to open Outlook with the dedicated user’s mail). Attach the mailbox being imported to the profile.

  6. Set the profile to full cache mode - the entire mailbox will be loaded into the server via Outlook. Not one year back, not three months - the entire mailbox.

  7. Start Outlook and wait for the complete loading of the entire mailbox into the computer. This can take several hours, until the status bar at the bottom of Outlook shows the message that all folders are updated.

  8. In Outlook settings, go to the Trust Center. There, configure it to never pop up a warning about external applications trying to access Outlook data.

  9. Go to calendar settings and advanced settings, and disable all calendar alerts. No calendar alerts at all.

  10. Copy the PST archive that needs to be imported into the mailbox. Place it in the Documents folder of the user you’re working with (the dedicated user created for this process). The filename should be changed to the user’s name or another convenient name.

Running the Script:
  1. Open PowerShell ISE, and use it to open the script that follows below.

  2. Verify that the script takes an index of the entire mailbox and the entire archive and compares them.

  3. After the comparison, begin transferring all items in the archive for which a matching shortcut was found in the mailbox. Each item moves to the location of its shortcut. Each item is moved not copied, so the archive slowly shrinks and empties.

  4. Next, all items in the archive that have no matching shortcut in the mailbox are transferred. These move to a parallel location in the mailbox - parallel to the location they had in the archive, at exactly that same location on the folder tree. If that path doesn’t exist, the process creates it until the email sits at exactly the same path it had in the archive - except that it now resides in the mailbox rather than the archive.

  5. Run the indexing process again to verify nothing remains. If something remains, run what’s needed again until everything has moved to the mailbox.

  6. Delete the mail profile, the emptied archive file and the files the process created on the desktop. Then repeat the process for another user.

  7. The duplicate removal process will be performed separately.
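The comparison in steps 2 to 4 can be sketched on plain index entries, without Outlook. This is a minimal illustration, not the script's own code: the names Get-IndexKey and Split-ArchiveIndex are hypothetical, and the entries are assumed to carry SentOn, Subject and Path properties like the index files described in this article.

```powershell
# Hypothetical helper: build a composite key from the sent date and the subject.
function Get-IndexKey {
    param ($Entry)
    return ('{0}|{1}' -f $Entry.SentOn, $Entry.Subject)
}

# Split the archive index into entries that have a shortcut in the mailbox
# (each gets the shortcut's folder as its Target) and entries that do not.
function Split-ArchiveIndex {
    param ([object[]] $ShortcutIndex, [object[]] $ArchiveIndex)

    # Map each shortcut key to the mailbox folder that holds the shortcut.
    $shortcutPaths = @{}
    foreach ($shortcut in $ShortcutIndex) {
        $shortcutPaths[(Get-IndexKey $shortcut)] = $shortcut.Path
    }

    $matched = @()
    $unmatched = @()
    foreach ($entry in $ArchiveIndex) {
        $key = Get-IndexKey $entry
        if ($shortcutPaths.ContainsKey($key)) {
            $matched += [pscustomobject]@{
                SentOn = $entry.SentOn; Subject = $entry.Subject
                Path   = $entry.Path;   Target  = $shortcutPaths[$key]
            }
        } else {
            $unmatched += $entry
        }
    }
    return [pscustomobject]@{ Matched = $matched; Unmatched = $unmatched }
}

# Example with made-up entries:
$mail = @([pscustomobject]@{ SentOn = '2020-01-01 10:00:00'; Subject = 'Report'; Path = '\\user\Inbox\Projects' })
$arch = @(
    [pscustomobject]@{ SentOn = '2020-01-01 10:00:00'; Subject = 'Report'; Path = '\\archive\Inbox' },
    [pscustomobject]@{ SentOn = '2020-01-02 11:00:00'; Subject = 'Other';  Path = '\\archive\Sent' }
)
$split = Split-ArchiveIndex -ShortcutIndex $mail -ArchiveIndex $arch
```

A single combined key requires the date and the subject to match on the same entry, and the hashtable lookup avoids rescanning the whole index for every item. Matched entries would go to their Target folder, unmatched ones to the folder parallel to their Path.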

I worked this way with three servers in parallel and three dedicated users for the process. I managed to import 26 archives with an average size of 15GB in about a week and a half. Most of the mailboxes were large mailboxes containing hundreds of thousands of items.

Summary:

It’s possible the process has gaps. Certainly experienced programmers and scripters will be able to offer improvement and efficiency suggestions to someone as junior as me, and I’ll be happy to receive constructive criticism.

The script follows below. It contains almost 300 lines of code, including documentation comments with brief explanations of each part. After the script come instructions and tips for those who want to use it to import problematic PST files.

Finally I provide a detailed breakdown of the script’s operations and functions, and of why each part was built the way it was. It is recommended for those who want to enrich their PowerShell knowledge, or who want to run the script effectively.

The Script:

<# This script is intended to add an archive, such as a PST file, into a Mailbox.
To avoid duplicates, the script should add to the Mailbox only items which are not already there.
The script works via Outlook: it adds the PST and compares it with the Mailbox.
The script goes through the folder tree of the PST file.
In each folder, it goes through all items.
For each item, the script checks whether it exists in the parallel folder in the Mailbox.
If it's not there, a check is performed to see if it's in Deleted Items or Junk Email.
If the item isn't in any of these 3 locations, it will be copied to the parallel folder in the Mailbox. #>


# Set the Mailbox of the user here, and the path of the PST which is compared with and loaded into the Mailbox.
# =========================================================
$user = 'user@domain.suffix'
$PSTPath = "C:\Users\$env:USERNAME\Documents\username.pst"


<# This function logs the process, printing details and a timestamp on screen and into the log file.
Parameters: 
$a, $b: indexes of the strings to print, inside the messages array.
$res: the value described by the message printed above it. #>
function Print-Output {
    param ($a, $b,[Parameter(Mandatory = $false)]$res = $null)
    
    $global:output[$a,$b,$a]
    $res
    Get-Date
    
    $global:output[$a,$b,$a] | Add-Content -Path C:\Users\$env:USERNAME\Desktop\log.txt -Encoding UTF8
    $res| Add-Content -Path C:\Users\$env:USERNAME\Desktop\log.txt -Encoding UTF8
    Get-Date | Add-Content -Path C:\Users\$env:USERNAME\Desktop\log.txt -Encoding UTF8
}


<# This function goes through all folders in the folder tree, and for each folder checks if there are subfolders.
If there are subfolders, the function calls itself recursively on each subfolder.
If there are no subfolders, or the subfolders have been scanned already, the function lists all items in the folder.
The list of folder items is added to a global index, according to the root store of the folder.
Parameters:
$currentFolder - the folder on which the action is currently conducted.
$rootstore - whether the process is conducted in the PST or in the Mailbox. #>
function Index-Folder {
    param ($currentFolder, $rootstore)
    
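    # Skip excluded folders. 'return' leaves only this call; 'break' here would also
    # terminate the caller's foreach loop and skip the remaining sibling folders.
    if ($currentFolder.name -in $global:exlude) {
        return
    }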
     
    if ($currentFolder.Folders.Count -gt 0) {        
        foreach ($x in $currentFolder.Folders) {
            Index-Folder -currentFolder $x -rootstore $rootstore
        }
    }


    if ($rootstore -eq 'mail') {
        $list = $currentFolder.items | ?{$_.MessageClass -EQ "IPM.Note.EnterpriseVault.Shortcut"} | select senton, subject, MessageClass
        $list | Add-Member -MemberType NoteProperty -Name 'Path' -Value $currentFolder.folderpath -Force
        $global:indexmail +=  $list
        
    } else {
        $list = $currentFolder.items | select senton, subject, MessageClass
        $list | Add-Member -MemberType NoteProperty -Name 'Path' -Value $currentFolder.folderpath -Force
        $global:indexpst +=  $list
    }    
}


<# This function finds the object of a specific folder, according to the folder path.
Parameters:
$folder: the root to search under, whether it is the PST or the Mailbox.
$path: the entire path of the required folder.
The function returns the required folder as an object. #>
function Find-Folder {
    param ($folder, $path)


    if ($Folder.Folders.Count -gt 0) {        
        foreach ($x in $Folder.Folders) {
            if ($x.folderpath -eq $path) {
                return $x
            }            
            Find-Folder -folder $x -path $path
        }
    }
}


<# This function verifies the path of an item to transfer. Sometimes an item exists in the PST only,
and therefore the parallel path needed to transfer the item from its location in the PST to the identical location in the Mailbox doesn't exist.
The function goes through all folders in the path hierarchy of the target. If a folder doesn't exist, the function creates it.
Parameters:
$folder: the root Mailbox folder as an object to explore.
$path: the full path of the target to transfer to.
The function returns the actual folder object whose path is the required target path. #>
function Verify-Path {
    param ($folder, $path)


    $path_arr = $path.split("\")
    $path_arr = $path_arr[3..($path_arr.count)]


    foreach ($x in $path_arr) {
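        $check = $folder.Folders | ?{$_.name -eq $x}
        if ($check  -ne $null) {
            $folder = $check
        } else {
            # Capture the folder returned by Add; left uncaptured, it would be emitted
            # as extra function output and the caller's $target would become an array.
            $folder = $folder.Folders.Add($x)
        }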
    }
    return $folder
}


<# This function loops over all items remaining in the PST after the items that have shortcuts in the Mailbox were moved.
The function runs over all folders and moves all items to the parallel path in the Mailbox.
Parameters:
$folder: the root PST folder as an object to explore. #>
function Copy-Remaining {
    param ($folder)
     
    if ($Folder.Folders.Count -gt 0) {        
        foreach ($x in $Folder.Folders) {
            Copy-Remaining -folder $x
        }
    }


    Print-Output -a 0 -b 11 -res  $folder.folderpath
    $target = Verify-Path -folder $global:mailboxRoot -path $folder.folderpath
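    # Snapshot the items first: moving items while enumerating the live collection can skip some of them.
    foreach ($x in @($folder.items)){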
        try {
            $x.Move($target)
            $global:pstmoved +=1
        } catch {
            $_.tostring() | Add-Content C:\Users\$env:USERNAME\Desktop\log.txt
            "Can't move item" | Add-Content C:\Users\$env:USERNAME\Desktop\log.txt 
            $log = $x | select senton, subject
            $log | Add-Member -MemberType NoteProperty -Name 'Path' -Value $target.folderpath
            $log | Add-Member -MemberType NoteProperty -Name 'Error' -Value $_.tostring()
            Get-Date | Add-Content C:\Users\$env:USERNAME\Desktop\log.txt
            "========================" | Add-Content C:\Users\$env:USERNAME\Desktop\log.txt 
            $global:logs += $log
        }
    }
}


<# This function moves items from the PST to the location of their shortcut in the Mailbox.
Sometimes the general process for EV items misses a few objects and can't find them.
This function closes the gap.
Parameters:
$ev: generally the ev index, which lists the Mailbox shortcuts whose items exist in the PST.
In case of errors, you can pass the $logs variable instead, to reprocess all items that failed to move. #> 
function Copy-RemainingEV {
    param ($ev)


    $shared = @()
    foreach ($x in $ev) {
        $y = $global:indexpst | ?{$_.senton -in $x.senton} | ?{$_.subject -in $x.subject}
        $y | Add-Member -MemberType NoteProperty -Name 'Target' -Value $x.path -force
        $shared += $y
    }


    $count = 0
    foreach ($x in $shared) {
        $folder = Find-Folder -folder $global:pstRoot -path $x.path
        $target = Find-Folder -folder $global:mailboxRoot -path $x.target
        $message = $folder.Items | ?{($_.senton.GetDateTimeFormats())[71] -in $x.senton} | ?{$_.subject -in $x.subject} 
        $message.Move($target)
        $count ++
    }
    "Moved: "
    $count
}


<# This function moves the last few items from the PST into the Mailbox, those which the general process missed.
It's for the remaining items - those which are not in the Mailbox. #>
function Copy-FewPST {
    $count = 0
    foreach ($x in $global:indexpst) {
        $folder = Find-Folder -folder $global:pstRoot -path $x.path
        $target = Verify-Path -folder $Global:mailboxRoot -path $x.Path
        $message = $folder.Items | ?{($_.senton.GetDateTimeFormats())[71] -in $x.senton} | ?{$_.subject -in $x.subject} 
        $message.Move($target)
        $count ++ 
    }
    "Moved: "
    $count
}


# Configuring Outlook MailBox and PST as an available object to work with.
$outlook = New-Object -com Outlook.Application
$namespace = $outlook.GetNamespace("MAPI")
$namespace.AddStore($PSTPath)
$mailbox = $namespace.Stores | ? {$_.displayname -like $user}
$pst = $namespace.Stores | ? {$_.FilePath -like $PSTPath}
# Variable pointing to the root folder of the Mailbox.
$global:mailboxRoot = $mailbox.GetRootFolder()
# Variable pointing to the root folder of the PST archive.
$global:pstRoot = $pst.GetRootFolder()


#################################### For testing, you can point the process at a single subfolder


#$global:pstspec = $Global:pstRoot.Folders[6].Folders[1]
#$global:mailspec = $Global:mailboxRoot.Folders[2].Folders[1]


##########################################


$global:exlude = 'RSS Subscriptions', 'Quick Step Settings', 'Sync Issues', 'Conversation Action Settings', 'Yammer Root', 'Recipient Cache'
$Global:count = 0
$evmoved = 0
$global:pstmoved = 0
$global:indexmail = @()
$global:indexpst = @()
$global:logs = @()
$global:final = $null
$failed = @()
$notinpst = @()


# Contains output messages to print on screen and log to file, for monitoring purposes.
$global:output = "===================================", "         Indexing Mailbox", "     Mailbox Indexing Completed", "           Indexing PST", `
"       PST Indexing Completed", "           Index result", "             Mail Items", "             PST Items", "     Archive Shortcuts Number", `
"    Beginning Copy EV Process", "Target: ", "Copy from: ", "    EV Items Copied Into MailBox", "Beginning Copy Remain-In-PST Process", `
"    Remained Items Copied Into MailBox", "           Total Copied" 


# =======================================================
#                    The main process
# =======================================================


# Index the Mailbox, getting the list of all shortcuts in the Mailbox.
# If the process failed in the middle for some reason, you can save the indexing time and import this data from files.
# In this case, the indexing lines should be commented out.
Print-Output -a 0 -b 1
#Index-Folder -currentFolder $global:mailboxRoot -rootstore 'mail'
#$global:indexmail | Export-Csv -NoTypeInformation -Encoding UTF8 -Path C:\Users\$env:USERNAME\Desktop\mail.index.csv
$global:indexmail = $null
$global:indexmail = Import-Csv -Path C:\Users\$env:username\Desktop\mail.index.csv
Print-Output -a 0 -b 2
Print-Output -a 0 -b 3


#Index-Folder -currentFolder $global:pstRoot -rootstore 'pst'
#$global:indexpst | Export-Csv -NoTypeInformation -Encoding UTF8 -Path C:\Users\$env:USERNAME\Desktop\pst.index.csv
$global:indexpst = $null
$global:indexpst = Import-Csv -Path C:\Users\$env:username\Desktop\pst.index.csv
#$ev = $Global:indexmail | ?{$_.senton -in $Global:indexpst.senton} | ?{$_.subject -in $Global:indexpst.subject}
#$ev | Export-Csv -NoTypeInformation -Encoding UTF8 -Path C:\Users\$env:USERNAME\Desktop\ev.index.csv
$ev = Import-Csv -Path C:\Users\$env:username\Desktop\ev.index.csv


Print-Output -a 0 -b 4
Print-Output -a 0 -b 5
Print-Output -a 0 -b 6 -res $global:indexmail.Count
Print-Output -a 0 -b 7 -res $global:indexpst.Count
Print-Output -a 0 -b 8 -res $ev.Count
Print-Output -a 0 -b 9  
#<#
# Loop over all EV shortcuts in the mailbox, find the full parallel mail in the PST and move it into the mailbox.
$pathgroup = $ev | Group-Object -Property path
foreach ($group in $pathgroup) {
    
    Print-Output -a 0 -b 10 -res  $group.Name
    $target = Find-Folder -path $group.name -folder $global:mailboxRoot
    $shared = $global:indexpst | ?{$_.senton -in $group.Group.senton} | ?{$_.subject -in $group.Group.subject}
    $pstgroup = $shared | Group-Object -Property path


    foreach ($location in $pstgroup){
        Print-Output -a 0 -b 11 -res  $location.Name
        $folder = Find-Folder -folder $global:pstRoot -path $location.Name
        $count = 0
        # Advance by whole portions, so that a portion with no matches cannot loop forever
        # and a folder holding a single item is still processed.
        While ($count -lt $location.Count) {
            $portion = $location.Group[$count..($count + 4999)]
            # Note: index 71 of GetDateTimeFormats() depends on the culture of the machine.
            $messages = $folder.Items | ?{($_.senton.GetDateTimeFormats())[71] -in $portion.senton} | ?{$_.subject -in $portion.subject}       
            foreach ($x in $messages){
                try {
                    $x.Move($target)
                    $evmoved += 1
                } catch {
                    $_.tostring() | Add-Content C:\Users\$env:USERNAME\Desktop\log.txt
                    "Can't move item" | Add-Content C:\Users\$env:USERNAME\Desktop\log.txt 
                    $log = $x | select senton, subject
                    $log | Add-Member -MemberType NoteProperty -Name 'Path' -Value $target.folderpath
                    $log | Add-Member -MemberType NoteProperty -Name 'Error' -Value $_.tostring()
                    Get-Date | Add-Content C:\Users\$env:USERNAME\Desktop\log.txt
                    "========================" | Add-Content C:\Users\$env:USERNAME\Desktop\log.txt 
                    $global:logs += $log
                }
            }   
            $count += 5000
        } 
    }
}   


# Get the remaining items in the PST which aren't in the mailbox, and move them into the mailbox.
Print-Output -a 0 -b 13
<#
while ($global:pstmoved -lt ($global:indexpst.Count - $ev.Count)){
    Copy-Remaining -folder $global:pstRoot
}#>
Print-Output -a 0 -b 12 -res $evmoved
Print-Output -a 0 -b 14 -res $global:pstmoved
Print-Output -a 0 -b 15 -res ($global:pstmoved + $evmoved)
#>
#Copy-RemainingEV -ev $ev
#Copy-FewPST

Instructions and Tips:

Correct Setup and Workflow:
  1. At the start, always remember to enter the email address of the mailbox you’re working on, and the path to the archive file being imported.

  2. As can be seen in the code, in addition to the documentation lines there are actual code lines marked with a hash sign or block comment markers (<# #>) that tell the computer to ignore them as code. This all stems from the fact that the process is too long and complex to run all at once. In the vast majority of cases this won’t work - the process will get stuck in the middle and not continue.

  3. The process is essentially a workaround that rides on top of Outlook through PowerShell. Both programs were not designed to process such amounts of data at once. Therefore the code needs to be run in sections, and sometimes re-runs are needed.

  4. On the first run it’s recommended to only collect indexes: an index of the mailbox, an index of the archive, and the derived index of mailbox shortcuts that have a matching item in the archive. All three are exported to files on the desktop. To stop there, remove the leading hash from the line reading #<# that follows the Print-Output call for message number 9. The line then opens a block comment, so the main transfer loop is ignored and the process stops after collecting the indexes and exporting them to files.

Process Stalls:
  1. After every run cycle it’s recommended to restart the server before another run. This resets and clears the memory - which is the first source of process stalls.

  2. So, after collecting an index and exporting, restart. After the restart, verify that the lines generating the index are marked with hash signs. In their place, only import from the files already sitting on the desktop.

  3. If the process gets stuck in the middle, check if the disk is full. I expanded the servers I worked with to 2TB disk size. The process spreads out the objects it traverses, from the archive on the way to the mailbox. These are stored in a very expanded form - temporarily - in Outlook’s cache file inside the Appdata folder. If you see that space has run out, expanding the disk size will allow it to continue running.

  4. If the process stalls due to a RAM issue or something else that’s not visible, restart the server. Then, before another run, it’s advisable to delete the archive and EV index files. When recreating them you’ll see that the numbers are smaller, because all the items that managed to transfer have been subtracted from the count.
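The habit of importing an index from its file when one exists, and regenerating it only when needed, can be sketched as a small helper instead of toggling comment markers by hand. This is an assumption-laden sketch rather than part of the script: the name Get-CachedIndex and its parameters are hypothetical, and the index builder is passed in as a script block.

```powershell
# Hypothetical helper: reuse an index file if it exists, otherwise build it and save it.
function Get-CachedIndex {
    param (
        [Parameter(Mandatory = $true)] [string] $Path,
        [Parameter(Mandatory = $true)] [scriptblock] $Builder,
        [switch] $Rebuild
    )

    # Reuse the saved file unless a rebuild was requested explicitly.
    if ((Test-Path -LiteralPath $Path) -and -not $Rebuild) {
        return @(Import-Csv -LiteralPath $Path)
    }

    # Run the (slow) indexing step once and keep its result for later runs.
    $index = @(& $Builder)
    $index | Export-Csv -LiteralPath $Path -NoTypeInformation -Encoding UTF8
    return $index
}
```

Deleting an index file, or passing -Rebuild, forces a fresh index on the next run. Values read back from a CSV file are strings, which matters when they are later compared with live item properties.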

Saving Runtime:
  1. There’s almost never a reason to take a fresh index of the mailbox. Once is enough and sufficient - don’t waste time.

  2. Each time you run the process, pay attention to which indexes you’re importing from the existing files and which you’re regenerating. The markings on these lines are important, and you need to know what you’re doing.

  3. The section that calls the function for moving items without shortcuts in the mailbox is commented out for a reason. Even if the shortcut-matching pass finished without stalling, it may not have moved everything, so another run of that pass is needed first. Otherwise the general copy would pick up items that do have a shortcut and place them in the folder parallel to their archive location, which is not necessarily the folder where their shortcut is located.

  4. At the end of a process, the archive file should be empty (sometimes a few runs and marking changes are needed to get there). When running the process again, the archive index should show the number 0 - because it couldn’t find any item to add to the index.

Code Analysis:

First Function - Output and Control:

Any process somewhat longer than a 50-line script needs some level of control - to verify that everything is working properly. If the process is long and takes time, it’s useful to occasionally get an indication of what stage the process is at. Additionally, when there are errors, you need to know what they are and where they got stuck.

Therefore the first function is a function that the process calls at various stages. At each such stage it throws a certain output - both to the screen and into a log file. The output gives a timestamp and details about what’s happening.
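A compact variant of such a logging function might look like the sketch below. It is not the script's Print-Output: the name Write-StageLog, its parameters and the default log location are assumptions made for illustration.

```powershell
# Hypothetical helper: write a timestamped message to the screen and append it to a log file.
function Write-StageLog {
    param (
        [Parameter(Mandatory = $true)] [string] $Message,
        $Value = $null,
        [string] $LogPath = (Join-Path ([System.IO.Path]::GetTempPath()) 'pst-import.log')
    )

    $stamp = Get-Date -Format 'yyyy-MM-dd HH:mm:ss'
    $line = if ($null -ne $Value) { "$stamp  $Message $Value" } else { "$stamp  $Message" }

    # Write-Host prints to the screen without adding anything to the function's output.
    Write-Host $line
    Add-Content -LiteralPath $LogPath -Value $line -Encoding UTF8
}

# Example: Write-StageLog -Message 'PST Items' -Value 1234
```

Using Write-Host keeps the log text out of the pipeline, so calling the logger inside another function does not change what that function returns.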

Second Function - Index:

For every comparison and check between the mailbox and the archive, we need to know what’s in the mailbox and what’s in the archive. Therefore the first stage is to create a lightweight index of the mailbox and archive contents - so that we can locate emails by certain attributes.

The process is carried out using a recursive function that traverses all folders in the mailbox (or archive). It is recursive because every folder must be checked for sub-folders, and every sub-folder must in turn be checked for sub-folders of its own. After the sub-folders have been handled, all items in the current folder are traversed and several of their attributes are collected. The attributes to collect:

  • Sent date - accurate to hundredths of a second, and can serve as a unique identifier for each item.
  • Subject/title of the message/item - adds another attribute for checking uniqueness and identification of the item.
  • Type of item - to distinguish between a real item and a shortcut to the vault.
  • For each item, an attribute is added - the full path of the folder where the item resides.
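Because the sent date serves as an identifier, it has to be rendered as text in exactly the same way wherever it is compared. The script relies on a fixed position in the list returned by GetDateTimeFormats(), and that list depends on the culture settings of the machine. A possible alternative, shown here only as a sketch with the hypothetical name Format-SentOn, is to format the date explicitly with the invariant culture. It assumes the index would store the same formatted value.

```powershell
# Hypothetical helper: format a sent date identically regardless of the machine's culture.
function Format-SentOn {
    param ([Parameter(Mandatory = $true)] [datetime] $SentOn)

    return $SentOn.ToString('yyyy-MM-dd HH:mm:ss', [System.Globalization.CultureInfo]::InvariantCulture)
}

# Example:
Format-SentOn -SentOn ([datetime]::new(2020, 3, 5, 14, 7, 9))   # → 2020-03-05 14:07:09
```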

Initially the recursive function was calling an additional function at each folder to perform the actual indexing process. As part of optimizing the process and shortening runtime, I simply inserted the indexing process at the end of the recursive function - after the part where it checks if there are sub-folders.

Additionally, I decided it was unnecessary to collect data on all items in the mailbox. It’s sufficient to only collect an index of vault-shortcuts in the mailbox. After all, the mailbox data only serves me to know where to transfer the items that have shortcuts.

The recursive function receives two parameters and, rather than returning a value, appends to one of two global variables. The parameters are:

  • The folder in which the indexing process is being conducted.
  • The primary source being indexed - whether it’s the mailbox or the archive.
    According to the source, the function feeds the collected index data into one of two global index variables: archive index, or mailbox index.
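The recursive traversal can be illustrated on any object tree that exposes Name, FolderPath, Folders and Items, which is the shape of an Outlook folder. In the sketch below the function name Get-FolderIndex and its parameters are hypothetical. Unlike the script, it emits the entries as output instead of appending to a global variable, which is one possible design and not the author's.

```powershell
# Hypothetical sketch: walk a folder tree and emit one index entry per item.
function Get-FolderIndex {
    param (
        [Parameter(Mandatory = $true)] $Folder,
        [string[]] $Exclude = @(),
        [string] $MessageClass = ''
    )

    # Skip excluded folders together with everything beneath them.
    if ($Folder.Name -in $Exclude) { return }

    # Descend into the sub-folders first.
    foreach ($sub in $Folder.Folders) {
        Get-FolderIndex -Folder $sub -Exclude $Exclude -MessageClass $MessageClass
    }

    # Then list the items of the current folder, optionally filtered by message class.
    foreach ($item in $Folder.Items) {
        if ($MessageClass -and $item.MessageClass -ne $MessageClass) { continue }
        [pscustomobject]@{
            SentOn       = $item.SentOn
            Subject      = $item.Subject
            MessageClass = $item.MessageClass
            Path         = $Folder.FolderPath
        }
    }
}
```

Against a real mailbox, a call such as Get-FolderIndex -Folder $root -MessageClass 'IPM.Note.EnterpriseVault.Shortcut' would collect only the vault shortcuts, while omitting -MessageClass collects every item, as needed for the archive.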

Third Function - Finding a Folder:

The process operates using an index. Otherwise the process duration would be endless - if it managed to work at all, without crashing when loading all objects into memory.

Therefore in many cases the process knows where a specific item resides - according to the index. Then the item itself needs to be handled, and for that the actual folder object in which the item resides needs to be found.

The third function receives two parameters:

  • A primary source to search in - mailbox or archive.
  • A path along which the search is conducted until the folder itself is found.

The function returns the requested folder as an object.
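A folder lookup by path can be sketched as follows. The name Find-FolderByPath is hypothetical and the code is an illustration rather than the script's Find-Folder: it descends only into branches whose path is a prefix of the requested path, and it stops as soon as the folder is found. It assumes every folder object exposes FolderPath and Folders, as Outlook folders do.

```powershell
# Hypothetical sketch: find a folder object by its full path, following only matching branches.
function Find-FolderByPath {
    param ($Folder, [string] $Path)

    if ($Folder.FolderPath -eq $Path) { return $Folder }

    foreach ($sub in $Folder.Folders) {
        $prefix = $sub.FolderPath + '\'
        if ($sub.FolderPath -eq $Path -or
            $Path.StartsWith($prefix, [System.StringComparison]::OrdinalIgnoreCase)) {
            return (Find-FolderByPath -Folder $sub -Path $Path)
        }
    }
    return $null
}
```

Returning $null for a missing path lets the caller decide whether to create the folder or to log the item as failed.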

Fourth Function - Path Verification:

As mentioned above, the archive also contains components that don’t exist as shortcuts in the mailbox - either because they were deleted in the past or for other reasons. Since there’s no shortcut in the mailbox, these items move to the mailbox at the same location they were in the archive. And if that location path doesn’t exist, the function creates the path up to the last folder in it - the folder to which the item should be transferred.

The function receives as parameters:

  • A specific folder to start searching within.
  • A path at the end of which the target folder must be found - or the path created until reaching the folder at the end of it.

The function returns as an object the folder at the end of the path - from within the mailbox.
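The path verification logic can be tried on a stand-in folder tree. In the sketch below the names New-MockFolder and Resolve-FolderPath are hypothetical, and the mock replaces Outlook: with a real mailbox, the folder would be created through the Folders.Add method of the parent folder. One detail worth showing is that any value left uncaptured inside a PowerShell function becomes part of its output, so the result of Add is discarded explicitly.

```powershell
# Hypothetical mock of a folder, standing in for an Outlook folder object.
function New-MockFolder {
    param ([string] $Name)
    return [pscustomobject]@{ Name = $Name; Folders = (New-Object System.Collections.ArrayList) }
}

# Walk the path below the store root, creating any folder that is missing on the way.
function Resolve-FolderPath {
    param ($Root, [string] $Path)

    # '\\store\Inbox\Sub' splits into '', '', 'store', 'Inbox', 'Sub'; skip the first three parts.
    $parts = @(($Path -split '\\') | Select-Object -Skip 3 | Where-Object { $_ })

    $current = $Root
    foreach ($name in $parts) {
        $next = $current.Folders | Where-Object { $_.Name -eq $name } | Select-Object -First 1
        if ($null -eq $next) {
            $next = New-MockFolder -Name $name
            [void] $current.Folders.Add($next)   # discard the return value so it is not emitted
        }
        $current = $next
    }
    return $current
}

# Example:
$mailboxRoot = New-MockFolder -Name 'root'
$target = Resolve-FolderPath -Root $mailboxRoot -Path '\\archive\Inbox\Projects'
```

Calling the function a second time with the same path returns the existing folder instead of creating a duplicate.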

Fifth Function - Copying Remaining Items:

Another recursive function whose purpose is to traverse a folder tree - usually the archive. The operation checks every folder in the tree and finally performs an operation on that folder.

The function traverses the folder tree and at each folder checks whether there are sub-folders. If there are, the recursive function is invoked on each sub-folder. After the check, a call is made to the fourth function - path verification. The folder returned from the function is marked as the target for transferring the items from the current folder. The folder returned from the fourth function should be the counterpart in the mailbox to the current folder from the archive.

Sixth Function - Remaining Shortcut Items:

The general item transfer process works in groups and at large scale. For various reasons it misses individual items here and there. In the end a few items remain - sometimes as many as 100, but in any case items that would not get through the regular process - and this function ensures those get transferred too.

The function receives an index as a parameter - usually the index of shortcuts in the mailbox. It then goes through every item remaining in that index, finds the archived item the shortcut points to, and copies it to the location of the shortcut in the mailbox.

Warning!

Do not attempt to copy the entire archive through this function - it will take too long. The general process is designed to do this efficiently.

Seventh Function - The Few Remaining in the Archive:

The regular process for transferring items from the archive to the mailbox should also go through every item in every folder. In practice it sometimes misses a few. The seventh function is for those.

Its method differs from the general process. Instead of going through every folder and emptying it of items, the function goes through the archive index. For each item in the index it finds the actual object in the archive and transfers it to the mailbox.

Warning!

Do not attempt to copy the entire archive through this function - it will take too long. The general process is designed to do this efficiently.

The General Process:

Preparation:

  • Initially two variables are defined - the user to whom the target mailbox belongs, and the path to the archive file.
  • The general process starts a minimized Outlook instance and holds it in a variable. The target user’s mailbox and the archive file are loaded into this instance.
  • The process sets two variables as the root folders - the mailbox root and the archive root. These folders serve as the source for accessing the folder tree and objects.
  • Additional variables required for the process are defined - such as the folders excluded from processing, various counters, and global variables for the index and for error collection.
  • A set of message strings is defined for output. The messages appear on screen and go to the logs, marking the various stages of the process.
  • A call to the second function - creating an index of all shortcut items in the mailbox.
  • A call to the second function - creating an index of all items in the archive.
  • Creating a variable containing an index of all shortcuts that have a matching item in the archive. From this point on, this index is what represents the mailbox contents; the rest is not relevant.
  • Each index is exported to a CSV file and then imported back into the process. There are two reasons. First, there’s no need to index the mailbox more than once: if the process stalls, the index is still available in the file, and a subsequent run can import it and save the time it takes to build it. Second, the date format changes on export. The dates in an index read back from a file don’t match the dates in a live index that was just collected, so comparing the two gives a negative result every time, even for the same point in time. Therefore all indexes pass through the export and import step, which gives them a uniform format. The remaining gap between the index format and the format of the live items in the folders is accounted for when looking up items in the folders.
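
The effect of the CSV round trip can be seen in a small sketch. The function name Convert-IndexThroughCsv, the property names and the file names are assumptions for illustration. Import-Csv returns every column as text, so a DateTime that went into the file comes back as a String; passing both indexes through the same step puts their dates into one text format.

```powershell
# Hypothetical helper: writes an index to CSV and reads it straight back,
# so every index ends up in the same all-text form.
function Convert-IndexThroughCsv {
    param(
        [Parameter(Mandatory)] [object[]] $Index,
        [Parameter(Mandatory)] [string] $Path
    )
    $Index | Export-Csv -Path $Path -NoTypeInformation -Encoding UTF8
    Import-Csv -Path $Path
}

# Illustrative index rows; the property names are assumptions.
$when          = [datetime]'2019-03-14T09:26:53'
$shortcutIndex = @([pscustomobject]@{ Subject = 'Quarterly report'; ReceivedTime = $when })
$archiveIndex  = @([pscustomobject]@{ Subject = 'Quarterly report'; ReceivedTime = $when })

$tmp       = [System.IO.Path]::GetTempPath()
$shortcuts = @(Convert-IndexThroughCsv -Index $shortcutIndex -Path (Join-Path $tmp 'shortcuts-demo.csv'))
$archived  = @(Convert-IndexThroughCsv -Index $archiveIndex  -Path (Join-Path $tmp 'archive-demo.csv'))

$shortcutIndex[0].ReceivedTime.GetType().Name   # DateTime in the live index
$shortcuts[0].ReceivedTime.GetType().Name       # String after the round trip

# Both sides now carry the date in the same text format.
$shortcuts[0].ReceivedTime -eq $archived[0].ReceivedTime
```

The files are left in place on purpose in this sketch, since the point of the step is that a later run can import them instead of indexing again.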

Sorting and Processing:

  • Grouping all shortcuts in the mailbox by their folder path.
  • Going through each of the groups created. Each group corresponds to one transfer target: the folder to which all items whose shortcuts reside there need to be copied. This avoids searching for the target folder again for every item - the third function is called once per group and returns the folder.
  • Within each such group of shortcuts, comparing against the archive index to find all archived items that the shortcuts point to.
  • Dividing the matching items into groups according to the path where each item resides in the archive - so each group has exactly one source folder.
  • For each such group, calling the third function to obtain the source folder from which to copy the requested items.
  • Collecting the items in the folder that the group represents, and transferring them one by one to the target folder.
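
The two levels of grouping can be sketched with Group-Object. The index rows and property names below are invented for illustration, and matching shortcuts to archived items by subject alone is a simplifying assumption; a real index would need a more reliable key.

```powershell
# Illustrative index rows; property names and the matching key are assumptions.
$shortcutIndex = @(
    [pscustomobject]@{ Subject = 'A'; Path = 'Inbox' }
    [pscustomobject]@{ Subject = 'B'; Path = 'Inbox\Projects' }
    [pscustomobject]@{ Subject = 'C'; Path = 'Inbox' }
)
$archiveIndex = @(
    [pscustomobject]@{ Subject = 'A'; Path = 'Archive\2018' }
    [pscustomobject]@{ Subject = 'B'; Path = 'Archive\2018' }
    [pscustomobject]@{ Subject = 'C'; Path = 'Archive\2019' }
)

$plan = foreach ($targetGroup in ($shortcutIndex | Group-Object -Property Path)) {
    # One target folder lookup per group would happen here.
    $subjects = $targetGroup.Group.Subject
    $matching = $archiveIndex | Where-Object { $subjects -contains $_.Subject }

    foreach ($sourceGroup in ($matching | Group-Object -Property Path)) {
        # One source folder lookup per group would happen here.
        [pscustomobject]@{
            Target = $targetGroup.Name
            Source = $sourceGroup.Name
            Items  = $sourceGroup.Count
        }
    }
}
$plan | Format-Table -AutoSize
```

Each row of the resulting plan is one source folder and target folder pair, so both folder lookups happen once per pair rather than once per item.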

Dividing Large Groups:

  • Large folders - for example Inbox, Outbox or Deleted Items - may hold tens of thousands of items. Loading them all at once burdens the memory and stalls the process until it stops in the middle without warning. Therefore a mechanism was created that divides the items of a group into fixed-size portions - 5,000 items, for example, or even 1,000.
  • For each portion, the actual items that correspond to the index entries in that portion are collected and transferred.
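
Dividing a group into fixed portions can be sketched like this. Split-IntoPortions is a hypothetical name, and a range of numbers stands in for the index entries of one large folder.

```powershell
# Hypothetical helper: yields fixed-size slices of a large list.
function Split-IntoPortions {
    param(
        [Parameter(Mandatory)] [object[]] $Items,
        [ValidateRange(1, 2147483647)] [int] $Size = 1000
    )
    for ($start = 0; $start -lt $Items.Count; $start += $Size) {
        $end = [Math]::Min($start + $Size, $Items.Count) - 1
        , $Items[$start..$end]   # the leading comma emits each slice as one array
    }
}

# 2,500 stand-in index entries, split into portions of 1,000.
$indexRows = 1..2500
$portions  = @(Split-IntoPortions -Items $indexRows -Size 1000)
$portions | ForEach-Object { $_.Count }
```

Only one portion needs to be matched against live items at a time, which is what keeps the memory use bounded.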

Error Management:

  • Each transfer runs inside a try-catch block. If the try succeeds, the process moves on; if it fails, the catch defines what happens on error.
  • The catch saves the index entry of the item that failed to transfer, then adds the error that prevented the transfer both to the entry’s attributes and to the operation log file, together with a timestamp.
  • If the transfer succeeded, the transfer counter records that another shortcut transfer completed successfully.
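
The error handling can be sketched as follows. Everything named here is an assumption for illustration: Copy-DemoItem stands in for the real copy operation and fails on purpose for one entry, and the log file name and property names are invented.

```powershell
$failed      = [System.Collections.Generic.List[object]]::new()
$transferred = 0
$logFile     = Join-Path ([System.IO.Path]::GetTempPath()) 'transfer-demo.log'

# Stand-in for the real copy operation; fails on purpose for one entry.
function Copy-DemoItem {
    param($Entry)
    if ($Entry.Subject -eq 'Corrupt') { throw 'Item could not be opened.' }
}

$portion = @(
    [pscustomobject]@{ Subject = 'Hello' }
    [pscustomobject]@{ Subject = 'Corrupt' }
    [pscustomobject]@{ Subject = 'Minutes' }
)

foreach ($entry in $portion) {
    try {
        Copy-DemoItem -Entry $entry
        $transferred++                      # counted only when the copy succeeded
    }
    catch {
        $message = $_.Exception.Message
        $stamp   = Get-Date -Format 'yyyy-MM-dd HH:mm:ss'
        # Attach the error and the time to the index entry, then log it.
        Add-Member -InputObject $entry -NotePropertyName Error    -NotePropertyValue $message -Force
        Add-Member -InputObject $entry -NotePropertyName FailedAt -NotePropertyValue $stamp   -Force
        $failed.Add($entry)
        Add-Content -Path $logFile -Value "$stamp  $($entry.Subject): $message"
    }
}

"Transferred: $transferred, failed: $($failed.Count)"
```

Because the failed entries are collected with their error attached, they can later be handed to a slower, item-by-item pass instead of being lost.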

The Last Part of the Code:

  • The loop thus goes through all target groups and, within each target group, through all source groups. Within each large source group, an inner loop goes through the portions the group was divided into - until the process ends.
  • Once the shortcuts have been handled, a call is made to the function that transfers the items that have no shortcut.
  • Output is written to the screen and to the logs about the items that were copied.
  • Optionally, the functions that transfer the remaining items can be called.