Modularity and Scalability

And a script to update a distribution group based on criteria

In the last post, Before you automate, I promised an example of how modularity matters for scalability. I said that modularity simplifies scaling, and that a little thought applied to an automation solution well in advance saves a lot of effort. This way, we write less and do more.

For instance, I once received a request to create a script that would update a distribution group every day, based on data from Oracle Identity Manager (OIM), a system the client used to automate parts of user account provisioning. OIM spoke to the application used by the Human Resources team and created a CSV feed file with a bunch of necessary information. The requirement was simple:

  1. Open the feed and check which employees are active in the system.
  2. See which employees are inactive in the system.
  3. Add the mailboxes of employees who were newly added to the system.
  4. Remove the mailboxes of employees who have been terminated.

Very straightforward, right? The group was to be called, say, All Contoso. The CSV feed was situated at \\server\e$\share\HR_OIM_Update.csv. Here is the basic flow you would use to handle this request:

  1. Import the AD module.
  2. Import the contents of the feed file into a variable.
  3. List out active members using the column, say, CNT_Status; active accounts will contain the value, Active.
  4. List out the inactive members—everyone whose CNT_Status value is not Active.
  5. List out the current members of All Contoso.
  6. Check if the active members are part of the group; if not, add them to the group.
  7. Check if any of the members of the group is not part of the active accounts, or if any of the members of the group does not exist in the list at all; remove these members from the group.

No rocket science [1]. You would be tempted to write a script like this:

try {
    Import-Module ActiveDirectory -ErrorAction Stop
}
catch {
    Write-Error 'Unable to import the ActiveDirectory module'
}
try {
    $Members = Import-Csv -Path '\\server\e$\share\HR_OIM_Update.csv' -ErrorAction Stop
}
catch {
    Write-Error 'Unable to work the input file'
}

$FeedActive = ($Members | Where-Object CNT_Status -eq 'Active' | ForEach-Object { Get-ADUser $PsItem.User_Name }).SamAccountName
$FeedTermed = ($Members | Where-Object CNT_Status -ne 'Active' | ForEach-Object { Get-ADUser $PsItem.User_Name }).SamAccountName
$CurrentMembers = (Get-ADGroupMember -Identity 'All Contoso').SamAccountName

foreach ($MemberAdded in $FeedActive) {
    if ($MemberAdded -notin $CurrentMembers) {
        try {
            Add-ADGroupMember -Identity 'All Contoso' -Members $MemberAdded -Confirm:$false -ErrorAction Stop
        }
        catch {
            Write-Error "Error adding $MemberAdded to All Contoso"
        }
    }
}

foreach ($MemberRemoved in $CurrentMembers) {
    if (($MemberRemoved -notin $FeedActive) -or ($MemberRemoved -in $FeedTermed)) {
        try {
            Remove-ADGroupMember -Identity 'All Contoso' -Members $MemberRemoved -Confirm:$false -ErrorAction Stop
        }
        catch {
            Write-Error "Error removing $MemberRemoved from All Contoso"
        }
    }
}

Thirty-eight lines. Fifteen minutes to write it. You write it, explain the script to the requestor, and implement it using a change request.
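The add-and-remove decisions at the heart of that script can be tried without touching Active Directory. The following is a minimal sketch on plain arrays; the account names are made up for illustration, and the variable names simply mirror the ones in the script above.

```powershell
# Hypothetical sample data standing in for the feed and the group
$FeedActive     = 'asmith', 'bjones', 'cwhite'
$FeedTermed     = 'dgreen'
$CurrentMembers = 'bjones', 'dgreen', 'eblack'

# Active in the feed but not yet in the group
$ToAdd = $FeedActive | Where-Object { $PSItem -notin $CurrentMembers }

# In the group but not active in the feed (termed, or missing from the feed)
$ToRemove = $CurrentMembers | Where-Object {
    ($PSItem -notin $FeedActive) -or ($PSItem -in $FeedTermed)
}

"Add:    $($ToAdd -join ', ')"
"Remove: $($ToRemove -join ', ')"
```

With this sample data, asmith and cwhite would be added, while dgreen and eblack would be removed.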

A little twist to the story

All right: one feed, one group, one script. Problem solved. However, what if there were four groups that were to be updated using four different feeds? It is quite fair to think you could go to the end of the current script, add a couple of empty lines, copy-paste the contents of the entire script file, make modifications to the group name and the UNC path of the feed, and you are done. You could do this two more times for the other two feed-group combinations. Of course, now you have 152 lines of script, but hey, only a fourth of it is original. The rest simply follows the same model.

There is nothing crazy about that approach. I have seen scripts like that, which many enterprises seem to accept, of course, primarily because most clients do not have administrators who actually know PowerShell. And let’s face it, many of those who are “comfortable” with PowerShell are those who have a basic working overview of PowerShell, and don’t want to touch a block of code that works perfectly. Therefore, many would be more than happy to copy-paste the code block thrice in the script and make little changes to the path and the group name.

Those who understand PowerShell—and know how to create modular code—would probably write a script like this:

function Update-DlFromFeed {
    param(
        # The path to the OIM file
        [Parameter(Mandatory=$true)]
        [String]
        $FeedFilePath,

        # The name of the DL you'd like modified
        [Parameter(Mandatory=$true)]
        [String]
        $GroupName
    )
    begin {
        try {
            Import-Module ActiveDirectory -ErrorAction Stop
        }
        catch {
            Write-Error 'Unable to import the ActiveDirectory module'
            break
        }
        try {
            $InputPath = "$env:TEMP\input.csv"
            Copy-Item -Path $FeedFilePath -Destination $InputPath -ErrorAction Stop
            $Members = Import-Csv -Path $InputPath -ErrorAction Stop
        }
        catch {
            Write-Error 'Unable to work the input file'
            break
        }
    }
    process {
        # Listing active employees
        $FeedActive = ($Members | Where-Object CNT_Status -eq 'Active' | ForEach-Object { Get-ADUser $PsItem.User_Name }).SamAccountName
        # Picking members who are not active (are termed)
        $FeedTermed = ($Members | Where-Object CNT_Status -ne 'Active' | ForEach-Object { Get-ADUser $PsItem.User_Name }).SamAccountName
        # Listing current members
        $CurrentMembers = (Get-ADGroupMember -Identity $GroupName).SamAccountName

        # Listing out the members to be added
        foreach ($MemberAdded in $FeedActive) {
            if ($MemberAdded -notin $CurrentMembers) {
                try {
                    Add-ADGroupMember -Identity $GroupName -Members $MemberAdded -Confirm:$false -ErrorAction Stop
                }
                catch {
                    Write-Error "Error adding $MemberAdded to $GroupName"
                }
            }
        }

        # Listing out the members to be removed
        foreach ($MemberRemoved in $CurrentMembers) {
            if (($MemberRemoved -notin $FeedActive) -or ($MemberRemoved -in $FeedTermed)) {
                try {
                    Remove-ADGroupMember -Identity $GroupName -Members $MemberRemoved -Confirm:$false -ErrorAction Stop
                }
                catch {
                    Write-Error "Error removing $MemberRemoved from $GroupName"
                }
            }
        }
    }
}
Update-DlFromFeed -FeedFilePath '\\server\e$\share\HR_OIM_UpdateOne.csv' -GroupName 'Test All Contoso One'

You may say, ‘That is an additional 28 lines of code; you almost doubled the size of the script!’

Yes, however, the majority of these 28 lines [2] are either almost second nature to us, or are snippets, such as defining a function, defining its parameters, writing hint text, etc., and strategic blocks such as begin and process. Maybe some comments here and there. The difference is not major.

Here is the gist of how we do it: First, we define the function. Then we specify the parameters. The function needs two things, the same two things we changed when making the four copies: the UNC path to the feed, and the name of the group. Therefore, these become the parameters.

Next, we write two important statements in the begin block: importing the AD module, and importing the contents of the CSV. These are written in the begin block with a break in the catch block, so that the function execution stops if either of these tasks ends in an error. The rest of the body is almost identical, with the last line being the most significant difference: the function is called along with the UNC path and the group name.
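A caveat about break, offered as an aside rather than as a change to the script above: outside a loop or a switch, break does not stop just the function. PowerShell looks up the call stack for an enclosing loop, and if it finds none, the script itself stops running. If you would rather let the caller decide what happens next, one alternative is to throw from the begin block. The sketch below is only an illustration under assumptions: Test-BeginFailure is a hypothetical function, and the feed path is a made-up file that is assumed not to exist.

```powershell
function Test-BeginFailure {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory = $true)]
        [string]
        $FeedFilePath
    )
    begin {
        try {
            $Members = Import-Csv -Path $FeedFilePath -ErrorAction Stop
        }
        catch {
            # A terminating error: the process block below never runs
            throw "Unable to work the input file '$FeedFilePath'"
        }
    }
    process {
        "Imported $(@($Members).Count) record(s)"
    }
}

# Hypothetical path, assumed to be missing
$MissingFeed = Join-Path ([IO.Path]::GetTempPath()) 'no-such-feed.csv'

# The caller catches the error and decides how to carry on
$Outcome = try {
    Test-BeginFailure -FeedFilePath $MissingFeed
}
catch {
    "Caught: $($PSItem.Exception.Message)"
}

$Outcome
'The script continues after the failed call'
```

With throw, a failed feed stops only that call, provided the caller wraps it in try and catch; the remaining calls in the script can still run.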

Now, if you wanted to perform the same action on three more groups using three more feeds, all you would have to do is add three more lines to the end!

Update-DlFromFeed -FeedFilePath '\\server\e$\share\HR_OIM_UpdateTwo.csv' -GroupName 'Test All Contoso Two'
Update-DlFromFeed -FeedFilePath '\\server\e$\share\HR_OIM_UpdateSix.csv' -GroupName 'Test All Contoso Six'
Update-DlFromFeed -FeedFilePath '\\server\e$\share\HR_OIM_UpdateTen.csv' -GroupName 'Test All Contoso Ten'

The next level

Now to the litmus test.

Imagine that you received a request to extend the capabilities of the script, in such a way that the script uses the same feed file, but updates eleven distribution lists, based on what value a certain column contains for a certain user.

For instance, let us consider the column is called Loc_Code, which may contain values such as UTAH, or MSCT, or TEXS—these should be three different groups. Imagine seven more of these. Finally, one of the eleven groups should contain all the active members that are part of the feed file.

Try handling this request with the monolith (now, 440 lines). What happens when the UNC path of the file changes? What happens if one of the functionalities used by the function is changed at some point in the future? How would you fix errors?

If your script were modular, all you would have to do is:

  1. Create two parameter sets: one with the filter string and one without.
  2. Assign parameters to the parameter sets.
  3. Add a single branching block to perform the conditional filtration.

Seriously, that is all you would have to do. No Ctrl+V, no Ctrl+H. The best part is that your script will be backward-compatible: nothing changes in the current functionality. You can continue to call the script with just the UNC path and the name of the group, and things will work as smoothly as before. However, if you choose to also throw in the column name and the filter string, that will work as well.
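To see the parameter-set mechanics in isolation, here is a minimal sketch that does nothing but report which set PowerShell resolved. Show-ParameterSet is a hypothetical function written for this illustration; it relies on the fact that a parameter declared without a ParameterSetName belongs to every set.

```powershell
function Show-ParameterSet {
    [CmdletBinding(DefaultParameterSetName = 'NoFilter')]
    param(
        # No ParameterSetName here, so this parameter is part of both sets
        [Parameter(Mandatory = $true)]
        [string]
        $GroupName,

        [Parameter(Mandatory = $true, ParameterSetName = 'Filter')]
        [string]
        $ColumnName,

        [Parameter(Mandatory = $true, ParameterSetName = 'Filter')]
        [string]
        $FilterString
    )
    # Report which parameter set was resolved for this call
    "$GroupName -> $($PSCmdlet.ParameterSetName)"
}

# The old calling style still works and lands in the default set
$First = Show-ParameterSet -GroupName 'All Contoso'

# Supplying the filter parameters selects the other set
$Second = Show-ParameterSet -GroupName 'Contoso UTAH' -ColumnName 'Loc_Code' -FilterString 'UTAH'

$First
$Second
```

The first call resolves to NoFilter and the second to Filter, which is what keeps the existing calls working unchanged.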

Here is the modified script:

function Update-DlFromFeed {
    [CmdletBinding(DefaultParameterSetName = "NoFilter")]
    param(
        # The path to the OIM file
        [Parameter(Mandatory=$true, ParameterSetName='NoFilter')]
        [Parameter(Mandatory=$true, ParameterSetName='Filter')]
        [String]
        $FeedFilePath,

        # The name of the DL you'd like modified
        [Parameter(Mandatory=$true, ParameterSetName='NoFilter')]
        [Parameter(Mandatory=$true, ParameterSetName='Filter')]
        [String]
        $GroupName,

        # Column name
        [Parameter(Mandatory=$true, ParameterSetName='Filter')]
        [string]
        $ColumnName,

        # Filter string
        [Parameter(Mandatory=$true, ParameterSetName='Filter')]
        [string]
        $FilterString
    )
    begin {
        try {
            Import-Module ActiveDirectory -ErrorAction Stop
        }
        catch {
            Write-Error 'Unable to import the ActiveDirectory module'
            break
        }
        try {
            $InputPath = "$env:TEMP\input.csv"
            Copy-Item -Path $FeedFilePath -Destination $InputPath -ErrorAction Stop
            $Members = Import-Csv -Path $InputPath -ErrorAction Stop

            if ($FilterString) {
                $Members = $Members | Where-Object $ColumnName -eq $FilterString
            }
        }
        catch {
            Write-Error 'Unable to work the input file'
            break
        }
    }
    process {
        # Listing active employees
        $FeedActive = ($Members | Where-Object CNT_Status -eq 'Active' | ForEach-Object { Get-ADUser $PsItem.User_Name }).SamAccountName
        # Picking members who are not active (are termed)
        $FeedTermed = ($Members | Where-Object CNT_Status -ne 'Active' | ForEach-Object { Get-ADUser $PsItem.User_Name }).SamAccountName
        # Listing current members
        $CurrentMembers = (Get-ADGroupMember -Identity $GroupName).SamAccountName

        # Listing out the members to be added
        foreach ($MemberAdded in $FeedActive) {
            if ($MemberAdded -notin $CurrentMembers) {
                try {
                    Add-ADGroupMember -Identity $GroupName -Members $MemberAdded -Confirm:$false -ErrorAction Stop
                }
                catch {
                    Write-Error "Error adding $MemberAdded to $GroupName"
                }
            }
        }

        # Listing out the members to be removed
        foreach ($MemberRemoved in $CurrentMembers) {
            if (($MemberRemoved -notin $FeedActive) -or ($MemberRemoved -in $FeedTermed)) {
                try {
                    Remove-ADGroupMember -Identity $GroupName -Members $MemberRemoved -Confirm:$false -ErrorAction Stop
                }
                catch {
                    Write-Error "Error removing $MemberRemoved from $GroupName"
                }
            }
        }
    }
}

How much did I actually change in the function? Fourteen lines, most of which were used up by the param() block. The benefit, though, is that the function now supports filtering, and the filtering is just as scalable. Here is a visual of all of the modifications:

[Figure: Diff between the two stages of scaling]

Now, structure the function calls the following way. Of course, the calls need not be written by hand like this: you could keep this information in a CSV or a JSON file and write a wrapper function to call Update-DlFromFeed.

function main {
    $FeedFilePath = '\\server\e$\share\HR_OIM_Update.csv'

    Update-DlFromFeed -FeedFilePath $FeedFilePath -GroupName 'All Contoso'
    Update-DlFromFeed -FeedFilePath $FeedFilePath -GroupName 'Contoso MSCT' -ColumnName 'Loc_Code' -FilterString 'MSCT'
    Update-DlFromFeed -FeedFilePath $FeedFilePath -GroupName 'Contoso UTAH' -ColumnName 'Loc_Code' -FilterString 'UTAH'
    Update-DlFromFeed -FeedFilePath $FeedFilePath -GroupName 'Contoso WICN' -ColumnName 'Loc_Code' -FilterString 'WICN'
    Update-DlFromFeed -FeedFilePath $FeedFilePath -GroupName 'Contoso CHGO' -ColumnName 'Loc_Code' -FilterString 'CHGO'
    Update-DlFromFeed -FeedFilePath $FeedFilePath -GroupName 'Contoso PNVN' -ColumnName 'Loc_Code' -FilterString 'PNVN'
    Update-DlFromFeed -FeedFilePath $FeedFilePath -GroupName 'Contoso TEXS' -ColumnName 'Loc_Code' -FilterString 'TEXS'
    Update-DlFromFeed -FeedFilePath $FeedFilePath -GroupName 'Contoso MNST' -ColumnName 'Loc_Code' -FilterString 'MNST'
    Update-DlFromFeed -FeedFilePath $FeedFilePath -GroupName 'Contoso NYRK' -ColumnName 'Loc_Code' -FilterString 'NYRK'
    Update-DlFromFeed -FeedFilePath $FeedFilePath -GroupName 'Contoso OHIO' -ColumnName 'Loc_Code' -FilterString 'OHIO'
} # DO NOT MAKE CHANGES beyond this point

function Update-DlFromFeed {
  # Function body from above
}

main # Call the main function - DO NOT DELETE THIS LINE

I have defined a main function at the beginning, and the call to main has been placed at the end of the script. This way, the configuration appears right at the top. Any and all changes that non-PowerShell-savvy sysadmins or non-sysadmins would need to make would be here. The actual function does not need to be touched at all, nor would anyone have to scroll to the bottom to make changes to the function calls. If more groups need to be added with or without filters, just add more lines with the appropriate parameters and their values in the main function.
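The CSV or JSON idea mentioned earlier could look something like the sketch below. Everything in it beyond Update-DlFromFeed and its four parameters is an assumption made for illustration: the JSON layout, the helper name Get-DlUpdateArgument, and the inline here-string (in practice the configuration would more likely sit in a file read with Get-Content -Raw). The helper turns each entry into a hashtable suitable for splatting.

```powershell
# Hypothetical configuration; the layout is an assumption for this sketch
$ConfigJson = @'
[
    { "GroupName": "All Contoso" },
    { "GroupName": "Contoso MSCT", "ColumnName": "Loc_Code", "FilterString": "MSCT" },
    { "GroupName": "Contoso UTAH", "ColumnName": "Loc_Code", "FilterString": "UTAH" }
]
'@

function Get-DlUpdateArgument {
    param(
        [Parameter(Mandatory = $true)]
        [string]
        $Json,

        [Parameter(Mandatory = $true)]
        [string]
        $FeedFilePath
    )
    foreach ($Entry in ($Json | ConvertFrom-Json)) {
        # Start with the parameters every call needs
        $Splat = @{
            FeedFilePath = $FeedFilePath
            GroupName    = $Entry.GroupName
        }
        # Add the filter parameters only when the entry defines both
        if ($Entry.ColumnName -and $Entry.FilterString) {
            $Splat.ColumnName   = $Entry.ColumnName
            $Splat.FilterString = $Entry.FilterString
        }
        $Splat
    }
}

$Calls = @(Get-DlUpdateArgument -Json $ConfigJson -FeedFilePath '\\server\e$\share\HR_OIM_Update.csv')

foreach ($Splat in $Calls) {
    if (Get-Command -Name Update-DlFromFeed -ErrorAction SilentlyContinue) {
        Update-DlFromFeed @Splat
    }
    else {
        # The function is not loaded here, so only show what would be called
        "Would call Update-DlFromFeed for '$($Splat.GroupName)' with $($Splat.Count) parameter(s)"
    }
}
```

Because entries without a filter produce only two parameters, the same loop exercises both parameter sets, and adding a group becomes a one-line change to the configuration rather than to the script.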

If this function is packaged in a module file, administrators could simply call the function like a cmdlet and use tab-completion or Get-Help to use the function without the need for your intervention.
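As a rough sketch of that packaging step, the snippet below writes a script module to a temporary folder, imports it, and lists what it exports. The module name DlTools, the temporary location, and the stand-in function body are all assumptions for illustration; the real Update-DlFromFeed definition would take the place of the stub.

```powershell
# Hypothetical module name and location; a temp folder keeps the sketch self-contained
$ModuleDir  = Join-Path ([IO.Path]::GetTempPath()) 'DlTools'
$ModulePath = Join-Path $ModuleDir 'DlTools.psm1'
New-Item -Path $ModuleDir -ItemType Directory -Force | Out-Null

# Stand-in body; the real function definition would go here instead
@'
function Update-DlFromFeed {
    <#
    .SYNOPSIS
    Updates a distribution group from a feed file.
    #>
    [CmdletBinding()]
    param(
        [Parameter(Mandatory = $true)]
        [string]
        $FeedFilePath,

        [Parameter(Mandatory = $true)]
        [string]
        $GroupName
    )
    "Would update '$GroupName' from '$FeedFilePath'"
}

Export-ModuleMember -Function Update-DlFromFeed
'@ | Set-Content -Path $ModulePath

# Import the module and list the commands it exposes
Import-Module -Name $ModulePath -Force
$Exported = (Get-Command -Module DlTools).Name
$Exported

# Comment-based help is now available to whoever uses the module
(Get-Help Update-DlFromFeed).Synopsis
```

For day-to-day use, the module folder would normally be placed in one of the locations listed in $env:PSModulePath so that administrators can import it by name.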

I hope this post helps demonstrate the importance of a loosely coupled, modular way of writing scripts. Remember, one of the reasons you automate things is that you care about efficiency. Efficiency is all about getting more done by doing less. Your scripts should exemplify this principle.

  1. At this point, it may seem like a no-brainer to empty the entire group and add only the active members back to it. However, understand that such feeds contain thousands of employee records, while the number of changes in a day would be one or two. It makes more sense to apply only those changes to the group than to nuke and pave it.

  2. These include empty lines as well. 

Want to learn PowerShell?

The award-winning book, PowerShell Core for Linux Administrators Cookbook, which I co-authored, uses the recipe-based learning approach to give you a deep understanding of PowerShell. And the best part is, the concepts discussed work across platforms!
