Pipeline processing
Another way that functions in PowerShell differ from those in other languages is how they interact with the pipeline. Functions in PowerShell output objects that can be picked up as input by subsequent functions or cmdlets. This in and of itself is not unique, but the timing of the output is different from other languages.
Consider the process of obtaining a listing of all of the files on a drive. For some drives, this operation can be very lengthy, possibly taking several minutes to complete. In PowerShell, though, the list of files begins to appear almost instantly. The process can take a while to complete, but the function (Get-ChildItem) will output file and folder objects as each directory is scanned rather than waiting to collect all of them in a list and returning the list all at once. This feature of built-in cmdlets is easy to take for granted, but when writing functions, the concept of a return value needs to be carefully considered.
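This streaming behavior is easy to observe with a function of your own. The following sketch (the function name and one-second delay are illustrative, not from the original text) emits each value as soon as it is produced, so the downstream cmdlet receives them one at a time rather than all at the end:

```powershell
function Get-NumberSlowly {
    # Each number is written to the output stream as soon as it is
    # produced, not collected and returned at the end
    1..3 | ForEach-Object {
        Start-Sleep -Seconds 1
        $_
    }
}

# The timestamps appear roughly one second apart, showing that the
# downstream cmdlet receives each object as it is emitted
Get-NumberSlowly | ForEach-Object { "Received $_ at $(Get-Date -Format HH:mm:ss)" }
```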
Although PowerShell includes a return keyword, its use is optional, and omitting it is even considered by some to be a best practice. A PowerShell function will return a value placed after a return statement, but that is not the only kind of output in a function.
Note
The rule for function output is simple. Any value produced in a function that is not consumed is added to the output stream at the time the value is produced.
Consuming a value can be accomplished in many ways:
- You can assign the value to a variable
- You can use the value in an expression, or as an argument to a cmdlet, function, or script
- You can pipe the value to Out-Null
- You can cast the value to [void]
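Each of the four approaches can be sketched with a value-producing call such as New-Item (the temporary file path used here is illustrative):

```powershell
# Illustrative path; New-Item emits a FileInfo object when it creates a file
$path = Join-Path $env:TEMP 'example.txt'

# 1. Assign the value to a variable
$file = New-Item -Path $path -ItemType File -Force

# 2. Use the value in an expression or as an argument
Write-Host (New-Item -Path $path -ItemType File -Force)

# 3. Pipe the value to Out-Null
New-Item -Path $path -ItemType File -Force | Out-Null

# 4. Cast the value to [void]
[void](New-Item -Path $path -ItemType File -Force)
```

In all four cases, the FileInfo object is consumed and never reaches the function's output stream.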
With that understanding, consider the following PowerShell functions, which all return a single value:
```powershell
function Get-Value1 {
    return 1
}
function Get-Value2 {
    1
    return
}
function Get-Value3 {
    1
}
function Get-Value4 {
    Write-Output 1
}
```
The first function uses the traditional method to output the single value and end the execution of the function. The second includes the value as an expression that is not consumed and is thus added to the output stream; its return statement ends the execution of the function but does not add anything to the output stream. The third function outputs the value just as the second did, but omits the superfluous return statement. The final version uses the Write-Output cmdlet to explicitly write the value to the output stream. The important point to understand is that values can be output from a function in more places than just the return statement. In fact, the return statement is not even needed to output values from a function.
When writing a function, it is extremely important to ensure that the values that are produced in the process of executing are consumed. In most cases, values will be consumed by the natural activities in your function. However, sometimes values are produced as a side effect of activities and make it into the output stream inadvertently. As an example, consider the following code:
```powershell
function Write-Logentry {
    param($filename, $entry)
    if (-not (Test-Path $filename)) {
        New-Item -Path $filename -ItemType File
    }
    Add-Content -Path $filename -Value $entry
    Write-Output "successful"
}
```
The intent of the code is to create the file if it doesn't exist, and then add text to that file. The problem comes in when the file is created. The New-Item cmdlet writes a FileInfo object to the output stream in addition to creating the file. Since the code doesn't do anything with that value, the FileInfo object from the New-Item cmdlet becomes part of the output of the function. It is very common on Stack Overflow to see PowerShell questions that involve this kind of error. Using New-Item (or the mkdir proxy function for New-Item) is often the source of the extraneous object or objects. Other sources include the Add() methods of several .NET classes, which, in addition to adding an item to a collection, also return the index of the newly added item.
Pinpointing this kind of error in a function is often confusing because the error message will almost never indicate that the function is the problem. The error will surface downstream from the function, where the output is used. In the Write-Logentry example function, instead of the single value "successful", the output could be an array containing a FileInfo object and the value "successful". Trying to compare the result to a good value might look as follows:
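A sketch of what that comparison and its failure can look like, using a hypothetical log path:

```powershell
# If the file does not exist yet, $result is an array containing the
# FileInfo object from New-Item followed by the string "successful"
$result = Write-Logentry -filename C:\temp\app.log -entry "test"

# Arrays do not have a StartsWith() method, so this fails with an error
# similar to: "Method invocation failed because [System.Object[]] does
# not contain a method named 'StartsWith'."
if ($result.StartsWith("success")) {
    "Log entry written"
}

# Inspecting the variable's type reveals the problem
$result.GetType().FullName    # System.Object[] instead of System.String
```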
At first glance, the error will seem like nonsense. What does PowerShell mean that it can't find a method called StartsWith()? Looking at the type of the variable shows that instead of the expected string, it contains an array, as shown in the following screenshot:
Similar errors will occur when expecting a numeric result and trying to do calculations on the value. Doing calculations with arrays is probably not going to work, and if it does, it will not work as intended.
It is possible to collect all of the values that are to be output in a variable and wait to output the values until the end of the function, but this is not recommended. One simple reason is that this requires the function writer to keep track of all of the objects and to have memory allocated for the entire collection. Using the output stream naturally, that is, writing to the output stream as objects become available, allows downstream PowerShell cmdlets to work with them while the rest of the objects are being discovered.
The following is a practical example:
```powershell
function Find-TopProcess {
    param([string[]]$ComputerName)
    $ComputerName | ForEach-Object {
        Get-WmiObject Win32_Process -ComputerName $_ |
            Sort-Object WorkingSetSize -Descending |
            Select-Object -First 5 PSComputerName, Name, ProcessID, WorkingSetSize
    }
}
```
This function takes a list of computer names and outputs the five processes on each computer that use the most memory. It could have been written as follows:
```powershell
function Find-TopProcessBad {
    param([string[]]$ComputerName)
    $processes = @()
    $ComputerName | ForEach-Object {
        $processes += Get-WmiObject Win32_Process -ComputerName $_ |
            Sort-Object WorkingSetSize -Descending |
            Select-Object -First 5 PSComputerName, Name, ProcessID, WorkingSetSize
    }
    return $processes
}
```
The final output would be the same, in that the same values would be returned in the same order. On the other hand, the way the output is seen by downstream pipeline elements is very different. In the first case, as each computer is scanned, the list of processes is sent to the output stream and then the next computer is considered. There is no local storage in the function at all. In the second case, the processes from each computer are appended to a list. The downstream pipeline elements won't see any output from this function until all of the computers have been scanned. If only a few computer names are being passed in, there is little difference. But if the list is hundreds or thousands of names long, or if the network latency is high enough that it takes a long time to get each set of results, it may be several minutes until any output is delivered. If the function is being called at the command line, it may not be obvious that anything is happening.
Another implication of the second example is that all of the objects need to be stored in a list in memory. This example stored only a few small objects (five per computer), so the effect might not be noticeable. If the function returned all processes with all of the properties associated with those objects, the memory usage would be quite high. Memory allocation time also factors into execution time.
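The practical difference shows up when a downstream cmdlet only needs part of the output. In the sketch below ($serverList is a hypothetical array of computer names), the streaming version lets the pipeline stop early, while the collecting version must finish every computer first:

```powershell
# Hypothetical list of computer names
$serverList = 'SERVER01', 'SERVER02', 'SERVER03'

# Streaming version: Select-Object -First 5 can display results from the
# first computer immediately, and (in PowerShell 3.0 and later) stops the
# upstream pipeline once it has the five objects it needs
Find-TopProcess -ComputerName $serverList | Select-Object -First 5

# Collecting version: nothing is displayed until every computer in the
# list has been scanned and the full array is returned
Find-TopProcessBad -ComputerName $serverList | Select-Object -First 5
```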
Tip
To help keep memory usage and execution time lower, try to write objects to the output stream immediately rather than storing them in a collection to be returned all at once.