Wednesday 13 April 2011

System.Net.HttpWebRequest and PowerShell, part 1

Consider this scenario: you have a system written in .NET that produces reports in some well-defined format: Excel, Access, etc.  To download these files, you fire up an IE window via the InternetExplorer.Application API, log in to the application, navigate to the report, and use the Click() method of the Button or Image element to fire off a POST request.  This ensures that the hidden fields a .NET application may produce, such as the VIEWSTATE, as well as cookies, are preserved.

This works well, with the caveat that the user of the machine gets prompted for where to save the file.  You want to automate the download so that the file is saved to a predefined location, rather than waiting for the user to save the file and close the window.

You check the documentation for IE's automation API, only to wind up disappointed.  You cannot automate the saving of the file via the InternetExplorer.Application interface, or any other interface that allows access to the mshtml API.  What then to do?

Well, luckily Microsoft's .NET API contains a class known as System.Net.HttpWebRequest, which allows you to send GET or POST requests to a server, and to manage cookies.  So if you can parse the HTML returned by this API, and extract the hidden input fields described above to repost to the server, you can read back the file that the server returns and save it.

Doing this in PowerShell, however, requires some understanding of .NET. Let's take a look at a simple function that returns an HTML response from a server.

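function Get-HTML-Response {
  param($uri,$cookiejar);
  # Create returns an HttpWebRequest for http:// and https:// URIs.
  $req=[System.Net.HttpWebRequest]::Create($uri);
  # Start with an empty cookie store; replace it if the caller supplied one.
  $req.CookieContainer=New-Object System.Net.CookieContainer;
  $req.Method='GET';
  if ($cookiejar -ne $null) {
    $req.CookieContainer=$cookiejar;
  }
  $req.UserAgent='Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)';
  $resp=$req.GetResponse();
  $strm=$resp.GetResponseStream();
  $sr=New-Object System.IO.StreamReader($strm);
  $output=$sr.ReadToEnd();
  # Closing the reader also closes the underlying response stream.
  $sr.Close();
  # Return the request, the response, and the page content together.
  return @($req,$resp,$output);
}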

The .NET 2.0 API allows you to create an HttpWebRequest object directly from a URI via the Create method.  All we need to do is set up the User-Agent, method, and cookie store, then read back the response and return all the information to our script.

Note that this function does not handle exceptions.  If you want to add that (and you should care about exception handling), the documentation for the exceptions that the methods used above can throw is available on the MSDN website.
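As a rough illustration of what such handling might look like, here is a minimal sketch, assuming PowerShell 2.0 or later (for try/catch). The names Find-WebException and Get-ResponseOrNull are hypothetical and not part of the function above. Because PowerShell may wrap an exception thrown by a .NET method in another exception, the sketch walks the InnerException chain looking for a System.Net.WebException rather than relying on the outermost type.

```powershell
# Hypothetical helper: walk the exception chain until a WebException is found.
# Returns $null if the chain contains no WebException.
function Find-WebException {
  param($exception)
  while ($exception -ne $null -and -not ($exception -is [System.Net.WebException])) {
    $exception = $exception.InnerException
  }
  return $exception
}

# Hypothetical wrapper: returns the response, or $null if the request failed
# with a network or protocol error.
function Get-ResponseOrNull {
  param($uri)
  try {
    $req = [System.Net.WebRequest]::Create($uri)
    return $req.GetResponse()
  }
  catch {
    $webEx = Find-WebException $_.Exception
    if ($webEx -eq $null) { throw }   # not a web error: rethrow unchanged
    # Status is a WebExceptionStatus value such as ConnectFailure or Timeout.
    Write-Warning ("Request to {0} failed: {1}" -f $uri, $webEx.Status)
    return $null
  }
}
```

A caller would then test the result against $null before going on to read the response stream.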

To provide continuity between requests, the function optionally accepts a CookieContainer holding session information, whose cookies are sent to the server with the request.
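To make the role of the cookie store a little more concrete, here is a small sketch that runs without contacting a server. The host example.com, the page report.aspx, and the cookie value abc123 are placeholders invented for illustration. It shows that a cookie stored in a CookieContainer is offered for any request to a matching URI once the container is attached to the request.

```powershell
# Placeholder site and cookie value, used only for illustration.
$site = New-Object System.Uri('http://example.com/')
$jar  = New-Object System.Net.CookieContainer

# Simulate a session cookie that an earlier response might have set.
$jar.Add($site, (New-Object System.Net.Cookie('ASP.NET_SessionId', 'abc123')))

# Attach the same container to a later request against the same site.
$req = [System.Net.WebRequest]::Create('http://example.com/report.aspx')
$req.CookieContainer = $jar

# The Cookie header value the container would supply for this request's URI.
$header = $jar.GetCookieHeader($req.RequestUri)
$header
```

In the same way, a single container created once at the top of a script can be passed as the second argument to every call of the function above, so that a login made by the first request carries over to the ones that follow.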

The response returned by GetResponse contains header information, including the status code returned by the server.  You'll want to check this to ensure a 200/OK status.  The cookies in the request's CookieContainer are updated automatically when the request is submitted and the response is returned.
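The status check can be sketched as follows. Test-ResponseOk is a hypothetical helper name, and the stand-in object exists only so the example runs without a network call; in a real script you would pass the response object, which is the second element of the array returned by the function above.

```powershell
# Hypothetical helper: true only when the response status is 200/OK.
function Test-ResponseOk {
  param($resp)
  return ($resp.StatusCode -eq [System.Net.HttpStatusCode]::OK)
}

# Stand-in for a real HttpWebResponse, exposing just the StatusCode property.
$fake = New-Object PSObject -Property @{ StatusCode = [System.Net.HttpStatusCode]::OK }

if (Test-ResponseOk $fake) {
  'response is 200/OK'
} else {
  'unexpected status'
}
```

Keep in mind that GetResponse throws a WebException for most 4xx and 5xx responses instead of returning them, so a check like this mainly guards against other non-error statuses and does not replace exception handling.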

Calling GetResponseStream on the response object, after checking the response, returns a Stream object containing the body sent by the server (the page content).  You need to read this stream using a StreamReader object, or pass the stream to some other object that will deal with the information contained in it.  In this instance, we just return the output.
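Reading the stream as text suits HTML pages, but a binary report such as an Excel file would need to be copied byte for byte. The following is only a sketch of one way to hand the stream to something other than a StreamReader. Save-StreamToFile is a hypothetical helper name, and a MemoryStream stands in for the response stream so the example runs without a server.

```powershell
# Hypothetical helper: copy any readable stream to a file in 8 KB chunks.
# Pass a full path, since .NET resolves relative paths against the process
# working directory rather than the current PowerShell location.
function Save-StreamToFile {
  param([System.IO.Stream]$stream, [string]$path)
  $file = [System.IO.File]::Create($path)
  try {
    $buffer = New-Object byte[] 8192
    # Read returns 0 once the end of the stream is reached.
    while (($read = $stream.Read($buffer, 0, $buffer.Length)) -gt 0) {
      $file.Write($buffer, 0, $read)
    }
  }
  finally {
    $file.Close()
  }
}

# Stand-in for a response stream: five bytes held in memory.
$bytes  = [byte[]](1, 2, 3, 4, 5)
$source = New-Object System.IO.MemoryStream(,$bytes)
$target = [System.IO.Path]::GetTempFileName()
Save-StreamToFile $source $target
```

With a real response, the stream returned by GetResponseStream would take the place of $source, and $target would be the predefined location for the report.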

In the next installment, I'll discuss POST requests, and finally, HTML parsing and interpretation of the data.
