Friday 4 November 2011

Shell/Internet Explorer Automation in C#

It seems to me one of the great travesties of programming that a language with the power of C#, with all its LINQ goodness, nullable types and features such as the yield keyword, has such poor support for the COM architecture compared to, say, Visual Basic or PowerShell. This lack of support is, of course, to some extent by design: Visual Basic developed alongside the COM architecture, while C# is, relatively speaking, a newcomer to the scene. PowerShell manages its rich support of COM objects thanks to deep voodoo such as the use of IDispatch, type libraries, and so on. But what about C#?

One of the recommended ways of doing COM interop in C# involves using interop assemblies generated by tools such as tlbimp or AxImp. However, for people working in a restricted environment where such tools are not available, there is no way to generate .NET wrapper classes for the members a COM interface provides. So what to do then?

The incomplete answer, of course, is: use reflection. Our intent being to provide information to the development community at large, it is worthwhile providing a little insight into how this can be done.

The first thing to do is to create the COM object.  Because we will be manipulating it via reflection, we start by assigning it to a plain System.Object, like this:
object ShellApp=Activator.CreateInstance(Type.GetTypeFromProgID("Shell.Application"));
Activator is in the System namespace, like Type.  If we look at the definition for CreateInstance we see the Type definition returned from GetTypeFromProgID contains the information required to instantiate the COM object.  This is essentially the C# equivalent of VB's CreateObject functionality.  From here, reflection is used to access the methods.
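One detail worth guarding against: Type.GetTypeFromProgID returns null when the ProgID is not registered, and passing null to Activator.CreateInstance throws. The following is a minimal sketch of a defensive version of the line above. The class name is only illustrative, and the code assumes a Windows machine where Shell.Application is registered.

```csharp
using System;

class CreateShellExample {
    static void Main() {
        // GetTypeFromProgID returns null if the ProgID is not registered.
        Type shellType = Type.GetTypeFromProgID("Shell.Application");
        if (shellType == null) {
            Console.WriteLine("Shell.Application is not registered on this machine");
            return;
        }

        // Equivalent of VB's CreateObject("Shell.Application").
        object shellApp = Activator.CreateInstance(shellType);

        // For a late-bound COM object this is typically System.__ComObject.
        Console.WriteLine(shellApp.GetType().FullName);
    }
}
```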

PowerShell is useful for exploring COM interfaces: the members of a COM object can be listed with the Get-Member cmdlet.  But do not rely on it for everything.  For instance, the FireEvent method on several classes (such as mshtml.HTMLSelectElementClass) should actually be called as fireEvent (with a lower-case f), yet PowerShell exposes the member as FireEvent.  Attempting to call it under that name with reflection results in a name not found exception.  So when in doubt, check the relevant MSDN documentation for the interface in question.

With that out of the way, let's begin.  The majority of your calls to the COM interface will look like one of the following functions (which I recommend you bundle into a class and re-use if you're going to be calling them constantly):


        // Requires: using System; using System.Reflection;

        // Reads a property; logs the failure and returns null instead of throwing.
        object GetCOMProp(Object sourceObj,String propName,object[] Param) {
            try {
                return sourceObj.GetType().InvokeMember(propName,BindingFlags.GetProperty,null,sourceObj,Param);
            } catch (Exception e) {
                Console.WriteLine("Could not get property " + propName + " on " + sourceObj.GetType().FullName + ": " + e.ToString());
                return null;
            }
        }

        // Sets a property; Param holds the new value. Exceptions propagate to the caller.
        object SetCOMProp(Object sourceObj,String propName,object[] Param) {
            return sourceObj.GetType().InvokeMember(propName,BindingFlags.SetProperty,null,sourceObj,Param);
        }

        // Calls a method; Param holds its arguments. Exceptions propagate to the caller.
        object CallCOMMethod(Object sourceObj,String methodName,object[] Param) {
            return sourceObj.GetType().InvokeMember(methodName,BindingFlags.InvokeMethod,null,sourceObj,Param);
        }


You'll notice these methods are very similar.  Firstly, they call GetType() on the object to obtain type information.  This could be a COM object, or a CLR class exposed via COM, or something else.  Then InvokeMember is used.  We do not use GetProperty, GetMethod or any other such methods for COM, since usually your object type will be System.__ComObject, a generic wrapper type that exposes no method or property information for the specific COM class behind it.

The method or property name is usually case sensitive.  To read a property, set the binding flags to GetProperty; to set a property, use SetProperty; to call a method, with or without parameters, use InvokeMethod.
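To make the three binding flags concrete, here is a small sketch that uses each of them once against an Internet Explorer instance. It assumes a Windows machine where the InternetExplorer.Application ProgID is registered, and it relies on that object's Visible and LocationURL properties and its Navigate and Quit methods. The class name is purely illustrative.

```csharp
using System;
using System.Reflection;

class BindingFlagsExample {
    static void Main() {
        Type ieType = Type.GetTypeFromProgID("InternetExplorer.Application");
        if (ieType == null) {
            Console.WriteLine("InternetExplorer.Application is not registered");
            return;
        }

        object ie = Activator.CreateInstance(ieType);
        Type t = ie.GetType();

        // SetProperty: the new value is passed as the single argument.
        t.InvokeMember("Visible", BindingFlags.SetProperty, null, ie, new object[] { true });

        // InvokeMethod: arguments are passed in order; optional ones are left out.
        t.InvokeMember("Navigate", BindingFlags.InvokeMethod, null, ie, new object[] { "about:blank" });

        // GetProperty: no arguments for a simple property.
        object url = t.InvokeMember("LocationURL", BindingFlags.GetProperty, null, ie, new object[] { });
        Console.WriteLine("LocationURL: " + url);

        // Close the browser window we created.
        t.InvokeMember("Quit", BindingFlags.InvokeMethod, null, ie, new object[] { });
    }
}
```

Navigation is asynchronous, so LocationURL may still be empty when it is read here; a real program would wait until the Busy property is false.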

Once you have your object returned, you need to cast it to whatever type you require.  Often, this will be another object, in order to call a subproperty.  However it can also be a String, or some other value.

If you are expecting a function to return a non-nullable type such as int or bool, you should cast to the corresponding nullable type, i.e. int? or bool?.  This is because InvokeMember can return null for various reasons, and casting null to a non-nullable value type throws an exception.
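The difference between the two casts can be seen without any COM object at all. In this sketch, the variables missing and present are stand-ins (not part of the original code) for what InvokeMember might hand back: a null, and a boxed bool.

```csharp
using System;

class NullableCastExample {
    static void Main() {
        object missing = null;   // stands in for an InvokeMember call that returned null
        object present = true;   // stands in for a boxed bool returned by a COM property

        bool? a = (bool?)missing;   // becomes null, no exception
        bool? b = (bool?)present;   // becomes true

        Console.WriteLine(a.HasValue);             // False
        Console.WriteLine(b.GetValueOrDefault());  // True

        try {
            bool c = (bool)missing;  // unboxing null to a non-nullable type throws
            Console.WriteLine(c);
        } catch (NullReferenceException) {
            Console.WriteLine("cast to bool failed on null");
        }
    }
}
```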

Now that you have this out of the way, your implementation becomes a simple matter of invoking methods within try blocks, checking for nulls, and making use of the ample API documentation available both online and through PowerShell.

Here is an example function:
        public void ClickElementByName(object myWin,String elementName) {
            if (myWin == null) return;
        
            object[] noparm=new object[] {};
            object myDoc=GetCOMProp(myWin,"Document",noparm);
            object searchObj;
        
            if (myDoc == null) {
                Console.WriteLine("ClickElementByName: myDoc is null");
            } else {
                searchObj=CallCOMMethod(myDoc,"getElementsByName",new object[] {elementName});
                if (searchObj != null) {
                    searchObj=CallCOMMethod(searchObj,"namedItem",new object[] {elementName});
                    if (searchObj != null) {
                        CallCOMMethod(searchObj,"fireEvent",new object[] {"onclick"});
                    }
                }
            }
        }

Let's take a look at how this works.  The function receives a window object you have already identified, perhaps by iterating over the collection returned by the Windows() method of a Shell.Application object (via its Count property and Item method).  You can discover the interfaces involved in PowerShell by creating such an object, assigning the return value of each call to a variable, and running ",$var|gm" (note the comma, which prevents PowerShell from enumerating any collection the object may expose).  As you will see by following these steps, all we are doing is calling the appropriate members to walk down the COM object's chain of properties and methods: the window's Document property, then getElementsByName, then namedItem, and finally fireEvent.
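As a rough sketch of how such a window might be found in the first place, the following program lists the open shell windows using reflection only. It assumes a Windows machine with Shell.Application registered, and that the collection returned by Windows() exposes a Count property and an Item method, and each window a LocationURL property. The Invoke helper and the class name are hypothetical, introduced only for this example.

```csharp
using System;
using System.Reflection;

class ListShellWindows {
    // Hypothetical helper: one InvokeMember wrapper for all three kinds of call.
    static object Invoke(object target, string name, BindingFlags kind, params object[] args) {
        return target.GetType().InvokeMember(name, kind, null, target, args);
    }

    static void Main() {
        Type shellType = Type.GetTypeFromProgID("Shell.Application");
        if (shellType == null) return;   // ProgID not registered

        object shell = Activator.CreateInstance(shellType);
        object windows = Invoke(shell, "Windows", BindingFlags.InvokeMethod);
        if (windows == null) return;

        // Cast to int? in case the call hands back null.
        int? count = (int?)Invoke(windows, "Count", BindingFlags.GetProperty);

        for (int i = 0; i < count.GetValueOrDefault(); i++) {
            object win = Invoke(windows, "Item", BindingFlags.InvokeMethod, i);
            if (win == null) continue;   // some slots may be empty

            object url = Invoke(win, "LocationURL", BindingFlags.GetProperty);
            Console.WriteLine(i + ": " + url);
        }
    }
}
```

A window picked out of this loop, for example by matching its LocationURL, is the kind of object ClickElementByName expects as its first argument.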

Note that you can also work with such COM objects using the dynamic type introduced with C# 4.0 and .NET 4.0, but I'm sure you have already found that online.  When you need backward compatibility and don't want to rely on interop libraries, reflection is a satisfying, if cumbersome, solution.
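For comparison, here is a sketch of what listing the open shell windows might look like with dynamic. It assumes C# 4.0 or later with a reference to the Microsoft.CSharp assembly, a Windows machine with Shell.Application registered, and the Count, Item and LocationURL members of the shell windows collection. The class name is illustrative.

```csharp
using System;

class DynamicExample {
    static void Main() {
        Type shellType = Type.GetTypeFromProgID("Shell.Application");
        if (shellType == null) return;   // ProgID not registered

        // Member names are resolved at run time, much like InvokeMember,
        // but the calls read like ordinary C#.
        dynamic shell = Activator.CreateInstance(shellType);
        dynamic windows = shell.Windows();
        int count = windows.Count;

        for (int i = 0; i < count; i++) {
            dynamic win = windows.Item(i);
            if (win == null) continue;   // some slots may be empty
            Console.WriteLine(win.LocationURL);
        }
    }
}
```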