Ideas and essays on code development: 2015

Monday, December 07, 2015

Bare Minimum Software

With the financial year end fast approaching it is time to consider any kit purchases in anticipation of activities during 2016 (and I don’t get that list right very often in truth). A laptop upgrade to something smaller and lighter looks a good idea – particularly with one of the teenagers having a dead machine and thus more than grateful for my cast off (actually not a bad spec - certainly going to outperform most current low end offerings).

Buying hardware is just a few clicks in a web browser these days but the software component list promises a long session just running installs.

What would be on your list for a basic set-up?

I started with:

Chrome browser
FireFox browser
Visual Studio 2015
Android Studio
SQL Server
Paint.NET
Notepad++
SourceGear diff/merge tool
SQLiteBrowser
Bruce Schneier ‘s PasswordSafe
Hex Editor
IPScanner
Dropbox
LinQPad
Putty

Plus some homemade tools and utilities.

Then there is the question of MS Office – will Gmail’s ability to display (say) Word documents fill in enough on incoming attachments to get by?

Something to read PDF’s I suppose.

Any suggested additions?

Factor in a few system updates to add to the confusion and slow package download speeds and this already looks like a day’s work

Edit:

Cracked and added a copy of MS Office. With the release of Office 2016 there are a lot of bargains around - particularly with earlier versions. In fact by a series of stock shortages I ended up with a copy of 2016 Pro for the price of a 2013 Home & Office on the residual market.

Tuesday, December 01, 2015

Raspberry Pi music

Somewhat off topic.

I am the very happy owner of a five year old Brennan JB7 digital music system which includes a couple of their speakers sitting on some carefully positioned stands. Recently Brennan sent me an email to introduce the new “B2” model – and very nice it looks but I wondered if the price at £579.00 for the top model was strictly warranted (with all due respect for the undoubted product quality).

As the new device seemed to be based upon a Raspberry Pi I ran through a few numbers for building something equivalent:

1tb USB drive – retails at around £40
Raspberry Pi 2 – around £30 (often less)
Micro sd card - £5
Amplifier - £50
Power supplies- £20
CD drive - £8
Sundries (cables, case etc.) - £25

The Brennan box also has digital SPDIF and Line Out plus input options (all of which add a little to their costs) but I can’t see me making use of those in these minimalist times. I would just like to add digital radio and streaming to a disk based music playback capability.

In going through my shopping list I started to get enthusiastic and wondered just how good a device I could put together. Now I am a software man – so what am I doing with what looks like a hardware project? Well not much in the way of electronics – I just need to hook the main components together – so perhaps just a little soldering. The nitty gritty all looks to be in the software.

First task was effectively a side project. I felt that testing with speakers would be clumsy within the relatively confined area of my personal workspace – but I could manage headphones. So I decided to build a (near) empty box to hook some speaker terminals up to a headphone jack socket. With this I could provide an amplifier with something to drive and use the headphones to check the output without disturbing the peace.

The hard drive is one recovered from a retired laptop sitting in a box that adapts the SATA interface and makes it a USB drive – cost less than £3 on Amazon if I recall. The Pi can’t supply enough power for the drive through the USB port so there is an additional power supply that takes the 12 volts arriving at my “box” and converts it to the 5 volts required delivered through a USB connection and a standard “Y” USB cable.

The main power supply is a butchered mains/car adapter but an old laptop power supply would probably do – depending upon the voltage limits of any amplifier used. The HifiBerry Amp+ used here can happily operate between 12 volts and 18 volts and in turn powers the Raspberry Pi which helps reduce the complexity of the wiring rat’s nest you can see below. The idea was to start with longish hook-ups and then shorten them as required when the components get fixed into the case.

The case already has a power lead entry and an on/off rocker switch drilled and fitted. I have also drilled the back plate for the speaker connections and super-glued the main part of the loudspeaker terminal block to the outer face.

I will have to decide if I am going to go with wifi or add an RJ45 fitting to the back plate for a wired Internet connection. The box will certainly need a ‘power on’ led indicator and I would like to add an infra-red remote control for volume and track skipping at least (although most control will probably be via a web browser interface). Given that the pi can join the network and expose the music storage location (so I can add albums) the inclusion of a CD drive is still a choice to be made (one that might tax my ability to neatly tackle the case modifications).

I first started the Raspberry Pi with a copy of the standard Raspbian Linux variant. This booted just fine but (of course) was not aware of the amplifier so a rather quiet initial test. I then downloaded and installed the distro supplied by HiFiBerry and tried that. Now I quickly ran into the limits of my Linux knowledge and had some difficulty in getting sound to the amplifier. In the end, I pointed the web browser at Amazon, found the latest Enya album page and clicked on the track samples. That initially got some loud clicks from the headphones but after adjusting the (xwindows) volume control I heard some very nicely re-produced music. That at least confirmed that the amplifier worked and that my speaker/headphone adapter also functioned as expected.

It was now time to widen the testing and this meant entering (at least initially) the sometimes murky world of “open source” software. I might have the odd moan in this section but I would not want to imply any criticism of any of the projects mentioned or the teams that labour on them. Without open source and free projects we would all be the poorer (financially and culturally). In the past I have published open source software myself and that’s all I am saying (I have shared the pain guys).

A few minutes Googling showed that the Pi has inherited a solid core of key music related components from Linux and that there were a number of live projects looking to deliver the ‘ultimate’ solution. Two projects (RuneAudio and Volumio) seemed to be forks of the much respected RaspyFi project that itself now looks defunct. These projects currently sport PHP web interfaces that in turn make use of the MPD (Music Player Daemon) music playing server. A few posts suggest that the RuneAudio/Volumio divide was a little bitter.

I tried the RuneAudio project first. This was partly because the main internet page showed a picture of the HiFiBerry amp I had selected with the implicit promise that this and other selected Raspberry Pi (HiFi) add-ons were supported. I was also slightly put off by the way the Volumio project announcement was ‘camped out’ on top of the RaspyFi site. First I had to download the RuneAudio distribution (based I think on ArchLinux) and pop it on the micro SD card. It booted and I quickly located the web interface from the browser on my PC. It was clear the RuneAudio had located the albums stored on the hard drive and so I had a stab at selecting one to be played. Not a lot could be heard. A Google search found a commitment to support my amplifier from early 2014 and some suggestions on how to manage the trick short of that support eventually being delivered. This involved installing some software and settings. I logged into the distro and started trying to follow the instructions. The first irritation was that my keyboard was not supported (symbols all over the place and inexplicably the <y> and <z> keys swapped) but I struggled on. Then the installations failed with 404 errors and so I decided to try my luck elsewhere.

Next up was Pi MusicBox which has a web site like something from the early 90’s but – it’s the software that counts. Unfortunately (for me at least) the software sort of booted and then hung. To be fair, this might have been because my Pi was the recent Pi2. So I tried again.

I tried Volumio after all. Another distro (based on Debian this time I think) to install on the old SD card. This distro booted and the web app quickly became available from my PC’s web browser. The web page presented was nearly identical to the RuneAudio one which probably means both are inherited in turn. A quick trip to the settings page located the HiFiBerry amp and well - it just worked, playing albums and tracks from the hard drive. I turned to the Web radio options and soon had the BBC World Service coming through load and clear (sitting there with my headphones it could have been a clip from an old 1960’s spy film – although there was no short-wave crackle and I was not wearing a hat).

So with a working sound source it was time to stop and take stock (actually I should try Mopidy soon as that is Python based and thus potentially susceptible to some constructive hacks).

On the face of it Volumio might look like a good start point for further development. I am no fan of PHP but how bad could it be? However, the Volumio project has now started a complete re-write using Node.js. Now that is probably the best current start point for a new project of this nature but this does imply that future releases might be “feature incomplete” and that not much attention is going to be paid to moving the current release forward (this is not a criticism – just a comment).

Everything tried thus far has been based upon a custom distro which presumably greatly simplifies the task of distributing software updates (as well as integrating any developments with core OS changes). This does make the process of adding custom additions (like a CD “ripper” and IR interface) a teensy bit problematic. I am assuming that the additional power of the Pi2 will compensate for any extra “drag” from any software mods and additions I make – but that we will have to see.

I will next dig out a pair of old Mission monitor speakers I have in the garage and give them a try – cranking up the output to see just what this little rig can deliver.

One sad note – I had hoped that this little project could deliver BBC radio output (2, 4 and 5 anyway) and thus preclude the medium term purchase of a DAB radio for my office space. Turns out that earlier in the year the Beeb retreated into their iPlayer and websites so they could control (read stop) who got to hear the output based upon a proxy for location (IP address). I am not at all sure how to view this – particularly as I though the BBC had a mission to communicate with the world and not just those who pay the UK license fee. The continued availability of the World Service channel just sort of underlines this parochialism.

Speaker Update:

The amp drove the speakers very well indeed. Loads of power - going to be as loud as anyone would want even in a large room. Not set up for a more discerning listening test yet but the "sound stage" was clearly evident and I could not detect any obvious amplifier induced distortion.

Monday, November 16, 2015

And This Week’s Language is F#

What? Why? Are you going to endlessly post indecipherable and incomprehensible snippets and rants on functional programming like those Haskell types? (Just joking guys – honest.) Well at least I can’t bang on about Monads as F# does not have them (‘cos the close equivalent are called “Computation Expressions”)

Continuing on my “meta” travels I started to look at what was available to the .NET developer in the general field of “numerics”. Linear Algebra, Regression, Optimization, Time Series Analysis and beyond – hopefully with a liberal license. There are a lot out there but an interesting subset were clearly developed in F# - time to clone the odd GitHub repository (and also see what Nuget could provide as packages) and see if I could make sense of anything.

You might be interested by Microsoft Research’s “Tabular” http://research.microsoft.com/en-us/projects/tabular/

Or Deedle from BlueMountain Capital http://blogs.msdn.com/b/fsharpteam/archive/2013/11/13/announcing-deedle-data-frame-and-time-series-package-for-exploratory-data-programming-with-f-and-c.aspx

Or indeed

Cronos http://cronos.codeplex.com/

Math.NET http://numerics.mathdotnet.com/

Meta Numerics http://metanumerics.codeplex.com/

Of course R.NET http://nugetmusthaves.com/Package/R.NET.Community to “wrap” the R statistical package https://www.r-project.org/

And lots more…

Please don’t get the impression these are all F# projects but it does look like F# has a distinct appeal to the people that build them and those that use them.

You may already have F# as an optional programming language available within your copy of Visual Studio but VS2015 installed the option “on the fly” for me after I cloned “Tabular” from GitHub. After that I did a quick search on my C: drive for fsi.exe (the F# interactive console), located it among the Microsoft SDKs and then created a desktop shortcut. Once double clicked, this allowed me to try out some of the tutorial stuff on the web – just to get familiar with the basics of the language. Yes, in that respect at least it does feel a bit like Python – but with the comfortable option to compile code and include it in a C# app. Did I say comfortable? F# is distinctive – say that for it.

At first glance the variable assignment statements above look like those of many other languages – the double semi-colons by the way are a feature of the interactive interpreter as they indicate that a statement is complete – which is a marker for some of what is to come.

First off, variables are not variables but immutable values and are thus referred to as values. Well except when they are not (because sometimes you might need to switch to an imperative style or do stuff like populate and save data records) and that is when you use the mutable keyword:

> Let mutable x = 8;;

With a special mutable assignment operator:

> X <- 23;;

> x;;

Val it : int = 10 \\the fsi output

But I am wandering from the point – stick with values are immutable.

Functions are simply defined:

> let addfunction x y = x + y;;

Which defines a function that takes two integers and returns their integer sum. Did I mention that F# is (very) strongly typed? Probably not. I did not define any types in the code statements above but F# inferred a type – that defaults to integer. I could have written

> let addfunction (x : float) (y : float) = x + y;;

But what I could not have done (unlike most languages) is write

> let addfunction (x : float) ( y : int) = x + y;;

error the type ‘int’ does not match the type ‘float’

F# does not support implicit casts even safe ones like int to int64.

> let addfunction (x : float) ( y : int) = x + (float y);; // works though, with the explicit cast.

Here is a longer function – our old friend Fibonacci – and no I have not yet figured out how to memoize it:

let rec fib = function
                | 0 -> 0
                | 1 -> 1
                | n -> fib (n – 1) + fib (n – 2)

The “rec” keyword indicates the function is recursive (despite that being obvious) and you can clearly see something similar to a switch statement and “arrow notation” indicating the values to be returned.

Going back to a simpler form – say:

let addString x y = (x + y).ToString();;

the ToString() bit looks .NET normal shall we call it but if you type it into fsi.exe you get the following response:

val addString : x:int -> y:int -> string

and this needs exploring. The implication is that the second parameter is passed to a function that returns a string and the first integer parameter is passed to a function that returns an integer. This would (broadly) be correct as F# functions only take a single argument and only return a single type. So how is that? F# automatically curries functions. See https://en.wikipedia.org/wiki/Currying for a quick dip into currying if you have not played with the technique. Then pause and thing about how that might impact upon a language where functions are not just first class citizens but rather the whole point.

To quote the Wikipedia article “F# is a strongly typed, multi-paradigm programming language that encompasses functional, imperative, and object-oriented programming techniques.” Which maybe understates things a bit as there does not seem much at the cutting edge of language development left out – and all encapsulated within a functional programming paradigm.

For the moment I will be content when I have taken on board enough of the language syntax to be able to follow the relevant math library code but I will be coming back to try and write something non trivial in F#. The syntax is superficially daunting but starts to make sense as the underlying functionality becomes a little clearer. F# feels like it could give you (like Oddball's Sherman) a very nice “edge” (in some of life's more complex coding challenges).

Addendum:

If you are a user of Visual Studio it is easier to run Fsi.exe withing that environment by pressing <ctrl><alt><f> with an F# project open.. The text is clearer as well.

F# is more that somewhat absorbing - although it does suffer from the "this is the only way" crowd banging on a bit. I can see my next side project might be C# for the UI and F# for all the functionality which would be a neat solution solving some of the issues raised by Python - and hey guess what? F# borrowed the concept of "significant white-space" from that very source (plus a few other bits I think).

Friday, October 16, 2015

Constructing and compiling C# using Roslyn

The optimal path for my current project requires the ability to create and use new classes at runtime. The alternatives all look rather clumsy while this approach looked elegant.

.NET has had some capacity to compile arbitrary code since the beginning but a lot of effort has recently gone in to a new compiler – known as Roslyn. On the face of it there are two ways towards my goal – use the CodeDom which is language agnostic or try out the newer (Microsoft) open source Roslyn technology that is designed to compile C# and VB.NET.

CodeDom and Roslyn are only partly related and after some research it looked like Roslyn was the way ahead as it provides a feature complete compiler for my language of choice C#. The other major plus is that Roslyn runs “in process” while CodeDom runs compiles through another executable with (perhaps) a rather clunky API.

If you fancy going the well tried CodeDom way then MSDN has a good code example towards the bottom of this article.

To add the Roslyn tooling to a project use the NuGet Package Manager option on the Tools menu and Choose “Manage NuGet Packages for Solution…” Select nuget.org as the package source and then search for “Microsoft.CodeAnalysis”. Click on Microsoft.CodeAnalysis.CSharp , click the “Install” button and when offered, accept the license.

The actual development task turned out to be pretty steady.

I started with a class to define the properties of the class to be constructed


internal class ClassProperty
{
    public string PropertyName { get; set; } = "";
    public string DataType { get; set; } = "";
    public ClassProperty() { }
    public ClassProperty(string propertyName, string dataType)
    {
        PropertyName = propertyName;
        DataType = dataType;
    }
}

And supplied a populated List<ClassProperty> to a new class that I was going to use to construct and compile the code. In fact I created a little method to loop through the fields of a database table and populate the list (as that was the direction I was going in and I am lazy) but a List can be quickly populated with just a few lines of code.

The new class also had properties exposed for a namespace, class name, using directives, inheritance and interfaces. (The latter two are part implemented in the following code in that they are added to the class declaration but any related methods remain unimplemented at this stage.)

I backed up the property creation by creating a Dictionary of default values and you can see how it is applied after this.


private Dictionary<string, string> defaultValues = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
{
    ["string"] = " = String.Empty;",
    ["Int64"] = " = 0;",
    ["Int32"] = " = 0;",
    ["Int16"] = " = 0;",
    ["DateTime"] = " = new DateTime(1970,1,1);",
    ["double"] = " = 0;",
    ["single"] = " = 0;",
    ["Decimal"] = " = 0;",
    ["bool"] = " = false;",
    ["Boolean"] = " = false;",
    ["int"] = " = 0;",
    ["long"] = " = 0;"
};

The method to create the code is pretty short.


private string buildClassCode()
{
    StringBuilder sb = new StringBuilder();
    foreach(string uses in usings)
    {
        sb.AppendLine("using " + uses + ";");
    }
    sb.AppendLine("namespace " + nameSpace);
    sb.AppendLine("{"); // start namespace
    string classInherits = (inherits.Length > 0 || interfaces.Count > 0) ? " : " + inherits : "";
    foreach(string inface in interfaces)
    {
        classInherits += (classInherits.Length > 3) ? ", " + inface : inface;
    }
    sb.AppendLine($"public class {tableName}{classInherits}" );
    sb.AppendLine("{"); // start class
    sb.AppendLine($"public {tableName}()"); // default constructor
    sb.AppendLine("{}");
    foreach (ClassProperty newProperty in classProperties)
    {
        sb.AppendLine($"public {newProperty.DataType} {newProperty.PropertyName} {"{ get; set;}"}{defaultValues[newProperty.DataType]}");
    }
    sb.AppendLine("}"); // end class
    sb.AppendLine("}"); // end namespace
    return sb.ToString();
}

I tested this by popping the output into a multi-line textbox. while the code lacks indentation it is perfectly clear as the image below shows

Now to get that code compiled and run an initial test using VS Debug mode to step through the code and watch the results.


private void buildinstance()
{
    string instance = buildClassCode();
    SyntaxTree syntaxTree = CSharpSyntaxTree.ParseText(instance);
    // inspect the tree
    //SyntaxNode root = syntaxTree.GetRoot();
    //foreach (var node in root.DescendantNodes())
    //{
    //    var x = node.GetText();
    //}
    string assemblyName = tableName; //Path.GetRandomFileName();
    MetadataReference[] references = new MetadataReference[]
    {
        MetadataReference.CreateFromFile(typeof(object).Assembly.Location),
        MetadataReference.CreateFromFile(typeof(Enumerable).Assembly.Location)
    };
    CSharpCompilation compilation = CSharpCompilation.Create(
        assemblyName,
        syntaxTrees: new[] { syntaxTree },
        references: references,
        options: new CSharpCompilationOptions(OutputKind.DynamicallyLinkedLibrary)); // you can also build as a console app, windows.exe etc
    Assembly assembly = null;
    var ms = new MemoryStream();
    EmitResult eResult = compilation.Emit(ms);
    if (eResult.Success)
    {
        
        ms.Seek(0, SeekOrigin.Begin);
        assembly = Assembly.Load(ms.ToArray());
        Type type = assembly.GetType(nameSpace + "." + tableName);
        object newObject = Activator.CreateInstance(type);
        // now we can prove that worked by stepping through the code while iterating over the properties
        foreach (PropertyInfo propertyInfo in newObject.GetType().GetProperties())
        {
            string pName = propertyInfo.Name;
            string dataType = propertyInfo.PropertyType.Name;
            // and test we can assign a value
            if (dataType.Equals("String", StringComparison.OrdinalIgnoreCase))
            {
                propertyInfo.SetValue(newObject, "foo", null);
            }
        }
    }
    else
    {
        IEnumerable<Diagnostic> failures = eResult.Diagnostics.Where(diagnostic =>
            diagnostic.IsWarningAsError || diagnostic.Severity == DiagnosticSeverity.Error);
        string msg = "";
        foreach (Diagnostic diagnostic in failures)
        {
            msg+= $"{diagnostic.Id}: {diagnostic.GetMessage()}" + "\r\n";
        }
        // do something useful with the message
    }
    ms.Close();
    ms.Dispose();
}

Ran first time, although I went back and added the diagnostics you can see just to step throght things as they happened.

The SyntaxTree built a structure for the code elements from the string. The CSharpCompilation object took that syntax tree and compiled the code as C#. The compiled assembly was saved into a memory stream and the compilation results checked for errors. Assuming no errors, the code was loaded into an Assembly object and a new instance of the compiled object (class in this instance) created and inspected.

Next I checked that I could execute a method on the compiled class. I added the following line to the class code creation method:


sb.AppendLine("public string getBar(string foo) {return foo + \" bar\";}");

and a code line to call the method after the object is created in the buildinstance() method tested above


string res = (string)type.InvokeMember("getBar", BindingFlags.Default | BindingFlags.InvokeMethod,
    null, newObject, new object[] { "foo" });

which returned the string “foo bar” to the variable res.

Job done.

Addendum

Thought I would check that the default class constructor was executed when the compiled object instance was created. Thought it must be but…

Adjusted the class constructor to set one of the properties and then read that back when looping over the properties. Worked as expected.

The Activator.CreateInstance() method can be called with a parameter list to actuate alternate constructors with their parameters thus:

public static object CreateInstance(Type type, params object[] args)

which probably covers that issue as an Interface can’t contain a constructor but can contain any other methods I want to expose. Clearly, there is a wee bit more code to write to get a practical implementation but it is also clear that the overall approach is very practical.

And Then The Real World

In the real world your compiled class will probably reference one or more System/.NET dll, a custom library or your new class may inherit from another class already defined in the main program assembly.

You can add such references but you need to supply the path to the relevant dll or exe file. Apparently this changed from early Roslyn releases so (like me) you might find many obsolete methods if you Google around the subject. You also need to include the relevant "using" statements in the code itself of course (see above).

In my first stab at this the CSharpCompilation constructor was supplied with a default array of MetadataReference objects. This array can be extended to include whatever is required.

The default location for the main System dlls can be found with this line of code:

var assemblyPath = Path.GetDirectoryName(typeof(object).Assembly.Location);

You can then construct a path to (say) System.dll and Syatem.Data.dll like so:


string sysTm = Path.Combine(assemblyPath, "System.dll");
string sysData = Path.Combine(assemblyPath, "System.Data.dll");

You can similarly construct the path to the classes in your own program as well (assuming they are public):

string mePath = Path.Combine(Application.StartupPath, "MyProg.exe");

The paths can then be used to add the references like so:


MetadataReference[] references = new MetadataReference[]
{
    MetadataReference.CreateFromFile(typeof(object).Assembly.Location),
    MetadataReference.CreateFromFile(typeof(Enumerable).Assembly.Location),
    MetadataReference.CreateFromFile(sysTm),
    MetadataReference.CreateFromFile(sysData),
    MetadataReference.CreateFromFile(mePath)
};

And that is how to add those vital references to your Roslyn compiled assembly.

Tuesday, October 13, 2015

Interfaces and Unknown Classes

I wanted to make use of the ExcelReader class I created as a source of data to create class instances.

Generally the classes in question inherit from a base class that also knows how to save itself to a database. I could, of course maintain a list of such classes within my Excel reading code and call an appropriate routine as required. However I had in mind to create something that was generic enough to handle a whole raft of classes – including some created and compiled “on the fly” and thus unknowable to the source code.

The normal method for dealing with a range of classes through a common function would be to specify an interface common to all and then to interact through that interface.

Interfaces (if this is an unfamiliar term)

An interface is a list of public properties and methods (also indexers and events) that ~~should~~ must be exposed by a class that implements the interface. Interfaces themselves can be inherited but contain no actual code – just the signatures (if you like) of the components being specified.

One of the simplest ways of creating an interface is to (VS2015) right click on the class name in the code window and select “Extract Interface” from the “Quick Actions” context menu. Earlier versions of Visual Studio offer a “Refactor” menu option that includes extracting an interface.

Requiring a specific Interface is an excellent way of ensuring that objects calling the functionality of a class you provide supply full support. You can rely upon a method or event handler being available – even if it just a “stub”. IComparable would be a good example of just such an interface.

I have used interfaces in the past for testing alternate algorithms. An interface allowed me to construct a standardised software socket into which I could plug different classes at will. I could thus test performance and function of alternate approaches without having to change the code “consuming” the service/process.

If I come across a great in depth web page on Interfaces I will post a link here but for a couple of helpful examples you can do a lot worse than this C# Station post

Just about the only thing my target classes have in common is an Update method. This limits the scope of any Interface a bit. The properties could be addressed through reflection and it was a reasonable bet that a class Clone() method could be used to create new class instances. So that was beginning to sound like a workable strategy.

So my interface ended up being defined like this:


interface ObjectClone
 {
     object CloneAsObject();
     void UpdateClass();
 }

Each class implementing the interface needs then to add a CloneAsObject method alongside an UpdateClass method. The CloneAsObject() method is very simple:


public object CloneAsObject()
{
    return (MyClassName)this.MemberwiseClone();
}

There is an assumption here that all such classes expose public properties for all of the key data elements.

I defined an ExcelDataLoader class that will use an instance of the ExcelReaderclass to gather data from selected columns in a spreadsheet and use this data to create instances of a class that implements that very simple ObjectClone Interface.

The ExcelDataLoader class needs a local definition and a public property:


private ObjectClone targetClass = null;
public ObjectClone TargetClass
{
    set { targetClass = value; }
}

To pass in a new instance of the required class.

Plus I also needed a way of passing the required instructions to the ExcelDataLoader. A simple class to define the required columns and the associated data name and type seemed the thing and that could be as simple as:


public class ExcelColumnData : IComparable
{
    public string ColumnName { get; set; } = "";
    public string DataName { get; set; } = "";
    public ExcelColumnData()
    {
    }
    public ExcelColumnData(string ColumnName, string DataName)
    {
        this.ColumnName = ColumnName;
        this.DataName = DataName;
    }
    public int CompareTo(object objx)
    {
        ExcelColumnData x = (ExcelColumnData)objx;
        int res = this.ColumnName.Length.CompareTo(x.ColumnName.Length);
        return (res == 0) ? this.ColumnName.CompareTo(x.ColumnName) : res; // AA after Z
    }
}

Although you will note that this class implements an interface – Icomparable so that List<T>Sort() can be used to ensure the columns are presented in order.

These column definitions could then be passed as a List<ExcelColumndData> to the ExcelDataLoader class.

The ExcelDataLoader class loadData() method uses an instance of the ExcelReader class to extract some columns of data from a specified sheet and then creates an instance of the class to be created. It then uses reflection to loop through the class properties and if one of these matches a specified data name associated with an Excel data column then the expected data type is checked and an attempt made to cast (or parse) the data extracted from Excel to the required type. If the parse works then the property is set to the excel value.


private int loadData()
{
    int recCount = 0;
    excelReader = new ExcelReader(excelFilepath, sheetName);
    List<string> colNames = new List<string>();
    if(excelColumns.Count > 1) { excelColumns.Sort(); }
    foreach (ExcelColumnData eColData in excelColumns)
    {
        colNames.Add(eColData.ColumnName);
    }
    List<List<string>> selCols = excelReader.GetSelectedColumns(colNames, skipFirstRow);
    foreach(List<string> excelRow in selCols)
    {
        ObjectClone newObject = (ObjectClone)targetClass.CloneAsObject();
        foreach (PropertyInfo propertyInfo in newObject.GetType().GetProperties())
        {
            if (propertyInfo.CanWrite)
            {
                string propertyName = propertyInfo.Name;
                string dataType = propertyInfo.PropertyType.Name;
                for(var eci=0; eci< excelColumns.Count; eci++)
                {
                    if (excelColumns[eci].DataName.Equals(propertyName, StringComparison.OrdinalIgnoreCase))
                    {
                        // the strategy is to set the Property value if the incoming data is valid for the Property type
                        // otherwise the value is left as the class default setting
                        if (dataType.Equals("String", StringComparison.OrdinalIgnoreCase))
                        {
                            propertyInfo.SetValue(newObject, excelRow[eci], null);
                        } else if (dataType.Equals("Int64", StringComparison.OrdinalIgnoreCase))
                        {
                            long intVal = 0;
                            if (long.TryParse(excelRow[eci], out intVal))
                            {
                                propertyInfo.SetValue(newObject, intVal, null);
                            }
                        } else if (dataType.Equals("Int32", StringComparison.OrdinalIgnoreCase))
                        {
                            int intVal = 0;
                            if (int.TryParse(excelRow[eci], out intVal))
                            {
                                propertyInfo.SetValue(newObject, intVal, null);
                            }
                        }  else if (dataType.Equals("Int16", StringComparison.OrdinalIgnoreCase))
                        {
                            Int16 intVal = 0;
                            if (Int16.TryParse(excelRow[eci], out intVal))
                            {
                                propertyInfo.SetValue(newObject, intVal, null);
                            }
                        } ... etc
                          ''' 
                          ...
                }
            }
        }
        newObject.UpdateClass();
        recCount++;
    }
    return recCount;
}

And you can pump data from Excel (or anywhere else) to create classes that are anonymous as far as the process is concerned and (in this instance) call a method to retain the data in a database for future use.

Usual caveats as this is prototype (although working) code and my production version will inevitably grow "horns" over time as I deal with edge cases and extra features.

Friday, October 09, 2015

Reading Excel with the Open XML SDK

A while back I posted Look Mum no Office on the subject of the Microsoft Open XML SDK that I had used for writing an Excel file without the need for a copy of Office on the machine running the code. This proved to be very straightforward and there was some excellent samplecode available

Yesterday I needed to read data from an Excel workbook and decided to write a class that I could use in multiple scenarios to grab data from this source. Using the Open XML SDK looked the way to go as this should work with all Excel files produced by versions after Office 2003.

First use the Package Manager to grab the support libraries

PM> Install-Package DocumentFormat.OpenXml

And then add WindowsBase.dll to the project references (it is in the Framework Assembly list so easy to spot using the “Add Reference” dialog.

My first Excel reading class function was to read the list of Sheet Names in a workbook.


private List<string> getSheetNames()
{
    List<string> sheetNames = new List<string>();
    try
    {
        var document = SpreadsheetDocument.Open(excelFilePath, false);
        var workbookPart = document.WorkbookPart;
        var sheets = workbookPart.Workbook.Descendants<Sheet>();
        foreach(Sheet sheet in sheets)
        {
            sheetNames.Add(sheet.Name);
        }
        document.Close();
    }
    catch (Exception ex)
    {
        ...
    }
    return sheetNames;
}

This was a good indicator that this process was going to be pretty steady although there were a couple of twists coming up (with all making sense when thought about).

I wanted my class to have two further main features. One to take a sample of rows from a specified sheet and another to extract defined columns from a specified sheet. As both cover similar ground, let’s look at the development of the sample row extract.

The process is 1. Identify the worksheet, 2. Grab the sheet, 3. Extract the sheet data and then 4 Loop through the Rows in the sheet data. Within each row we can loop through the cells and read the string content.


private List<List<string>> initialGetSampleRows()
{
    List<List<string>> samples = new List<List<string>>();
    try
    {
        int selCount = 0;
        var document = SpreadsheetDocument.Open(excelFilePath, false);
        var workbookPart = document.WorkbookPart;
        var sheets = workbookPart.Workbook.Descendants<Sheet>();
        foreach (Sheet sheet in sheets)
        {
            if (sheet.Name == sheetName)
            {
                var workSheet = ((WorksheetPart)workbookPart.GetPartById(sheet.Id)).Worksheet;
                var sheetData = workSheet.Elements<SheetData>().First();
                List<Row> rows = sheetData.Elements<Row>().ToList();
                foreach (Row row in rows)
                {
                    var thisRow = new List<string>();
                    foreach (Cell cell in row.Descendants<Cell>())
                    {
                        var cellContent = cell.CellValue;
                        thisRow.Add((cellContent == null) ? cell.InnerText : cellContent.Text);
                    }
                        samples.Add(thisRow);
                    selCount++;
                    if (selCount >= sampleRowCount)
                    {
                        break;
                    }
                }
                break;
            }
        }
        document.Close();
    }
    catch (Exception ex)
    {
        ...
    }
    return samples;
}

Does it work? Well there are a few snags. Dates are (perhaps not unexpectedly) showing as a number, numbers themselves are fine but cells with a string content are being read as an integer of some sort.

Turns out that strings are held in a shared string table so we need to change the code to read that table when required based on the cell DataType ("s").


foreach (Cell cell in row.Descendants<Cell>())
{
    var cellContent = cell.CellValue;
    var cellText = (cellContent == null) ? cell.InnerText : cellContent.Text;
    var dataType = cell.DataType ?? CellValues.Error;
    if (dataType == CellValues.SharedString)
    {
        cellText = workbookPart.SharedStringTablePart.SharedStringTable.Elements<SharedStringItem>().ElementAt(Convert.ToInt32(cell.CellValue.Text)).InnerText;
    }
    thisRow.Add(cellText);
}

Excel dates are stored as an integer number of days since 31^st December 1899 – so the Excel date for 1^st January 1900 is 1. There is a snag inherited for compatibility reasons from Lotus – where the year 1900 was assumed not to be a leap year when in fact (unlike the year 2000) it was. So the numbers are almost always out by 1. Dates are represented as human sensible dates on an Excel spreadsheet by using a “style”. It is possible to detect a style value for a cell although this is normally null.

Something like:


uint styleIndex = cell.StyleIndex ?? uint.MinValue;
if (styleIndex == 3)
{
    Double serialDate;
    if (Double.TryParse(cellText, out serialDate))
    {
        DateTime cellDate = DateTime.FromOADate(serialDate);
        cellText = cellDate.ToString(dateFormat);
    }
}

Where we can use the inbuilt DateTime.FromOADate() function to handle the conversion, applying string formatting to suite.

There is another snag. It would not make any sense for an XML based Excel file to include empty cells. This means that when a row of cells is read the row only contains cells with content. This would present a problem where empty cells appear (perhaps now and again) before columns of cells of interest. Fortunately each cell object contains a CellReference that can be used to extract the Column+Row identity we are familiar with on Excel sheets. This can be used to detect a cell being presented earlier than expected in a row.

One solution (now deprecated - see below) is to create an Enumerator that can manage the detection of missing cells and supply empty ones as required. The class code was re-structured a bit at this point in time as well – partly to make the code units available to functionality to be added to the class later.

The Enumerator looks like this:


private IEnumerator<Cell> excelCellEnumerator(Row row)
{
    int currentCount = 0;
    foreach (Cell cell in row.Descendants<Cell>())
    {
        string columnName = columnNameFromReference(cell.CellReference);
        int thisColIndex = convertColumnNameToNumber(columnName);
        while(currentCount < thisColIndex)
        {
            var emptyCell = new Cell()
            {
                DataType = null,
                CellValue = new CellValue(string.Empty)
            };
            yield return emptyCell;
            currentCount++;
        }
        yield return cell;
        currentCount++;
    }
}

With support from:


private string columnNameFromReference(string cellReference)
{
    var regex = new Regex("[A-Za-z]+");
    var match = regex.Match(cellReference);
    return match.Value;
}
private int convertColumnNameToNumber(string columnName)
{
    char[] colAlphas = columnName.ToCharArray();
    Array.Reverse(colAlphas);
    int colNumber = 0;
    for (int i = 0; i < colAlphas.Length; i++)
    {
        char alpha = colAlphas[i];
        int tAlpha = i == 0 ? alpha - 65 : alpha - 64;
        colNumber += tAlpha * (int)Math.Pow(26, i);
    }
    return colNumber;
}

And with the nitty gritty of the sample extract function being changed to use the Enumerator thus:


foreach(Row row in rows)
{
    var thisRow = new List<string>();
    var cellEnumerator = excelCellEnumerator(row);
    while (cellEnumerator.MoveNext())
    {
        var cell = cellEnumerator.Current;
        var contentString = readCellContent(cell, workbookPart);
        thisRow.Add(contentString);
    }
    samples.Add(thisRow);
}

And the reading of the cell content (all code earlier in the post) being moved to a separate function.

Addendum

A little experimentation indicated that it might also be a good idea to make provision for Boolean values in an Excel sheet. Where Booleans exist they are presented in Excel as TRUE and FALSE but the underlying values are a (reasonable) 1 and 0 respectively. You can detect Boolean values using the cell DataType which (in this instance) would be set to CellValues.Boolean.

There is also a CellValues enum CellValues.Date but strangely this is not set in any of my sample data so the original technique of inspecting the StyleIndex still stands.

You might however wish to add the following code snippet to deal with Booleans:


if(dataType == CellValues.Boolean)
{
    int bVal;
    if(int.TryParse(cellText, out bVal))
    {
        if(bVal == 0)
        {
            cellText = "false";
        } else
        {
            cellText = "true";
        }
    }
}

Parsing the value rather than just casting it may just be paranoia but…

Further Thought

Of course Yield return is sheer flummery in this context – although the Iterator was lots of fun.

Yield should be reserved for computationally expensive processes where you want to spread the load a bit. In this instance most or all cells are going to be available in the Row object and so we would be better writing a function to return the whole list of cells in one go.


private List<Cell> getExcelRowCells(Row row)
 {
     List<Cell> theCells = new List<Cell>();
     int currentCount = 0;
     foreach (Cell cell in row.Descendants<Cell>())
     {
         string columnName = columnNameFromReference(cell.CellReference);
         int thisColIndex = convertColumnNameToNumber(columnName);
         while (currentCount < thisColIndex)
         {
             var emptyCell = new Cell()
             {
                 DataType = null,
                 CellValue = new CellValue(string.Empty)
             };
             theCells.Add(emptyCell);
             currentCount++;
         }
         theCells.Add(cell);
         currentCount++;
     }
     return theCells;
 }

And then consume the list as before but with conventional List<T> iteration:


foreach (Row row in rows)
{
    var thisRow = new List<string>();
    List<Cell> rowCells = getExcelRowCells(row);
    foreach(Cell cell in rowCells)
    {
        thisRow.Add(readCellContent(cell, workbookPart));
    }
    samples.Add(thisRow);
    selCount++;
    if (selCount >= sampleRowCount)
    {
        break;
    }
}