Monday, October 05, 2015

C# Memoization

Memoization is a program optimisation technique that stores the results of a computationally expensive function in a cache and then returns values from that cache if the function is called again with the same input(s).

Wikipedia has the term coined by (a personal hero) Donald Michie in 1968 although I can’t find the word in the original paper (dated 1967) [link: ] so it probably came into being as the idea was applied.

I had previously used the technique in JavaScript but that was a while ago and it took me a few moments to even remember the precise term – let alone work out how to do the trick in C#.

The archetypal example of memoization in JavaScript is a function to calculate Fibonacci numbers and I found this one on

var fibonacci = (function () {     var memo = [0, 1];     var fib = function (n) {         var result = memo[n];         if (typeof result !== 'number') {             result = fib(n - 1) + fib(n - 2);             memo[n] = result;         }         return result;     };     return fib; }());

The var fibonacci contains a function that can be used to calculate a result. It will return a cached value for any previously requested calculation and will recursively call itself to calculate a new value – which will often be returned from the cache (well certainly if a sequence of values are requested). 

The cache takes the form of an array (memo[]) and this is wrapped in a “closure” so that it remains accessible to the inner function fib(). 

Closure, recursion and memorization in one function – has all the dangerous signs of an interview question.

I wanted a function in C# to look-up a subset of values previously stored in a database (in this instance identity integers) with a high likelihood that individual values would be repeatedly requested over the run of the process but without being able to sort the requests in any meaningful way. So assuming that database reads are relatively slow, memorization looked like the technique to try.

My requirement is a simple case – I have a function into which I pass a string as a key and the function does a database lookup and retunes a long integer. So let us start there

private long lookUpDimension(string dkey) {     long idVal = 0;     try     {         // code to access the databse and do the lookup     }     return idVal; }

Just a simple function (well after all the database access code is removed for clarity). Point being – it is just an ordinary function – takes a single argument and returns the output.

The class containing that look-up function also has two static functions to work some magic.

private static Func<TArg, TResult> Memoize<TArg, TResult>(Func<TArg, TResult> function) {     return Memoize(function, new Dictionary<TArg, TResult>()); } private static Func<TArg, TResult> Memoize<TArg, TResult>(Func<TArg, TResult> function, IDictionary<TArg, TResult> cache) {     return key => cache.ContainsKey(key) ? cache[key] : (cache[key] = function(key)); }

So – what is this Func<TArg, TResult>  stuff?

Func is a predefined Delegate, this one takes a single argument in and outputs a single result. There are more defined (up to 8 inputs) and we could express the second one (2 inputs) thus:

public delegate TResult Func<in T1, in T2, out TResult>(
        T1 arg1,
        T2 arg2
But Func<T1, T2, TResult> is syntactic sugar of the right sort.

My code needed to declare something like this

private Func<string, long> memLookup = null;

which declares a delegate instance with a string input and a long integer output
and then the following code needs to be executed:

memLookup = Memoize<string, long>(lookUpDimension);

sometime when the class is being created perhaps or certainly before the memorized function is required. The class then exposes a method like:

public long LookupDimValue(string dkey) {     return memLookup(dkey); }

Because memLookup() now “contains” a memoized version of the original lookUpDimension() function taking a string input and returning a long integer.

One note of caution about the sort of memorized function I have used in this example. If the function fails to locate a database record corresponding to the given key then it returns a value indicating that (a zero). One option might be to then insert a new record but the memorized function will “remember” that the key is not valid and continue to return the original response. The insert would therefore have to be managed from within the memorized function and the relevant value for the new record returned.

Friday, September 25, 2015

More .NET monospace font nonsense

Just to make the point. Wikipedia defines a monospaced font as follows:

A monospaced font, also called a fixed-pitch, fixed-width, or non-proportional font, is a font whose letters and characters each occupy the same amount of horizontal space.
Now not everything you read on Wikipedia is necessarily true but I am confident that a technical reader would be happy with that definition.

You might be less happy to find that the .NET Graphics.DrawString() method does not adhere strictly to this approach. My earlier experience while trying to measure a monospaced font character width was a clue that issues might arise in actual usage. Arise they did.

You can’t see it at first but as printed lines become longer then it becomes clear that the line length is less than it should be. Pixels in the X axis are being “lost” along the way. My test data was printed at the correct register for the first 7 characters but the 8th was one pixel to the left of where it should have been. Somewhere around 90 plus characters into a line the positioning is out by 10 pixels (a single character width for the font in question).

Each character in every line is correctly positioned relative to the other lines but incorrectly positioned based upon any rational definition of a monospaced font.

The image below shows the 97th character (H) in the first line of text showing correctly in position while the second line rendered by the DrawString() method being passed the whole line of text shows the “H” well to the left of the correct location.

In case you wondered the blue “divider” is placed by an algorithm that looks for probable field positions by analysing transitions between character types (in this case a white space prior to a “letter”)

So if (like me) you are working in the .NET environment and you want to know where the nth character is on a “painted” surface of a window/control using (for crying out loud) a monospaced font – then you need to print each character individually at the correct location. On that basis, you could do just as well with a nominally proportional font of course.

At least this showed that, at longer string lengths, there was some consistency between MeasureString() and DrawString() even if both (in this instance) are wrong.

Programming in .NET is sometimes all about fighting .NET (or GDI+ or whatever is actually responsible for the rendering under the covers). On reflection, this is true of every software platform I have ever worked on.

Monday, September 21, 2015

Accurately measure monospace font widths

I was building a custom control in C# to allow a user to identify the positions of data fields in a flat text file. So one of the things the control needs to know is where a given character position is in a line of text.

Easy peasy you may say – make the font Courier New or GenericMonospace (of course) and use the Graphics.MeasureString() function.

Here is some test code:

private void Dummy2_Load(object sender, EventArgs e) {     Font mFont = new Font(FontFamily.GenericMonospace, (float)12);     this.Font = mFont;     Graphics g = this.CreateGraphics();     label1.Text += Convert.ToString(g.MeasureString("A", mFont));     label2.Text += Convert.ToString(g.MeasureString("AA", mFont));     label3.Text += Convert.ToString(g.MeasureString("AbCdEfGhIj", mFont));     g.Dispose();     mFont.Dispose(); }

Which gives a result like:

So is a character 15.2 pixels wide or 12.5 pixels wide or 10.4 pixels wide or...?

If I grab a screen shot and count the pixels I can see ALL of the characters are drawn with a width of 10 pixels. [I am currently ignoring the supposed character height as I assume that is a notional line height (but w.t.f?).]

So if you or I want to determine the character width on an unknown device with unknown settings we have a bit of a problem.

One might wonder if the function returns a notional “overhead” for any given string. But if we subtract the width returned for the single character from the width of 10 characters and then divide by 9 – we get 9.89 pixels. So any overhead is not consistent.

The reducing “error” returned by MeasureString() as the length of the initial test strings increased might make you wonder if a very long string would give an accurate result. Sadly a string of 100 characters measures at 9.94 pixels per character which at least could “round up” to the correct value but raises new questions about just what is being measured.

Perhaps there is an optimal string length that could be applied?

A bit more code came up with a value of 48 characters for an optimal result on my development system – but I have no idea if that is fully “portable” for the full range of font sizes and screen resolutions.

N.B. Just in case you were wondering, and even though you know the code is using a monospaced font – you get identical results substituting lower case i's into any or all of the test strings.

After posting this I built a little class to measure the true font size. The class creates a bitmap and then draws the characters q and j onto the bitmap and scans the pixels to find the first and last rows of pixels that contain part of one of the characters - this gives the true height. Then the bitmap is cleared and two j's drawn next to each other. The class then measures the number of pixels between the first column containing a pixel for the first character and the first column of pixels used to draw the second. This gives the true width.

The code is more than a bit scrappy so feel free to correct/optimise. Please treat it as a prototype (which it is) and not production quality code.

class MeasureMonoFont : IDisposable {     private Graphics g;     private Bitmap bm;     private SolidBrush br;     StringFormat drawFormat;     private const int defBitmapSize = 100;     public MeasureMonoFont()     {         bm = new Bitmap(defBitmapSize, defBitmapSize);         g = Graphics.FromImage(bm);         br = new SolidBrush(Color.Black);         drawFormat = new StringFormat();     }     public SizeF GetCharSize(Font withFont)     {         SizeF cSize = new SizeF();         g.Clear(Color.White);         PointF cPoint = new PointF(0, 0);         g.DrawString("qj", withFont, br, cPoint, drawFormat);         Color tPixel;         int firstRow = -1;         int lastRow = -1;         for (var yp = 0; yp < defBitmapSize; yp++)         {             for (var xp = 0; xp < defBitmapSize; xp++)             {                 tPixel = bm.GetPixel(xp, yp);                 if (!(tPixel.ToArgb() == Color.White.ToArgb()))                 {                     if(firstRow == -1) { firstRow = yp; }                     if(yp > lastRow) { lastRow = yp; }                 }             }         }         cSize.Height = ++lastRow - firstRow;         g.Clear(Color.White);         g.DrawString("jj", withFont, br, cPoint, drawFormat);         firstRow = -1;         lastRow = -1;         bool secondChar = false;         bool firstChar = false;         bool emptyColumn = false;         for (var xp = 0; xp < defBitmapSize && lastRow == -1; xp++)         {             emptyColumn = true;             for (var yp = 0; yp < defBitmapSize && lastRow == -1; yp++)             {                 tPixel = bm.GetPixel(xp, yp);                 if (!(tPixel.ToArgb() == Color.White.ToArgb()))                 {                     if (!firstChar)                     {                         firstChar = true;                         firstRow = xp;                     }                     if (secondChar) { lastRow = xp; }                     emptyColumn = false;                 }             }             if(firstChar && emptyColumn) { secondChar = true; }         }         cSize.Width = lastRow - firstRow;         return cSize;     }     public void Dispose()     {         g.Dispose();         bm.Dispose();         br.Dispose();         drawFormat.Dispose();     } }

Wednesday, September 16, 2015

Reading a delimited text file in C#

You can of course write your own string parsing code or you can take the easy way and add a reference to Microsoft.VisualBasic within your Visual Studio project and make use of the TextFieldParser

This little snippet loads the content of a delimited text file into a ListView

using Microsoft.VisualBasic.FileIO;          private void loadCSVFile(string filePath, string delimiter) {     TextFieldParser tfParser = new TextFieldParser(filePath);     tfParser.TextFieldType = FieldType.Delimited;     tfParser.SetDelimiters(delimiter);     string[] curRow;     ListViewItem lvi;     while (!tfParser.EndOfData)     {         curRow = tfParser.ReadFields();         lvi = listView1.Items.Add(curRow[0]);         for (var ci = 1; ci < curRow.Length; ci++)         {             lvi.SubItems.Add(curRow[ci]);         }     }     tfParser.Close();     tfParser.Dispose(); }

Sometimes VB is your friend.

Usual caveats about production code (try/catch etc.)

Monday, September 14, 2015

Wot no Python Executable?

So how did my first foray into Python go? 

Well partly I cheated and wrote Iron Python code in Visual Studio – so obviously I write a lot of .NET code but using Python syntax. The idea was to get a running start on how to structure my Python and I wrote a bunch of classes with methods, adopting the MVC pattern to push that beyond where I would normally go. So far, I am still enthusiastic about the language. There are odd puzzles to do with variable scope but I am confident those will work out nicely as I write more code. The short and snappy syntax bodes well for productivity although debugging minor spelling mistakes (character case and all that) counteracts that a bit.

You might think that building a .NET Python app was a quick route to having something I could distribute as an executable to others but (and this is a big BUT) – it is possible to encapsulate an Iron Python program as an executable but it is a big hill to climb.

Python looks like a programming language for YOU to use but the infrastructure lacks more than a little when it comes to sharing. A given program can (perhaps must) be built on a plethora of pre-existing libraries that may not be apparent to your neighbour when it comes to sharing. Worse for me ‘cos I generally think command line programs stink (and I do understand why you might think otherwise) – perhaps I had better try and explain where I am coming from. Some (or maybe all) of it is a history.

My first programs were COBOL encoded in EBCDIC and punched onto 80 column cards. These cards were fed into a card reader’s “hopper”, compiled and (if all went well) the program ran. The program caused other punched cards containing data to be read and the resulting output was printed (astonishingly quickly by today’s standards) on a line printer – or just possibly output as newly punched cards ready for another process. If I got my program and data into the computer room “in-tray” by 17:00 I usually got the output the following day. The ultimate “command line”.

Then I started in a new job and was assigned to try out something called BASIC on a DEC PDP 11/70 as this was a platform that was seen as suitable for “end user” computing and someone was needed to be able to help out with development work within various departments of the business. I was handed the language specification for DEC’s Basic+ and left to my own devices for a bit. It was a pivotal moment because BASIC contains two very important keywords – INPUT and PRINT. I realised that my future programs could interact with the user while they were running. The other “tools” were a bit primitive – VT52VDUs and printer terminals  hooked up to the computer room via 300 baud modems. Clunky kit but the software (in comparison to what had gone before) could soar.

At about the same time the Apple ll and Commodore PET micro computers were also introducing a new and interactive model to a wider audience (here too BASIC was a key component). A common anti-pattern of that era was to get the user to type in some data, validate it and then display the result. The “anti” bit came when the validation failed – typically this resulted in user punishment with a GOTO command taking the program back to the beginning with all data previously entered lost. This was lazy programming and (despite the miniscule available RAM) it was a personal fundamental that user data was corrected with minimal effort on the part of the human running the program.

As clerical tasks were “computerised” it was also crucial that the user remained in control. We needed to preserve the knowledge of how things were done – as otherwise the next edge case could become a major problem. Software has a duty to educate as well as undertake the drudgery of so many tasks. Fortunately terminals (and micro-computers) evolved and improved. Every advance was applied to the task of improving the quality of the interaction between user and software.

To cut to the chase, it became second nature for me to ensure that the UI (User Interface) was a critical part of every bit of software. So much so that the tools I write for myself typically have a strong UI even as this adds perceptibly to the development time.

So even if I had to explain to you why (say) I have a utility that takes an SQL table definition and outputs boilerplate class code (in C#, VB.NET or Java at choice) together with support stored procedure definitions (with variants for SQLite if required) you could copy the executable to your machine and just run it. You might not appreciate the “why” but the “how” should be obvious.

So given my fervour for strong user interfaces I have some difficulties with the current Python infrastructure. I have in mind a project that would make very effective use of all that Python list handling goodness. My problem would be in sharing any resulting software with others who might make use of it – particularly if it became a commercial proposition.

For the moment Python has to be just for me and is not for sharing. Probably confined to the Raspberry Pi – actually it will be interesting to try out the Kivy library to interact via a touch screen on that platform. So I am not finished yet that's for sure.

Tuesday, September 08, 2015

This week’s language is Python

Warning – these notes on Python are based on very short acquaintance and may well be misleading and/or factually incorrect.

OK – coming late to the table most people would think – but then again. The last time I took a look at Python was around the time Python 3 came out and the status and functionality of the mass of C support libraries was in some doubt. Turns out that is still true in many instances with a lot of effort still seemingly going into Python 2.n and it is not clear that the world at large has collectively taken the big step to Unicode.

Anyway, I wanted to do some stuff with one of those new Raspberry Pi 2 machines and Python looks to be the lingua Franca for that platform when running Linux. So I had another look. **

The specific feature that figures in any introduction to Python is that white space (specifically line indentation) is significant and part of the language syntax. Most introductions wax on about there being no need for {curly braces} or semicolons but then again you do have to frown a bit at those unmentioned colons at the end of significant lines

>>> def myfun(arg):
. . .
>>> for w in something:
. . .

Then my reading got to len(something). 

What? I thought. Surely the length of something (number of characters in a string or the count of entries in a list) would be an attribute of the relevant object. I even found Guido van Rossum’s explanation in an old Python FAQ but also found it less than convincing. I looked further and found there were at least 60 language features implemented as methods. Wow! I thought this is BASIC born again. There are even Input and Print methods. Then I looked at how such methods as len() were implemented – and then I think I reached an understanding.

Each object type (tricky to get the terminology right) where a length measurement makes sense implements a method called __len__() that will be used automatically by the language len() method. Programmers can implement the __len__() method on their custom objects confident that the language’s len() method will act as a standard interface to that functionality. In a language envisioned for code sharing (and multiple standard libraries) this is actually a powerful concept. You don’t have to know how (say) an unknown object implements a readable string representation of itself – you can just use str(object) with confidence.

So it looks like the BASIC style language methods can be extended to include new types of object which is a massive step forwards.

Python is also big on Lists. I suspect that languages influential in the design of Python included Prolog, Smalltalk and Lisp (I might have said a Lispy JavaScript but the languages while not quite contemporaneous were both born around the same time). No surprise then that Python is a very popular language in AI research. There is nothing in Python Lists that you can’t do with (say) C# but the language is rich in list manipulation functionality “out of the box”.
I love the fact that you can use expressions to create and populate a list:

>>> squares = [x**2 for x in range(10)]
>>> squares
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

And range() itself is a fascinating and flexible tool for controlling loops.

Tuples look interesting as immutable collections of heterogeneous elements although I have yet to work out how a mutable element (like a list) might fare inside a tuple.

Sets are another specialised form of list with support for set operators like union and intersection. Takes me back to college maths. Can’t wait to see what I can do with these. Curley braces do turn up in the language here and then also with dictionaries.

Dictionaries are available and look well supported and I have always liked the “associative array” concept

Functions are first class objects and I greeted the huge range of options for passing arguments into a function with a lot of interest. There are potential traps for the unwary here I think but also a flexibility not available anywhere else. Lambda’s as well of course.

Built in JSON serialisation and de-serialisation supports the storage and transfer of objects and the capacity to execute strings as code opens up the opportunity to implement a DSL (Domain Specific Language) among other opportunities. Then there are Generators (see to help scale data presentation and high data volumes.

Classes have similarities to the equivalent JavaScript objects. They can contain the equivalent of static variables, instance variables, attributes and methods. Instance variables can be created and manipulated on individual instances of a class. Scope and the order that objects are defined or assigned can impact upon the meaning of the code.

Classes support inheritance (including multiple inheritance) with the ability to override base class methods. Operator overloading is also supported.

The construction and importing of code modules (which then act as name spaces) is an effective mechanism for code structure and code sharing.

Python runs into a bit of a steep incline (not a wall) when it comes to complex interactions with the user. It is fundamentally a scripting language and is not primarily intended to maintain a GUI (Graphical User Interface). KiVy is going to be bigger on the Raspberry Pi with the newly announced touch display. For PCs Iron Python is an open source community project that integrates the Python language with .NET might well be one workaround for more complex projects and with Visual Studio 2015 (community edition) being free for a lot of projects and developers it is a viable option.

When you have been working in JavaScript for a few days and the code is really starting to flow then this is partly because you have erected a good mental model of the variable and method scope in play. I strongly suspect that constructing and maintaining a similar mental model for a large Python codebase would be essential – there are some similarities but also some key differences. Getting “in the groove” might take a little effort.

None of this is  intended to act as any sort of Python introduction – rather the intention is to communicate a little of the enthusiasm I am feeling for the language now I have taken a proper look. Python is different (a big bonus), clever and very capable. Now all I have to do is learn how to write effective code in this language – a big step on from just assembling an overview of the functionality after all. So the next side-project is going to be in Python.

** plan is to have two SD cards so I can “dual boot” one of the Linux variants available for the Pi and Windows 10 – should be fun. I also see Python runs on Windows OIT Core as well as on the usual Linux variations.

Tuesday, September 01, 2015

Eating the Browser cake

That side-project again. One of the primary outputs of the app is data and text collected from web sites in (mostly) HTML format. This can most simply be wrapped to create an HTML document and displayed in the WebBrowser control made available by Visual Studio. This works well up to a point – the WebBrowser control is a “wrapper” for the Internet Explorer COM object but only exposes a limited set of events and properties. There is also an issue with multi-threading – using the WebBrowser control makes it very difficult to update the UI thread from (say) a BackgroundWorker.

I was happy with the HTML presentation within the WebBrowser control but wanted to handle any user clicks on links by pushing the link off to the default browser on the relevant machine. An external link could require a whole host of support facilities not enabled by the WebBrowser control so this made sense as well as (probably) reflecting the expectations of any user.

So, how to push responsibility to those external links off to the user’s preferred browser. Turns out to be easy to handle in the Navigating event:

private void webViewer_Navigating(object sender, WebBrowserNavigatingEventArgs e) {     if (!hasStarted)     {         hasStarted = true;         return;     }     e.Cancel = true;     var startInfo = new ProcessStartInfo     {         FileName = e.Url.ToString()     };     Process.Start(startInfo); }

The bool pageShown is set false before loading the original page in the WebBrowser control and this is set true by the first Navigating event resulting from that page load. Subsequent Navigating events before the pageShown value is reset are pushed off to the default application (a web browser certainly) that handles URLs.

This worked fine up until the HTML content being displayed contained an <iframe> tag. While loading graphics and scripts from remote servers caused no issues filling the content of an <iframe> triggered a Navigating event. So I needed to know if a Navigating event was being triggered by an <iframe> requesting content as this may happen after the main content had loaded and the DocumentCompleted event fired (so I could not reliably re-set the pageShown Boolean within that event handler.

StackOverflow is your friend here of course. Questions like “possible to detect whether an iframe is loaded in the navigating event” hit the nail on the head. This starts a trail that initially leads to a March 2006 Code Project post by Jeroen Landheer but also mentions the BeforeNavigate2 event thus exposed and the opportunity to insert a test into the BrowserExtendedNavigatingEventArgs class.

There is also this CodeProject article from May 2007 which is worthy of investigation if you need to follow a similar path.

It looks like the Windows System SHDocVw.dll exposes a whole range of additional interfaces and events for IE not exposed by the WebBrowser control but getting a handle on them takes a little effort to say the least – hence the projects to sub-class the WebBrowser in the two CodeProject articles mentioned.

Anyway – the only snag remaining is that the standard WebBrowser Navigating event fires before the BeforeNavigate2 event which is slightly counter-intuitive. So the trick is to move the functionality from the Navigating event into the WebBrowserExtendedEvents class BeforeNavigate2 event by adding a Boolean property to that class for the main form to notify the start (and end) of the main document loading.

As an aside, I did wonder if there were alternatives to the provided WebBrowser control. Turns out there have been projects to “wrap” WebKit and FireFox browsers but what looks like the most able and up-to-date project provides a wrapper for the Chrome browser and the GitHub repository is here . However if you were expecting a nice Visual Studio control to “drop onto” a form then you might be disappointed. I think the idea is that you will want to wrap the functionality in a custom control of your own perhaps supporting tabs and other browser flummery. The simplest way to add the CefSharp control to your form is programmatically in the form class constructor thus:

private CefSharp.WinForms.ChromiumWebBrowser mBrowser; public Form1() {     InitializeComponent();     mBrowser = new CefSharp.WinForms.ChromiumWebBrowser("")     {         Dock = DockStyle.Fill,     };     this.Controls.Add(mBrowser); }
It works and seems very fast and responsive. The documentation currently takes the form of “read the code” so it was not immediately obvious I could solve my problems by switching the underlying browser but at first glance it did look possible. What stopped me spending too much time here was the requirement to build your project against the 32bit or 64 bit version. While there are fewer 32bit PCs out there it still represents an additional issue should this project ever be shared with others even informally.