Slides From Software Transactional Memory Talk

Below are my slides from my STM talk this past Thursday. I have posted both PPTX and PDF versions. The links don’t seem to carry over to the PDF. I will be posting a much longer post that incorporates the talk plus more content in the next couple of weeks.

PDF Slides

PPTX Slides

The Trading Show Presentations

Here is an interesting set of presentations from The Trading Show. Check out the first presentation, from Blair Hull. For those who don’t know, he founded a little firm called Hull Trading.

http://www.terrapinn.com/2013/trading-show-chicago/presentations.stm

AHGTH: Comments on Snap/Yesod

I spent some time last night looking at Snap, which is a web development framework for Haskell. I have seen repeated comments that Snap is superior to Yesod, so I finally poked around myself. We use Yesod at Alpha Heavy Industries for internal status pages and tools. I have found it a bit kludgy to use, and Snap was advertising these Snaplets, which I figured held some promise. My frame of reference for comparison is ASP.Net around versions 3.5 and 4.0, which I also used to build a lot of tools.

First off, I am not really interested in the technical merits of either platform. Both advertise very high request throughput and have made design choices to support it. I have no need of such a solution; our webserver is lucky if it has two people making requests at the same time. I am mostly concerned with ease of development. In ASP.Net I could build a quick webapp to display data or do data entry (my most common tasks) in a couple of hours. In Yesod it often turns into a day-long affair. I have realized how much ASP.Net had insulated me from the mechanics of HTML, building forms, etc. Yesod is a much lower-level affair, and in some ways a higher-level one.

At the risk of inciting your wrath I will admit that I have repeatedly compared Yesod to many half-baked Microsoft technologies like WCF and WWF. Both were very easy to get up and running with. Both demoed fantastically. Everything was great, until a scenario that was not considered came up. I feel the same in Yesod. It’s easy to get going, especially now, but then you need to do something a bit complicated and you have to dive into the details of the framework. Admittedly, as my Haskell wizardry has increased I have been able to devise simpler solutions to complex problems in Yesod.

Snap doesn’t seem to be much different. I will leave it to others to argue about what the right templating syntax is. It seems to have most of the same features although quite a bit less documentation. I was disappointed that most of the snaplets were built to interface with databases. I was really hoping for some powerful controls. I may be missing something and if I am I invite anyone to fill me in on what that is.

ASP.Net’s controls are one thing I miss from the old world. It was super easy to build a datagrid or a series of charts bound to arbitrary data sources. Now many will say that to build highly scalable websites you can’t use these things, and they are right. But most of the websites out there are not serving thousands of pages per second. What I really think either framework could benefit from is a layer on top of what exists now: a set of full-featured controls for charts, data grids, calendars, and things I have never thought of.

In Yesod I am still not sure what this would look like. Is a chart a Widget? My chart control today is a Widget. In fact most of my widgets are controls, and they are hacky and nasty since they are JavaScript powered. I have done more JavaScript code generation than I care to mention. Wait, aren’t I supposed to be writing Haskell?

All that being said, both seem to have solid technical foundations. More work just needs to be spent on the higher levels. I look forward to seeing both evolve.

P.S. A reader may ask why I use Yesod if I don’t like it. I don’t want to have to rewrite blocks of functionality in another language. It requires less time overall to just suck it up and make the best of it.

P.P.S. If anyone knows how reusable controls in Yesod should be developed so they can just be linked in, I will develop any future controls in that manner and release them.

AHGTH: The Unhandled Exception Handler

I originally wasn’t going to write this post as I thought it was too trivial, but Max Cantor talked me into it, so here it is.

One of the unfortunate things about GHC is that if an exception occurs on a non-main thread it is silently swallowed by the runtime. The behavior on most runtimes, such as the JVM and CLR, is that an uncaught exception on any thread will take down the process. Fortunately GHC provides a hook for installing a global handler to receive these exceptions. In GHC.Conc.Sync, down at the bottom with no docs, you will see setUncaughtExceptionHandler. This lets us register a handler to which GHC will route any exceptions we don’t catch. Code sample below.
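A minimal sketch of wiring it up might look like the following (the logging handler is purely illustrative; a real one might alert, re-raise, or kill the process):

import Control.Concurrent (forkIO, threadDelay)
import GHC.Conc.Sync (setUncaughtExceptionHandler)
import System.IO (hPutStrLn, stderr)

main :: IO ()
main = do
  -- Register a global handler for exceptions that escape any thread.
  setUncaughtExceptionHandler $ \e ->
    hPutStrLn stderr ("Uncaught exception: " ++ show e)
  -- Without the handler this exception would disappear without a trace.
  _ <- forkIO (error "boom")
  -- Give the forked thread a chance to blow up before we exit.
  threadDelay 1000000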

AHGTH: Exception Scoping

A language’s exception mechanism is important to understand, as exceptions are vital to building software that works with the outside world. Haskell’s laziness creates some problems for dealing with exceptions that you need to be aware of. Specifically, it can be difficult to reason about exactly when a piece of code will be evaluated and when it can escape the scope of an exception handler. Consider the following:
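Something like this sketch, where iThrowExceptions stands in for any pure function that can throw:

import Control.Exception

iThrowExceptions :: Int -> Int
iThrowExceptions 0 = error "boom"
iThrowExceptions n = 100 `div` n

main :: IO ()
main = do
  -- The catch looks like it protects the call, but return only builds a thunk.
  value <- catch (return (iThrowExceptions 0))
                 (\e -> do putStrLn ("caught: " ++ show (e :: SomeException))
                           return 0)
  -- The thunk is forced here, outside the handler, and the error escapes.
  print value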

Now of course, since Haskell is lazy, the evaluation of iThrowExceptions, which can throw exceptions, is deferred until we actually use its value. Another example is assigning fields in a record:
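A sketch, assuming a MyRecord type with a Text field mrField populated from decodeUtf8 (the names follow the discussion below):

import Control.Exception
import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8)

data MyRecord = MyRecord { mrField :: T.Text }

buildRecord :: B.ByteString -> IO MyRecord
buildRecord bytes =
  catch (return MyRecord { mrField = decodeUtf8 bytes })
        (\e -> do putStrLn ("caught: " ++ show (e :: SomeException))
                  return MyRecord { mrField = T.empty })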

In this case, since mrField is not a primitive, decodeUtf8 is once again not going to be evaluated within the exception handler. Even if we are in IO we only evaluate to Weak Head Normal Form, so a structure like a list is still going to be full of thunks.

I am going to show two practices that we use in our code to help with proper exception scoping. The first is safeCatch. The problem with catch is that it does not guarantee that the evaluation will take place within the scope of the exception handler. Why the decision to have exception handlers travel with thunks was not made I do not know, but we live in a world where they do not. Fortunately Control.DeepSeq provides a mechanism to ensure that our evaluation is done at a specific point and that our data structures will have no thunks in them. The core of DeepSeq is a typeclass called NFData. The rnf function fully evaluates our data type and then recurses through its children to evaluate any thunks they contain. Implementing NFData instances for all our types would involve lots of boilerplate code, so we can use a package like deepseq-th to make instances for us. Internally we have our own implementation for historical reasons. I would like to see a move toward Generics and away from Template Haskell in the future.

As an aside, using NFData is also very important in parallel programming to ensure that operations are done on specific threads and not on a thread that later views their results. If you are not very careful you will be sticking thunks representing large amounts of work into an MVar or Chan and then having a common thread evaluate them later, which is usually not the desired behavior.
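A sketch of the safe pattern, forcing the result on the worker thread before handing it over an MVar (expensive is just a stand-in for real work):

import Control.Concurrent (MVar, forkIO, newEmptyMVar, putMVar, takeMVar)
import Control.DeepSeq (force)
import Control.Exception (evaluate)

expensive :: Int -> Int
expensive n = n * n

worker :: MVar [Int] -> IO ()
worker out = do
  let result = map expensive [1 .. 10000]
  -- Fully evaluate on this thread; putMVar alone would just hand over a thunk.
  evaluate (force result) >>= putMVar out

main :: IO ()
main = do
  out <- newEmptyMVar
  _ <- forkIO (worker out)
  takeMVar out >>= print . sum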

The implementation of safeCatch is very simple:
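In sketch form (the exact signature may differ from what we ship internally) it just uses deepseq to force the action’s result before catch goes out of scope:

import Control.DeepSeq (NFData, deepseq)
import Control.Exception (Exception, catch)

safeCatch :: (NFData a, Exception e) => IO a -> (e -> IO a) -> IO a
safeCatch action handler =
  catch (action >>= \x -> x `deepseq` return x) handler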

So now if we provided an instance of NFData for our MyRecord type we could do the following:
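Roughly, building on the MyRecord and safeCatch sketches above (the NFData instance is written by hand here; deepseq-th or Generics would generate the equivalent):

import Control.DeepSeq (NFData(..))

instance NFData MyRecord where
  rnf (MyRecord f) = rnf f

decodeRecord :: B.ByteString -> IO MyRecord
decodeRecord bytes =
  safeCatch (return MyRecord { mrField = decodeUtf8 bytes })
            (\e -> do putStrLn ("caught: " ++ show (e :: SomeException))
                      return MyRecord { mrField = T.empty })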

Now any exceptions that occur when decoding will be caught by the exception handler. Pretty cool.

Now the second technique can be used by itself or in conjunction with what I have just shown; I often do the latter. Developers will frequently find themselves assigning data collected from the outside world into records. Since the assignment of fields in a record is lazy even though the record itself may be evaluated (to weak head normal form), we may want to attach an exception handler to the field assignment itself. Fortunately there is a great function called mapException which has two important properties. First, it is pure, which we need for field assignment. Second, it takes a function which allows us to transform one exception into a new one. Developers with experience in Java or C# will no doubt be familiar with inner exceptions, and this allows us to do the same thing. Under the hood mapException uses unsafePerformIO to work its magic; the comments note that this is proven safe in the paper.

Complete Sample:
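Here is a sketch of what the complete sample can look like, combining safeCatch with mapException (the FieldException wrapper and the taggedField helper are illustrative names):

{-# LANGUAGE DeriveDataTypeable #-}
{-# LANGUAGE ScopedTypeVariables #-}

import Control.DeepSeq
import Control.Exception
import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8)
import Data.Typeable (Typeable)

-- An "inner exception" wrapper recording which field blew up.
data FieldException = FieldException String SomeException
  deriving (Show, Typeable)

instance Exception FieldException

data MyRecord = MyRecord { mrField :: T.Text }

instance NFData MyRecord where
  rnf (MyRecord f) = rnf f

safeCatch :: (NFData a, Exception e) => IO a -> (e -> IO a) -> IO a
safeCatch action handler =
  catch (action >>= \x -> x `deepseq` return x) handler

-- Tag any exception raised while this field's thunk is forced.
taggedField :: String -> a -> a
taggedField name = mapException (FieldException name)

buildRecord :: B.ByteString -> IO MyRecord
buildRecord bytes =
  return MyRecord { mrField = taggedField "mrField" (decodeUtf8 bytes) }

main :: IO ()
main = do
  let badUtf8 = B.pack [0xc0, 0x80]   -- invalid UTF-8; decodeUtf8 will throw
  record <- safeCatch (buildRecord badUtf8)
                      (\(e :: SomeException) -> do
                         putStrLn ("caught: " ++ show e)
                         return MyRecord { mrField = T.empty })
  print (mrField record)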

While there is a lot more to know about exceptions I hope this proves useful to some of you out there.

FPGAs and High Frequency Trading

If you are looking for a brief overview of the use of FPGAs in High Frequency Trading, check out this. It covers the motivation for using FPGAs and the infrastructure needed to support a strategy.

Talk on High Frequency Trading

James Thomas from Headlands Technologies did an interview on a podcast earlier this year. If you are involved in or follow the HFT industry it will be of limited interest, but if you are tech savvy and want to listen to an industry insider talk about it then this is interesting. Ironically, Headlands has an office located just a few blocks from our offices in San Francisco.

Monoid Instance on Conduit

This past weekend I had reason to use conduit to process the contents of a large number of files as a single stream. Fortunately conduit supplies a Monoid instance for sources; Monoid allows us to append things together. This example is very short but very useful. This little code block creates many sources and collapses them into a single source ordered by file name. Quite cool.

let source = foldl1 (<>) $ map sourceFile $ sort files
runResourceT $ do
  source $$
    someConduit chain ...

Thanks to drb226 for his suggestions.

Update:

So obviously there are other ways of writing this. Tekmo suggested using the >> bind-and-discard operator, which would make our code look like:

let source = foldl1 (>>) $ map sourceFile $ sort files

Sjoerd Visscher suggested using foldMap from Data.Foldable. So that gives us:

let source = foldMap sourceFile $ sort files

Thanks for the feedback.

Taxi Medallions and Regulatory Interference

Slate ran an article on taxi medallions in major US cities and how they are related to fare increases and lower wages for taxi drivers. The article posits that treating medallions like a financial instrument is what has led to lower wages and higher fares. I want to argue that financialization is not bad and that regulatory interference is largely to blame.

Let’s first agree that the market for taxi cabs in New York is inefficient. In 1937, when medallions were first issued for taxi cabs, the population was around 6.5 million people. Today it is just over 8 million people. We will assume that the per capita usage of cabs remains relatively constant over time. I have no idea if that is true, but we will assume it to simplify things. If the issue is that the rents charged by medallion holders to cab drivers are too high, then most likely there is a shortage of medallions. The article does indeed confirm this.

The supply of medallions is regulated by the city government, so a logical place to start seems to be increasing the number of medallions. Understandably, the owners of the medallions don’t want any additional medallions issued, since it would lower the value of their existing ones. However, the problem is a fundamental shortage of medallions, so more need to be issued. In economics this phenomenon is called artificial scarcity. Indeed, Paul Krugman writes about the medallion shortage as the primary example of artificial scarcity in his microeconomics textbook. So it’s not really the financialization of the medallions that is the problem. It makes sense for large cab companies to have a pool of medallions since an individual medallion can be used 100% of the time, something that would be impossible if an individual owned the medallion. What is needed is more medallions, not telling individuals (or organizations) that they may not lease them out.

This is a great example of unintended consequences. While it seems rational to ensure that the market players act in an ethical fashion through some sort of licensing process, it does not seem rational to cause a large artificial lack of supply. Increasing the supply would be an easy way to drive down costs.

I will write shortly about why ticket scalping should be legal and why professional sports teams are not good at pricing tickets.

SimpleDB to MongoDB

I have just completed moving all my database related code from SimpleDB to MongoDB. Maybe I am just the latest in a trend, since Netflix is migrating to Cassandra (http://www.slideshare.net/adrianco/global-netflix-hpts-workshop). This has been a two-stage process.

The first stage was to move my write-dominant tables. Specifically, I first moved the table that holds URL metadata for our web crawler. As I scaled up the crawler, SimpleDB started to have large numbers of problems with write pressure. Crawlers generate considerable write pressure, and while a few dropped writes are not a real concern, SimpleDB returned large numbers of errors. The transition to Mongo was smooth and Mongo has handled ever-increasing write pressure. I also found SimpleDB to be too expensive, and it does not have good tools for analyzing queries and the load they create.

The second stage which I completed this weekend was cutting over all remaining metadata. This was also completed without fanfare and took just a couple of hours to copy and redeploy all executables.

Why did we choose Mongo? Well, it has an excellent Haskell package. I very much enjoy increased type safety, which SimpleDB does not have since it stores everything as strings. I also like its operators for mutating lists atomically. The query model is pretty sweet. In fact MongoDB has become a kind of data Swiss Army knife for us. We use it in almost everything we do. That is not to say I would not use other types of solutions such as Cassandra in the future.
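As a taste, here is a rough sketch using the mongoDB Haskell package (the database, collection, and field names are made up, and the exact API has shifted a bit between package versions):

{-# LANGUAGE OverloadedStrings #-}

import Control.Monad.IO.Class (liftIO)
import Data.Text (Text)
import Database.MongoDB

main :: IO ()
main = do
  pipe <- connect (host "127.0.0.1")
  _ <- access pipe master "crawler" $ do
    -- Typed documents instead of SimpleDB's everything-is-a-string.
    _ <- insert "urls" [ "url"  =: ("http://example.com" :: Text)
                       , "tags" =: (["seed"] :: [Text]) ]
    -- Atomically append to a list field with $push.
    modify (select ["url" =: ("http://example.com" :: Text)] "urls")
           ["$push" =: ["tags" =: ("fetched" :: Text)]]
    -- Query it back.
    docs <- rest =<< find (select ["tags" =: ("fetched" :: Text)] "urls")
    liftIO (mapM_ print docs)
  close pipe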