Category Archives: Haskell

AHGTH: Comments on Snap/Yesod

I spent some time last night looking at Snap, which is a web development framework for Haskell. I have seen repeated comments that Snap is superior to Yesod, so I finally poked around myself. We use Yesod at Alpha Heavy Industries for internal status pages and tools. I have found it to be a bit kludgy to use, and Snap was advertising these Snaplets, which I figured held some promise. My frame of reference for comparison is ASP.Net around versions 3.5 and 4.0, which I also used to build a lot of tools.

First off, I am not really interested in the technical merits of either platform. Both advertise very high request throughput and have made design choices to support it. I have no need of such a solution. Our webserver is lucky if it has two people making requests at the same time. I am mostly concerned with ease of development. In ASP.Net I could build a quick webapp to display data or do data entry (my most common tasks) in a couple of hours. In Yesod it often turns into a day-long affair. I have realized how much ASP.Net had insulated me from the mechanics of HTML, building forms, etc… Yesod is a much lower-level affair, and in some ways a higher-level one.

At the risk of inciting your wrath, I will admit that I have repeatedly compared Yesod to many half-baked Microsoft technologies like WCF and WWF. Both were very easy to get up and running with. Both demoed fantastically. Everything was great until a scenario that was not considered came up. I feel the same in Yesod. It’s easy to get going, especially now, but then you need to do something a bit complicated and you have to dive into the details of the framework. Admittedly, as my Haskell wizardry has increased, I have been able to devise simpler solutions to complex problems in Yesod.

Snap doesn’t seem to be much different. I will leave it to others to argue about what the right templating syntax is. It seems to have most of the same features, although quite a bit less documentation. I was disappointed that most of the Snaplets were built to interface with databases; I was really hoping for some powerful controls. I may be missing something, and if I am, I invite anyone to fill me in on what that is.

ASP.Net’s controls are one thing I miss from the old world. It was super easy to build a data grid or a series of charts bound to arbitrary data sources. Now many will say that to build highly scalable websites you can’t use these things, and they are right. But most of the websites out there are not serving thousands of pages per second. What I really think either framework could benefit from is a layer on top of what exists now: a set of full-featured controls for charts, data grids, calendars and things I have never thought of.

In Yesod I am still not sure what this would look like. Is a chart a Widget? My chart control today is a Widget. In fact most of my widgets are controls, and they are hacky and nasty since they are JavaScript-powered. I have done more JavaScript code generation than I care to mention. Wait, aren’t I supposed to be writing Haskell?

All that being said, both seem to have solid technical foundations. More work just needs to go into the higher levels. I look forward to seeing both evolve.

P.S. A reader may ask: why use Yesod if I don’t like it? I don’t want to have to rewrite blocks of functionality in another language. It requires less time overall to just suck it up and make the best of it.

P.P.S. If anyone knows how reusable controls in Yesod should be developed so they can just be linked in, I will develop any future controls in such a manner and release them.

AHGTH: Exception Scoping

A language’s exception mechanism is important to understand, as it is vital to building software that can work with the outside world. Haskell’s laziness creates some problems for dealing with exceptions that you need to be aware of. Specifically, it can be difficult to reason about exactly when code will be evaluated and whether it can escape the scope of an exception handler. Consider the following:
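
Here iThrowExceptions stands in for any pure value whose evaluation can blow up, and the handler is deliberately trivial:

{-# LANGUAGE ScopedTypeVariables #-}
import Control.Exception

iThrowExceptions :: Int
iThrowExceptions = error "I throw exceptions"

main :: IO ()
main = do
  -- The handler only guards the act of returning the thunk, not forcing it.
  result <- return iThrowExceptions
              `catch` \(_ :: SomeException) -> return 0
  -- The thunk is forced here, outside the handler, and the error escapes.
  print result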

Now of course, since Haskell is lazy, the evaluation of iThrowExceptions, which can throw exceptions, is deferred until we actually use its value. Another example is assigning fields in a record:
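
Something like this, where MyRecord and mrField stand in for one of our real record types:

{-# LANGUAGE ScopedTypeVariables #-}
import Control.Exception
import qualified Data.ByteString as B
import Data.Text (Text)
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8)

data MyRecord = MyRecord { mrField :: Text }

parseRecord :: B.ByteString -> IO MyRecord
parseRecord bytes =
  -- decodeUtf8 is stored in mrField as a thunk; it does not run until
  -- someone inspects the Text, which may be long after catch has returned.
  return MyRecord { mrField = decodeUtf8 bytes }
    `catch` \(_ :: SomeException) -> return MyRecord { mrField = T.empty }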

In this case, since mrField is not a primitive, decodeUtf8 is once again not going to be evaluated within the exception handler. Even if we are in IO, values are only evaluated to weak head normal form, so our list is still going to be full of thunks.
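
A quick way to see the weak head normal form trap in isolation:

import Control.Exception (evaluate)

main :: IO ()
main = do
  -- evaluate forces only the outermost constructor, so the elements of the
  -- list remain unevaluated thunks.
  xs <- evaluate [1, error "still a thunk", 3 :: Int]
  print (length xs)  -- fine: length never looks at the elements
  print (sum xs)     -- the error finally fires here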

I am going to show two practices that we use in our code to help with proper exception scoping. The first is safeCatch. The problem with catch is that it does not guarantee that the evaluation will take place within the exception handler. Why the decision to have exception handlers travel with thunks was not made I do not know, but we live in a world where they do not. Fortunately Control.DeepSeq provides a mechanism to ensure that our evaluation is done at a specific point and that our data structures will have no thunks in them. The core of DeepSeq is a typeclass called NFData. The implementation of the rnf function fully evaluates our data type and then recurses through its children to evaluate any thunks they contain. Now, implementing NFData instances for all our types would involve lots of boilerplate code, so we can use a package like deepseq-th to make instances for us. Internally we have our own implementation for historical reasons. I would like to see a move made to Generics and away from Template Haskell in the future.
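
For example, with a recent deepseq the Generics route looks roughly like this, reusing the MyRecord type from above:

{-# LANGUAGE DeriveGeneric #-}
import Control.DeepSeq (NFData)
import Data.Text (Text)
import GHC.Generics (Generic)

data MyRecord = MyRecord { mrField :: Text }
  deriving (Generic)

-- With deepseq >= 1.4 the default rnf implementation is derived from the
-- Generic representation, so no hand-written traversal is needed.
instance NFData MyRecord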

As an aside, using NFData is also very important in parallel programming to ensure that operations are done on specific threads and not on the thread that later views their results. If you are not very careful you can end up sticking thunks representing large amounts of work into an MVar or Chan and then having a common thread evaluate them in the future, which is usually not the desired behavior.
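
A small sketch of the pattern, where the publish name is just illustrative:

import Control.Concurrent.MVar (MVar, putMVar)
import Control.DeepSeq (NFData, force)
import Control.Exception (evaluate)

-- Fully evaluate a result on the worker thread before handing it off, so the
-- consumer never ends up forcing our thunks (or hitting our exceptions).
publish :: NFData a => MVar a -> a -> IO ()
publish box x = evaluate (force x) >>= putMVar box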

The implementation of safeCatch is very simple:
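
At its heart it is just catch with the result forced to normal form before the handler goes out of scope, roughly:

import Control.DeepSeq (NFData, force)
import Control.Exception (Exception, catch, evaluate)

-- Like catch, but deeply evaluate the result so that any exception hiding in
-- a thunk is raised while the handler is still in scope.
safeCatch :: (NFData a, Exception e) => IO a -> (e -> IO a) -> IO a
safeCatch action handler = (action >>= evaluate . force) `catch` handler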

So now, if we provide an instance of NFData for our MyRecord type, we can do the following:
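
Reusing the MyRecord type, the decodeUtf8 import and safeCatch from above, the earlier parseRecord becomes something along these lines:

{-# LANGUAGE ScopedTypeVariables #-}

parseRecord :: B.ByteString -> IO MyRecord
parseRecord bytes =
  safeCatch (return MyRecord { mrField = decodeUtf8 bytes })
            (\(_ :: SomeException) -> return MyRecord { mrField = T.empty })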

Now any exceptions that occur when decoding will be caught by the exception handler. Pretty cool.

Now, the second technique I use can be used by itself or in conjunction with what I have just shown; I often do the latter. Developers will frequently find themselves assigning data collected from the outside world into records. Since the assignment of fields in a record will be lazy even though the record itself may be evaluated (to weak head normal form), we may want to create an exception handler on the field assignment. Fortunately there is a great function called mapException which has two important properties. First, its return value is pure, which we need for field assignment. Second, it takes a function that allows us to transform one exception into a new one. Developers with experience in Java or C# will no doubt be familiar with inner exceptions, and this allows us to do the same thing. Under the hood mapException uses unsafePerformIO to work its magic. The comments note that this is proven safe in the paper.
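
Roughly, the pattern looks like this; FieldDecodeError is just an example wrapper type, named here for illustration:

{-# LANGUAGE DeriveDataTypeable #-}
import Control.Exception (Exception, SomeException, mapException)
import Data.ByteString (ByteString)
import Data.Text (Text)
import Data.Text.Encoding (decodeUtf8)
import Data.Typeable (Typeable)

-- An "inner exception" wrapper that remembers which field was being decoded.
data FieldDecodeError = FieldDecodeError String SomeException
  deriving (Show, Typeable)

instance Exception FieldDecodeError

-- mapException is pure, so it can sit directly in a field assignment; any
-- exception raised when the field is eventually forced gets wrapped.
decodeField :: String -> ByteString -> Text
decodeField name bytes = mapException (FieldDecodeError name) (decodeUtf8 bytes)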

Complete Sample:
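
Here everything is wired together; the invalid UTF-8 bytes are only there to trigger a decode failure:

{-# LANGUAGE DeriveDataTypeable, DeriveGeneric, ScopedTypeVariables #-}
module Main where

import Control.DeepSeq (NFData, force)
import Control.Exception
import qualified Data.ByteString as B
import Data.Text (Text)
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8)
import Data.Typeable (Typeable)
import GHC.Generics (Generic)

data FieldDecodeError = FieldDecodeError String SomeException
  deriving (Show, Typeable)

instance Exception FieldDecodeError

data MyRecord = MyRecord { mrField :: Text }
  deriving (Show, Generic)

instance NFData MyRecord

-- catch, but with the result deeply forced inside the handler's scope.
safeCatch :: (NFData a, Exception e) => IO a -> (e -> IO a) -> IO a
safeCatch action handler = (action >>= evaluate . force) `catch` handler

-- Wrap any exception thrown while the field is forced, recording its name.
decodeField :: String -> B.ByteString -> Text
decodeField name bytes = mapException (FieldDecodeError name) (decodeUtf8 bytes)

main :: IO ()
main = do
  let badUtf8 = B.pack [0xc3, 0x28]  -- not valid UTF-8
  record <- safeCatch
    (return MyRecord { mrField = decodeField "mrField" badUtf8 })
    (\(e :: FieldDecodeError) -> do
        putStrLn ("caught: " ++ show e)
        return MyRecord { mrField = T.empty })
  print record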

While there is a lot more to know about exceptions I hope this proves useful to some of you out there.

Monoid Instance on Conduit

This past weekend I had reason to use conduit to process the contents of a large number of files as a single stream. Fortunately conduit supplies a Monoid instance. Monoid allows us to append things together. This example is very short but very useful. This little code block creates many sources and collapses them together into a single source that will be ordered by the file name. Quite cool.

let source = foldl1 (<>) $ map sourceFile $ sort files
runResourceT $ do
  source $$
    someConduit chain ...

Thanks to drb226 for his suggestions.

Update:

So obviously there are other ways of writing this. Tekmo suggested using the >> bind-and-discard operator, which would make our code look like:

let source = foldl1 (>>) $ map sourceFile $ sort files

Sjoerd Visscher suggested using foldMap from Data.Foldable. So that gives us:

let source = foldMap sourceFile $ sort files

Thanks for the feedback.

A Hitchhikers Guide To Haskell

I have been writing in Haskell almost exclusively since the early summer. Before that I was writing in Scala. I have a significant amount of experience in both C# and F#. One of the most frustrating things about learning Haskell is that there is no good map from the way things are done in the OO world of Java and .Net to the way things should be done in Haskell. Even for a good developer it can be difficult to make the transition, especially for writing high-performance, highly concurrent applications.

I am starting this series to document things that I find have worked in Haskell. I don’t claim to be a Haskell expert, but I do build code that ships and works. I tend to use fewer of the fancy language features than other people, and I prefer to write very readable code even if it is slightly more verbose than it might otherwise be. I hope that others can learn from my experience and move along the learning curve faster.

For our inaugural topic I will cover our development environment. We operate in a mixed-language environment using Haskell, C++ and a smattering of both Python and R. We develop primarily in vim (I use mvim) on OS X. We deploy code onto custom Gentoo Linux running on both our own boxes and EC2 machines.

As our codebase grew, we quickly became frustrated with cabal, as it has no notion of recompiling all dependencies, so when a change is made in a common package everyone has to manually recompile their packages in the correct order. We evaluated cabal-dev but ultimately decided that it did not meet our needs, primarily because we needed good C++ support and wanted more flexibility to add custom steps, which did not seem like it was going to be particularly easy with cabal-dev. Ultimately we built our own build system, cabal-waf, using waf. You can find Nathan’s blog post with more details here.

For debugging we have increasingly moved to using a custom build of the RTS that has our own debugging extensions, specifically around heap analysis. Most debugging is done through GDB, which in and of itself is a terrible experience compared to using WinDbg. In the future we will be adding more debugging functionality as we need it to debug production issues. I have not found GHCi to be useful for anything other than trivial issues, as it runs far too slowly. All core applications and libraries are compiled with -Wall and -Werror to catch any potential bugs that they can pick up.

We use our own forked version of the LLVM bindings for Haskell extensively to do code generation. There was functionality we required that depended on type-unsafe code which the maintainers did not want, so we maintain our own fork.

It turns out that we have developed our own set of core libraries, as many companies do. We will probably release a couple of these in the future. We have our own time library that takes most of its design from Joda Time. I was actually very shocked by how undeveloped this functionality was in Haskell, and we had to do a fair amount of work to get complex time manipulation working initially. Apparently we are not the only ones with problems, as I have found another implementation that takes its inspiration from .Net’s DateTime.

Paper on using OCaml for Trading

If you have not read the excellent paper by Yaron Minsky and Stephen Weeks titled Caml trading – experiences with functional programming on Wall Street, I suggest you read it. It covers why Jane Street picked OCaml as the primary language they use. Many of the reasons relate to the safety of code.

Many of the same reasons drove Alpha Heavy Industries to pick Haskell as its primary development platform. I will publish in the future what I see as some of the shortcomings of the platform.

Haskell And My Conversion To The IO Monad

I first started learning Haskell in early 2010. Coming from a .Net (C#/F#) background, I thought that Haskell would be very similar to F#. I was very, very wrong. When I first started with Haskell I was, like many people, extremely frustrated with the IO monad, which in Haskell is where all IO happens. A chief complaint I had was that types get infected with an IO prefix. For instance, say a string is read from a file. Its type is not just String; it is instead IO String.

Adding IO to the type, it turns out, is very useful. IO is used to denote unsafe operations that happen outside the runtime, such as reading from a file or a socket. The IO annotation simply tells the runtime that something unsafe and unpredictable can happen here. By contrast, the rest of the code is what is called pure, which means that the compiler can prove it is deterministic. That guarantee has great properties for optimization, since the optimizer in the compiler can rewrite the code any way it wants.
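
A tiny illustration of the split; the function names are just examples:

import Data.Char (toUpper)

-- Pure: the compiler knows this depends only on its argument.
shout :: String -> String
shout = map toUpper

-- Impure: reading a file can fail or change between runs, so it lives in IO.
readGreeting :: FilePath -> IO String
readGreeting path = fmap shout (readFile path)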

There is a reason that this IO distinction is so important. I spent the last year writing Scala code on Hadoop. While Hadoop is frustratingly limited, using Scala with it made writing both MapReduce jobs and Hive UDFs much less verbose than writing them in Java. I started to like Scala, aside from the inherent limitations of the JVM. I then started building my first large-scale standalone app in Scala. That turned out to be a painful disaster. One of the most frustrating things I found was the varying semantics of Java streams. I also found writing highly asynchronous code in Scala to be very painful. I began to realize that the reason I liked Scala was that all my code was pure. Hadoop was handling the IO (whether it was doing a good job of it is a different question entirely).

Using the IO monad to reason about IO turns out to be much easier than the alternatives. When writing large-scale, high-performance apps in languages such as C++, tools and libraries are typically developed to compensate for the lack of the IO type annotation. In Haskell, however, the type checker can alert us when we have done something incorrectly, because the types will not line up. All this worrying about IO becomes very important when writing highly reliable software. Having your language do as much work as possible toward eliminating entire classes of bugs goes a long way toward driving down the cost of developing top-quality software.

I am very pleased with the fact that the software I write now often works once it compiles, as the type system is able to provide assurances that languages with less rich type systems, such as Java or C#, cannot. Consider me a convert.