coreutils in Haskell, Part I
The first programming language I ever used was HTML. This was in the days where all-caps tag/attribute names were all the rage. My editor of choice? Notepad. HTML was satisfying to write because it offered immediately visible results, and it was just plain fun. I soon began perusing the plethora of learning materials available for HTML online and quickly discovered a technology called JavaScript that I was promised would provide my simple HTML pages with exciting dynamicity. Soon after, I discovered the marvelous technology CSS, and suddenly I could make pretty web pages with cool interactive effects. It was a period of joyful experimentation.
The rest of my programming journey is long enough to warrant another post, so I’ll save it for another time. Suffice it to say that, while there has been plenty of fun and many satisfying things produced along the way, learning Haskell is the first time programming has been as joyful as it was when writing HTML. It’s a different kind of satisfaction, to be sure, something which my experiences along the way have shaped me to be able to appreciate. This blog post, which I’m calling “Part I” in an attempt to motivate myself to make it a series, is dedicated to sharing and celebrating that joy. Of course, it won’t all be joyous, but there’s some truth to the platitudes about needing contrast to truly appreciate joy.
I would certainly call myself a Haskell newcomer. I’ve read Learn You a Haskell for Great Good! a couple of times, along with the requisite dozen or so blog posts explaining what a monad is. I’ve also made it about halfway through Bartosz Milewski’s excellent “Category Theory for Programmers,” in recorded lecture form. Beyond that, about all I’ve done is implement a couple of IETF RFCs and most of a sudoku solver. I enjoy the expressivity of Haskell and the sorts of reasoning enabled by functional programming abstractions. This series is an attempt to explore Haskell more thoroughly and document the process for Haskellers, newcomers, and outsiders alike. Let’s get started!
I’ve seen Rust veterans recommend that newcomers practice the language by reimplementing parts of coreutils
themselves.
I suspect this is partly because of Rust’s role as a systems language, but I’ve decided to try this idea out with Haskell, so there you have it: the premise of this series.
I’ve decided to save my creativity for the actual programming and call this project hutils
.
My understanding is that the stack
project’s goal is reproducible builds, which sounds fine to me, so I’ll use that, I guess?
$ stack new hutils
The command output suggests I’ve forgotten to do some configuration, so I guess I’ll try again.
$ rm -rf hutils
$ vim ~/.stack/config.yaml # YAML, huh? Surprising.
$ stack new hutils
Oh cool, ~/.stack/config.yaml
has a lot of the missing fields already listed (and commented out).
Anyway, time to start actually editing.
$ cd hutils
$ exa # https://github.com/ogham/exa
That’s a lot of files!
I’ll take a closer look later, but I’m really itching to write some code.
I’m guessing, based on my experience with Rust, that app
is intended for executable stuff and src
is for library stuff.
The presence of src/Lib.hs
seems to reinforce this.
Wait a second — which utility do I want to replicate?
nl
and wc
come to mind, as do cp
and rm
.
cat
seems likely to be unexpectedly complex.
On the other hand, echo
seems too simple.
How about yes
?
It contains a handful of interesting things while avoiding filesystem interaction: standard output, command line argument parsing, recursion, maybe some signal handling.
Sounds like a good candidate!
Some poking around in stack.yaml
suggests that I can set up the project to accomodate multiple subprojects pretty easily.
Searching online makes it seem like I should be able to just create a subdirectory; maybe I’ll run stack new
again?
$ stack new yus
Now I’ll add ./yus
to packages
in stack.yaml
and try stack run yus
.
Oh yeah, I guess I’ll wait for stack
to install GHC. Sure.
$ stack run yus
someFunc
Just to make sure, I’ll edit yus/src/Lib.hs
to print "yus"
instead of "someFunc"
.
Okay, let’s try again.
$ stack run yus
someFunc
Hmm.
Running stack run
from within ./yus
executes something called yus-exe
; I’ll try fiddling with this.
$ stack run yus-exe
yus
Annoying, but understandable. Time to actually write some code.
So, what does yus
actually need to do?
I’m going to start with the most basic version: write a single y
, followed by a newline, to standard output until interrupted.
But this is Haskell, so I want to think in terms of types first.
This function takes no (meaningful) argument and produces a side effect.
Namely, it writes something to standard output.
That is, it performs I/O, and it returns nothing meaningful.
I’ve seen this before: IO ()
.
Since there’s no meaningful parameter type to assign, I’ll also use the unit type for this.
Otherwise, I think GHC will just treat my “function” as a constant, which I probably don’t want?
I think that might be not good for monadic effects?
I’ll have to remember to try that out later.
Okay, digressions aside, I now know at least the type of the “function” (which may be multiple functions) representing my program.
yus :: () -> IO ()
I know I can write something followed by a newline with putStrLn
; is there something specifically for writing a character followed by a newline?
Searching Hoogle for Character -> IO ()
doesn’t show anything particularly interesting, so I guess I’ll stick with putStrLn
.
Now all that’s left is to write an infinite loop.
I know that monadic control flow is often simplified using Control.Monad
, so I’ll take a look at those docs.
forever
looks promising:
Repeat an action indefinitely.
However, its written as forever :: Applicative f => f a -> f b
.
Is IO
an applicative functor?
Oh yeah, it’s a monad, so of course it is.
(Weird to think about putting a function inside IO
, though.
I wonder if there’s some cool serialization library that lets me read functions directly from files.
I suppose GHC is one.)
It doesn’t totally make sense to me that forever
takes an f a
and gives me an f b
, but let’s run with it.
In yus/src/Lib.hs
(which I totally mistyped as lib.rs
several times):
module Lib (yus) where
import Control.Monad (forever)
yus :: () -> IO ()
yus _ = forever $ putStrLn "y"
As for yus/app/Main.hs
:
module Main where
import Lib
main :: IO ()
main = yus
Wait, that’s not quite right.
module Main where
import Lib
main :: IO ()
main = yus ()
I fully expect this to fail, since I don’t think the signature of forever
is quite right for this usage.
One way to find out!
$ stack run yus-exe
y
y
y
y
# ...
Uh-oh.
I hope I don’t have to do signal handling myself.
Nope, it responds to SIGINT
!
Great!
…but why does that type-check?
Recall the type of forever
:
forever :: Applicative f => f a -> f b
What does this actually say?
To me, at my current level of understanding, it promises to take an instance of an applicative functor f a
and return a value of a completely different, apparently unrelated type f b
.
Oh.
Of course.
It’s a divergent function!
For all I care, it could be implemented using absurd
: it’s not going to return, so it can promise to return whatever makes our program type-check.
I can see how this might be convenient, but I think I’d prefer if it returned something like Void
.
Maybe this somehow breaks the type system or the optimizer or something.
For now, I guess I’ll just accept that the weirdness is due to its nature as a divergent function.
Then the type signature is much less confusing.
Now it’s time to learn what yes
actually does.
I’m not sure I’ve ever actually used it (especially since discovering apt-get -y
exists!).
I don’t anticipate that I’ll want to implement the whole set of functionality, but I’ll take a look before I decide one way or the other.
$ man yes
Well, as it turns out, the GNU coreutils
version of yes
doesn’t actually take any arguments other than an optional string to repeat.
Testing it reveals that multiple arguments are simply joined with spaces and treated as a single argument, which is actually quite handy and an example of good shell ergonomics.
Anyway, I suppose it’s time to figure out how to get command line arguments in Haskell.
A quick search brings up System.Environment.getArgs
.
Conveniently, the documentation says that the list returned by this function doesn’t contain the program name, which is something I had foreseen being an annoyance.
It seems to me that there are two outcomes:
- the argument list is empty — use the default string
y
; or - the argument list is nonempty — join the argument list with spaces and use this as the string to repeat.
It seems like an almost silly abstraction to make, but I think it at least makes logical sense to separate concerns a little bit and make a function which repeats its argument on standard output indefinitely.
This can then be called either with "y"
or with a string derived from user input.
The type of this function is pretty apparent:
writeForever :: String -> IO ()
The implementation’s not much harder:
writeForever :: String -> IO ()
writeForever x = forever $ putStrLn x
Hmm…can I write that in point-free?
writeForever :: String -> IO ()
writeForever = forever . putStrLn
Nice!
Now I can move on to the problem of handling arguments.
I’ll write a function to fetch the string to be used, which also seems like an almost silly level of abstraction.
I know about unlines
to join strings with newlines; is there a version that uses spaces?
Searching Hoogle for [String] -> String
brings me to unwords
, which seems to be exactly what I want.
targetString :: String
targetString
| null args = return "y"
| otherwise = unwords args
where args = getArgs
Oops. I completely forgot that this performs I/O.
fetchString args :: String
fetchString [] = return "y"
fetchString xs = unwords xs
targetString :: IO String
targetString = fmap fetchString getArgs
Somewhat less elegant, but I guess it’s good enough. Now to connect all of the plumbing:
module Lib (yus) where
import Control.Monad (forever)
import System.Environment (getArgs)
yus :: IO ()
yus = targetString >>= writeForever
writeForever :: String -> IO ()
writeForever = forever . putStrLn
fetchString :: [String] -> String
fetchString [] = "y"
fetchString xs = unwords xs
targetString :: IO String
targetString = fmap fetchString getArgs
Rewriting this helped me see things a bit more clearly: since the I/O action is the entity of interest here (rather than some function producing an I/O action), there’s no reason yus
has to “look like” (or be) a function; rather, it can simply be a constant with a value of type IO ()
.
If I then say main = yus
, the IO action bound to yus
will simply be executed; there’s no reason that it has to take any arguments.
On that note, here’s yus/app/Main.hs
:
module Main where
import Lib
main :: IO ()
main = yus
Short and sweet!
Trying it out returns the desired results; it would seem I’ve successfully reimplemented yes
in Haskell by importing two functions from base
and writing a handful of my own.
Fewer than twenty lines of code!
The Rustacean in me wants to go dig at performance characteristics, figure out how to explicitly lock stdout
, etc., but exactly how often does yes
need to supply input even multiple times per second?
I think this is good enough for me, for now.
Or almost good enough, anyway.
Defining both fetchString
and targetString
seems like overkill; surely I can find a way to reduce this verbosity?
After a bit of digging, I recall that <$>
exists as an infix alternative to fmap
.
I’m going to try writing this in a different way.
module Lib (yus) where
import Control.Monad (forever)
import System.Environment (getArgs)
yus :: IO ()
yus = fetchString <$> getArgs >>= writeForever
writeForever :: String -> IO ()
writeForever = forever . putStrLn
fetchString :: [String] -> String
fetchString [] = "y"
fetchString xs = unwords xs
I like this a lot more, but there’s one last thing: I really would prefer the yus
function to be more intuitively readable scanning from left to right.
As it is now, the first conceptual “action” performed is written second on the right side.
I could use <<=
instead of >>=
, but I want to read left-to-right and not right-to-left.
I’m stumped. I actually don’t know how to pull this off more elegantly. I’ve heard good things about the Haskell community, and I seem to recall seeing something like the Rust “easy questions” thread on the Haskell subreddit; maybe I’ll ask there?
Sure enough, a while later, a very helpful /u/philh has a suggestion: I can import <&>
from Data.Functor
, which is just <$>
with its arguments flipped!
This seems like a good deal to me.
module Lib (yus) where
import Control.Monad (forever)
import Data.Functor ((<&>))
import System.Environment (getArgs)
yus :: IO ()
yus = getArgs <&> fetchString >>= writeForever
writeForever :: String -> IO ()
writeForever = forever . putStrLn
fetchString :: [String] -> String
fetchString [] = "y"
fetchString xs = unwords xs
Now this feels elegant.
The main yus
function reads almost like English: “get arguments, fetch the string to print from them, then write it as output forever.”
writeForever
is easy to read and understand intuitively as well, and it’s point-free to boot!
fetchString
is unambiguous too; I think I even prefer the pattern matching to using null
.
So what did I learn from this?
Well, I still have a bit of a tendency to think imperatively, as evidenced both by my function names and my worrying about the distinction between constants and functions.
I also learned about <&>
from Data.Functor
, which I anticipate being helpful in the future.
My impression from reading the final result is that Haskell is extremely expressive, but finding the “words” to express what you want can be difficult at times, especially if you don’t know exactly what it is you’re looking for.
At any rate, this implementation was a lot of fun, and I’m very happy with the end result.
Onward!