Wednesday 24 October 2007

The aftermath

We are wrapping up our efforts on the ICFP Programming Contest 2007. Most contest-related matters have been finished by now.

If you want to read more about the ICFP Programming contest, you should have a look at our report about it.

You can order a t-shirt with the after picture on the back, and one picture from the arrival sequence on the front from cafepress, or create your own t-shirt using these materials (4.3MB).

If you want to see the presentation we gave about the contest at ICFP 2007, have a look at our video (with an explanation about how to solve the contest problem starting at 18:00, and Endo appearing at around 1:12:00).

The question we got after the contest was:
How much time did you spend creating it?
We have different answers to this question, but probably the one that comes closest to the truth is: as much as we enjoyed creating it.

Wednesday 10 October 2007

Endo lives!

At the International Conference on Functional Programming (ICFP) in Freiburg last week, we announced the results of the ICFP Programming Contest 2007.

Just as last year, the first prize was won by Team Smartass from Google research.



The second prize was one by the United Coding Team from the University of Cape Town, South Africa.



The judges' prize was won by Celestial Dire Badger (Jed Davis).



The highest survival chance produced during the contest (by Team Smartass) was over 90%, so Endo survived! Endo actually came with us to Freiburg to say `thank you' to the people present.



Later in the week he was spotted by several ICFP attendants in the city of Freiburg, running around and enjoying himself.

Friday 17 August 2007

Tickling grass

As you all already know, Endo's chance of survival is also affected by other organisms and objects in his environment. One of the organisms that has recently attracted our attention is grass. It turns out that with just a few modifications to Endo's DNA you can make grass grow in completely different places! We have not discovered, however, how to make it grow at the right locations. Take a look at what we tried:
IIPIFFCPICCFPICICFPPICICIPICIIICCCIICIIICCCIIICCCP
IICIPIIICCPIIPIICIIPIPIIICCPIICIIPIICIPIIICCPIICIP
PPCCCFFCCCCCCCCCCCCCCCCCCICIPPCPFCCCFFCCFCCCCCCCCC
CCFFFICIPPCCPCFCFCFFCCCCCFCFFCCCCCCCICIICIIPIFFCPI
CCFPICICFPPICICIPICCIIICICIIICIICCIIICCCPIICIPIIIC
CPIIPIICIIPIPIIICCPIICIIPIICIIPIPIIICCPIICIIPIICIP
IIICCPIICIPPPCCCCFFCCCCCCCCCCCCCCCCCICIPPCPCCFFFCC
FCFFFCCFFFFFFCFFICIPPCCPCFFCFFCCCCFCCCCCFCCCFFFICI
PPCICPCFFFCFFFFFCCCCCCFCCCCCCICIIC
This prefix enables grass repositioning and redefines the function bioMul (see initial conditions help page) as follows:
bioMul x y = bioAdd x y
It seems that the bioMul function influences where grass is going to grow. We are still searching for the right definition of bioMul, and we are confident that when we find it, grass will grow at the right locations. Do you want to help us figure out how to fix bioMul? If you do, you might find it helpful to understand first how bioAdd is encoded in DNA.

Friday 10 August 2007

Finding palindromes

--------------------------------------------------------------------------------
This blog post receives thousands of hits per month. I do not want to change the blog post from 2007, since it is part of a larger context.

Visit Finding palindromes for a more extensive description of the algorithm and the underlying concepts and ideas.

Visit http://www.jeuring.net/homepage/palindromes/ to obtain the software for finding palindromes.
--------------------------------------------------------------------------------
A palindrome is a number, verse, word or a phrase that is the same whether you read it backwards or forwards, for example the word `refer'.

When I started as a PhD student in Computer Science in 1988, my professor gave the following problem assignment to me. Construct an algorithm that, when given a string, finds the longest palindrome substring occurring in the string. For example, given the string "yabadabadoo", the algorithm should return the substring "abadaba". The algorithm should not only return the longest palindrome substring, it should also return it fast. The requirement was that the algorithm should only use a number of steps linear in the length of the argument string. This was a difficult problem, which gave me severe headaches in the months to follow. I spent four months on the problem, and solved it. The one comment I got from my professor was "The best thing about this problem and its solution is that they have absolutely no practical relevance."

He was wrong.

Since 1988 I have found that palindromes play an important role in the world around us. Palindromes appear in music, in genomes, in art, architecture and design, mathematics, physics, chemistry, etc.

Particularly interesting is the fact that palindromes occur in the male genome: of the about 20,000,000 characters representing the male-specific region of the Y chromosome, 5,700,000 characters form together eight large palindromes (off by a couple of characters). I don't think there is any proven explanation for the presence of palindromes in the male genome, but I suspect it has something to do with repairing genomes. If you know more about it: please let me know.

As you may or may not have noticed when working on repairing Endo's DNA, the cloud is corrupted. It turns out that the cloud's genome also is a palindrome (off by a couple of characters), and that the corruption can be repaired by taking the mirror image of the palindrome. To do so, you have to find the palindrome. In the case of Endo, this is pretty easy: `duolc' follows cloud immediately in the symbol table. Had this information not been in the symbol table, you would have to find this palindrome in the DNA by means of an algorithm for finding palindromes. Actually, the DNA for the cloud is only 6,500 acids long, and using a naive quadratic-time algorithm for finding palindromes is not a real problem if you have a sufficiently fast machine. A quadratic-time algorithm for determining palindromes in human male DNA isn't going to work: it would take a long time to find the palindromes.

This blog message explains how to find palindrome substrings in linear time. I will develop a program in Haskell for finding exact palindromes. The problem of finding approximate palindromes is left as an exercise to the reader :-) Interestingly, the algorithms for finding palindromes developed by computational biologists generally use suffix trees, and both space and time consumption seem to be worse compared with the algorithm I give in this blog message. But descriptions of the algorithm given here have been published in the eighties (by Galil and others, described in RAM-code) and nineties (by me, described in Squiggol, which is probably even harder to decypher than RAM-code) of the previous century, which is a long time ago of course.

This blog message is a literate Haskell file, which can be interpreted with ghci.


The type of a function for finding palindromes


I want to find the longest palindrome substring in a string. The first attempt at a type for a function longestPalindrome is hence:

longestPalindrome :: String -> String

To determine whether or not a string is a palindrome, I have to compare characters at different positions in a list. It would be helpful if I had random access into the string. For that purpose, I'm going to change the input type of the longestPalindrome function to an array. Furthermore, the longestPalindrome algorithm can also be used to find the longest palindrome in a list of integers, or any other type on which I can define an equality function. So here is the second attempt at a type for a function for finding the longest palindrome:

longestPalindrome :: Eq a => Array Int a -> Array Int a

To find the longest Palindrome in an array, I have to calculate the longest palindrome around each position of the array, where a position in an array is either on an element, or before or after an element. For example, the array corresponding to [1,2,3] has 7 positions: before the 1, on the 1, before the 2, ..., until after the 3. So I want to express the function longestPalindrome in terms of a function longestPalindromes of type

longestPalindromes :: Eq a => Array Int a -> [Int]

where the result list is the list of the lengths of the longest palindrome around each position in the argument array. For example,

?longestPalindromes (arrayList (0,2) [1,2,2])
[0,1,0,1,2,1,0]

I will omit the definition of longestPalindrome in terms of longestpalindromes, and define function longestPalindromes below.


A naive algorithm for finding palindromes


A naive algorithm for finding palindromes calculates all positions of an input array, and for each of those positions, calculates the length of the longest palindrome around that position.


> module Palindromes where
>
> import Data.Array
>
> longestPalindromesQ :: Eq a => Array Int a -> [Int]
> longestPalindromesQ a =
> let (afirst,alast) = bounds a
> positions = [0 .. 2*(alast-afirst+1)]
> in map (lengthPalindromeAround a) positions


Function lengthPalindromeAround takes an array and a position, and calculates the length of the longest palindrome around that position.


> lengthPalindromeAround :: Eq a => Array Int a
> -> Int
> -> Int
> lengthPalindromeAround a position
> | even position =
> extendPalindromeAround (afirst+pos-1)
> (afirst+pos)
> | odd position =
> extendPalindromeAround (afirst+pos-1)
> (afirst+pos+1)
> where pos = div position 2
> (afirst,alast) = bounds a
> extendPalindromeAround start end =
> if start < 0
> || end > alast-afirst
> || a!start /= a!end
> then end-start-1
> else extendPalindromeAround (start-1)
> (end+1)


For each position, this function may take an amount of steps linear in the length of the array, so this is a quadratic-time algorithm.


A linear-time algorithm for finding palindromes


I now describe a linear-time algorithm for finding palindromes. Although the program is only about 15 lines long, it is rather intricate. I guess that you need to experiment a bit with to find out how and why it works.

The algorithm processes the input array from left to right. It maintains the length of the current longest tail palindrome, and a list of lengths of longest palindromes around positions before the center of the current longest tail palindrome, in reverse order. Function longestPalindromes is expressed in terms of function extendTail.


> longestPalindromes :: Eq a => Array Int a -> [Int]
> longestPalindromes a =
> let (afirst,alast) = bounds a
> in extendTail a afirst 0 []


Function extendTail takes an array as argument, the current position in the array, the length of the current longest tail, and a list of lengths of longest palindromes around positions before the center of the current longest tail palindrome, in reverse order. There are four cases to be considered in function extendTail. If the current position is after the end of the array, we only have to add the final palindromes in the array to the list of longest palindromes, for which we use the function finalCentres. If the current longest tail palindrome extends to the start of the array, we extend the list of lengths of longest palindromes by means of function extendCentres. If the element at the current position in the array equals the element before the longest tail palindrome we extend the longest tail palindrome. If these elements are not equal, we extend the list of length of longest palindromes.


> extendTail a n currentTail centres
> | n > alast =
> -- reached the end of the array
> finalCentres currentTail centres
> (currentTail:centres)
> | n-currentTail == afirst =
> -- the current longest tail palindrome
> -- extends to the start of the array
> extendCentres a n (currentTail:centres)
> centres currentTail
> | a!n == a!(n-currentTail-1) =
> -- the current longest tail palindrome
> -- can be extended
> extendTail a (n+1) (currentTail+2) centres
> | otherwise =
> -- the current longest tail palindrome
> -- cannot be extended
> extendCentres a n (currentTail:centres)
> centres currentTail
> where (afirst,alast) = bounds a


Function extendCentres adds palindromes to the list of lengths of longest palindromes. It takes the array as argument, the current position in the array, the list of palindromes to be extended, and the list of palindromes around centres before the centre of the current longest palindrome tail. It uses the mirror property of palindromes to calculate longest palindromes around centres after the centre of the current longest palindrome tail. If the last centre is on the last element, we call extendTail with a longest tail palindrome of length 1, and the position moved tot he right. If the previous element in the centre list reaches exactly to the end of the last tail palindrome, use the mirror property of palindromes to find the longest tail palindrome. In the other cases, we've found the longest palindrome around a centre, and add that to the list of length of longest palindromes. We proceed by moving the centres one position, and calling extendCentres again.


> extendCentres a n centres tcentres centreDistance
> | centreDistance == 0 =
> -- the last centre is on the last element:
> -- try to extend the tail of length 1
> extendTail a (n+1) 1 centres
> | centreDistance-1 == head tcentres =
> -- the previous element in the centre list
> -- reaches exactly to the end of the last
> -- tail palindrome use the mirror property
> -- of palindromes to find the longest tail
> -- palindrome
> extendTail a n (head tcentres) centres
> | otherwise =
> -- move the centres one step
> -- add the length of the longest palindrome
> -- to the centres
> extendCentres a n (min (head tcentres)
> (centreDistance-1):centres)
> (tail tcentres) (centreDistance-1)


Function finalCentres calculates the lengths of the longest palindromes around the centres that come after the centre of the longest tail palindrome of the array. These palindromes are obtained by using the mirror property of palindromes.


> finalCentres 0 tcentres centres = centres
> finalCentres (n+1) tcentres centres =
> finalCentres n
> (tail tcentres)
> (min (head tcentres) n:centres)


At each step in this algorithm, either the longest tail palindrome is extended, and the current position in the array is moved, or the list of lengths of longest palindromes is extended. Since both the array, and the list of lengths of longest palindromes are linear in the length of the array, this is a linear-time algorithm.

Saturday 4 August 2007

The Major Imp Episodes

Although the Fuun typically want little to do with them, they do suffer from an inferiority complex when it comes to Imps. This is most evident in the large amount of extremely popular spy stories in which the Fuun novice invariably wins against a hard-boiled, sinister Imp character, who either dies (after a heroic attempt by the hero to save his life) or is converted to Fuun philosophy and thereafter serves the functional cause. A few short excerpts of such a story were recovered from Endo's DNA. It is still unclear whether these stories could actually play a role in saving Endo. For your amusement, we have included the very first episode below:

Only just before sleep-groggy roosters reared their scrawny heads, there was a hesitant knock on the steel door of the austere office of semi-colonel Void. "ENTER", rang out from the room, as though spoken in a medium-sized cathedral. Novice Imp entered the cramped room with some trepidation, knowing full well that many a (not so intelligent) agent had become deallocated after nary a slip of the tongue. Lieutenant Ptr, for example, had referred in His presence to "some unwanted side-effects" of a particular investigation. Ptr had promised that it wouldn't happen again, and Void had made sure that indeed it wouldn't.

Somewhat weak at the knees, Imp made for the middle of the room, where he stood at attention. "RELAX", spoke Void, in his usual teletype voice that belied the contents. "I have heard that among our recruits, you are the one adhering closest our revered commandos. We have a sensitive and dangerous mission scheduled and you have been chosen to volunteer..." (call-cc)

Monday 30 July 2007

How did it go?

The ICFP Contest 2007 ended last Monday. We have seen writeups of several teams that participated in the ICFP Contest. Many of those are interesting and sometimes funny reading. We particularly enjoyed

http://stereotype441.livejournal.com/45150.html

Marco Galotta collected a list of writeups, and posted them on his blog

http://marco-za.blogspot.com/2007/07/icfp-how-we-reached-top-15.html

We don't want to give our writeup now, we'll do that for ICFP in Freiburg, but we do want to briefly share our experiences.

We had a very hectic and pleasant time. I cannot remember I've had such an intense time at work before.

Headquarters was never empty during the 72 hours of the contest. We had a laptop showing several screens via a beamer: last 20 submissions, standings, the icfp mail account, several irc channels devoted to the contest, etc. Not much happened in the middle of the contest, but both the start and end were very hectic. The end was a thriller, I can assure you. But more about that at the ICFP conference in Freiburg.

Everything went fine. We haven't discovered an error in the task description yet. Initially we were worried about the slow start by many teams, but as time passed we were increasingly impressed by the creativity and cleverness of many teams. We have excellent winners. Our machines were unreachable twice for a short period, but that hasn't been a problem. Web statistics show that we processed millions of hits in these three days. All in all we are quite satisfied by how things went.

From now on we will prepare our presentation at Freiburg. By then we will know whether or not Endo survived. In the presentation we will discuss the problem, possible solutions, techniques we used to create the problem, solutions we have seen from the participants, the winners, etc. We will hand out a writeup about the contest in Freiburg.

Monday 23 July 2007

Thank you

Thank you all so much for participating.

It has been heart-warming to see all participants trying so hard to save the poor Endo. It was a hard struggle, but we are very optimistic Endo will survive. We have sent the best submitted prefix to Arrow, but haven't heard anything since. We hope that this is due to the fact that Arrow is using all its remaining energy to apply the DNA prefix to Endo, and not because something has gone wrong. For now, we can do nothing but wait for a sign of life. Repairing DNA takes time, but we expect that we will know whether or not it has been successful by the time of the ICFP Conference in Freiburg.