Home Blog Contact
fil RSSRSS Flux

Recent entries

Archives

Multimedia processing with Scheme scripts

Recording TV is easy. Take a USB TV Stick and start recording your favorite show, movie or cartoon. The next step is to process it, so that you keep only the most interesting part. There are plenty of programs to process your recordings. Until recently everything was smooth. But I had a problem lately where the audio and video was not in synch after processing.

After some research it happens that I need to first pass the recorded file through projectx. It is an application which repair DVB recordings and adds timestamps where needed. After that I can use my usual program for the final transformation. In fact projectx takes a DVB recording and generates two files: one constaining the video and another for the sound.

Recomposing a single MPEG file is possible with the FFMPEG program. In the end follow these two steps and you are done:

  • Step 1: run projectx your-dvb-recording
  • Step 2: run ffmpeg -i your-dvb-recording.m2v -i your-dvb-recording.mp2 -vcodec copy -acodec copy final-dvb-recording.mpg

Now that the process is clear, you can write a nice script to automate it. Writing one in Scheme is easy. Here is one solution.

#! /usr/bin/env scmscript

(define (resynch-file file-name)
  (let ((m2v (replace-extension file-name ".m2v"))
        (mp2 (replace-extension file-name ".mp2"))
        (final (string-append (file-name-sans-extension file-name)
                              "-synched.mpg")))
    (run (projectx ,file-name))
    (run (ffmpeg -i ,m2v -i ,mp2 -acodec copy -vcodec copy ,final))))

(define (main args)
  (for-each (lambda (fn)
              (display "converting ") (display fn) (newline)
              (resynch-file fn))
            args))

MapReduce in Scheme

MapReduce is a useful algorithm to analyze massive amount of data. It is powerful because it dispatches jobs in parallel to multiple machines. When you have a cluster of server machine, you can put it to good use.

MapReduce was originally described by Google, but they never released their implementation. Later the Apache Software Foundation made available a concrete implementation named Hadoop. Part of Hadoop, HFS (Hadoop File System) is used to distribute data files across your cluster of machines.

The core of the algorithm is two function named map and reduce. The first takes one parameter and returns a pair containing a key and a value. The parameter could be any input value: it is usually a file. If you have many input files, you can run the map function in parallel on all of them. The result would be a list of key and value pairs.

The next step is to run the reduce operation. All the key/value pairs are grouped by their key, so that one reduce operation receives as input one key with all the values associated with it. The result of the MapReduce system is the aggregation of all the reduce jobs.

Here we will implement a simple MapReduce system in Scheme. It will not be distributed and made to run on a cluster of machines, instead it will be completely sequential and run on a single machine. The purpose here is to allow an easy experimentation of MapReduce algorithm. The main entry point of the MapReduce system is the mapreduce procedure. It takes the two map and reduce functions to use together with the input and returns the aggregated result. Here is its definition:

(define (fold-right init lst proc)
  (cond ((null? lst) init)
	((pair? lst)
	 (proc (car lst) (fold-right init (cdr lst) proc)))))

(define (add-entry! keys k+v)
  (let* ((entry (assq (car k+v) keys)))
    (if entry
	(begin 
	  (set-cdr! entry (cons (cdr k+v) (cdr entry)))
	  keys)
	(cons (cons (car k+v) (list (cdr k+v))) keys))))
    
(define (mapper map)
  (lambda (data keys+vals)
    (let ((key+val (map data)))
      (add-entry! keys+vals key+val))))

(define (mapreduce map* reduce* data)
  (map reduce* (fold-right '() data (mapper map*))))

That's it for the implementation. What can you compute with it ? The standard example given is word count for counting the occurence of words in documents. Instead here we will compute the intersection of sets. The map function produces the pair e,e of key value for all elements e of the sets to intersect. The reduce function produces the element e only if two elements are associated with the key. If two elements are associated with the key, it means one element comes from the first set and the second element from the second set. Otherwise nothing is produced as a result.

Here is the Scheme implementation:

  (define (intersect-map e) (cons e e))
  (define (intersect-reduce k vs)
    (let ((v (cdr k+v)))
      (if (= (length v) 2)
          (car v)
          #f)))

Let see if it works

  (mapreduce intersect-map intersect-reduce (append '(1 2 3 4 5) '( 3 4 5 6 7 8)))
  =>
  '(3 4 5)

Try to be creative now !

Mocking for your Scheme Unit Test

Unit Testing is useful for testing your code. At the heart of a unit test are your assetions that check your code. Here is an example.


(define-unit-test check

  (assert (string=? (string-upcase "Hello world") "HELLO WORLD")))

The assert macro performs the test and reports any problems encountered. This kind of tests goes a long way and allow you to have a nice test suite for your code.

Now let say that you have a web application. For the login process you also have a password reset procedure that resets the password and sends a mail containing the new password to the user. Here is how such a procedure could be implemented.


(define (reset-password-for-user user-name)

  (with-db *db*

    (lambda ()

      (let ((user (find-user-by-name user-name))

            (new-password (random-password)))

        (set-user-password! user new-password)

        (send-mail (user-email user) (string-append "Your new password is:" new-password))))))

You want your unit test to verify that the mail is really sent when you reset a password. What we are going to do is to mock the send-mail procedure. Mocking a procedure consists in modifying the original procedure such that calling it is recorded in the system. Later your unit test will check that a record exists.

So let define the mocking framework.


(define (mock procedure)

  (lambda args

    (record-procedure-call! procedure args)

    (apply procedure args)))



(define-syntax mock!

  (syntax-rules ()

    ((mock! procedure-name)

     (set! procedure-name (mock procedure-name)))))

To mock a procedure you call (mock! <your-procedure>). The mock macro registers a procedure to be mocked.

The second step for your unit test is the check definition for ensuring that the code behave correctly. Let's define some check procedures.


(define (assert-that-procedure-was-called-n-times procedure n)

  (= (length (procedure-called procedure)) n))

Finally you can define your unit test in the following way.


(define-unit-test mock

  "Check that a mail is sent to the user when we reset the password for a user"

  (mock! send-mail)

  (reset-password-for-user "pierre")

  (assert-that-procedure-was-called-n-times send-mail 1))

A good mock framework provides many more kinds of assertion related to procedure calling. For example instead of simply making sure that the procedure send-mail is called, we could ensure that the parameters are right. Your imagination is the limit here.

Unit testing with Scheme part 2

Implementing a unit testing framework is fun to implement in Scheme. With a 100 lines of Scheme code, you are basically up and running. See my previous blog about unit testing in Scheme

One of the improvement is to provide access to the currently running test case. As naive this feature seems to be, it brings some interesting implementation discussion. So how do you go about implementing this ?

One simple solution is to have the currently running unit test in a global variable. The code looks like this:

(define *unit-test* #f)

(define (the-unit-test) *unit-test*)

(define (run-all-test!)
  (for-each (lambda (t) (set! *unit-test* t) ((unit-test-thunk t)))
            *unit-tests*))

In the running unit test, you simply call (the-unit-test). All fun, but what's wrong with this solution ? We have a global variable ! Everybody knows that global state is bad and you are better without it.

Now that you have multi-core processors, you may be tempted to run your unit tests suite in parallel. But how do you go about it if you have a global variable ? You can't.

So the standard solution is to pass around the running unit test instance around. The implementation goes like this:

(define (run-all-tests!)
  (for-each (lambda (t) ((unit-test-thunk t) t)) *unit-tests*))

(define-syntax define-unit-test
  (syntax-rules ()
    ((define-unit-test ?name ?doc ?body)
     (add-unit-test! (make-unit-test '?name ?doc ?body)))))

(define-unit-test test-string=?
  "Tests the string=?"
  (lambda (unit-test)
    (assert-true! unit-test (string=? "Hello" "Hello"))))

Some comments are required in order to fully grasp this design. The first change is that the running test is passed as the first parameter to the code performing the test cases.

This leads to the second change: the definition of the define-unit-test macro. The code for the unit test is now a lambda expression in itself. This is necessary because syntax-rules only supports hygienic macro definition. Having the name of the unit test variable defined implicitly requires a non-hygienic macro. Thus you should use a syntax-case macro facility or the explicit renaming macros.

The last change is that now the assert-xxx procedure takes an additional parameter (the currently running unit test). This makes it possible to print the currently running unit test name when some assertion fails !

Now you have a nice implementation which makes running more than one unit test concurrently possible. So you may take advantage of the multiple cores of your processor.

Let say that you want some statistics about your unit tests, like how many assertions have been done, the number of unit tests run and the ratio of failed/passed unit tests. If you think about it, you should recognize the statistic collector as another global object that should be accessible when running your unit tests.

You use a global variable for this. But as we have seen, this is not a good solution. You add another parameter to the running unit test and you have 2 parameters to pass around. To simplify the implementation, you may define a combo datastructure containing the running unit test and the collected statistics.

This principle is fine, but scales poorly. Maybe in the future, you want to provide another kind of information within the running tests. So you add another parameter or another entry in your combo datastructure. Finally you have thousands of parameters/entries in your solution. Isn't there a more simple and scalable solution ? In a following blog entry, you will know about a unique feature (almost) of the lisp family of language: dynamic varables.

Tracing facility for Scheme

Trace support is a nice debugging tool. When you trace a function, you make it so that each time the function is called you print out the function name and the value of its arguments. When the function finally returns, the return value is also printed out.

A scrennshot showing the result of tracing a function appears at the bottom.

Implementing a basic trace facility is quite simple and is also a nice example of higher-order programming. The tracify procedure is the heart of the system. It takes a procedure as a parameter (the procedure to trace) and returns a new procedure performing the tracing.

(define (tracify proc)
  (lambda args
    (display "Entering with arguments:") (write args) (newline)
    (let ((result (apply proc args)))
      (display "Leaving => ") (write result) (newline)
      result)))

Now that you can transform a procedure into one which does tracing, you need a command/procedure to enable tracing. It turns out, you can't do that with a procedure. Think about it. What is the argument ? a procedure, but then you need to store back the traced procedure. If you pass it a symbol for the procedure's name, how do you grab the procedure associated with the symbol ?

The easy solution is to write a macro for that.

(define-syntax trace
  (syntax-rules ()
    ((trace ?symbol) (set! ?symbol (tracify ?symbol)))))

You have now a nice tracing facility. What is missing ?

  • you may want to support procedure which returns multiple values.
  • you may want to indent the recursive calls as show in the screenshot above.
  • you may want to untrace a procedure. That is replacing the procedure with the original one.
  • you may want to list all the currently traced procedures.
tracing screenshot