Writing and running analysis in the middle of your paper with org-babel!

This document is up-to-date as of 2010-12-24 16:22:12

1 TODO

Utilize some kind of macro to make sure the download links are correct

  • how the workshop is received and whether there are any common themes in the user feedback (create a detailed feedback sheet)

bring a recorder to record questions. announce that it will be recorded but deleted after transcription

what about the wiki? and stellar?

1.1 other stuff, check

run through this with L/W/M setup + language environments + exercises… exercises?

2 common concerns

why aren't more people using it?

  • learning curve (RTFM is not going to get you new users!)
  • no clear, standout leader in hand-holding beginners

org-mode (and emacs, for that matter) isn't for everybody, …

but for those — blah blah selling point

Carsten Dominik gave a talk on org-mode at Google; it was over 1 hr long. The point is that org-mode is complex, and an org-babel talk can run very long too.

3 more examples

check the org paper on executing multiple code blocks with single command? by chaining maybe?

3.1 our current work

3.1.1 nipype coding

running nipype pipeline – where computationally intensive things should be tangled

3.1.2 paper in progress

inserting subject summaries & figures

aside: integration with gollum = org+git+wiki

3.2 using org-mode to learn a new programming language

3.2.1 (semi) interactive tutorial

ruby koans example

3.2.2 writing an interactive tutorial (or programming book)

show how Zed Shaw's "How To Write A LxTHW" is a perfect fit http://sheddingbikes.com/posts/1288945508.html

3.2.3 generating an interactive programming book with a single command

lua tutorial

4 Abstract

This is a hands-on, practical workshop on getting started with literate programming (and beyond) using Emacs + org-mode + org-babel. We will touch briefly on the concepts of LP and on organizing and manipulating information in org-mode, then dive right into the uniquely powerful features provided by org-babel: writing documents in multiple programming languages and executing them without leaving your writing environment; outputting, processing, and formatting program output; and publishing documents with or without the source code, results, and figures. This workshop is geared towards students, researchers, programmers, and individuals who want a more streamlined workflow, more efficient information management, or programs with richer and more understandable documentation, as well as anybody interested in Knuth's literate programming philosophy and/or how to write "executable documents".

4.1 software versions used in this workshop (based on reference systems)

  • emacs (win/mac/lin: 23.2.1, athena: 22.2.1)
  • org-mode (7.4, all setups)

if you want to run versions as per the reference setup, use the VM

5 ideas from Eric + org-mode community

5.1 initial demo – demonstrate how the workflow is useful

multiple export: html + pdf. set up a "live update" machine with an iframe that keeps refreshing on an html page + pdf viewer to demonstrate export

5.2 the Emacs starter kit

If you search, there are a couple of versions. technomancy's is the original, but if you're specifically aiming to jump into org-mode and org-babel, you'll want to get eschulte's version.

To my knowledge, the main difference (and a big one), aside from org-mode, org-babel, auctex, and some other extra packages Eric added, is the maintenance process for the configuration (startup) files.

When using Eric's version, most of the .el files get generated from .org files, so most of your configuration lives in source blocks in the .org files. If you check out the emacs-starter-kit documentation, you'll also realize that it's an export of those same .org files!

5.3 hands-on examples?

After that I'd recommend a quick general Org-mode introduction (see Worg for resources) and then I'll recommend a couple of resources for code block specific examples.

  • Babel Paper by Dan, Carsten, Tom, Eric: https://github.com/eschulte/babel-dev/raw/master/paper/babel.org
  • scraps.org: https://github.com/eschulte/babel-dev/raw/master/scraps.org
  • Worg

5.4 after you install the starter kit

if you want to generate the exact same webpage you see on http://eschulte.github.com/emacs-starter-kit/, simply export it! (show export)

6 Intro (non-hands-on)

will not know answers to all questions, but make sure:

  1. record
  2. find out answer
  3. reply to the asker + reply to the class
  4. post onto wiki

6.1 where is everybody from?

show of hands: who is a student, researcher, programmer, other? show of hands: who is a total beginner to emacs? to org-mode? to org-babel?

6.2 why should I care?

6.2.1 I'm a student/researcher, how does this help me?

  • example of student report with reference management and rapid movement of sections
  • example of paper-in-progress with figures, tables, analysis code
  • I'm a heavy LaTeX user, what's better about this?

    you can always fall back to LaTeX if you want – show example (o18?)

6.2.2 I'm a programmer

  • how is this different from verbose commenting?

    the flow of thought is facilitated differently

6.3 org-mode

do people care about history?

6.4 literate programming and org-babel

6.4.1 intro to literate programming philosophy

spend no more than 1 minute

6.4.2 unique powers of Babel

  • whirlwind demo of babel
    • show a raw completed file and export it nicely
  • hello world in 10 languages

    tie outputs from each block into the next

    1. C

      yes, this actually compiles and runs. somehow syntax highlighting isn't working unless i use lowercase 'c', but compilation requires it to be uppercase

      #include <stdio.h>
      int main(void) { printf("hello world!\n"); return 0; }
      
    2. python
      print "hello world!"
      
    3. ruby
      puts "hello world!"
      
    4. emacs-lisp ?
      (format "hello world")
      
    5. shell
      echo "hello world"
      
    6. perl
      print "hello world"
      
    7. R
      print("hello world")
      
    8. haskell (haskell bug)
      putStr "hello world"
      
    9. octave
      disp('hello world')
      
    10. lua (experimental)
      print "hello world"
      

7 Setting up

get the bit from osx……….

the current stable version of org-mode is 7.4

7.1 people who want minimal fuss: use these VM images, or use it on Athena

7.1.1 VirtualBox

7.1.2 VMWare

7.2 prepackaged archive containing useful files

7.2.1 starter kit

  • keybindings and quirks

7.2.2 non-starter-kit

make sure you have (setq org-src-fontify-natively t) in your .emacs, or else you won't get syntax highlighting in-buffer!

7.3 anybody a mindhive user? I'm not, but…

mindhive runs emacs 23.1.1 with org-mode 6.21b bundled with it. You'll need to install your own copy of org-mode 7.4, but otherwise it should be able to run:

  • python
  • perl
  • R
  • matlab
  • pdflatex
  • tcsh
  • bash

7.4 Athena

This is not recommended, but if you so wish, you are able to run org-mode + babel, even evaluate code within your emacs buffer and export directly to pdf, on an athena session. Here's how:

I assume if you like to ssh into athena, this will be straightforward for you.

  • If you ssh in to athena, you will be able to run emacs directly.
  • athena runs emacs 22.2.1, and has bundled org-mode 4.67c with it. The current stable version is 7.4 and we won't be talking about anything other than version 7.4 here.
  • you will want to run a newer org-mode:
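    # unpack under bundle/ so the load-path entries in ~/.emacs point at the right place
    mkdir -p ~/.emacs.d/bundle
    cd ~/.emacs.d/bundle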
    wget http://orgmode.org/org-7.4.tar.gz
    tar xvzf org-7.4.tar.gz
    

    this is a small file; the download + unpack took me less than 1 minute.

  • edit your ~/.emacs file to contain this:
    (add-to-list 'load-path "~/.emacs.d/bundle/org-7.4/lisp")
    (add-to-list 'load-path "~/.emacs.d/bundle/org-7.4/contrib/lisp")
    (require 'org-install)
    
    (org-babel-do-load-languages
     'org-babel-load-languages
     '((R . t)
       (python . t)
       (emacs-lisp . t)
       (ruby . t)
       (haskell . t)
       (sh . t)))
    

    there may be more supported languages, but these are the only ones that I have tested that work with zero extra configuration, directly from athena

  • if you ssh with the -X option, you will even be able to run the pdf+display export option

7.4.1 what works on athena

  • python: 2.6.2
  • ruby: 1.8.7
  • perl: 5.10.0
  • haskell (ghci): 6.8.2
  • R: 2.8.1
  • latex (pdfTeX): 3.141592-1.40.3-2.2
  • tcsh
  • bash

7.5 linux (tested on ubuntu 10.10)

  • packages texlive texlive-extra

7.6 osx (tested on 10.6.2 and 10.5.8 to a fair degree)

Strongly recommended to use MacPorts to install the various programs, although since MacPorts likes to compile everything from source code, setup can take several hours. If you are coming with a pristine (clean-slate) machine without your development environment set up, consider just using the VM. Otherwise:

7.6.1 to install macports

7.6.2 macports users

packages you'll need:

the command to install all of them:

7.6.3 non-macports users

7.6.4 Carbon Emacs

7.7 windows

  • there are some nice instructions here (make sure you grab the latest version!)
  • get the emacs 23.2 binary for windows (ftp-w32)

7.7.1 XP

7.7.2 Vista

7.7.3 W7

7.7.4 git clone from repo.or.cz is hideously slow

takes something like 20 minutes to download; consider making a snapshot package

7.7.5 make on windows

make creates org-install, which is responsible for loading the stuff in init.el; this doesn't work on windows. workaround?

7.7.6 procedures to force org-babel-starter to run without extra effort

toggle org-mode (M-x org-mode), then toggle back and rerun the last line of lisp

it throws an error, due to flyspell

7.7.7 fixing flyspell

ref: http://stackoverflow.com/questions/3805647/enabling-flyspell-mode-on-emacs-w32

ref: http://book.chinaunix.net/special/ebook/oreilly/LearningGnuEmacs/0596006489/gnu3-CHP-13-SECT-3.html

get ispell.zip from http://examples.oreilly.com/9780596006488/

  • unzip ispell.exe into emacs-xyz/bin
  • unzip english.hash into ~
  • copy english.hash to american.hash (verify this step is necessary?)

restart emacs; it will throw an error upon evaluating the starter.org part (last elisp)

quit the backtrace and rerun; it works somehow

7.8 installing language support

7.8.1 LaTeX with texlive

7.8.3 ruby

7.8.4 python

7.8.5 perl

7.8.6 graphviz

7.8.7 matlab? or octave

7.9 post-installation

  • emacs version check

7.9.1 dot files

make sure pdflatex works!

  • new to emacs
    • starter kit setup instructions
    • undo-tree visualizer?
  • already emacs user
    • gotchas like setenv/getenv, exec-path
  • other tweaks
    • iimage-mode
      • demonstrate iimage-mode showing and hiding images within org doc
      • better iimage-mode regex, provide in dotfile
      • gotcha with image path for LaTeX output
    • yasnippet?
      • provide simple way of enhancing yasnippet

7.9.2 common keybindings?

like M-up M-down?

7.9.3 adding babel language support

  • babel languages
    • what el files needed? ruby-inf etc.

8 quick orgmode rundown

If you are not familiar with org-mode, you can think of it as a plugin for emacs that provides general-purpose outlining. Your files become centered around hierarchical headlines and entries. For us, this means that when we write, for example, a research report, we put the relevant analysis code under the relevant section, and we think about how and where the code fits within the flow of the report.

Here's an example of a working outline of a paper explaining the remarkable properties of stars.

8.1 remarkable properties of stars

stars are remarkable

8.1.1 why stars are remarkable

9 evaluating code blocks within a single buffer, in multiple languages

the fast way to becoming a polyglot

9.1 how this is useful: write code that writes my document for me

9.1.1 emacs lisp… rather stupid example

(dotimes (counter 10) (insert (format "trial %s: blah\n" counter)))

9.1.2 use what you are familiar with

  • shell script
    for i in {1..10}; do echo image-`printf %03d "$i"`.png; done
    
  • haskell – there seems to be a bug in haskell output: the last line does not get printed, but it does get evaluated
    import System.Process
    show (take 10 [1..])
    runCommand "echo hi there | espeak"
    
  • ruby, "pagination mockup" – probably not

    demo the exported version of this after running

    use case = programming blog?

    puts " < [[prev]] | [[next]] >"
    puts "=" * 20
    20.times do puts "#{(10+(rand 89))} hits | [[" + (0..1+(rand 2)).collect{('a'..'z').to_a.shuffle[0..4+(rand 5)].join}.join(" ") + "]]" end
    puts "=" * 20
    puts " < [[prev]] | [[next]] >"
    
  • clojure – probably not

    newest slime-clojure doesn't play well with this

  • something that reads twitter… ???? probably not
  • more relevant example: subject stats with python
  • other example – submission to satra, compare benchmark of loop vs. regex
    import time, sys, re
    
    ls_test = (" i am illegal file name with spaces",
            "zxocvijOZJVPOIJDFPOJSDOFIJ89u40958qu3405982345zlvjlzj.......oxzijc",
            "54984to8vz9x(*&()*@&%)(*&#$)(*@UC*V^(X d98)(&)(////",
            "asdf/asdf/zije/rta/e46/4567<F5>/4t/hx/rtu0485",
            "!@#$%^&*())`-=",
            )
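    def get_valid_pathstr(pathstr):
        # strip each unwanted character in turn (single quote added to match the regex)
        for symbol in [' ','[',']','(',')','{','}','?',':','<','>','#','!','|','"',"'",';']:
            pathstr = pathstr.replace(symbol, '')
        return pathstr
    
    def get_valid_pathstr_re(pathstr):
        # same character set as the loop above; '[' added so both versions agree
        return re.sub(r'''[][ (){}?:<>#!|"';]''', '', pathstr)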
    
    # test methods are equal
    for test in ls_test:
        o, n = get_valid_pathstr(test), get_valid_pathstr_re(test)
        if o != n:
            print "OLD GIVES:", o
            print "NEW GIVES:", n
    
    NROUND = 10**5
    STARTPATH = "LOREMIPSUMSITDOLORAMET"
    
    def test(func, count):
        t0 = time.time()
        for i in xrange(count):
            func(STARTPATH)
        print time.time()-t0
    
    for USE_REGEXP in False, True:
        if USE_REGEXP:
            print "use regex"
            test(get_valid_pathstr_re, NROUND)
        else:
            print "use old"
            test(get_valid_pathstr, NROUND)
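
For anyone following along on Python 3, here is a rough sketch of the same loop vs. regex comparison using the standard timeit module. The names SYMBOLS, PATTERN, strip_with_loop and strip_with_regex are made up for this sketch; SYMBOLS is the union of the characters handled by the two functions above, and the regex is built from that list so both versions strip exactly the same characters. Timings will differ from machine to machine.

```python
import re
import timeit

# union of the characters handled by the loop and regex versions above
SYMBOLS = [' ', '[', ']', '(', ')', '{', '}', '?', ':', '<', '>',
           '#', '!', '|', '"', "'", ';']
# build the character class from the list so the two versions cannot drift apart
PATTERN = re.compile('[' + re.escape(''.join(SYMBOLS)) + ']')


def strip_with_loop(pathstr):
    """Remove every character in SYMBOLS with repeated str.replace calls."""
    for symbol in SYMBOLS:
        pathstr = pathstr.replace(symbol, '')
    return pathstr


def strip_with_regex(pathstr):
    """Remove every character in SYMBOLS with one regex substitution."""
    return PATTERN.sub('', pathstr)


if __name__ == "__main__":
    sample = "LOREMIPSUMSITDOLORAMET"
    for func in (strip_with_loop, strip_with_regex):
        seconds = timeit.timeit(lambda: func(sample), number=10**5)
        print("%s: %.3f s" % (func.__name__, seconds))
```

timeit takes care of the repeat loop and the clock, so the hand-written test() helper is not needed in this variant.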
    
    

9.1.3 what about something iterative/interactive?

  • using octave

    If you use a single, one-off source block, babel will usually just run the script, grab the output, quit the script, and output the result according to the :results parameters. In other words, you won't be able to use figures:

    figure
    disp('do you see my figure?')
    

    To show figures, you'll need a persistent session running as an "inferior process". Use the :session header to start one:

    figure
    disp('do you see my figure?')
    x = linspace(0, pi, 20);
    y = sin(x);
    
    

    the results don't get output like before though

    Since it's an actual process running within emacs, you can jump to that "session" and use it like your normal inferior process. To do this, go to a code block that has the session name set to the one you want:

    disp('I am looking for my octave session')
    disp('if you run M-x org-babel-pop-to-session here -- or do C-x C-e at the closing parenthesis here: (org-babel-pop-to-session), you''ll find your session!')
    plot(x, y)
    

    You can run quit in the octave session to end it. If you re-execute the octave block, it will start a new session for you (obviously, you'll lose any values you set during the previous session)

  • using R
    x <- seq(0, pi, length.out=20)
    y <- sin(x)
    
    
    plot(y ~ x)
    
  • R "visual area" plots

    turns out do.call doesn't play nice with 40k-row dataset? plot blows up for x11

    use sessions instead

9.2 passing evaluation results to other code blocks

10 tangling files

10.1 single block tangle

10.2 multiple files tangle

10.3 multiple blocks into single file

11 publishing

11.1 LaTeX headers

  • image captions
  • overriding defaults

11.2 publishing styles

12 advanced techniques

  • export option template
  • other export header options
  • post evaluation hooks to format your output
    • worth working on multilingual hook?
  • yasnippets
    • overwrite default src snippet?
  • org-specific: export to beamer

12.1 how to run code across multiple blocks in the same buffer, at once?

12.1.1 example

x = 1
print x

how to get 1?

  • see example from paper or worg. pascal triangle is probably best

    aside from the :noweb directive I don't know if that is possible

    the problem with :noweb is that if you keep using noweb includes, you might be including more code into a block than you actually want in the tangled output

    the workaround if you really want to eval multiple blocks, to my knowledge, is to use :session. there's the added benefit that it's async (I think so). but code with side effects is dangerous
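
To make the difference concrete, here is a toy model in Python 3 of why the two-block example above fails without a session and works with one. This is only a sketch of the idea, not how Babel is implemented: run_blocks and its shared_session flag are made-up names, and exec with a shared dictionary stands in for a persistent interpreter.

```python
import contextlib
import io


def run_blocks(blocks, shared_session):
    """Run each code string and return the list of captured outputs.

    With shared_session=True all blocks see one namespace (a stand-in for
    :session); otherwise every block starts from an empty namespace.
    """
    session_namespace = {}
    outputs = []
    for source in blocks:
        namespace = session_namespace if shared_session else {}
        captured = io.StringIO()
        try:
            with contextlib.redirect_stdout(captured):
                exec(source, namespace)
            outputs.append(captured.getvalue())
        except NameError as error:
            # the block referred to a name defined in an earlier block
            outputs.append("NameError: %s" % error)
    return outputs


blocks = ["x = 1", "print(x)"]
print(run_blocks(blocks, shared_session=False))  # second block cannot see x
print(run_blocks(blocks, shared_session=True))   # second block prints 1
```

The same toy model also shows the danger mentioned above: with a shared namespace, whatever an earlier block leaves behind silently affects every later block.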

12.2 how to convert my existing LaTeX to org?

12.2.1 ikmeans example

  • attempt 2, put all LaTeX into a single latex block

    of course it doesn't all come out right, because this command simply inserts the original LaTeX document's contents into the file, while org-mode has its own default document template. To fix this, I'm going to insert the original contents myself and reappropriate the LaTeX headers for org-mode's export.

    ... (insert entire file contents)
    

    this asks whether we want to eval, sure – that generates the LaTeX that is used to export

  • attempt 3, changing LaTeX directives into org-mode headers

    it turns out that the default LaTeX directives that org-mode uses don't play nice with the original, so we override them by changing the title, adding some headers, and a bit of elisp that adds the headers the way we want:

    #+TITLE: Notes on Exploratory Data Analysis
    #+AUTHOR: Arnaldo E. Pereira
    #+LATEX_CLASS: org-article
    #+LaTeX_CLASS_OPTIONS: []
    #+LaTeX_HEADER: \usepackage{amssymb}
    #+LaTeX_HEADER: \usepackage{amsmath}
    #+LaTeX_HEADER: \usepackage{fullpage}
    #+LaTeX_HEADER: \usepackage{graphicx}
         
    
    \newcommand{\kmeans}{$K$-means }
    \newcommand{\figurebox}[1]{\begin{center}\fbox{\includegraphics[height=2.8in]{#1}}\end{center}}
    ...
    
    

    my export program doesn't like the eps files I'm linking to… maybe change these but if not, resort to:

    texi2dvi arnaldo-ikmeans-converted.tex
    dvipdf arnaldo-ikmeans-converted.dvi
    
    

    and it looks about right!

  • attempt 4 and so on…

    convert the latex to org, if you so wish

12.3 how to convert my code to org?

This is an example taken from the nipype fsl tutorial… and we change it to org code.

12.4 tangle-based dev cycle? is this a good idea at all?

the bad is before you execute, you need to tangle

the good is you're always in a "prose friendly" environment that allows you to quickly switch between report sections / documentation and code.
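
For reference, a toy model of what the tangle step (M-x org-babel-tangle) does conceptually: collect the source blocks from the org text and concatenate them per target file. This is a Python 3 sketch under assumptions, not Babel's implementation: the tangle function, the SRC_BLOCK regex and the stats.py file name are made up for illustration, and it only handles blocks whose header line carries an explicit :tangle file name.

```python
import re
from collections import defaultdict

# matches "#+begin_src <lang> ... :tangle <file>" up to the closing "#+end_src"
SRC_BLOCK = re.compile(
    r"^[ \t]*#\+begin_src[ \t]+\S+[^\n]*?:tangle[ \t]+(\S+)[^\n]*\n"
    r"(.*?)^[ \t]*#\+end_src",
    re.IGNORECASE | re.MULTILINE | re.DOTALL,
)


def tangle(org_text):
    """Return {filename: concatenated source} for blocks carrying :tangle."""
    files = defaultdict(str)
    for filename, body in SRC_BLOCK.findall(org_text):
        files[filename] += body
    return dict(files)


example = """\
* analysis
some prose about the analysis
#+begin_src python :tangle stats.py
x = 1
#+end_src
more prose
#+begin_src python :tangle stats.py
print(x)
#+end_src
"""

print(tangle(example))
```

The prose stays in the org file and only the code ends up in stats.py, which is why the tangle step has to run again before every execution of the tangled file.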

12.4.1 creating hooks to auto-execute

don't have a beautiful solution yet

13 limitations & issues

  • debugging
  • sudo commands from source block
    • ugly workaround: create a session and execute
  • stderr output capture
  • text indentation following a code block is messed up (it indents to match the line within the code block, treating the code line like a text line)
  • sometimes tangling causes a :PROPERTIES drawer to appear. reverting does not solve the problem. i have resorted to restarting emacs to stop it. not sure what causes it, but today i got it to appear, seemingly after running C-c ' on a non-src-block region and then tangling

13.1 other strange behavior and or bugs

  • not-folding correctly –> revert
  • insert PROPERTIES after tangle, no idea how/why this happens –> revert
  • multiple tangle indentation issue – solved in email list but doesn't look like integrated in upstream
  • clear the undo tree when starting an org-babel buffer? otherwise pressing undo can clear the screen

14 specific use-cases and questions. how do I…? etc.

how do i word-wrap?
M-x visual-line-mode

I want to include certain lines from a different file into my org file. how do I do that?

how large a file can org handle? syntax highlighting does get slower when the file is large

how to denote separate blocks and related blocks across several sections? i.e.
  • section 1.
    • some text
    • code bit 1
    • some text
    • code bit 2
    • some text
    • code bit continuation of 2

this is easy to author, but what about execution? is session the way to go?

different-size headers?
M-x customize-group org-faces, or:
(custom-set-faces
 '(org-level-1 ((t (:inherit outline-1 :weight bold :height 1.6 :family "Verdana"))))
 '(org-level-2 ((t (:inherit outline-2 :height 1.5 :family "Verdana")))))

etc.

14.1 key rebindings

(for relative newcomers) do not hesitate to rebind keys. depending on your workflow, the defaults may be suboptimal. doesn't seem like it at first, but 2-keystroke shortcuts can end up feeling too slow!

15 final notes, links and resources

for new explorers ready to take the plunge into org-mode: organize your life in plain text

http://orgmode.org/worg/org-contrib/babel/uses.php

16 presentation schedule control

automated presentation flow controller

16.1 <schedule controller>

16.1.1 <start>

(org-open-link-from-string "[[Abstract]]")

starting without specified time uses (now)

16.1.2 (+2) using relative time!!!

(org-open-link-from-string "[[Intro]]")

16.1.3 (4) used to be absolute – if use without plus

(org-open-link-from-string "[[Setup Procedure]]")

16.2 control code

  (require 'cl) ;; first, second, third and map used below come from cl

  (defun my-org-run-presentation-schedule (&optional next-headline sec0)
    (interactive)

    (if (not next-headline) ;; start case, call with no arguments
        (progn (org-open-link-from-string "[[<schedule controller>]]")
               (show-subtree)
               (my-org-run-presentation-schedule "<start>" (second (current-time))))

      (let ((time-now (current-time)))

        ;;(org-overview)
        (goto-char (org-find-exact-headline-in-buffer next-headline))
        (show-subtree)

        (end-of-line)
        (open-line 1)
        (next-line)
        (insert (format "# -- -- -- -- -- -- -- -- -- -- section [[%s]] started at [%s]\n" next-headline (format-time-string "%Y-%m-%d %H:%M:%S" time-now)))

        (let (;; determine whether there is another match
              (next-section-loc (save-excursion (search-forward-regexp "^\\*+ \\((\\(\\+?\\)\\(.*\\)) .*\\)" nil t))))

          ;; there is a time-specification headline after this one;
          ;; parse it for schedule and queue next run
          (when next-section-loc
            (let* ((next-section-headline (match-string-no-properties 1))
                   (use-relative (> (length (match-string-no-properties 2)) 0))
                   (spl-rev-time (reverse (map 'list 'string-to-number (split-string (match-string-no-properties 3) ":"))))
                   (in-sec (or (first spl-rev-time) 0))
                   (in-min (or (second spl-rev-time) 0))
                   (in-hour (third spl-rev-time))
                   (str-run-time (if use-relative
                                     (format "%s min %s sec %s hour" in-min in-sec (or in-hour 0))
                                   ;; don't allow the absolute next-event schedule to exceed 65536 sec, so just
                                   ;; calculate offset seconds using the low value from (current-time)
                                   (format "%s sec"
                                           (- (+ sec0 (* 60 in-min) in-sec) (second time-now))))))

              ;;(insert (format "(run-at-time %s = str-run-time nil 'my-org-run-presentation-schedule)" str-run-time))
              ;;(insert "NEXT FOUND: " next-section-headline)
              (run-at-time str-run-time nil 'my-org-run-presentation-schedule next-section-headline sec0)
              ))

          ;; find and run code blocks
          (let ((next-src-block-end (save-excursion
                                      (re-search-forward org-babel-src-block-regexp nil t)))
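                ;; remember the user's confirmation setting so it can be restored below
                (current-ob-evaluate-confirm org-confirm-babel-evaluate))
            (message "%s -- %s" next-src-block-end next-section-loc)
            ;; next-src-block-end is nil when no source block follows
            (when (and next-src-block-end (< next-src-block-end (or next-section-loc (buffer-size))))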
              (goto-char (match-beginning 0))
              ;;(org-overview)
              ;;(org-show-subtree)
              ;;(org-show-context)
              (beginning-of-line)
              (setq org-confirm-babel-evaluate nil)
              (org-babel-execute-src-block)
              (setq org-confirm-babel-evaluate current-ob-evaluate-confirm)

              ;;(goto-char (+ 1 next-src-block-end))
              )
            )

          (when (not next-section-loc)
              ;; done -- no more sections with schedule format
            (save-excursion
              (goto-char (org-find-exact-headline-in-buffer next-headline))
              (show-subtree)
              (next-line)
              (end-of-line)
              (open-line 1)
              (next-line)
              (insert (format "# -- -- -- -- -- -- -- -- -- -- presentation ended at [%s]\n" (format-time-string "%Y-%m-%d %H:%M:%S" (current-time)))))
            )))))
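
The time-specification parsing in the function above is easy to miss inside the let* form, so here is a rough Python 3 sketch of the same logic. The names parse_time_spec and delay_seconds are made up for this sketch. It assumes, as the Lisp code does, that the fields are read right to left (seconds, then minutes, then hours), that a leading plus marks a relative time, and that absolute times are counted from the <start> headline with hours ignored. Unlike string-to-number, int() raises an error on non-numeric input.

```python
def parse_time_spec(spec):
    """Split a spec such as "+1:30" into (relative, hours, minutes, seconds)."""
    relative = spec.startswith("+")
    digits = spec[1:] if relative else spec
    fields = [int(part) for part in digits.split(":")]
    fields.reverse()                    # seconds first, like the Lisp code
    fields += [0] * (3 - len(fields))   # missing minutes/hours default to 0
    seconds, minutes, hours = fields[:3]
    return relative, hours, minutes, seconds


def delay_seconds(spec, elapsed):
    """Seconds to wait before the next section.

    `elapsed` is the number of seconds since the <start> headline ran.
    """
    relative, hours, minutes, seconds = parse_time_spec(spec)
    if relative:
        return hours * 3600 + minutes * 60 + seconds
    # absolute specs are measured from <start>; hours are ignored, as in the Lisp version
    return minutes * 60 + seconds - elapsed


print(parse_time_spec("+2"))
print(delay_seconds("1:30", elapsed=10))
```

Read this way, a headline starting with (+2) fires 2 seconds after the previous section, and (4) fires 4 seconds after the start; write (+2:00) or (4:00) if minutes are intended.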

Author: natto

Date: 2010-10-21 Thu
