Upload and parse CSV in ClojureScript

11-Jun-2016

Here's the scenario: upload a CSV file from your computer, parse it, slice/dice/massage it into a form worth rendering, shove it into the DOM for your viewing pleasure, and then print it. I recently needed to do this for a friend, so here's the result.

First off, doing any data transformation in JavaScript isn't my idea of fun. I'd rather use immutable data structures and a well-thought-out standard library. So ClojureScript it is.

Rendering the upload button

Let's start by defining some data-driven components with Reagent.

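(ns csv-upload.core ; illustrative name; assumes reagent and core.async deps
  (:require [reagent.core :as r]
            [cljs.core.async :refer [chan put! <!]]
            [goog.labs.format.csv :as csv])
  (:require-macros [cljs.core.async.macros :refer [go-loop]]))

(defonce app-state (r/atom {})) ; holds {:file-name ... :data ...}
(declare put-upload)            ; defined later, with the channel code

(defn upload-btn [file-name]
  [:span.upload-label
   [:label
    [:input.hidden-xs-up
     {:type "file" :accept ".csv" :on-change put-upload}]
    [:i.fa.fa-upload.fa-lg]
    (or file-name "click here to upload and render csv...")]
   (when file-name
     [:i.fa.fa-times {:on-click #(reset! app-state {})}])])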

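By hiding the input and wrapping it in a label, we can customize the look of the button: clicking anywhere on the label still opens the browser's file picker. Also note that the component receives file-name, which it uses to show the currently uploaded file; clicking the close icon (fa-times) resets app-state to an empty map.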

Below is the root component (called app), which dereferences and destructures app-state and then feeds the requisite parts to the subcomponents.

(defn app []
  (let [{:keys [file-name data] :as state} @app-state]
    [:div.app
     [flyout state]
     [:div.topbar.hidden-print 
      [upload-btn file-name]]
     [report data]]))

(r/render-component [app] (js/document.getElementById "app"))

So far so good. The report component renders the data. The flyout component is a dev-only component that pretty-prints the current state on a full-screen overlay (toggled with Cmd+Shift+S; more on this in a future post). But what happens when put-upload is called? How is the data produced?

Handling upload events

The browser API docs and examples show that the change event fired by the file input gives us access to a list of the selected files (a FileList on the input element's files property). From there, a FileReader can read a file. Once reading finishes, the reader's onload callback fires, and the contents are available as the reader's result.

To rephrase all that: we need one callback to get the selected file and another to get the contents of that file. Rather than do the nested-callback dance that's all too familiar to most JS developers, let's use core.async!
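Here's the callback-to-channel idea in isolation, as a small JVM Clojure sketch that assumes org.clojure/core.async is on the classpath. read-as-text is a hypothetical stand-in for FileReader that calls back later from another thread; the blocking <!! exists only on the JVM (in ClojureScript you'd take inside a go block, as the code below does).

```clojure
(require '[clojure.core.async :as a])

;; Hypothetical callback-style API standing in for FileReader: it invokes
;; on-load later, on another thread, with the "file contents".
(defn read-as-text [file-name on-load]
  (doto (Thread. (fn [] (on-load (str "contents of " file-name))))
    (.start)))

;; Rather than nesting the next step inside on-load, the callback only puts
;; the result on a channel; whoever needs it takes from the channel.
(def file-reads (a/chan 1))
(read-as-text "data.csv" #(a/put! file-reads %))

;; <!! (JVM only) blocks until the callback has fired and put its value.
(def contents (a/<!! file-reads))
```

The callback stays a one-liner no matter how many steps follow; the sequencing moves to whoever takes from the channel.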

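;; change event -> first selected File
(def first-file
  (map (fn [e]
         (let [target (.-currentTarget e)
               file   (-> target .-files (.item 0))]
           ;; clear the input so picking the same file again still fires on-change
           (set! (.-value target) "")
           file))))

;; FileReader onload event -> CSV text -> vector of row vectors
(def extract-result
  (map #(-> % .-target .-result csv/parse js->clj)))

(def upload-reqs (chan 1 first-file))
(def file-reads (chan 1 extract-result))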

All the browser interop is handled by two transducers, first-file and extract-result. Transducers capture the essence of a computation independently of its input source and output destination.

  • first-file accepts change events from the file input and returns the first selected file. Note that it also clears the input's value, which allows re-uploading the same file.
  • extract-result accepts a FileReader onload event, gets the string contents, parses the CSV and converts the result to ClojureScript data structures.
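To see what that source-independence buys, here's a hypothetical transducer in the same spirit as extract-result (a naive comma split, not a real CSV parser) applied to a plain vector three different ways. It's plain Clojure, so it runs in either Clojure or ClojureScript, and the same value could be handed to (chan 1 to-rows) unchanged.

```clojure
(require '[clojure.string :as str])

;; Hypothetical transducer: turns a line of text into a row vector.
;; It says nothing about where lines come from or where rows go.
(def to-rows (map #(str/split % #",")))

(def lines ["name,age" "Ada,36"])

;; Collect eagerly into a vector.
(def as-vector (into [] to-rows lines)) ; => [["name" "age"] ["Ada" "36"]]

;; Or produce a lazy sequence of the same rows.
(def as-seq (sequence to-rows lines))

;; Or compose with another transducer and reduce: total number of cells.
(def cell-count (transduce (comp to-rows (map count)) + 0 lines)) ; => 4
```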

When it comes to parsing CSV, we're in luck! The csv/parse function comes straight from the Google Closure Library, which ships with ClojureScript. You can find it in the goog.labs.format.csv namespace.
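Why not just split on commas? A naive split breaks as soon as a field contains a quoted comma, which exported spreadsheets often have (names, addresses). A quick plain Clojure check of that failure mode; Closure's parser is designed to handle double-quoted fields, which is the point of using it:

```clojure
(require '[clojure.string :as str])

;; One CSV record with two fields: a quoted name containing a comma, and a number.
(def line "\"Smith, J\",42")

;; Splitting on every comma ignores the quotes and yields three pieces.
(def naive (str/split line #",")) ; => ["\"Smith" " J\"" "42"]
```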

Next we define two channels: one for upload requests and the other for file-read events. When defining the channels, we supply the corresponding transducers to handle the browser interop. This means we can expect:

  • taking from upload-reqs will produce file values
  • taking from file-reads will produce CSV as ClojureScript data structures
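Those expectations hold because a channel created with a transducer applies it as values enter the channel, so takers only ever see transformed values. A JVM sketch, assuming org.clojure/core.async is on the classpath, with a hypothetical naive-rows transducer standing in for extract-result (plain comma and newline splitting, not a real CSV parser):

```clojure
(require '[clojure.core.async :as a]
         '[clojure.string :as str])

;; Hypothetical stand-in for extract-result: text in, vector of rows out.
(def naive-rows
  (map #(mapv (fn [line] (str/split line #",")) (str/split-lines %))))

;; Buffer of 1 plus a transducer, the same shape as (chan 1 extract-result).
(def file-reads (a/chan 1 naive-rows))

;; Put raw text in (>!! and <!! are JVM-only blocking ops)...
(a/>!! file-reads "name,age\nAda,36")

;; ...and take already-transformed rows out.
(def rows (a/<!! file-reads))
```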

All that's left is to wire up the channel logic so events flow through our channels.

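;; on-change handler for the file input: just hand the event to the channel
(defn put-upload [e]
  (put! upload-reqs e))

;; For each uploaded file: record its name, then start reading it.
;; The reader's onload callback feeds file-reads instead of nesting more callbacks.
(go-loop []
  (let [file   (<! upload-reqs)
        reader (js/FileReader.)]
    (swap! app-state assoc :file-name (.-name file))
    (set! (.-onload reader) #(put! file-reads %))
    (.readAsText reader file)
    (recur)))

;; Store each parsed CSV (already converted by extract-result) in app-state.
(go-loop []
  (swap! app-state assoc :data (<! file-reads))
  (recur))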

put-upload is just a callback that puts the change event onto the upload-reqs channel. The first go-loop is responsible for:

  • taking files from the upload-reqs channel
  • updating the file-name in the app-state
  • creating a FileReader whose onload event puts to the file-reads channel
  • starting the file reading process

In the second go-loop, we just take the data structures off the file-reads channel and swap them into the state.
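These go-loops never exit, which suits a page that keeps its channels open for its whole lifetime. One caveat if you ever do close a channel: taking from a closed, drained channel returns nil immediately, so the second loop as written would keep swapping nil into :data and spin. A common guard is when-some. Here's a JVM sketch of that variant, assuming org.clojure/core.async is on the classpath; >!! and <!! are JVM-only and are used only to drive the loop from a script.

```clojure
(require '[clojure.core.async :as a])

(def app-state (atom {}))
(def file-reads (a/chan 1))

;; Same shape as the second go-loop, but when-some ends the loop once
;; file-reads is closed and drained instead of looping forever on nil.
(def state-loop
  (a/go-loop []
    (when-some [data (a/<! file-reads)]
      (swap! app-state assoc :data data)
      (recur))))

(a/>!! file-reads [["name" "age"] ["Ada" "36"]])
(a/close! file-reads)

;; The go-loop's own channel closes when the loop exits, so this waits for it.
(a/<!! state-loop)
```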

That's all, folks

At this point, all the hard stuff is done. The remaining implementation is problem-specific. We have the CSV data in hand, so all that remains is to build the Reagent component [report data] that renders it to the screen for viewing and printing.
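As a starting point, here is a minimal, hypothetical report, assuming data is the vector of row vectors produced by extract-result and that the first row holds the column headers; the :table.report class is made up. Because a Reagent component is just a function returning hiccup, it also runs as plain Clojure.

```clojure
;; Hypothetical report component: header row -> <thead>, remaining rows -> <tbody>.
;; Returns nil (renders nothing) until some data has been uploaded.
(defn report [data]
  (when-let [[header & rows] (seq data)]
    [:table.report
     [:thead
      [:tr (for [[i h] (map-indexed vector header)]
             ^{:key i} [:th h])]]
     [:tbody
      (for [[i row] (map-indexed vector rows)]
        ^{:key i}
        [:tr (for [[j cell] (map-indexed vector row)]
               ^{:key j} [:td cell])])]]))
```

Index-based React keys are a reasonable choice here because each upload replaces the rows wholesale rather than reordering them in place.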

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.