Using Cursors with Postgres and clojure.java.jdbc

This post will show how to use cursors with clojure.java.jdbc and Postgres. Cursors allows SQL clients to traverse through large records. JDBC Drivers use this feature to send batches of records to the client.

This article assumes familiarity with clojure.java.jdbc.

Setting up cursors with clojure.java.jdbc with Postgres is a little tricky. The Postgres JDBC docs show you how to do it with PreparedStatements. I prefer to use clojure.java.jdbc’s API where possible.

On the surface, use of simple queries and cursors are indistinguisable. The only way to know if we have performance benefits is to measure! TL;DR cursors respond faster than simple queries for large queries.

This topic builds upon a blog post from 2010. ¹

Experiment Setup

The experiment will read ~1,500,000 record from a table. We examine three metrics:

Time to first record from the database
Time to 10000th records
Duration to read all records

(1) and (2) is a measure of responsiveness. The faster your application receives records, the sooner you can process records. (3) is a measure of throughput.

We measure two functions. without-cursors uses JDBC’s simple queries. with-cursor uses cursors for queries. This is database-engine specific. See Postgres JDBC Docs. Play close attention to use of transaction, :fetch-size and :result-type to enable cursors.

(defn without-cursor [db sqlvec f]
  (jdbc/query db sqlvec {:row-fn f}))

(defn with-cursor [db sqlvec f]
  (jdbc/with-db-transaction [conn db]
    (jdbc/query conn sqlvec {:row-fn f :fetch-size 10000 :result-type :forward-only})))
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                                              These options are important

We measure with a helper function by recording elapsed time to stdout. Measurements are executed in a Clojure REPL with several warmup runs. The final run is recorded in the Results table.

(defn new-measurement
  []
  (let [state (atom 0)
        start (System/currentTimeMillis)]
    (fn [x]
      (let [count (swap! state inc)]
        (when (or (= 1 count) (zero? (mod count 10000)))
          (println (- (System/currentTimeMillis) start) " at records: " count))))))

(without-cursor db ["SELECT * FROM events"] (new-measurement))
(with-cursor    db ["SELECT * FROM events"] (new-measurement) 10)
(with-cursor    db ["SELECT * FROM events"] (new-measurement) 100)
(with-cursor    db ["SELECT * FROM events"] (new-measurement) 1000)
(with-cursor    db ["SELECT * FROM events"] (new-measurement) 10000)
(with-cursor    db ["SELECT * FROM events"] (new-measurement) 100000)

This is an example of the output:

at records:  1
at records:  10000
at records:  20000
...
at records:  1570000
at records:  1571000
at records:  1572000

Results

Fetch Size	Time to first record (ms)	Time to 10000 record (ms)	Total duration (ms)
without-cursor	3041	3105	13656
10	2	359	58005
100	2	124	19029
1000	13	93	13735
10000	24	81	13617
100000	186	192	13066

Without cursors, we wait 3 full seconds before seeing any prints. With cursors, the time is on the order of milliseconds. This indicates that cursors are working! The total duration is consistent for fetch_size > 1000. From this measurement, fetch size: 1000 has the best of responsiveness and throughput.

Conclusion

Cursors are awesome! You should use if you can meet the constraints of the DB engine. Your applications can be more responsive with cursors. Also, take the measurements here with a grain of salt. You should measure whether your application benefits from Cursors.

Kyle Burton wrote this up with an old version of clojure.java.jdbc whose API has changed. http://asymmetrical-view.com/2010/10/14/clojure-and-large-result-sets.html ↩

rymndhng (repeatedly #(do (coffee) (think) (write)))

Experiment Setup

Results

Conclusion

Related Posts