iNZight, Surveys, and the IDI

Tom Elliott

Te Rourou Tātaritanga
Victoria University of Wellington

tomelliott.co.nz

Postdoc @ VUW @ UoA

  • MBIE Endeavour grant

    • Colin Simpson, Barry Milne, Andrew Sporle

    • Informatics for Social Services and Wellbeing …

    • more later!

  • Honorary position here (thanks James)

iNZight

iNZight main window

iNZight

  • lead developer since 2013/14

  • shifting focus as audience has evolved

iNZight

Before 2015

  • schools
  • some university

iNZight

2015–2019

  • education (school/university/MOOC)
  • unexpected places

Alberto Cairo

The Functional Art

http://www.thefunctionalart.com/

iNZight

Recently

  • democratisation

    See Chris Wild’s talks, featuring hits like “We Will Plot You”

  • rapid research development

    for organisations/groups with low/no money/time/skill

Surveys and iNZight

iNZight

  • v4.1: surveys now handled natively

    • plots

    • summaries (tables of counts)

    • inference / modelling

    • data wrangling …

  • same goal: removal of barriers

Flashback: the usual iNZight process

Data \(\rightarrow\) GUI \(\rightarrow\) Explore \(\rightarrow\) Export results/code

What if data is from a survey?

In

iNZight isn’t much better … or is it?!

Specify survey design

(Remember survey variables never have nice names)

Academic Performance Index (API) data

(two-stage cluster sample)
  • apiclus2.csv
  • apiclus2.svydesign

Demo: apiclus2.svydesign

data = "apiclus2.csv"
ids = "dnum + snum"
fpc = "fpc1 + fpc2"

Details: inzight.nz/docs/survey-specification.html
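For readers who know R, the three lines of the specification file correspond roughly to a `svydesign()` call in the survey package, sketched below. The survey package ships its own copy of the API data (loaded with `data(api)`), so this sketch uses that instead of reading `apiclus2.csv`. It shows an equivalent design in plain R and is not a description of iNZight's internals.

```r
library(survey)

# The survey package bundles the API data; data(api) loads apiclus2 among others
data(api, package = "survey")

# Two-stage cluster sample: school districts (dnum), then schools within them (snum).
# fpc1/fpc2 hold the population sizes at each stage (finite population correction).
dclus2 <- svydesign(ids = ~dnum + snum, fpc = ~fpc1 + fpc2, data = apiclus2)

# A survey-weighted table of counts estimates *population* numbers of schools
svytable(~stype, design = dclus2)
```

Because each sampled school carries a weight, `svytable()` reports estimated population counts rather than raw sample counts, which is the "population counts" behaviour listed below.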

  • accessible
  • quickly open and explore
  • business as usual
    • plots
    • summaries/inference (population counts)
    • data wrangling

(A few) Details

iNZight’s package collection

9+ iNZight* packages

  • iNZight (GUI: collects user input, displays results)

  • iNZightModules (UI for time series, regression, maps, …)

  • iNZightPlots (graphs, summaries, inference)

  • iNZightTools (utility functions, data wrangling)

  • iNZightTS (time series)

  • iNZightMR (multiple response)

  • iNZightRegression (model summaries, residual plots)

  • iNZightMaps (lat/lng points, fill-in-the-shapefile maps)

  • plus vit and some others …

  • wrapper functions make programming GUIs easier

    • inputs \(\equiv\) arguments

  • packages don’t need the GUI

    • iNZightPlots::inzplot()

    • simple functions aimed at novice coders

  • functions return the R code

GUI \(\rightarrow\) high level functions \(\rightarrow\) lower-level (e.g., ggplot)

An example: Filtering data

##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          4.9         3.0          1.4         0.2  setosa
## 2          4.7         3.2          1.3         0.2  setosa
## 3          4.6         3.1          1.5         0.2  setosa
## 4          4.6         3.4          1.4         0.3  setosa
## 5          5.0         3.4          1.5         0.2  setosa
## 6          4.4         2.9          1.4         0.2  setosa
## iris %>% dplyr::filter(Sepal.Width < 3.5)
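The filtering example pairs a result with the code that produced it. A minimal sketch of how such a wrapper could work is below: each GUI input becomes an argument, and the returned data carries the equivalent code as an attribute. The function `filter_numeric_sketch()` is hypothetical and uses base R for the filtering; iNZightTools' real functions, names, and implementation may differ.

```r
# Hypothetical wrapper in the spirit of iNZightTools: one GUI input = one argument
filter_numeric_sketch <- function(data, var, op, value,
                                  data_name = deparse(substitute(data))) {
  ops <- c("<", "<=", ">", ">=", "==", "!=")
  if (!op %in% ops) stop("unsupported operator: ", op)

  # Build the condition as an R call, e.g. Sepal.Width < 3.5
  cond <- call(op, as.name(var), value)

  # Evaluate it inside the data; which() drops NAs, as dplyr::filter() does
  keep <- which(eval(cond, data, parent.frame()))
  result <- data[keep, , drop = FALSE]
  rownames(result) <- NULL

  # Attach code the user could paste into their own script
  attr(result, "code") <- sprintf("%s %%>%% dplyr::filter(%s)",
                                  data_name, deparse(cond))
  result
}

res <- filter_numeric_sketch(iris, "Sepal.Width", "<", 3.5)
head(res)
cat(attr(res, "code"), "\n")
```

The last line prints `iris %>% dplyr::filter(Sepal.Width < 3.5)`. Keeping the generated code next to the result is what lets the GUI export a script.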

A slightly more complex example

## # A tibble: 3 x 3
##   Species    Sepal.Length_median Sepal.Length_var
##   <fct>                    <dbl>            <dbl>
## 1 setosa                     5              0.124
## 2 versicolor                 5.9            0.266
## 3 virginica                  6.5            0.404
## iris %>%
##     dplyr::group_by(Species) %>%
##     dplyr::summarize(
##         Sepal.Length_median = median(Sepal.Length,
##             na.rm = TRUE
##         ),
##         Sepal.Length_var = var(Sepal.Length,
##             na.rm = TRUE
##         ),
##         .groups = "drop"
##     )

What about surveys?

  • modified wrapper functions to handle surveys

  • refactored GUI to pass around a ‘data-thing’ (data or survey)

## dclus2 %>%
##     srvyr::as_survey() %>%
##     srvyr::filter(api99 >= 700)
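The "data-thing" idea can be sketched with a hypothetical helper, `count_by()`, that accepts either a plain data frame or a survey design and chooses the appropriate summary. It uses the survey package's `subset()` method instead of the srvyr pipeline shown above; iNZight's actual refactoring may look quite different.

```r
library(survey)
data(api, package = "survey")
dclus2 <- svydesign(ids = ~dnum + snum, fpc = ~fpc1 + fpc2, data = apiclus2)

# Hypothetical helper: one entry point for either kind of "data-thing"
count_by <- function(x, var) {
  if (inherits(x, "survey.design")) {
    svytable(reformulate(var), design = x)  # estimated population counts
  } else {
    table(x[[var]])                         # plain sample counts
  }
}

# subset() has a method for survey designs that keeps the design information
high_api <- subset(dclus2, api99 >= 700)

count_by(apiclus2, "stype")
count_by(dclus2, "stype")
count_by(high_api, "stype")
```

The calling code never needs to know which kind of object it holds, which is the point of passing a single "data-thing" around the GUI.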

Big thanks to

Te Rourou Tātaritanga

Nā tō rourou, nā taku rourou,
ka ora ai te iwi.
(With your food basket and my food basket, the people will thrive.)

Te Rourou Tātaritanga

Primary goals

  1. Improve data standards

  2. Promote Māori data sovereignty

  3. Develop systems to support access

  4. Evaluate dataset synthesis

  5. Security and privacy implications

  6. Machine learning and AI methods

terourou.org

The Integrated Data Infrastructure (IDI)

  • database connecting data across NZ’s sectors

  • high security environment

  • but also other unnecessary barriers: coding!

iNZight to the rescue!

  • already used in high schools and/or universities

  • no coding necessary

  • easy to learn and relearn

  • iNZight in the Stats NZ Data Lab …? Watch this space!

iNZight in the Data Lab (WIP)

Start with small (example) data sets …

  • primary researcher: SQL \(\Rightarrow\) CSV

  • non-coding researchers: graphs, tables, …

… and build from there!

iNZight outside the Data Lab

  • groups/organisations/communities

    • population summaries (tables of counts)

    • regression models

    • demographic information …?

  • easy to learn and relearn

    • repeat analyses after 6 months / 2 years

    • no (or low) (re)training or consultation costs

  • produces an R code script

Bayesian small area demography

Some important demographic information for communities (e.g., birth or death rates) requires specialist techniques and models.

  • John Bryant’s R packages (dembase, demest, …) for Bayesian demography

  • R coding required (and data transformations, working with multi-dimensional arrays, …)

  • so we tested out iNZight’s new add-on system …

DEMO

Other projects

Both work and ‘fun’

IDI Search app

  • simple web app (ReactJS)

  • searchable database

  • researchers can explore what’s available

terourou.org/idisearch

DEMO

idi-search.web.app

Bus display v2

  • the display in 302 was broken

  • rebuilt it (again) using ReactJS + d3

  • uses newly available real-time occupancy data

DEMO

tomelliott.co.nz/bus-display

Lots of ReactJS …

  • long-term goal: a prototype of iNZight built with ReactJS and Rserve
  • a single app for Windows / macOS / Linux / web
  • connecting to local/remote R server (user permissions, firewall, etc.)

Thank you