
Communities of practice

Curated data list and go/no-go on a local Magda instance for a small team

#1

Hi community

I’m beginning some work to bring light-touch improvements to data governance and, ideally, a very lean set of data cataloguing tools to a small team of officers with minimal coding skills in an Intelligence and Analysis unit at the NSW EPA. They source a range of public data, including a growing set from data.gov.au, to undertake analysis work, mostly using Excel and Power BI. An example is reporting on pollution trends for a company in the Australian Business Register using the National Pollutant Inventory and a light sprinkle of ABS SA1 data, after joining these sets with some internal address- and company-based licensing data. As the officers don’t come from a data or IT background, their processes have been very ad hoc. They will have limited and sporadic developer support.

Looking for some quick wins, I have the following questions:

  • Using the data.gov.au site, can a curated set of data sources be created for a user, an organisation (a consumer), or a set of users (a team)? I can’t see this anywhere. Is that the case?
  • I’m keen to know more about the model use cases, requirements, and the minimum level of support and self-serve deployment investment needed for a single team to consider a local instance of Magda. I have basic Docker, npm, Yarn, Java and API skills, but haven’t used Node in anger and have zip in Scala, plus 4 to 6 months of my time with uncertain support thereafter. Advice on go/no-go would be great.

Many thanks in advance

Simon


#2

Hi Simon,

  1. Not at this stage. This actually sounds really similar to a feature that CSIRO Land and Water have on their roadmap for Knowledge Network (their Magda instance): building “playlists” of datasets that can be shared as a whole. Hopefully when they develop it we’ll be able to put something like it on data.gov.au.

We’re also adding the ability to associate datasets with user groups for our internal, private-data-focused instances of Magda, which could maybe be made to work a bit like this on data.gov.au.
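The “playlist” idea, combined with associating datasets with user groups, can be pictured as a small data structure. The sketch below is hypothetical: `DatasetRef`, `Playlist`, the field names, dataset IDs and group names are all illustrative assumptions, not the Knowledge Network or Magda data model.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DatasetRef:
    """Pointer to a dataset in a catalogue (hypothetical fields)."""
    dataset_id: str
    title: str


@dataclass
class Playlist:
    """A named, shareable list of datasets visible to selected user groups.

    Hypothetical sketch: not the Knowledge Network or Magda data model.
    """
    name: str
    owner: str
    datasets: list[DatasetRef] = field(default_factory=list)
    shared_with_groups: set[str] = field(default_factory=set)

    def add(self, ref: DatasetRef) -> None:
        # Skip duplicates but keep the order datasets were added in.
        if ref not in self.datasets:
            self.datasets.append(ref)

    def share_with(self, group: str) -> None:
        self.shared_with_groups.add(group)

    def visible_to(self, user: str, user_groups: set[str]) -> bool:
        # The owner always sees the list; anyone else needs a shared group.
        return user == self.owner or bool(self.shared_with_groups & user_groups)


# Example usage with hypothetical dataset IDs and group names.
team_list = Playlist(name="Pollution trend inputs", owner="simon")
team_list.add(DatasetRef("npi-facility-emissions", "National Pollutant Inventory"))
team_list.add(DatasetRef("abs-sa1-data", "ABS SA1 data"))
team_list.share_with("intelligence-analysis")
print(team_list.visible_to("analyst1", {"intelligence-analysis"}))  # True
```

Keeping sharing at the group level rather than per user means a whole team can be given access to a curated list in one step.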

  2. In terms of use cases for Magda, the primary use case that we’re developing towards right now is what’s described on magda.io: government agencies that want to catalogue and discover their data in a more organised and effective manner.

The main hurdle for running Magda is Kubernetes. If you have Kubernetes set up (which is really simple via a cloud service, reasonably simple locally and pretty complex on-premises), then it’s not that much harder to get Magda running, although if you want HTTPS via Let’s Encrypt it takes a bit of extra fiddling. The minimum requirements for running it are 2 CPU cores and 4 GiB of RAM, although you’d probably want more like 3-4 CPUs to be comfortable (our internal instances run on three 1-CPU Google Cloud instances and one 1-CPU Google Cloud SQL database).
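The sizing figures above can be turned into a quick pre-flight check of a cluster. This is a minimal sketch, assuming you have copied each node’s allocatable CPU and memory as Kubernetes-style quantity strings (for example from `kubectl describe nodes`); it parses only a subset of quantity suffixes, and the node memory values in the example are hypothetical. Allocatable resources are usually somewhat lower than a machine’s nominal size because of system reservations.

```python
from decimal import Decimal

# A subset of Kubernetes quantity suffixes: binary (Ki, Mi, Gi, Ti),
# decimal (k, M, G, T) and "m" (milli), which is common for CPU.
_SUFFIXES = {
    "Ki": Decimal(2) ** 10,
    "Mi": Decimal(2) ** 20,
    "Gi": Decimal(2) ** 30,
    "Ti": Decimal(2) ** 40,
    "m": Decimal("0.001"),
    "k": Decimal(10) ** 3,
    "M": Decimal(10) ** 6,
    "G": Decimal(10) ** 9,
    "T": Decimal(10) ** 12,
}


def parse_quantity(q: str) -> Decimal:
    """Parse a simple quantity such as "500m", "4Gi" or "2"."""
    for suffix, factor in _SUFFIXES.items():
        if q.endswith(suffix):
            return Decimal(q[: -len(suffix)]) * factor
    return Decimal(q)


# Thresholds quoted in the thread: 2 CPUs and 4 GiB RAM minimum,
# with roughly 3-4 CPUs suggested to be comfortable.
MIN_CPU = parse_quantity("2")
MIN_MEMORY = parse_quantity("4Gi")
COMFORTABLE_CPU = parse_quantity("3")


def assess(nodes: list[dict[str, str]]) -> str:
    """Classify a cluster from its nodes' allocatable CPU and memory."""
    cpu = sum((parse_quantity(n["cpu"]) for n in nodes), Decimal(0))
    memory = sum((parse_quantity(n["memory"]) for n in nodes), Decimal(0))
    if cpu < MIN_CPU or memory < MIN_MEMORY:
        return "below minimum"
    if cpu < COMFORTABLE_CPU:
        return "meets minimum"
    return "comfortable"


# Three 1-CPU nodes as in the thread; the 2Gi per node is an assumption.
pool = [{"cpu": "1", "memory": "2Gi"} for _ in range(3)]
print(assess(pool))  # comfortable
```

Summing across nodes is only a rough guide: Kubernetes schedules each pod onto a single node, so a cluster can pass this check and still fail to fit one large pod.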

Unless you actually want to contribute new features, you shouldn’t need any Node, Java or Scala experience: everything’s packaged as Docker images and Kubernetes Helm charts, so you don’t need to touch the code to get it running.

In terms of how much support you’d need to run it: in theory it should pretty much run by itself, and updates can be installed by running one or two commands, as long as you’re happy to accept a few minutes of downtime. In reality there are probably blind spots that I don’t know about, as there aren’t many instances that we don’t directly administer (just the aforementioned CSIRO Land and Water one at this point).
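Because an update involves a few minutes of downtime, a small team might script a check that the site is back before telling users. The sketch below is a generic polling loop, not a Magda tool: `http_ok` and `wait_until_up` are hypothetical helpers, and the timeout and interval defaults are assumptions to tune for your instance.

```python
import http.client
import time
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 5.0) -> bool:
    """Return True if a GET to `url` answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, http.client.HTTPException):
        # URLError/HTTPError are OSError subclasses; refused connections,
        # timeouts and 4xx/5xx responses all count as "not up yet".
        return False


def wait_until_up(
    probe: Callable[[], bool],
    timeout_s: float = 600.0,
    interval_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> float:
    """Poll `probe` until it succeeds; return seconds waited or raise TimeoutError."""
    start = clock()
    while True:
        if probe():
            return clock() - start
        if clock() - start >= timeout_s:
            raise TimeoutError(f"service not healthy after {timeout_s:.0f}s")
        sleep(interval_s)
```

For example, after running the upgrade commands you might call `wait_until_up(lambda: http_ok("https://magda.example.gov.au/"))`, where the URL is a placeholder for your own instance’s front page. The `sleep` and `clock` parameters exist so the loop can be tested without real waiting.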

We are very keen to see Magda adopted in more places, though, so it’s hard for me to say impartially whether it makes sense for you or not. Perhaps send me an email at contact@magda.io and we can have a chat? It would be great to hear about what you’re doing.


#3

Thanks very much, Alex.

Very helpful.

I’ll take you up on the offer of a chat soon.

Cheers

Simon
