University of Canterbury embraces cloud computing and big data
The University of Canterbury has pulled off an Australasian first and is officially using cloud infrastructure and big data analytics to empower teachers and students.
Dr Raazesh Sainudiin, a UC senior lecturer in the School of Mathematics and Statistics, secured grants from the Databricks Academic Partners Program and Amazon Web Services Educate that give all UC faculty, staff and students free and ongoing access to the providers' cloud-computing infrastructure for academic teaching and research.
According to Sainudiin, this gives UC the potential to become a leader in big data analytics in Australasia.
“In today's digital world, data about every conceivable aspect of life is being collected and amassed at an unprecedented scale. To give you some idea of how much data we are talking about, IBM estimated that a whopping 2.5 exabytes (2,500,000,000,000,000,000 bytes) of data was generated every single day, and that was back in 2012.
“This massive data could potentially hold answers for many critical questions and problems facing our world today. But to be able to get at these important answers, the first step is to be able to explore and analyse this gargantuan volume of data in a meaningful way,” he says.
“Cloud computing allows you to instantly scale up access to over 10,000 off-site computers, as required by the scale of the real-world big data problem at hand, and complete the data analyses in the least amount of time needed, usually a matter of hours.
“What if all past and present recorded and real-time data of earthquakes on the planet could be analysed simultaneously? Or consider the live analysis of every tweet on Earth. There are on average 6,000 tweets per second. The scale of such volumes of data is such that they can't be stored, let alone analysed, by one computer or even 100 computers in any sort of reasonable timeframe,” says Sainudiin.
UC has already established a research cluster with thousands of computer nodes running Apache Spark, a cluster computing engine for large-scale data processing. This locally set up resource taps into the infrastructure provided by these grants and is being used in a new course, STAT478: Special Topics in Scalable Data Science, by UC students, including several who are full-time employees in the local tech industry.
Students are also trained to run their own big-data projects as part of their course requirements. This cutting-edge training in using cloud infrastructure to solve big-data problems will produce globally competitive graduates for the data industry, with key skills in the top-paying technologies listed in the 2016 Developer Survey, Dr Sainudiin says.
With a curriculum created in consultation with the tech industry, the innovative course has been praised by Wynyard Group's chief technical officer Roger Jarquin.
“We hope that such industry-academia collaborations will continue to be a dynamic training ground for future employees in our growing data industry,” says Jarquin, also an Adjunct Fellow of UC's School of Mathematics and Statistics.
Professor James Smithies, director of King's Digital Lab in the Department of Digital Humanities at King's College London and a former senior lecturer in History at UC, says the course in Scalable Data Science is an ‘excellent’ resource for the digital humanities, and sits well beside activities occurring at King's Digital Lab (KDL).
He says, “The combination of AWS and Databricks is broadly in line with what we think digital humanities students and researchers will need, and benefits from excellent levels of usability and scalability.
“This kind of approach is of crucial importance to the future of digital humanities, as researchers move into big data analysis and we seek to provide our students with the tools and experiences they need to succeed in their careers both inside and outside university.”