Artificial intelligence (AI) is one of the most important trends of our time. Enterprises are increasingly developing AI programs and robots in order to boost automation, eliminate error-prone processes and do things that humans can’t. Today we are only scratching the surface of AI, as most deployments in areas like trade, industry and healthcare are still in their early stages. Over the next couple of decades, AI deployments will expand and replace humans in a great many tasks. Some studies estimate that robots, AI and machine learning systems could replace humans in as many as 50% of today’s jobs. Nevertheless, the same studies indicate that new jobs will be created, notably jobs around the data science ecosystem.
In practice, this means that enterprises will need data scientists in almost all areas of their business, so the demand for data scientists and machine learning experts is expected to grow substantially. The data science skills shortage that enterprises already face will therefore become much more acute in the coming years. In this context, there is a need for platforms and resources that help train future data scientists. Online communities already exist that prepare data scientists by enabling them to learn, practice and evolve their data science skills.
Introducing the Kaggle Community
Kaggle.com is one of the most prominent online communities of data scientists worldwide, and was acquired by Google in 2017. It brings together data scientists and machine learning experts, who are offered tools and techniques that boost their skills and support their lifelong learning. In particular, Kaggle users can:
- Find real-life datasets for testing, learning and experimentation.
- Publish their own datasets, as a means of contributing additional data to the community.
- Explore machine learning models and how they perform on different datasets.
- Build their own models in order to solve machine learning problems based on datasets available on the platform. Kaggle supports data scientists in their model building efforts through a cloud-based workbench that offers access to datasets and machine learning libraries.
- Collaborate with other data scientists and machine learning experts, in order to jointly solve problems and share insights and experiences in machine learning.
- Participate in competitions that involve data science challenges.
Kaggle has over 1 million registered users from over 190 countries, who are commonly called “Kagglers”. Based on these numbers, Kaggle is considered the largest and most diverse community of data scientists worldwide, as it brings together data science beginners and some of the world’s best machine learning experts.
Kaggle’s Machine Learning and Data Science Competitions
Competitions were among the first services offered by Kaggle and are what made the platform famous. They are organized as follows: Companies post concrete business problems on the platform, along with the datasets needed to solve them. They also specify the format in which the final result should be submitted. Kagglers then compete to build the best algorithm for the problem at hand. Successful competitors win monetary prizes offered by the companies that posted the problem, along with credits awarded to them by the platform. In most cases, winners publish their code in a git repository, unless they are bound by a non-disclosure agreement. Over the years, Kaggle has hosted some very important competitions on well-known and popular problems, such as gesture recognition for Microsoft Kinect. Most importantly, Kaggle competitions have produced novel results that advanced entire research areas and sectors of the economy, such as biotechnology, transport and gaming.
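The competition workflow described above can be illustrated with a small sketch. This is not Kaggle’s actual scoring system: it simply assumes that each team submits one prediction per test row, that the organizer holds back the true labels, and that plain accuracy is the evaluation metric. All names here (accuracy, build_leaderboard, the team names and the data) are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the hidden labels."""
    if not y_true:
        raise ValueError("hidden labels must not be empty")
    if len(y_true) != len(y_pred):
        raise ValueError("a submission needs one prediction per test row")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def build_leaderboard(
    hidden_labels: Sequence[int],
    submissions: Dict[str, Sequence[int]],
    metric: Callable[[Sequence[int], Sequence[int]], float] = accuracy,
) -> List[Tuple[str, float]]:
    """Score every submission and sort teams from best to worst score."""
    scored = [
        (team, metric(hidden_labels, preds))
        for team, preds in submissions.items()
    ]
    # Assumption: a higher metric value is better (true for accuracy).
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical hidden labels kept by the organizer.
hidden_labels = [1, 0, 1, 1, 0]

# Hypothetical submissions, one list of predictions per team.
submissions = {
    "team_a": [1, 0, 0, 1, 0],
    "team_b": [1, 0, 1, 1, 0],
    "team_c": [0, 1, 1, 0, 0],
}

leaderboard = build_leaderboard(hidden_labels, submissions)
for rank, (team, score) in enumerate(leaderboard, start=1):
    print(f"{rank}. {team}: {score:.2f}")
```

In a real competition the metric is whatever the organizer specifies, and for error-based metrics a lower score would rank higher, so the sort direction would need to be reversed.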
Competitions are a great way to get exposure to real problems and gain genuine experience in machine learning, which is very difficult to acquire in school or through online courses. Participation in a Kaggle competition may sound ambitious and complicated, but it can really pay off for both beginners and experienced data scientists. For beginners, Kaggle competitions provide an excellent opportunity to understand the process and the statistical skills needed to solve real problems. All Kaggle participants get acquainted with git, cloud computing, evaluation metrics and Kaggle’s data science workbench, which sharpens their skills and helps them practice in a realistic context. This is why Kaggle allows its members to experiment even with competitions that have already been completed: Beyond winning the prize, competitions are about learning and gaining experience.
For experienced data scientists, on the other hand, competitions provide unique opportunities to sharpen their skills, apply their knowledge to real problems, and experiment with leading-edge algorithms, techniques and toolkits (e.g., deep learning, TensorFlow). Kaggle is also an excellent experimentation forum for academics who want to publish papers using publicly available datasets. Such academics can take advantage of Kaggle datasets and Kaggle “kernels” (i.e., predefined pieces of data analysis code) in their research, experimentation and validation efforts. Furthermore, they can use their rankings in Kaggle competitions to demonstrate the viability, novelty and performance of their approaches.
Kernels, Tutorials and Job Boards
Kaggle Kernels are available through the platform’s cloud-based workbench, which facilitates the sharing of data and code inside the platform. Kaggle enables data scientists to use two of the most popular programming languages for dataset manipulation and statistical computing, namely Python and R. For newcomers to the platform, Kaggle offers a range of tutorial courses that help community members get started with machine learning and artificial intelligence. Kaggle tutorials provide an excellent introduction to key topics in data science, such as supervised learning, unsupervised learning and data bias.
For companies and employers, Kaggle offers a jobs board, which hosts jobs related to machine learning, data science and AI. When it comes to attracting talent, Kaggle offers an attractive value proposition: Kagglers have their own portfolio of data science solutions, which is indicative of their experience and capability. This portfolio is a valuable complement to conventional bios and resumes. Portfolios and rankings in competitions boost meritocracy, as they are based on objective metrics rather than on the subjective assessment of a curriculum vitae.
Over the next couple of decades, developing and attracting talent in machine learning and data science will be one of the top priorities for many enterprises. Online data science communities will give a great boost to training data scientists, supporting their lifelong learning and helping experienced machine learning experts to compete and share insights with their peers. Kaggle has a pioneering role, which is likely to pave the way for similar communities in the years to come.