Classifies Cloud Patterns from Satellite Images.


Understanding Clouds from Satellite Images


Can you classify cloud structures from satellites?

In this project, you will learn how to solve an image classification problem in the climate, environment and earth sciences using the physical understanding of clouds. This project requires no prior AI knowledge. We’re going to create a computer vision machine learning (ML) model that classifies cloud organization patterns from satellite images.



The problem

Climate change has been at the top of our minds and at the forefront of important political decision-making for many years. We hope you can use this project’s dataset to help demystify an important climatic variable. Scientists, like those at the Max Planck Institute for Meteorology, are leading the charge with new research on the world’s ever-changing atmosphere, and they need your help to better understand the clouds.

Shallow clouds play a huge role in determining the Earth's climate. They’re also difficult to understand and to represent in climate models. By classifying different types of cloud organization, researchers at Max Planck hope to improve our physical understanding of these clouds, which in turn will help us build better climate models.

There are many ways in which clouds can organize, but the boundaries between different forms of organization are murky. This makes it challenging to build traditional rule-based algorithms to separate cloud features. The human eye, however, is really good at detecting features—such as clouds that resemble flowers.

Classification problems have always been around. Humans have always tried to classify the world around them, and we make sense of it by building predictive models.

Below, we will build a machine learning model that can be trained on a data set of images – satellite images of cloud organization patterns – to help scientists better understand how clouds will shape our future climate. This research will guide the development of next-generation models which could reduce uncertainties in climate projections.

Help us remove the haze from climate models and bring clarity to cloud identification. For more information on the scientific background and how the labels were created see the following paper.



Classification is a supervised learning technique in which the computer uses input data (in this project, satellite images of cloud organization patterns are the input data) to learn and accordingly classify new observations. Further information can be found in the second course of Level 2, Supervised Learning, in the AI Citizenship Program.


Learning outcomes

After completing this project, you will understand how to: 

    • Collect data and split it into training and testing datasets
    • Solve a classification problem by creating an ML model to classify a set of satellite images of cloud organization patterns
    • Apply ML in a real-life scenario by deploying your own AI model, using our API, into your own app or website


The data

In this AI project, you will be identifying regions in a dataset of satellite images that contain certain cloud formations, with label names: Fish, Flower, Gravel, Sugar. For each image in the testing dataset, you must segment the regions of each cloud formation label. Each image has at least one cloud formation and can contain up to all four.

This dataset will be used to train your ML model to do the classification task on its own when it sees new images of cloud formations. Make sure that you split your data into a training dataset and a testing dataset, so that you can assess the performance of your ML model on images it has not seen during training.
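The split described above can be sketched in a few lines of Python. This is only an illustration, not how the platform splits data internally: the function name split_dataset, the 20% test share, and the file names are all assumptions made for the example. The split is done per label so that each of the four classes appears in both sets.

```python
import random
from collections import defaultdict


def split_dataset(samples, test_fraction=0.2, seed=42):
    """Split (image_name, label) pairs into training and testing lists.

    The split is done separately for each label, so every class keeps
    roughly the same share of images in both sets.
    """
    rng = random.Random(seed)  # fixed seed makes the split repeatable
    by_label = defaultdict(list)
    for name, label in samples:
        by_label[label].append(name)

    train, test = [], []
    for label, names in sorted(by_label.items()):
        names = sorted(names)
        rng.shuffle(names)
        # Hold out at least one image per class for testing.
        n_test = max(1, round(len(names) * test_fraction))
        test += [(n, label) for n in names[:n_test]]
        train += [(n, label) for n in names[n_test:]]
    return train, test


# Hypothetical file names, for illustration only.
samples = [
    (f"{label.lower()}_{i:03d}.jpg", label)
    for label in ("Fish", "Flower", "Gravel", "Sugar")
    for i in range(10)
]
train, test = split_dataset(samples)
print(len(train), len(test))  # 32 8
```

With 10 images per class and a 20% test share, two images of each class are held out, which gives 32 training and 8 testing images.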

The satellite images were downloaded from NASA Worldview. Three regions, spanning 21 degrees longitude and 14 degrees latitude, were chosen. The true-color images were taken from two polar-orbiting satellites, TERRA and AQUA, each of which passes a specific region once a day. Due to the small footprint of the imager (MODIS) on board these satellites, an image might be stitched together from two orbits. The remaining area, which has not been covered by two succeeding orbits, is marked black.

The labels were created in a crowd-sourcing activity at the Max Planck Institute for Meteorology in Hamburg, Germany, and the Laboratoire de météorologie dynamique in Paris, France. A team of 68 scientists identified areas of cloud patterns in each image, and each image was labeled by approximately 3 different scientists. Ground truth was determined by the union of the areas marked by all labelers for that image, after removing any black band area from the areas.
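To make the ground-truth rule concrete, here is a minimal sketch of "union of all labelers' areas, minus the black band" for one cloud label. It assumes each area is stored as a small grid of 0/1 values; the function name ground_truth_mask and the toy grids are hypothetical and are not taken from the dataset's own tooling.

```python
def ground_truth_mask(labeler_masks, black_band):
    """Union the labelers' masks, then clear pixels in the black band.

    labeler_masks: list of 2D lists of 0/1, one per labeler, same shape.
    black_band:    2D list of 0/1, 1 where the image has no satellite data.
    """
    rows, cols = len(black_band), len(black_band[0])
    result = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # A pixel counts if at least one labeler marked it...
            marked = any(mask[r][c] for mask in labeler_masks)
            # ...and it does not fall inside the black band.
            result[r][c] = 1 if marked and not black_band[r][c] else 0
    return result


# Toy 2x4 example: two labelers and a black band covering one pixel.
labeler_a = [[1, 1, 0, 0],
             [0, 0, 0, 0]]
labeler_b = [[0, 1, 1, 0],
             [0, 0, 0, 1]]
band = [[0, 0, 1, 0],
        [0, 0, 0, 0]]

truth = ground_truth_mask([labeler_a, labeler_b], band)
print(truth)  # [[1, 1, 0, 0], [0, 0, 0, 1]]
```

The pixel marked only by the second labeler in the top row is dropped because it lies in the black band, while every other marked pixel is kept.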

The segment for each cloud formation label for an image is encoded into a single row, even if there are several non-contiguous areas of the same formation in an image. If there is no area of a certain cloud type for an image, the corresponding EncodedPixels prediction should be left blank.
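The text above does not spell out the encoding, so the following sketch rests on an assumption: that EncodedPixels uses run-length encoding, written as "start length" pairs with pixels numbered from 1, top to bottom and then left to right. The function name rle_encode is hypothetical. The example shows how several non-contiguous areas still end up in a single row, and how an empty mask yields a blank value.

```python
def rle_encode(mask):
    """Encode a 2D 0/1 mask as 'start length start length ...'.

    Assumption: pixels are numbered from 1, going down each column
    first and then moving to the next column (column-major order).
    An empty mask gives an empty string.
    """
    rows, cols = len(mask), len(mask[0])
    flat = [mask[r][c] for c in range(cols) for r in range(rows)]

    runs = []
    start = None  # 1-based position where the current run began
    for i, value in enumerate(flat, start=1):
        if value and start is None:
            start = i
        elif not value and start is not None:
            runs += [start, i - start]
            start = None
    if start is not None:  # a run that reaches the last pixel
        runs += [start, len(flat) - start + 1]
    return " ".join(map(str, runs))


# Two separate areas of the same formation in a 3x3 image.
mask = [[1, 0, 0],
        [1, 0, 1],
        [0, 0, 1]]
print(rle_encode(mask))  # 1 2 8 2

empty = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
print(repr(rle_encode(empty)))  # ''
```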




Meet your AI Identity. You will teach your AI how to complete tasks as a human would. Over time, your AI Identity will grow more intelligent as you train it to take on different tasks across multiple projects. Eventually, you will be able to transfer your AI Identity's intellect into a physical robot you can interact with in the real world.



First, create a project and give it a name that tells you what kind of project it is. Naming is important. A project combines all of the steps in solving a problem, from the pre-processing of datasets to model building, evaluation, and deployment. Using projects makes it easy to collaborate with others. After adding the project’s name and description, click CREATE.



In this step, you will need to select one of the ML models to solve your problem, which is classifying satellite images. You will find a computer vision model (Make Me See), a Natural Language Processing model (Make Me Read), a Voice Recognition model (Make Me Hear), and a tabular model (Make Me Count). In our case, you will need to train your AI transformer to see and classify visual data (i.e. images). Therefore, you need to select MAKE ME SEE.

More AI capabilities (models) will be added to the platform, so stay tuned! These models will help you speed up the process of creating a trained AI model that solves your problem, which you can then export into your app and start using.



This is the most important step in any ML project. Data preparation is often estimated to account for 70-80% of the work in an ML project. In this section, you will need to manually collect your satellite images from reliable sources, such as Kaggle, or import your dataset from our data library. Sort the images into the four data classes (Fish, Flower, Gravel, Sugar), then upload them. You can use the “Understanding Clouds from Satellite Images” dataset from the dataset library.
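Before uploading, it can help to check how many images you have collected for each data class. The sketch below is one simple way to do that; the helper names class_counts and underrepresented, the sample labels, and the threshold of 4 are all hypothetical, since the platform does not prescribe a minimum here.

```python
from collections import Counter

CLASSES = ("Fish", "Flower", "Gravel", "Sugar")


def class_counts(labels):
    """Count images per data class, including classes with no images."""
    counts = Counter(labels)
    return {name: counts.get(name, 0) for name in CLASSES}


def underrepresented(labels, min_per_class):
    """Return the classes that have fewer than min_per_class images."""
    return [name for name, n in class_counts(labels).items()
            if n < min_per_class]


# Hypothetical collection: one label per image gathered so far.
labels = ["Fish"] * 5 + ["Flower"] * 3 + ["Sugar"] * 6

print(class_counts(labels))
# {'Fish': 5, 'Flower': 3, 'Gravel': 0, 'Sugar': 6}
print(underrepresented(labels, min_per_class=4))
# ['Flower', 'Gravel']
```

A class that shows up in the second list needs more images before training.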



In this step, you will start training your AI transformer (i.e. ML model) on your collected data. This is where you show your AI how to classify satellite images of various cloud organization structures, so that when the AI sees a new image of a cloud pattern, it can recognize which of the 4 categories it belongs to. Click TRAIN ME to start the training. This may take a few moments.



Now that you have finished training your model, you are ready to test it on a new satellite image. You can use one of the images from the testing dataset (see point 4). Just make sure you don’t use an image that was part of the training dataset; otherwise, you are cheating! Upload or import the sample image to assess the performance in the preview section. If you are satisfied with the confidence score, you can export your model. Otherwise, go back to the "collect data" step and check whether you have used a sufficient number of images in the training dataset.
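If you keep your own record of which images went into training and what the model predicted for each test image, two quick checks follow from the advice above. This is a sketch with hypothetical names and data: check_no_leakage confirms that no test image was also used for training, and accuracy gives the share of test images classified correctly. Accuracy here is a plain fraction computed by you; it is not the same thing as the confidence score shown in the preview section.

```python
def check_no_leakage(train_names, test_names):
    """Return image names that appear in both sets (should be empty)."""
    return sorted(set(train_names) & set(test_names))


def accuracy(predictions, true_labels):
    """Fraction of test images whose predicted label matches the truth.

    predictions: dict of image name -> predicted label.
    true_labels: dict of image name -> correct label.
    A test image with no prediction counts as wrong.
    """
    correct = sum(predictions.get(name) == label
                  for name, label in true_labels.items())
    return correct / len(true_labels)


# Hypothetical file names and results, for illustration only.
train_names = ["fish_001.jpg", "sugar_001.jpg", "gravel_001.jpg"]
true_labels = {
    "fish_002.jpg": "Fish",
    "flower_002.jpg": "Flower",
    "gravel_002.jpg": "Gravel",
    "sugar_002.jpg": "Sugar",
}
predictions = {
    "fish_002.jpg": "Fish",
    "flower_002.jpg": "Flower",
    "gravel_002.jpg": "Sugar",   # a misclassification
    "sugar_002.jpg": "Sugar",
}

print(check_no_leakage(train_names, true_labels))  # []
print(accuracy(predictions, true_labels))          # 0.75
```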



If you are happy with the testing results, you can "Export" your ML model using our API into your own website, application, or even your robot or any other machine! If you are not happy with the confidence score, you will need to check your training data classes again and make sure that you have clean examples, and enough of them, for each data class. Keep in mind the concept of GIGO (garbage in, garbage out), which means that the quality of your ML model's output is determined by the quality of your input, i.e. the training dataset.