Monday, April 4, 2022

Data Visualization in Python with matplotlib, Seaborn and Bokeh

Last Updated on March 28, 2022

Data visualization is an important aspect of all AI and machine learning applications. You can gain key insights into your data through different graphical representations. In this tutorial, we’ll talk about a few options for data visualization in Python. We’ll use the MNIST dataset and the Tensorflow library for number crunching and data manipulation. To illustrate various methods for creating different types of graphs, we’ll use Python’s graphing libraries, namely matplotlib, Seaborn, and Bokeh.

After completing this tutorial, you will know:

  • How to visualize images in matplotlib
  • How to make scatter plots in matplotlib, Seaborn, and Bokeh
  • How to make multiline plots in matplotlib, Seaborn, and Bokeh

Let’s get started.

Photo of Istanbul taken from an airplane

Data Visualization in Python With matplotlib, Seaborn and Bokeh
Photo by Mehreen Saeed, some rights reserved.

Tutorial Overview

This tutorial is divided into seven parts; they are:

  • Preparation of scatter data
  • Figures in matplotlib
  • Scatter plots in matplotlib and Seaborn
  • Scatter plots in Bokeh
  • Preparation of line plot data
  • Line plots in matplotlib, Seaborn, and Bokeh
  • More on visualization

Preparation of scatter data

In this post, we’ll use matplotlib, Seaborn, and Bokeh. They are all external libraries that need to be installed. You can install them with pip; the packages are named matplotlib, seaborn, and bokeh.

For demonstration purposes, we will also use the MNIST handwritten digits dataset. We will load it from Tensorflow and run the PCA algorithm on it. Hence we will also need to install the tensorflow and pandas packages.

The code in the rest of this post assumes these libraries have been imported.

We load the MNIST dataset from the keras.datasets library. To keep things simple, we’ll retain only the subset of data containing the first three digits. We’ll also ignore the test set for now.
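The filtering step can be sketched with a boolean mask. This is a minimal sketch, not the original code: the helper keep_digits() and the random stand-in arrays are assumptions made so the example runs without downloading anything. With Tensorflow installed, the real arrays would come from tf.keras.datasets.mnist.load_data().

```python
import numpy as np

def keep_digits(images, labels, max_digit=2):
    """Return only the samples whose label is in 0..max_digit (hypothetical helper)."""
    mask = labels <= max_digit
    return images[mask], labels[mask]

# Stand-in data with the MNIST layout (N, 28, 28); these are random values,
# not real digits.
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(100, 28, 28), dtype=np.uint8)
fake_labels = rng.integers(0, 10, size=100)

x_train, train_labels = keep_digits(fake_images, fake_labels)
print(x_train.shape, np.unique(train_labels))
```

The same mask is applied to both the images and the labels so that the two arrays stay aligned.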

Figures in matplotlib

Seaborn is indeed an add-on to matplotlib. Therefore, you need to understand how matplotlib handles plots even if you’re using Seaborn.

Matplotlib calls its canvas the figure. You can divide the figure into several sections called subplots, so you can put two visualizations side by side.

As an example, let’s visualize the first 16 images of our MNIST dataset using matplotlib. We’ll create 2 rows and 8 columns using the subplots() function. The subplots() function will create the axes objects for each unit. Then we’ll display each image on each axes object using the imshow() method. Finally, the figure will be shown using the show() function.
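A minimal sketch of that grid follows. The random array is a placeholder for the first 16 MNIST images, and the figure size and grayscale colormap are assumptions rather than the original settings.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
images = rng.random((16, 28, 28))   # placeholder for the first 16 MNIST images

# subplots() returns the figure and a 2x8 array of axes objects
fig, ax = plt.subplots(nrows=2, ncols=8, figsize=(18, 5))
for i in range(16):
    cell = ax[i // 8, i % 8]
    cell.imshow(images[i], cmap="gray")
    # Hide the tick markers on both axes
    cell.set_xticks([])
    cell.set_yticks([])
```

Calling plt.show() afterwards displays the figure.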

First 16 images of the training dataset displayed in 2 rows and 8 columns


Here we can see a few properties of matplotlib. There is a default figure and default axes in matplotlib. There are a number of functions defined in matplotlib under the pyplot submodule for plotting on the default axes. If we want to plot on a particular axes, we can use the plotting function under the axes objects. The operations to manipulate a figure are procedural. That is, there is a data structure remembered internally by matplotlib, and our operations will mutate it. The show() function simply displays the result of a series of operations. Because of that, we can gradually fine-tune a lot of details on the figure. In the example above, we hid the “ticks” (i.e., the markers on the axes) by setting xticks and yticks to empty lists.

Scatter plots in matplotlib and Seaborn

One of the common visualizations we use in machine learning projects is the scatter plot.

As an example, we apply PCA to the MNIST dataset and extract the first three components of each image. In the code below, we compute the eigenvectors and eigenvalues from the dataset, then project the data of each image along the direction of the eigenvectors and store the result in x_pca. For simplicity, we didn’t normalize the data to zero mean and unit variance before computing the eigenvectors. This omission does not affect our purpose of visualization.
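The projection can be sketched as below. This is an illustration under assumptions, not the original code: it uses NumPy in place of Tensorflow, random data in place of MNIST, and a hypothetical helper named pca_project(). Note that np.linalg.eigh() returns eigenvalues in ascending order, so the last columns of the projection belong to the largest eigenvalues.

```python
import numpy as np

def pca_project(images):
    """Project flattened images onto the eigenvectors of X^T X (no centering)."""
    x = images.reshape(len(images), -1).astype(np.float64)
    # eigh() is meant for symmetric matrices; eigenvalues come back ascending
    eigenvalues, eigenvectors = np.linalg.eigh(x.T @ x)
    return eigenvalues, x @ eigenvectors

rng = np.random.default_rng(0)
images = rng.random((50, 28, 28))    # placeholder for the MNIST subset
eigenvalues, x_pca = pca_project(images)
print(eigenvalues[-3:])              # the three largest eigenvalues
print(x_pca.shape)                   # (50, 784)
```

As in the text, the data is not centered or scaled before the decomposition.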

The eigenvalues printed are as follows:

The array x_pca has shape 18623 x 784. Let’s consider the last two columns as the x- and y-coordinates and plot each row as a point. We can further color each point according to the digit it corresponds to.

The following code generates a scatter plot using matplotlib. The plot is created using the axes object’s scatter() function, which takes the x- and y-coordinates as the first two arguments. The c argument to the scatter() method specifies a value that will become the point’s color. The s argument specifies its size. The code also creates a legend and adds a title to the plot.
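A minimal sketch of such a scatter plot is shown here. The points are synthetic placeholders for the two PCA columns, and the legend is built with legend_elements(), which is one way to get one entry per digit; the original code may have done this differently.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
labels = np.repeat([0, 1, 2], 50)                          # placeholder digit labels
points = rng.normal(size=(150, 2)) + labels[:, None] * 3   # placeholder 2D projection

fig, ax = plt.subplots(figsize=(12, 8))
# c sets the color value of each point, s sets its size
scatter = ax.scatter(points[:, 0], points[:, 1], c=labels, s=5)
# legend_elements() builds one legend handle per distinct value of c
legend = ax.legend(*scatter.legend_elements(), loc="lower left", title="Digits")
ax.set_title("First two PCA directions")
```

Calling plt.show() afterwards displays the figure.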

2D scatter plot generated using Matplotlib


Putting the above together, the following is the complete code to generate the 2D scatter plot using matplotlib:

Matplotlib also allows a 3D scatter plot to be produced. To do so, you need to create an axes object with 3D projection first. Then the 3D scatter plot is created with the scatter3D() function, with the x-, y-, and z-coordinates as the first three arguments. The code below uses the data projected along the eigenvectors corresponding to the three largest eigenvalues. Instead of creating a legend, this code creates a colorbar.

3D scatter plot generated using Matplotlib


The scatter3D() function just puts the points into the 3D space. Afterwards, we can still modify how the figure is displayed, such as the label of each axis and the background color. But in 3D plots, one common tweak is the viewport, namely, the angle from which we look at the 3D space. The viewport is controlled by the view_init() function in the axes object:

The viewport is controlled by the elevation angle (i.e., the angle to the horizon plane) and the azimuthal angle (i.e., the rotation on the horizon plane). By default, matplotlib uses 30 degrees of elevation and -60 degrees of azimuth, as shown above.
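The 3D steps can be sketched as follows, assuming synthetic points in place of the three PCA columns. The shrink value for the colorbar is an arbitrary choice for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
labels = np.repeat([0, 1, 2], 50)                          # placeholder digit labels
points = rng.normal(size=(150, 3)) + labels[:, None] * 3   # placeholder 3D projection

fig = plt.figure(figsize=(12, 8))
ax = fig.add_subplot(projection="3d")        # axes object with 3D projection
plot3d = ax.scatter3D(points[:, 0], points[:, 1], points[:, 2], c=labels, s=5)
fig.colorbar(plot3d, ax=ax, shrink=0.7)      # colorbar instead of a legend
ax.view_init(elev=30, azim=-60)              # the default viewport, set explicitly
```

Changing elev and azim rotates the view without touching the plotted data.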

Putting everything together, the following is the complete code to create the 3D scatter plot in matplotlib:

Creating scatter plots in Seaborn is similarly easy. The scatterplot() method automatically creates a legend and uses different symbols for different classes when plotting the points. By default, the plot is created on the “current axes” from matplotlib, unless the axes object is specified by the ax argument.
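A minimal sketch of the Seaborn version, again on synthetic points. The palette colors are assumptions, and passing the labels to both hue and style is what gives each class its own color and symbol here.

```python
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

rng = np.random.default_rng(0)
labels = np.repeat([0, 1, 2], 50)                          # placeholder digit labels
points = rng.normal(size=(150, 2)) + labels[:, None] * 3   # placeholder 2D projection

fig, ax = plt.subplots(figsize=(12, 8))
# ax=ax draws on a specific axes object instead of the current axes
returned = sns.scatterplot(x=points[:, 0], y=points[:, 1],
                           hue=labels, style=labels,
                           palette=["red", "green", "blue"], ax=ax)
```

scatterplot() returns the axes it drew on, so further matplotlib calls can be applied to it.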

2D scatter plot generated using Seaborn


The benefit of Seaborn over matplotlib is twofold. First, we have a polished default style. For example, if we compare the point style in the two scatter plots above, the Seaborn one has a border around each dot to prevent the many points from being smudged together. Indeed, if we apply Seaborn’s style before calling any matplotlib functions, we can still use the matplotlib functions but get a better-looking figure.

Second, it is more convenient to use Seaborn if we are using a pandas DataFrame to hold our data. As an example, let’s convert our MNIST data from a tensor into a pandas DataFrame:

The DataFrame looks like the following:

Then, we can reproduce Seaborn’s scatter plot with the following:

Here we do not pass arrays as coordinates to the scatterplot() function; instead, we pass the DataFrame to the data argument and refer to the coordinates by column name.
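Both points can be sketched together. The column names pca_1, pca_2, and label are hypothetical, the data is synthetic, and set_theme() with the darkgrid style is one way to apply Seaborn’s style; the original may have used a different call or style.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_theme(style="darkgrid")   # apply Seaborn's style to later matplotlib figures

rng = np.random.default_rng(0)
labels = np.repeat([0, 1, 2], 50)                          # placeholder digit labels
points = rng.normal(size=(150, 2)) + labels[:, None] * 3   # placeholder 2D projection

# Hypothetical column names; any names work as long as they match below
df = pd.DataFrame({"pca_1": points[:, 0], "pca_2": points[:, 1], "label": labels})

fig, ax = plt.subplots(figsize=(12, 8))
sns.scatterplot(data=df, x="pca_1", y="pca_2", hue="label", style="label",
                palette=["red", "green", "blue"], ax=ax)
```

With a DataFrame, Seaborn also picks up the column names as axis labels.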

The following is the complete code to generate a scatter plot using Seaborn with the data stored in pandas:

Seaborn, as a wrapper around some matplotlib functions, does not replace matplotlib entirely. Plotting in 3D, for example, is not supported by Seaborn, and we still need to resort to matplotlib functions for such purposes.

Scatter plots in Bokeh

The plots created by matplotlib and Seaborn are static images. If you need to zoom in, pan, or toggle the display of some part of the plot, you should use Bokeh instead.

Creating scatter plots in Bokeh is also easy. The following code generates a scatter plot and adds a legend. The show() method from the Bokeh library opens a new browser window to display the image. You can interact with the plot by scaling, zooming, scrolling, and using more options that are shown in the toolbar next to the rendered plot. You can also hide part of the scatter by clicking on the legend.

Bokeh will produce the plot in HTML with JavaScript. All your actions to control the plot are handled by some JavaScript functions. Its output looks like the following:

2D scatter plot generated using Bokeh in a new browser window. Note the various options on the right for interacting with the plot.


The following is the complete code to generate the above scatter plot using Bokeh:

If you are rendering the Bokeh plot in a Jupyter notebook, you may see that the plot is produced in a new browser window. To put the plot in the Jupyter notebook, you need to tell Bokeh that you are in the notebook environment by calling output_notebook() from bokeh.io before the other Bokeh functions.

Also note that we create the scatter plot of the three digits in a loop, one digit at a time. This is required to make the legend interactive, since each time scatter() is called, a new object is created. If we create all scatter points at once, clicking on the legend will hide and show everything instead of only the points of one of the digits.
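The loop described above can be sketched as follows, assuming synthetic points and arbitrary colors and labels. Setting click_policy to "hide" is what lets a click on a legend entry toggle the matching points.

```python
import numpy as np
from bokeh.plotting import figure, show

rng = np.random.default_rng(0)
labels = np.repeat([0, 1, 2], 50)                          # placeholder digit labels
points = rng.normal(size=(150, 2)) + labels[:, None] * 3   # placeholder 2D projection

p = figure(title="Projection of digits onto two PCA directions")
for digit, color in zip([0, 1, 2], ["red", "green", "blue"]):
    mask = labels == digit
    # One scatter() call per digit gives each digit its own renderer and legend entry
    p.scatter(points[mask, 0], points[mask, 1], color=color,
              legend_label=f"Digit {digit}")
p.legend.click_policy = "hide"
# show(p)  # uncomment to open the interactive plot in a browser
```

Because each digit has its own renderer, hiding one legend entry leaves the other two digits visible.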

Preparation of line plot data

Before we move on to show how we can visualize line plot data, let’s generate some data for illustration. Below is a simple classifier using the Keras library, which we train to learn the handwritten digit classification. The history object returned by the fit() method holds, in its history attribute, a dictionary that contains all the learning history of the training stage. For simplicity, we’ll train the model using only 10 epochs.

The code above will produce a dictionary with keys loss, accuracy, val_loss, and val_accuracy, as follows:

Line plots in matplotlib, Seaborn, and Bokeh

Let’s look at various options for visualizing the learning history obtained from training our classifier.

Creating a multi-line plot in matplotlib is as trivial as the following. We obtain the list of values of the training and validation accuracies from the history, and by default, matplotlib will consider that as sequential data (i.e., x-coordinates are integers counting from 0 onwards).
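A minimal sketch of the multi-line plot follows. The dictionary below is a placeholder with made-up numbers standing in for the Keras training history; it is not a real training result.

```python
import matplotlib.pyplot as plt

# Placeholder values standing in for the training history; not real results
history = {
    "accuracy":     [0.90, 0.94, 0.96, 0.97, 0.98],
    "val_accuracy": [0.92, 0.94, 0.95, 0.96, 0.96],
}

fig, ax = plt.subplots()
# With only y-values given, matplotlib uses 0, 1, 2, ... as x-coordinates
ax.plot(history["accuracy"], label="Training")
ax.plot(history["val_accuracy"], label="Validation")
ax.set_xlabel("Epoch number")
ax.set_ylabel("Accuracy")
ax.legend()
```

Calling plt.show() afterwards displays the figure.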

Multi-line plot using Matplotlib


The complete code for creating the multi-line plot is as follows:

Similarly, we can do the same in Seaborn. As we have seen in the case of the scatter plot, we can pass the data to Seaborn as a series of values explicitly or through a pandas DataFrame. Let’s plot the training loss and validation loss in the following using a pandas DataFrame:

It will print the following table, which is the DataFrame we created from the history:

And the plot it generated is as follows:

Multi-line plot using Seaborn


By default, Seaborn will understand the column labels from the DataFrame and use them as the legend. In the above, we provide a new label for each plot. Moreover, the x-axis of the line plot is taken from the index of the DataFrame by default, which is an integer running from 0 to 9 in our case, as we can see above.

The complete code for producing the plot in Seaborn is as follows:

As you can expect, we can also provide the arguments x and y together with data in our call to lineplot(), as in our example of the Seaborn scatter plot above, if we want to control the x- and y-coordinates precisely.

Bokeh can also generate multi-line plots, as illustrated in the code below. As we saw in the scatter plot example, we need to provide the x- and y-coordinates explicitly and do one line at a time. Again, the show() method opens a new browser window to display the plot, and you can interact with it.
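A minimal sketch of the Bokeh version, with made-up loss values as placeholders. The colors, line width, and the clickable legend are assumptions added for illustration.

```python
from bokeh.plotting import figure, show

# Placeholder values standing in for the training history; not real results
history = {
    "loss":     [0.30, 0.18, 0.12, 0.09, 0.07],
    "val_loss": [0.25, 0.17, 0.14, 0.12, 0.12],
}
epochs = list(range(len(history["loss"])))   # explicit x-coordinates

p = figure(title="Training and validation loss",
           x_axis_label="Epoch number", y_axis_label="Loss")
# One line() call per series, each with its own legend entry
p.line(epochs, history["loss"], legend_label="Training", color="blue", line_width=2)
p.line(epochs, history["val_loss"], legend_label="Validation", color="green", line_width=2)
p.legend.click_policy = "hide"
# show(p)  # uncomment to open the interactive plot in a browser
```

Unlike matplotlib, Bokeh does not infer the x-coordinates, so the epoch list has to be built explicitly.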

Multi-line plot using Bokeh. Note the options for user interaction shown on the toolbar on the right.


The complete code for making the Bokeh plot is as follows:

More on visualization

Each of the tools we introduced above has a lot more functions for us to control the bits and pieces of the details in the visualization. It is important to search their respective documentation to find the ways you can polish your plots. It is equally important to check out the example code in their documentation to learn how you can possibly make your visualization better.

Without providing too much detail, here are some ideas that you may want to add to your visualization:

  • Add auxiliary lines, such as to mark the training and validation dataset on time series data. The axvline() function from matplotlib can make a vertical line on plots for this purpose.
  • Add annotations, such as arrows and text labels, to identify key points on the plot. See the annotate() function in matplotlib axes objects.
  • Control the transparency level in case of overlapping graphic elements. All plotting functions we introduced above allow an alpha argument to provide a value between 0 and 1 for how much we can see through the graph.
  • If the data is better illustrated this way, we may show some of the axes in log scale. It is usually called the log plot or semilog plot.
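The four ideas above can be combined in one small matplotlib sketch. The series, the position of the vertical line, and the annotation text are all made up for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(1, 101)
y = 1.0 / x                      # placeholder series

fig, ax = plt.subplots()
ax.plot(x, y, alpha=0.6)                            # partly transparent line
ax.axvline(x=80, color="gray", linestyle="--")      # auxiliary vertical line
ax.annotate("validation starts", xy=(80, y[79]), xytext=(40, 0.3),
            arrowprops={"arrowstyle": "->"})        # arrow with a text label
ax.set_yscale("log")                                # semilog plot
```

Each call only mutates the existing axes, so these tweaks can be added one at a time to any of the earlier plots.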

Before we conclude this post, the following is an example of creating a side-by-side visualization in matplotlib, where one of the plots is created using Seaborn:

Side-by-side visualization created using matplotlib and Seaborn

The equivalent in Bokeh is to create each subplot separately and then specify the layout when we show it:
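This can be sketched with the row() layout from bokeh.layouts, again with made-up history values. Other layouts such as column() or gridplot() follow the same pattern.

```python
from bokeh.layouts import row
from bokeh.plotting import figure, show

# Placeholder values standing in for the training history; not real results
accuracy = [0.90, 0.94, 0.96, 0.97, 0.98]
loss = [0.30, 0.18, 0.12, 0.09, 0.07]
epochs = list(range(len(accuracy)))

# Each subplot is an independent figure
p1 = figure(title="Accuracy", x_axis_label="Epoch number")
p1.line(epochs, accuracy, legend_label="Training", color="blue")

p2 = figure(title="Loss", x_axis_label="Epoch number")
p2.line(epochs, loss, legend_label="Training", color="green")

layout = row(p1, p2)   # place the two figures side by side
# show(layout)  # uncomment to open the layout in a browser
```

The layout object, not the individual figures, is what gets passed to show().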

Side-by-side plot created in Bokeh

Further Reading

This section provides more resources on the topic if you are looking to go deeper.



API Reference


In this tutorial, you discovered various options for data visualization in Python.

Specifically, you learned:

  • How to create subplots in different rows and columns
  • How to render images using matplotlib
  • How to generate 2D and 3D scatter plots using matplotlib
  • How to create 2D plots using Seaborn and Bokeh
  • How to create multi-line plots using matplotlib, Seaborn, and Bokeh

Do you have any questions about the data visualization options discussed in this post? Ask your questions in the comments below, and I will do my best to answer.


