A skinny on colors for data scientists

Color palettes are important. This may sound like an oversimplification, but it’s definitely true.

Colors can change how we perceive things, as well as how we process information. They can make data sets either easier to understand or more difficult. They also play a part in what we do with what we learn: whether we pay attention to it, ignore it, remember it, or act on it.

That’s true across the board, regardless of what niche or market you’re working in. From graphic design to school textbooks, choosing colors carefully is vital to successfully transmit information.

That makes color choice all the more important when you publish data. Whatever kind of presentation you decide on, from pie charts to bar graphs, colors can make the difference in how your data is received. And there’s a science behind how it all works.

Let’s break some of the science down with these tips, tricks, and tutorials for choosing colors for optimum data presentation.

Color types

Color choices in data presentation can be broken down by the role each color plays in the information being presented. There are, roughly speaking, three groups of colors:

  • Main colors cover the majority of your data, including auxiliary information. These should not be too distracting. Neutrals, like gray, are important players in the main color palette.
  • Highlight colors draw focus to key information. Since they are constrained to certain portions of your data, they can be brighter, attention-getting shades.
  • Supporting colors act as background shades. These should not draw much attention and should be chosen on the basis of the other colors you use for your data so as to make the information easy to read.

It can be tempting to just choose the colors that you like, or which you think look good together. But for optimum readability and memorability, your color choice should be based on more than just personal preference.

Let’s take a look at two things you could base your color choice on, and why.

1. Choose colors according to color psychology

The psychology of color is an interesting field of study that contains a lot of variables. The sheer number of variables, in fact, means that it can be difficult to pin down facts that are true 100% of the time.

This is largely because of the differences in perception and personal preference. However, research does indicate that certain colors tend to elicit certain responses in viewers.

Blue, for example, tends to be perceived as soothing and trustworthy, while red is more likely to prompt action and may even trigger strong feelings.

Choosing colors based on color psychology, then, means taking the viewer’s likely response into account.

[Figure: main color vs. highlight color]

In the two examples above, you can see that blue is a good main or supporting color, while red is an attention-getting color and would work well as a highlight color. If you use an active color that demands attention for all of your data, as in the first example, none of your data will stand out.

How to do it:

  • Do some research on the psychology of color and common responses from viewers
  • Choose neutrals or calming colors for the majority of the colors involved in your data graph
  • Choose dynamic, attention-getting colors only for highlights
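
As a rough illustration of the steps above, the sketch below gives every item a neutral main color and reserves one attention-getting color for the items you want to stand out. The hex values, the `assign_colors` helper, and the region names are all assumptions made for this example, not recommendations from a particular style guide.

```python
# Hypothetical palette: a neutral main color and one attention-getting highlight.
MAIN_COLOR = "#9e9e9e"       # mid gray (assumed choice)
HIGHLIGHT_COLOR = "#d62728"  # strong red (assumed choice)


def assign_colors(labels, highlighted):
    """Return one color per label: highlight for key items, neutral otherwise."""
    key_items = set(highlighted)
    return [HIGHLIGHT_COLOR if label in key_items else MAIN_COLOR
            for label in labels]


# Made-up categories; only "East" should draw the eye.
regions = ["North", "South", "East", "West"]
colors = assign_colors(regions, highlighted=["East"])
print(colors)
```

The resulting list has one entry per category, so it can be handed to a charting library that accepts per-bar colors (for instance, the `color` argument of matplotlib's `bar`).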

2. Choose colors based on the data itself

A second option is to base your palette on the type of data you’re representing. Not all types of data have inherent colors, so how obvious this solution is will depend on your area of research.

[Figure: color based on data]

Consider, for example, data from environmental research. For research on green spaces and forests, a variety of greens would be a natural choice, as in the example on the right. For research on oceans and the environmental impact of overfishing, aquas and blues are a clear option, as used on the left.

As another example, consider political data sets and the colors that are tied to politics in the public consciousness. In the United States, red is linked to Republicans and blue to Democrats. Switching these colors in a related data set would be confusing and counterintuitive to the viewer.

Your data sets may require more thought, but don’t be quick to write off this method. You may be required to do a little research into the history and public perception of the area in which you work.

How to do it:

  • Brainstorm on which colors may naturally be attached to or elicited by your data sets
  • If no colors are immediately apparent, try doing a Google image search for your subject and see what colors are most commonly reflected
  • Don’t think too far outside the box with colors that have common associations in the public mind — it can be confusing to the viewer and muddy the message you’re sending
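
One way to keep such associations consistent across charts is to store them in a single lookup table. The sketch below is only an illustration: the category names come from the examples above, while the hex values, the `color_for` helper, and the gray fallback are assumptions.

```python
# Illustrative category-to-color conventions (hex values are assumptions).
CONVENTIONAL_COLORS = {
    "Republican": "#d62728",  # red
    "Democrat": "#1f77b4",    # blue
    "Forest": "#2ca02c",      # green
    "Ocean": "#17becf",       # aqua
}
FALLBACK_COLOR = "#7f7f7f"    # neutral gray for categories with no convention


def color_for(category):
    """Look up the conventional color for a category, else a neutral gray."""
    return CONVENTIONAL_COLORS.get(category, FALLBACK_COLOR)


for name in ["Democrat", "Republican", "Independent"]:
    print(name, color_for(name))
```

Falling back to a neutral keeps a category without a known association from accidentally borrowing a color that means something else to the viewer.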

The optimal number of colors in data presentation

Another consideration is the number of colors that you include in your data set or workflow diagrams.

Say that you’re presenting a data set with twelve different variables. Several of these variables are important, and you want to draw attention to each of them.

What is the solution? To opt for a wide range of highlight colors in order to make sure that each point stands out?

[Figure: too many highlight colors]

Actually, the more colors a graph includes, the more difficult it is to read and understand. Graph designers suggest using no more than seven different colors in a graph; if more are needed, consider using a different type of graph or grouping data differently so as to minimize confusion.

[Figure: warm colors vs. a variety of colors]

For attention-getting colors, it’s better to stick to either warm or cool colors within a single graph. For example, choosing varying shades of red, orange, and yellow to highlight important points (as used in the example on the left) will be easier on the eye than choosing bright green, red, and pink (as on the right).
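
If you want to check a palette programmatically, one possible approach is to look at each color's hue angle. The sketch below uses Python's standard `colorsys` module; the cutoffs (hues below 90° or from 270° upward count as warm) are an assumed convention rather than a fixed rule, and the helper names are hypothetical. Grays have no real hue and would be reported as warm here, so treat neutrals separately.

```python
import colorsys


def hue_degrees(hex_color):
    """Convert a '#rrggbb' string to its hue angle in degrees (0-360)."""
    r, g, b = (int(hex_color[i:i + 2], 16) / 255 for i in (1, 3, 5))
    hue, _lightness, _saturation = colorsys.rgb_to_hls(r, g, b)
    return hue * 360


def is_warm(hex_color):
    """Assumed convention: hues below 90 or from 270 upward count as warm."""
    hue = hue_degrees(hex_color)
    return hue < 90 or hue >= 270


def same_temperature(palette):
    """True when every color in the palette is warm, or every one is cool."""
    return len({is_warm(color) for color in palette}) <= 1


warm_set = ["#ff0000", "#ff7f00", "#ffd700"]   # red, orange, yellow
mixed_set = ["#00ff00", "#ff0000", "#ff69b4"]  # bright green, red, pink
print(same_temperature(warm_set), same_temperature(mixed_set))
```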

Here are some tips:

  • Aim for no more than seven colors in your data set or graph
  • Make sure to include a color key for easy reference
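
When a data set has more categories than the seven-color guideline allows, grouping the smallest ones is one way to get back under the limit. The sketch below keeps the largest categories and folds the rest into a single "Other" bucket; the `limit_categories` helper and the sales figures are made up for illustration, and grouping by size is only one of several reasonable strategies.

```python
MAX_COLORS = 7  # upper limit suggested by the guideline discussed above


def limit_categories(totals, max_colors=MAX_COLORS):
    """Keep the largest categories and fold the rest into 'Other'.

    `totals` maps category name -> numeric value. The result has at most
    `max_colors` entries, so each one can get its own color.
    """
    if len(totals) <= max_colors:
        return dict(totals)
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    kept = dict(ranked[:max_colors - 1])
    kept["Other"] = sum(value for _name, value in ranked[max_colors - 1:])
    return kept


# Twelve made-up categories, more than the guideline allows.
sales = {f"Product {i}": 100 - 5 * i for i in range(12)}
grouped = limit_categories(sales)
print(len(grouped), grouped["Other"])
```

The keys of the grouped result can double as the entries of your color key, with "Other" assigned a neutral shade.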

Choosing colors for effective presentation

Data scientists already know that compiling and presenting information isn’t just a matter of personal preference. The data must be read and understood by the viewer, whether the set is for public perusal or for scientific presentation. So the colors chosen for data set design and related materials can’t be left to chance or personal preference, either.

Choosing colors doesn’t follow hard and fast rules, though. And the right color choice for one data set won’t necessarily be the right color choice for another.

There’s a wide range of variables to consider. At a minimum, though, consider either the psychology of color or an intuitive choice based on the data itself.

In the end, it all comes down to choosing colors that make your data easy to read, understand, and remember.

Ayesha Ambreen is a Creative Content Strategist, Partner at Quora, and Featured Graphic Designer. Ayesha’s work has been featured on blogs such as Entrepreneur.com, Smashing Magazine, CreativePro, and more. Follow her on Twitter at @AyeshaAmbreen.