Seeking the right career path for your talents by joining the forces of data science and the CliftonStrengths assessment.

Introduction

Have you ever heard of the 34 CliftonStrengths themes, also known as the Gallup test? In short, the whole idea is that although people can learn skills, develop their knowledge and gain experience, they can’t acquire talent. It’s innate. Each of us is unique, and trying to achieve anything without a good knowledge of yourself, your strengths and your motivation drivers is like walking in the dark.

I don’t know if I have ever been more excited about starting an article here. I came across the CliftonStrengths idea at the beginning of 2021 while searching for a good podcast to listen to. Spotify suggested one hosted by Dominik Juszczyk, who is an expert and trainer in Gallup talents. Don’t get me wrong, I guess many of you may immediately get sceptical about such tools. The first behavioural test that comes to mind is MBTI (also known as 16Personalities), which definitely has its cons. I also doubted the idea at the beginning. Nevertheless, the deeper I got into the topic and the more I discovered about myself, the more real benefits I saw.

34 CliftonStrengths Themes

When you take the full version of the CliftonStrengths assessment, you uncover your unique combination of 34 themes (I will also call them talents interchangeably). The themes are sorted into 4 domains, and together they explain a profound element of human behaviour. Individually, each talent gives you an idea of what you naturally do best and when it’s better to ask for someone else’s help (and benefit from their talents this way). Below you can see all themes grouped into their domains:

CliftonStrengths themes and their domains / Source: gallup.com

After taking the assessment, you receive your own CliftonStrengths talent DNA, ranked based on your responses. The first 5 themes on the list are the most powerful. That’s where you should maximize your potential. These talents represent your greatest chance to succeed by strengthening what you naturally do best and simply doing more of it! Themes 6 to 10 are your next area of focus.

But what about the remaining 24? Are those my weaknesses? Definitely not! To fully understand your talent DNA, you also need to work with the themes from the middle, so that you are conscious of behaviours that show up from time to time. Remember that they can also be helpful in certain situations, as your little superheroes!

What about the bottom ones? They show you who you are not. This doesn’t mean they’re your weaknesses but you shouldn’t rely on them to maximize your potential.

It’s also possible to take a basic version of the test and learn only your Top 5 CliftonStrengths themes. Many people choose this option, but for me it’s like watching just the trailer of a movie: although you see all the best parts, you lose the whole perspective.

Taking my own CliftonStrengths assessment

I decided to take my own CliftonStrengths assessment in December 2021. You might wonder what kept me from doing it much earlier, immediately after getting interested in the topic. Funnily enough, the explanation for this behaviour of mine is also visible in the test results! I assume it’s my Intellection at work. Instead of getting immediate answers, I wanted to first explore all themes and try to guess which of them are most visible in my own CliftonStrengths talent DNA. Did I manage? I made some good guesses, but in general I was totally surprised by the results.

My CliftonStrengths talent DNA

I need to admit that, knowing how often each talent occurs statistically, I was disappointed at the beginning. “My Top5 is so popular among people! I expected something more unique…”, I thought. Now I smile reading those words. After weeks of working with the test results, I wouldn’t exchange them for any other set. My talent DNA gives me so much value in life! Now it’s finally clear to me what I am good at, which project activities give me the most joy and in which areas I shouldn’t force myself unnecessarily. Instead of punishing myself for my weaknesses, I focus on my strengths. Although there is still huge room for improvement, I finally understand the point.

How do I observe and benefit from my CliftonStrengths TOP5?

Individualization

I am a people person and enjoy working not only with data but also with others. No matter if you are my co-worker, business stakeholder, client or trainee, I will treat you as an individual. I appreciate diversity among people: their ways of working, absorbing knowledge and communicating. It’s clear to me that methods which work for me can be irrelevant for another person. Generalization is what I hate the most.

While working on a project, I try to put myself in my customer’s or end user’s shoes. When collecting requirements, I want to be sure no remark has been omitted. When running a workshop or teaching people, I try to be as close and supportive as possible, sharing knowledge and possibilities where needed. I want people to learn from my mistakes, so I give them the tools and information I wish I had received when I struggled with the same issues. It’s easy for me to spot and understand people’s desires, motivations and concerns and to support them in achieving their goals.

Input

I am addicted to collecting and archiving. Not only information, but also things, artefacts or even relationships. I believe that even if a specific piece of information or a thing is not necessary now, it might be useful in the future. It doesn’t need to support me; it might just as well be used by someone else who needs it. While learning about data science and exploring various articles, course materials and books, I make a large amount of notes, both digital and old-fashioned ones stored in notebooks. I am sure that when I come across a specific problem in any project, I will know where to find the answer. In work projects I am a fan of documentation made in a responsible manner, storing anything which could be beneficial for a new person.

Last but not least, writing this blog is the best evidence of my Input. I collect my own projects, resources and valuable insights from interviews with experts: everything which can be useful for any newbie in the world of data science.

Learner

An endless hunger for knowing more and for self-development. It took me years to understand that for me the whole process of learning is more important than the result. I tend to finish courses and books and, instead of spending some time on reflection, immediately think of starting another one. The process of acquiring knowledge matters more than the final achievement. I am conscious that this is bad and keep fighting it (also using my Input to collect “what went well/badly” notes). Nevertheless, I consider this theme extremely useful for data science. In this industry you need to crave knowledge. It’s not a thing you learn once in a lifetime and keep repeating. You need to be excited about the changing reality and the necessity to keep up with its pace. I am. What about you?

Intellection

This theme is my blessing and curse at the same time! Since childhood I have tended to overthink everything. I always considered it my weakness, but now I understand how to use it in a mature way. In a group of people brainstorming a solution, there should always be one person who remains sceptical and spends more time thinking. Making immediate decisions was never my strength, but I am really good when given time. With the support of my Input and Learner themes I will review all pros and cons and give you valuable advice. You can be sure that I have done my best to do this right and I am there for you. I just need a little more time to be effective and dive into all the data I’ve collected. I don’t like to share opinions rashly.

Connectedness

People with this theme tend to believe that there are no coincidences. Every situation has its own reason. This talent has always helped me in my learning process. When working closely with someone, I know that there is a reason. It’s just me who needs to find it. What can I learn from this person? Something technical? Soft skills? Even if our collaboration wasn’t successful, what was the cause? What insights can I take for the future, when I have to work with someone similar? What themes could this person represent? What’s more, I do not use this theme only with people. It fits all my previous top themes in such a perfect way! Connecting information, connecting insights, connecting content, connecting people. Asking why and seeking the answer. Isn’t it something we do with historical data as well?

Wrap up

I enjoy collecting information (INPUT), ingesting it (LEARNER) and putting it in the right order so that, after transforming it into a simpler form (INTELLECTION), I can combine it with everything I have ever learnt (CONNECTEDNESS) and share it with others so they can benefit from it (INDIVIDUALIZATION).

Yes, I love my talents.


How does it correspond to data science and my blog?

For those who are still reading and haven’t left my site disappointed that this article is not the data science content they are used to: please, wait! I am writing about this on purpose. Data science techniques, algorithms, statistics: what is it all about if we are missing the domain?

As a member of Dominik’s Facebook group, where we support each other in developing our CliftonStrengths themes, I came across a great data source. The group has an Excel list where people share their CliftonStrengths results and basic information about themselves. Since the day I first saw it, I couldn’t forget about it! Maybe it’s not a lot of data for now, but how much potential it has for the future! What if I analyze the patterns currently visible in the data? What conclusions can I come to now? Is there anything I could do with this data in its current form?

I exchanged some messages with Dominik and, fortunately, he shared my enthusiasm about this project. I received his approval to anonymize the dataset and perform some introductory exploratory data analysis. Based on its results, we can share reflections with the group and brainstorm together about the direction in which this project should evolve. Are there any specific questions we would like to get answers to? Is there a model we could create to help people develop their strengths? How could we update the current Excel sheet to collect valuable information to use in the model?

I would like this article to be just the beginning of this journey, the starting point from which we could look for more. How can we describe Dominik’s community members who have already taken their CliftonStrengths assessment? What themes and domains are most common in this group? Do themes depend on gender in this population? What themes can we observe most often in specific industries in Poland? Could we use this information to match people to suggested professions based on their CliftonStrengths assessment results? Is it possible to create something like a profession recommendation system based on the assessment results? Let’s see.

Data preparation

It’s often stated in the data science world that preparation is usually the longest and most tedious step when working with data. As mentioned before, the dataset I am working with is an open Excel sheet where users can fill in their information after taking the CliftonStrengths assessment. I downloaded the file when there were 1494 responses and started the cleansing process.

Suggestions for Excel sheet improvements

I spent several afternoons trying to get to the point from which the analysis could start. The first thing I would suggest to Dominik is limiting user input a little with dropdown lists. It would probably lose some information but would definitely prevent typos and errors. Polish is quite demanding when it comes to spelling, and respondents did not pay much attention to errors while providing their professions or city names. Although I applied many string manipulation methods, e.g. to remove special characters and punctuation or to make words lowercase, the same information could still be spotted in various spellings. If I were to suggest anything, I would keep the free-text profession column, to leave respondents the flexibility to specify how they define themselves professionally, but also add an industry column with a limited dropdown list so that people can assign themselves to a group.
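A minimal sketch of the kind of string cleaning described above, using only the Python standard library. The function name `normalize_text` and the sample answers are made up for illustration; this is not the exact code used on the real sheet.

```python
import re
import unicodedata


def normalize_text(value: str) -> str:
    """Lowercase, drop punctuation and collapse whitespace, keeping Polish letters."""
    # NFC keeps characters such as "ą" or "ł" as single code points
    value = unicodedata.normalize("NFC", value).lower()
    # replace everything that is not a letter, digit or whitespace (and "_") with a space
    value = re.sub(r"[^\w\s]|_", " ", value)
    # collapse the runs of whitespace left behind by the previous step
    return re.sub(r"\s+", " ", value).strip()


# hypothetical raw answers, not taken from the real sheet
raw_answers = ["  Project-Manager!! ", "project manager", "PROJECT   MANAGER."]
cleaned = {normalize_text(answer) for answer in raw_answers}
print(cleaned)
```

Cleaning like this merges variants that differ only in case, punctuation or spacing. Real typos survive it, which is why a dropdown list is still the more robust fix.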

I would take the same approach for location. Although this field is optional in the Excel sheet, users have added their cities with plenty of typos. Limiting this column with a dropdown list would obviously be hard due to the hundreds of cities in Poland, but adding another column for voivodeship could be helpful.
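Until such a column exists, a voivodeship can be derived from the free-text city with a lookup table plus some tolerance for typos. The sketch below uses `difflib.get_close_matches` from the standard library. The lookup is deliberately partial, and the helper name `to_voivodeship` and the 0.8 cutoff are assumptions chosen for illustration.

```python
from difflib import get_close_matches

# partial, illustrative lookup; a real one would need to cover many more cities
CITY_TO_VOIVODESHIP = {
    "warszawa": "mazowieckie",
    "kraków": "małopolskie",
    "wrocław": "dolnośląskie",
    "gdańsk": "pomorskie",
    "poznań": "wielkopolskie",
}


def to_voivodeship(city, cutoff=0.8):
    """Return the voivodeship for a possibly misspelled city name, or None."""
    if not city:
        return None
    key = city.strip().lower()
    # pick the single most similar known city, if it is similar enough
    matches = get_close_matches(key, list(CITY_TO_VOIVODESHIP), n=1, cutoff=cutoff)
    return CITY_TO_VOIVODESHIP[matches[0]] if matches else None


print(to_voivodeship("Warszwa"))  # a typo that should still resolve
print(to_voivodeship("Krakow"))   # missing diacritic
```

A high cutoff keeps false matches rare at the cost of leaving some rows unassigned, so those rows would still need a manual look.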

Data frame transformations

I introduced the suggested columns in the data and also added a gender column based on the person’s first name (in Poland it’s fairly easy, as almost all female first names end with the letter “a”). Below you can see the result I received (I would recommend reading this post on your desktop, as some of the screenshots might not be readable on mobile):

Apart from personal details you can see 34 columns for the CliftonStrengths themes. Users were asked to provide a number from 1 to 5 indicating the position of each of their top5 talents. So, e.g., if someone’s Activator column is filled with 1, then it’s clear that this is their first theme. Although the value is a number, it’s a categorical (ordinal) feature.
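The name-based gender column mentioned above can be sketched as a simple heuristic. The function name and the short list of male names ending in "a" are assumptions added for illustration; a rule like this will always mislabel a few people, so the result is worth a spot check.

```python
# a few Polish male first names that end with "a"; illustrative, not exhaustive
MALE_EXCEPTIONS = {"kuba", "barnaba", "kosma", "bonawentura"}


def infer_gender(first_name):
    """Guess gender from a Polish first name using the trailing-"a" rule."""
    name = (first_name or "").strip().lower()
    if not name:
        return "unknown"
    if name in MALE_EXCEPTIONS:
        return "male"
    return "female" if name.endswith("a") else "male"


for name in ["Anna", "Piotr", "Kuba", ""]:
    print(repr(name), "->", infer_gender(name))
```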

For further purposes, I have also reshaped this initial dataframe with a pivot table into a long format. In such a format it makes no sense to keep rows with 0 as a score, so I kept just 5 rows per person there and named the dataframe strengths:

To anonymize the dataset I have used PersonID instead of name and surname.
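One way to get from the wide sheet (one column per theme) to the long `strengths` frame is `DataFrame.melt`. This is only an assumption about how the reshaping could be done, shown on a toy frame with three themes instead of 34. The column names `Theme` and `Rank` are made up for the example.

```python
import pandas as pd

# toy wide frame: 0 means "not in this person's top themes"
wide = pd.DataFrame({
    "PersonID": [1, 2],
    "Activator": [1, 0],
    "Learner": [2, 1],
    "Input": [0, 2],
})
theme_columns = ["Activator", "Learner", "Input"]

strengths = (
    wide.melt(id_vars="PersonID", value_vars=theme_columns,
              var_name="Theme", value_name="Rank")
        .query("Rank > 0")                      # drop themes outside the top list
        .sort_values(["PersonID", "Rank"])      # one block of ranked rows per person
        .reset_index(drop=True)
)
print(strengths)
```

With the real data each person would end up with exactly 5 rows, which makes it easy to spot incomplete answers with `strengths.groupby("PersonID").size()`.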

If you wonder what the Metropolis column is for, I added it to additionally flag the biggest Polish cities and to check later whether living in a metropolis has any relationship with specific themes.

Summing everything up, our dataset consists of just categorical data (nominal and ordinal). We can, and I definitely will, introduce continuous information through various statistics, aggregations and manipulations with pandas, but it’s worth noting for the future which numerical columns could bring value to the analysis if we added them to the Excel sheet.
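A sketch of how such a flag could be derived from the cleaned city name. The set of cities treated as metropolises and the label `Other / unknown` are assumptions for illustration, not the exact selection used for the charts.

```python
# assumed selection of big Polish cities; adjust to the definition you prefer
METROPOLISES = {"warszawa", "kraków", "wrocław", "poznań", "gdańsk", "łódź"}


def metropolis_label(city):
    """Return the city name if it is on the metropolis list, else a catch-all label."""
    key = (city or "").strip().lower()
    return key.title() if key in METROPOLISES else "Other / unknown"


for city in [" Kraków ", "Radom", None]:
    print(repr(city), "->", metropolis_label(city))
```

Missing locations and smaller towns share one label here, so that bucket mixes two different situations and should be read with care.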

General details about the population

Gender

Who are our respondents? Do we have more female or male community members? Let’s get deeper into the dataset! I know that I usually make jokes about pie charts in data science, but in this case you can definitely go for one. It’s visible at first sight that we have almost twice as many women as men here!

Gender distribution in data
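The shares behind a chart like this take only a few lines to compute. The helper below is a generic sketch, and the list of values is a toy example, not the real data.

```python
from collections import Counter


def shares(values):
    """Return each label's share in percent, most frequent first."""
    counts = Counter(values)
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.most_common()}


# toy example only
genders = ["female"] * 4 + ["male"] * 2
print(shares(genders))
```

The same helper works for any categorical column, for example industry or voivodeship.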

Industry

Now let’s look at our respondents’ occupations. Below you can see the industry groups I have introduced. It’s visible at first sight that the largest groups work within the IT or Sales and Marketing areas. There is also a big group of currently unemployed people, who are probably exploring their talents to find the best career opportunities. We need to be careful about making assumptions: we cannot say that some occupations are more common for men than for women, as around 1500 people is still a small population for drawing such insights. In the further steps of the analysis, when investigating talents, I will probably focus more on the best represented industries.

Industry distribution in data

Occupation

As a reminder, industry is a field I introduced to the dataset based on the profession column. Several months ago I spotted a great tutorial on Mirosław Mamczur’s blog (which I highly recommend to all Polish speakers interested in data science, especially beginners). In this blog post, he showed how, using several lines of code, you can present text data in a customized word cloud. It’s such a small thing, but it can really boost your data visualization skills and impress your audience!

All you need is to find a good picture on the Internet (the fewer details the better) and apply several Python libraries for this satisfying effect. As we are talking about our strengths, I selected a muscular guy and placed all professions indicated by respondents within his contour. 🙂

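from os import getcwd, path

import matplotlib.pyplot as plt
import numpy as np
from PIL import Image
from wordcloud import ImageColorGenerator, WordCloud

# `text` is assumed to be one string with all profession answers joined by spaces
# read the mask image; words are drawn only inside its non-white area
mask = np.array(Image.open(path.join(getcwd(), "person.png")))
wc = WordCloud(background_color="white", max_words=500, mask=mask, max_font_size=60, random_state=2019)
# generate word cloud
wc.generate(text)

# create coloring from image
image_colors = ImageColorGenerator(mask)
image_colors.default_color = [0, 0, 0]

# plot word cloud recolored with the colors of the mask image
plt.figure(figsize=[25, 25])
plt.imshow(wc.recolor(color_func=image_colors), interpolation="bilinear")
plt.axis("off");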
Profession Word Cloud – customized version

For comparison, I will also add a standard word cloud and the lines of code used to obtain it. Please let me know which one you would choose. 🙂

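import matplotlib.pyplot as plt
from wordcloud import WordCloud

# generate a standard rectangular word cloud (no mask)
# `text` is assumed to be one string with all profession answers joined by spaces
wc = WordCloud(background_color="white", max_words=500, max_font_size=60, random_state=2019)
wc.generate(text)
# plot word cloud
plt.figure(figsize=[10, 10])
plt.imshow(wc, interpolation="bilinear")
plt.axis("off");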
Profession Word Cloud – basic version

To explain for the English-speaking readers: when asked about their profession, some people answered with an industry name and some with a job position. We have lots of pretty chaotic answers there, very often with typos. Nevertheless, some of the most common answers are visible at first sight: we have a lot of people dealing with marketing, education, training, HR, finance, administration and more. A lot of respondents also haven’t shared their occupation (indicated by “nan”). Seeing this in the word cloud format, I am even more sure that introducing the industry aggregation was a good idea.

Location

How about location? In which Polish cities can we find the biggest numbers of CliftonStrengths enthusiasts?

Voivodeship distribution in data

The biggest group of members is located in the Masovian (mazowieckie) voivodeship, probably thanks to Warsaw. Unfortunately, a lot of people haven’t provided information about their location.

Let’s also look a little deeper and check just the biggest cities of Poland. How many respondents live in or come from a metropolis?

Percentages of respondents living in biggest Polish cities

I have selected some of the cities which can be considered metropolises in Poland. Looking at the plot above, we can easily see that almost 25% of the survey population lives in the capital city. The next biggest groups in our dataset are Kraków with 12% and Wrocław with 10%, and then the values drop further. 36% of respondents either do not live in a Polish city considered a metropolis or haven’t revealed their location in the survey.

Themes Distribution

Let’s start by looking at all respondents together and at how their themes are distributed.

CliftonStrengths domains distribution across respondents

From the visualizations above we can see that for Dominik’s group members who have shared their assessment results, talents from the strategic and relationship domains are the most frequent. From my own experience as a person with those two domains leading, I can say that our dataset is very likely to be a little biased. People with themes like learner or input (strategic) will probably more often join communities like this to collect more information. Similarly, people with relationship talents like connectedness or individualization could join to find more diversity and connections. I wonder what you think about this?

If we go deeper, we can see that Individualization is the most common theme overall and SelfAssurance is the least common. Does it mean that we Poles believe in the beauty of human diversity but not in ourselves? 🙂 Let me return to the assumption from the previous paragraph. The majority of respondents have talents which might be the reason for joining the community and seeking interaction: Individualization, Learner, Input, Relator.
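Counting domains needs a theme-to-domain lookup. The sketch below writes out Gallup's four domains and counts them for a list of themes. The variable and function names are assumptions, and the spelling of theme names (for example "Self-Assurance") may need adjusting to match the column names in the sheet.

```python
from collections import Counter

DOMAINS = {
    "Executing": ["Achiever", "Arranger", "Belief", "Consistency", "Deliberative",
                  "Discipline", "Focus", "Responsibility", "Restorative"],
    "Influencing": ["Activator", "Command", "Communication", "Competition",
                    "Maximizer", "Self-Assurance", "Significance", "Woo"],
    "Relationship Building": ["Adaptability", "Connectedness", "Developer", "Empathy",
                              "Harmony", "Includer", "Individualization", "Positivity",
                              "Relator"],
    "Strategic Thinking": ["Analytical", "Context", "Futuristic", "Ideation", "Input",
                           "Intellection", "Learner", "Strategic"],
}
# invert the mapping so each theme points to its domain
THEME_TO_DOMAIN = {theme: domain for domain, themes in DOMAINS.items() for theme in themes}


def domain_counts(themes):
    """Count how many of the given themes fall into each domain."""
    return Counter(THEME_TO_DOMAIN[theme] for theme in themes)


# the top 5 described earlier in this post
my_top5 = ["Individualization", "Input", "Learner", "Intellection", "Connectedness"]
print(domain_counts(my_top5))
```

Applied to every row of the long `strengths` table instead of a single top 5, the same counter gives the domain distribution for the whole group.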

Comparison with Gallup’s Report

To make this analysis more meaty, I will add statistics from Gallup’s report for our country. If you are interested in theme frequencies for your own country, you can get familiar with Gallup’s report available here. It would be so interesting to see how various countries correlate with each other and which clusters they form. I have never come across such an analysis. Maybe any of you, my dear readers, have something to share on this topic? If not, maybe it would be good to think about such an analysis in the future, if the data becomes accessible in a more editable format.

But getting back to our country, let’s see how talents are distributed in our small population compared with Gallup’s general population for Poland:

Screenshot from Gallup’s report for Poland

It’s really following the pattern! Although the order is not exactly the same, I will risk saying that the themes sit in more or less the same areas of the chart. Self-Assurance is last in both, confirming my assumption. Themes such as achiever or responsibility rank much higher in the general statistics than in our population, but let’s be honest: approx. 1400-1500 respondents is far fewer than 61800. Probably, if we collected more responses, the themes would shuffle more. In general, we can say that our group members follow the Polish pattern.
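"More or less the same areas of the chart" can be turned into a number with Spearman's rank correlation. The sketch below is a plain-Python version for rankings without ties. The theme labels and ranks are placeholders made up for the example, not values from either chart.

```python
def spearman_rho(ranks_a, ranks_b):
    """Spearman rank correlation for two tie-free rankings of the same items."""
    if set(ranks_a) != set(ranks_b):
        raise ValueError("both rankings must cover the same themes")
    n = len(ranks_a)
    if n < 2:
        raise ValueError("need at least two themes to compare rankings")
    # classic formula: 1 - 6 * sum(d^2) / (n * (n^2 - 1))
    squared_diffs = sum((ranks_a[t] - ranks_b[t]) ** 2 for t in ranks_a)
    return 1 - 6 * squared_diffs / (n * (n ** 2 - 1))


# placeholder rankings (1 = most frequent); NOT real values from either source
community = {"theme_a": 1, "theme_b": 2, "theme_c": 3, "theme_d": 4, "theme_e": 5}
reference = {"theme_a": 2, "theme_b": 1, "theme_c": 3, "theme_d": 4, "theme_e": 5}
print(spearman_rho(community, reference))
```

A value close to 1 means the two orderings largely agree, and a value close to -1 means they are reversed. With all 34 themes typed in from both charts this would give a single figure to quote.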

Themes distribution by gender

Are you also curious about the differences between men and women in our dataset? Although we have imbalanced data here, with almost twice as many female respondents as male, let’s compare the plots:

Themes distribution for men
Themes distribution for women

Some of the differences we can observe based on our population are:

  • Men seem to be more analytical than women
  • Focus is also higher in the hierarchy for men
  • Both genders have relator in their top5 themes, but for women it goes hand in hand with empathy, whereas for men empathy is much lower in the distribution
  • Although individualization and learner are very common for both genders, for women individualization wins and for men it’s the other way round
  • I am also surprised that the Woo theme is in exactly the same position. It seems that Woo doesn’t depend on gender: both men and women have this talent with similar frequency.

Wrapping up, I am a little disappointed by this comparison. I hoped for more interesting insights, while the majority of themes are in exactly the same positions. Unfortunately, our population seems to be too small for such a deep analysis. It also justifies the number of cases included in Gallup’s report.

Speaking of the Gallup report, it also includes statistics for men and women. Let’s compare how genders vary from each other in the larger population (please note that it’s not a split for Poland but for all respondents worldwide):
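Whether a gap between men and women for a single theme is more than noise can be checked with a two-proportion z-test. The sketch below uses only the standard library. The counts in the example are invented, and the test is my addition, not part of the original analysis.

```python
from math import erf, sqrt


def two_proportion_z_test(hits_a, n_a, hits_b, n_b):
    """Two-sided z-test for the difference between two proportions."""
    p_a, p_b = hits_a / n_a, hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # nobody or everybody has the theme: no difference to test
        return 0.0, 1.0
    z = (p_a - p_b) / se
    # standard normal CDF expressed with the error function
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return z, 2 * (1 - cdf)


# invented counts: people with a given theme in their top 5, per group
z, p_value = two_proportion_z_test(30, 100, 10, 100)
print(round(z, 2), round(p_value, 4))
```

Two caveats apply. Running the test for all 34 themes means some "significant" results will appear by chance, and themes within one person's top 5 are not independent of each other.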

Theme distribution for male in Gallup reports

For the whole population worldwide, individualization is so much lower for men! For Polish male respondents in our little population, we also do not see the achiever theme that high. Male analytical thinking is also confirmed here. There are a lot of talents in similar positions, like relator, harmony or self-assurance, but there are also a lot of surprises for me! Thinking stereotypically, I would have expected command or significance to be much higher in the hierarchy. 🙂

Theme distribution for female in Gallup reports

Have you noticed that both genders have achiever in first place? I am proud that it’s not a male thing. 🙂 As in our population, relator and empathy are close to each other here as well. Compared to men, women also tend to be more disciplined. We also more often have the developer theme high on the list. It seems that positivity doesn’t depend on gender, but women more often show the “woo” theme.

I need to admit that I am also surprised that individualization is outside the top5 themes. It seems to be much more popular among our group members. However, I wonder again whether it’s just a bias? As you will notice later, there are a lot of people dealing with personal development and coaching in our population. It seems natural that such people, who are characterized by high individualization and learner or input, will more often join closed communities to develop themselves. The same goes for achiever’s first place in the Gallup report. Isn’t it the case that people who want to achieve more in life take such assessments, while the less ambitious ones have no idea about such opportunities? Please share your thoughts in the comments.

Industries overview

Is it just me or is it really interesting how themes are distributed among industries? People who decide to take the CliftonStrengths assessment and then understand their talent DNA often do this to find the right career path for themselves. There are plenty of Facebook group members wishing to change their current job for something that fits their profile better. It’s a great step if you feel burned out in your current position. According to our data, there are also students who haven’t begun their career path yet but want to use Gallup’s assessment to understand in which area they could not only succeed but also feel professionally fulfilled.

I am so happy that the times we live in are so different from those of our parents. In the past, a job was treated as something you do 40 hours a week or more. It didn’t matter if you enjoyed it or not, as long as it gave you money. In our generation, our professions have become an important part of our lives and we do not want to spend such a large part of our day doing stuff we don’t consider interesting. The definition of work is also different now. More and more often people become freelancers or run their own business instead of working for someone. We have more and more possibilities to create products online.

So many new opportunities mean more choice, and to choose well, you first need to know yourself and your strengths. Let’s see what the most popular themes are for some of the best represented industries in our population.

IT

IT is our best represented industry. We have 256 respondents here (162 men and 94 women). Let’s see what the differences are between men and women working in this field.

TOP5 talent distribution in IT for men
TOP5 talent distribution in IT for women
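Charts like these can be built from the long `strengths` table with a pandas `groupby`. The sketch below assumes columns named `PersonID`, `Gender`, `Industry` and `Theme` and uses a tiny made-up frame. It reports shares within each gender rather than raw counts, because the groups differ in size.

```python
import pandas as pd

# tiny made-up frame in the long format (one row per person and theme)
strengths = pd.DataFrame({
    "PersonID": [1, 1, 2, 2, 3, 3],
    "Gender": ["female", "female", "female", "female", "male", "male"],
    "Industry": ["IT"] * 6,
    "Theme": ["Learner", "Input", "Learner", "Empathy", "Learner", "Analytical"],
})


def top_themes(df, industry, n=5):
    """Most frequent themes per gender in one industry, as a share of people."""
    subset = df[df["Industry"] == industry]
    people_per_gender = subset.groupby("Gender")["PersonID"].nunique()
    counts = subset.groupby(["Gender", "Theme"]).size().reset_index(name="People")
    # share of people of that gender who have the theme in their top list
    counts["Share"] = counts["People"] / counts["Gender"].map(people_per_gender)
    ordered = counts.sort_values(["Gender", "Share", "Theme"],
                                 ascending=[True, False, True])
    return ordered.groupby("Gender").head(n).reset_index(drop=True)


print(top_themes(strengths, "IT", n=2))
```

Ties in share are broken alphabetically here, which is an arbitrary choice made only to keep the output stable.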

It’s worth mentioning how high in the distribution you can observe learner and input for this group. Having these two themes in my own TOP5 and working in IT at the same time, I can definitely confirm that they really help. In such a quickly evolving industry, where you need to stay up to date with all technologies and skills, you need to be hungry for knowledge. Women tend to have Individualization as a frequent theme within their top5, and I wonder how they find it useful. For me, with this theme in first place, it really pays off when I run trainings for the Citizen Data Science community. Unfortunately, I am afraid that such a small population may not be enough to draw more serious conclusions.

Sales and Marketing

Let’s go with the second most represented industry now. For Sales & Marketing we have 174 respondents (116 female and 58 male).

TOP5 talent distribution in Sales & Marketing for men
TOP5 talent distribution in Sales & Marketing for women

Individualization is definitely a good talent for this area. To sell a product or attract potential clients, you need to think like your customer. Men also use ideation to achieve better results. I also wonder how empathy and relator are used by both genders. What surprises me is how rare the Woo theme is in this group. It would have been my first guess for Sales & Marketing!

Finance and Accounting

For Finance & Accounting we have 82 respondents (63 female and 19 male). With such poor representation of men we definitely cannot make bold assumptions, but let’s look at the results:

Looking more closely at women, who are the better represented group, we can immediately spot how high in the hierarchy Responsibility is here. The Responsibility theme forces you to take psychological ownership of anything you commit to, and whether it is large or small, you feel emotionally bound to follow it through to completion. When you deal with finance or accounting, it’s definitely a useful theme! The same goes for Deliberative (being careful, vigilant): it’s good to leave our money in such hands. People working in this industry are less focused on the future and developing new ideas (as was the case for IT) than on retrospective and historical analysis.

Education and Science

For Education & Science I have aggregated people doing scientific research or PhDs and those responsible for various kinds of training and education. We have 77 such respondents, mostly women (66).

Again I will look deeper into the chart for women. The relationship and strategic domains are very strong here. It’s good to see people dealing with education showing strong empathy and relator talents. This group is more focused on learning, knowledge sharing and developing others than on competition or leadership. Futuristic and Ideation are also pretty high in the hierarchy.

Architecture and Construction

Now Architecture & Construction, the area for all architects, designers and building constructors. Although it seems to be more of a male field, it’s dominated by female architects in our dataset. We have 62 respondents and 43 of them are women. Let’s see what characterises them.

Both genders are great learners. It’s also worth pointing out how high Restorative (a love of solving problems) is in the hierarchy. Men are more analytical and focused there, while women are more creative (ideation, futuristic). Probably that’s the split between constructors and architects/designers, but it’s just my assumption.

Project Management

We also have a well represented group of people interested in project management, so all people dealing with quality, productivity and effectiveness. We have Scrum Masters, Project Managers and more Agile and Lean roles here: 58 respondents in total, of whom 32 are women.

I really like how Significance (the desire to be recognized) plays an important role for men! It means that male representatives of Project Management tend to enjoy working on things which are really important for the whole organization and they feel they stand out in their roles. Individualization and Relator are crucial themes for both genders here. Men represent the Achiever and Activator talents better here, which would be my immediate guess for this industry. For women it’s more relationship themes than strategic ones. To be honest, I would also expect Maximizer higher in the structure, not only for women but for men too, but maybe it’s because our gender is better represented in the data.

HR

Time for HR! To be honest, I thought that this industry would be one of the best represented in the dataset, but there are just 56 respondents in this area. For me, people seeking talent should be the ones most interested in CliftonStrengths, but… it seems that other areas use the assessment more actively. 🙂 It may also be just our population, let’s not forget about this.

There are only 7 men in this industry, so it doesn’t make much sense to analyze such poorly represented data. For women, the most frequent talents are quite easy to guess. I see that my talents would fit in perfectly here, as I have 4 out of the top5. Women in HR are very rich in relationship talents. Until now, we haven’t had any industry with such a high level of Woo. Significance, Command, Competition and other Influencing themes are definitely not crucial in a recruiter role. I really regret that we have such poor representation of men here, because it would be so interesting to see whether such a concentration of relationship themes is more of a women’s thing or an HR characteristic in general.
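To see why 7 people are too few, it helps to look at the uncertainty around any share computed from them. The sketch below calculates a Wilson score interval with the standard library. The "3 out of 7" figure is hypothetical and only the group size comes from the data.

```python
from math import sqrt


def wilson_interval(hits, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 is roughly 95% confidence)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = hits / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half_width, centre + half_width


# hypothetical: 3 of the 7 men have a given theme in their top 5
low, high = wilson_interval(3, 7)
print(f"{low:.2f} to {high:.2f}")
```

An interval this wide covers both "rare" and "very common", so a theme ranking built on such a group says very little.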

Psychology and Coaching

And now another area which I expected to be better represented: Psychology & Coaching. Reading group members’ posts, I had a feeling that we have a big group of coaches in the community. Maybe they simply didn’t share their occupation? Nevertheless, we have 51 respondents in this area, of whom only 10 are men.

Both genders have Relator in first place, which is quite understandable. I would have expected Individualization at the top, but it is also high in the hierarchy. Developer and Empathy play an important role in this area. Communication has never ranked this high until now. I am quite surprised that Positivity is not higher, but in general, seeing this chart without the title, I would easily guess which industry we are talking about.

Medical Services

Another group I wanted to discuss is Medical Services. It covers a variety of doctors from different specializations, but dietitians play a really important role here. Once again we’ll look mostly at women, as they make up 37 of the 43 respondents from this industry.

I definitely understand why the top 5 talents are where they are. You need to enjoy learning, attending conferences and courses, and reading a lot of scientific books and publications, but you also need many relationship-domain themes to understand your patients well. You also need to be responsible for your actions. To be honest, I would have expected Empathy to be more significant than Relator. For those of you more into StrengthsFinder education: what are your thoughts about this?

Engineering & Electronics

I have combined various engineers with people dealing with electronics, as both areas are very technical. We have 39 respondents in total, of whom the majority (25 people) are men.

Both genders have a strong Ideation representation within their talent DNA. Focus, Analytical and Futuristic would also be my guesses, and they do play an important role. Unfortunately, for this industry more data would be needed to draw more reliable insights.

Own Business

Last but not least, I have selected people running their own businesses. It’s a total mix of all industries under the same umbrella: these people are not working for any employer, they are their own bosses. We have at least 39 such people in the Excel sheet (20 men, 19 women). Let’s see if with such a representation we are able to spot any interesting insights.

For men, Activator plays a crucial role, while for women it’s more about relationships. Is it the factor which triggers the decision to start one’s own business? For both genders, Ideation and Futuristic are quite significant. Significance is also very high. What’s also very interesting is that this is the first category in which Command ranks so high. All of that just confirms that the CliftonStrengths assessment and exploring your talent DNA is a wonderful tool for finding out in which industries you could succeed.

Themes per location

I’ve also decided to investigate how talents are distributed across all voivodeships in Poland, but to be honest I haven’t found anything significant. For the majority of areas, the top talents are similar, with talents such as Individualization, Input, Learner or Relator in the lead. Nevertheless, I will leave the plots below so that anyone interested can analyze the differences on their own. Please pay attention to the y axis, where you’ll find the number of responses for each voivodeship.

For more technical readers, I am also leaving the code for the loop I’ve created to visualize all voivodeships at once. I hope you understand that I won’t be able to share all the code with you this time due to GDPR, but in case you’re interested in any specific part of the analysis, please feel free to leave your question in the comments or reach out via the contact page or my Instagram accounts (@datascientistdiaryPL and @datascientistdiary).

Remark: For English readers, the English names of Polish voivodeships are given in the figure captions.

import matplotlib.pyplot as plt

# strengths is the DataFrame with the survey data, loaded earlier
def plot_voivodeships(df = strengths):
    # iterate over each voivodeship once (not over every row of the column)
    for voivodeship in df['Voivodeship'].dropna().unique():
        # count how many times each talent occurs in this voivodeship
        data = df[df['Voivodeship'] == voivodeship].groupby('Talent').size().reset_index(name='n')
        data = data.sort_values('n', ascending=False)

        # Figure size
        plt.figure(figsize=(13, 9))

        # Data for the plot: talents on the x axis, their counts on the y axis
        x = data['Talent']
        y = data['n']

        # creating the bar plot
        plt.bar(x, y, color='slategrey', width=0.6)

        plt.xticks(rotation=90)

        plt.xlabel("Themes", fontsize=16)
        plt.ylabel("Number of occurrences across respondents from {} voivodeship".format(voivodeship), fontsize=12)
        plt.title("CliftonStrengths themes distribution for {} voivodeship".format(voivodeship), fontsize=24)
        plt.show()
CliftonStrengths distribution for Lower Silesia voivodeship
CliftonStrengths distribution for Kuyavia-Pomerania voivodeship
CliftonStrengths distribution for Lublin voivodeship
CliftonStrengths distribution for Lubusz voivodeship
CliftonStrengths distribution for Lodz voivodeship
CliftonStrengths distribution for Lesser Poland voivodeship
CliftonStrengths distribution for Masovian voivodeship
CliftonStrengths distribution for Opolskie voivodeship
CliftonStrengths distribution for Subcarpathia voivodeship
CliftonStrengths distribution for Podlaskie voivodeship
CliftonStrengths distribution for Pomeranian voivodeship
CliftonStrengths distribution for Silesian voivodeship
CliftonStrengths distribution for Świętokrzyskie voivodeship
CliftonStrengths distribution for Warmia-Masuria voivodeship
CliftonStrengths distribution for Greater Poland voivodeship
CliftonStrengths distribution for West Pomerania voivodeship
CliftonStrengths distribution for people living abroad
CliftonStrengths distribution for unknown location

As you can easily see, some of the voivodeships are not represented well enough to draw any conclusions, e.g. świętokrzyskie. I would definitely recommend that Dominik encourage community members to fill in such details and take the assessment. There are still a lot of respondents who haven’t shared their location (last chart). For sure we also have lots of community members who, despite taking the assessment, haven’t added their results to the file. Maybe in the future, with more data in the dataset, it will be possible to take those insights to another level and use them in some kind of model. I would definitely also analyze the differences between Polish people living in our country and the ones who decided to emigrate.
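
A quick way to see which groups are too small before reading anything into their charts is to count the rows per category. The snippet below is only a sketch: the helper name `underrepresented` and the threshold of 50 rows are my own assumptions, and the demo frame is made up because the real data cannot be shared.

```python
import pandas as pd

def underrepresented(df, column="Voivodeship", min_rows=50):
    """Return categories of `column` with fewer than `min_rows` rows, smallest first."""
    # dropna=False keeps respondents with a missing location as their own group
    counts = df[column].value_counts(dropna=False)
    return counts[counts < min_rows].sort_values()

# Made-up example data (hypothetical, not the community file)
demo = pd.DataFrame(
    {"Voivodeship": ["mazowieckie"] * 60 + ["świętokrzyskie"] * 5 + [None] * 10}
)
small = underrepresented(demo)
print(small)
```

The same helper can be pointed at the Industry column to list professions that are too thinly represented.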

Relationships in data

What would an exploratory data analysis be without hypothesis testing? While cleansing the data and collecting my first questions, I was curious whether the few variables we have describing the respondents depend on each other.

To check the relationships between variables I have decided to use the chi-squared test, as all of them are categorical. For those less familiar with statistics, you can find the definition below:

Chi-Square test is a statistical test which is used to find out the difference between the observed and the expected data we can also use this test to find the correlation between categorical variables in our data. The purpose of this test is to determine if the difference between 2 categorical variables is due to chance, or if it is due to a relationship between them.

Source: AnalyticsVidhya
The formula of Chi-Squared test

For the test to be effective, at least five observations are required in each cell of the contingency table.
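
To make the formula less abstract, here is a small sketch which computes the statistic by hand as the sum over all cells of (observed − expected)² / expected, and then compares it with SciPy. The table is made up for illustration. Note that the "at least five" rule of thumb is commonly checked on the expected counts, which is what I assume here.

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency

# Made-up 2x3 contingency table (e.g. two genders in rows, three industries in columns)
observed = np.array([[30, 10, 20],
                     [20, 25, 15]])

row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
n = observed.sum()

# Expected count of each cell under independence: row total * column total / n
expected = row_totals * col_totals / n

# Chi-squared statistic: sum over all cells of (observed - expected)^2 / expected
statistic = ((observed - expected) ** 2 / expected).sum()
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)
p_value = chi2.sf(statistic, dof)

# Rule of thumb: every cell should have an expected count of at least five
enough_data = bool((expected >= 5).all())

# Cross-check against SciPy (no continuity correction is applied when dof > 1)
scipy_stat, scipy_p, scipy_dof, scipy_expected = chi2_contingency(observed)
print(statistic, p_value, dof, enough_data)
```

If `enough_data` turns out to be False for a real table, merging rare categories before testing is one common option.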

While conducting the chi-square test we initially have to consider two hypotheses, i.e. the null hypothesis and the alternate hypothesis.

  1. H0 (null hypothesis) – the 2 variables to be compared are independent.
  2. H1 (alternate hypothesis) – the 2 variables are dependent.

Now, if the p-value obtained after conducting the test is less than 0.05, we reject the null hypothesis in favour of the alternate hypothesis; if the p-value is greater than 0.05, we fail to reject the null hypothesis (which is not the same as proving that the variables are independent).

Now I would like to move to the practical implementation in Python. In case you’d like to get more details about contingency tables and other relevant theory, I would highly recommend this resource.

Example piece of Python code for Chi-Square test:

import pandas as pd
from scipy.stats import chi2_contingency

# Cross tabulation between GENDER and INDUSTRY
# (strengths is the DataFrame with the survey data)
Crosstab_Gender_Industry = pd.crosstab(index=strengths['Gender'], columns=strengths['Industry'])

# Performing Chi-sq test; the result holds (statistic, p-value, degrees of freedom, expected counts)
ChiSqResult_Gender_Industry = chi2_contingency(Crosstab_Gender_Industry)

# The p-value is the probability of seeing a table at least this extreme if H0 (independence) were true
# If p-value < 0.05 we reject H0 and treat the variables as dependent

print('The P-Value of the ChiSq Test between Gender and Industry is:', ChiSqResult_Gender_Industry[1])
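
Since the same test is repeated for several variables in the sections that follow, it can be wrapped in a small helper. This is only a sketch under my own assumptions: the function name `chi2_against` is hypothetical, the column name "Metropolis" is a guess, and the demo frame is made up because the real data is not public.

```python
import pandas as pd
from scipy.stats import chi2_contingency

def chi2_against(df, target, others, alpha=0.05):
    """Run a chi-squared independence test of `target` against each column in `others`."""
    rows = []
    for column in others:
        table = pd.crosstab(index=df[column], columns=df[target])
        stat, p_value, dof, expected = chi2_contingency(table)
        rows.append({
            "variable": column,
            "chi2": stat,
            "p_value": p_value,
            "dof": dof,
            "min_expected": expected.min(),   # handy for the "at least five" check
            "reject_h0": p_value < alpha,
        })
    return pd.DataFrame(rows)

# Tiny made-up frame, only to show the call
demo = pd.DataFrame({
    "Gender":     ["F", "F", "M", "M", "F", "M", "F", "M"] * 10,
    "Industry":   ["HR", "HR", "IT", "IT", "HR", "IT", "IT", "HR"] * 10,
    "Metropolis": ["yes", "yes", "yes", "yes", "no", "no", "no", "no"] * 10,
})
result = chi2_against(demo, target="Industry", others=["Gender", "Metropolis"])
print(result)
```

In this toy data Gender is related to Industry while Metropolis is not, so only the first row ends up with `reject_h0` set to True.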

Industry vs Gender

Is the Industry in which the respondents work dependent on the Gender? Let’s see.

Chi-Squared test Industry vs Gender

The p-value is smaller than 0.05, thus we reject the null hypothesis. The Gender and Industry variables are dependent.

Industry vs Voivodeship

Is the industry in which the respondents work dependent on the voivodeship?

Chi-Squared test Industry vs Voivodeship

According to what I’ve read on various forums, if statistical software renders a p-value of 0.000, it means that the value is very low, with many zeros before any other digit. Thus we can reject the null hypothesis here as well. Industry and voivodeship seem to be dependent.

Remark: In case you have other information about a 0.0 result for the p-value, please reach out to me. I honestly admit that, despite doing some research, I am not 100% sure about this, and if you have more experience, I am open to advice.
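
One way to check this is to print the p-value in scientific notation instead of rounding it. The sketch below uses a made-up 2x2 table with a strong association. A three-decimal format shows 0.000, while the scientific format reveals a tiny but non-zero number. For even more extreme tables the value can become smaller than what a float can represent and then it really is stored as 0.0, which still means "extremely small", not "impossible".

```python
import numpy as np
from scipy.stats import chi2_contingency

# Made-up 2x2 table with a strong association between the two variables
observed = np.array([[60, 20],
                     [20, 60]])

stat, p_value, dof, expected = chi2_contingency(observed)

rounded = f"{p_value:.3f}"      # what a three-decimal report would show
scientific = f"{p_value:.2e}"   # the same number in scientific notation

print("rounded:   ", rounded)
print("scientific:", scientific)
```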

Industry vs Metropolis

Is the industry in which the respondents work dependent on living in a metropolis?

Chi-Squared test Industry vs Metropolis

Again, the p-value is smaller than 0.05. Metropolis (whether the respondent lives in a big city, e.g. Gdynia, Gdańsk, Warszawa) and Industry are dependent.

Industry vs Talent

Is having a specific talent in the top 5 related to working in a specific industry?

Chi-Squared test Industry vs Talent

This one I was more curious about. 0.016 is still smaller than 0.05, thus we can say that the Industry and Talent variables also depend on each other.

Next Steps

Woah, that was definitely the longest blog post here! Has anybody out there managed to go through all of it? I really hope so, because at least for me, working with CliftonStrengths data was really pleasant. Now it’s high time to collect final reflections and ideas for further steps.

  • More data should be collected and new columns could be added. I think you’ll agree with me that we would have reached more conclusions with a bigger population of respondents. Although 1500 responses seem to be a lot, once split into voivodeships and industries the representation is really poor. We can draw some interesting conclusions by limiting the data to several locations and professions, but in general many groups are poorly represented for now. Compared with Gallup reports, our dataset is really small. However, I think it’s a great starting point. I will present my analysis to Dominik and his community members and I hope that they’ll be eager for some brainstorming. With so many perspectives and so many people talented in various domains, we definitely have the potential to improve the way the data is collected, inspire more members to reveal their assessment results and think about how to benefit from such a resource in the future.
  • Occupation recommendation system idea. The first thing that comes to my mind would be creating some kind of industry recommendation system. There are so many people thinking about switching to another role and hoping that the CliftonStrengths assessment will give them the answer. Talents appear to be related to professions, as the chi-squared test between Industry and Talent suggests, so if we collected more data, maybe it would be worth trying to create such a solution?
  • Talent pairings. I was also thinking about talent pairings. According to Dominik’s podcasts and CliftonStrengths materials, some talents benefit from each other when combined. We can look for a talent partner who has the theme we are missing and can support us in development or decision making. There are so many ideas coming to my mind and I hope that I am not the only one. For sure there are many resources I am not familiar with, which we could add to our Excel sheet to carry the analysis further.
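
To give the recommendation idea a more concrete shape, here is a very naive sketch, not a finished solution. It assumes a long table with one row per respondent and top-5 talent and with the columns Industry and Talent. The function name `rank_industries` and the demo rows are made up for illustration.

```python
import pandas as pd

def rank_industries(df, my_talents):
    """Rank industries by how large a share of their talent rows match `my_talents`."""
    # Share of each talent within every industry (each row of the table sums to 1)
    shares = pd.crosstab(df["Industry"], df["Talent"], normalize="index")
    # Ignore talents that do not occur in the data at all
    known = [t for t in my_talents if t in shares.columns]
    # Add up the shares of the matching talents per industry, best match first
    return shares[known].sum(axis=1).sort_values(ascending=False)

# Made-up rows, only to show the call
demo = pd.DataFrame({
    "Industry": ["HR", "HR", "HR", "IT", "IT", "IT"],
    "Talent":   ["Woo", "Empathy", "Relator", "Analytical", "Learner", "Relator"],
})
ranking = rank_industries(demo, ["Woo", "Empathy", "Learner"])
print(ranking)
```

With so few respondents in many groups, such a ranking would be more of a conversation starter than a real recommendation, which is one more argument for collecting more data first.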

All the filters and aggregations I have used were chosen by myself. In case you’d like to know any details, please reach out to me. I am aware that there might be better ways of aggregating the input and transforming the columns, but as mentioned many times, it’s just a draft. Anyway, I am really curious about your reflections, ideas and tips on what I have done so far and what we can do in the next steps. 🙂 Or maybe it’s not worth it at all? I’ll leave the decision to you.

Summary

If you feel that this post has given you little practical information regarding technical skills, I totally agree. That wasn’t its main purpose. My main goal for today is to encourage you to do the same. Not to take the CliftonStrengths assessment (although I still consider it a brilliant idea), but to take one area that you are passionate about and do your own analysis. You’ll never learn coding or the data science mindset just by looking at someone else’s work. Working on datasets like Titanic or random ones from Kaggle may be a good idea at the beginning, while you are still a beginner. Later, you need to move outside your comfort zone and search for data on your own. Only that way will you expose yourself to the various scenarios that also happen during your career path.

If you are interested in sports, scrape data from your favourite websites and create a beautiful dashboard. Shopping addict? Scrape data from online shops with the items you’re interested in and analyze all the offers. Learn how to make decisions based on collected data: choose a holiday destination, new shoes or a laptop based on scraped reviews. If you are a bookworm, scrape book data and create your own recommendation system! You can then share it with your friends so that they can find their next books to read according to their reading history or book ratings. Or go totally crazy and create something hilarious like a funny user name generator or a chatbot.

Data science and all the disciplines around it are so beautiful because the only thing that limits you is your imagination (… and the lack of data sometimes!). Even if you are not a data scientist yet, take it as an advantage and use your time to work on something you are really passionate about. Then, in the future, when you take part in recruitment for your dream job, you will be seen as a true enthusiast who, despite the lack of experience, got out of the theory comfort zone and is ready to work with a valuable mindset.
