The development of artificial intelligence, e. g., for computer vision, through supervised learning requires large amounts of annotated or labeled data objects as training data. High-quality training data is usually created manually, which can be repetitive and tiring. Gamification, the use of game elements in a non-game context, is one method to make such tedious tasks more interesting. We propose a multi-step process for gamifying the manual creation of training data for machine learning purposes. In this article, we give an overview of related concepts and existing implementations and present a user-centered approach for a real-life use case. Based on a survey within the target user group, we identified annotation use cases and dominant player characteristics. The results served as a foundation for designing the gamification concepts, which were then discussed with the participants. The final concept includes levels of increasing difficulty, tutorials, progress indicators, and a narrative built around a robot character that also acts as a user assistant. The implemented prototype is an extension of an existing annotation tool at an AI product company and serves as a basis for further observations.
Artificial Intelligence (AI) is becoming increasingly important. For the development of AI models and applications, human intelligence is still necessary, especially regarding supervised learning which entails that a machine is trained with labeled data. The training process mimics a human learning process, deriving patterns and creating a model. The creation of necessary labels is usually performed with the aid of humans. Due to the necessary amount of training data, the creation process is typically highly repetitive and quickly becomes a rather unexciting, demotivating task for the annotator.
A task that is repetitive and tedious is an ideal use case for applying gamification. Gamification itself is defined as the use of game elements in a non-game context, aiming for certain psychological outcomes such as motivation, enjoyment, and flow. Previous research shows that a gamified environment for data annotation has the potential to increase user engagement and gratification. Improved user experience is a goal of gamification, as are increased participation, the attraction of a younger audience, optimization of workflows and increased engagement of users, as well as immediate feedback for the users on their performance. While most of the research is focused on the effects among younger adults, especially in educational contexts, some work also exists for other demographic groups. Koivisto and Malik published a systematic literature review regarding the impact of gamification for older adults, e. g., in the health domain. Aydin studied the differences in motivational factors and use intentions in relation to gender and age, concluding that “use intentions were not significantly different between younger and older users likewise between genders; however, the factors that lead to adoption of gamified systems differed between them.”
Gamification of company workplaces has recently gained in importance – not only for training but also to encourage employees in their daily work routine. A tool with well-designed game elements at the workplace can keep employees motivated to perform their tasks. Herranz et al. reported on a gamification platform aimed at increasing motivation in software projects, showing “remarkable success”. A recently published literature review by Khakpour and Colomo-Palacios, investigating current research at the intersection of machine learning and gamification, emphasizes the benefits of using game elements for data collection for supervised learning.
This article presents the results of our work aiming at gamifying an existing annotation tool for the creation of training data at the AI product company AI4BD. We describe our multi-step development process, thereby laying the foundation for future user studies to investigate the effect of the implemented game elements. Based on our experiences and findings, the main contribution of our work is a recommendation on how to proceed when designing and integrating game elements into a pre-existing productive system. In the next section, we start with a review of related work regarding foundations of gamification, examples of the gamification of annotation and labeling processes, and the potential risks of adopting game elements. Subsequently, we describe the requirement analysis process, which includes a survey among the company’s employees aimed at identifying their “player characteristics” and the definition of basic gamification concepts suitable for our scenario. Finally, we explain the chosen game elements and their realization.
2 Related Work
The topic of gamification opens up a vast space of related literature and concepts. Before starting to include game elements in existing applications or tools, it is strongly recommended to have a basic understanding of motivation psychology and motivational factors and how they affect a user. This knowledge is important for the selection of appropriate game mechanics and elements as well as for the assessment of existing implementations of gamification since the application of game elements can involve some risks if not used deliberately. The following sections will cover these aspects which contain important insights needed for our gamification process.
2.1 Psychological Background
In 2008, Thaler and Sunstein established the term nudging, which describes the manipulation of people into making decisions or behaving in a certain way by adapting the choice options they have or the context of these options. The authors state that “the false assumption is that almost all people, almost all of the time, make choices that are in their best interest”. A frequently mentioned example is the placement of fruit in a school cafeteria at eye level and of candy at a place that is more difficult to reach. With this measure, the “cost” of choosing candy is higher than that of choosing fruit, which therefore becomes the preferable option. While nudging is more general, gamification can be regarded as a modern form of nudging that tries to influence people’s behavior by actively manipulating their motivation, specifically through the use of game elements. In a study, van Roy et al. analyzed the motivations of 83 students who were using gamified online learning platforms and derived “learning, curiosity, fun, need for closure, and competence” as the main motivators. The study concludes that, although gamification does not serve as the main motivation to start using the platform, it can help to keep users attached to the platform once they have opened it. In a work environment, however, the initial use of a work-related tool is motivated by the obligation to perform the work task itself. Therefore, we focus on how to sustain the users’ attraction to the tool.
|Game Dynamic|Game Element|Motivator|
|Cooperation|Teams|Relatedness (Social Connections)|
|Transaction|Gifting|Purpose|
|Story/Narrative|Story, Badge, Achievement|Purpose|
|Exploration, Surprise|Unlockable|Autonomy|
|Reward|Badge, Achievement|Extrinsic motivation (Reward)|
|Progression|Levels, Progress Bar, Points|Competence|
|Resource Acquisition|Achievement|Competence|
|Boundaries|Limited Resources|Competence, extrinsic motivation (Avoidance)|
|Competition|Leaderboard, Points|Extrinsic motivation (Peer Pressure)|
|Status|Leaderboard, Levels, Achievement|Relatedness (Social Status), Competence|
This motivational aspect of gamification is closely linked to behavioral and psychological models, one of them being Self-Determination Theory (SDT), which was created by Ryan and Deci and linked to gamification by Andrade et al. SDT uses empirical methods to find a correlation between a person’s “innate psychological needs” and their self-motivation. Depending on how well a certain action meets these needs, a person can show three different types of motivation towards it: first, self-determined, intrinsic motivation, which derives entirely from a person’s inherent wish to execute an action; secondly, amotivation (or simply: not being motivated); and lastly, non-self-determined extrinsic motivation, where external factors trigger an action. As one result of the findings, three fundamental human needs are named: relatedness, autonomy, and competence. Marczewski takes up this theory within his RAMP model, extending the fundamental intrinsic motivators to Relatedness, Autonomy, Mastery, and Purpose (RAMP). In the following, these intrinsic motivators are briefly defined.
Relatedness refers to the human need to connect with others, where people are described as social beings who long for belonging. This can be expressed in such aspects as communication, comparison and, in general, aspects which trigger the feeling of being a part of a group.
Autonomy matters because people are much more motivated when they perceive their actions, decisions, and thoughts as independent and free from external influence, and therefore less motivated when being put under control or a set of rules to obey.
Competence is what people strive for when they seek a certain aptitude or capability, concerning knowledge, skill or power. Marczewski calls this aspect Mastery, which he defines as the constant improvement of a skill “in direct proportion to the level of challenge”.
Purpose describes the desire for a meaning behind one’s actions. This desire can be linked to altruism, often present in welfare and charity, where simply the fact of having a purpose of importance (donating or doing good) is the goal. People may find a task much more intriguing when there is a reason or a greater meaning behind it.
2.2 Game Mechanics and Player Types
Over the years, many different approaches to defining and classifying gamification have been established. Several terms have emerged that are linked or subordinated to gamification, such as Serious Games, Games with a purpose, playfulness, gamefulness and many more. However, the general purpose of gamification is to motivate the target group, or more precisely “using game-based mechanics, aesthetics and game thinking to engage people, motivate action, promote learning, and solve problems”. Thus, gamification does not mean adding a stand-alone game in the hope of improving employee engagement, but instead analyzing game mechanics and visuals and selecting the game parts that match the use case.
2.2.1 Game Mechanics
We use Game Mechanics as the hypernym of Game Dynamics and Game Elements. By Game Dynamics, we denote the strategies and characteristics of games, but also the needs a player wants to have fulfilled. These needs are, for example, the striving for competition, exploration or social interaction. We regard Game Elements as the actual components found in a game, such as points, leaderboards or avatars. Related literature describes game elements as “the building blocks that can be applied and combined to gamify any non-game context”. Despite this distinction, it is still possible to map one to the other. Table 1 shows several Game Dynamics together with the Game Elements that trigger them. For example, the Game Dynamic progression can be supported by the Game Elements levels or progress bar, which is intuitively understandable since the feeling of advancing can be triggered by new levels being reached or even unlocked, as well as by a progress bar that is gradually filling up. The Game Element points can be regarded as a progression trigger, under the assumption that the number of points is an indicator of the player’s skill, which means that an increase in points correlates with improved skill. On the other hand, points can also be used to satisfy the need for competition. The dominating motivator in these cases is competence (alternatively called “mastery”). Knowing these elements, one might be tempted to simply pick the ones with the greatest appeal and surprise the employees with a generic game layer featuring a leaderboard and random scores. However, this method has its drawbacks and has been criticized as the “one size fits all” approach. Its critics suggest focusing on the context that is to be gamified and considering “specific user needs, goals and values”. Therefore, we follow a user-centered design.
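The mapping between Game Dynamics, Game Elements and motivators described above can be held in a small lookup structure. The following Python sketch encodes the rows of Table 1 and answers the reverse question of which dynamics a given element can trigger. It is an illustration only and not part of the described tool; the names `TABLE_1`, `dynamics_for` and `motivators_for` are hypothetical, and the motivator labels are shortened.

```python
# Table 1 as data: game dynamic -> (game elements, motivators).
# The rows come from Table 1; the structure and names are illustrative assumptions.
TABLE_1 = {
    "Cooperation": (["Teams"], ["Relatedness (Social Connections)"]),
    "Transaction": (["Gifting"], ["Purpose"]),
    "Story/Narrative": (["Story", "Badge", "Achievement"], ["Purpose"]),
    "Exploration, Surprise": (["Unlockable"], ["Autonomy"]),
    "Reward": (["Badge", "Achievement"], ["Extrinsic (Reward)"]),
    "Progression": (["Levels", "Progress Bar", "Points"], ["Competence"]),
    "Resource Acquisition": (["Achievement"], ["Competence"]),
    "Boundaries": (["Limited Resources"], ["Competence", "Extrinsic (Avoidance)"]),
    "Competition": (["Leaderboard", "Points"], ["Extrinsic (Peer Pressure)"]),
    "Status": (["Leaderboard", "Levels", "Achievement"],
               ["Relatedness (Social Status)", "Competence"]),
}


def dynamics_for(element):
    """Return all game dynamics that the given game element can trigger."""
    return [dyn for dyn, (elements, _) in TABLE_1.items() if element in elements]


def motivators_for(element):
    """Return the sorted motivators reachable through the given game element."""
    found = set()
    for elements, motivators in TABLE_1.values():
        if element in elements:
            found.update(motivators)
    return sorted(found)
```

For example, `dynamics_for("Points")` returns both Progression and Competition, which mirrors the observation that points can serve either dynamic.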
2.2.2 Player Types
Following the user-centered design, it is necessary to get familiar with the players. Some authors even recommend a thorough personality analysis of the users, with the aid of personality type models such as the Big Five or the Myers-Briggs Type Indicator. It is assumed that, knowing a player’s personality traits, gamification can be built according to their personal needs, thus making it easier to trigger their intrinsic motivation. However, the personality type can also provide insight into how prone a user might be to certain dangers of gamification. Several ways of classifying players have been established so far. Notably, many of them are based on Bartle’s 1996 theory of four main player types. Bartle’s theory was developed based on the question “What do people want out of a MUD (Multi-User Dungeon)?”. The author collected players’ answers and then categorized them into four main motivations, which he turned into player types, meaning classes of users participating in a MUD who share common primary goals. First, there are Socializers, who aim for inter-player relationships, empathize with people and enjoy observing them. The Killers are likewise focused on other players but aim to impose themselves on others by attacking them and want to win at any cost. The Achiever type is also interested in winning, but less in order to defeat others than for the sake of gathering points and rising in levels. Lastly, Bartle defines the group of Explorers, who enjoy progressive actions, figuring out how things work and discovering interesting features. As these player motivations are not mutually exclusive, a real-life player is regarded as a combination of all of these types to different degrees, of which some are more and others less dominant.
Despite being initially derived from a multiplayer game context, Bartle’s theory is still highly present in today’s player classifications. The names used for the player types can vary greatly. The Explorer type, for example, is also referred to as Free Spirit or Creator, Detective or Navigator, depending on the particular focus. What appeals to these explorative players is a game that is highly adaptable and satisfies their need for autonomy with elements such as custom avatars and many unlockable items. Nonetheless, a game with such elements can still attract users of the Achiever type, who may not be willing to spend 30 minutes on choosing an outfit. The possibility to skip such decorative steps should be provided for their benefit, as should additional elements that feed the Achievers’ competitive needs, such as a leaderboard. A leaderboard might, however, demotivate less competitive users. Therefore, game elements should be selected deliberately and with close attention to the users to prevent unpredicted and undesired behavior.
2.3 Gamification for Annotation
Having observed gamification in a general way, we analyzed existing approaches that use game elements in the context of annotation. We present three examples, their setup, features and how they relate to our use case.
2.3.1 Gamification in Video Labeling
A game for video annotation was designed in previous work. The author considered three different game approaches: a label vote game, i. e., an entity annotation in which users were asked to assign a certain category to a video segment; a click game, in which users had to locate a certain object inside the video and click on it; and a bounding box game, which asked users to draw a box around a specific object. The last one was implemented and evaluated with 20 participants who had never been in contact with the data or the use case before. A questionnaire showed that the users liked the game but also agreed that it became more repetitive and boring over time. The game elements used were a progress bar, levels, an optional leaderboard and statistics on experience. The author also mentions the struggle of creating a level system with increasing difficulty for an annotation use case while maintaining the accuracy of the results. In general, the quality of the labels was not satisfactory, as the resulting bounding boxes were inaccurate. Users also stated that they were not willing to spend more time on the tool. Still, the author concludes that a gamified approach could be advantageous with regard to annotation cost, provided that a very efficient and well-thought-out game is developed. We assume that this is an example of the “one size fits all” gamification approach, as the gamification concept apparently did not include any measures to adapt to user needs. However, this might have been caused by time limits. It also has to be mentioned that in this case a full game was developed from scratch instead of including game elements in an existing tool. Also, unlike in our use case, the participants were not regularly confronted with annotation tasks and therefore did not have any experience with them. We do not see the goal of gamification in convincing every possible user, but in the adaptation and improvement of a tool for a certain group of users.
2.3.2 Tags You Don’t Forget: Gamified Tagging of Personal Images
Runge et al. created a game to annotate personal photos. They developed two mobile applications (one single-player, one multiplayer) and compared them to a simple tagging application without any gamification. Concerning game elements, the authors mention that simple playful elements, for example, acoustic feedback for interaction, can already be sufficient to motivate a user. The single-player app was a simple tagging app, while the multiplayer app was developed as a Tagger-Guesser-Game. Here, Player B was shown a photo and had to choose between several tags to guess the one that Player A had selected. This approach is similar to the ESP game, which uses human aid for image recognition. Assigned labels were rewarded with a point. Correct answers in the multiplayer app were likewise rewarded with one point, while one point was lost for a wrong answer. The labels were evaluated by an expert as being of “good quality”. In addition, the participants answered a questionnaire about their impressions of the game. They stated that the multiplayer app was much more entertaining, whereas the single-player app helped them memorize the labels better. The insight we take from this example is that less is more when it comes to the selection and number of game elements.
Lastly, we analyzed crowdsourcing tools, which often include game elements to engage users. Google Crowdsource is a desktop platform as well as a mobile app that makes use of humans to improve Google tools such as Google Photos or Google Translate and can be used by anyone who has a Google account. Different kinds of tasks can be performed, e. g., image labeling, approval of image labels, handwriting recognition, and translation, which is close to our use case. In contrast to the two above-mentioned examples, Google Crowdsource is an established user contribution platform that is not a game in itself but includes game elements such as levels, points, badges, and a leaderboard. It works by addressing basic human needs: relatedness, since everything is open and visible; autonomy, since there is no pressure and users are free to decide when or whether they participate; and above all purpose, since users get the feeling of being an active part in the improvement of popular software tools.
Another crowdsourcing-based approach incorporating game elements was proposed by Altmeyer et al. Their goal was to encourage people to keep track of their expenses using OCR (optical character recognition) to analyze grocery receipts. The recognition is trained by crowd input (classifying a given extract of a receipt or categorizing an item). The implemented smartphone application features achievements, points, and a leaderboard to motivate users and to increase the amount of user contribution. L’Heureux et al. also used gamification to motivate users to participate in a sensor data labeling task on a crowdsourcing platform, using game elements such as a level hierarchy, a leaderboard, and special rewards to create a competitive environment. However, a dilemma of such crowdsourcing-based approaches is the lack of knowledge about the characteristics of the user group, which makes it difficult to design for specific user needs and player types.
2.4 Dangers of Gamification
Since gamification makes use of game elements, it is necessary to keep in mind that some of the risks of these elements might also be adopted. One approach to this topic was taken by Callan et al., who present ten fictional scenarios in which gamification has been wrongly established in businesses. Recurring problems were a lack of goal orientation, unsuitable game elements and rewards, and the danger of revealing too much information to both employees and employers, which they might attempt to use for their own benefit. Employees should, for example, not feel tempted to demand a higher salary because the gamified system provides them with feedback on their above-average performance that they would not have received otherwise. At the same time, employers should not use the information they receive from the gamified system against the employees in any way.
Furthermore, the term addiction is mentioned in this context. Here, however, it is regarded much more as a dependency that users might develop if they get used to the presence of game elements in connection with the task to be performed and hence lose their motivation to perform said task without gamification. Another essential aspect to consider is the danger of unwanted competition. If users in the target group who dislike competition use a system that relies on competitive elements, negative side effects can occur. They may feel pressured and become demotivated, which may lead to worse performance or even the desire to stop using the system, knowing that their colleagues can see how well they performed. Especially in a work environment, such effects are highly undesirable and should be avoided.
Since not all possible risks and dangers can be foreseen, one important measure to prevent negative effects on the users is the constant monitoring of user activities, the detection of abnormalities and suspicious behavior, and a corresponding adaptation of the system. After all, no ideal system can be created from the very beginning. Also, no team of employees will stay the same over a longer period, and as people change, so do preferences and needs. A promising long-term solution is the creation of an intelligent, adaptive gamification application.
3 Requirement Analysis
Deterding et al. describe the procedure of developing a successfully gamified tool as a “full circle” process: “from formative user research through synthesis, ideation, prototyping, design and usability testing”. In view of the potential risks of gamification (e. g., wrongly guided motivation, off-task behavior, unwanted competition, addiction and dependency) that might be adopted into a system, it is necessary to define a clear goal in order to stay focused while conceptualizing the approach and selecting productive game elements. Concerning the annotation task, we regard three central metrics that can be improved: quantity (how many annotations are created), quality (how good/correct the annotations are), and enjoyment (how much fun the annotation task is).
In the following, we first describe the current annotation process with the existing tool support, present our findings from a survey among the employees, and finally sketch two possible gamification concepts for this use case.
3.1 Current Annotation Process
AI4BD’s existing annotation tool is a multi-user web application that offers registered users a sophisticated annotation environment for collections of images (typically scanned documents) or raw texts. The annotation tasks can basically be divided into four types:
Handwriting annotation, where annotators need to type character sequences visible in given images, e. g., numbers of measurements (an example is shown in Figure 1).
Identification and classification of document parts, where annotators need to identify and classify parts of a document, e. g., to mark address or key-value fields inside a document using semantic bounding boxes (see the screenshot of the annotation tool in Figure 2).
Visual image classification, where annotators need to distinguish between different kinds of images (of documents), e. g., to classify which template a filled form belongs to.
Natural language processing (NLP), where annotators need to identify and classify entities or their relation in raw texts, e. g., to mark all persons or organizations in a given text (see the example text from a screenshot of the annotation tool in Figure 3).
To ensure quality, each annotation is reviewed after its creation. Selected users who have been assigned the role “reviewer” can access additional features in the annotation view that allow them to approve or reject an annotation. The review of handwriting annotations is currently semi-automated: an annotation is automatically marked as “approved” if at least two distinct annotators create an annotation with the same value.
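The agreement rule for handwriting annotations can be sketched as follows. This is a minimal illustration under stated assumptions, not the tool's actual implementation: the function name `auto_approved_value`, the tuple layout, the exact string comparison, and the decision to leave ambiguous cases to a human reviewer are all hypothetical.

```python
from collections import defaultdict


def auto_approved_value(annotations, min_agreeing=2):
    """Return the value to auto-approve for one image, or None.

    annotations: list of (annotator_id, value) pairs for the same image.
    A value qualifies if at least `min_agreeing` distinct annotators entered it.
    """
    annotators_by_value = defaultdict(set)
    for annotator_id, value in annotations:
        # A set ensures that repeated submissions by one annotator count once.
        annotators_by_value[value].add(annotator_id)

    agreed = [value for value, who in annotators_by_value.items()
              if len(who) >= min_agreeing]
    # Assumption: no agreement or several competing values means manual review.
    return agreed[0] if len(agreed) == 1 else None
```

Counting distinct annotator IDs rather than raw submissions matters here, since the same person entering a value twice should not trigger an approval.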
3.2 User Survey
We conducted a user survey among company employees working as annotators, to get an idea of their characteristics, whether a gamified approach would appeal to them at all, which game elements would suit them most, and which should be avoided regarding the aforementioned potential risks.
We adapted the student model from the work of Andrade et al., which defines five attributes of the player: Knowledge, Psychology, General Behavior, Gamer Profile, and Interaction. As General Behavior is focused on personal habits unrelated to the domain, we decided to omit it for privacy reasons. The Interaction attribute addresses information about user activities, which is better obtained via monitoring and logging (e. g., number of logins, success rate). We also decided to leave this out as it was not our goal to assess individual user activity.
Consequently, we created a questionnaire covering the three aspects Knowledge (labeling experience), Psychology (personal opinions) and Gamer Profile (game experience). Twenty company employees participated in the survey (11 of them aged between 24 and 30, two younger than 24, four aged between 31 and 40, one older than 40; two preferred not to tell their age). The only mandatory question was whether they had ever performed an annotation task. If they had, they could answer further follow-up questions referring to annotations. All other questions were optional.
When asked about their experience, 18 out of the 20 participants stated that they had already performed annotation tasks for the company; half of them indicated that they had been labeling data for more than three months. In a multiple-choice question, we asked the 18 participants who had experience with annotation which kinds of labeling tasks they had already performed. Document placement (15) and handwriting recognition (14) had been performed by most of the annotators, followed by NLP tasks (8) and classification (6).
Concerning the psychological aspect, we asked the annotators to take a position on six moderately provocative statements, choosing from a Likert scale of five different options of agreement (I agree... not at all (−2) / not quite (−1) / neutral (0) / a bit (1) / a lot (2)). The absolute number of answers is represented in Figure 4. From the number of positive answers (Likert scale values 1 and 2), we derived a percentage for the agreement per statement.
“I find labeling tasks tiresome” (65 % agreed)
“I would like to be able to see how well I am doing in labeling, compared to my coworkers” (55 % agreed)
“If labeling included game elements, the label results would be better” (50 % agreed)
“If labeling included game elements it would be much more fun” (65 % agreed)
“I would not like it if others were able to see my labeling progress on a leaderboard” (45 % agreed)
“Using game elements at work makes a company seem less serious” (30 % agreed, 55 % disagreed)
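The agreement percentage per statement follows directly from the Likert coding: it is the share of answers with a value of 1 or 2. A minimal Python sketch is given below; the function name and the response list are made up for illustration and do not reproduce the actual survey data, apart from being chosen so that 13 of 20 answers are positive.

```python
def agreement_share(responses):
    """Percentage of answers that agree (Likert value 1 or 2 on a -2..2 scale)."""
    if not responses:
        raise ValueError("no responses given")
    positive = sum(1 for value in responses if value > 0)
    return 100 * positive / len(responses)


# Hypothetical answers of 20 participants to one statement (not the survey data).
example = [2] * 5 + [1] * 8 + [0] * 3 + [-1] * 2 + [-2] * 2
print(agreement_share(example))
```

Note that neutral answers (0) count towards the denominator but not towards agreement, which is why agreement and disagreement need not add up to 100 %.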
3.2.3 Gamer Profile
As for the gaming habits, we posed questions regarding the time spent on games, which kinds of games were preferred, and which game elements were the most motivating ones. The majority like digital games: 60 % of the participants play them at least every week, 20 % at least once a month, and 10 % each only rarely or not at all. We added a question on real-life games in case participants were not keen on digital games but still liked playing physically. In our group, however, digital games were more popular.
To figure out which of Bartle’s four main player types was most present in the study group, we asked the participants to indicate how much they enjoy distinctive game types (like simulation games, action games, puzzle-based games, etc.), using a five-level Likert scale (not at all (−2) ... a lot (2)). Furthermore, they were asked to rate specific game elements (like leaderboards, playing against others, rewards, team play, etc.) on a five-level Likert scale (very demotivating (−2) ... very motivating (2)). Finally, we asked about the dominating motivation to play games at all. We used the correlation between preferred game types, preferred game elements and gaming motivations to create a score for each set of player type characteristics per player. The resulting scores are shown in Figure 5. We identified nine participants with predominant characteristics of an Achiever (points-gathering, rising in levels), six more inclined to be an Explorer (progressive actions, finding interesting features), and two showing equal characteristics of both (P9, P15). Thus, a group tendency towards Achiever and Explorer characteristics was notable.
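The exact scoring procedure is not spelled out in the text. As a rough illustration of how per-player scores for each player type could be derived from Likert ratings, the following Python sketch simply averages the ratings of all items associated with a type and reports the highest-scoring type(s). Both the plain averaging and the item-to-type mapping in `ITEM_TYPES` are assumptions made for this example, not the method used in the study.

```python
from statistics import mean

# Hypothetical assignment of questionnaire items to Bartle's player types.
ITEM_TYPES = {
    "leaderboard": "Achiever",
    "levels": "Achiever",
    "exploration": "Explorer",
    "puzzle_games": "Explorer",
    "team_play": "Socializer",
    "playing_against_others": "Killer",
}


def player_type_scores(ratings):
    """Map each player type to the mean Likert rating (-2..2) of its rated items."""
    by_type = {}
    for item, rating in ratings.items():
        by_type.setdefault(ITEM_TYPES[item], []).append(rating)
    return {ptype: mean(values) for ptype, values in by_type.items()}


def dominant_types(ratings):
    """Return the type(s) with the highest score; ties yield several types."""
    scores = player_type_scores(ratings)
    best = max(scores.values())
    return sorted(ptype for ptype, score in scores.items() if score == best)
```

A participant whose Achiever and Explorer items receive equally high ratings would be reported with both types, comparable to the two participants described as showing equal characteristics of both.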
3.2.4 Further Feedback
We also asked the annotators what they disliked about the current tool and how they would like it to be improved. From this information, we hoped to gain some impressions of possible stimuli for a gamification concept. Except for statements like “Labeling is boring.”, the feedback we received for this question concerned more technical issues, such as the repeated demand for more fluid user interaction inside the annotation tool through keyboard shortcuts that reduce the need for mouse interaction.
3.2.5 Lessons Learned
From our survey, we can conclude that the majority of annotators play games and are open to the use of game elements in work tasks. We learned that a way of comparing performance is desired but should provide anonymity, which is also a precaution against the danger of unwanted competition. A complex narrative, levels with increasing difficulty, as well as playing with others in a team, but also playing against others and exploration were voted the most motivating game elements. The dominating player type characteristics in the group of annotators hence turned out to be those of the Achiever (points-gathering, rising in levels) and the Explorer (progressive actions, figuring out how things work, finding interesting features) type.
3.3 Basic Gamification Concepts
Based on our findings, we created two concepts for the gamification of the annotation tool (see 3.1), each containing a selection of the preferred game elements. Both basic concepts were presented to the annotators themselves, giving them a chance to give feedback and express their opinions concerning the idea of having said game elements inside their tools.
3.3.1 Concept One: I Can Make a Change
Game elements: Story/narrative, levels, progress bar, badge/reward
Player type: Explorer
The main idea of this concept is the creation of a narrative with multiple levels that presents a goal which users need to achieve by creating annotations. Aside from telling various sub-stories, levels can differ in the difficulty of the goal to be met, or in the type of annotation that needs to be performed. A progress bar helps the users to see how far they have come, both in terms of the levels and with regard to the different types of annotations. For the latter, the concept furthermore proposes the introduction of badges which reward a defined number of annotations performed in a certain category. A user could, for example, get an Eagle Eye badge after performing fifty approved handwriting annotations.
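The badge idea can be sketched as a simple threshold rule per annotation category. Only the Eagle Eye example (fifty approved handwriting annotations) is taken from the concept; the rule table layout, the category names, and the function name are hypothetical.

```python
from collections import Counter

# Hypothetical badge rules: (badge name, annotation category, approvals needed).
# Only the Eagle Eye example comes from the concept description.
BADGE_RULES = [
    ("Eagle Eye", "handwriting", 50),
]


def earned_badges(approved_annotations, rules=BADGE_RULES):
    """Return earned badges, given the category of each approved annotation."""
    counts = Counter(approved_annotations)
    return [name for name, category, needed in rules if counts[category] >= needed]
```

Further badges could be added as additional rows without changing the function.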
3.3.2 Concept Two: TeamChallenge – Us Versus Them
Game elements: Competition between teams, leaderboard with team names, points and achievements
Player type: Achiever
This concept addresses the more competitive types among the annotators. The general idea is to group users into teams and have them annotate against each other. They can either define these groups themselves or be automatically assigned to a group without knowing who their team members are. The latter can help prevent unwanted competition and pressure in the workplace. Users still perform the annotations on their own, but they can see the overall progress of their team as well as that of the other team(s). Similar to the first approach, teams can receive topic-related achievement rewards for extraordinary performance.
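A minimal sketch of the automatic, anonymous team assignment and the team-level progress view might look as follows. The function names, the round-robin dealing, and the data shapes are assumptions made for illustration; the concept itself does not prescribe an assignment algorithm.

```python
import random


def assign_teams(user_ids, team_names, seed=None):
    """Shuffle users and deal them round-robin into the given teams."""
    rng = random.Random(seed)
    shuffled = list(user_ids)
    rng.shuffle(shuffled)
    teams = {name: [] for name in team_names}
    for index, user in enumerate(shuffled):
        teams[team_names[index % len(team_names)]].append(user)
    return teams


def team_totals(teams, annotations_per_user):
    """Sum annotation counts per team; only totals are exposed, not members."""
    return {
        name: sum(annotations_per_user.get(user, 0) for user in members)
        for name, members in teams.items()
    }
```

Showing only the totals returned by `team_totals` keeps team membership hidden from the users.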
3.3.3 Feedback on the Concepts
After the survey, we presented the results and our two concepts to the participants in a group meeting. We asked them to comment on the concepts and to vote for one. While the competitive approach seemed appealing, the first concept of building a story around the annotation tasks was preferred. Furthermore, the idea of incorporating a narrative led to much valuable creative input from the annotators. Among the ideas named were dividing the big narrative into various chapters and consecutive steps (hence levels) that need to be passed, and the observation that the story, if told correctly, could help the annotators understand what their work is used for and why they are doing it. Subconsciously, they thus expressed the desire for Purpose (people find a task much more intriguing when there is a reason to do it or a greater meaning behind it, which related work also links to altruism).
Besides, it was mentioned that despite not implementing a team feature per se, the tool could still support the group feeling by showing the general group progress for all annotations. Related literature refers to this aspect as “Perceived visibility”, being “related to the notion of being noticed by peers and in a position of social presence”. On the other hand, concerns were voiced as to how motivating such a progress bar would be if there was a long period without any change, or if the model was adapted, causing the progress to decrease. However, the story element leaves a lot of room for adaptations in this case. Such negative progress could be explained by a negative twist inside the narrative itself, for example, a sudden cyber attack that gives the robot character Robbie digital amnesia, so that the training needs to continue in order to regain the lost progress. This requires constant monitoring of user performance and a thorough player model to detect anomalies and have the system react in a way that steers user behavior appropriately.
4 Game Design and Implementation
From the discussion with the users, we derived a final game concept that includes a complex narrative (explained in 4.1), levels of increasing difficulty, and a progress indicator.
Upon first use, the annotators are taken on a tour through the tool in which all the central elements are described and the tasks and the narrative are introduced. Knowing the basic story and the main goal, users get to the level overview (Figure 8), which from then on is always the initial screen. Users can switch between different episodes, each of which can be used to tell a different storyline but also to cover annotation tasks of a different type. New episodes can be added by administrators (or authors) at any time, so they do not depend on the user’s progress. By clicking on a level, users enter the level screen, where the resources assigned to this level are listed (see Figure 6). From here, they can choose a resource to annotate (see Figure 7). The following subsections are each dedicated to one game element, describing its design, placement, and function.
In order to support intrinsic motivation, a story element for gamification should be linked to the context and the tasks performed by the company. Hence, we took inspiration from Google Crowdsource, where the speech assistant training task is initialized with a robot which introduces the topic to the user. We created the AI user assistant Robbie to fulfill three jobs: it is a tutor which gives first-time users a tour through the tool, it is the “Help” element of the tool, and it is the center of our narrative, telling the story and giving feedback on the progress. Each episode has one main storyline in which Robbie faces a struggle that needs to be solved with the help of annotations. Examples of these stories are helping Robbie achieve certain capabilities, such as learning the human language, which includes learning to “read” (handwriting annotation), to “understand” document structures (bounding box tasks), or even linguistic structures (entity annotation). In subsequent levels, these plots can always be reused by asking the user to train Robbie further in one of the mentioned capabilities. With these plots, we aim to support the user need for Purpose by creating abstract stories that are related to the real-life use case.
Inside a level, users can see all of the resources, including the ones annotated by other users. For this reason, there is a filter bar which annotators can use to filter the resources by their state and by “only my resources” (switch-toggle button). Additionally, each resource which was annotated or already started by this user has a user icon in the top right corner. In the header, users can see the level’s quest (mission) in the center and their current score inside this level in the right corner. This score shows the number of approved annotations or the number of annotations made (depending on the quest) out of the number of annotations needed to pass the level. The quest itself is a narrative element. It asks the user to reach a certain annotation goal which leads to fulfilling a greater purpose (of helping Robbie). The way the quest is phrased and designed is essential for the fulfillment of the gamification goal. For now, we distinguish two basic quest types: “Annotate a certain number of resources!” (quantity) and “Get a certain number of approvals for your annotations!” (quality). The main goal of our gamification approach is to improve the quality of the annotations. Consequently, for the most part, our quests will require users to create approved annotations or combine both quest types (“Annotate x resources and get y approvals!”).
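The two quest types and their combination can be sketched as a pair of thresholds. The class and function names are hypothetical, and showing the approval count in the header for combined quests is an assumption, since the text only states that the displayed score depends on the quest.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quest:
    """A level quest; a threshold of 0 means that criterion is not required."""
    required_annotations: int = 0  # quantity goal ("Annotate x resources")
    required_approvals: int = 0    # quality goal ("Get y approvals")


def is_level_passed(quest, made, approved):
    """A level is passed once every required threshold is met."""
    return made >= quest.required_annotations and approved >= quest.required_approvals


def score_label(quest, made, approved):
    """Header score: approvals if the quest asks for them, else annotations made."""
    if quest.required_approvals > 0:
        return f"{approved}/{quest.required_approvals}"
    return f"{made}/{quest.required_annotations}"
```

A combined quest such as “Annotate 10 resources and get 5 approvals!” would then be `Quest(required_annotations=10, required_approvals=5)`.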
In order to keep track of the quest realization, the user’s progress needs to be visualized. Progress is shown in the level overview, where users can see what percentage of the total resources inside the level has already been annotated and approved, and whether they have passed the level. Furthermore, the unlocking of new levels once a level has been passed is another user-specific progress element. Inside each level, there are multiple progress elements: the annotation counter in the level header, the resource colors, the level theme image (which changes from greyscale to color upon passing a level), and the level progress bar. In the following sections, we explain in detail the layout of this progress bar and how we incorporated colors into the progress concept.
We use four main colors for encoding progress (Figure 9). The state of a level or resource determines the background color of its level box in the level overview or of its resource in the level view. Green represents the state passed (level) or approved (resource). White represents the state open and is used for the background of a level box that contains open resources and of an open resource which still needs to be annotated. Grey denotes the state locked and is used for levels that are not accessible yet as well as for resources that cannot be accessed because they are currently being annotated by another user. A resource has two additional potential states: being in review after a user finished annotating it (which is shown in blue), and being in rework if a resource has been reviewed and not approved due to incorrect or insufficient annotations. The same color palette is used for the progress bar inside a level.
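The state-to-color encoding can be summarized as a lookup table, sketched below. The enum and function names are hypothetical. The text does not name a color for the rework state, so the sketch leaves it unset instead of guessing one.

```python
from enum import Enum


class State(Enum):
    OPEN = "open"
    LOCKED = "locked"
    IN_REVIEW = "in review"   # resources only
    IN_REWORK = "in rework"   # resources only
    DONE = "passed/approved"  # passed (level) or approved (resource)


# Colors as described in the text. The color for "in rework" is not named
# there, so it is deliberately left as None instead of being guessed.
STATE_COLORS = {
    State.DONE: "green",
    State.OPEN: "white",
    State.LOCKED: "grey",
    State.IN_REVIEW: "blue",
    State.IN_REWORK: None,
}


def background_color(state):
    """Look up the background color for a level box or resource tile."""
    return STATE_COLORS[state]
```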
4.3.2 Progress Bar
The progress bar contains information on how many annotations in this level have been approved (green) and how many annotations are in the review process (blue), in proportion to the total number of resources. The remaining white space of the bar implicitly encodes all open resources inside this level. The green space is additionally divided into several parts, each one representing a user and their approved annotations. These partitions are sorted from left (“most approved annotations”) to right (“least approved annotations”) and they are anonymous, so no user can see which one belongs to whom. Users can, however, see where their own part is located, which is highlighted in blue and with a user icon (Figure 10). So, apart from serving as progress information, the bar also serves as an anonymous leaderboard.
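The layout of the bar can be sketched as a list of segments computed from per-user approval counts. The function name and the tuple format are hypothetical, and treating every resource that is neither approved nor in review as open is an assumption, since the text does not say how resources in rework are drawn.

```python
def progress_bar_segments(approved_by_user, in_review, total_resources, current_user):
    """Build the bar as a list of (kind, fraction, is_current_user) tuples."""
    if total_resources <= 0:
        raise ValueError("total_resources must be positive")
    # Sort approved shares from most to least; user ids are not part of the output.
    ranked = sorted(approved_by_user.items(), key=lambda item: item[1], reverse=True)
    segments = [
        ("approved", count / total_resources, user == current_user)
        for user, count in ranked
        if count > 0
    ]
    segments.append(("in_review", in_review / total_resources, False))
    approved_total = sum(approved_by_user.values())
    open_count = total_resources - approved_total - in_review
    segments.append(("open", open_count / total_resources, False))
    return segments
```

Because each segment carries only a flag for the current user, a client rendering this list cannot tell which other segment belongs to whom.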
4.3.3 Success Notifications
When a level is passed and the next one gets unlocked, the user receives a success notification. Where and when exactly it appears depends on the passing rule. If the quest goal is only to create a certain number of annotations, the success notification appears inside the annotation tool, showing a happy Robbie celebrating and giving the user the options to go to the next level or to continue creating annotations for the current level. If the passing rule demands a certain number of approvals, the moment when a level is passed does not necessarily coincide with the annotation flow or even with the time when the user is online. In this case, the success message appears in the level overview, with an animation showing the next level being unlocked. Alternatively, a pop-up notification inside the tool can be shown if the user is online when the required number of annotation approvals is reached.
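The placement rule can be written down as a small decision function. The function name and the returned labels are hypothetical, and the sketch adopts the optional pop-up variant for online users.

```python
def success_notification_target(requires_approvals, user_online):
    """Decide where the 'level passed' notification is shown."""
    if not requires_approvals:
        # Quantity-only quests are completed while the user is annotating.
        return "annotation_tool"
    if user_online:
        # Optional variant from the text: pop-up if the user happens to be online.
        return "tool_popup"
    # Approvals may arrive while the user is away: show it in the level overview.
    return "level_overview"
```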
In the sidebar, users can access a general help screen by clicking on the Robbie icon, and they can also access their personal user statistics. We propose this feature because the user study showed that users liked the idea of seeing their progress, also in comparison to others. In the statistics screen, shown in Figure 11, they can see the awards they have received, the number of annotations they created per annotation type, and a chart showing the history of their total number of annotations. The ideas for these charts are just an initial suggestion and can be adapted to the users’ needs.
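The two aggregates behind the statistics screen can be sketched as follows. The record format (a dictionary with a `type` and an ISO-formatted `date` per annotation) and the function names are assumptions for illustration and do not reflect the prototype's actual data model.

```python
from collections import Counter
from itertools import accumulate


def annotations_per_type(annotations):
    """Count a user's annotations per annotation type."""
    return dict(Counter(a["type"] for a in annotations))


def cumulative_history(annotations):
    """Return (date, running total) pairs, ordered by ISO date string."""
    per_day = Counter(a["date"] for a in annotations)
    days = sorted(per_day)
    totals = accumulate(per_day[day] for day in days)
    return list(zip(days, totals))
```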
Users see a tutorial after they click on an open resource if this requires annotation of a type which the user is not experienced in or in case the resource has any kind of exceptional rules which the annotator must know. Currently, the company has tutorials that are not embedded inside the annotation tool. By providing this feature, we aim to support the improvement of the annotation quality as it requires users to read the rules and guidelines before creating the annotation. The tutorial consists of multiple steps: first, the rules are presented, with Robbie as a decorative element, highlighting positive rules (what is recommended, examples for a good dataset) and negative rules (which data should be skipped), as shown in Figure 12. They are followed by a brief training part where users get confronted with minimal annotation tasks that test their understanding of the previously presented rules.
4.6 Working Prototype
Starting with low-fidelity prototypes, we finally implemented a fully operable Web prototype within the company’s annotation environment, based on Angular 9 and NgRx, and using a central backend service which obtains data from a MongoDB database. In addition to the features of the former annotation tool, the prototype introduces software components to represent episodes, annotation levels, and awards as game elements. The prototype is fully operational regarding real annotation tasks, user authentication and login, as well as loading and saving annotations per user account. Thus, both the preexisting and the gamified annotation tool are ready for use, e. g., to be compared in long-term A/B tests. Short-term tests have already been conducted to verify functionality, but these are not yet meaningful regarding our three central metrics: quantity, quality, and enjoyment. Nevertheless, initial feedback on the results from the company and individual employees was consistently very positive and encouraging.
5 Conclusion and Future Work
This work describes our approach and design process for the gamification of an annotation tool for creating machine learning training data. Unlike a “one size fits all” approach to gamifying the tool, where game elements are applied without regard to the context of use, we chose a user-adapted approach in which we first analyzed existing literature on gamification and then conducted a user research study with twenty employees of an AI product company. The results show that the employees enjoy gaming in their free time, which supports the use of a gamified tool, and reveal which game preferences they have. From the findings, we derived an individual gamification concept with regard to the annotation use case and the employees’ player characteristics. Our implemented prototype is a gamification approach for AI4BD’s annotation tools. It serves as a proof of concept that game elements can be easily implemented inside an existing environment. Game elements can serve as a positive motivator in annotation tasks of various domains, be it for sensor data analytics or for medical use cases. We highly encourage the involvement of the annotating users in the creation of an annotation tool, and the consideration of a gamified approach as described.
An aspect we did not cover in this work is how to assess the difficulty of an annotation task. One approach is to map the complexity, hence the effort, of the task to the difficulty. On the other hand, even a less complex task can require greater effort than a complex task if the data contains a lot of ambiguity. Future work can analyze this problem thoroughly. It might also be helpful to follow a more thorough approach to user research that considers psychological aspects, possibly even personality types, and aims for a deeper user analysis. Generally, it can be interesting for other projects to consider other taxonomies of player types and to perform detailed psychological user research. Besides, we did not evaluate our approach over a long period (several months), which is especially necessary when a narrative is included, as this is an element that evolves over time.
During our research, we found a variety of different taxonomies for gamification. We also noticed that not all terms are used consistently by different sources, for example, the terms Game Mechanics and Game Elements. Besides, many approaches to distinguishing player types exist, which is why we chose to stick with the basic player type taxonomy by Bartle. Researchers with a similar purpose should keep in mind that gamification is perceived with skepticism and concern by some users, as the underlying idea is the manipulation of user behavior. To prevent a negative impression, it is recommended to include the users in the design process by asking for their opinions and their general game affinity. We strongly discourage any destructive intentions when using gamification, for example, aiming for surveillance of the staff or a highly competitive environment in the company. Related work also frequently mentions the importance of transparency and disclosure concerning the game tool. One possible way to encourage trust is to give the users access to information on the reasons for the use of game elements, so that they are not left with a mistaken feeling of being observed or put under pressure by a gamified tool.
Funding source: Bundesministerium für Bildung und Forschung
Award Identifier / Grant number: 16SV8396
Funding statement: This work was partially funded by the German Research Foundation (DFG) as part of Germany’s Excellence Strategy – EXC 2050/1 – Project ID 390696704 – Cluster of Excellence “Centre for Tactile Internet with Human-in-the-Loop” (CeTI) of Technische Universität Dresden, and by DFG grant 389792660 as part of TRR 248 (see https://perspicuous-computing.science). Additionally, the work at AI4BD was funded by the Federal Ministry of Education and Research (BMBF) and the research project PANDIA (grant number 16SV8396).
About the authors
Sarah Alaghbari graduated in Media Computer Science at TU Dresden. As a student assistant at the chair of information and communication business management as well as the media center of TU Dresden, she discovered her interest in the concept of gamification and user interface design. Her Master’s thesis covered the use of gamification elements for machine learning use cases. Since 2020, she has been working at AI4BD GmbH as a software engineer and designer.
Dr. Annett Mitschick is a research and teaching assistant at the Multimedia Technology Group/Interactive Media Lab at TU Dresden, Germany. She received her diploma in Media Computer Science in 2003 and her PhD in 2010 from TU Dresden. Her current research interests include user-oriented multimedia information retrieval and management using Semantic Web technologies, as well as design and evaluation of user interfaces for data exploration, information seeking, and interpretability.
Gregor Blichmann received his diploma from the TU Dresden, Germany in 2011. Subsequently, he worked at the TU Dresden as a research associate in the area of Web Engineering and Semantic Web technologies. In 2017, he joined the AI4BD Deutschland GmbH as a software engineer and architect. Today, he is working as a product owner and team lead for the software development unit. Within this, technologies for web-based frontends, Java- and Python-based backend services, semantic data models as well as user-centered, agile development processes are part of his daily work.
Dr. Martin Voigt received his diploma and doctor’s degree from TU Dresden, Germany. There, he worked as a researcher and project manager in the fields of Web Engineering, Semantic Technologies, and Information Visualization. Since he joined the AI4BD group in 2014, he has been responsible for the general research & development strategy and has led all related activities. In 2015, he became managing director of AI4BD Deutschland GmbH. With his team, he works in the technology fields of machine learning, software and web engineering, as well as knowledge graphs.
Prof. Dr. Raimund Dachselt is a full professor of Computer Science and head of the Interactive Media Lab at Technische Universität Dresden, Germany. He received his PhD in 2004 from TU Dresden and was professor for User Interface & Software Engineering at University of Magdeburg from 2007 to 2012. His research interests are at the intersection of natural, multimodal human computer interaction (HCI) and data visualization. He has co-authored more than 220 peer-reviewed publications and two major German HCI textbooks and received several awards at leading international conferences.
Maximilian Altmeyer, Pascal Lessel, and Antonio Krüger. 2016. Expense Control: A Gamified, Semi-Automated, Crowd-Based Approach For Receipt Capturing. In Proceedings of the 21st International Conference on Intelligent User Interfaces (IUI ’16). Association for Computing Machinery, New York, NY, USA, 31–42. https://doi.org/10.1145/2856767.2856790.
Fernando Andrade, Riichiro Mizoguchi, and Seiji Isotani. 2016. The Bright and Dark Sides of Gamification. In Intelligent Tutoring Systems, Alessandro Micarelli, John Stamper, and Kitty Panourgia (Eds.). Lecture Notes in Computer Science, Vol. 9684. Springer International Publishing, 1–11. https://doi.org/10.1007/978-3-319-39583-8_17.
Gokhan Aydin. 2018. Effect of demographics on use intention of gamified systems. International Journal of Technology and Human Interaction (IJTHI) 14, 1, 1–21.
Richard Bartle. 1996. Hearts, clubs, diamonds, spades: Players who suit MUDs. Journal of MUD research 1, 1, 19.
Martin Böckle, Isabel Micheel, Markus Bick, and Jasminko Novak. 2018. A Design Framework for Adaptive Gamification Applications. In Proceedings of the 51st Hawaii International Conference on System Sciences (Proceedings of the Annual Hawaii International Conference on System Sciences), Tung Bui (Ed.). Hawaii International Conference on System Sciences. https://doi.org/10.24251/HICSS.2018.151.
Bunchball. 2015. What are Game Mechanics? https://www.bunchball.com/gamification/game-mechanics.
Rachel C. Callan, Kristina N. Bauer, and Richard N. Landers. 2015. How to avoid the dark side of gamification: Ten business scenarios and their unintended consequences. In Gamification in education and business. Springer, 553–568.
Sebastian Deterding, Staffan Björk, Lennart Nacke, Dan Dixon, and Elizabeth Lawley. 2013. Designing gamification: creating gameful and playful experiences. In Extended Abstracts on Human Factors in Computing Systems. 3263–3266. https://doi.org/10.1145/2468356.2479662.
Sebastian Deterding, Dan Dixon, Rilla Khaled, and Lennart Nacke. 2011. From Game Design Elements to Gamefulness: Defining Gamification. In Proceedings of the 15th International Academic MindTrek Conference: Envisioning Future Media Environments (MindTrek ’11). ACM, 9–15. https://doi.org/10.1145/2181037.2181040.
Dominique Mangiatordi. 2018. Gamification at work: the 8 PLAYER TYPES. https://www.linkedin.com/pulse/gamification-work-8-player-types-dominique-mangiatordi-/.
Lauren Ferro, Steffen Walz, and Stefan Greuter. 2013. Towards personalised, gamified systems: An investigation into game design, personality and player typologies. In Proceedings of The 9th Australasian Conference on Interactive Entertainment Matters of Life and Death. https://doi.org/10.1145/2513002.2513024.
Google. 2020. Google Crowdsource. https://crowdsource.google.com/.
Simone Hantke, Tobias Appel, and Björn Schuller. 2018. The Inclusion of Gamification Solutions to Enhance User Enjoyment on Crowdsourcing Platforms. In 2018 First Asian Conference on Affective Computing and Intelligent Interaction (ACII Asia). 1–6. https://doi.org/10.1109/ACIIAsia.2018.8470330.
Eduardo Herranz, Ricardo Colomo-Palacios, and Antonio de Amescua Seco. 2015. Gamiware: A Gamification Platform for Software Process Improvement. In Systems, Software and Services Process Improvement, Rory V. O’Connor, Mariye Umay Akkaya, Kerem Kemaneci, Murat Yilmaz, Alexander Poth, and Richard Messnarz (Eds.). Springer International Publishing, Cham, 127–139.
Karl Kapp. 2012. The gamification of learning and instruction: Game-based methods and strategies for training and education. Pfeiffer, San Francisco, CA.
Alireza Khakpour and Ricardo Colomo-Palacios. 2020. Convergence of Gamification and Machine Learning: A Systematic Literature Review. Technology, Knowledge and Learning. https://doi.org/10.1007/s10758-020-09456-4.
Selay Arkün Kocadere and Şeyma Çağlar. 2018. Gamification from player type perspective: A case study. Journal of Educational Technology & Society 21, 3, 12–22.
Jonna Koivisto and Aqdas Malik. 2020. Gamification for Older Adults: A Systematic Literature Review. The Gerontologist. https://doi.org/10.1093/geront/gnaa047.
Janaki Kumar and Mario Herger. 2013. Gamification at Work: Designing Engaging Business Software. In Design, User Experience, and Usability. Health, Learning, Playing, Cultural, and Cross-Cultural User Experience, Aaron Marcus (Ed.). Springer Berlin Heidelberg, 528–537.
Alexandra L’Heureux, Katarina Grolinger, Wilson Akio Higashino, and Miriam A. M. Capretz. 2017. A Gamification Framework for Sensor Data Analytics. In 2017 IEEE International Congress on Internet of Things (ICIOT). 74–81. https://doi.org/10.1109/IEEE.ICIOT.2017.18.
Daria Lopukhina. 2018. How Gamification in the Workplace Impacts Employee Productivity. https://anadea.info/blog/how-gamification-in-the-workplace-impacts-employee-productivity.
Andrzej Marczewski. 2013. Game Mechanics in Gamification. https://www.gamified.uk/2013/01/14/game-mechanics-in-gamification/.
Andrzej Marczewski. 2013. The Intrinsic Motivation RAMP. https://www.gamified.uk/gamification-framework/the-intrinsic-motivation-ramp/.
Andrzej Marczewski. 2015. Even ninja monkeys like to play: Gamification, game thinking & motivational design. Gamified UK. http://gamified.uk/user-types/.
Jane McGonigal. 2011. Reality is broken. Penguin Press. http://www.loc.gov/catdir/enhancements/fy1107/2010029619-d.html.
Nina Runge, Dirk Wenig, Danny Zitzmann, and Rainer Malaka. 2015. Tags You Don’t Forget: Gamified Tagging of Personal Images. In Entertainment Computing – ICEC 2015, Konstantinos Chorianopoulos, Monica Divitini, Jannicke Baalsrud Hauge, Letizia Jaccheri, and Rainer Malaka (Eds.). Springer International Publishing, 301–314.
Richard M. Ryan and Edward L. Deci. 2000. Self-determination theory and the facilitation of intrinsic motivation, social development, and well-being. American Psychologist 55, 1, 68–78. https://doi.org/10.1037/0003-066X.55.1.68.
Lorenzo Servadei, Rainer Schmidt, Christina Eidelloth, and Andreas Maier. 2018. In Medical Monkeys: A Crowdsourcing Approach to Medical Big Data. 87–97. https://doi.org/10.1007/978-3-319-73805-5_9.
Samuel Suikkanen. 2019-06-17. Gamification in video labeling; Videoiden merkkaamisen pelillistäminen. G2 pro gradu, diplomityö. http://urn.fi/URN:NBN:fi:aalto-201906233997.
Richard H. Thaler and Cass R. Sunstein. 2008. Nudge – Improving Decisions About Health, Wealth, and Happiness (rev. and expanded ed., with a new afterword and a new chapter ed.). Penguin. 304 Pages.
Rob van Roy, Sebastian Deterding, and Bieke Zaman. 2018. Uses and Gratifications of Initiating Use of Gamified Learning Platforms. In Asian CHI Symposium 2018, Eunice Sari (Ed.). CHI Conference, ACM, 1–6. https://doi.org/10.1145/3170427.3188458.
Luis von Ahn and Laura Dabbish. 2004. Labeling images with a computer game. In Proceedings of the 2004 conference on Human factors in computing systems – CHI ’04, Elizabeth Dykstra-Erickson and Manfred Tscheligi (Eds.). ACM Press, 319–326. https://doi.org/10.1145/985692.985733.
Lincoln Wood and Torsten Reiners. 2015. Gamification. In Encyclopedia of Information Science and Technology. 3039–3047. https://doi.org/10.4018/978-1-4666-5888-2.ch297.
© 2021 Walter de Gruyter GmbH, Berlin/Boston