Predicting students' educational outcomes from social media posts using AI

Ivan Smirnov, Leading Research Fellow of the Laboratory of Computational Social Sciences at the Institute of Education of HSE University, has created a computer model that can distinguish high achievers from low achievers based on their social media posts.

Using a range of parameters such as vocabulary, emojis, post length, word length, and the subject matter of the post, he built a mathematical model that estimates the academic achievement of these students.
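To make these parameters concrete, here is a minimal sketch of how such surface features could be pulled out of a single post. It is an illustration only, not the author's code: the function name `post_features`, the feature names, and the rough emoji pattern are all assumptions made for this example.

```python
import re

# Rough emoji matcher covering only the main emoji code-point blocks
# (an assumption for this sketch, not a complete definition of "emoji").
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")
# Runs of letters in any alphabet (no digits, no underscores).
WORD_RE = re.compile(r"[^\W\d_]+")


def post_features(text):
    """Return simple surface features for one post (hypothetical helper)."""
    words = WORD_RE.findall(text.lower())
    n_words = len(words)
    return {
        "n_chars": len(text),                      # post length in characters
        "n_words": n_words,                        # post length in words
        "avg_word_len": sum(len(w) for w in words) / n_words if n_words else 0.0,
        "n_emojis": len(EMOJI_RE.findall(text)),   # emoji count
        "vocabulary": sorted(set(words)),          # distinct words used
    }


features = post_features("Reading about Newton today 😀😀")
print(features)
```

Features like these would then be passed to a model, which weighs how informative each one is.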

His analysis shows that typical parameters such as the length of a post or its word count do not determine the results; what matters is the vocabulary used in the posts and the topics covered. Posts that mention Newton's theories, concepts related to literature, or words related to thought processes and memorization indicate a high achiever, whereas abundant use of emojis, abbreviated shorthand, and misspelled words indicates a low achiever.

Another parameter was the hashtags students used in their posts and the topics they pointed to. Hashtags related to horoscopes or the military indicated a lack of motivation towards school, while those related to science and books were taken as the opposite.

However, he found that traditional methods such as a study or a survey would not work for this research, so he turned to a new method: a machine learning model.

‘Learning ability is a very complex human characteristic. It is influenced not only by character traits, but also by psychological well-being. Alas, in contrast to academic success, which is available in the public domain, there are no mechanisms within educational institutions for measuring the latter.’

Ivan Smirnov

Using the machine learning vector model and the posts of a group of high school students, he mapped out the results as shown in the picture below.

Clusters with low scores (in green) include misspelled words, names of popular computer games, concepts related to military service (army, oath, etc.), horoscope terms (Aries, Sagittarius), and words related to driving and car accidents (collision, traffic police, wheels, tuning).

Interestingly, the model can also assign scores to words that are rare in the training dataset. One example is a reference to the Harry Potter universe: if a post mentions Newt (as in Newt Scamander from Fantastic Beasts), it is more likely to be recognized as a post by a high achiever.
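One way to picture how a vector model can score a rare word is sketched below. This is a toy illustration under stated assumptions, not the study's implementation: it assumes each word is represented by a vector, a post by the average of its word vectors, and the score by a linear function of that average. The two-dimensional vectors and the weights are invented for the example.

```python
# Toy 2-dimensional word vectors, invented for illustration only.
# In this picture, words used in similar contexts end up close together.
WORD_VECTORS = {
    "hogwarts": [0.90, 0.10],
    "potter": [0.80, 0.20],
    "newt": [0.85, 0.15],    # imagined as rare among the scored posts
    "tuning": [0.10, 0.90],
    "wheels": [0.20, 0.80],
}

# Hypothetical linear weights, as if fitted on posts with known scores.
WEIGHTS = [1.0, -1.0]


def post_vector(words):
    """Average the vectors of the words that have a vector."""
    known = [WORD_VECTORS[w] for w in words if w in WORD_VECTORS]
    if not known:
        return [0.0, 0.0]
    return [sum(v[i] for v in known) / len(known) for i in range(2)]


def score(words):
    """Linear score of the averaged post vector (higher = higher predicted achievement)."""
    return sum(w * x for w, x in zip(WEIGHTS, post_vector(words)))


newt_score = score(["newt"])
car_score = score(["tuning", "wheels"])
print(newt_score > car_score)
```

In this sketch "newt" gets a high score only because its vector sits near "potter" and "hogwarts", even if it rarely appeared in posts with known scores.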

The model can also prove useful in a variety of fields apart from education, such as literature, food, and politics. One possible application is for education researchers who want to understand what distinguishes successful schools from average ones. Other applications include assessing how well prepared students are academically, which is of interest to schools invested in improving their students' scores.

The study uses information only from accounts that are openly accessible to the public, and only the data that users have made publicly visible in their bio or account. The platforms themselves do not allow access to private data under the terms and conditions that protect user privacy, so the study is “cybersafe”: it does not breach account security to obtain demographic data.


National Research University Higher School of Economics. Artificial intelligence can predict students’ educational outcomes based on tweets. 22 Oct. 2020.

More information: Ivan Smirnov, Estimating educational outcomes from students’ short texts on social media, EPJ Data Science (2020). DOI: 10.1140/epjds/s13688-020-00245-8

Journal information: European Physical Journal Data Science 
