Tweet topics and sentiments relating to distance learning among Italian Twitter users

The data

Twitter was chosen as the data source. It is one of the world’s major social media platforms, with 199 million active users in April 2021 [4], and it is also a common source of text for sentiment analyses [23,24,25].

To collect tweets related to distance learning, we used TrackMyHashtag, a tool that monitors hashtags in real time. Unlike the Twitter API, which does not provide tweets older than three weeks, TrackMyHashtag also provides historical data and can filter selections by language and geolocation.

For our study, we chose the Italian words for ‘distance learning’ as the search term and selected March 3, 2020 through November 23, 2021 as the period of interest. Finally, we restricted the selection to tweets written in Italian. A total of 25,100 tweets were collected.
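As an illustration only, the three selection criteria (search term, period, and language) can be expressed as a simple filter over tweet records. This is a minimal sketch and not the TrackMyHashtag interface: the record fields `text`, `lang`, and `created_at`, the date format, and the function name `matches_criteria` are hypothetical, and the search phrase ‘didattica a distanza’ is an assumed rendering of ‘distance learning’, since the exact query string is not reported here.

```python
from datetime import date, datetime

# Period of interest and language, as stated in the text.
START = date(2020, 3, 3)
END = date(2021, 11, 23)
LANGUAGE = "it"

# Assumption: the exact search string is not given in the text;
# "didattica a distanza" is used here purely for illustration.
SEARCH_TERMS = ("didattica a distanza",)


def matches_criteria(tweet: dict) -> bool:
    """Return True if a tweet record satisfies all three selection criteria.

    The record layout (keys "text", "lang", "created_at" as YYYY-MM-DD)
    is hypothetical and would need adapting to the actual export format.
    """
    created = datetime.strptime(tweet["created_at"], "%Y-%m-%d").date()
    text = tweet["text"].lower()
    return (
        tweet["lang"] == LANGUAGE
        and START <= created <= END
        and any(term in text for term in SEARCH_TERMS)
    )


# Example usage on a few made-up records.
records = [
    {"text": "La didattica a distanza continua", "lang": "it", "created_at": "2020-03-03"},
    {"text": "Distance learning is hard", "lang": "en", "created_at": "2020-05-01"},
    {"text": "Didattica a distanza, ancora", "lang": "it", "created_at": "2021-11-24"},
]
selected = [r for r in records if matches_criteria(r)]
```

Both period boundaries are treated as inclusive, which matches the ‘through’ wording above but is otherwise an assumption.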

Data preprocessing

To clean the data and prepare it for sentiment analysis, we applied the following preprocessing steps using NLP techniques implemented in Python:

  1. removed mentions, URLs, and hashtags,

  2. replaced HTML character entities with their Unicode equivalents (such as replacing ‘&amp;’ with ‘&’),

  3. removed HTML tags (such as <div> and <p>),

  4. removed unnecessary line breaks,

  5. removed special characters and

