The data are partitioned into europe_data and non_europe_data: europe_data contains only posts written in the European subreddits, and non_europe_data contains posts written in other subreddits. The data for each country are stored in a single CSV file with the columns 'user', 'subreddit' and 'post'. 'user' is the ID of the user, 'subreddit' is the subreddit (Reddit forum) in which the text was posted, and 'post' is a full comment or submission by that user. The posts are tokenized and cleaned of single non-alphabetic characters. Non-English posts have been removed. The files are sorted by user.
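
Because each file is sorted by user, one user's posts can be collected in a single pass without loading the whole file into memory. The following is a minimal sketch, not part of the dataset's own tooling. It assumes that each CSV file is comma-delimited and starts with a header row naming the three columns; the function name `posts_by_user` and the sample rows are made up for illustration.

```python
import csv
import io
from itertools import groupby
from operator import itemgetter


def posts_by_user(csv_file):
    """Yield (user, [(subreddit, post), ...]) from an open per-country CSV file.

    Relies on the file being sorted by user, so each user's rows are contiguous.
    Assumes a header row with the columns 'user', 'subreddit' and 'post'.
    """
    reader = csv.DictReader(csv_file)
    for user, rows in groupby(reader, key=itemgetter("user")):
        yield user, [(row["subreddit"], row["post"]) for row in rows]


# Tiny in-memory stand-in for a country file (made-up rows, not real data).
sample = io.StringIO(
    "user,subreddit,post\n"
    "u1,europe,hello from here\n"
    "u1,AskEurope,\"a post , with comma\"\n"
    "u2,europe,another post\n"
)
grouped = list(posts_by_user(sample))
```

For a real file, pass a handle opened with `open(path, newline="", encoding="utf-8")`, where `path` points at one of the country files. The UTF-8 encoding is an assumption. If the files turn out to have no header row, `csv.DictReader` accepts the column names through its `fieldnames` argument instead.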