readme

The Text_chunks directory contains two directories: europe_data and non_europe_data. Inside each there is a directory for each country, inside each country directory there is a directory for each user, and each user directory contains chunks of 100 tokenized sentences written by that user. (A minimal loading sketch is given at the end of this readme.)

The other main directories have the same structure, but their chunks contain the relevant feature pre-processing:
- pos_chunks contain POS 3-grams
- char_ngrams_chunks contain char 3-grams
- spell_checker_chunks contain the original word and, if it was marked as a mistake, also the edit distance and the character insertions, deletions and replacements
- non_tokenized_chunks contain the detokenized text chunks
The order of the chunks is the same for each category.

Sampling:
- I used 59 users randomly selected from each language.
- For each user I used at most the median number of chunks, randomly selected. The median was calculated over all the chunks in the europe_data for the in-domain task and for the training data of the out-of-domain task; for the test data of the out-of-domain task it was calculated over all the chunks in the non_europe_data. The median was 3 chunks per user for the europe_data and 17 for the non_europe_data.

The countries used and their labels:

UK           English
US           English
NewZealand   English
Australia    English
Ireland      English
Austria      German
Germany      German
Albania      Albania
Bosnia       Bosnia
Bulgaria     Bulgaria
Croatia      Croatia
Czech        Czech
Denmark      Denmark
Estonia      Estonia
Finland      Finland
France       France
Greece       Greece
Hungary      Hungary
Iceland      Iceland
Italy        Italy
Latvia       Latvia
Lithuania    Lithuania
Netherlands  Netherlands
Norway       Norway
Poland       Poland
Portugal     Portugal
Romania      Romania
Russia       Russia
Serbia       Serbia
Slovakia     Slovakia
Slovenia     Slovenia
Spain        Spanish
Mexico       Spanish
Sweden       Sweden
Turkey       Turkey
Ukraine      Ukraine
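
Loading sketch: the code below shows one way to iterate the chunk files, assuming only the layout described above (Text_chunks/<europe_data|non_europe_data>/<country>/<user>/<chunk file>). The chunk file names and extensions are not specified in this readme, so the per-user file iteration and the UTF-8 encoding are assumptions.

from pathlib import Path

def iter_chunks(root="Text_chunks"):
    """Yield (region, country, user, chunk_path) for every chunk file under root."""
    for region_dir in Path(root).iterdir():              # europe_data / non_europe_data
        if not region_dir.is_dir():
            continue
        for country_dir in region_dir.iterdir():          # one directory per country
            if not country_dir.is_dir():
                continue
            for user_dir in country_dir.iterdir():        # one directory per user
                if not user_dir.is_dir():
                    continue
                for chunk_path in sorted(user_dir.iterdir()):  # chunk files (naming assumed)
                    if chunk_path.is_file():
                        yield region_dir.name, country_dir.name, user_dir.name, chunk_path

if __name__ == "__main__":
    for region, country, user, chunk in iter_chunks():
        text = chunk.read_text(encoding="utf-8")          # 100 tokenized sentences per chunk
        print(region, country, user, chunk.name, len(text.split()))
        break

The same loop applies unchanged to pos_chunks, char_ngrams_chunks, spell_checker_chunks and non_tokenized_chunks, since they share the directory structure and chunk order.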