You can contact us at: support@domainindex.com


ABOUT N-GRAM FILES

N-grams are sequences of n contiguous text units. These units can be words or single characters. N-grams are typically collected from a text or speech corpus.

Our n-gram collection contains more than 814 million 1-grams in French, German, Russian, Spanish and English, and more than 2.419 billion 2-grams in English. The collection is available for $900. Payment is made securely via PayPal, after which you will receive an e-mail containing 118 direct download links. Those links will be accessible from your e-mail in the 48 hours after the e-mail is sent.

Table 1 lists the number of words or phrases and the number of n-grams for each language.
Table 2 lists the size of each downloadable file.
File format: Each line of the files has the following format:
"n-gram","year","total","pages","books"
This means that "n-gram" occurred "total" times in "year", on "pages" pages of "books" books.
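As an illustration of the format above, here is a minimal Python sketch that reads such lines into typed records. The names NgramRecord, parse_ngram_lines and read_ngram_file are hypothetical, and several details are assumptions rather than documented facts: that the files are UTF-8 text, that the last four fields are always integers, and that any row that does not fit (for example a possible header line) can simply be skipped.

```python
import csv
import gzip
import io
from typing import Iterable, Iterator, NamedTuple


class NgramRecord(NamedTuple):
    ngram: str
    year: int
    total: int   # occurrences of the n-gram in that year
    pages: int   # number of pages it occurred on
    books: int   # number of books it occurred in


def parse_ngram_lines(lines: Iterable[str]) -> Iterator[NgramRecord]:
    # Each line is expected to hold five quoted, comma-separated fields.
    for row in csv.reader(lines):
        if len(row) != 5:
            continue  # blank or malformed line
        ngram, year, total, pages, books = row
        try:
            yield NgramRecord(ngram, int(year), int(total), int(pages), int(books))
        except ValueError:
            continue  # non-numeric fields, e.g. a header row (assumption)


def read_ngram_file(path: str, encoding: str = "utf-8") -> Iterator[NgramRecord]:
    # Assumption: the .csv.gz files are gzip-compressed UTF-8 text.
    with gzip.open(path, mode="rt", encoding=encoding, newline="") as handle:
        yield from parse_ngram_lines(handle)


# Made-up sample line, for demonstration only (not taken from the data set).
sample = io.StringIO('"hello","1999","120","95","40"\n')
for record in parse_ngram_lines(sample):
    print(record.ngram, record.year, record.total, record.pages, record.books)
```

Reading line by line through a generator keeps memory use low, which matters for files of this size. A downloaded file could then be processed with, for example, read_ngram_file("ngrams_french_1_1.csv.gz").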
Note: Before making a purchase, please register (registration is free).
  • The size of each compressed file is approximately 115 MB.
  • After paying via PayPal, you will receive an e-mail with 118 download links. The links are available for 30 days after the e-mail is sent.
Table 1
Language    Words or phrases    n-grams
1-grams
French      1,372,061           114,830,157
German      3,098,678           207,640,306
Russian     2,890,485           192,180,448
Spanish     1,438,384           113,648,250
English     2,234,175           186,652,682
2-grams
English     31,321,694          2,419,296,715
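
The "more than 814 million 1-grams" figure quoted in the description can be checked against Table 1 with a few lines of Python. This is only an arithmetic sketch; the variable names are illustrative, and the counts are copied from the table.

```python
# 1-gram counts per language, copied from Table 1.
one_gram_counts = {
    "French": 114_830_157,
    "German": 207_640_306,
    "Russian": 192_180_448,
    "Spanish": 113_648_250,
    "English": 186_652_682,
}

# 2-gram count for English, copied from Table 1.
english_two_gram_count = 2_419_296_715

total_one_grams = sum(one_gram_counts.values())
print(f"1-grams in all five languages: {total_one_grams:,}")  # 814,951,843
print(f"English 2-grams in billions: {english_two_gram_count / 1e9:.3f}")  # 2.419
```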

Table 2
Filename size
ngrams_french_1_1.csv.gz 166 MB
ngrams_french_1_2.csv.gz 152 MB
ngrams_french_1_3.csv.gz 141 MB
ngrams_french_1_4.csv.gz 38 MB
ngrams_german_1_1.csv.gz 139 MB
ngrams_german_1_2.csv.gz 137 MB
ngrams_german_1_3.csv.gz 132 MB
ngrams_german_1_4.csv.gz 132 MB
ngrams_german_1_5.csv.gz 128 MB
ngrams_german_1_6.csv.gz 127 MB
ngrams_german_1_7.csv.gz 113 MB
ngrams_russian_1_1.csv.gz 102 MB
ngrams_russian_1_2.csv.gz 100 MB
ngrams_russian_1_3.csv.gz 98 MB
ngrams_russian_1_4.csv.gz 97 MB
ngrams_russian_1_5.csv.gz 95 MB
ngrams_russian_1_6.csv.gz 93 MB
ngrams_russian_1_7.csv.gz 91 MB
ngrams_russian_1_8.csv.gz 89 MB
ngrams_russian_1_9.csv.gz 89 MB
ngrams_russian_1_10.csv.gz 52 MB
ngrams_spanish_1_1.csv.gz 171 MB
ngrams_spanish_1_2.csv.gz 154 MB
ngrams_spanish_1_3.csv.gz 141 MB
ngrams_spanish_1_4.csv.gz 34 MB
ngrams_english_1_1.csv.gz 165 MB
ngrams_english_1_2.csv.gz 152 MB
ngrams_english_1_3.csv.gz 149 MB
ngrams_english_1_4.csv.gz 141 MB
ngrams_english_1_5.csv.gz 134 MB
ngrams_english_1_6.csv.gz 43 MB
ngrams_english_2_1.csv.gz 118 MB
ngrams_english_2_2.csv.gz 118 MB
ngrams_english_2_3.csv.gz 118 MB
ngrams_english_2_4.csv.gz 117 MB
ngrams_english_2_5.csv.gz 117 MB
ngrams_english_2_6.csv.gz 118 MB
ngrams_english_2_7.csv.gz 117 MB
ngrams_english_2_8.csv.gz 117 MB
ngrams_english_2_9.csv.gz 117 MB
ngrams_english_2_10.csv.gz 117 MB
ngrams_english_2_11.csv.gz 117 MB
ngrams_english_2_12.csv.gz 116 MB
ngrams_english_2_13.csv.gz 116 MB
ngrams_english_2_14.csv.gz 115 MB
ngrams_english_2_15.csv.gz 115 MB
ngrams_english_2_16.csv.gz 115 MB
ngrams_english_2_17.csv.gz 115 MB
ngrams_english_2_18.csv.gz 114 MB
ngrams_english_2_19.csv.gz 114 MB
ngrams_english_2_20.csv.gz 114 MB
ngrams_english_2_21.csv.gz 114 MB
ngrams_english_2_22.csv.gz 113 MB
ngrams_english_2_23.csv.gz 114 MB
ngrams_english_2_24.csv.gz 113 MB
ngrams_english_2_25.csv.gz 114 MB
ngrams_english_2_26.csv.gz 115 MB
ngrams_english_2_27.csv.gz 113 MB
ngrams_english_2_28.csv.gz 112 MB
ngrams_english_2_29.csv.gz 112 MB
ngrams_english_2_30.csv.gz 112 MB
ngrams_english_2_31.csv.gz 112 MB
ngrams_english_2_32.csv.gz 112 MB
ngrams_english_2_33.csv.gz 111 MB
ngrams_english_2_34.csv.gz 112 MB
ngrams_english_2_35.csv.gz 113 MB
ngrams_english_2_36.csv.gz 112 MB
ngrams_english_2_37.csv.gz 111 MB
ngrams_english_2_38.csv.gz 110 MB
ngrams_english_2_39.csv.gz 110 MB
ngrams_english_2_40.csv.gz 110 MB
ngrams_english_2_41.csv.gz 110 MB
ngrams_english_2_42.csv.gz 112 MB
ngrams_english_2_43.csv.gz 110 MB
ngrams_english_2_44.csv.gz 109 MB
ngrams_english_2_45.csv.gz 109 MB
ngrams_english_2_46.csv.gz 109 MB
ngrams_english_2_47.csv.gz 109 MB
ngrams_english_2_48.csv.gz 111 MB
ngrams_english_2_49.csv.gz 109 MB
ngrams_english_2_50.csv.gz 108 MB
ngrams_english_2_51.csv.gz 109 MB
ngrams_english_2_52.csv.gz 108 MB
ngrams_english_2_53.csv.gz 109 MB
ngrams_english_2_54.csv.gz 109 MB
ngrams_english_2_55.csv.gz 108 MB
ngrams_english_2_56.csv.gz 108 MB
ngrams_english_2_57.csv.gz 108 MB
ngrams_english_2_58.csv.gz 108 MB
ngrams_english_2_59.csv.gz 109 MB
ngrams_english_2_60.csv.gz 108 MB
ngrams_english_2_61.csv.gz 107 MB
ngrams_english_2_62.csv.gz 107 MB
ngrams_english_2_63.csv.gz 107 MB
ngrams_english_2_64.csv.gz 107 MB
ngrams_english_2_65.csv.gz 107 MB
ngrams_english_2_66.csv.gz 107 MB
ngrams_english_2_67.csv.gz 106 MB
ngrams_english_2_68.csv.gz 106 MB
ngrams_english_2_69.csv.gz 106 MB
ngrams_english_2_70.csv.gz 106 MB
ngrams_english_2_71.csv.gz 105 MB
ngrams_english_2_72.csv.gz 105 MB
ngrams_english_2_73.csv.gz 105 MB
ngrams_english_2_74.csv.gz 106 MB
ngrams_english_2_75.csv.gz 105 MB
ngrams_english_2_76.csv.gz 104 MB
ngrams_english_2_77.csv.gz 105 MB
ngrams_english_2_78.csv.gz 104 MB
ngrams_english_2_79.csv.gz 104 MB
ngrams_english_2_80.csv.gz 104 MB
ngrams_english_2_81.csv.gz 104 MB
ngrams_english_2_82.csv.gz 103 MB
ngrams_english_2_83.csv.gz 103 MB
ngrams_english_2_84.csv.gz 103 MB
ngrams_english_2_85.csv.gz 103 MB
ngrams_english_2_86.csv.gz 103 MB
ngrams_english_2_87.csv.gz 42 MB
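
The file names in Table 2 follow the pattern ngrams_<language>_<n>_<part>.csv.gz, so the full list can be generated rather than typed out. The sketch below assumes only that pattern and the part counts visible in the table; PARTS and archive_names are hypothetical names chosen for this example.

```python
# Number of parts per (language, n) pair, counted from Table 2.
PARTS = {
    ("french", 1): 4,
    ("german", 1): 7,
    ("russian", 1): 10,
    ("spanish", 1): 4,
    ("english", 1): 6,
    ("english", 2): 87,
}


def archive_names() -> list[str]:
    # Parts are numbered from 1, in the same order as in Table 2.
    return [
        f"ngrams_{language}_{n}_{part}.csv.gz"
        for (language, n), count in PARTS.items()
        for part in range(1, count + 1)
    ]


names = archive_names()
print(len(names))  # 118, matching the number of download links
print(names[0], names[-1])
```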