# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# a couple of test stopwords to test that the words are really being
# configured from this file:
stopworda
stopwordb

# Standard english stop words taken from Lucene's StopAnalyzer
a
an
and
are
as
at
be
but
by
for
if
in
into
is
it
no
not
of
on
or
such
that
the
their
then
there
these
they
this
to
was
will
with

| From svn.tartarus.org/snowball/trunk/website/algorithms/german/stop.txt
| This file is distributed under the BSD License.
| See http://snowball.tartarus.org/license.php
| Also see http://www.opensource.org/licenses/bsd-license.html
| - Encoding was converted to UTF-8.
| - This notice was added.
|
| NOTE: To use this file with StopFilterFactory, you must specify format="snowball"

| A German stop word list. Comments begin with vertical bar. Each stop
| word is at the start of a line.

| The number of forms in this list is reduced significantly by passing it
| through the German stemmer.

aber | but
alle | all
allem
allen
aller
alles
als | than, as
also | so
am | an + dem
an | at
ander | other
andere
anderem
anderen
anderer
anderes
anderm
andern
anderr
anders
auch | also
auf | on
aus | out of
bei | by
bin | am
bis | until
bist | art
da | there
damit | with it
dann | then
der | the
den
des
dem
die
das
daß | that
derselbe | the same
derselben
denselben
desselben
demselben
dieselbe
dieselben
dasselbe
dazu | to that
dein | thy
deine
deinem
deinen
deiner
deines
denn | because
derer | of those
dessen | of him
dich | thee
dir | to thee
du | thou
dies | this
diese
diesem
diesen
dieser
dieses
doch | (several meanings)
dort | (over) there
durch | through
ein | a
eine
einem
einen
einer
eines
einig | some
einige
einigem
einigen
einiger
einiges
einmal | once
er | he
ihn | him
ihm | to him
es | it
etwas | something
euer | your
eure
eurem
euren
eurer
eures
für | for
gegen | towards
gewesen | p.p. of sein
hab | have
habe | have
haben | have
hat | has
hatte | had
hatten | had
hier | here
hin | there
hinter | behind
ich | I
mich | me
mir | to me
ihr | you, to her
ihre
ihrem
ihren
ihrer
ihres
euch | to you
im | in + dem
in | in
indem | while
ins | in + das
ist | is
jede | each, every
jedem
jeden
jeder
jedes
jene | that
jenem
jenen
jener
jenes
jetzt | now
kann | can
kein | no
keine
keinem
keinen
keiner
keines
können | can
könnte | could
machen | do
man | one
manche | some, many a
manchem
manchen
mancher
manches
mein | my
meine
meinem
meinen
meiner
meines
mit | with
muss | must
musste | had to
nach | to(wards)
nicht | not
nichts | nothing
noch | still, yet
nun | now
nur | only
ob | whether
oder | or
ohne | without
sehr | very
sein | his
seine
seinem
seinen
seiner
seines
selbst | self
sich | herself
sie | they, she
ihnen | to them
sind | are
so | so
solche | such
solchem
solchen
solcher
solches
soll | shall
sollte | should
sondern | but
sonst | else
über | over
um | about, around
und | and
uns | us
unse
unsem
unsen
unser
unses
unter | under
viel | much
vom | von + dem
von | from
vor | before
während | while
war | was
waren | were
warst | wast
was | what
weg | away, off
weil | because
weiter | further
welche | which
welchem
welchen
welcher
welches
wenn | when
werde | will
werden | will
wie | how
wieder | again
will | want
wir | we
wird | will
wirst | willst
wo | where
wollen | want
wollte | wanted
würde | would
würden | would
zu | to
zum | zu + dem
zur | zu + der
zwar | indeed
zwischen | between

| From svn.tartarus.org/snowball/trunk/website/algorithms/italian/stop.txt
| This file is distributed under the BSD License.
| See http://snowball.tartarus.org/license.php
| Also see http://www.opensource.org/licenses/bsd-license.html
| - Encoding was converted to UTF-8.
| - This notice was added.
|
| NOTE: To use this file with StopFilterFactory, you must specify format="snowball"

| An Italian stop word list. Comments begin with vertical bar. Each stop
| word is at the start of a line.

ad | a (to) before vowel
al | a + il
allo | a + lo
ai | a + i
agli | a + gli
all | a + l'
agl | a + gl'
alla | a + la
alle | a + le
con | with
col | con + il
coi | con + i (forms collo, cogli etc are now very rare)
da | from
dal | da + il
dallo | da + lo
dai | da + i
dagli | da + gli
dall | da + l'
dagl | da + gl'
dalla | da + la
dalle | da + le
di | of
del | di + il
dello | di + lo
dei | di + i
degli | di + gli
dell | di + l'
degl | di + gl'
della | di + la
delle | di + le
in | in
nel | in + il
nello | in + lo
nei | in + i
negli | in + gli
nell | in + l'
negl | in + gl'
nella | in + la
nelle | in + le
su | on
sul | su + il
sullo | su + lo
sui | su + i
sugli | su + gli
sull | su + l'
sugl | su + gl'
sulla | su + la
sulle | su + le
per | through, by
tra | among
contro | against
io | I
tu | thou
lui | he
lei | she
noi | we
voi | you
loro | they
mio | my
mia |
miei |
mie |
tuo |
tua |
tuoi | thy
tue |
suo |
sua |
suoi | his, her
sue |
nostro | our
nostra |
nostri |
nostre |
vostro | your
vostra |
vostri |
vostre |
mi | me
ti | thee
ci | us, there
vi | you, there
lo | him, the
la | her, the
li | them
le | them, the
gli | to him, the
ne | from there etc
il | the
un | a
uno | a
una | a
ma | but
ed | and
se | if
perché | why, because
anche | also
come | how
dov | where (as dov')
dove | where
che | who, that
chi | who
cui | whom
non | not
più | more
quale | who, that
quanto | how much
quanti |
quanta |
quante |
quello | that
quelli |
quella |
quelle |
questo | this
questi |
questa |
queste |
si | yes
tutto | all
tutti | all

| single letter forms:

a | at
c | as c' for ce or ci
e | and
i | the
l | as l'
o | or

| forms of avere, to have (not including the infinitive):

ho
hai
ha
abbiamo
avete
hanno
abbia
abbiate
abbiano
avrò
avrai
avrà
avremo
avrete
avranno
avrei
avresti
avrebbe
avremmo
avreste
avrebbero
avevo
avevi
aveva
avevamo
avevate
avevano
ebbi
avesti
ebbe
avemmo
aveste
ebbero
avessi
avesse
avessimo
avessero
avendo
avuto
avuta
avuti
avute

| forms of essere, to be (not including the infinitive):

sono
sei
è
siamo
siete
sia
siate
siano
sarò
sarai
sarà
saremo
sarete
saranno
sarei
saresti
sarebbe
saremmo
sareste
sarebbero
ero
eri
era
eravamo
eravate
erano
fui
fosti
fu
fummo
foste
furono
fossi
fosse
fossimo
fossero
essendo

| forms of fare, to do (not including the infinitive, fa, fat-):

faccio
fai
facciamo
fanno
faccia
facciate
facciano
farò
farai
farà
faremo
farete
faranno
farei
faresti
farebbe
faremmo
fareste
farebbero
facevo
facevi
faceva
facevamo
facevate
facevano
feci
facesti
fece
facemmo
faceste
fecero
facessi
facesse
facessimo
facessero
facendo

| forms of stare, to be (not including the infinitive):

sto
stai
sta
stiamo
stanno
stia
stiate
stiano
starò
starai
starà
staremo
starete
staranno
starei
staresti
starebbe
staremmo
stareste
starebbero
stavo
stavi
stava
stavamo
stavate
stavano
stetti
stesti
stette
stemmo
steste
stettero
stessi
stesse
stessimo
stessero
stando

| From svn.tartarus.org/snowball/trunk/website/algorithms/spanish/stop.txt
| This file is distributed under the BSD License.
| See http://snowball.tartarus.org/license.php
| Also see http://www.opensource.org/licenses/bsd-license.html
| - Encoding was converted to UTF-8.
| - This notice was added.
|
| NOTE: To use this file with StopFilterFactory, you must specify format="snowball"

| A Spanish stop word list. Comments begin with vertical bar. Each stop
| word is at the start of a line.

| The following is a ranked list (commonest to rarest) of stopwords
| deriving from a large sample of text.

| Extra words have been added at the end.

de | from, of
la | the, her
que | who, that
el | the
en | in
y | and
a | to
los | the, them
del | de + el
se | himself, from him etc
las | the, them
por | for, by, etc
un | a
para | for
con | with
no | no
una | a
su | his, her
al | a + el
| es from SER
lo | him
como | how
más | more
pero | but
sus | su plural
le | to him, her
ya | already
o | or
| fue from SER
este | this
| ha from HABER
sí | himself etc
porque | because
esta | this
| son from SER
entre | between
| está from ESTAR
cuando | when
muy | very
sin | without
sobre | on
| ser from SER
| tiene from TENER
también | also
me | me
hasta | until
hay | there is/are
donde | where
| han from HABER
quien | whom, that
| están from ESTAR
| estado from ESTAR
desde | from
todo | all
nos | us
durante | during
| estados from ESTAR
todos | all
uno | a
les | to them
ni | nor
contra | against
otros | other
| fueron from SER
ese | that
eso | that
| había from HABER
ante | before
ellos | they
e | and (variant of y)
esto | this
mí | me
antes | before
algunos | some
qué | what?
unos | a
yo | I
otro | other
otras | other
otra | other
él | he
tanto | so much, many
esa | that
estos | these
mucho | much, many
quienes | who
nada | nothing
muchos | many
cual | who
| sea from SER
poco | few
ella | she
estar | to be
| haber from HABER
estas | these
| estaba from ESTAR
| estamos from ESTAR
algunas | some
algo | something
nosotros | we

| other forms

mi | my
mis | mi plural
tú | thou
te | thee
ti | thee
tu | thy
tus | tu plural
ellas | they
nosotras | we
vosotros | you
vosotras | you
os | you
mío | mine
mía |
míos |
mías |
tuyo | thine
tuya |
tuyos |
tuyas |
suyo | his, hers, theirs
suya |
suyos |
suyas |
nuestro | ours
nuestra |
nuestros |
nuestras |
vuestro | yours
vuestra |
vuestros |
vuestras |
esos | those
esas | those

| forms of estar, to be (not including the infinitive):

estoy
estás
está
estamos
estáis
están
esté
estés
estemos
estéis
estén
estaré
estarás
estará
estaremos
estaréis
estarán
estaría
estarías
estaríamos
estaríais
estarían
estaba
estabas
estábamos
estabais
estaban
estuve
estuviste
estuvo
estuvimos
estuvisteis
estuvieron
estuviera
estuvieras
estuviéramos
estuvierais
estuvieran
estuviese
estuvieses
estuviésemos
estuvieseis
estuviesen
estando
estado
estada
estados
estadas
estad

| forms of haber, to have (not including the infinitive):

he
has
ha
hemos
habéis
han
haya
hayas
hayamos
hayáis
hayan
habré
habrás
habrá
habremos
habréis
habrán
habría
habrías
habríamos
habríais
habrían
había
habías
habíamos
habíais
habían
hube
hubiste
hubo
hubimos
hubisteis
hubieron
hubiera
hubieras
hubiéramos
hubierais
hubieran
hubiese
hubieses
hubiésemos
hubieseis
hubiesen
habiendo
habido
habida
habidos
habidas

| forms of ser, to be (not including the infinitive):

soy
eres
es
somos
sois
son
sea
seas
seamos
seáis
sean
seré
serás
será
seremos
seréis
serán
sería
serías
seríamos
seríais
serían
era
eras
éramos
erais
eran
fui
fuiste
fue
fuimos
fuisteis
fueron
fuera
fueras
fuéramos
fuerais
fueran
fuese
fueses
fuésemos
fueseis
fuesen
siendo
sido
| sed also means 'thirst'

| forms of tener, to have (not including the infinitive):

tengo
tienes
tiene
tenemos
tenéis
tienen
tenga
tengas
tengamos
tengáis
tengan
tendré
tendrás
tendrá
tendremos
tendréis
tendrán
tendría
tendrías
tendríamos
tendríais
tendrían
tenía
tenías
teníamos
teníais
tenían
tuve
tuviste
tuvo
tuvimos
tuvisteis
tuvieron
tuviera
tuvieras
tuviéramos
tuvierais
tuvieran
tuviese
tuvieses
tuviésemos
tuvieseis
tuviesen
teniendo
tenido
tenida
tenidos
tenidas
tened

| From svn.tartarus.org/snowball/trunk/website/algorithms/french/stop.txt
| This file is distributed under the BSD License.
| See http://snowball.tartarus.org/license.php
| Also see http://www.opensource.org/licenses/bsd-license.html
| - Encoding was converted to UTF-8.
| - This notice was added.
|
| NOTE: To use this file with StopFilterFactory, you must specify format="snowball"

| A French stop word list. Comments begin with vertical bar. Each stop
| word is at the start of a line.

au | a + le
aux | a + les
avec | with
ce | this
ces | these
dans | in
de | of
des | de + les
du | de + le
elle | she
en | 'of them' etc
et | and
eux | them
il | he
je | I
la | the
le | the
leur | their
lui | him
ma | my (fem)
mais | but
me | me
même | same; as in moi-même (myself) etc
mes | my (pl)
moi | me
mon | my (masc)
ne | not
nos | our (pl)
notre | our
nous | we
on | one
ou | or
par | by
pas | not
pour | for
qu | que before vowel
que | that
qui | who
sa | his, her (fem)
se | oneself
ses | his (pl)
son | his, her (masc)
sur | on
ta | thy (fem)
te | thee
tes | thy (pl)
toi | thee
ton | thy (masc)
tu | thou
un | a
une | a
vos | your (pl)
votre | your
vous | you

| single letter forms

c | c'
d | d'
j | j'
l | l'
à | to, at
m | m'
n | n'
s | s'
t | t'
y | there

| forms of être (not including the infinitive):

été
étée
étées
étés
étant
suis
es
est
sommes
êtes
sont
serai
seras
sera
serons
serez
seront
serais
serait
serions
seriez
seraient
étais
était
étions
étiez
étaient
fus
fut
fûmes
fûtes
furent
sois
soit
soyons
soyez
soient
fusse
fusses
fût
fussions
fussiez
fussent

| forms of avoir (not including the infinitive):

ayant
eu
eue
eues
eus
ai
as
avons
avez
ont
aurai
auras
aura
aurons
aurez
auront
aurais
aurait
aurions
auriez
auraient
avais
avait
avions
aviez
avaient
eut
eûmes
eûtes
eurent
aie
aies
ait
ayons
ayez
aient
eusse
eusses
eût
eussions
eussiez
eussent

| Later additions (from Jean-Christophe Deschamps)

ceci | this
cela | that
celà | that
cet | this
cette | this
ici | here
ils | they
les | the (pl)
leurs | their (pl)
quel | which
quels | which
quelle | which
quelles | which
sans | without
soi | oneself

| From svn.tartarus.org/snowball/trunk/website/algorithms/portuguese/stop.txt
| This file is distributed under the BSD License.
| See http://snowball.tartarus.org/license.php
| Also see http://www.opensource.org/licenses/bsd-license.html
| - Encoding was converted to UTF-8.
| - This notice was added.
|
| NOTE: To use this file with StopFilterFactory, you must specify format="snowball"

| A Portuguese stop word list. Comments begin with vertical bar. Each stop
| word is at the start of a line.

| The following is a ranked list (commonest to rarest) of stopwords
| deriving from a large sample of text.

| Extra words have been added at the end.

de | of, from
a | the; to, at; her
o | the; him
que | who, that
e | and
do | de + o
da | de + a
em | in
um | a
para | for
| é from SER
com | with
não | not, no
uma | a
os | the; them
no | em + o
se | himself etc
na | em + a
por | for
mais | more
as | the; them
dos | de + os
como | as, like
mas | but
| foi from SER
ao | a + o
ele | he
das | de + as
| tem from TER
à | a + a
seu | his
sua | her
ou | or
| ser from SER
quando | when
muito | much
| há from HAV
nos | em + os; us
já | already, now
| está from EST
eu | I
também | also
só | only, just
pelo | per + o
pela | per + a
até | up to
isso | that
ela | she
entre | between
| era from SER
depois | after
sem | without
mesmo | same
aos | a + os
| ter from TER
seus | his
quem | whom
nas | em + as
me | me
esse | that
eles | they
| estão from EST
você | you
| tinha from TER
| foram from SER
essa | that
num | em + um
nem | nor
suas | her
meu | my
às | a + as
minha | my
| têm from TER
numa | em + uma
pelos | per + os
elas | they
| havia from HAV
| seja from SER
qual | which
| será from SER
nós | we
| tenho from TER
lhe | to him, her
deles | of them
essas | those
esses | those
pelas | per + as
este | this
| fosse from SER
dele | of him

| other words. There are many contractions such as naquele = em+aquele,
| mo = me+o, but they are rare.
| Indefinite article plural forms are also rare.

tu | thou
te | thee
vocês | you (plural)
vos | you
lhes | to them
meus | my
minhas
teu | thy
tua
teus
tuas
nosso | our
nossa
nossos
nossas
dela | of her
delas | of them
esta | this
estes | these
estas | these
aquele | that
aquela | that
aqueles | those
aquelas | those
isto | this
aquilo | that

| forms of estar, to be (not including the infinitive):

estou
está
estamos
estão
estive
esteve
estivemos
estiveram
estava
estávamos
estavam
estivera
estivéramos
esteja
estejamos
estejam
estivesse
estivéssemos
estivessem
estiver
estivermos
estiverem

| forms of haver, to have (not including the infinitive):

hei
há
havemos
hão
houve
houvemos
houveram
houvera
houvéramos
haja
hajamos
hajam
houvesse
houvéssemos
houvessem
houver
houvermos
houverem
houverei
houverá
houveremos
houverão
houveria
houveríamos
houveriam

| forms of ser, to be (not including the infinitive):

sou
somos
são
era
éramos
eram
fui
foi
fomos
foram
fora
fôramos
seja
sejamos
sejam
fosse
fôssemos
fossem
for
formos
forem
serei
será
seremos
serão
seria
seríamos
seriam

| forms of ter, to have (not including the infinitive):

tenho
tem
temos
tém
tinha
tínhamos
tinham
tive
teve
tivemos
tiveram
tivera
tivéramos
tenha
tenhamos
tenham
tivesse
tivéssemos
tivessem
tiver
tivermos
tiverem
terei
terá
teremos
terão
teria
teríamos
teriam

# This file was created by Jacques Savoy and is distributed under the BSD license.
# See http://members.unine.ch/jacques.savoy/clef/index.html.
# Also see http://www.opensource.org/licenses/bsd-license.html

# Cleaned on October 11, 2009 (not normalized, so use before normalization)
# This means that when modifying this list, you might need to add some
# redundant entries, for example containing forms with both أ and ا
من
ومن
منها
منه
في
وفي
فيها
فيه
و
ف
ثم
او
أو
ب
بها
به
ا
أ
اى
اي
أي
أى
لا
ولا
الا
ألا
إلا
لكن
ما
وما
كما
فما
عن
مع
اذا
إذا
ان
أن
إن
انها
أنها
إنها
انه
أنه
إنه
بان
بأن
فان
فأن
وان
وأن
وإن
التى
التي
الذى
الذي
الذين
الى
الي
إلى
إلي
على
عليها
عليه
اما
أما
إما
ايضا
أيضا
كل
وكل
لم
ولم
لن
ولن
هى
هي
هو
وهى
وهي
وهو
فهى
فهي
فهو
انت
أنت
لك
لها
له
هذه
هذا
تلك
ذلك
هناك
كانت
كان
يكون
تكون
وكانت
وكان
غير
بعض
قد
نحو
بين
بينما
منذ
ضمن
حيث
الان
الآن
خلال
بعد
قبل
حتى
عند
عندما
لدى
جميع