
Using WordNet with Websolr


Let’s say that you have an online store with a lot of products. You want users to be able to search for those products, but you want that search to be smart. For example, say that your user searches for "bowtie pasta." You may have a product called “Funky Farfalle” which is related to their search term but which would not be returned in the results because the title has "farfalle" instead of "bowtie pasta". How do you address this issue?

Solr has a mechanism for defining custom synonyms, through the [SynonymFilterFactory](https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory). This lets search administrators define groups of related terms and even corrections to commonly misspelled terms. A typical synonyms.txt file might look like this:

```
i-pod, i pod => ipod
feline,kitten,cat,kitty
```

This is great for solving the proximate issue, but it can get extremely tedious to define all groups of related words in your index.


# What is WordNet?

WordNet has been described as a [lexical database](https://en.wikipedia.org/wiki/Lexical_database) by its creators. Essentially it is a text database which places English words into synsets - groups of synonyms - and can be considered as something of a cross between a dictionary and a thesaurus. An entry in WordNet looks something like this:

`s(102122298,1,"kitty",n,4,0).`

Let's break it down:
[block:image]
{
  "images": [
    {
      "image": [
        "https://files.readme.io/68e6dc7-synset-breakdown.png",
        "synset-breakdown.png",
        1200,
        442,
        "#ddeef8"
      ]
    }
  ]
}
[/block]
This line expresses that the word 'kitty' is a noun, and the first word in synset 102122298 (which includes other terms like "kitty-cat," "cat," and so on). The `4` is the sense number: this entry is the fourth sense of 'kitty' as a noun, with senses generally ordered from most to least frequently used. The final `0` is the tag count, the number of times this sense has been tagged in the semantic concordance texts. You can read more about the structure and precise definitions of WordNet entries in the [documentation](https://wordnet.princeton.edu/wordnet/man/prologdb.5WN.html).
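To make the field layout concrete, here is a minimal Python sketch that splits one entry into named fields. The names `S_FACT`, `SynsetEntry`, and `parse_s_fact` are hypothetical, chosen for this illustration. The sketch assumes the word may be wrapped in either double quotes (as shown above) or single quotes, and it treats the final tag count as optional.

```python
import re
from typing import NamedTuple, Optional

# Matches one entry such as: s(102122298,1,"kitty",n,4,0).
# Assumption: the word is quoted with " or ', and the last field may be absent.
S_FACT = re.compile(
    r"""^\s*s\((\d+),(\d+),(["'])(.*)\3,([a-z]),(\d+)(?:,(\d+))?\)\.\s*$"""
)


class SynsetEntry(NamedTuple):
    synset_id: int            # e.g. 102122298
    w_num: int                # position of the word within its synset
    word: str                 # e.g. "kitty"
    ss_type: str              # part of speech, e.g. "n" for noun
    sense_number: int         # which sense of the word this entry is
    tag_count: Optional[int]  # times tagged in the concordance texts, if present


def parse_s_fact(line: str) -> Optional[SynsetEntry]:
    """Return the parsed entry, or None if the line is not an s(...) entry."""
    m = S_FACT.match(line)
    if m is None:
        return None
    synset_id, w_num, _quote, word, ss_type, sense_number, tag_count = m.groups()
    return SynsetEntry(
        int(synset_id),
        int(w_num),
        word,
        ss_type,
        int(sense_number),
        int(tag_count) if tag_count is not None else None,
    )


if __name__ == "__main__":
    print(parse_s_fact('s(102122298,1,"kitty",n,4,0).'))
```

Lines that do not look like an `s(...)` entry return `None`, so a caller can skip them rather than fail.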

WordNet has become extremely useful in text processing applications, including data storage and retrieval. Some use cases require features like synonym processing, for which a lexical grouping of tokens is invaluable.


## How Do I Use the WordNet list with Websolr?

Websolr allows users to maintain their own synonyms list for each index via the dashboard. Unfortunately, the raw WordNet synonyms are maintained in a Prolog file called `wn_s.pl`, the format of which is not compatible with Solr's SynonymFilterFactory. Recall the format of a Solr synonym file:

```
i-pod, i pod => ipod
feline,kitten,cat,kitty
```

Contrast this with the standard format of WordNet lists:

`s(102122298,1,"kitty",n,4,0).`

In short, the raw WordNet synonyms list cannot simply be copied and pasted into Websolr.

It is possible to [generate a flat file](https://gist.github.com/bradfordcp/562776) compatible with Solr's synonym handling, but it involves compiling a Java program to read the source file and produce a Solr-compatible file. Users have reported some success with this approach; however, it is not officially supported. If this is something you're comfortable with doing, then we encourage you to try it out.

If generating a flat file seems tedious and overly complex, then you may want to consider forgoing WordNet expansion and simply creating your own synonyms list based on the common tokens in your index. For most purposes, WordNet expansion will yield only marginal gains in relevance, whereas a more narrowly defined set of synonyms based on your corpus will have a far greater impact.

## Resources

WordNet is a large subject and a great topic to delve deeper into. Here are some links for further reading:

- WordNet, a Princeton Project: https://wordnet.princeton.edu/wordnet/
- Download WordNet: https://wordnet.princeton.edu/wordnet/download/
- WordNet Documentation: https://wordnet.princeton.edu/wordnet/documentation/
- WordNet in other languages: http://globalwordnet.org/
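
As a rough illustration of the flat-file conversion discussed under "How Do I Use the WordNet list with Websolr?", here is a minimal Python sketch. It is not the Java program from the linked gist and is not officially supported. The names `S_FACT` and `wordnet_to_solr_lines` are hypothetical. The sketch assumes words are quoted with either double or single quotes, groups words by synset ID, and conservatively skips any word containing a comma or `=>` rather than trying to escape it.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Captures synset ID, word number, and the quoted word from entries such as
# s(102122298,1,"kitty",n,4,0).  Assumption: the last field may be absent.
S_FACT = re.compile(
    r"""^\s*s\((\d+),(\d+),(["'])(.*)\3,[a-z],\d+(?:,\d+)?\)\.\s*$"""
)


def wordnet_to_solr_lines(facts: Iterable[str]) -> List[str]:
    """Group WordNet entries by synset and emit one comma-separated line each."""
    synsets: Dict[int, List[Tuple[int, str]]] = defaultdict(list)
    for line in facts:
        m = S_FACT.match(line)
        if m is None:
            continue  # not an s(...) entry
        synset_id, w_num, quote, word = m.groups()
        if quote == "'":
            # Prolog doubles an apostrophe inside a single-quoted atom.
            word = word.replace("''", "'")
        if "," in word or "=>" in word:
            continue  # would clash with the synonym file syntax; skip it
        synsets[int(synset_id)].append((int(w_num), word))

    lines: List[str] = []
    for synset_id in sorted(synsets):
        words: List[str] = []
        for _, word in sorted(synsets[synset_id]):
            if word not in words:
                words.append(word)
        if len(words) > 1:  # a synset with a single word has no synonyms
            lines.append(",".join(words))
    return lines


if __name__ == "__main__":
    sample = [
        's(102122298,1,"kitty",n,4,0).',
        's(102122298,2,"kitten",n,1,0).',
    ]
    for out in wordnet_to_solr_lines(sample):
        print(out)
```

Each returned line is a group of equivalent terms in the same shape as the `feline,kitten,cat,kitty` example. Given the relevance caveat above, it is probably worth reviewing and trimming the output against your own corpus before adding it to an index.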