{"nbformat":4,"nbformat_minor":0,"metadata":{"accelerator":"GPU","colab":{"name":"Conversational Search Project.ipynb","provenance":[],"collapsed_sections":[],"toc_visible":true},"kernelspec":{"display_name":"Python 3","language":"python","name":"python3"},"language_info":{"codemirror_mode":{"name":"ipython","version":3},"file_extension":".py","mimetype":"text/x-python","name":"python","nbconvert_exporter":"python","pygments_lexer":"ipython3","version":"3.7.7"}},"cells":[{"cell_type":"markdown","metadata":{"id":"gBxCZ86oxvk_"},"source":["# Conversational Search\n","\n","In this notebook you will implement the following steps:\n","\n","- **Answer selection + evaluation**: Implement a *search-based* conversation framework and an evaluation framework to evaluate conversation topics made up of conversation turns.\n","- **Answer ranking**: Implement a *re-ranking method* to sort the initial search results. Evaluate the re-ranked results.\n","- **Conversation context**: Implement a conversational context modeling method to keep track of the conversation state. \n","\n","Submission dates:\n","- **20 October**: retrieval + evaluation\n","- **20 November**: passage re-ranking\n","- **20 December**: conversation state tracking\n","\n","## Test bed and conversation topics\n","The TREC CAST corpus (http://www.treccast.ai/) for Conversational Search is indexed in this cluster and can be searched through an ElasticSearch API.\n","\n","The queries and the relevance judgments are available through the `ConvSearchEvaluation` class:"]},{"cell_type":"markdown","metadata":{"id":"g5foqcha2wcJ"},"source":["# Google Colab Setup\n","\n","The following steps are already implemented in the cell below. You need to download the starting project folder, upload it, adjust the paths, and finally run the notebook.\n","\n","\n","1. Download the shared project folder as a zip;\n","2. Unzip and re-upload to a folder of your own GDrive;\n","3. 
Mount your GDrive on the Colab working environment;\n","\n","Note: You will be asked to complete a Google Authorization procedure by following a link and pasting a code into the notebook.\n","\n","4. Copy the contents from the folder you uploaded to the Colab working dir;\n","5. Add sys path locations to run aux Python scripts;\n","6. Install dependencies.\n","\n","After going through all these steps you should be able to run all the cells in the notebook."]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"MISUMtNex9HR","executionInfo":{"status":"ok","timestamp":1606937782053,"user_tz":0,"elapsed":6312,"user":{"displayName":"Alexandre Correia","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GiAdzJoM5468LWKEyTHtf1ncHk5IZIRjrg5X6Pe=s64","userId":"16404568839350536453"}},"outputId":"fb68c744-c645-4c7d-925a-a0ff95332414"},"source":["# Colab Setup\n","# Mount your Google Drive\n","from google.colab import drive\n","drive.mount('/content/drive')\n","\n","# After downloading the shared starting point folder as a Zip\n","# Unzip it and re-upload it to a location on your GDrive\n","\n","# This command copies the contents from the folder you uploaded to GDrive, to the colab working dir\n","!cp -r /content/drive/My\\ Drive/faculdade/fct-miei/04_ano4_\\(year4\\)/semestre1/ri/ProjectoRI2020 /content\n","\n","# Add working dir to the sys path, so that we can find the aux python files when running the Notebook\n","import sys\n","if '/content/ProjectoRI2020' not in sys.path:\n"," sys.path += ['/content/ProjectoRI2020']\n","\n","# Finally install required dependencies to run the notebook\n","!pip install elasticsearch\n","!pip install bert-serving-client"],"execution_count":1,"outputs":[{"output_type":"stream","text":["Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount(\"/content/drive\", force_remount=True).\n","Requirement already satisfied: elasticsearch in /usr/local/lib/python3.6/dist-packages 
(7.10.0)\n","Requirement already satisfied: certifi in /usr/local/lib/python3.6/dist-packages (from elasticsearch) (2020.11.8)\n","Requirement already satisfied: urllib3<2,>=1.21.1 in /usr/local/lib/python3.6/dist-packages (from elasticsearch) (1.24.3)\n","Requirement already satisfied: bert-serving-client in /usr/local/lib/python3.6/dist-packages (1.10.0)\n","Requirement already satisfied: pyzmq>=17.1.0 in /usr/local/lib/python3.6/dist-packages (from bert-serving-client) (20.0.0)\n","Requirement already satisfied: numpy in /usr/local/lib/python3.6/dist-packages (from bert-serving-client) (1.18.5)\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"id":"IK60SsoNqcH5","executionInfo":{"status":"ok","timestamp":1606937782057,"user_tz":0,"elapsed":6294,"user":{"displayName":"Alexandre Correia","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GiAdzJoM5468LWKEyTHtf1ncHk5IZIRjrg5X6Pe=s64","userId":"16404568839350536453"}}},"source":["PROJ_DIR = \"/content/drive/My Drive/faculdade/fct-miei/04_ano4_(year4)/semestre1/ri/infos_projeto\"\n","\n","RUN_PHASE = 3\n","UPDATE_ELASTICSEARCH_RESULTS = False\n","UPDATE_BERT_RESULTS = False\n","REL_DOCS_PER_TURN = 10\n","IDX = {\n"," 'TRAIN': (1, 2, 4, 7, 15, 17,18,22,23,24,25,27,30),\n"," 'TEST': (31, 32, 33, 34, 37, 40, 49, 50, 54, 56, 58, 59, 61, 67, 68, 69, 75, 77, 78, 79)\n","}\n","SET_NAME = {\n"," 'TRAIN': \"train\",\n"," 'TEST': \"test\"\n","}\n","\n","if RUN_PHASE == 1:\n"," if REL_DOCS_PER_TURN == 10:\n"," !rm -rf /content/drive/My\\ Drive/faculdade/fct-miei/04_ano4_\\(year4\\)/semestre1/ri/infos_projeto/plots/train/10/*\n"," !rm -rf /content/drive/My\\ Drive/faculdade/fct-miei/04_ano4_\\(year4\\)/semestre1/ri/infos_projeto/plots/test/10/*\n"," if REL_DOCS_PER_TURN == 100:\n"," !rm -rf /content/drive/My\\ Drive/faculdade/fct-miei/04_ano4_\\(year4\\)/semestre1/ri/infos_projeto/plots/train/100/*\n"," !rm -rf /content/drive/My\\ 
Drive/faculdade/fct-miei/04_ano4_\\(year4\\)/semestre1/ri/infos_projeto/plots/test/100/*\n"," if REL_DOCS_PER_TURN == 1000:\n"," !rm -rf /content/drive/My\\ Drive/faculdade/fct-miei/04_ano4_\\(year4\\)/semestre1/ri/infos_projeto/plots/train/1000/*\n"," !rm -rf /content/drive/My\\ Drive/faculdade/fct-miei/04_ano4_\\(year4\\)/semestre1/ri/infos_projeto/plots/test/1000/*"],"execution_count":2,"outputs":[]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"VYkmLLq_xvlC","executionInfo":{"status":"ok","timestamp":1606937782060,"user_tz":0,"elapsed":6285,"user":{"displayName":"Alexandre Correia","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GiAdzJoM5468LWKEyTHtf1ncHk5IZIRjrg5X6Pe=s64","userId":"16404568839350536453"}},"outputId":"4dfc8733-4cea-4a12-f7d4-056a96947f94"},"source":["import TRECCASTeval as trec\n","import numpy as np\n","\n","from sklearn.metrics.pairwise import cosine_similarity\n","\n","import ElasticSearchSimpleAPI as es\n","import numpy as np\n","\n","import pprint as pprint\n","\n","test_bed = trec.ConvSearchEvaluation()\n","\n","print()\n","print(\"========================================== Training conversations =====\")\n","topics = {}\n","for topic in test_bed.train_topics:\n"," conv_id = topic['number']\n","\n"," if conv_id not in (1, 2, 4, 7, 15, 17,18,22,23,24,25,27,30):\n"," continue\n","\n"," print()\n"," print(conv_id, \" \", topic['title'])\n","\n"," for turn in topic['turn']:\n"," turn_id = turn['number']\n"," utterance = turn['raw_utterance']\n"," topic_turn_id = '%d_%d'% (conv_id, turn_id)\n"," \n"," print(topic_turn_id, utterance)\n"," topics[topic_turn_id] = utterance\n","\n","print()\n","print(\"========================================== Test conversations =====\")\n","for topic in test_bed.test_topics:\n"," conv_id = topic['number']\n","\n"," if conv_id not in (31, 32, 33, 34, 37, 40, 49, 50, 54, 56, 58, 59, 61, 67, 68, 69, 75, 77, 78, 79):\n"," continue\n","\n"," print()\n"," print(conv_id, \" 
\", topic['title'])\n","\n"," for turn in topic['turn']:\n"," turn_id = turn['number']\n"," utterance = turn['raw_utterance']\n"," topic_turn_id = '%d_%d'% (conv_id, turn_id)\n"," \n"," print(topic_turn_id, utterance)\n"," topics[topic_turn_id] = utterance\n"],"execution_count":3,"outputs":[{"output_type":"stream","text":["\n","========================================== Training conversations =====\n","\n","1 Career choice for Nursing and Physician's Assistant\n","1_1 What is a physician's assistant?\n","1_2 What are the educational requirements required to become one?\n","1_3 What does it cost?\n","1_4 What's the average starting salary in the UK?\n","1_5 What about in the US?\n","1_6 What school subjects are needed to become a registered nurse?\n","1_7 What is the PA average salary vs an RN?\n","1_8 What the difference between a PA and a nurse practitioner?\n","1_9 Do NPs or PAs make more?\n","1_10 Is a PA above a NP?\n","1_11 What is the fastest way to become a NP?\n","1_12 How much longer does it take to become a doctor after being an NP?\n","\n","2 Goat breeds\n","2_1 What are the main breeds of goat?\n","2_2 Tell me about boer goats.\n","2_3 What breed is good for meat?\n","2_4 Are angora goats good for it?\n","2_5 What about boer goats?\n","2_6 What are pygmies used for?\n","2_7 What is the best for fiber production?\n","2_8 How long do Angora goats live?\n","2_9 Can you milk them?\n","2_10 How many can you have per acre?\n","2_11 Are they profitable?\n","\n","4 The Neolithic Revolution\n","4_1 What was the neolithic revolution?\n","4_2 When did it start and end?\n","4_3 Why did it start?\n","4_4 What did the neolithic invent?\n","4_5 What tools were used?\n","4_6 When was it brought to the british isles?\n","4_7 Describe the period that follows it.\n","4_8 What was its significance?\n","4_9 What were the houses like?\n","\n","7 Macromolecules\n","7_1 What are the different types of macromolecules?\n","7_2 Tell me about the characteristics of 
carbohydrates.\n","7_3 What are they composed of?\n","7_4 What is their basic structure?\n","7_5 What are the main types with examples of each?\n","7_6 Tell me about lipids.\n","7_7 What is their function?\n","7_8 What are the types of lipids?\n","7_9 What is the most common?\n","7_10 What is the difference between them and carbohydrates?\n","7_11 Why are carbs better?\n","\n","15 International Linguistics Olympiad\n","15_1 Tell me about the International Linguistics Olympiad.\n","15_2 How do I prepare for it?\n","15_3 How tough is the exam?\n","15_4 What kind of problems can I expect?\n","15_5 Tell me about the history of linguistics as a field.\n","15_6 Who was Panini and what were his contributions?\n","15_7 What is the link between linguistics and computer science?\n","\n","17 Forms of energy\n","17_1 What are the different forms of energy?\n","17_2 How can it be stored?\n","17_3 What type of energy is used in motion?\n","17_4 Tell me about mechanical energy.\n","17_5 Give me some examples.\n","17_6 Why is sound a form of mechanical energy?\n","17_7 How does it differ from potential energy?\n","17_8 Are potential and kinetic the same?\n","17_9 What type does chemical energy belong to?\n","17_10 What form of energy is used in eating?\n","\n","18 Uranus and Neptune\n","18_1 Describe Uranus.\n","18_2 What makes it so unusual?\n","18_3 Tell me about its orbit.\n","18_4 Why is it tilted?\n","18_5 How is its rotation different from other planets?\n","18_6 What is peculiar about its seasons?\n","18_7 Are there any other planets similar to it?\n","18_8 Describe the characteristics of Neptune.\n","18_9 Why is it important to our solar system?\n","18_10 How are these two planets similar to each other?\n","18_11 Can life exist on either of them?\n","\n","22 Spices\n","22_1 What are the most common spices used in cooking?\n","22_2 How are they different from herbs?\n","22_3 Why do spices taste good?\n","22_4 Where do most of them come from?\n","22_5 What cuisines use them 
heavily?\n","22_6 What are the most popular Indian ones?\n","22_7 Tell me about turmeric.\n","22_8 Are there any benefits of consuming it?\n","22_9 What is the oldest spice?\n","22_10 What is the most expensive and why?\n","\n","23 Cattle farming for meat and dairy\n","23_1 What are the main types of cattle farming?\n","23_2 What breeds produce the most milk?\n","23_3 How much do Holsteins produce?\n","23_4 What is special about Jersey milk?\n","23_5 Do we eat dairy cows?\n","23_6 What are the most common breeds for meat?\n","23_7 Where are angus beef from?\n","23_8 How much space do they need?\n","\n","24 Smoking cessation\n","24_1 How can I help my friend stop smoking?\n","24_2 What are its long term effects?\n","24_3 Does it cause cancer?\n","24_4 What makes it so addictive?\n","24_5 Are there any alternatives to nicotine?\n","24_6 How effective are the patches?\n","24_7 Tell me about the cold turkey method.\n","24_8 Describe typical withdrawal symptoms.\n","\n","25 Killer whales\n","25_1 Tell me about Orca whales.\n","25_2 Are they really whales?\n","25_3 How do they hunt?\n","25_4 What do they eat?\n","25_5 Where did they get their name?\n","25_6 Are they dangerous?\n","25_7 Where can go to watch them?\n","25_8 When is a good season for spotting them?\n","25_9 How does captivity affect them? 
\n","\n","27 Dietary needs\n","27_1 What comprises a balanced diet?\n","27_2 Is it the same for men as well as women?\n","27_3 What are some specific recommendations for women?\n","27_4 Are dairy products necessary for a good health?\n","27_5 Can it harm you instead?\n","27_6 How necessary is fiber?\n","27_7 What happens if it's consumed in a smaller amount than recommended?\n","\n","30 Linux and Windows\n","30_1 What are some advantages of using Linux?\n","30_2 Can I run Windows software in it?\n","30_3 How does it compare to Windows?\n","30_4 Which of these is more popular?\n","30_5 How do I install software on it?\n","30_6 What is an easy way to install Python?\n","30_7 Tell me about how I can share files.\n","\n","========================================== Test conversations =====\n","\n","31 head and neck cancer\n","31_1 What is throat cancer?\n","31_2 Is it treatable?\n","31_3 Tell me about lung cancer.\n","31_4 What are its symptoms? \n","31_5 Can it spread to the throat?\n","31_6 What causes throat cancer?\n","31_7 What is the first sign of it?\n","31_8 Is it the same as esophageal cancer?\n","31_9 What's the difference in their symptoms?\n","\n","32 sharks\n","32_1 What are the different types of sharks?\n","32_2 Are sharks endangered? 
If so, which species?\n","32_3 Tell me more about tiger sharks.\n","32_4 What is the largest ever to have lived on Earth?\n","32_5 What's the biggest ever caught?\n","32_6 What about for great whites?\n","32_7 Tell me about makos.\n","32_8 What are their adaptations?\n","32_9 Where do they live?\n","32_10 What do they eat?\n","32_11 How do they compare with tigers for being dangerous?\n","\n","33 Neverending Story film and novel adaptation\n","33_1 Tell me about the Neverending Story film.\n","33_2 What is it about?\n","33_3 How was it received?\n","33_4 Did it win any awards?\n","33_5 Was it a book first?\n","33_6 Who was the author and when what it published?\n","33_7 What are the main themes?\n","33_8 Who are the main characters?\n","33_9 What are the differences between the book and movies?\n","33_10 Did the horse Artax really die?\n","\n","34 Bronze Age collapse\n","34_1 Tell me about the Bronze Age collapse.\n","34_2 What is the evidence for it?\n","34_3 What are some of the possible causes?\n","34_4 Who were the Sea Peoples?\n","34_5 What was their role in it?\n","34_6 What other factors led to a breakdown of trade?\n","34_7 What about environmental factors?\n","34_8 What empires survived?\n","34_9 What came after it?\n","\n","37 prison psychology studies\n","37_1 What was the Stanford Experiment?\n","37_2 What did it show?\n","37_3 Tell me about the author of the experiment.\n","37_4 Was it ethical?\n","37_5 What are other similar experiments?\n","37_6 What happened in the Milgram experiment?\n","37_7 Why was it important?\n","37_8 What were the similarities and differences between the studies?\n","37_9 What about the BBC experiment?\n","37_10 What are the key findings?\n","37_11 How did the results differ?\n","37_12 Why was it ended?\n","\n","40 Popular music\n","40_1 What are the origins of popular music? 
\n","40_2 What are its characteristics?\n","40_3 What technological developments enabled it?\n","40_4 When and why did people start taking pop seriously? \n","40_5 How has it been integrated into music education? \n","40_6 Describe some of the influential pop bands. \n","40_7 What makes a song pop punk?\n","40_8 What is the difference between it and emo?\n","40_9 How did Britpop change music?\n","40_10 What are its roots and what influenced it?\n","\n","49 Netflix history\n","49_1 How was Netflix started? \n","49_2 How did it originally work?\n","49_3 What is its relationship with Blockbuster?\n","49_4 When did Netflix shift from DVDs to a streaming service?\n","49_5 What are its other competitors? \n","49_6 How does it compare to Amazon Prime Video?\n","49_7 Describe it’s subscriber growth over time.\n","49_8 How has it changed the way TV is watched?\n","49_9 How has it impacted society?\n","49_10 How about dating and relationships? \n","\n","50 satellites\n","50_1 What was the first artificial satellite?\n","50_2 What are the types of orbits?\n","50_3 What are the important classes of satellite?\n","50_4 How do navigation systems work?\n","50_5 What is the Galileo system and why is it important?\n","50_6 Why did it create tension with the US?\n","50_7 What are Cubesats?\n","50_8 What are their advantages? \n","50_9 What are they used for?\n","50_10 What is their future?\n","\n","54 Washington DC tourism\n","54_1 What is worth seeing in Washington D.C.?\n","54_2 Which Smithsonian museums are the most popular?\n","54_3 Why is the National Air and Space Museum important?\n","54_4 Is the Spy Museum free?\n","54_5 What is there to do in DC after the museums close?\n","54_6 What is the best time to visit the reflecting pools?\n","54_7 Are there any famous foods?\n","54_8 What is a DC half smoke?\n","54_9 Tell me about its history.\n","\n","56 theory of evolution\n","56_1 What is Darwin’s theory in a nutshell?\n","56_2 How was it developed? 
\n","56_3 How do sexual and asexual reproduction affect it?\n","56_4 How can fossils be used to understand it?\n","56_5 What is modern evidence for it? \n","56_6 What is the impact on modern biology?\n","56_7 Compare and contrast microevolution and macroevolution.\n","56_8 What is the relationship to speciation?\n","\n","58 realtime database\n","58_1 What is a real-time database?\n","58_2 How does it differ from traditional ones?\n","58_3 What are the advantages of real-time processing?\n","58_4 What are examples of important ones?\n","58_5 What are important applications?\n","58_6 What are important cloud options?\n","58_7 Tell me about the Firebase DB.\n","58_8 How is it used in mobile apps?\n","\n","59 physical injuries\n","59_1 Which weekend sports have the most injuries?\n","59_2 What are the most common types of injuries? \n","59_3 What is the ACL?\n","59_4 What is an injury for it?\n","59_5 Tell me about the RICE method.\n","59_6 Is there disagreement about it?\n","59_7 What is arnica used for?\n","59_8 What are some ways to avoid injury?\n","\n","61 Avengers and Superhero Universes\n","61_1 Who are The Avengers?\n","61_2 Tell me about their first appearance.\n","61_3 Who is the most powerful and why? \n","61_4 What is the relationship of Spider-Man to the team?\n","61_5 Why is Batman not a member? \n","61_6 What is an important team in the DC universe? 
\n","61_7 Tell me about the origins of the Justice League.\n","61_8 Who are the important members?\n","61_9 Which team came first?\n","\n","67 red blood cells\n","67_1 Why is blood red?\n","67_2 What are red blood cells?\n","67_3 How are they created?\n","67_4 How is oxygen transported?\n","67_5 What is anemia?\n","67_6 What are the symptoms?\n","67_7 Can it go away?\n","67_8 What are its possible causes?\n","67_9 How is it treated?\n","67_10 What foods contain high levels of iron?\n","67_11 What improves absorption?\n","\n","68 Emilia-Romagna food travel\n","68_1 What is cuisine is Emilia-Romagna famous for?\n","68_2 Tell me about cooking schools and classes.\n","68_3 What are famous foods from the region?\n","68_4 Describe the traditional process for making balsamic vinegar?\n","68_5 What is mortadella and where is it from?\n","68_6 What’s the difference with Bologna?\n","68_7 Where was Parmesan cheese created?\n","68_8 What is done with the whey after production?\n","68_9 What are typical pasta dishes?\n","68_10 What is the history of tagliatelle al ragu bolognese?\n","68_11 What are its common variations?\n","\n","69 melatonin and sleep\n","69_1 How do you sleep after jet lag?\n","69_2 Does melatonin help?\n","69_3 How was it discovered?\n","69_4 What are good sources in food?\n","69_5 Is melatonin bad for you?\n","69_6 What are the side effects?\n","69_7 Why does it require a prescription in the UK?\n","69_8 How can I increase my levels naturally?\n","69_9 Is it effective for treating insomnia?\n","69_10 How about for anxiety?\n","\n","75 turkey history\n","75_1 Why do turkey and Turkey share the same name?\n","75_2 Where are turkeys from?\n","75_3 What was their importance in native cultures?\n","75_4 When and how were they domesticated?\n","75_5 Can they fly?\n","75_6 Why did Ben Franklin want it to be the national symbol?\n","75_7 How did he cook it?\n","75_8 Why is it eaten on Thanksgiving?\n","75_9 How did it become traditional for Christmas dinner in 
Britain? \n","75_10 When did they become popular?\n","\n","77 stews\n","77_1 What's the difference between soup and stew?\n","77_2 Is chilli a stew?\n","77_3 How about goulash?\n","77_4 What are popular ones in France?\n","77_5 How is cassoulet made?\n","77_6 Tell me about feijoada and its significance.\n","77_7 How is it similar or different from cassoulet?\n","77_8 Tell about Bigos stew.\n","77_9 Why is it important?\n","77_10 What is the history of Irish stew?\n","\n","78 diet information\n","78_1 What is the keto diet?\n","78_2 Why was it original developed?\n","78_3 What is ketosis?\n","78_4 What is paleo?\n","78_5 What do they have in common?\n","78_6 How are they different?\n","78_7 What is intermittent fasting?\n","78_8 How is it related to keto?\n","78_9 What is the 16/8 method?\n","78_10 What is the best for weight loss?\n","\n","79 sociology\n","79_1 What is taught in sociology?\n","79_2 What is the main contribution of Auguste Comte?\n","79_3 What is the role of positivism in it?\n","79_4 What is Herbert Spencer known for?\n","79_5 How is his work related to Comte?\n","79_6 What is the functionalist theory?\n","79_7 What is its main criticism?\n","79_8 How does it compare to conflict theory?\n","79_9 What are modern examples of conflict theory?\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"FNB7wZDpxvlT"},"source":["Search example:"]},{"cell_type":"code","metadata":{"id":"pyzprlE-xvlZ","executionInfo":{"status":"ok","timestamp":1606937782062,"user_tz":0,"elapsed":6275,"user":{"displayName":"Alexandre Correia","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GiAdzJoM5468LWKEyTHtf1ncHk5IZIRjrg5X6Pe=s64","userId":"16404568839350536453"}}},"source":["#elastic = es.ESSimpleAPI()\n","#results = elastic.search_body(topics['33_1'], numDocs = 10)\n","#results = elastic.get_doc_body(topics['33_1']['_id'])\n","#print(results)"],"execution_count":4,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"LG4FCgIIxvll"},"source":["## Retrieval 
with the training conversations\n","\n","\n","The ElasticSearchSimpleAPI notebook illustrates how to use ElasticSearch. Use this API to retrieve the top 100 ranked passages for each conversation turn. \n","\n","To evaluate the results you should use the provided `ConvSearchEvaluation` class. Examine and discuss the recall metric results. In terms of metrics, discuss what should be your goals for each step of the project."]},{"cell_type":"code","metadata":{"id":"TEMDE9jYxvlo","executionInfo":{"status":"ok","timestamp":1606937782433,"user_tz":0,"elapsed":6635,"user":{"displayName":"Alexandre Correia","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GiAdzJoM5468LWKEyTHtf1ncHk5IZIRjrg5X6Pe=s64","userId":"16404568839350536453"}}},"source":["\"\"\"import TRECCASTeval as trec\n","import numpy as np\n","\n","import ElasticSearchSimpleAPI as es\n","import numpy as np\n","\n","import pprint as pprint\n","import plots as plots\n","\n","elastic = es.ESSimpleAPI()\n","test_bed = trec.ConvSearchEvaluation()\n","\n","# total\n","_recall = 0\n","_p10 = 0\n","_ndcg5 = 0\n","\n","# counters\n","_ntopics = 0\n","_nturns = 0\n","_ntotalTurns = 0\n","\n","# metrics\n","_p10s = np.array([])\n","_aps = np.array([])\n","_ndcg5s = np.array([])\n","\n","# conv and turns numbers and names\n","_convNumbers = []\n","_convNames = np.array([])\n","\n","for topic in test_bed.train_topics:\n"," conv_id = topic['number']\n"," if conv_id not in (1, 2, 4, 7, 15, 17,18,22,23,24,25,27,30):\n"," continue\n"," _convNumbers.append(conv_id)\n"," _convNames = np.append(_convNames, str(conv_id) + \" \" + topic['title'])\n"," \n"," for turn in topic['turn'][:8]:\n"," turn_id = turn['number']\n"," utterance = turn['raw_utterance']\n"," topic_turn_id = '%d_%d'% (conv_id, turn_id)\n"," \n"," aux = test_bed.relevance_judgments.loc[test_bed.relevance_judgments['topic_turn_id'] == (topic_turn_id)]\n"," num_rel = aux.loc[aux['rel'] != 0]['docid'].count()\n","\n"," _convNames = np.append(_convNames, 
topic_turn_id + \" \" + utterance)\n"," \n"," if num_rel == 0:\n"," _p10s = np.append(_p10s, np.nan)\n"," _aps = np.append(_aps, np.nan)\n"," _ndcg5s = np.append(_ndcg5s, np.nan)\n"," _ntotalTurns += 1\n"," continue\n","\n"," result = elastic.search_body(query=utterance, numDocs = 100)\n","\n"," if np.size(result) == 0 or num_rel == 0:\n"," _p10s = np.append(_p10s, 0.0)\n"," _aps = np.append(_aps, 0.0)\n"," _ndcg5s = np.append(_ndcg5s, 0.0)\n"," _ntotalTurns += 1\n"," print(topic_turn_id, utterance, num_rel, \"NO RESULTS\")\n"," continue\n"," else:\n"," print(topic_turn_id, utterance, num_rel)\n","\n"," [p10, recall, ap, ndcg5] = test_bed.eval(result[['_id','_score']], topic_turn_id)\n","\n"," print('P@10=', p10, ' Recall=', recall, ' AP=', ap, ' NDCG@5=',ndcg5)\n"," # total\n"," _recall = _recall + recall\n"," _p10 = _p10 + p10\n"," _ndcg5 = _ndcg5 + ndcg5\n","\n"," # metrics\n"," _p10s = np.append(_p10s, p10)\n"," _aps = np.append(_aps, ap)\n"," _ndcg5s = np.append(_ndcg5s, ndcg5)\n"," \n"," # counters\n"," _nturns = _nturns + 1\n"," _ntotalTurns += 1\n"," \n"," if _ntotalTurns % 8 != 0:\n"," for n in range((_ntotalTurns % 8) + 1, 9):\n"," _ntotalTurns += 1\n"," _p10s = np.append(_p10s, np.nan)\n"," _aps = np.append(_aps, np.nan)\n"," _ndcg5s = np.append(_ndcg5s, np.nan)\n"," _convNames = np.append(_convNames, \"NO RESULT\")\n","\n"," # counters\n"," _ntopics += 1\n","\n","# metrics\n","_p10s = np.reshape(_p10s, (_ntopics, 8))\n","_aps = np.reshape(_aps, (_ntopics, 8))\n","_ndcg5s = np.reshape(_ndcg5s, (_ntopics, 8))\n","# convs and turns names\n","_convNames = np.reshape(_convNames, (_ntopics, 9))\n","\n","# generate plots\n","plots.plotMetricAlongConversation(\"Average Precision\", [_aps], [\"LMD\"], _convNumbers)\n","plots.plotMetricAlongConversation(\"normalized Discounted Cumulative Gain\", [_ndcg5s], [\"LMD\"], _convNumbers)\n","plots.plotMetricEachConversation(\"Average Precision\", [_aps], [\"LMD\"], _convNumbers, 
_convNames)\n","plots.plotMetricEachConversation(\"normalized Discounted Cumulative Gain\", [_ndcg5s], [\"LMD\"], _convNumbers, _convNames)\n","\n","# total mean\n","_p10 = _p10/_nturns\n","_recall = _recall/_nturns\n","_ndcg5 = _ndcg5/_nturns\n","\n","\n","\n","print()\n","print('P@10=', _p10, ' Recall=', _recall, ' NDCG@5=', _ndcg5)\"\"\"\n","\n","\n","import TRECCASTeval as trec\n","import numpy as np\n","\n","import ElasticSearchSimpleAPI as es\n","\n","import pprint as pprint\n","\n","import project as project\n","\n","elastic = es.ESSimpleAPI() if UPDATE_ELASTICSEARCH_RESULTS else None\n","test_bed = trec.ConvSearchEvaluation()\n","\n","if RUN_PHASE == 1:\n"," project.project(REL_DOCS_PER_TURN, UPDATE_ELASTICSEARCH_RESULTS, elastic, test_bed, test_bed.train_topics, test_bed.relevance_judgments, IDX[\"TRAIN\"], SET_NAME[\"TRAIN\"], plots=True)"],"execution_count":5,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"9gwDlS0xxvlx"},"source":["## Retrieval with the test conversations"]},{"cell_type":"code","metadata":{"id":"ZOMWH24Qxvly","executionInfo":{"status":"ok","timestamp":1606937782721,"user_tz":0,"elapsed":6912,"user":{"displayName":"Alexandre Correia","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GiAdzJoM5468LWKEyTHtf1ncHk5IZIRjrg5X6Pe=s64","userId":"16404568839350536453"}}},"source":["\"\"\"import TRECCASTeval as trec\n","import numpy as np\n","\n","from bert_serving.client import BertClient\n","from sklearn.metrics.pairwise import cosine_similarity\n","\n","import ElasticSearchSimpleAPI as es\n","import numpy as np\n","\n","import pprint as pprint\n","\n","elastic = es.ESSimpleAPI()\n","test_bed = trec.ConvSearchEvaluation()\n","\n","_recall = 0\n","_p10 = 0\n","_ndcg5 = 0\n","_nturns = 0\n","for topic in test_bed.test_topics:\n"," \n"," conv_id = topic['number']\n"," if conv_id not in (31, 32, 33, 34, 37, 40, 49, 50, 54, 56, 58, 59, 61, 67, 68, 69, 75, 77, 78, 79):\n"," continue\n","\n"," for turn in topic['turn']:\n"," turn_id = 
turn['number']\n"," utterance = turn['raw_utterance']\n"," topic_turn_id = '%d_%d'% (conv_id, turn_id)\n"," \n"," aux = test_bed.test_relevance_judgments.loc[test_bed.test_relevance_judgments['topic_turn_id'] == (topic_turn_id)]\n"," num_rel = aux.loc[aux['rel'] != 0]['docid'].count()\n"," \n"," if num_rel == 0:\n"," continue\n","\n"," result = elastic.search_body(query=utterance, numDocs = 100)\n","\n"," if np.size(result) == 0 or num_rel == 0:\n"," print(topic_turn_id, utterance, num_rel, \"NO RESULTS\")\n"," continue\n"," else:\n"," print(topic_turn_id, utterance, num_rel)\n","\n"," [p10, recall, ap, ndcg5] = test_bed.eval(result[['_id','_score']], topic_turn_id)\n","\n","# print('P10=', p10, ' Recall=', recall, ' NDCG=',ndcg)\n"," _recall = _recall + recall\n"," _p10 = _p10 + p10\n"," _ndcg5 = _ndcg5 + ndcg5\n"," \n"," _nturns = _nturns + 1\n","\n","_p10 = _p10/_nturns\n","_recall = _recall/_nturns\n","_ndcg5 = _ndcg5/_nturns\n","\n","print()\n","print('P10=', _p10, ' Recall=', _recall, ' NDCG@5', _ndcg5)\n","\"\"\"\n","\n","import TRECCASTeval as trec\n","import numpy as np\n","\n","import ElasticSearchSimpleAPI as es\n","\n","import pprint as pprint\n","\n","import project as project\n","\n","elastic = es.ESSimpleAPI() if UPDATE_ELASTICSEARCH_RESULTS else None\n","test_bed = trec.ConvSearchEvaluation()\n","\n","if RUN_PHASE == 1:\n"," project.project(REL_DOCS_PER_TURN, UPDATE_ELASTICSEARCH_RESULTS, elastic, test_bed, test_bed.test_topics, test_bed.test_relevance_judgments, IDX[\"TEST\"], SET_NAME[\"TEST\"], plots=True)\n","\n","\n","# download zip and download plots\n","#!zip -r plots.zip /content/plots\n","\n","#from google.colab import files\n","#files.download(\"plots.zip\")"],"execution_count":6,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"47JBoZn4xvl8"},"source":["## Passage re-Ranking\n","The Passage Ranking notebook example illustrates how to use the BERT service to compute the similarity between sentences. 
Using the BERT service, improve a passage ranking method to rerank the initial retrieval step.\n","\n","To evaluate the results you should use the provided `ConvSearchEvaluation` class.\n"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"CsxPHbOwxvl-","executionInfo":{"status":"ok","timestamp":1606937786952,"user_tz":0,"elapsed":11133,"user":{"displayName":"Alexandre Correia","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GiAdzJoM5468LWKEyTHtf1ncHk5IZIRjrg5X6Pe=s64","userId":"16404568839350536453"}},"outputId":"989fc55d-9a18-4e20-9c43-79dabc8d0f44"},"source":["!pip install transformers\n","from transformers import BertTokenizerFast, BertModel\n","from sklearn.linear_model import LogisticRegression\n","import torch\n","import plots as plots\n","\n","if RUN_PHASE == 2:\n"," bert_model_name = 'nboost/pt-bert-base-uncased-msmarco'\n"," tokenizer = BertTokenizerFast.from_pretrained(bert_model_name)\n"," device = torch.device(\"cuda\")\n"," model = BertModel.from_pretrained(bert_model_name, return_dict=True)\n"," model = model.to(device)\n","\n"," aps1, ndcg5s1, recalls1, precisions1, convNumbers, convNames = project.project(REL_DOCS_PER_TURN, UPDATE_ELASTICSEARCH_RESULTS, elastic, test_bed, test_bed.test_topics, test_bed.test_relevance_judgments, IDX[\"TEST\"], SET_NAME[\"TEST\"])\n","\n"," classifier = project.project2Train(REL_DOCS_PER_TURN, UPDATE_BERT_RESULTS, tokenizer, model, device, test_bed.train_topics, test_bed.relevance_judgments, IDX[\"TRAIN\"], SET_NAME[\"TRAIN\"])\n"," aps2, ndcg5s2, recalls2, precisions2 = project.project2Test(REL_DOCS_PER_TURN, UPDATE_BERT_RESULTS, classifier, tokenizer, model, device, test_bed.test_topics, test_bed, test_bed.test_relevance_judgments, IDX[\"TEST\"], SET_NAME[\"TEST\"])\n","\n"," methods = [\"LMD\", \"Re-Ranking\"]\n"," \n"," plots.plotMetricAlongConversation(PROJ_DIR, REL_DOCS_PER_TURN, SET_NAME[\"TEST\"], \"Average Precision\", [aps1, aps2], methods, convNumbers)\n"," 
plots.plotMetricAlongConversation(PROJ_DIR, REL_DOCS_PER_TURN, SET_NAME[\"TEST\"], \"normalized Discounted Cumulative Gain\", [ndcg5s1, ndcg5s2], methods, convNumbers)\n"," plots.plotMetricEachConversation(PROJ_DIR, REL_DOCS_PER_TURN, SET_NAME[\"TEST\"], \"Average Precision\", [aps1, aps2], methods, convNumbers, convNames)\n"," plots.plotMetricEachConversation(PROJ_DIR, REL_DOCS_PER_TURN, SET_NAME[\"TEST\"], \"normalized Discounted Cumulative Gain\", [ndcg5s1, ndcg5s2], methods, convNumbers, convNames)\n"," plots.plotPrecisionRecall(PROJ_DIR, REL_DOCS_PER_TURN, SET_NAME[\"TEST\"], [recalls1, recalls2], [precisions1, precisions2], methods, convNumbers)"],"execution_count":7,"outputs":[{"output_type":"stream","text":["Requirement already satisfied: transformers in /usr/local/lib/python3.6/dist-packages (4.0.0)\n","Requirement already satisfied: packaging in /usr/local/lib/python3.6/dist-packages (from transformers) (20.4)\n","Requirement already satisfied: sacremoses in /usr/local/lib/python3.6/dist-packages (from transformers) (0.0.43)\n","Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.6/dist-packages (from transformers) (2019.12.20)\n","Requirement already satisfied: dataclasses; python_version < \"3.7\" in /usr/local/lib/python3.6/dist-packages (from transformers) (0.8)\n","Requirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.6/dist-packages (from transformers) (4.41.1)\n","Requirement already satisfied: tokenizers==0.9.4 in /usr/local/lib/python3.6/dist-packages (from transformers) (0.9.4)\n","Requirement already satisfied: requests in /usr/local/lib/python3.6/dist-packages (from transformers) (2.23.0)\n","Requirement already satisfied: filelock in /usr/local/lib/python3.6/dist-packages (from transformers) (3.0.12)\n","Requirement already satisfied: numpy in /usr/local/lib/python3.6/dist-packages (from transformers) (1.18.5)\n","Requirement already satisfied: six in /usr/local/lib/python3.6/dist-packages (from 
packaging->transformers) (1.15.0)\n","Requirement already satisfied: pyparsing>=2.0.2 in /usr/local/lib/python3.6/dist-packages (from packaging->transformers) (2.4.7)\n","Requirement already satisfied: click in /usr/local/lib/python3.6/dist-packages (from sacremoses->transformers) (7.1.2)\n","Requirement already satisfied: joblib in /usr/local/lib/python3.6/dist-packages (from sacremoses->transformers) (0.17.0)\n","Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.6/dist-packages (from requests->transformers) (3.0.4)\n","Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.6/dist-packages (from requests->transformers) (1.24.3)\n","Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.6/dist-packages (from requests->transformers) (2.10)\n","Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.6/dist-packages (from requests->transformers) (2020.11.8)\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"_SEPpIMVxvmI"},"source":["## Conversation Context Modeling\n","\n","The Conversation State Tracking example illustrates how to use the \n","\n","To evaluate the results you should use the provided `ConvSearchEvaluation` class.\n"]},{"cell_type":"code","metadata":{"id":"bdpEnYSsxvmK","colab":{"base_uri":"https://localhost:8080/","height":1000,"output_embedded_package_id":"1MEeYfdqk2Qb6EwdPtHnD_mBgcvnpfInf"},"executionInfo":{"status":"ok","timestamp":1606938624238,"user_tz":0,"elapsed":848405,"user":{"displayName":"Alexandre Correia","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GiAdzJoM5468LWKEyTHtf1ncHk5IZIRjrg5X6Pe=s64","userId":"16404568839350536453"}},"outputId":"1393302e-0d7e-4653-85bf-599e834f9b34"},"source":["if RUN_PHASE == 3:\n"," !pip install transformers\n"," !pip install spacy\n"," !python -m spacy download en_core_web_sm\n"," %tensorflow_version 2.x\n"," !pip install t5==0.5.0\n"," import tensorflow as tf\n"," import 
tensorflow_text\n"," import pprint\n"," import spacy\n","\n"," !rm -rf /content/t5-canard\n"," !cp -r /content/drive/My\\ Drive/faculdade/fct-miei/04_ano4_\\(year4\\)/semestre1/ri/infos_projeto/t5-canard.zip /content/t5-canard.zip\n"," !unzip /content/t5-canard.zip\n","\n"," class QueryRewriterT5:\n","     def __init__(self, model_path=\"/content/t5-canard\"):\n","         \"\"\"\n","         Loads the T5 SavedModel and stores a prediction callable in self.t5_model\n","         \"\"\"\n","         if tf.executing_eagerly():\n","             print(\"Loading SavedModel in eager mode.\")\n","             imported = tf.saved_model.load(model_path, [\"serve\"])\n","             self.t5_model = lambda x: imported.signatures['serving_default'](tf.constant(x))['outputs'].numpy()\n","         else:\n","             print(\"Loading SavedModel in tf 1.x graph mode.\")\n","             tf.compat.v1.reset_default_graph()\n","             sess = tf.compat.v1.Session()\n","             meta_graph_def = tf.compat.v1.saved_model.load(sess, [\"serve\"], model_path)\n","             signature_def = meta_graph_def.signature_def[\"serving_default\"]\n","             self.t5_model = lambda x: sess.run(\n","                 fetches=signature_def.outputs[\"outputs\"].name,\n","                 feed_dict={signature_def.inputs[\"input\"].name: x}\n","             )\n","\n","     def rewrite_query_with_T5(self, _curr_query, _ctx_list):\n","         \"\"\"\n","         _curr_query: str - the query string to be rewritten using T5\n","         _ctx_list: list - a list of strings containing the turns or text to give context to T5\n","         Returns a string with the rewritten query\n","         \"\"\"\n","         _t5_query = '{} [CTX] '.format(_curr_query) + ' [TURN] '.join(_ctx_list)\n","         print(\"Query and context: {}\".format(_t5_query))\n","         return self.t5_model([_t5_query])[0].decode('utf-8')\n","\n","     def rewrite_dialog_with_T5(self, _queries_list):\n","         \"\"\"\n","         _queries_list: list - a list of strings containing the raw utterances ordered from first to last\n","         Returns a list of strings with the rewritten queries\n","         \"\"\"\n","         _rewritten_queries_list=[]\n","         for i in 
range(len(_queries_list)):\n","             _current_query = _queries_list[i]\n","             _rewritten_query = self.rewrite_query_with_T5(_current_query, _queries_list[:i])\n","             print(\"Rewritten query: {}\\n\".format(_rewritten_query))\n","             _rewritten_queries_list.append(_rewritten_query)\n","         return _rewritten_queries_list\n","\n","\n","\n"," elastic = es.ESSimpleAPI()\n","\n"," bert_model_name = 'nboost/pt-bert-base-uncased-msmarco'\n"," tokenizer = BertTokenizerFast.from_pretrained(bert_model_name)\n"," device = torch.device(\"cuda\")\n"," model = BertModel.from_pretrained(bert_model_name, return_dict=True)\n"," model = model.to(device)\n","\n"," nlp = spacy.load('en_core_web_sm')\n"," rewriter = QueryRewriterT5('/content/t5-canard')\n","\n"," _, _, _, _, convNumbers, convNames = project.project(REL_DOCS_PER_TURN, UPDATE_ELASTICSEARCH_RESULTS, elastic, test_bed, test_bed.test_topics, test_bed.test_relevance_judgments, IDX[\"TEST\"], SET_NAME[\"TEST\"])\n"," classifier = project.project2Train(REL_DOCS_PER_TURN, UPDATE_BERT_RESULTS, tokenizer, model, device, test_bed.train_topics, test_bed.relevance_judgments, IDX[\"TRAIN\"], SET_NAME[\"TRAIN\"])\n"," project.project3(REL_DOCS_PER_TURN, UPDATE_ELASTICSEARCH_RESULTS, rewriter, tokenizer, model, device, nlp, classifier, elastic, test_bed, test_bed.test_topics, test_bed.test_relevance_judgments, IDX[\"TEST\"], SET_NAME[\"TEST\"], convNumbers, convNames)"],"execution_count":8,"outputs":[{"output_type":"display_data","data":{"text/plain":"Output hidden; open in https://colab.research.google.com to view."},"metadata":{}}]}]}