Engineering considerations:
1. The structural design is clear and modular.
2. Functions are separated and decoupled so that they do not interfere with each other, and components are pluggable and extensible.
Algorithm and machine-learning considerations:
1. Keep the algorithms simple and let data features drive them.
2. Design for specific scenarios and vertical domains. Customer-service questions are very long-tailed, so we only need to handle the most common cases.
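Second, preliminary knowledge.
Q-Q matching: match the user's question against stored questions by comparing the similarity of the two sentences. With deep learning, a question can also be matched directly against answers (Q-A matching), because such models have long-term memory.
Search and match:
1. The knowledge base stores question-answer pairs.
2. Retrieval: search for related questions.
3. Matching: rank the candidates retrieved in step 2.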
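The three search-and-match steps can be sketched as follows. This is a minimal illustration under stated assumptions, not a production retriever: the knowledge base is a hypothetical in-memory list, retrieval uses a simple inverted index over whitespace tokens, and ranking uses Jaccard token overlap as a stand-in for whatever Q-Q similarity measure a real system would use. Chinese input would need word segmentation before tokenizing.

```python
from collections import defaultdict

# Step 1: hypothetical toy knowledge base of (question, answer) pairs.
KNOWLEDGE_BASE = [
    ("how do i reset my password", "Use the 'Forgot password' link on the login page."),
    ("how do i change my email address", "Go to Settings > Account > Email."),
    ("what are your opening hours", "We are open 9:00-18:00, Monday to Friday."),
]


def tokenize(text):
    """Lower-case whitespace tokenizer; Chinese text would need a word segmenter first."""
    return text.lower().replace("?", " ").split()


def build_index(kb):
    """Inverted index mapping each token to the ids of stored questions containing it."""
    index = defaultdict(set)
    for i, (question, _) in enumerate(kb):
        for tok in tokenize(question):
            index[tok].add(i)
    return index


def retrieve(query, index):
    """Step 2: every stored question that shares at least one token with the query."""
    ids = set()
    for tok in tokenize(query):
        ids |= index.get(tok, set())
    return ids


def jaccard(a, b):
    """Share of distinct tokens the two token lists have in common."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def answer(query, kb, index):
    """Step 3: rank the retrieved candidates by Q-Q similarity; return the best answer."""
    q_tokens = tokenize(query)
    candidates = retrieve(query, index)
    if not candidates:
        return None
    best = max(candidates, key=lambda i: jaccard(q_tokens, tokenize(kb[i][0])))
    return kb[best][1]


index = build_index(KNOWLEDGE_BASE)
print(answer("How can I reset my password?", KNOWLEDGE_BASE, index))
```

Retrieval keeps the expensive ranking step limited to a small candidate set; swapping jaccard for a better similarity function does not change the rest of the pipeline.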
Edit distance matching
Applications: spelling correction and intelligent auto-completion. For a user question Q, compute the edit distance between Q and each stored question Q1, ..., QN, and reply with the answer of the question Qi that has the smallest edit distance. In Python 2, a plain str holding UTF-8-encoded text uses three bytes per Chinese character, which distorts character-level edit distances; use unicode strings instead (in Python 3, str is already Unicode).
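A minimal sketch of edit-distance matching, assuming the standard Levenshtein distance (unit cost for insertion, deletion, and substitution). The helper names edit_distance and closest_answer are hypothetical. The last two lines illustrate the encoding point: comparing Unicode strings counts one changed character, while comparing the UTF-8 bytes counts all three changed bytes.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (str, bytes, or lists of words)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def closest_answer(query, qa_pairs):
    """Reply with the answer of the stored question Qi closest to query."""
    _, best_answer = min(qa_pairs, key=lambda qa: edit_distance(query, qa[0]))
    return best_answer


# Characters vs. UTF-8 bytes: one character differs, but all three of its bytes do.
print(edit_distance("你好", "您好"))                                  # 1
print(edit_distance("你好".encode("utf-8"), "您好".encode("utf-8")))  # 3
```

Passing lists of words instead of strings gives a word-level distance: for "What books do you like" and "What movies do you like" split on spaces, the distance is 1.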
For example, "What books do you like?" and "What movies do you like?" differ in only one word, so their edit distance is very small.
Yet the two sentences have different meanings. Rather than treating every word equally, it is more meaningful to weight each word by how much it contributes to the meaning.
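One common way to stop treating words equally, swapped in here as an assumption rather than taken from the original, is inverse document frequency (IDF): words that appear in many stored questions ("what", "do", "you") get low weight, rare content words ("books", "movies") get high weight. The sketch below uses a hypothetical five-question corpus and an IDF-weighted Jaccard overlap.

```python
import math
from collections import Counter


def idf_weights(corpus):
    """Smoothed inverse document frequency, log((1 + N) / (1 + df)) + 1, per token."""
    n = len(corpus)
    df = Counter(tok for doc in corpus for tok in set(doc))
    return {tok: math.log((1 + n) / (1 + d)) + 1 for tok, d in df.items()}


def weighted_jaccard(a, b, idf):
    """Jaccard overlap where each token counts with its IDF weight instead of 1.

    Tokens never seen in the corpus get the largest known weight (they are rare).
    """
    unseen = max(idf.values())

    def weight(tokens):
        return sum(idf.get(t, unseen) for t in tokens)

    a, b = set(a), set(b)
    union = weight(a | b)
    return weight(a & b) / union if union else 0.0


# Hypothetical stored questions, used only to estimate how common each word is.
corpus = [q.split() for q in [
    "what books do you like", "what movies do you like", "what music do you like",
    "what food do you like", "where do you live",
]]
idf = idf_weights(corpus)

q1 = "what books do you like".split()
print(round(weighted_jaccard(q1, "what movies do you like".split(), idf), 2))  # 0.51
print(round(weighted_jaccard(q1, "which books do you like".split(), idf), 2))  # 0.62
```

Plain (unweighted) Jaccard scores both pairs at 4/6, so it cannot tell them apart; with IDF weights, sharing the content word "books" counts for more than sharing the function words.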
Word-meaning matching: for example, "What kind of information do you like?" and "What kind of documentation do you like?" Here "information" and "documentation" should be treated as similar.
Solution: word vectors.
1. NLTK's WordNet corpus provides lists of synonyms, which can be used to determine which words are closely related.
2. Build your own synonym table: segment Chinese text into words, then learn word vectors from it with word2vec.
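A minimal sketch of both ideas, under stated assumptions: the synonym table is a hypothetical hand-built dictionary that maps words to a canonical form, and the word vectors are made-up 3-dimensional values standing in for vectors that, in practice, would come from a word2vec model (for example gensim's Word2Vec) trained on segmented text. Similarity between vectors is measured with cosine similarity.

```python
import math

# Hypothetical hand-built synonym table mapping each word to a canonical form.
SYNONYMS = {"documentation": "information", "documents": "information", "info": "information"}


def normalize(tokens, table=SYNONYMS):
    """Replace each word that appears in the synonym table with its canonical form."""
    return [table.get(t, t) for t in tokens]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Made-up 3-dimensional vectors for illustration only; word2vec vectors trained on
# segmented text would typically have hundreds of dimensions.
TOY_VECTORS = {
    "information": [0.9, 0.1, 0.2],
    "documentation": [0.8, 0.2, 0.3],
    "movies": [0.1, 0.9, 0.1],
}

a = normalize("what kind of information do you like".split())
b = normalize("what kind of documentation do you like".split())
print(a == b)  # True
print(cosine(TOY_VECTORS["information"], TOY_VECTORS["documentation"]) >
      cosine(TOY_VECTORS["information"], TOY_VECTORS["movies"]))  # True
```

After normalization the two example questions become identical, so any of the earlier matching methods will pair them; vectors extend this to word pairs nobody wrote into the table.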
Each word is represented as an N-dimensional vector, and the similarity between words is measured by comparing their vectors (for example, with cosine similarity). Scene matching: given a sentence, determine which category it belongs to.
In other words, determine which scenario the user's question belongs to. Matching only within that scenario narrows the set of candidate questions and speeds up matching.
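A minimal sketch of scene matching, assuming a hypothetical keyword table as the scene classifier (a trained classifier could replace it) and hypothetical per-scene question banks. Once the scene is chosen, only that scene's questions are compared with the user's question.

```python
# Hypothetical scene keywords and per-scene question banks.
SCENE_KEYWORDS = {
    "account": {"password", "login", "email", "account"},
    "shipping": {"delivery", "shipping", "package", "track"},
}
SCENE_QA = {
    "account": [
        ("how do i reset my password", "Use the 'Forgot password' link on the login page."),
        ("how do i change my email", "Go to Settings > Account > Email."),
    ],
    "shipping": [
        ("how do i track my package", "Use the tracking number in your order confirmation."),
        ("how long does delivery take", "Usually three to five business days."),
    ],
}


def classify_scene(tokens):
    """Pick the scene whose keywords overlap the question most; None if nothing overlaps."""
    scores = {scene: len(words & set(tokens)) for scene, words in SCENE_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def answer_in_scene(question):
    tokens = question.lower().split()
    scene = classify_scene(tokens)
    if scene is None:
        return None
    # Only the questions of one scene are compared, not the whole knowledge base.
    _, best_answer = max(SCENE_QA[scene], key=lambda qa: len(set(tokens) & set(qa[0].split())))
    return best_answer


print(answer_in_scene("How can I track my package"))
```

With many scenes, the per-question matching cost drops roughly in proportion to the size of the chosen scene's bank instead of the whole knowledge base.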
ChatterBot chatbot application
ChatterBot is a machine-learning-based chatbot engine written in Python. Its main feature is that it learns from existing conversations: it memorizes them and matches new input against them. Each part of the engine is implemented by a different "adapter".
Logic adapters: text matching.
Meaning matching. Time Logic Adapter: handles time-related questions. Mathematical Evaluation Adapter: handles mathematical operations.
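The logic-adapter idea can be sketched as a set of pluggable classes that each return a (confidence, response) pair, with the bot keeping the highest-confidence response. This is a simplified sketch in the spirit of how ChatterBot chooses among its logic adapters; the class names (LogicAdapter, TimeAdapter, MathAdapter, FallbackAdapter, Bot) are hypothetical and are not ChatterBot's API. The math adapter only handles a single "a op b" expression.

```python
import datetime
import operator
import re


class LogicAdapter:
    """Hypothetical base class: return (confidence, response) for an input sentence."""

    def process(self, text):
        raise NotImplementedError


class TimeAdapter(LogicAdapter):
    def process(self, text):
        if re.search(r"\btime\b", text, re.IGNORECASE):
            now = datetime.datetime.now().strftime("%H:%M")
            return 1.0, "The current time is " + now + "."
        return 0.0, None


class MathAdapter(LogicAdapter):
    """Evaluates a single binary expression such as "what is 3 + 4"."""

    PATTERN = re.compile(r"(-?\d+(?:\.\d+)?)\s*([-+*/])\s*(-?\d+(?:\.\d+)?)")
    OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}

    def process(self, text):
        match = self.PATTERN.search(text)
        if not match:
            return 0.0, None
        a, op, b = float(match.group(1)), match.group(2), float(match.group(3))
        if op == "/" and b == 0:
            return 1.0, "Division by zero is undefined."
        return 1.0, format(self.OPS[op](a, b), "g")


class FallbackAdapter(LogicAdapter):
    def process(self, text):
        return 0.1, "Sorry, I don't understand."


class Bot:
    def __init__(self, adapters):
        # Adapters are pluggable: add or remove them without changing Bot itself.
        self.adapters = adapters

    def respond(self, text):
        # Ask every adapter and keep the response with the highest confidence.
        _, response = max((adapter.process(text) for adapter in self.adapters),
                          key=lambda result: result[0])
        return response


bot = Bot([TimeAdapter(), MathAdapter(), FallbackAdapter()])
print(bot.respond("what is 3 + 4"))  # 7
print(bot.respond("hello"))          # Sorry, I don't understand.
```

Because each adapter only has to implement process, a new capability (for example a meaning-matching adapter) can be added without touching the others, which is the decoupling goal stated at the start.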
Storage adapters: conversation data can be stored in JSON format, but JSON storage is generally not used in production because it is too slow. Mongo Database Adapter: stores conversation data in a MongoDB database.
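Storage backends can be made pluggable in the same way. The sketch below assumes a hypothetical two-method interface (add and all) with an in-memory backend and a JSON-file backend; it is not ChatterBot's storage API. The JSON backend rewrites the whole file on every write, which illustrates one reason plain JSON files scale poorly. A MongoDB-backed class could implement the same two methods with pymongo's insert_one and find.

```python
import json
import os
import tempfile


class StorageAdapter:
    """Hypothetical interface: any backend that can save and list (question, answer) pairs."""

    def add(self, question, answer):
        raise NotImplementedError

    def all(self):
        raise NotImplementedError


class MemoryStorage(StorageAdapter):
    def __init__(self):
        self._pairs = []

    def add(self, question, answer):
        self._pairs.append((question, answer))

    def all(self):
        return list(self._pairs)


class JsonFileStorage(StorageAdapter):
    """Keeps all pairs in one JSON file and rewrites the whole file on every add,
    so each write costs time proportional to the amount of stored data."""

    def __init__(self, path):
        self.path = path
        if not os.path.exists(path):
            self._save([])

    def _load(self):
        with open(self.path, encoding="utf-8") as f:
            return json.load(f)

    def _save(self, pairs):
        with open(self.path, "w", encoding="utf-8") as f:
            json.dump(pairs, f, ensure_ascii=False)

    def add(self, question, answer):
        pairs = self._load()
        pairs.append([question, answer])
        self._save(pairs)

    def all(self):
        return [tuple(pair) for pair in self._load()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for store in (MemoryStorage(), JsonFileStorage(os.path.join(tmp, "conversations.json"))):
            store.add("你好", "您好！")
            print(type(store).__name__, store.all())
```

Code that only talks to the StorageAdapter interface does not need to change when the backend is swapped.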
You can also use Naive Bayes to select which adapter handles an input. For short-conversation chatbots such as ChatterBot, the bot answers based only on the user's current sentence; earlier context is not taken into account.
Intelligent recommendation